s3cmd-1.6.1/INSTALL

Installation of s3cmd package
=============================

Copyright: TGRMN Software and contributors

S3tools / S3cmd project homepage: http://s3tools.org

!!!
!!! Please consult the README file for setup, usage and examples!
!!!

Package formats
---------------
S3cmd is distributed in two formats:

1) Prebuilt RPM file - should work on most RPM-based distributions

2) Source .tar.gz package


Installation of RPM package
---------------------------
As user "root" run:

    rpm -ivh s3cmd-X.Y.Z.noarch.rpm

where X.Y.Z is the most recent s3cmd release version.

You may be informed about missing dependencies on Python or some
libraries. Please consult your distribution documentation on ways to
solve the problem.


Installation from PyPI (Python Package Index)
---------------------------------------------
S3cmd can be installed from PyPI using pip, the recommended
installation tool for PyPI packages.

1) Confirm that pip is installed. The pip home page is:
   https://pypi.python.org/pypi/pip

   Example install on a RHEL/yum-based machine:

       sudo yum install python-pip

2) Install with pip:

       sudo pip install s3cmd


Installation from zip file
--------------------------
There are three options for running s3cmd from the source tarball:

1) The s3cmd program, as distributed in s3cmd-X.Y.Z.tar.gz on
   SourceForge or in master.zip on GitHub, can be run directly from
   where you unzipped the package.

2) Or you may want to move the "s3cmd" file and the "S3" subdirectory
   to some other path. Make sure that the "S3" subdirectory ends up in
   the same place where you move the "s3cmd" file. For instance, if
   you decide to move s3cmd to your $HOME/bin, you will have the
   $HOME/bin/s3cmd file and the $HOME/bin/S3 directory with a number
   of support files.

3) The cleanest and most recommended approach is to unzip the package
   and then just run:

       python setup.py install

   You will, however, need the Python "distutils" module for this to
   work. It is often part of the core Python package (e.g. in the
   OpenSuse Python 2.5 package) or it can be installed using your
   package manager, e.g. in Debian use

       apt-get install python-setuptools

   Again, consult your distribution's documentation to find out the
   actual package name and how to install it.

   Note that on Linux, if you are not "root" already, you may need to
   run:

       sudo python setup.py install

   instead.


Note to distribution package maintainers
----------------------------------------
Define the shell environment variable S3CMD_PACKAGING=yes if you do
not want setup.py to install manpages and doc files. You will have to
install them manually from your .spec or similar package build
scripts.

On the other hand, if you want setup.py to install manpages and docs,
but to a path other than the default, define the environment variables
$S3CMD_INSTPATH_MAN and $S3CMD_INSTPATH_DOC.

Check out setup.py for details and default values.
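
For illustration only, here is a minimal sketch of how a setup.py can
honour these variables. The default paths and file names shown below
are placeholders, not necessarily the ones s3cmd actually uses --
consult setup.py itself for the real logic and defaults:

    import os

    if os.getenv("S3CMD_PACKAGING"):
        # Packaging mode: the .spec / packaging scripts install
        # manpages and docs themselves, so ship none from setup.py.
        data_files = []
    else:
        # Assumed defaults, overridable through the environment.
        man_path = os.getenv("S3CMD_INSTPATH_MAN", "share/man")
        doc_path = os.getenv("S3CMD_INSTPATH_DOC", "share/doc/packages/s3cmd")
        data_files = [
            (man_path + "/man1", ["s3cmd.1"]),
            (doc_path, ["README.md", "INSTALL", "NEWS"]),
        ]

A packager would then typically export S3CMD_PACKAGING=yes (or the two
install-path variables) in the build environment before invoking
"python setup.py install".
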
Where to get help
-----------------
If in doubt, or if something doesn't work as expected, get back to us
via the mailing list:

    s3tools-general@lists.sourceforge.net

or visit the S3cmd / S3tools homepage at:

    http://s3tools.org


s3cmd-1.6.1/setup.cfg

[sdist]
formats = gztar,zip

[egg_info]
tag_build =
tag_date = 0
tag_svn_revision = 0


s3cmd-1.6.1/S3/Crypto.py

# -*- coding: utf-8 -*-

## Amazon S3 manager
## Author: Michal Ludvig
##         http://www.logix.cz/michal
## License: GPL Version 2
## Copyright: TGRMN Software and contributors

import sys
import hmac
import base64

import Config
from logging import debug
from Utils import encode_to_s3, time_to_epoch, deunicodise

import datetime
import urllib

# hashlib backported to python 2.4 / 2.5 is not compatible with hmac!
if sys.version_info[0] == 2 and sys.version_info[1] < 6:
    import sha as sha1
    from Crypto.Hash import SHA256 as sha256
else:
    from hashlib import sha1, sha256

__all__ = []

### AWS Version 2 signing
def sign_string_v2(string_to_sign):
    """Sign a string with the secret key, returning base64 encoded results.
    By default the configured secret key is used, but may be overridden as
    an argument.

    Useful for REST authentication.
    See http://s3.amazonaws.com/doc/s3-developer-guide/RESTAuthentication.html
    """
    signature = base64.encodestring(hmac.new(Config.Config().secret_key, string_to_sign, sha1).digest()).strip()
    return signature
__all__.append("sign_string_v2")

def sign_url_v2(url_to_sign, expiry):
    """Sign a URL in s3://bucket/object form with the given expiry time.
    The object will be accessible via the signed URL until the AWS key
    and secret are revoked or the expiry time is reached, even if the
    object is otherwise private.

    See: http://s3.amazonaws.com/doc/s3-developer-guide/RESTAuthentication.html
    """
    return sign_url_base_v2(
        bucket = url_to_sign.bucket(),
        object = url_to_sign.object(),
        expiry = expiry
    )
__all__.append("sign_url_v2")

def sign_url_base_v2(**parms):
    """Shared implementation of sign_url methods. Takes a hash of 'bucket', 'object' and 'expiry' as args."""
    content_disposition = Config.Config().content_disposition
    content_type = Config.Config().content_type
    parms['expiry'] = time_to_epoch(parms['expiry'])
    parms['access_key'] = Config.Config().access_key
    parms['host_base'] = Config.Config().host_base
    debug("Expiry interpreted as epoch time %s", parms['expiry'])
    signtext = 'GET\n\n\n%(expiry)d\n/%(bucket)s/%(object)s' % parms
    param_separator = '?'
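    # Any response-content-disposition / response-content-type override has to
    # be covered by the signature: it is appended to the string-to-sign below
    # and then repeated (URL-encoded) as a query parameter of the final signed
    # URL.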
    if content_disposition is not None:
        signtext += param_separator + 'response-content-disposition=' + content_disposition
        param_separator = '&'
    if content_type is not None:
        signtext += param_separator + 'response-content-type=' + content_type
        param_separator = '&'
    debug("Signing plaintext: %r", signtext)
    parms['sig'] = urllib.quote_plus(sign_string_v2(signtext))
    debug("Urlencoded signature: %s", parms['sig'])
    url = "http://%(bucket)s.%(host_base)s/%(object)s?AWSAccessKeyId=%(access_key)s&Expires=%(expiry)d&Signature=%(sig)s" % parms
    if content_disposition is not None:
        url += "&response-content-disposition=" + urllib.quote_plus(content_disposition)
    if content_type is not None:
        url += "&response-content-type=" + urllib.quote_plus(content_type)
    return url

def sign(key, msg):
    return hmac.new(key, encode_to_s3(msg), sha256).digest()

def getSignatureKey(key, dateStamp, regionName, serviceName):
    kDate = sign(encode_to_s3('AWS4' + key), dateStamp)
    kRegion = sign(kDate, regionName)
    kService = sign(kRegion, serviceName)
    kSigning = sign(kService, 'aws4_request')
    return kSigning

def sign_string_v4(method='GET', host='', canonical_uri='/', params={},
                   region='us-east-1', cur_headers={}, body=''):
    service = 's3'
    cfg = Config.Config()
    access_key = cfg.access_key
    secret_key = cfg.secret_key

    t = datetime.datetime.utcnow()
    amzdate = t.strftime('%Y%m%dT%H%M%SZ')
    datestamp = t.strftime('%Y%m%d')

    canonical_querystring = '&'.join(['%s=%s' % (urllib.quote_plus(p), quote_param(params[p])) for p in sorted(params.keys())])

    splits = canonical_uri.split('?')
    canonical_uri = quote_param(splits[0], quote_backslashes=False)
    canonical_querystring += '&'.join([('%s' if '=' in qs else '%s=') % qs for qs in splits[1:]])

    if type(body) == type(sha256('')):
        payload_hash = body.hexdigest()
    else:
        payload_hash = sha256(body).hexdigest()

    canonical_headers = {'host' : host,
                         'x-amz-content-sha256': payload_hash,
                         'x-amz-date' : amzdate
                        }
    signed_headers = 'host;x-amz-content-sha256;x-amz-date'

    for header in cur_headers.keys():
        # avoid duplicate headers and previous Authorization
        if header == 'Authorization' or header in signed_headers.split(';'):
            continue
        canonical_headers[header.strip()] = str(cur_headers[header]).strip()
        signed_headers += ';' + header.strip()

    # sort headers into a string
    canonical_headers_str = ''
    for k, v in sorted(canonical_headers.items()):
        canonical_headers_str += k + ":" + v + "\n"
    canonical_headers = canonical_headers_str
    debug(u"canonical_headers = %s" % canonical_headers)

    signed_headers = ';'.join(sorted(signed_headers.split(';')))

    canonical_request = method + '\n' + canonical_uri + '\n' + canonical_querystring + '\n' + canonical_headers + '\n' + signed_headers + '\n' + payload_hash
    debug('Canonical Request:\n%s\n----------------------' % canonical_request)

    algorithm = 'AWS4-HMAC-SHA256'
    credential_scope = datestamp + '/' + region + '/' + service + '/' + 'aws4_request'
    string_to_sign = algorithm + '\n' + amzdate + '\n' + credential_scope + '\n' + sha256(canonical_request).hexdigest()

    signing_key = getSignatureKey(secret_key, datestamp, region, service)
    signature = hmac.new(signing_key, encode_to_s3(string_to_sign), sha256).hexdigest()
    authorization_header = algorithm + ' ' + 'Credential=' + access_key + '/' + credential_scope + ',' + 'SignedHeaders=' + signed_headers + ',' + 'Signature=' + signature
    headers = dict(cur_headers.items() + {'x-amz-date':amzdate, 'Authorization':authorization_header, 'x-amz-content-sha256': payload_hash}.items())
    debug("signature-v4 headers: %s" % headers)
    return headers

def
quote_param(param, quote_backslashes=True): # As stated by Amazon the '/' in the filename should stay unquoted and %20 should be used for space instead of '+' quoted = urllib.quote_plus(urllib.unquote_plus(param), safe='~').replace('+', '%20') if not quote_backslashes: quoted = quoted.replace('%2F', '/') return quoted def checksum_sha256_file(filename, offset=0, size=None): try: hash = sha256() except: # fallback to Crypto SHA256 module hash = sha256.new() with open(deunicodise(filename),'rb') as f: if size is None: for chunk in iter(lambda: f.read(8192), b''): hash.update(chunk) else: f.seek(offset) size_left = size while size_left > 0: chunk = f.read(min(8192, size_left)) size_left -= len(chunk) hash.update(chunk) return hash def checksum_sha256_buffer(buffer, offset=0, size=None): try: hash = sha256() except: # fallback to Crypto SHA256 module hash = sha256.new() if size is None: hash.update(buffer) else: hash.update(buffer[offset:offset+size]) return hash s3cmd-1.6.1/S3/FileLists.py0000664000175000017500000006162112647745544016425 0ustar mdomschmdomsch00000000000000# -*- coding: utf-8 -*- ## Create and compare lists of files/objects ## Author: Michal Ludvig ## http://www.logix.cz/michal ## License: GPL Version 2 ## Copyright: TGRMN Software and contributors from S3 import S3 from Config import Config from S3Uri import S3Uri from FileDict import FileDict from Utils import * from Exceptions import ParameterError from HashCache import HashCache from logging import debug, info, warning import os import sys import glob import re import errno __all__ = ["fetch_local_list", "fetch_remote_list", "compare_filelists"] def _os_walk_unicode(top): ''' Reimplementation of python's os.walk to nicely support unicode in input as in output. ''' try: names = os.listdir(deunicodise(top)) except: return dirs, nondirs = [], [] for name in names: name = unicodise(name) if os.path.isdir(deunicodise(os.path.join(top, name))): if not handle_exclude_include_walk_dir(top, name): dirs.append(name) else: nondirs.append(name) yield top, dirs, nondirs for name in dirs: new_path = os.path.join(top, name) if not os.path.islink(deunicodise(new_path)): for x in _os_walk_unicode(new_path): yield x def handle_exclude_include_walk_dir(root, dirname): ''' Should this root/dirname directory be excluded? (otherwise included by default) Exclude dir matches in the current directory This prevents us from recursing down trees we know we want to ignore return True for including, and False for excluding ''' cfg = Config() d = os.path.join(root, dirname, '') debug(u"CHECK: %r" % d) excluded = False for r in cfg.exclude: # python versions end their patterns (from globs) differently, test for both styles. if not (r.pattern.endswith(u'\\/$') or r.pattern.endswith(u'\\/\\Z(?ms)')): continue # we only check for directory patterns here if r.search(d): excluded = True debug(u"EXCL-MATCH: '%s'" % (cfg.debug_exclude[r])) break if excluded: ## No need to check for --include if not excluded for r in cfg.include: # python versions end their patterns (from globs) differently, test for both styles. 
if not (r.pattern.endswith(u'\\/$') or r.pattern.endswith(u'\\/\\Z(?ms)')): continue # we only check for directory patterns here debug(u"INCL-TEST: %s ~ %s" % (d, r.pattern)) if r.search(d): excluded = False debug(u"INCL-MATCH: '%s'" % (cfg.debug_include[r])) break if excluded: ## Still excluded - ok, action it debug(u"EXCLUDE: %r" % d) else: debug(u"PASS: %r" % d) return excluded def _fswalk_follow_symlinks(path): ''' Walk filesystem, following symbolic links (but without recursion), on python2.4 and later If a symlink directory loop is detected, emit a warning and skip. E.g.: dir1/dir2/sym-dir -> ../dir2 ''' assert os.path.isdir(deunicodise(path)) # only designed for directory argument walkdirs = set([path]) for dirpath, dirnames, filenames in _os_walk_unicode(path): real_dirpath = unicodise(os.path.realpath(deunicodise(dirpath))) for dirname in dirnames: current = os.path.join(dirpath, dirname) real_current = unicodise(os.path.realpath(deunicodise(current))) if os.path.islink(deunicodise(current)): if (real_dirpath == real_current or real_dirpath.startswith(real_current + os.path.sep)): warning("Skipping recursively symlinked directory %s" % dirname) else: walkdirs.add(current) for walkdir in walkdirs: for dirpath, dirnames, filenames in _os_walk_unicode(walkdir): yield (dirpath, dirnames, filenames) def _fswalk_no_symlinks(path): ''' Directory tree generator path (str) is the root of the directory tree to walk ''' for dirpath, dirnames, filenames in _os_walk_unicode(path): yield (dirpath, dirnames, filenames) def filter_exclude_include(src_list): debug(u"Applying --exclude/--include") cfg = Config() exclude_list = FileDict(ignore_case = False) for file in src_list.keys(): debug(u"CHECK: %s" % file) excluded = False for r in cfg.exclude: if r.search(file): excluded = True debug(u"EXCL-MATCH: '%s'" % (cfg.debug_exclude[r])) break if excluded: ## No need to check for --include if not excluded for r in cfg.include: if r.search(file): excluded = False debug(u"INCL-MATCH: '%s'" % (cfg.debug_include[r])) break if excluded: ## Still excluded - ok, action it debug(u"EXCLUDE: %s" % file) exclude_list[file] = src_list[file] del(src_list[file]) continue else: debug(u"PASS: %r" % (file)) return src_list, exclude_list def _get_filelist_from_file(cfg, local_path): def _append(d, key, value): if key not in d: d[key] = [value] else: d[key].append(value) filelist = {} for fname in cfg.files_from: if fname == u'-': f = sys.stdin else: try: f = open(deunicodise(fname), 'r') except IOError, e: warning(u"--files-from input file %s could not be opened for reading (%s), skipping." % (fname, e.strerror)) continue for line in f: line = unicodise(line).strip() line = os.path.normpath(os.path.join(local_path, line)) dirname = unicodise(os.path.dirname(deunicodise(line))) basename = unicodise(os.path.basename(deunicodise(line))) _append(filelist, dirname, basename) if f != sys.stdin: f.close() # reformat to match os.walk() result = [] keys = filelist.keys() keys.sort() for key in keys: values = filelist[key] values.sort() result.append((key, [], values)) return result def fetch_local_list(args, is_src = False, recursive = None): def _fetch_local_list_info(loc_list): len_loc_list = len(loc_list) total_size = 0 info(u"Running stat() and reading/calculating MD5 values on %d files, this may take some time..." 
% len_loc_list) counter = 0 for relative_file in loc_list: counter += 1 if counter % 1000 == 0: info(u"[%d/%d]" % (counter, len_loc_list)) if relative_file == '-': continue full_name = loc_list[relative_file]['full_name'] try: sr = os.stat_result(os.stat(deunicodise(full_name))) except OSError, e: if e.errno == errno.ENOENT: # file was removed async to us getting the list continue else: raise loc_list[relative_file].update({ 'size' : sr.st_size, 'mtime' : sr.st_mtime, 'dev' : sr.st_dev, 'inode' : sr.st_ino, 'uid' : sr.st_uid, 'gid' : sr.st_gid, 'sr': sr # save it all, may need it in preserve_attrs_list ## TODO: Possibly more to save here... }) total_size += sr.st_size if 'md5' in cfg.sync_checks: md5 = cache.md5(sr.st_dev, sr.st_ino, sr.st_mtime, sr.st_size) if md5 is None: try: md5 = loc_list.get_md5(relative_file) # this does the file I/O except IOError: continue cache.add(sr.st_dev, sr.st_ino, sr.st_mtime, sr.st_size, md5) loc_list.record_hardlink(relative_file, sr.st_dev, sr.st_ino, md5, sr.st_size) return total_size def _get_filelist_local(loc_list, local_uri, cache): info(u"Compiling list of local files...") if local_uri.basename() == "-": try: uid = os.geteuid() gid = os.getegid() except: uid = 0 gid = 0 loc_list["-"] = { 'full_name' : '-', 'size' : -1, 'mtime' : -1, 'uid' : uid, 'gid' : gid, 'dev' : 0, 'inode': 0, } return loc_list, True if local_uri.isdir(): local_base = local_uri.basename() local_path = local_uri.path() if is_src and len(cfg.files_from): filelist = _get_filelist_from_file(cfg, local_path) single_file = False else: if cfg.follow_symlinks: filelist = _fswalk_follow_symlinks(local_path) else: filelist = _fswalk_no_symlinks(local_path) single_file = False else: local_base = "" local_path = local_uri.dirname() filelist = [( local_path, [], [local_uri.basename()] )] single_file = True for root, dirs, files in filelist: rel_root = root.replace(local_path, local_base, 1) for f in files: full_name = os.path.join(root, f) if not os.path.isfile(deunicodise(full_name)): if os.path.exists(deunicodise(full_name)): warning(u"Skipping over non regular file: %s" % full_name) continue if os.path.islink(deunicodise(full_name)): if not cfg.follow_symlinks: warning(u"Skipping over symbolic link: %s" % full_name) continue relative_file = os.path.join(rel_root, f) if os.path.sep != "/": # Convert non-unix dir separators to '/' relative_file = "/".join(relative_file.split(os.path.sep)) if cfg.urlencoding_mode == "normal": relative_file = replace_nonprintables(relative_file) if relative_file.startswith('./'): relative_file = relative_file[2:] loc_list[relative_file] = { 'full_name' : full_name, } return loc_list, single_file def _maintain_cache(cache, local_list): # if getting the file list from files_from, it is going to be # a subset of the actual tree. We should not purge content # outside of that subset as we don't know if it's valid or # not. Leave it to a non-files_from run to purge. 
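        # Otherwise: mark every cached entry for purge, un-mark the entries
        # that are still present in local_list, drop whatever remains marked,
        # and write the cache back to cfg.cache_file.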
if cfg.cache_file and len(cfg.files_from) == 0: cache.mark_all_for_purge() for i in local_list.keys(): cache.unmark_for_purge(local_list[i]['dev'], local_list[i]['inode'], local_list[i]['mtime'], local_list[i]['size']) cache.purge() cache.save(cfg.cache_file) cfg = Config() cache = HashCache() if cfg.cache_file: try: cache.load(cfg.cache_file) except IOError: info(u"No cache file found, creating it.") local_uris = [] local_list = FileDict(ignore_case = False) single_file = False if type(args) not in (list, tuple, set): args = [args] if recursive == None: recursive = cfg.recursive for arg in args: uri = S3Uri(arg) if not uri.type == 'file': raise ParameterError("Expecting filename or directory instead of: %s" % arg) if uri.isdir() and not recursive: raise ParameterError("Use --recursive to upload a directory: %s" % arg) local_uris.append(uri) for uri in local_uris: list_for_uri, single_file = _get_filelist_local(local_list, uri, cache) ## Single file is True if and only if the user ## specified one local URI and that URI represents ## a FILE. Ie it is False if the URI was of a DIR ## and that dir contained only one FILE. That's not ## a case of single_file==True. if len(local_list) > 1: single_file = False local_list, exclude_list = filter_exclude_include(local_list) total_size = _fetch_local_list_info(local_list) _maintain_cache(cache, local_list) return local_list, single_file, exclude_list, total_size def fetch_remote_list(args, require_attribs = False, recursive = None, uri_params = {}): def _get_remote_attribs(uri, remote_item): response = S3(cfg).object_info(uri) if not response.get('headers'): return remote_item.update({ 'size': int(response['headers']['content-length']), 'md5': response['headers']['etag'].strip('"\''), 'timestamp' : dateRFC822toUnix(response['headers']['last-modified']) }) try: md5 = response['s3cmd-attrs']['md5'] remote_item.update({'md5': md5}) debug(u"retreived md5=%s from headers" % md5) except KeyError: pass def _get_filelist_remote(remote_uri, recursive = True): ## If remote_uri ends with '/' then all remote files will have ## the remote_uri prefix removed in the relative path. ## If, on the other hand, the remote_uri ends with something else ## (probably alphanumeric symbol) we'll use the last path part ## in the relative path. ## ## Complicated, eh? See an example: ## _get_filelist_remote("s3://bckt/abc/def") may yield: ## { 'def/file1.jpg' : {}, 'def/xyz/blah.txt' : {} } ## _get_filelist_remote("s3://bckt/abc/def/") will yield: ## { 'file1.jpg' : {}, 'xyz/blah.txt' : {} } ## Furthermore a prefix-magic can restrict the return list: ## _get_filelist_remote("s3://bckt/abc/def/x") yields: ## { 'xyz/blah.txt' : {} } info(u"Retrieving list of remote files for %s ..." 
% remote_uri) empty_fname_re = re.compile(r'\A\s*\Z') total_size = 0 s3 = S3(Config()) response = s3.bucket_list(remote_uri.bucket(), prefix = remote_uri.object(), recursive = recursive, uri_params = uri_params) rem_base_original = rem_base = remote_uri.object() remote_uri_original = remote_uri if rem_base != '' and rem_base[-1] != '/': rem_base = rem_base[:rem_base.rfind('/')+1] remote_uri = S3Uri(u"s3://%s/%s" % (remote_uri.bucket(), rem_base)) rem_base_len = len(rem_base) rem_list = FileDict(ignore_case = False) break_now = False for object in response['list']: if object['Key'] == rem_base_original and object['Key'][-1] != "/": ## We asked for one file and we got that file :-) key = unicodise(os.path.basename(deunicodise(object['Key']))) object_uri_str = remote_uri_original.uri() break_now = True rem_list = FileDict(ignore_case = False) ## Remove whatever has already been put to rem_list else: key = object['Key'][rem_base_len:] ## Beware - this may be '' if object['Key']==rem_base !! object_uri_str = remote_uri.uri() + key if empty_fname_re.match(key): # Objects may exist on S3 with empty names (''), which don't map so well to common filesystems. warning(u"Empty object name on S3 found, ignoring.") continue rem_list[key] = { 'size' : int(object['Size']), 'timestamp' : dateS3toUnix(object['LastModified']), ## Sadly it's upload time, not our lastmod time :-( 'md5' : object['ETag'].strip('"\''), 'object_key' : object['Key'], 'object_uri_str' : object_uri_str, 'base_uri' : remote_uri, 'dev' : None, 'inode' : None, } if '-' in rem_list[key]['md5']: # always get it for multipart uploads _get_remote_attribs(S3Uri(object_uri_str), rem_list[key]) md5 = rem_list[key]['md5'] rem_list.record_md5(key, md5) total_size += int(object['Size']) if break_now: break return rem_list, total_size cfg = Config() remote_uris = [] remote_list = FileDict(ignore_case = False) if type(args) not in (list, tuple, set): args = [args] if recursive == None: recursive = cfg.recursive for arg in args: uri = S3Uri(arg) if not uri.type == 's3': raise ParameterError("Expecting S3 URI instead of '%s'" % arg) remote_uris.append(uri) total_size = 0 if recursive: for uri in remote_uris: objectlist, tmp_total_size = _get_filelist_remote(uri, recursive = True) total_size += tmp_total_size for key in objectlist: remote_list[key] = objectlist[key] remote_list.record_md5(key, objectlist.get_md5(key)) else: for uri in remote_uris: uri_str = uri.uri() ## Wildcards used in remote URI? ## If yes we'll need a bucket listing... wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1) if len(wildcard_split_result) == 2: # wildcards found prefix, rest = wildcard_split_result ## Only request recursive listing if the 'rest' of the URI, ## i.e. 
the part after first wildcard, contains '/' need_recursion = '/' in rest objectlist, tmp_total_size = _get_filelist_remote(S3Uri(prefix), recursive = need_recursion) total_size += tmp_total_size for key in objectlist: ## Check whether the 'key' matches the requested wildcards if glob.fnmatch.fnmatch(objectlist[key]['object_uri_str'], uri_str): remote_list[key] = objectlist[key] else: ## No wildcards - simply append the given URI to the list key = unicodise(os.path.basename(deunicodise(uri.object()))) if not key: raise ParameterError(u"Expecting S3 URI with a filename or --recursive: %s" % uri.uri()) remote_item = { 'base_uri': uri, 'object_uri_str': uri.uri(), 'object_key': uri.object() } if require_attribs: _get_remote_attribs(uri, remote_item) remote_list[key] = remote_item md5 = remote_item.get('md5') if md5: remote_list.record_md5(key, md5) total_size += remote_item.get('size', 0) remote_list, exclude_list = filter_exclude_include(remote_list) return remote_list, exclude_list, total_size def compare_filelists(src_list, dst_list, src_remote, dst_remote): def __direction_str(is_remote): return is_remote and "remote" or "local" def _compare(src_list, dst_lst, src_remote, dst_remote, file): """Return True if src_list[file] matches dst_list[file], else False""" attribs_match = True if not (src_list.has_key(file) and dst_list.has_key(file)): info(u"%s: does not exist in one side or the other: src_list=%s, dst_list=%s" % (file, src_list.has_key(file), dst_list.has_key(file))) return False ## check size first if 'size' in cfg.sync_checks: if 'size' in dst_list[file] and 'size' in src_list[file]: if dst_list[file]['size'] != src_list[file]['size']: debug(u"xfer: %s (size mismatch: src=%s dst=%s)" % (file, src_list[file]['size'], dst_list[file]['size'])) attribs_match = False ## check md5 compare_md5 = 'md5' in cfg.sync_checks # Multipart-uploaded files don't have a valid md5 sum - it ends with "...-nn" if compare_md5: if (src_remote == True and '-' in src_list[file]['md5']) or (dst_remote == True and '-' in dst_list[file]['md5']): compare_md5 = False info(u"disabled md5 check for %s" % file) if attribs_match and compare_md5: try: src_md5 = src_list.get_md5(file) dst_md5 = dst_list.get_md5(file) except (IOError,OSError): # md5 sum verification failed - ignore that file altogether debug(u"IGNR: %s (disappeared)" % (file)) warning(u"%s: file disappeared, ignoring." % (file)) raise if src_md5 != dst_md5: ## checksums are different. attribs_match = False debug(u"XFER: %s (md5 mismatch: src=%s dst=%s)" % (file, src_md5, dst_md5)) return attribs_match # we don't support local->local sync, use 'rsync' or something like that instead ;-) assert(not(src_remote == False and dst_remote == False)) info(u"Verifying attributes...") cfg = Config() ## Items left on src_list will be transferred ## Items left on update_list will be transferred after src_list ## Items left on copy_pairs will be copied from dst1 to dst2 update_list = FileDict(ignore_case = False) ## Items left on dst_list will be deleted copy_pairs = [] debug("Comparing filelists (direction: %s -> %s)" % (__direction_str(src_remote), __direction_str(dst_remote))) for relative_file in src_list.keys(): debug(u"CHECK: %s" % (relative_file)) if dst_list.has_key(relative_file): ## Was --skip-existing requested? 
if cfg.skip_existing: debug(u"IGNR: %s (used --skip-existing)" % (relative_file)) del(src_list[relative_file]) del(dst_list[relative_file]) continue try: same_file = _compare(src_list, dst_list, src_remote, dst_remote, relative_file) except (IOError,OSError): debug(u"IGNR: %s (disappeared)" % (relative_file)) warning(u"%s: file disappeared, ignoring." % (relative_file)) del(src_list[relative_file]) del(dst_list[relative_file]) continue if same_file: debug(u"IGNR: %s (transfer not needed)" % relative_file) del(src_list[relative_file]) del(dst_list[relative_file]) else: # look for matching file in src try: md5 = src_list.get_md5(relative_file) except IOError: md5 = None if md5 is not None and dst_list.by_md5.has_key(md5): # Found one, we want to copy dst1 = list(dst_list.by_md5[md5])[0] debug(u"DST COPY src: %s -> %s" % (dst1, relative_file)) copy_pairs.append((src_list[relative_file], dst1, relative_file)) del(src_list[relative_file]) del(dst_list[relative_file]) else: # record that we will get this file transferred to us (before all the copies), so if we come across it later again, # we can copy from _this_ copy (e.g. we only upload it once, and copy thereafter). dst_list.record_md5(relative_file, md5) update_list[relative_file] = src_list[relative_file] del src_list[relative_file] del dst_list[relative_file] else: # dst doesn't have this file # look for matching file elsewhere in dst try: md5 = src_list.get_md5(relative_file) except IOError: md5 = None dst1 = dst_list.find_md5_one(md5) if dst1 is not None: # Found one, we want to copy debug(u"DST COPY dst: %s -> %s" % (dst1, relative_file)) copy_pairs.append((src_list[relative_file], dst1, relative_file)) del(src_list[relative_file]) else: # we don't have this file, and we don't have a copy of this file elsewhere. Get it. # record that we will get this file transferred to us (before all the copies), so if we come across it later again, # we can copy from _this_ copy (e.g. we only upload it once, and copy thereafter). 
dst_list.record_md5(relative_file, md5) for f in dst_list.keys(): if src_list.has_key(f) or update_list.has_key(f): # leave only those not on src_list + update_list del dst_list[f] return src_list, dst_list, update_list, copy_pairs # vim:et:ts=4:sts=4:ai s3cmd-1.6.1/S3/__init__.py0000664000175000017500000000003012647745544016251 0ustar mdomschmdomsch00000000000000# -*- coding: utf-8 -*- s3cmd-1.6.1/S3/S3.py0000664000175000017500000020226512647745544015015 0ustar mdomschmdomsch00000000000000# -*- coding: utf-8 -*- ## Amazon S3 manager ## Author: Michal Ludvig ## http://www.logix.cz/michal ## License: GPL Version 2 ## Copyright: TGRMN Software and contributors import sys import os import time import errno import base64 import mimetypes from xml.sax import saxutils from logging import debug, info, warning, error from stat import ST_SIZE from urllib import quote_plus try: from hashlib import md5 except ImportError: from md5 import md5 from Utils import * from SortedDict import SortedDict from AccessLog import AccessLog from ACL import ACL, GranteeLogDelivery from BidirMap import BidirMap from Config import Config from Exceptions import * from MultiPart import MultiPartUpload from S3Uri import S3Uri from ConnMan import ConnMan, CertificateError from Crypto import sign_string_v2, sign_string_v4, checksum_sha256_file, checksum_sha256_buffer try: from ctypes import ArgumentError import magic try: ## https://github.com/ahupp/python-magic ## Always expect unicode for python 2 ## (has Magic class but no "open()" function) magic_ = magic.Magic(mime=True) def mime_magic_file(file): return magic_.from_file(file) except TypeError: try: ## file-5.11 built-in python bindings ## Sources: http://www.darwinsys.com/file/ ## Expects unicode since version 5.19, encoded strings before ## we can't tell if a given copy of the magic library will take a ## filesystem-encoded string or a unicode value, so try first ## with the unicode, then with the encoded string. ## (has Magic class and "open()" function) magic_ = magic.open(magic.MAGIC_MIME) magic_.load() def mime_magic_file(file): try: return magic_.file(file) except (UnicodeDecodeError, UnicodeEncodeError, ArgumentError): return magic_.file(deunicodise(file)) except AttributeError: ## http://pypi.python.org/pypi/filemagic ## Accept gracefully both unicode and encoded ## (has Magic class but not "mime" argument and no "open()" function ) magic_ = magic.Magic(flags=magic.MAGIC_MIME) def mime_magic_file(file): return magic_.id_filename(file) except AttributeError: ## Older python-magic versions doesn't have a "Magic" method ## Only except encoded strings ## (has no Magic class but "open()" function) magic_ = magic.open(magic.MAGIC_MIME) magic_.load() def mime_magic_file(file): return magic_.file(deunicodise(file)) except ImportError, e: if 'magic' in str(e): magic_message = "Module python-magic is not available." else: magic_message = "Module python-magic can't be used (%s)." % e.message magic_message += " Guessing MIME types based on file extensions." 
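    # python-magic could not be imported (or could not be used); fall back to
    # extension-based detection via the standard mimetypes module, emitting
    # the warning above only on the first call.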
magic_warned = False def mime_magic_file(file): global magic_warned if (not magic_warned): warning(magic_message) magic_warned = True return mimetypes.guess_type(file)[0] def mime_magic(file): ## NOTE: So far in the code, "file" var is already unicode def _mime_magic(file): magictype = mime_magic_file(file) return magictype result = _mime_magic(file) if result is not None: if isinstance(result, str): if ';' in result: mimetype, charset = result.split(';') charset = charset[len('charset'):] result = (mimetype, charset) else: result = (result, None) if result is None: result = (None, None) return result __all__ = [] class S3Request(object): region_map = {} def __init__(self, s3, method_string, resource, headers, body, params = {}): self.s3 = s3 self.headers = SortedDict(headers or {}, ignore_case = True) if len(self.s3.config.access_token)>0: self.s3.config.role_refresh() self.headers['x-amz-security-token']=self.s3.config.access_token self.resource = resource self.method_string = method_string self.params = params self.body = body self.requester_pays() def requester_pays(self): if self.s3.config.requester_pays and self.method_string in ("GET", "POST", "PUT"): self.headers['x-amz-request-payer'] = 'requester' def update_timestamp(self): if self.headers.has_key("date"): del(self.headers["date"]) self.headers["x-amz-date"] = time.strftime("%a, %d %b %Y %H:%M:%S +0000", time.gmtime()) def format_param_str(self): """ Format URL parameters from self.params and returns ?parm1=val1&parm2=val2 or an empty string if there are no parameters. Output of this function should be appended directly to self.resource['uri'] """ param_str = "" for param in self.params: if self.params[param] not in (None, ""): param_str += "&%s=%s" % (param, self.params[param]) else: param_str += "&%s" % param return param_str and "?" + param_str[1:] def use_signature_v2(self): if self.s3.endpoint_requires_signature_v4: return False # in case of bad DNS name due to bucket name v2 will be used # this way we can still use capital letters in bucket names for the older regions if self.resource['bucket'] is None or not check_bucket_name_dns_conformity(self.resource['bucket']) or self.s3.config.signature_v2 or self.s3.fallback_to_signature_v2: return True return False def sign(self): if self.use_signature_v2(): h = self.method_string + "\n" h += self.headers.get("content-md5", "")+"\n" h += self.headers.get("content-type", "")+"\n" h += self.headers.get("date", "")+"\n" for header in sorted(self.headers.keys()): if header.startswith("x-amz-"): h += header+":"+str(self.headers[header])+"\n" if header.startswith("x-emc-"): h += header+":"+str(self.headers[header])+"\n" if self.resource['bucket']: h += "/" + self.resource['bucket'] h += self.resource['uri'] debug("Using signature v2") debug("SignHeaders: " + repr(h)) signature = sign_string_v2(h) self.headers["Authorization"] = "AWS "+self.s3.config.access_key+":"+signature else: debug("Using signature v4") hostname = self.s3.get_hostname(self.resource['bucket']) ## Default to bucket part of DNS. resource_uri = self.resource['uri'] ## If bucket is not part of DNS assume path style to complete the request. if not check_bucket_name_dns_support(self.s3.config.host_bucket, self.resource['bucket']): if self.resource['bucket']: resource_uri = "/" + self.resource['bucket'] + self.resource['uri'] bucket_region = S3Request.region_map.get(self.resource['bucket'], Config().bucket_location) ## Sign the data. 
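            ## sign_string_v4() builds the AWS4-HMAC-SHA256 canonical request
            ## from the method, URI, query parameters and headers, and returns
            ## a new header dict that adds x-amz-date, x-amz-content-sha256 and
            ## the Authorization header. bucket_region comes from region_map,
            ## which _http_400_handler() fills in when S3 responds with
            ## AuthorizationHeaderMalformed.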
self.headers = sign_string_v4(self.method_string, hostname, resource_uri, self.params, bucket_region, self.headers, self.body) def get_triplet(self): self.update_timestamp() self.sign() resource = dict(self.resource) ## take a copy resource['uri'] += self.format_param_str() return (self.method_string, resource, self.headers) class S3(object): http_methods = BidirMap( GET = 0x01, PUT = 0x02, HEAD = 0x04, DELETE = 0x08, POST = 0x10, MASK = 0x1F, ) targets = BidirMap( SERVICE = 0x0100, BUCKET = 0x0200, OBJECT = 0x0400, BATCH = 0x0800, MASK = 0x0700, ) operations = BidirMap( UNDFINED = 0x0000, LIST_ALL_BUCKETS = targets["SERVICE"] | http_methods["GET"], BUCKET_CREATE = targets["BUCKET"] | http_methods["PUT"], BUCKET_LIST = targets["BUCKET"] | http_methods["GET"], BUCKET_DELETE = targets["BUCKET"] | http_methods["DELETE"], OBJECT_PUT = targets["OBJECT"] | http_methods["PUT"], OBJECT_GET = targets["OBJECT"] | http_methods["GET"], OBJECT_HEAD = targets["OBJECT"] | http_methods["HEAD"], OBJECT_DELETE = targets["OBJECT"] | http_methods["DELETE"], OBJECT_POST = targets["OBJECT"] | http_methods["POST"], BATCH_DELETE = targets["BATCH"] | http_methods["POST"], ) codes = { "NoSuchBucket" : "Bucket '%s' does not exist", "AccessDenied" : "Access to bucket '%s' was denied", "BucketAlreadyExists" : "Bucket '%s' already exists", } ## S3 sometimes sends HTTP-307 response redir_map = {} ## Maximum attempts of re-issuing failed requests _max_retries = 5 def __init__(self, config): self.config = config self.fallback_to_signature_v2 = False self.endpoint_requires_signature_v4 = False def storage_class(self): # Note - you cannot specify GLACIER here # https://docs.aws.amazon.com/AmazonS3/latest/dev/storage-class-intro.html cls = 'STANDARD' if self.config.storage_class != "": return self.config.storage_class if self.config.reduced_redundancy: cls = 'REDUCED_REDUNDANCY' return cls def get_hostname(self, bucket): if bucket and check_bucket_name_dns_support(self.config.host_bucket, bucket): if self.redir_map.has_key(bucket): host = self.redir_map[bucket] else: host = getHostnameFromBucket(bucket) else: host = self.config.host_base debug('get_hostname(%s): %s' % (bucket, host)) return host def set_hostname(self, bucket, redir_hostname): self.redir_map[bucket] = redir_hostname def format_uri(self, resource): if resource['bucket'] and not check_bucket_name_dns_support(self.config.host_bucket, resource['bucket']): uri = "/%s%s" % (resource['bucket'], resource['uri']) else: uri = resource['uri'] if self.config.proxy_host != "": uri = "http://%s%s" % (self.get_hostname(resource['bucket']), uri) debug('format_uri(): ' + uri) return uri ## Commands / Actions def list_all_buckets(self): request = self.create_request("LIST_ALL_BUCKETS") response = self.send_request(request) response["list"] = getListFromXml(response["data"], "Bucket") return response def bucket_list(self, bucket, prefix = None, recursive = None, uri_params = {}): item_list = [] prefixes = [] for dirs, objects in self.bucket_list_streaming(bucket, prefix, recursive, uri_params): item_list.extend(objects) prefixes.extend(dirs) response = {} response['list'] = item_list response['common_prefixes'] = prefixes return response def bucket_list_streaming(self, bucket, prefix = None, recursive = None, uri_params = {}): """ Generator that produces , pairs of groups of content of a specified bucket. 
""" def _list_truncated(data): ## can either be "true" or "false" or be missing completely is_truncated = getTextFromXml(data, ".//IsTruncated") or "false" return is_truncated.lower() != "false" def _get_contents(data): return getListFromXml(data, "Contents") def _get_common_prefixes(data): return getListFromXml(data, "CommonPrefixes") uri_params = uri_params.copy() truncated = True prefixes = [] while truncated: response = self.bucket_list_noparse(bucket, prefix, recursive, uri_params) current_list = _get_contents(response["data"]) current_prefixes = _get_common_prefixes(response["data"]) truncated = _list_truncated(response["data"]) if truncated: if current_list: uri_params['marker'] = self.urlencode_string(current_list[-1]["Key"]) else: uri_params['marker'] = self.urlencode_string(current_prefixes[-1]["Prefix"]) debug("Listing continues after '%s'" % uri_params['marker']) yield current_prefixes, current_list def bucket_list_noparse(self, bucket, prefix = None, recursive = None, uri_params = {}): if prefix: uri_params['prefix'] = self.urlencode_string(prefix) if not self.config.recursive and not recursive: uri_params['delimiter'] = "/" request = self.create_request("BUCKET_LIST", bucket = bucket, **uri_params) response = self.send_request(request) #debug(response) return response def bucket_create(self, bucket, bucket_location = None): headers = SortedDict(ignore_case = True) body = "" if bucket_location and bucket_location.strip().upper() != "US" and bucket_location.strip().lower() != "us-east-1": bucket_location = bucket_location.strip() if bucket_location.upper() == "EU": bucket_location = bucket_location.upper() else: bucket_location = bucket_location.lower() body = "" body += bucket_location body += "" debug("bucket_location: " + body) check_bucket_name(bucket, dns_strict = True) else: check_bucket_name(bucket, dns_strict = False) if self.config.acl_public: headers["x-amz-acl"] = "public-read" request = self.create_request("BUCKET_CREATE", bucket = bucket, headers = headers, body = body) response = self.send_request(request) return response def bucket_delete(self, bucket): request = self.create_request("BUCKET_DELETE", bucket = bucket) response = self.send_request(request) return response def get_bucket_location(self, uri): request = self.create_request("BUCKET_LIST", bucket = uri.bucket(), extra = "?location") response = self.send_request(request) location = getTextFromXml(response['data'], "LocationConstraint") if not location or location in [ "", "US" ]: location = "us-east-1" elif location == "EU": location = "eu-west-1" return location def get_bucket_requester_pays(self, uri): request = self.create_request("BUCKET_LIST", bucket = uri.bucket(), extra = "?requestPayment") response = self.send_request(request) payer = getTextFromXml(response['data'], "Payer") return payer def bucket_info(self, uri): response = {} response['bucket-location'] = self.get_bucket_location(uri) try: response['requester-pays'] = self.get_bucket_requester_pays(uri) except S3Error as e: response['requester-pays'] = 'none' return response def website_info(self, uri, bucket_location = None): bucket = uri.bucket() request = self.create_request("BUCKET_LIST", bucket = bucket, extra="?website") try: response = self.send_request(request) response['index_document'] = getTextFromXml(response['data'], ".//IndexDocument//Suffix") response['error_document'] = getTextFromXml(response['data'], ".//ErrorDocument//Key") response['website_endpoint'] = self.config.website_endpoint % { "bucket" : uri.bucket(), "location" : 
self.get_bucket_location(uri)} return response except S3Error, e: if e.status == 404: debug("Could not get /?website - website probably not configured for this bucket") return None raise def website_create(self, uri, bucket_location = None): bucket = uri.bucket() body = '' body += ' ' body += (' %s' % self.config.website_index) body += ' ' if self.config.website_error: body += ' ' body += (' %s' % self.config.website_error) body += ' ' body += '' request = self.create_request("BUCKET_CREATE", bucket = bucket, extra="?website", body = body) response = self.send_request(request) debug("Received response '%s'" % (response)) return response def website_delete(self, uri, bucket_location = None): bucket = uri.bucket() request = self.create_request("BUCKET_DELETE", bucket = bucket, extra="?website") response = self.send_request(request) debug("Received response '%s'" % (response)) if response['status'] != 204: raise S3ResponseError("Expected status 204: %s" % response) return response def expiration_info(self, uri, bucket_location = None): bucket = uri.bucket() request = self.create_request("BUCKET_LIST", bucket = bucket, extra="?lifecycle") try: response = self.send_request(request) response['prefix'] = getTextFromXml(response['data'], ".//Rule//Prefix") response['date'] = getTextFromXml(response['data'], ".//Rule//Expiration//Date") response['days'] = getTextFromXml(response['data'], ".//Rule//Expiration//Days") return response except S3Error, e: if e.status == 404: debug("Could not get /?lifecycle - lifecycle probably not configured for this bucket") return None raise def expiration_set(self, uri, bucket_location = None): if self.config.expiry_date and self.config.expiry_days: raise ParameterError("Expect either --expiry-day or --expiry-date") if not (self.config.expiry_date or self.config.expiry_days): if self.config.expiry_prefix: raise ParameterError("Expect either --expiry-day or --expiry-date") debug("del bucket lifecycle") bucket = uri.bucket() request = self.create_request("BUCKET_DELETE", bucket = bucket, extra="?lifecycle") else: request = self._expiration_set(uri) response = self.send_request(request) debug("Received response '%s'" % (response)) return response def _expiration_set(self, uri): debug("put bucket lifecycle") body = '' body += ' ' body += (' %s' % self.config.expiry_prefix) body += (' Enabled') body += (' ') if self.config.expiry_date: body += (' %s' % self.config.expiry_date) elif self.config.expiry_days: body += (' %s' % self.config.expiry_days) body += (' ') body += ' ' body += '' headers = SortedDict(ignore_case = True) headers['content-md5'] = compute_content_md5(body) bucket = uri.bucket() request = self.create_request("BUCKET_CREATE", bucket = bucket, headers = headers, extra="?lifecycle", body = body) return (request) def _guess_content_type(self, filename): content_type = self.config.default_mime_type content_charset = None if filename == "-" and not self.config.default_mime_type: raise ParameterError("You must specify --mime-type or --default-mime-type for files uploaded from stdin.") if self.config.guess_mime_type: if self.config.use_mime_magic: (content_type, content_charset) = mime_magic(filename) else: (content_type, content_charset) = mimetypes.guess_type(filename) if not content_type: content_type = self.config.default_mime_type return (content_type, content_charset) def stdin_content_type(self): content_type = self.config.mime_type if content_type == '': content_type = self.config.default_mime_type content_type += "; charset=" + 
self.config.encoding.upper() return content_type def content_type(self, filename=None): # explicit command line argument always wins content_type = self.config.mime_type content_charset = None if filename == u'-': return self.stdin_content_type() if not content_type: (content_type, content_charset) = self._guess_content_type(filename) ## add charset to content type if not content_charset: content_charset = self.config.encoding.upper() if self.add_encoding(filename, content_type) and content_charset is not None: content_type = content_type + "; charset=" + content_charset return content_type def add_encoding(self, filename, content_type): if 'charset=' in content_type: return False exts = self.config.add_encoding_exts.split(',') if exts[0]=='': return False parts = filename.rsplit('.',2) if len(parts) < 2: return False ext = parts[1] if ext in exts: return True else: return False def object_put(self, filename, uri, extra_headers = None, extra_label = ""): # TODO TODO # Make it consistent with stream-oriented object_get() if uri.type != "s3": raise ValueError("Expected URI type 's3', got '%s'" % uri.type) if filename != "-" and not os.path.isfile(deunicodise(filename)): raise InvalidFileError(u"Not a regular file") try: if filename == "-": file = sys.stdin size = 0 else: file = open(deunicodise(filename), "rb") size = os.stat(deunicodise(filename))[ST_SIZE] except (IOError, OSError), e: raise InvalidFileError(u"%s" % e.strerror) headers = SortedDict(ignore_case = True) if extra_headers: headers.update(extra_headers) ## Set server side encryption if self.config.server_side_encryption: headers["x-amz-server-side-encryption"] = "AES256" ## Set kms headers if self.config.kms_key: headers['x-amz-server-side-encryption'] = 'aws:kms' headers['x-amz-server-side-encryption-aws-kms-key-id'] = self.config.kms_key ## MIME-type handling headers["content-type"] = self.content_type(filename=filename) ## Other Amazon S3 attributes if self.config.acl_public: headers["x-amz-acl"] = "public-read" headers["x-amz-storage-class"] = self.storage_class() ## Multipart decision multipart = False if not self.config.enable_multipart and filename == "-": raise ParameterError("Multi-part upload is required to upload from stdin") if self.config.enable_multipart: if size > self.config.multipart_chunk_size_mb * 1024 * 1024 or filename == "-": multipart = True if size > self.config.multipart_max_chunks * self.config.multipart_chunk_size_mb * 1024 * 1024: raise ParameterError("Chunk size %d MB results in more than %d chunks. Please increase --multipart-chunk-size-mb" % \ (self.config.multipart_chunk_size_mb, self.config.multipart_max_chunks)) if multipart: # Multipart requests are quite different... drop here return self.send_file_multipart(file, headers, uri, size, extra_label) ## Not multipart... if self.config.put_continue: # Note, if input was stdin, we would be performing multipart upload. # So this will always work as long as the file already uploaded was # not uploaded via MultiUpload, in which case its ETag will not be # an md5. try: info = self.object_info(uri) except: info = None if info is not None: remote_size = long(info['headers']['content-length']) remote_checksum = info['headers']['etag'].strip('"\'') if size == remote_size: checksum = calculateChecksum('', file, 0, size, self.config.send_chunk) if remote_checksum == checksum: warning("Put: size and md5sum match for %s, skipping." % uri) return else: warning("MultiPart: checksum (%s vs %s) does not match for %s, reuploading." 
% (remote_checksum, checksum, uri)) else: warning("MultiPart: size (%d vs %d) does not match for %s, reuploading." % (remote_size, size, uri)) headers["content-length"] = str(size) request = self.create_request("OBJECT_PUT", uri = uri, headers = headers) labels = { 'source' : filename, 'destination' : uri.uri(), 'extra' : extra_label } response = self.send_file(request, file, labels) return response def object_get(self, uri, stream, dest_name, start_position = 0, extra_label = ""): if uri.type != "s3": raise ValueError("Expected URI type 's3', got '%s'" % uri.type) request = self.create_request("OBJECT_GET", uri = uri) labels = { 'source' : uri.uri(), 'destination' : dest_name, 'extra' : extra_label } response = self.recv_file(request, stream, labels, start_position) return response def object_batch_delete(self, remote_list): """ Batch delete given a remote_list """ uris = [remote_list[item]['object_uri_str'] for item in remote_list] self.object_batch_delete_uri_strs(uris) def object_batch_delete_uri_strs(self, uris): """ Batch delete given a list of object uris """ def compose_batch_del_xml(bucket, key_list): body = u"" for key in key_list: uri = S3Uri(key) if uri.type != "s3": raise ValueError("Expected URI type 's3', got '%s'" % uri.type) if not uri.has_object(): raise ValueError("URI '%s' has no object" % key) if uri.bucket() != bucket: raise ValueError("The batch should contain keys from the same bucket") object = saxutils.escape(uri.object()) body += u"%s" % object body += u"" body = encode_to_s3(body) return body batch = uris if len(batch) == 0: raise ValueError("Key list is empty") bucket = S3Uri(batch[0]).bucket() request_body = compose_batch_del_xml(bucket, batch) md5_hash = md5() md5_hash.update(request_body) headers = {'content-md5': base64.b64encode(md5_hash.digest()), 'content-type': 'application/xml'} request = self.create_request("BATCH_DELETE", bucket = bucket, extra = '?delete', headers = headers, body = request_body) response = self.send_request(request) return response def object_delete(self, uri): if uri.type != "s3": raise ValueError("Expected URI type 's3', got '%s'" % uri.type) request = self.create_request("OBJECT_DELETE", uri = uri) response = self.send_request(request) return response def object_restore(self, uri): if uri.type != "s3": raise ValueError("Expected URI type 's3', got '%s'" % uri.type) body = '' body += (' %s' % self.config.restore_days) body += '' request = self.create_request("OBJECT_POST", uri = uri, extra = "?restore", body = body) response = self.send_request(request) debug("Received response '%s'" % (response)) return response def _sanitize_headers(self, headers): to_remove = [ # from http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html 'date', 'content-length', 'last-modified', 'content-md5', 'x-amz-version-id', 'x-amz-delete-marker', # other headers returned from object_info() we don't want to send 'accept-ranges', 'etag', 'server', 'x-amz-id-2', 'x-amz-request-id', ] for h in to_remove + self.config.remove_headers: if h.lower() in headers: del headers[h.lower()] return headers def object_copy(self, src_uri, dst_uri, extra_headers = None): if src_uri.type != "s3": raise ValueError("Expected URI type 's3', got '%s'" % src_uri.type) if dst_uri.type != "s3": raise ValueError("Expected URI type 's3', got '%s'" % dst_uri.type) if self.config.acl_public is None: acl = self.get_acl(src_uri) headers = SortedDict(ignore_case = True) headers['x-amz-copy-source'] = encode_to_s3("/%s/%s" % (src_uri.bucket(), 
self.urlencode_string(src_uri.object()))) headers['x-amz-metadata-directive'] = "COPY" if self.config.acl_public: headers["x-amz-acl"] = "public-read" headers["x-amz-storage-class"] = self.storage_class() ## Set server side encryption if self.config.server_side_encryption: headers["x-amz-server-side-encryption"] = "AES256" ## Set kms headers if self.config.kms_key: headers['x-amz-server-side-encryption'] = 'aws:kms' headers['x-amz-server-side-encryption-aws-kms-key-id'] = self.config.kms_key if extra_headers: headers.update(extra_headers) request = self.create_request("OBJECT_PUT", uri = dst_uri, headers = headers) response = self.send_request(request) if response["data"] and getRootTagName(response["data"]) == "Error": #http://doc.s3.amazonaws.com/proposals/copy.html # Error during copy, status will be 200, so force error code 500 response["status"] = 500 error("Server error during the COPY operation. Overwrite response status to 500") raise S3Error(response) if self.config.acl_public is None: try: self.set_acl(dst_uri, acl) except S3Error as exc: # Ignore the exception and don't fail the copy # if the server doesn't support setting ACLs if exc.status != 501: raise exc return response def object_modify(self, src_uri, dst_uri, extra_headers = None): if src_uri.type != "s3": raise ValueError("Expected URI type 's3', got '%s'" % src_uri.type) if dst_uri.type != "s3": raise ValueError("Expected URI type 's3', got '%s'" % dst_uri.type) info_response = self.object_info(src_uri) headers = info_response['headers'] headers = self._sanitize_headers(headers) acl = self.get_acl(src_uri) headers['x-amz-copy-source'] = encode_to_s3("/%s/%s" % (src_uri.bucket(), self.urlencode_string(src_uri.object()))) headers['x-amz-metadata-directive'] = "REPLACE" # cannot change between standard and reduced redundancy with a REPLACE. ## Set server side encryption if self.config.server_side_encryption: headers["x-amz-server-side-encryption"] = "AES256" ## Set kms headers if self.config.kms_key: headers['x-amz-server-side-encryption'] = 'aws:kms' headers['x-amz-server-side-encryption-aws-kms-key-id'] = self.config.kms_key if extra_headers: headers.update(extra_headers) if self.config.mime_type: headers["content-type"] = self.config.mime_type request = self.create_request("OBJECT_PUT", uri = src_uri, headers = headers) response = self.send_request(request) if response["data"] and getRootTagName(response["data"]) == "Error": #http://doc.s3.amazonaws.com/proposals/copy.html # Error during modify, status will be 200, so force error code 500 response["status"] = 500 error("Server error during the MODIFY operation. 
Overwrite response status to 500") raise S3Error(response) self.set_acl(src_uri, acl) return response def object_move(self, src_uri, dst_uri, extra_headers = None): response_copy = self.object_copy(src_uri, dst_uri, extra_headers) debug("Object %s copied to %s" % (src_uri, dst_uri)) if not response_copy["data"] or getRootTagName(response_copy["data"]) == "CopyObjectResult": self.object_delete(src_uri) debug("Object %s deleted" % src_uri) return response_copy def object_info(self, uri): request = self.create_request("OBJECT_HEAD", uri = uri) response = self.send_request(request) return response def get_acl(self, uri): if uri.has_object(): request = self.create_request("OBJECT_GET", uri = uri, extra = "?acl") else: request = self.create_request("BUCKET_LIST", bucket = uri.bucket(), extra = "?acl") response = self.send_request(request) acl = ACL(response['data']) return acl def set_acl(self, uri, acl): # dreamhost doesn't support set_acl properly if 'objects.dreamhost.com' in self.config.host_base: return { 'status' : 501 } # not implemented body = str(acl) debug(u"set_acl(%s): acl-xml: %s" % (uri, body)) headers = {'content-type': 'application/xml'} if uri.has_object(): request = self.create_request("OBJECT_PUT", uri = uri, extra = "?acl", headers = headers, body = body) else: request = self.create_request("BUCKET_CREATE", bucket = uri.bucket(), extra = "?acl", headers = headers, body = body) response = self.send_request(request) return response def get_policy(self, uri): request = self.create_request("BUCKET_LIST", bucket = uri.bucket(), extra = "?policy") response = self.send_request(request) return response['data'] def set_policy(self, uri, policy): headers = {} # TODO check policy is proper json string headers['content-type'] = 'application/json' request = self.create_request("BUCKET_CREATE", uri = uri, extra = "?policy", headers=headers, body = policy) response = self.send_request(request) return response def delete_policy(self, uri): request = self.create_request("BUCKET_DELETE", uri = uri, extra = "?policy") debug(u"delete_policy(%s)" % uri) response = self.send_request(request) return response def get_cors(self, uri): request = self.create_request("BUCKET_LIST", bucket = uri.bucket(), extra = "?cors") response = self.send_request(request) return response['data'] def set_cors(self, uri, cors): headers = {} # TODO check cors is proper json string headers['content-type'] = 'application/xml' headers['content-md5'] = compute_content_md5(cors) request = self.create_request("BUCKET_CREATE", uri = uri, extra = "?cors", headers=headers, body = cors) response = self.send_request(request) return response def delete_cors(self, uri): request = self.create_request("BUCKET_DELETE", uri = uri, extra = "?cors") debug(u"delete_cors(%s)" % uri) response = self.send_request(request) return response def set_lifecycle_policy(self, uri, policy): headers = SortedDict(ignore_case = True) headers['content-md5'] = compute_content_md5(policy) request = self.create_request("BUCKET_CREATE", uri = uri, extra = "?lifecycle", headers=headers, body = policy) debug(u"set_lifecycle_policy(%s): policy-xml: %s" % (uri, policy)) response = self.send_request(request) return response def set_payer(self, uri): headers = {} headers['content-type'] = 'application/xml' body = '\n' if self.config.requester_pays: body += 'Requester\n' else: body += 'BucketOwner\n' body += '\n' request = self.create_request("BUCKET_CREATE", uri = uri, extra = "?requestPayment", body = body) response = self.send_request(request) return response def 
delete_lifecycle_policy(self, uri): request = self.create_request("BUCKET_DELETE", uri = uri, extra = "?lifecycle") debug(u"delete_lifecycle_policy(%s)" % uri) response = self.send_request(request) return response def get_multipart(self, uri): request = self.create_request("BUCKET_LIST", bucket = uri.bucket(), extra = "?uploads") response = self.send_request(request) return response def abort_multipart(self, uri, id): request = self.create_request("OBJECT_DELETE", uri=uri, extra = ("?uploadId=%s" % id)) response = self.send_request(request) return response def list_multipart(self, uri, id): request = self.create_request("OBJECT_GET", uri=uri, extra = ("?uploadId=%s" % id)) response = self.send_request(request) return response def get_accesslog(self, uri): request = self.create_request("BUCKET_LIST", bucket = uri.bucket(), extra = "?logging") response = self.send_request(request) accesslog = AccessLog(response['data']) return accesslog def set_accesslog_acl(self, uri): acl = self.get_acl(uri) debug("Current ACL(%s): %s" % (uri.uri(), str(acl))) acl.appendGrantee(GranteeLogDelivery("READ_ACP")) acl.appendGrantee(GranteeLogDelivery("WRITE")) debug("Updated ACL(%s): %s" % (uri.uri(), str(acl))) self.set_acl(uri, acl) def set_accesslog(self, uri, enable, log_target_prefix_uri = None, acl_public = False): accesslog = AccessLog() if enable: accesslog.enableLogging(log_target_prefix_uri) accesslog.setAclPublic(acl_public) else: accesslog.disableLogging() body = str(accesslog) debug(u"set_accesslog(%s): accesslog-xml: %s" % (uri, body)) request = self.create_request("BUCKET_CREATE", bucket = uri.bucket(), extra = "?logging", body = body) try: response = self.send_request(request) except S3Error, e: if e.info['Code'] == "InvalidTargetBucketForLogging": info("Setting up log-delivery ACL for target bucket.") self.set_accesslog_acl(S3Uri(u"s3://%s" % log_target_prefix_uri.bucket())) response = self.send_request(request) else: raise return accesslog, response ## Low level methods def urlencode_string(self, string, urlencoding_mode = None): if type(string) == unicode: string = string.encode("utf-8") if urlencoding_mode is None: urlencoding_mode = self.config.urlencoding_mode if urlencoding_mode == "verbatim": ## Don't do any pre-processing return string encoded = quote_plus(string, safe="~/") debug("String '%s' encoded to '%s'" % (string, encoded)) return encoded def create_request(self, operation, uri = None, bucket = None, object = None, headers = None, extra = None, body = "", **params): resource = { 'bucket' : None, 'uri' : "/" } if uri and (bucket or object): raise ValueError("Both 'uri' and either 'bucket' or 'object' parameters supplied") ## If URI is given use that instead of bucket/object parameters if uri: bucket = uri.bucket() object = uri.has_object() and uri.object() or None if bucket: resource['bucket'] = str(bucket) if object: resource['uri'] = "/" + self.urlencode_string(object) if extra: resource['uri'] += extra method_string = S3.http_methods.getkey(S3.operations[operation] & S3.http_methods["MASK"]) request = S3Request(self, method_string, resource, headers, body, params) debug("CreateRequest: resource[uri]=" + resource['uri']) return request def _fail_wait(self, retries): # Wait a few seconds. The more it fails the more we wait. 
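        # Linear back-off: e.g. if _max_retries is 5, the successive waits are 3, 6, 9, 12 and 15 seconds.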
return (self._max_retries - retries + 1) * 3 def _http_400_handler(self, request, response, fn, *args, **kwargs): # AWS response AuthorizationHeaderMalformed means we sent the request to the wrong region # get the right region out of the response and send it there. message = 'Unknown error' if 'data' in response and len(response['data']) > 0: failureCode = getTextFromXml(response['data'], 'Code') message = getTextFromXml(response['data'], 'Message') if failureCode == 'AuthorizationHeaderMalformed': # we sent the request to the wrong region region = getTextFromXml(response['data'], 'Region') if region is not None: S3Request.region_map[request.resource['bucket']] = region info('Forwarding request to %s' % region) return fn(*args, **kwargs) else: message = u'Could not determine bucket location. Please consider using --region parameter.' elif failureCode == 'InvalidRequest': if message == 'The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256.': debug(u'Endpoint requires signature v4') self.endpoint_requires_signature_v4 = True return fn(*args, **kwargs) elif failureCode == 'InvalidArgument': # returned by DreamObjects on send_request and send_file, # which doesn't support signature v4. Retry with signature v2 if not request.use_signature_v2() and not self.fallback_to_signature_v2: # have not tried with v2 yet debug(u'Falling back to signature v2') self.fallback_to_signature_v2 = True return fn(*args, **kwargs) else: # returned by DreamObjects on recv_file, which doesn't support signature v4. Retry with signature v2 if not request.use_signature_v2() and not self.fallback_to_signature_v2: # have not tried with v2 yet debug(u'Falling back to signature v2') self.fallback_to_signature_v2 = True return fn(*args, **kwargs) raise S3Error(response) def _http_403_handler(self, request, response, fn, *args, **kwargs): message = 'Unknown error' if 'data' in response and len(response['data']) > 0: failureCode = getTextFromXml(response['data'], 'Code') message = getTextFromXml(response['data'], 'Message') if failureCode == 'AccessDenied': # traditional HTTP 403 if message == 'AWS authentication requires a valid Date or x-amz-date header': # message from an Eucalyptus walrus server if not request.use_signature_v2() and not self.fallback_to_signature_v2: # have not tried with v2 yet debug(u'Falling back to signature v2') self.fallback_to_signature_v2 = True return fn(*args, **kwargs) raise S3Error(response) def send_request(self, request, retries = _max_retries): method_string, resource, headers = request.get_triplet() debug("Processing request, please wait...") try: conn = ConnMan.get(self.get_hostname(resource['bucket'])) uri = self.format_uri(resource) debug("Sending request method_string=%r, uri=%r, headers=%r, body=(%i bytes)" % (method_string, uri, headers, len(request.body or ""))) conn.c.request(method_string, uri, request.body, headers) response = {} http_response = conn.c.getresponse() response["status"] = http_response.status response["reason"] = http_response.reason response["headers"] = convertTupleListToDict(http_response.getheaders()) response["data"] = http_response.read() if response["headers"].has_key("x-amz-meta-s3cmd-attrs"): attrs = parse_attrs_header(response["headers"]["x-amz-meta-s3cmd-attrs"]) response["s3cmd-attrs"] = attrs debug("Response: " + str(response)) ConnMan.put(conn) except ParameterError, e: raise except OSError: raise except CertificateError: raise except (IOError, Exception), e: if hasattr(e, 'errno') and e.errno != errno.EPIPE: 
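                # Only exceptions carrying an errno other than EPIPE are fatal here; a broken pipe (and errno-less errors) falls through to the connection reset and retry logic below.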
raise # close the connection and re-establish conn.counter = ConnMan.conn_max_counter ConnMan.put(conn) if retries: warning("Retrying failed request: %s (%s)" % (resource['uri'], e)) warning("Waiting %d sec..." % self._fail_wait(retries)) time.sleep(self._fail_wait(retries)) return self.send_request(request, retries - 1) else: raise S3RequestError("Request failed for: %s" % resource['uri']) if response["status"] == 400: return self._http_400_handler(request, response, self.send_request, request) if response["status"] == 403: return self._http_403_handler(request, response, self.send_request, request) if response["status"] == 405: # Method Not Allowed. Don't retry. raise S3Error(response) if response["status"] == 307: ## RedirectPermanent redir_bucket = getTextFromXml(response['data'], ".//Bucket") redir_hostname = getTextFromXml(response['data'], ".//Endpoint") self.set_hostname(redir_bucket, redir_hostname) info("Redirected to: %s" % (redir_hostname)) return self.send_request(request) if response["status"] >= 500: e = S3Error(response) if response["status"] == 501: ## NotImplemented server error - no need to retry retries = 0 if retries: warning(u"Retrying failed request: %s" % resource['uri']) warning(unicode(e)) warning("Waiting %d sec..." % self._fail_wait(retries)) time.sleep(self._fail_wait(retries)) return self.send_request(request, retries - 1) else: raise e if response["status"] < 200 or response["status"] > 299: raise S3Error(response) return response def send_file(self, request, file, labels, buffer = '', throttle = 0, retries = _max_retries, offset = 0, chunk_size = -1): method_string, resource, headers = request.get_triplet() if S3Request.region_map.get(request.resource['bucket'], Config().bucket_location) is None: s3_uri = S3Uri(u's3://' + request.resource['bucket']) region = self.get_bucket_location(s3_uri) if region is not None: S3Request.region_map[request.resource['bucket']] = region size_left = size_total = long(headers["content-length"]) filename = unicodise(file.name) if self.config.progress_meter: labels[u'action'] = u'upload' progress = self.config.progress_class(labels, size_total) else: info("Sending file '%s', please wait..." % filename) timestamp_start = time.time() if buffer: sha256_hash = checksum_sha256_buffer(buffer, offset, size_total) else: sha256_hash = checksum_sha256_file(filename, offset, size_total) request.body = sha256_hash method_string, resource, headers = request.get_triplet() try: conn = ConnMan.get(self.get_hostname(resource['bucket'])) conn.c.putrequest(method_string, self.format_uri(resource)) for header in headers.keys(): conn.c.putheader(header, str(headers[header])) conn.c.endheaders() except ParameterError, e: raise except Exception, e: if self.config.progress_meter: progress.done("failed") if retries: warning("Retrying failed request: %s (%s)" % (resource['uri'], e)) warning("Waiting %d sec..." 
% self._fail_wait(retries)) time.sleep(self._fail_wait(retries)) # Connection error -> same throttle value return self.send_file(request, file, labels, buffer, throttle, retries - 1, offset, chunk_size) else: raise S3UploadError("Upload failed for: %s" % resource['uri']) if buffer == '': file.seek(offset) md5_hash = md5() try: while (size_left > 0): #debug("SendFile: Reading up to %d bytes from '%s' - remaining bytes: %s" % (self.config.send_chunk, filename, size_left)) l = min(self.config.send_chunk, size_left) if buffer == '': data = file.read(l) else: data = buffer if self.config.limitrate > 0: start_time = time.time() md5_hash.update(data) conn.c.send(data) if self.config.progress_meter: progress.update(delta_position = len(data)) size_left -= len(data) #throttle if self.config.limitrate > 0: real_duration = time.time() - start_time expected_duration = float(l)/self.config.limitrate throttle = max(expected_duration - real_duration, throttle) if throttle: time.sleep(throttle) md5_computed = md5_hash.hexdigest() response = {} http_response = conn.c.getresponse() response["status"] = http_response.status response["reason"] = http_response.reason response["headers"] = convertTupleListToDict(http_response.getheaders()) response["data"] = http_response.read() response["size"] = size_total ConnMan.put(conn) debug(u"Response: %s" % response) except ParameterError, e: raise except Exception, e: if self.config.progress_meter: progress.done("failed") if retries: if retries < self._max_retries: throttle = throttle and throttle * 5 or 0.01 warning("Upload failed: %s (%s)" % (resource['uri'], e)) warning("Retrying on lower speed (throttle=%0.2f)" % throttle) warning("Waiting %d sec..." % self._fail_wait(retries)) time.sleep(self._fail_wait(retries)) # Connection error -> same throttle value return self.send_file(request, file, labels, buffer, throttle, retries - 1, offset, chunk_size) else: debug("Giving up on '%s' %s" % (filename, e)) raise S3UploadError("Upload failed for: %s" % resource['uri']) timestamp_end = time.time() response["elapsed"] = timestamp_end - timestamp_start response["speed"] = response["elapsed"] and float(response["size"]) / response["elapsed"] or float(-1) if self.config.progress_meter: ## Finalising the upload takes some time -> update() progress meter ## to correct the average speed. Otherwise people will complain that ## 'progress' and response["speed"] are inconsistent ;-) progress.update() progress.done("done") if response["status"] == 307: ## RedirectPermanent redir_bucket = getTextFromXml(response['data'], ".//Bucket") redir_hostname = getTextFromXml(response['data'], ".//Endpoint") self.set_hostname(redir_bucket, redir_hostname) info("Redirected to: %s" % (redir_hostname)) return self.send_file(request, file, labels, buffer, offset = offset, chunk_size = chunk_size) if response["status"] == 400: return self._http_400_handler(request, response, self.send_file, request, file, labels, buffer, offset = offset, chunk_size = chunk_size) if response["status"] == 403: return self._http_403_handler(request, response, self.send_file, request, file, labels, buffer, offset = offset, chunk_size = chunk_size) # S3 from time to time doesn't send ETag back in a response :-( # Force re-upload here. 
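        # An empty ETag can never match the locally computed MD5, so the checksum comparison below reports a mismatch and the upload is retried (unless the object is KMS-encrypted, in which case the mismatch is expected and ignored).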
if not response['headers'].has_key('etag'): response['headers']['etag'] = '' if response["status"] < 200 or response["status"] > 299: try_retry = False if response["status"] >= 500: ## AWS internal error - retry try_retry = True elif response["status"] >= 400: err = S3Error(response) ## Retriable client error? if err.code in [ 'BadDigest', 'OperationAborted', 'TokenRefreshRequired', 'RequestTimeout' ]: try_retry = True if try_retry: if retries: warning("Upload failed: %s (%s)" % (resource['uri'], S3Error(response))) warning("Waiting %d sec..." % self._fail_wait(retries)) time.sleep(self._fail_wait(retries)) return self.send_file(request, file, labels, buffer, throttle, retries - 1, offset, chunk_size) else: warning("Too many failures. Giving up on '%s'" % (filename)) raise S3UploadError ## Non-recoverable error raise S3Error(response) debug("MD5 sums: computed=%s, received=%s" % (md5_computed, response["headers"]["etag"])) ## when using KMS encryption, MD5 etag value will not match if (response["headers"]["etag"].strip('"\'') != md5_hash.hexdigest()) and response["headers"].get("x-amz-server-side-encryption") != 'aws:kms': warning("MD5 Sums don't match!") if retries: warning("Retrying upload of %s" % (filename)) return self.send_file(request, file, labels, buffer, throttle, retries - 1, offset, chunk_size) else: warning("Too many failures. Giving up on '%s'" % (filename)) raise S3UploadError return response def send_file_multipart(self, file, headers, uri, size, extra_label = ""): timestamp_start = time.time() upload = MultiPartUpload(self, file, uri, headers) upload.upload_all_parts(extra_label) response = upload.complete_multipart_upload() timestamp_end = time.time() response["elapsed"] = timestamp_end - timestamp_start response["size"] = size response["speed"] = response["elapsed"] and float(response["size"]) / response["elapsed"] or float(-1) if response["data"] and getRootTagName(response["data"]) == "Error": #http://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadComplete.html # Error Complete Multipart UPLOAD, status may be 200 # raise S3UploadError raise S3UploadError(getTextFromXml(response["data"], 'Message')) return response def recv_file(self, request, stream, labels, start_position = 0, retries = _max_retries): method_string, resource, headers = request.get_triplet() filename = unicodise(stream.name) if self.config.progress_meter: labels[u'action'] = u'download' progress = self.config.progress_class(labels, 0) else: info("Receiving file '%s', please wait..." % filename) timestamp_start = time.time() try: conn = ConnMan.get(self.get_hostname(resource['bucket'])) conn.c.putrequest(method_string, self.format_uri(resource)) for header in headers.keys(): conn.c.putheader(header, str(headers[header])) if start_position > 0: debug("Requesting Range: %d .. 
end" % start_position) conn.c.putheader("Range", "bytes=%d-" % start_position) conn.c.endheaders() response = {} http_response = conn.c.getresponse() response["status"] = http_response.status response["reason"] = http_response.reason response["headers"] = convertTupleListToDict(http_response.getheaders()) if response["headers"].has_key("x-amz-meta-s3cmd-attrs"): attrs = parse_attrs_header(response["headers"]["x-amz-meta-s3cmd-attrs"]) response["s3cmd-attrs"] = attrs debug("Response: %s" % response) except ParameterError, e: raise except OSError, e: raise except (IOError, Exception), e: if self.config.progress_meter: progress.done("failed") if hasattr(e, 'errno') and e.errno != errno.EPIPE: raise # close the connection and re-establish conn.counter = ConnMan.conn_max_counter ConnMan.put(conn) if retries: warning("Retrying failed request: %s (%s)" % (resource['uri'], e)) warning("Waiting %d sec..." % self._fail_wait(retries)) time.sleep(self._fail_wait(retries)) # Connection error -> same throttle value return self.recv_file(request, stream, labels, start_position, retries - 1) else: raise S3DownloadError("Download failed for: %s" % resource['uri']) if response["status"] == 307: ## RedirectPermanent response['data'] = http_response.read() redir_bucket = getTextFromXml(response['data'], ".//Bucket") redir_hostname = getTextFromXml(response['data'], ".//Endpoint") self.set_hostname(redir_bucket, redir_hostname) info("Redirected to: %s" % (redir_hostname)) return self.recv_file(request, stream, labels, start_position) if response["status"] == 400: return self._http_400_handler(request, response, self.recv_file, request, stream, labels, start_position) if response["status"] == 403: return self._http_403_handler(request, response, self.recv_file, request, stream, labels, start_position) if response["status"] == 405: # Method Not Allowed. Don't retry. raise S3Error(response) if response["status"] < 200 or response["status"] > 299: raise S3Error(response) if start_position == 0: # Only compute MD5 on the fly if we're downloading from beginning # Otherwise we'd get a nonsense. md5_hash = md5() size_left = long(response["headers"]["content-length"]) size_total = start_position + size_left current_position = start_position if self.config.progress_meter: progress.total_size = size_total progress.initial_position = current_position progress.current_position = current_position try: # Fix for issue #432. Even when content size is 0, httplib expect the response to be read. if size_left == 0: data = http_response.read(1) # It is not supposed to be some data returned in that case assert(len(data) == 0) while (current_position < size_total): this_chunk = size_left > self.config.recv_chunk and self.config.recv_chunk or size_left if self.config.limitrate > 0: start_time = time.time() data = http_response.read(this_chunk) if len(data) == 0: raise S3ResponseError("EOF from S3!") #throttle if self.config.limitrate > 0: real_duration = time.time() - start_time expected_duration = float(this_chunk)/self.config.limitrate if expected_duration > real_duration: time.sleep(expected_duration - real_duration) stream.write(data) if start_position == 0: md5_hash.update(data) current_position += len(data) ## Call progress meter from here... 
if self.config.progress_meter: progress.update(delta_position = len(data)) ConnMan.put(conn) except OSError: raise except (IOError, Exception), e: if self.config.progress_meter: progress.done("failed") if hasattr(e, 'errno') and e.errno != errno.EPIPE: raise # close the connection and re-establish conn.counter = ConnMan.conn_max_counter ConnMan.put(conn) if retries: warning("Retrying failed request: %s (%s)" % (resource['uri'], e)) warning("Waiting %d sec..." % self._fail_wait(retries)) time.sleep(self._fail_wait(retries)) # Connection error -> same throttle value return self.recv_file(request, stream, labels, current_position, retries - 1) else: raise S3DownloadError("Download failed for: %s" % resource['uri']) stream.flush() timestamp_end = time.time() if self.config.progress_meter: ## The above stream.flush() may take some time -> update() progress meter ## to correct the average speed. Otherwise people will complain that ## 'progress' and response["speed"] are inconsistent ;-) progress.update() progress.done("done") md5_from_s3 = response["headers"]["etag"].strip('"') if not 'x-amz-meta-s3tools-gpgenc' in response["headers"]: # we can't trust our stored md5 because we # encrypted the file after calculating it but before # uploading it. try: md5_from_s3 = response["s3cmd-attrs"]["md5"] except KeyError: pass # we must have something to compare against to bother with the calculation if '-' not in md5_from_s3: if start_position == 0: # Only compute MD5 on the fly if we were downloading from the beginning response["md5"] = md5_hash.hexdigest() else: # Otherwise try to compute MD5 of the output file try: response["md5"] = hash_file_md5(filename) except IOError, e: if e.errno != errno.ENOENT: warning("Unable to open file: %s: %s" % (filename, e)) warning("Unable to verify MD5. 
Assume it matches.") response["md5match"] = response.get("md5") == md5_from_s3 response["elapsed"] = timestamp_end - timestamp_start response["size"] = current_position response["speed"] = response["elapsed"] and float(response["size"]) / response["elapsed"] or float(-1) if response["size"] != start_position + long(response["headers"]["content-length"]): warning("Reported size (%s) does not match received size (%s)" % ( start_position + long(response["headers"]["content-length"]), response["size"])) debug("ReceiveFile: Computed MD5 = %s" % response.get("md5")) # avoid ETags from multipart uploads that aren't the real md5 if ('-' not in md5_from_s3 and not response["md5match"]) and (response["headers"].get("x-amz-server-side-encryption") != 'aws:kms'): warning("MD5 signatures do not match: computed=%s, received=%s" % ( response.get("md5"), md5_from_s3)) return response __all__.append("S3") def parse_attrs_header(attrs_header): attrs = {} for attr in attrs_header.split("/"): key, val = attr.split(":") attrs[key] = val return attrs def compute_content_md5(body): m = md5(body) base64md5 = base64.encodestring(m.digest()) if base64md5[-1] == '\n': base64md5 = base64md5[0:-1] return base64md5 # vim:et:ts=4:sts=4:ai s3cmd-1.6.1/S3/BidirMap.py0000664000175000017500000000216412647745544016213 0ustar mdomschmdomsch00000000000000# -*- coding: utf-8 -*- ## Amazon S3 manager ## Author: Michal Ludvig ## http://www.logix.cz/michal ## License: GPL Version 2 ## Copyright: TGRMN Software and contributors class BidirMap(object): def __init__(self, **map): self.k2v = {} self.v2k = {} for key in map: self.__setitem__(key, map[key]) def __setitem__(self, key, value): if self.v2k.has_key(value): if self.v2k[value] != key: raise KeyError("Value '"+str(value)+"' already in use with key '"+str(self.v2k[value])+"'") try: del(self.v2k[self.k2v[key]]) except KeyError: pass self.k2v[key] = value self.v2k[value] = key def __getitem__(self, key): return self.k2v[key] def __str__(self): return self.v2k.__str__() def getkey(self, value): return self.v2k[value] def getvalue(self, key): return self.k2v[key] def keys(self): return [key for key in self.k2v] def values(self): return [value for value in self.v2k] # vim:et:ts=4:sts=4:ai s3cmd-1.6.1/S3/Exceptions.py0000664000175000017500000001005612647745544016644 0ustar mdomschmdomsch00000000000000# -*- coding: utf-8 -*- ## Amazon S3 manager - Exceptions library ## Author: Michal Ludvig ## http://www.logix.cz/michal ## License: GPL Version 2 ## Copyright: TGRMN Software and contributors from Utils import getTreeFromXml, unicodise, deunicodise from logging import debug, error import ExitCodes try: from xml.etree.ElementTree import ParseError as XmlParseError except ImportError: # ParseError was only added in python2.7, before ET was raising ExpatError from xml.parsers.expat import ExpatError as XmlParseError class S3Exception(Exception): def __init__(self, message = ""): self.message = unicodise(message) def __str__(self): ## Call unicode(self) instead of self.message because ## __unicode__() method could be overridden in subclasses! 
return deunicodise(unicode(self)) def __unicode__(self): return self.message ## (Base)Exception.message has been deprecated in Python 2.6 def _get_message(self): return self._message def _set_message(self, message): self._message = message message = property(_get_message, _set_message) class S3Error (S3Exception): def __init__(self, response): self.status = response["status"] self.reason = response["reason"] self.info = { "Code" : "", "Message" : "", "Resource" : "" } debug("S3Error: %s (%s)" % (self.status, self.reason)) if response.has_key("headers"): for header in response["headers"]: debug("HttpHeader: %s: %s" % (header, response["headers"][header])) if response.has_key("data") and response["data"]: try: tree = getTreeFromXml(response["data"]) except XmlParseError: debug("Not an XML response") else: try: self.info.update(self.parse_error_xml(tree)) except Exception, e: error("Error parsing xml: %s. ErrorXML: %s" % (e, response["data"])) self.code = self.info["Code"] self.message = self.info["Message"] self.resource = self.info["Resource"] def __unicode__(self): retval = u"%d " % (self.status) retval += (u"(%s)" % (self.info.has_key("Code") and self.info["Code"] or self.reason)) error_msg = self.info.get("Message") if error_msg: retval += (u": %s" % error_msg) return retval def get_error_code(self): if self.status in [301, 307]: return ExitCodes.EX_SERVERMOVED elif self.status in [400, 405, 411, 416, 501, 504]: return ExitCodes.EX_SERVERERROR elif self.status == 403: return ExitCodes.EX_ACCESSDENIED elif self.status == 404: return ExitCodes.EX_NOTFOUND elif self.status == 409: return ExitCodes.EX_CONFLICT elif self.status == 412: return ExitCodes.EX_PRECONDITION elif self.status == 500: return ExitCodes.EX_SOFTWARE elif self.status == 503: return ExitCodes.EX_SERVICE else: return ExitCodes.EX_SOFTWARE @staticmethod def parse_error_xml(tree): info = {} error_node = tree if not error_node.tag == "Error": error_node = tree.find(".//Error") if error_node is not None: for child in error_node.getchildren(): if child.text != "": debug("ErrorXML: " + child.tag + ": " + repr(child.text)) info[child.tag] = child.text else: raise S3ResponseError("Malformed error XML returned from remote server.") return info class CloudFrontError(S3Error): pass class S3UploadError(S3Exception): pass class S3DownloadError(S3Exception): pass class S3RequestError(S3Exception): pass class S3ResponseError(S3Exception): pass class InvalidFileError(S3Exception): pass class ParameterError(S3Exception): pass # vim:et:ts=4:sts=4:ai s3cmd-1.6.1/S3/FileDict.py0000664000175000017500000000461612647745544016213 0ustar mdomschmdomsch00000000000000# -*- coding: utf-8 -*- ## Amazon S3 manager ## Author: Michal Ludvig ## http://www.logix.cz/michal ## License: GPL Version 2 ## Copyright: TGRMN Software and contributors import logging from SortedDict import SortedDict import Utils import Config zero_length_md5 = "d41d8cd98f00b204e9800998ecf8427e" cfg = Config.Config() class FileDict(SortedDict): def __init__(self, mapping = {}, ignore_case = True, **kwargs): SortedDict.__init__(self, mapping = mapping, ignore_case = ignore_case, **kwargs) self.hardlinks = dict() # { dev: { inode : {'md5':, 'relative_files':}}} self.by_md5 = dict() # {md5: set(relative_files)} def record_md5(self, relative_file, md5): if md5 is None: return if md5 == zero_length_md5: return if md5 not in self.by_md5: self.by_md5[md5] = set() self.by_md5[md5].add(relative_file) def find_md5_one(self, md5): if md5 is None: return None try: return 
list(self.by_md5.get(md5, set()))[0] except: return None def get_md5(self, relative_file): """returns md5 if it can, or raises IOError if file is unreadable""" md5 = None if 'md5' in self[relative_file]: return self[relative_file]['md5'] md5 = self.get_hardlink_md5(relative_file) if md5 is None and 'md5' in cfg.sync_checks: logging.debug(u"doing file I/O to read md5 of %s" % relative_file) md5 = Utils.hash_file_md5(self[relative_file]['full_name']) self.record_md5(relative_file, md5) self[relative_file]['md5'] = md5 return md5 def record_hardlink(self, relative_file, dev, inode, md5, size): if md5 is None: return if size == 0: return # don't record 0-length files if dev == 0 or inode == 0: return # Windows if dev not in self.hardlinks: self.hardlinks[dev] = dict() if inode not in self.hardlinks[dev]: self.hardlinks[dev][inode] = dict(md5=md5, relative_files=set()) self.hardlinks[dev][inode]['relative_files'].add(relative_file) def get_hardlink_md5(self, relative_file): md5 = None try: dev = self[relative_file]['dev'] inode = self[relative_file]['inode'] md5 = self.hardlinks[dev][inode]['md5'] except KeyError: pass return md5 s3cmd-1.6.1/S3/Config.py0000664000175000017500000003525012647745544015733 0ustar mdomschmdomsch00000000000000# -*- coding: utf-8 -*- ## Amazon S3 manager ## Author: Michal Ludvig ## http://www.logix.cz/michal ## License: GPL Version 2 ## Copyright: TGRMN Software and contributors import logging from logging import debug, warning, error import re import os import sys import Progress from SortedDict import SortedDict import httplib import locale try: import json except ImportError: pass class Config(object): _instance = None _parsed_files = [] _doc = {} access_key = "" secret_key = "" access_token = "" host_base = "s3.amazonaws.com" host_bucket = "%(bucket)s.s3.amazonaws.com" kms_key = "" #can't set this and Server Side Encryption at the same time simpledb_host = "sdb.amazonaws.com" cloudfront_host = "cloudfront.amazonaws.com" verbosity = logging.WARNING progress_meter = sys.stdout.isatty() progress_class = Progress.ProgressCR send_chunk = 64 * 1024 recv_chunk = 64 * 1024 list_md5 = False long_listing = False human_readable_sizes = False extra_headers = SortedDict(ignore_case = True) force = False server_side_encryption = False enable = None get_continue = False put_continue = False upload_id = None skip_existing = False recursive = False restore_days = 1 acl_public = None acl_grants = [] acl_revokes = [] proxy_host = "" proxy_port = 3128 encrypt = False dry_run = False add_encoding_exts = "" preserve_attrs = True preserve_attrs_list = [ 'uname', # Verbose owner Name (e.g. 'root') 'uid', # Numeric user ID (e.g. 0) 'gname', # Group name (e.g. 'users') 'gid', # Numeric group ID (e.g. 100) 'atime', # Last access timestamp 'mtime', # Modification timestamp 'ctime', # Creation timestamp 'mode', # File mode (e.g. 
rwxr-xr-x = 755) 'md5', # File MD5 (if known) #'acl', # Full ACL (not yet supported) ] delete_removed = False delete_after = False delete_after_fetch = False max_delete = -1 _doc['delete_removed'] = "[sync] Remove remote S3 objects when local file has been deleted" delay_updates = False # OBSOLETE gpg_passphrase = "" gpg_command = "" gpg_encrypt = "%(gpg_command)s -c --verbose --no-use-agent --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s %(input_file)s" gpg_decrypt = "%(gpg_command)s -d --verbose --no-use-agent --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s %(input_file)s" use_https = True ca_certs_file = "" check_ssl_certificate = True check_ssl_hostname = True bucket_location = "US" default_mime_type = "binary/octet-stream" guess_mime_type = True use_mime_magic = True mime_type = "" enable_multipart = True multipart_chunk_size_mb = 15 # MB multipart_max_chunks = 10000 # Maximum chunks on AWS S3, could be different on other S3-compatible APIs # List of checks to be performed for 'sync' sync_checks = ['size', 'md5'] # 'weak-timestamp' # List of compiled REGEXPs exclude = [] include = [] # Dict mapping compiled REGEXPs back to their textual form debug_exclude = {} debug_include = {} encoding = locale.getpreferredencoding() or "UTF-8" urlencoding_mode = "normal" log_target_prefix = "" reduced_redundancy = False storage_class = "" follow_symlinks = False socket_timeout = 300 invalidate_on_cf = False # joseprio: new flags for default index invalidation invalidate_default_index_on_cf = False invalidate_default_index_root_on_cf = True website_index = "index.html" website_error = "" website_endpoint = "http://%(bucket)s.s3-website-%(location)s.amazonaws.com/" additional_destinations = [] files_from = [] cache_file = "" add_headers = "" remove_headers = [] expiry_days = "" expiry_date = "" expiry_prefix = "" signature_v2 = False limitrate = 0 requester_pays = False stop_on_error = False content_disposition = None content_type = None stats = False ## Creating a singleton def __new__(self, configfile = None, access_key=None, secret_key=None): if self._instance is None: self._instance = object.__new__(self) return self._instance def __init__(self, configfile = None, access_key=None, secret_key=None): if configfile: try: self.read_config_file(configfile) except IOError: if 'AWS_CREDENTIAL_FILE' in os.environ: self.env_config() # override these if passed on the command-line if access_key and secret_key: self.access_key = access_key self.secret_key = secret_key if len(self.access_key)==0: env_access_key = os.environ.get("AWS_ACCESS_KEY", None) or os.environ.get("AWS_ACCESS_KEY_ID", None) env_secret_key = os.environ.get("AWS_SECRET_KEY", None) or os.environ.get("AWS_SECRET_ACCESS_KEY", None) if env_access_key: self.access_key = env_access_key self.secret_key = env_secret_key else: self.role_config() #TODO check KMS key is valid if self.kms_key and self.server_side_encryption == True: warning('Cannot have server_side_encryption (S3 SSE) and KMS_key set (S3 KMS). KMS encryption will be used. Please set server_side_encryption to False') if self.kms_key and self.signature_v2 == True: raise Exception('KMS encryption requires signature v4. 
Please set signature_v2 to False') def role_config(self): if sys.version_info[0] * 10 + sys.version_info[1] < 26: error("IAM authentication requires Python 2.6 or newer") raise if not 'json' in sys.modules: error("IAM authentication not available -- missing module json") raise try: conn = httplib.HTTPConnection(host='169.254.169.254', timeout = 2) conn.request('GET', "/latest/meta-data/iam/security-credentials/") resp = conn.getresponse() files = resp.read() if resp.status == 200 and len(files)>1: conn.request('GET', "/latest/meta-data/iam/security-credentials/%s"%files) resp=conn.getresponse() if resp.status == 200: creds=json.load(resp) Config().update_option('access_key', creds['AccessKeyId'].encode('ascii')) Config().update_option('secret_key', creds['SecretAccessKey'].encode('ascii')) Config().update_option('access_token', creds['Token'].encode('ascii')) else: raise IOError else: raise IOError except: raise def role_refresh(self): try: self.role_config() except: warning("Could not refresh role") def env_config(self): cred_content = "" try: cred_file = open(os.environ['AWS_CREDENTIAL_FILE'],'r') cred_content = cred_file.read() except IOError, e: debug("Error %d accessing credentials file %s" % (e.errno,os.environ['AWS_CREDENTIAL_FILE'])) r_data = re.compile("^\s*(?P\w+)\s*=\s*(?P.*)") r_quotes = re.compile("^\"(.*)\"\s*$") if len(cred_content)>0: for line in cred_content.splitlines(): is_data = r_data.match(line) if is_data: data = is_data.groupdict() if r_quotes.match(data["value"]): data["value"] = data["value"][1:-1] if data["orig_key"] == "AWSAccessKeyId" \ or data["orig_key"] == "aws_access_key_id": data["key"] = "access_key" elif data["orig_key"]=="AWSSecretKey" \ or data["orig_key"]=="aws_secret_access_key": data["key"] = "secret_key" else: debug("env_config: key = %r will be ignored", data["orig_key"]) if "key" in data: Config().update_option(data["key"], data["value"]) if data["key"] in ("access_key", "secret_key", "gpg_passphrase"): print_value = ("%s...%d_chars...%s") % (data["value"][:2], len(data["value"]) - 3, data["value"][-1:]) else: print_value = data["value"] debug("env_Config: %s->%s" % (data["key"], print_value)) def option_list(self): retval = [] for option in dir(self): ## Skip attributes that start with underscore or are not string, int or bool option_type = type(getattr(Config, option)) if option.startswith("_") or \ not (option_type in ( type("string"), # str type(42), # int type(True))): # bool continue retval.append(option) return retval def read_config_file(self, configfile): cp = ConfigParser(configfile) for option in self.option_list(): _option = cp.get(option) if _option is not None: _option = _option.strip() self.update_option(option, _option) # allow acl_public to be set from the config file too, even though by # default it is set to None, and not present in the config file. 
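        # (option_list() only exposes str/int/bool class attributes, so an option whose default is None would otherwise never be read from the config file)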
if cp.get('acl_public'): self.update_option('acl_public', cp.get('acl_public')) if cp.get('add_headers'): for option in cp.get('add_headers').split(","): (key, value) = option.split(':') self.extra_headers[key.replace('_', '-').strip()] = value.strip() self._parsed_files.append(configfile) def dump_config(self, stream): ConfigDumper(stream).dump("default", self) def update_option(self, option, value): if value is None: return #### Handle environment reference if str(value).startswith("$"): return self.update_option(option, os.getenv(str(value)[1:])) #### Special treatment of some options ## verbosity must be known to "logging" module if option == "verbosity": # support integer verboisities try: value = int(value) except ValueError: try: # otherwise it must be a key known to the logging module value = logging._levelNames[value] except KeyError: error("Config: verbosity level '%s' is not valid" % value) return elif option == "limitrate": #convert kb,mb to bytes if value.endswith("k") or value.endswith("K"): shift = 10 elif value.endswith("m") or value.endswith("M"): shift = 20 else: shift = 0 try: value = shift and int(value[:-1]) << shift or int(value) except: error("Config: value of option %s must have suffix m, k, or nothing, not '%s'" % (option, value)) return ## allow yes/no, true/false, on/off and 1/0 for boolean options elif type(getattr(Config, option)) is type(True): # bool if str(value).lower() in ("true", "yes", "on", "1"): value = True elif str(value).lower() in ("false", "no", "off", "0"): value = False else: error("Config: value of option '%s' must be Yes or No, not '%s'" % (option, value)) return elif type(getattr(Config, option)) is type(42): # int try: value = int(value) except ValueError: error("Config: value of option '%s' must be an integer, not '%s'" % (option, value)) return setattr(Config, option, value) class ConfigParser(object): def __init__(self, file, sections = []): self.cfg = {} self.parse_file(file, sections) def parse_file(self, file, sections = []): debug("ConfigParser: Reading file '%s'" % file) if type(sections) != type([]): sections = [sections] in_our_section = True f = open(file, "r") r_comment = re.compile("^\s*#.*") r_empty = re.compile("^\s*$") r_section = re.compile("^\[([^\]]+)\]") r_data = re.compile("^\s*(?P\w+)\s*=\s*(?P.*)") r_quotes = re.compile("^\"(.*)\"\s*$") for line in f: if r_comment.match(line) or r_empty.match(line): continue is_section = r_section.match(line) if is_section: section = is_section.groups()[0] in_our_section = (section in sections) or (len(sections) == 0) continue is_data = r_data.match(line) if is_data and in_our_section: data = is_data.groupdict() if r_quotes.match(data["value"]): data["value"] = data["value"][1:-1] self.__setitem__(data["key"], data["value"]) if data["key"] in ("access_key", "secret_key", "gpg_passphrase"): print_value = ("%s...%d_chars...%s") % (data["value"][:2], len(data["value"]) - 3, data["value"][-1:]) else: print_value = data["value"] debug("ConfigParser: %s->%s" % (data["key"], print_value)) continue warning("Ignoring invalid line in '%s': %s" % (file, line)) def __getitem__(self, name): return self.cfg[name] def __setitem__(self, name, value): self.cfg[name] = value def get(self, name, default = None): if self.cfg.has_key(name): return self.cfg[name] return default class ConfigDumper(object): def __init__(self, stream): self.stream = stream def dump(self, section, config): self.stream.write("[%s]\n" % section) for option in config.option_list(): value = getattr(config, option) if option == 
"verbosity": # we turn level numbers back into strings if possible if isinstance(value,int) and value in logging._levelNames: value = logging._levelNames[value] self.stream.write("%s = %s\n" % (option, value)) # vim:et:ts=4:sts=4:ai s3cmd-1.6.1/S3/PkgInfo.py0000664000175000017500000000123112647746170016047 0ustar mdomschmdomsch00000000000000# -*- coding: utf-8 -*- ## Amazon S3 manager ## Author: Michal Ludvig ## http://www.logix.cz/michal ## License: GPL Version 2 ## Copyright: TGRMN Software and contributors package = "s3cmd" version = "1.6.1" url = "http://s3tools.org" license = "GNU GPL v2+" short_description = "Command line tool for managing Amazon S3 and CloudFront services" long_description = """ S3cmd lets you copy files from/to Amazon S3 (Simple Storage Service) using a simple to use command line client. Supports rsync-like backup, GPG encryption, and more. Also supports management of Amazon's CloudFront content delivery network. """ # vim:et:ts=4:sts=4:ai s3cmd-1.6.1/S3/MultiPart.py0000664000175000017500000002162412647745544016447 0ustar mdomschmdomsch00000000000000# -*- coding: utf-8 -*- ## Amazon S3 Multipart upload support ## Author: Jerome Leclanche ## License: GPL Version 2 import os import sys from stat import ST_SIZE from logging import debug, info, warning, error from Utils import getTextFromXml, getTreeFromXml, formatSize, unicodise, deunicodise, calculateChecksum, parseNodes, encode_to_s3 class MultiPartUpload(object): MIN_CHUNK_SIZE_MB = 5 # 5MB MAX_CHUNK_SIZE_MB = 5120 # 5GB MAX_FILE_SIZE = 42949672960 # 5TB def __init__(self, s3, file, uri, headers_baseline = {}): self.s3 = s3 self.file = file self.uri = uri self.parts = {} self.headers_baseline = headers_baseline self.upload_id = self.initiate_multipart_upload() def get_parts_information(self, uri, upload_id): multipart_response = self.s3.list_multipart(uri, upload_id) tree = getTreeFromXml(multipart_response['data']) parts = dict() for elem in parseNodes(tree): try: parts[int(elem['PartNumber'])] = {'checksum': elem['ETag'], 'size': elem['Size']} except KeyError: pass return parts def get_unique_upload_id(self, uri): upload_id = None multipart_response = self.s3.get_multipart(uri) tree = getTreeFromXml(multipart_response['data']) for mpupload in parseNodes(tree): try: mp_upload_id = mpupload['UploadId'] mp_path = mpupload['Key'] info("mp_path: %s, object: %s" % (mp_path, uri.object())) if mp_path == uri.object(): if upload_id is not None: raise ValueError("More than one UploadId for URI %s. Disable multipart upload, or use\n %s multipart %s\nto list the Ids, then pass a unique --upload-id into the put command." 
% (uri, sys.argv[0], uri)) upload_id = mp_upload_id except KeyError: pass return upload_id def initiate_multipart_upload(self): """ Begin a multipart upload http://docs.amazonwebservices.com/AmazonS3/latest/API/index.html?mpUploadInitiate.html """ if self.s3.config.upload_id is not None: self.upload_id = self.s3.config.upload_id elif self.s3.config.put_continue: self.upload_id = self.get_unique_upload_id(self.uri) else: self.upload_id = None if self.upload_id is None: request = self.s3.create_request("OBJECT_POST", uri = self.uri, headers = self.headers_baseline, extra = "?uploads") response = self.s3.send_request(request) data = response["data"] self.upload_id = getTextFromXml(data, "UploadId") return self.upload_id def upload_all_parts(self, extra_label=''): """ Execute a full multipart upload on a file Returns the seq/etag dict TODO use num_processes to thread it """ if not self.upload_id: raise RuntimeError("Attempting to use a multipart upload that has not been initiated.") self.chunk_size = self.s3.config.multipart_chunk_size_mb * 1024 * 1024 filename = unicodise(self.file.name) if filename != "": size_left = file_size = os.stat(deunicodise(filename))[ST_SIZE] nr_parts = file_size / self.chunk_size + (file_size % self.chunk_size and 1) debug("MultiPart: Uploading %s in %d parts" % (filename, nr_parts)) else: debug("MultiPart: Uploading from %s" % filename) remote_statuses = dict() if self.s3.config.put_continue: remote_statuses = self.get_parts_information(self.uri, self.upload_id) if extra_label: extra_label = u' ' + extra_label seq = 1 if filename != "": while size_left > 0: offset = self.chunk_size * (seq - 1) current_chunk_size = min(file_size - offset, self.chunk_size) size_left -= current_chunk_size labels = { 'source' : filename, 'destination' : self.uri.uri(), 'extra' : "[part %d of %d, %s]%s" % (seq, nr_parts, "%d%sB" % formatSize(current_chunk_size, human_readable = True), extra_label) } try: self.upload_part(seq, offset, current_chunk_size, labels, remote_status = remote_statuses.get(seq)) except: error(u"\nUpload of '%s' part %d failed. Use\n %s abortmp %s %s\nto abort the upload, or\n %s --upload-id %s put ...\nto continue the upload." % (filename, seq, sys.argv[0], self.uri, self.upload_id, sys.argv[0], self.upload_id)) raise seq += 1 else: while True: buffer = self.file.read(self.chunk_size) offset = 0 # send from start of the buffer current_chunk_size = len(buffer) labels = { 'source' : filename, 'destination' : self.uri.uri(), 'extra' : "[part %d, %s]" % (seq, "%d%sB" % formatSize(current_chunk_size, human_readable = True)) } if len(buffer) == 0: # EOF break try: self.upload_part(seq, offset, current_chunk_size, labels, buffer, remote_status = remote_statuses.get(seq)) except: error(u"\nUpload of '%s' part %d failed. Use\n %s abortmp %s %s\nto abort, or\n %s --upload-id %s put ...\nto continue the upload." 
                          % (filename, seq, sys.argv[0], self.uri, self.upload_id, sys.argv[0], self.upload_id))
                    raise
                seq += 1

        debug("MultiPart: Upload finished: %d parts", seq - 1)

    def upload_part(self, seq, offset, chunk_size, labels, buffer = '', remote_status = None):
        """
        Upload a file chunk
        http://docs.amazonwebservices.com/AmazonS3/latest/API/index.html?mpUploadUploadPart.html
        """
        # TODO implement Content-MD5
        debug("Uploading part %i of %r (%s bytes)" % (seq, self.upload_id, chunk_size))

        if remote_status is not None:
            if int(remote_status['size']) == chunk_size:
                checksum = calculateChecksum(buffer, self.file, offset, chunk_size, self.s3.config.send_chunk)
                remote_checksum = remote_status['checksum'].strip('"\'')
                if remote_checksum == checksum:
                    warning("MultiPart: size and md5sum match for %s part %d, skipping." % (self.uri, seq))
                    self.parts[seq] = remote_status['checksum']
                    return
                else:
                    warning("MultiPart: checksum (%s vs %s) does not match for %s part %d, reuploading." % (remote_checksum, checksum, self.uri, seq))
            else:
                warning("MultiPart: size (%d vs %d) does not match for %s part %d, reuploading." % (int(remote_status['size']), chunk_size, self.uri, seq))

        headers = { "content-length": str(chunk_size) }
        query_string = "?partNumber=%i&uploadId=%s" % (seq, encode_to_s3(self.upload_id))
        request = self.s3.create_request("OBJECT_PUT", uri = self.uri, headers = headers, extra = query_string)
        response = self.s3.send_file(request, self.file, labels, buffer, offset = offset, chunk_size = chunk_size)
        self.parts[seq] = response["headers"]["etag"]
        return response

    def complete_multipart_upload(self):
        """
        Finish a multipart upload
        http://docs.amazonwebservices.com/AmazonS3/latest/API/index.html?mpUploadComplete.html
        """
        debug("MultiPart: Completing upload: %s" % self.upload_id)

        parts_xml = []
        part_xml = "<Part><PartNumber>%i</PartNumber><ETag>%s</ETag></Part>"
        for seq, etag in self.parts.items():
            parts_xml.append(part_xml % (seq, etag))
        body = "<CompleteMultipartUpload>%s</CompleteMultipartUpload>" % ("".join(parts_xml))

        headers = { "content-length": str(len(body)) }
        request = self.s3.create_request("OBJECT_POST", uri = self.uri, headers = headers, extra = "?uploadId=%s" % encode_to_s3(self.upload_id), body = body)
        response = self.s3.send_request(request)
        return response

    def abort_upload(self):
        """
        Abort multipart upload
        http://docs.amazonwebservices.com/AmazonS3/latest/API/index.html?mpUploadAbort.html
        """
        debug("MultiPart: Aborting upload: %s" % self.upload_id)
        #request = self.s3.create_request("OBJECT_DELETE", uri = self.uri, extra = "?uploadId=%s" % (self.upload_id))
        #response = self.s3.send_request(request)
        response = None
        return response

# vim:et:ts=4:sts=4:ai
s3cmd-1.6.1/S3/HashCache.py0000664000175000017500000000346412647745544016337 0ustar mdomschmdomsch00000000000000# -*- coding: utf-8 -*-
import cPickle as pickle
from Utils import deunicodise

class HashCache(object):
    def __init__(self):
        self.inodes = dict()

    def add(self, dev, inode, mtime, size, md5):
        if dev == 0 or inode == 0:
            return # Windows
        if dev not in self.inodes:
            self.inodes[dev] = dict()
        if inode not in self.inodes[dev]:
            self.inodes[dev][inode] = dict()
        self.inodes[dev][inode][mtime] = dict(md5=md5, size=size)

    def md5(self, dev, inode, mtime, size):
        try:
            d = self.inodes[dev][inode][mtime]
            if d['size'] != size:
                return None
        except:
            return None
        return d['md5']

    def mark_all_for_purge(self):
        for d in self.inodes.keys():
            for i in self.inodes[d].keys():
                for c in self.inodes[d][i].keys():
                    self.inodes[d][i][c]['purge'] = True

    def unmark_for_purge(self, dev, inode, mtime, size):
        try:
            d = self.inodes[dev][inode][mtime]
        except KeyError:
            return
        if d['size'] == size
and 'purge' in d: del self.inodes[dev][inode][mtime]['purge'] def purge(self): for d in self.inodes.keys(): for i in self.inodes[d].keys(): for m in self.inodes[d][i].keys(): if 'purge' in self.inodes[d][i][m]: del self.inodes[d][i] break def save(self, f): d = dict(inodes=self.inodes, version=1) f = open(deunicodise(f), 'w') pickle.dump(d, f) f.close() def load(self, f): f = open(deunicodise(f), 'r') d = pickle.load(f) f.close() if d.get('version') == 1 and 'inodes' in d: self.inodes = d['inodes'] s3cmd-1.6.1/S3/Utils.py0000664000175000017500000004120712647745544015625 0ustar mdomschmdomsch00000000000000# -*- coding: utf-8 -*- ## Amazon S3 manager ## Author: Michal Ludvig ## http://www.logix.cz/michal ## License: GPL Version 2 ## Copyright: TGRMN Software and contributors import os import sys import time import re import string import random import errno from calendar import timegm from logging import debug, warning, error from ExitCodes import EX_OSFILE try: import dateutil.parser except ImportError: sys.stderr.write(u""" !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! ImportError trying to import dateutil.parser. Please install the python dateutil module: $ sudo apt-get install python-dateutil or $ sudo yum install python-dateutil or $ pip install python-dateutil !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! """) sys.stderr.flush() sys.exit(EX_OSFILE) import Config import Exceptions # hashlib backported to python 2.4 / 2.5 is not compatible with hmac! if sys.version_info[0] == 2 and sys.version_info[1] < 6: from md5 import md5 else: from hashlib import md5 try: import xml.etree.ElementTree as ET except ImportError: # xml.etree.ElementTree was only added in python 2.5 import elementtree.ElementTree as ET __all__ = [] def parseNodes(nodes): ## WARNING: Ignores text nodes from mixed xml/text. ## For instance some textother text ## will be ignore "some text" node retval = [] for node in nodes: retval_item = {} for child in node.getchildren(): name = decode_from_s3(child.tag) if child.getchildren(): retval_item[name] = parseNodes([child]) else: found_text = node.findtext(".//%s" % child.tag) if found_text is not None: retval_item[name] = decode_from_s3(found_text) else: retval_item[name] = None retval.append(retval_item) return retval __all__.append("parseNodes") def stripNameSpace(xml): """ removeNameSpace(xml) -- remove top-level AWS namespace """ r = re.compile('^(]+?>\s*)(<\w+) xmlns=[\'"](http://[^\'"]+)[\'"](.*)', re.MULTILINE) if r.match(xml): xmlns = r.match(xml).groups()[2] xml = r.sub("\\1\\2\\4", xml) else: xmlns = None return xml, xmlns __all__.append("stripNameSpace") def getTreeFromXml(xml): xml, xmlns = stripNameSpace(xml) try: tree = ET.fromstring(xml) if xmlns: tree.attrib['xmlns'] = xmlns return tree except Exception, e: error("Error parsing xml: %s", e) error(xml) raise __all__.append("getTreeFromXml") def getListFromXml(xml, node): tree = getTreeFromXml(xml) nodes = tree.findall('.//%s' % (node)) return parseNodes(nodes) __all__.append("getListFromXml") def getDictFromTree(tree): ret_dict = {} for child in tree.getchildren(): if child.getchildren(): ## Complex-type child. 
Recurse content = getDictFromTree(child) else: content = decode_from_s3(child.text) if child.text is not None else None child_tag = decode_from_s3(child.tag) if ret_dict.has_key(child_tag): if not type(ret_dict[child_tag]) == list: ret_dict[child_tag] = [ret_dict[child_tag]] ret_dict[child_tag].append(content or "") else: ret_dict[child_tag] = content or "" return ret_dict __all__.append("getDictFromTree") def getTextFromXml(xml, xpath): tree = getTreeFromXml(xml) if tree.tag.endswith(xpath): return decode_from_s3(tree.text) if tree.text is not None else None else: result = tree.findtext(xpath) return decode_from_s3(result) if result is not None else None __all__.append("getTextFromXml") def getRootTagName(xml): tree = getTreeFromXml(xml) return decode_from_s3(tree.tag) if tree.tag is not None else None __all__.append("getRootTagName") def xmlTextNode(tag_name, text): el = ET.Element(tag_name) el.text = decode_from_s3(text) return el __all__.append("xmlTextNode") def appendXmlTextNode(tag_name, text, parent): """ Creates a new Node and sets its content to 'text'. Then appends the created Node to 'parent' element if given. Returns the newly created Node. """ el = xmlTextNode(tag_name, text) parent.append(el) return el __all__.append("appendXmlTextNode") def dateS3toPython(date): # Reset milliseconds to 000 date = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)').sub(".000", date) return dateutil.parser.parse(date, fuzzy=True) __all__.append("dateS3toPython") def dateS3toUnix(date): ## NOTE: This is timezone-aware and return the timestamp regarding GMT return timegm(dateS3toPython(date).utctimetuple()) __all__.append("dateS3toUnix") def dateRFC822toPython(date): return dateutil.parser.parse(date, fuzzy=True) __all__.append("dateRFC822toPython") def dateRFC822toUnix(date): return timegm(dateRFC822toPython(date).utctimetuple()) __all__.append("dateRFC822toUnix") def formatSize(size, human_readable = False, floating_point = False): size = floating_point and float(size) or int(size) if human_readable: coeffs = ['k', 'M', 'G', 'T'] coeff = "" while size > 2048: size /= 1024 coeff = coeffs.pop(0) return (size, coeff) else: return (size, "") __all__.append("formatSize") def formatDateTime(s3timestamp): date_obj = dateutil.parser.parse(s3timestamp, fuzzy=True) return date_obj.strftime("%Y-%m-%d %H:%M") __all__.append("formatDateTime") def convertTupleListToDict(list): retval = {} for tuple in list: retval[tuple[0]] = tuple[1] return retval __all__.append("convertTupleListToDict") _rnd_chars = string.ascii_letters+string.digits _rnd_chars_len = len(_rnd_chars) def rndstr(len): retval = "" while len > 0: retval += _rnd_chars[random.randint(0, _rnd_chars_len-1)] len -= 1 return retval __all__.append("rndstr") def mktmpsomething(prefix, randchars, createfunc): old_umask = os.umask(0077) tries = 5 while tries > 0: dirname = prefix + rndstr(randchars) try: createfunc(dirname) break except OSError, e: if e.errno != errno.EEXIST: os.umask(old_umask) raise tries -= 1 os.umask(old_umask) return dirname __all__.append("mktmpsomething") def mktmpdir(prefix = os.getenv('TMP','/tmp') + "/tmpdir-", randchars = 10): return mktmpsomething(prefix, randchars, os.mkdir) __all__.append("mktmpdir") def mktmpfile(prefix = os.getenv('TMP','/tmp') + "/tmpfile-", randchars = 20): createfunc = lambda filename : os.close(os.open(deunicodise(filename), os.O_CREAT | os.O_EXCL)) return mktmpsomething(prefix, randchars, createfunc) __all__.append("mktmpfile") def hash_file_md5(filename): h = md5() f = open(deunicodise(filename), "rb") while 
True: # Hash 32kB chunks data = f.read(32*1024) if not data: break h.update(data) f.close() return h.hexdigest() __all__.append("hash_file_md5") def mkdir_with_parents(dir_name): """ mkdir_with_parents(dst_dir) Create directory 'dir_name' with all parent directories Returns True on success, False otherwise. """ pathmembers = dir_name.split(os.sep) tmp_stack = [] while pathmembers and not os.path.isdir(deunicodise(os.sep.join(pathmembers))): tmp_stack.append(pathmembers.pop()) while tmp_stack: pathmembers.append(tmp_stack.pop()) cur_dir = os.sep.join(pathmembers) try: debug("mkdir(%s)" % cur_dir) os.mkdir(deunicodise(cur_dir)) except (OSError, IOError), e: debug("Can not make directory '%s' (Reason: %s)" % (cur_dir, e.strerror)) return False except Exception, e: debug("Can not make directory '%s' (Reason: %s)" % (cur_dir, e)) return False return True __all__.append("mkdir_with_parents") def unicodise(string, encoding = None, errors = "replace"): """ Convert 'string' to Unicode or raise an exception. """ if not encoding: encoding = Config.Config().encoding if type(string) == unicode: return string debug("Unicodising %r using %s" % (string, encoding)) try: return unicode(string, encoding, errors) except UnicodeDecodeError: raise UnicodeDecodeError("Conversion to unicode failed: %r" % string) __all__.append("unicodise") def deunicodise(string, encoding = None, errors = "replace"): """ Convert unicode 'string' to , by default replacing all invalid characters with '?' or raise an exception. """ if not encoding: encoding = Config.Config().encoding if type(string) != unicode: return str(string) debug("DeUnicodising %r using %s" % (string, encoding)) try: return string.encode(encoding, errors) except UnicodeEncodeError: raise UnicodeEncodeError("Conversion from unicode failed: %r" % string) __all__.append("deunicodise") def unicodise_safe(string, encoding = None): """ Convert 'string' to Unicode according to current encoding and replace all invalid characters with '?' """ return unicodise(deunicodise(string, encoding), encoding).replace(u'\ufffd', '?') __all__.append("unicodise_safe") def decode_from_s3(string, errors = "replace"): """ Convert S3 UTF-8 'string' to Unicode or raise an exception. """ if type(string) == unicode: return string # Be quiet by default #debug("Decoding string from S3: %r" % string) try: return unicode(string, "UTF-8", errors) except UnicodeDecodeError: raise UnicodeDecodeError("Conversion to unicode failed: %r" % string) __all__.append("decode_from_s3") def encode_to_s3(string, errors = "replace"): """ Convert Unicode to S3 UTF-8 'string', by default replacing all invalid characters with '?' or raise an exception. """ if type(string) != unicode: return str(string) # Be quiet by default #debug("Encoding string to S3: %r" % string) try: return string.encode("UTF-8", errors) except UnicodeEncodeError: raise UnicodeEncodeError("Conversion from unicode failed: %r" % string) __all__.append("encode_to_s3") def replace_nonprintables(string): """ replace_nonprintables(string) Replaces all non-printable characters 'ch' in 'string' where ord(ch) <= 26 with ^@, ^A, ... ^Z """ new_string = "" modified = 0 for c in string: o = ord(c) if (o <= 31): new_string += "^" + chr(ord('@') + o) modified += 1 elif (o == 127): new_string += "^?" 
modified += 1 else: new_string += c if modified and Config.Config().urlencoding_mode != "fixbucket": warning("%d non-printable characters replaced in: %s" % (modified, new_string)) return new_string __all__.append("replace_nonprintables") def time_to_epoch(t): """Convert time specified in a variety of forms into UNIX epoch time. Accepts datetime.datetime, int, anything that has a strftime() method, and standard time 9-tuples """ if isinstance(t, int): # Already an int return t elif isinstance(t, tuple) or isinstance(t, time.struct_time): # Assume it's a time 9-tuple return int(time.mktime(t)) elif hasattr(t, 'timetuple'): # Looks like a datetime object or compatible return int(time.mktime(t.timetuple())) elif hasattr(t, 'strftime'): # Looks like the object supports standard srftime() return int(t.strftime('%s')) elif isinstance(t, str) or isinstance(t, unicode): # See if it's a string representation of an epoch try: # Support relative times (eg. "+60") if t.startswith('+'): return time.time() + int(t[1:]) return int(t) except ValueError: # Try to parse it as a timestamp string try: return time.strptime(t) except ValueError, ex: # Will fall through debug("Failed to parse date with strptime: %s", ex) pass raise Exceptions.ParameterError('Unable to convert %r to an epoch time. Pass an epoch time. Try `date -d \'now + 1 year\' +%%s` (shell) or time.mktime (Python).' % t) def check_bucket_name(bucket, dns_strict = True): if dns_strict: invalid = re.search("([^a-z0-9\.-])", bucket, re.UNICODE) if invalid: raise Exceptions.ParameterError("Bucket name '%s' contains disallowed character '%s'. The only supported ones are: lowercase us-ascii letters (a-z), digits (0-9), dot (.) and hyphen (-)." % (bucket, invalid.groups()[0])) else: invalid = re.search("([^A-Za-z0-9\._-])", bucket, re.UNICODE) if invalid: raise Exceptions.ParameterError("Bucket name '%s' contains disallowed character '%s'. The only supported ones are: us-ascii letters (a-z, A-Z), digits (0-9), dot (.), hyphen (-) and underscore (_)." % (bucket, invalid.groups()[0])) if len(bucket) < 3: raise Exceptions.ParameterError("Bucket name '%s' is too short (min 3 characters)" % bucket) if len(bucket) > 255: raise Exceptions.ParameterError("Bucket name '%s' is too long (max 255 characters)" % bucket) if dns_strict: if len(bucket) > 63: raise Exceptions.ParameterError("Bucket name '%s' is too long (max 63 characters)" % bucket) if re.search("-\.", bucket, re.UNICODE): raise Exceptions.ParameterError("Bucket name '%s' must not contain sequence '-.' for DNS compatibility" % bucket) if re.search("\.\.", bucket, re.UNICODE): raise Exceptions.ParameterError("Bucket name '%s' must not contain sequence '..' 
for DNS compatibility" % bucket) if not re.search("^[0-9a-z]", bucket, re.UNICODE): raise Exceptions.ParameterError("Bucket name '%s' must start with a letter or a digit" % bucket) if not re.search("[0-9a-z]$", bucket, re.UNICODE): raise Exceptions.ParameterError("Bucket name '%s' must end with a letter or a digit" % bucket) return True __all__.append("check_bucket_name") def check_bucket_name_dns_conformity(bucket): try: return check_bucket_name(bucket, dns_strict = True) except Exceptions.ParameterError: return False __all__.append("check_bucket_name_dns_conformity") def check_bucket_name_dns_support(bucket_host, bucket_name): """ Check whether the host_bucket pattern supports dns-style buckets and whether the bucket name is dns compatible """ if "%(bucket)s" not in bucket_host: return False try: return check_bucket_name(bucket_name, dns_strict = True) except Exceptions.ParameterError: return False __all__.append("check_bucket_name_dns_support") def getBucketFromHostname(hostname): """ bucket, success = getBucketFromHostname(hostname) Only works for hostnames derived from bucket names using Config.host_bucket pattern. Returns bucket name and a boolean success flag. """ # Create RE pattern from Config.host_bucket pattern = Config.Config().host_bucket % { 'bucket' : '(?P<bucket>.*)' } m = re.match(pattern, hostname, re.UNICODE) if not m: return (hostname, False) return m.groups()[0], True __all__.append("getBucketFromHostname") def getHostnameFromBucket(bucket): return Config.Config().host_bucket % { 'bucket' : bucket } __all__.append("getHostnameFromBucket") def calculateChecksum(buffer, mfile, offset, chunk_size, send_chunk): md5_hash = md5() size_left = chunk_size if buffer == '': mfile.seek(offset) while size_left > 0: data = mfile.read(min(send_chunk, size_left)) md5_hash.update(data) size_left -= len(data) else: md5_hash.update(buffer) return md5_hash.hexdigest() __all__.append("calculateChecksum") # Deal with the fact that pwd and grp modules don't exist for Windows try: import pwd def getpwuid_username(uid): """returns a username from the password database for the given uid""" return pwd.getpwuid(uid).pw_name except ImportError: import getpass def getpwuid_username(uid): return getpass.getuser() __all__.append("getpwuid_username") try: import grp def getgrgid_grpname(gid): """returns a groupname from the group database for the given gid""" return grp.getgrgid(gid).gr_name except ImportError: def getgrgid_grpname(gid): return "nobody" __all__.append("getgrgid_grpname") # vim:et:ts=4:sts=4:ai s3cmd-1.6.1/S3/ExitCodes.py0000664000175000017500000000412512647745544016412 0ustar mdomschmdomsch00000000000000# -*- coding: utf-8 -*- # patterned on /usr/include/sysexits.h EX_OK = 0 EX_GENERAL = 1 EX_PARTIAL = 2 # some parts of the command succeeded, while others failed EX_SERVERMOVED = 10 # 301: Moved permanently & 307: Moved temporarily EX_SERVERERROR = 11 # 400, 405, 411, 416, 501: Bad request, 504: Gateway Time-out EX_NOTFOUND = 12 # 404: Not found EX_CONFLICT = 13 # 409: Conflict (ex: bucket error) EX_PRECONDITION = 14 # 412: Precondition failed EX_SERVICE = 15 # 503: Service not available or slow down EX_USAGE = 64 # The command was used incorrectly (e.g. bad command line syntax) EX_DATAERR = 65 # Failed file transfer, upload or download EX_SOFTWARE = 70 # internal software error (e.g. S3 error of unknown specificity) EX_OSERR = 71 # system error (e.g. out of memory) EX_OSFILE = 72 # OS error (e.g. invalid Python version) EX_IOERR = 74 # An error occurred while doing I/O on some file.
EX_TEMPFAIL = 75 # temporary failure (S3DownloadError or similar, retry later) EX_ACCESSDENIED = 77 # Insufficient permissions to perform the operation on S3 EX_CONFIG = 78 # Configuration file error _EX_SIGNAL = 128 _EX_SIGINT = 2 EX_BREAK = _EX_SIGNAL + _EX_SIGINT # Control-C (KeyboardInterrupt raised) class ExitScoreboard(object): """Helper to return best return code""" def __init__(self): self._success = 0 self._notfound = 0 self._failed = 0 def success(self): self._success += 1 def notfound(self): self._notfound += 1 def failed(self): self._failed += 1 def rc(self): if self._success: if not self._failed and not self._notfound: return EX_OK elif self._failed: return EX_PARTIAL else: if self._failed: return EX_GENERAL else: if self._notfound: return EX_NOTFOUND return EX_GENERAL s3cmd-1.6.1/S3/Progress.py0000664000175000017500000001762412647745544016337 0ustar mdomschmdomsch00000000000000# -*- coding: utf-8 -*- ## Amazon S3 manager ## Author: Michal Ludvig ## http://www.logix.cz/michal ## License: GPL Version 2 ## Copyright: TGRMN Software and contributors import sys import datetime import time import Utils class Progress(object): _stdout = sys.stdout _last_display = 0 def __init__(self, labels, total_size): self._stdout = sys.stdout self.new_file(labels, total_size) def new_file(self, labels, total_size): self.labels = labels self.total_size = total_size # Set initial_position to something in the # case we're not counting from 0. For instance # when appending to a partially downloaded file. # Setting initial_position will let the speed # be computed right. self.initial_position = 0 self.current_position = self.initial_position self.time_start = datetime.datetime.now() self.time_last = self.time_start self.time_current = self.time_start self.display(new_file = True) def update(self, current_position = -1, delta_position = -1): self.time_last = self.time_current self.time_current = datetime.datetime.now() if current_position > -1: self.current_position = current_position elif delta_position > -1: self.current_position += delta_position #else: # no update, just call display() self.display() def done(self, message): self.display(done_message = message) def output_labels(self): self._stdout.write(u"%(action)s: '%(source)s' -> '%(destination)s' %(extra)s\n" % self.labels) self._stdout.flush() def _display_needed(self): # We only need to update the display every so often. if time.time() - self._last_display > 1: self._last_display = time.time() return True return False def display(self, new_file = False, done_message = None): """ display(new_file = False[/True], done = False[/True]) Override this method to provide a nicer output. 
""" if new_file: self.output_labels() self.last_milestone = 0 return if self.current_position == self.total_size: print_size = Utils.formatSize(self.current_position, True) if print_size[1] != "": print_size[1] += "B" timedelta = self.time_current - self.time_start sec_elapsed = timedelta.days * 86400 + timedelta.seconds + float(timedelta.microseconds)/1000000.0 print_speed = Utils.formatSize((self.current_position - self.initial_position) / sec_elapsed, True, True) self._stdout.write("100%% %s%s in %.2fs (%.2f %sB/s)\n" % (print_size[0], print_size[1], sec_elapsed, print_speed[0], print_speed[1])) self._stdout.flush() return rel_position = self.current_position * 100 / self.total_size if rel_position >= self.last_milestone: self.last_milestone = (int(rel_position) / 5) * 5 self._stdout.write("%d%% ", self.last_milestone) self._stdout.flush() return class ProgressANSI(Progress): ## http://en.wikipedia.org/wiki/ANSI_escape_code SCI = '\x1b[' ANSI_hide_cursor = SCI + "?25l" ANSI_show_cursor = SCI + "?25h" ANSI_save_cursor_pos = SCI + "s" ANSI_restore_cursor_pos = SCI + "u" ANSI_move_cursor_to_column = SCI + "%uG" ANSI_erase_to_eol = SCI + "0K" ANSI_erase_current_line = SCI + "2K" def display(self, new_file = False, done_message = None): """ display(new_file = False[/True], done_message = None) """ if new_file: self.output_labels() self._stdout.write(self.ANSI_save_cursor_pos) self._stdout.flush() return # Only display progress every so often if not (new_file or done_message) and not self._display_needed(): return timedelta = self.time_current - self.time_start sec_elapsed = timedelta.days * 86400 + timedelta.seconds + float(timedelta.microseconds)/1000000.0 if (sec_elapsed > 0): print_speed = Utils.formatSize((self.current_position - self.initial_position) / sec_elapsed, True, True) else: print_speed = (0, "") self._stdout.write(self.ANSI_restore_cursor_pos) self._stdout.write(self.ANSI_erase_to_eol) self._stdout.write("%(current)s of %(total)s %(percent)3d%% in %(elapsed)ds %(speed).2f %(speed_coeff)sB/s" % { "current" : str(self.current_position).rjust(len(str(self.total_size))), "total" : self.total_size, "percent" : self.total_size and (self.current_position * 100 / self.total_size) or 0, "elapsed" : sec_elapsed, "speed" : print_speed[0], "speed_coeff" : print_speed[1] }) if done_message: self._stdout.write(" %s\n" % done_message) self._stdout.flush() class ProgressCR(Progress): ## Uses CR char (Carriage Return) just like other progress bars do. 
CR_char = chr(13) def display(self, new_file = False, done_message = None): """ display(new_file = False[/True], done_message = None) """ if new_file: self.output_labels() return # Only display progress every so often if not (new_file or done_message) and not self._display_needed(): return timedelta = self.time_current - self.time_start sec_elapsed = timedelta.days * 86400 + timedelta.seconds + float(timedelta.microseconds)/1000000.0 if (sec_elapsed > 0): print_speed = Utils.formatSize((self.current_position - self.initial_position) / sec_elapsed, True, True) else: print_speed = (0, "") self._stdout.write(self.CR_char) output = " %(current)s of %(total)s %(percent)3d%% in %(elapsed)4ds %(speed)7.2f %(speed_coeff)sB/s" % { "current" : str(self.current_position).rjust(len(str(self.total_size))), "total" : self.total_size, "percent" : self.total_size and (self.current_position * 100 / self.total_size) or 0, "elapsed" : sec_elapsed, "speed" : print_speed[0], "speed_coeff" : print_speed[1] } self._stdout.write(output) if done_message: self._stdout.write(" %s\n" % done_message) self._stdout.flush() class StatsInfo(object): """Holding info for stats totals""" def __init__(self): self.files = None self.size = None self.files_transferred = None self.size_transferred = None self.files_copied = None self.size_copied = None self.files_deleted = None self.size_deleted = None def format_output(self): outstr = u"" if self.files is not None: tmp_str = u"Number of files: %d"% self.files if self.size is not None: tmp_str += " (%d bytes) "% self.size outstr += u"\nStats: " + tmp_str if self.files_transferred: tmp_str = u"Number of files transferred: %d"% self.files_transferred if self.size_transferred is not None: tmp_str += " (%d bytes) "% self.size_transferred outstr += u"\nStats: " + tmp_str if self.files_copied: tmp_str = u"Number of files copied: %d"% self.files_copied if self.size_copied is not None: tmp_str += " (%d bytes) "% self.size_copied outstr += u"\nStats: " + tmp_str if self.files_deleted: tmp_str = u"Number of files deleted: %d"% self.files_deleted if self.size_deleted is not None: tmp_str += " (%d bytes) "% self.size_deleted outstr += u"\nStats: " + tmp_str return outstr # vim:et:ts=4:sts=4:ai s3cmd-1.6.1/S3/CloudFront.py0000664000175000017500000007777612647745544016630 0ustar mdomschmdomsch00000000000000# -*- coding: utf-8 -*- ## Amazon CloudFront support ## Author: Michal Ludvig ## http://www.logix.cz/michal ## License: GPL Version 2 ## Copyright: TGRMN Software and contributors import sys import time import random from datetime import datetime from logging import debug, info, warning, error try: import xml.etree.ElementTree as ET except ImportError: import elementtree.ElementTree as ET from S3 import S3 from Config import Config from Exceptions import * from Utils import getTreeFromXml, appendXmlTextNode, getDictFromTree, dateS3toPython, getBucketFromHostname, getHostnameFromBucket, deunicodise from Crypto import sign_string_v2 from S3Uri import S3Uri, S3UriS3 from ConnMan import ConnMan cloudfront_api_version = "2010-11-01" cloudfront_resource = "/%(api_ver)s/distribution" % { 'api_ver' : cloudfront_api_version } def output(message): sys.stdout.write(message + "\n") def pretty_output(label, message): #label = ("%s " % label).ljust(20, ".") label = ("%s:" % label).ljust(15) output("%s %s" % (label, message)) class DistributionSummary(object): ## Example: ## ## ## 1234567890ABC ## Deployed ## 2009-01-16T11:49:02.189Z ## blahblahblah.cloudfront.net ## ## example.bucket.s3.amazonaws.com ## 
## cdn.example.com ## img.example.com ## What Ever ## true ## def __init__(self, tree): if tree.tag != "DistributionSummary": raise ValueError("Expected xml, got: <%s />" % tree.tag) self.parse(tree) def parse(self, tree): self.info = getDictFromTree(tree) self.info['Enabled'] = (self.info['Enabled'].lower() == "true") if self.info.has_key("CNAME") and type(self.info['CNAME']) != list: self.info['CNAME'] = [self.info['CNAME']] def uri(self): return S3Uri(u"cf://%s" % self.info['Id']) class DistributionList(object): ## Example: ## ## ## ## 100 ## false ## ## ... handled by DistributionSummary() class ... ## ## def __init__(self, xml): tree = getTreeFromXml(xml) if tree.tag != "DistributionList": raise ValueError("Expected xml, got: <%s />" % tree.tag) self.parse(tree) def parse(self, tree): self.info = getDictFromTree(tree) ## Normalise some items self.info['IsTruncated'] = (self.info['IsTruncated'].lower() == "true") self.dist_summs = [] for dist_summ in tree.findall(".//DistributionSummary"): self.dist_summs.append(DistributionSummary(dist_summ)) class Distribution(object): ## Example: ## ## ## 1234567890ABC ## InProgress ## 2009-01-16T13:07:11.319Z ## blahblahblah.cloudfront.net ## ## ... handled by DistributionConfig() class ... ## ## def __init__(self, xml): tree = getTreeFromXml(xml) if tree.tag != "Distribution": raise ValueError("Expected xml, got: <%s />" % tree.tag) self.parse(tree) def parse(self, tree): self.info = getDictFromTree(tree) ## Normalise some items self.info['LastModifiedTime'] = dateS3toPython(self.info['LastModifiedTime']) self.info['DistributionConfig'] = DistributionConfig(tree = tree.find(".//DistributionConfig")) def uri(self): return S3Uri(u"cf://%s" % self.info['Id']) class DistributionConfig(object): ## Example: ## ## ## somebucket.s3.amazonaws.com ## s3://somebucket/ ## http://somebucket.s3.amazonaws.com/ ## true ## ## bu.ck.et ## /cf-somebucket/ ## ## EMPTY_CONFIG = "true" xmlns = "http://cloudfront.amazonaws.com/doc/%(api_ver)s/" % { 'api_ver' : cloudfront_api_version } def __init__(self, xml = None, tree = None): if xml is None: xml = DistributionConfig.EMPTY_CONFIG if tree is None: tree = getTreeFromXml(xml) if tree.tag != "DistributionConfig": raise ValueError("Expected xml, got: <%s />" % tree.tag) self.parse(tree) def parse(self, tree): self.info = getDictFromTree(tree) self.info['Enabled'] = (self.info['Enabled'].lower() == "true") if not self.info.has_key("CNAME"): self.info['CNAME'] = [] if type(self.info['CNAME']) != list: self.info['CNAME'] = [self.info['CNAME']] self.info['CNAME'] = [cname.lower() for cname in self.info['CNAME']] if not self.info.has_key("Comment"): self.info['Comment'] = "" if not self.info.has_key("DefaultRootObject"): self.info['DefaultRootObject'] = "" ## Figure out logging - complex node not parsed by getDictFromTree() logging_nodes = tree.findall(".//Logging") if logging_nodes: logging_dict = getDictFromTree(logging_nodes[0]) logging_dict['Bucket'], success = getBucketFromHostname(logging_dict['Bucket']) if not success: warning("Logging to unparsable bucket name: %s" % logging_dict['Bucket']) self.info['Logging'] = S3UriS3(u"s3://%(Bucket)s/%(Prefix)s" % logging_dict) else: self.info['Logging'] = None def __str__(self): tree = ET.Element("DistributionConfig") tree.attrib['xmlns'] = DistributionConfig.xmlns ## Retain the order of the following calls! 
s3org = appendXmlTextNode("S3Origin", '', tree) appendXmlTextNode("DNSName", self.info['S3Origin']['DNSName'], s3org) appendXmlTextNode("CallerReference", self.info['CallerReference'], tree) for cname in self.info['CNAME']: appendXmlTextNode("CNAME", cname.lower(), tree) if self.info['Comment']: appendXmlTextNode("Comment", self.info['Comment'], tree) appendXmlTextNode("Enabled", str(self.info['Enabled']).lower(), tree) # don't create a empty DefaultRootObject element as it would result in a MalformedXML error if str(self.info['DefaultRootObject']): appendXmlTextNode("DefaultRootObject", str(self.info['DefaultRootObject']), tree) if self.info['Logging']: logging_el = ET.Element("Logging") appendXmlTextNode("Bucket", getHostnameFromBucket(self.info['Logging'].bucket()), logging_el) appendXmlTextNode("Prefix", self.info['Logging'].object(), logging_el) tree.append(logging_el) return ET.tostring(tree) class Invalidation(object): ## Example: ## ## ## id ## status ## date ## ## /image1.jpg ## /image2.jpg ## /videos/movie.flv ## my-batch ## ## def __init__(self, xml): tree = getTreeFromXml(xml) if tree.tag != "Invalidation": raise ValueError("Expected xml, got: <%s />" % tree.tag) self.parse(tree) def parse(self, tree): self.info = getDictFromTree(tree) def __str__(self): return str(self.info) class InvalidationList(object): ## Example: ## ## ## ## Invalidation ID ## 2 ## true ## ## [Second Invalidation ID] ## Completed ## ## ## [First Invalidation ID] ## Completed ## ## def __init__(self, xml): tree = getTreeFromXml(xml) if tree.tag != "InvalidationList": raise ValueError("Expected xml, got: <%s />" % tree.tag) self.parse(tree) def parse(self, tree): self.info = getDictFromTree(tree) def __str__(self): return str(self.info) class InvalidationBatch(object): ## Example: ## ## ## /image1.jpg ## /image2.jpg ## /videos/movie.flv ## /sound%20track.mp3 ## my-batch ## def __init__(self, reference = None, distribution = None, paths = []): if reference: self.reference = reference else: if not distribution: distribution="0" self.reference = "%s.%s.%s" % (distribution, datetime.strftime(datetime.now(),"%Y%m%d%H%M%S"), random.randint(1000,9999)) self.paths = [] self.add_objects(paths) def add_objects(self, paths): self.paths.extend(paths) def get_reference(self): return self.reference def __str__(self): tree = ET.Element("InvalidationBatch") s3 = S3(Config()) for path in self.paths: if len(path) < 1 or path[0] != "/": path = "/" + path appendXmlTextNode("Path", s3.urlencode_string(path), tree) appendXmlTextNode("CallerReference", self.reference, tree) return ET.tostring(tree) class CloudFront(object): operations = { "CreateDist" : { 'method' : "POST", 'resource' : "" }, "DeleteDist" : { 'method' : "DELETE", 'resource' : "/%(dist_id)s" }, "GetList" : { 'method' : "GET", 'resource' : "" }, "GetDistInfo" : { 'method' : "GET", 'resource' : "/%(dist_id)s" }, "GetDistConfig" : { 'method' : "GET", 'resource' : "/%(dist_id)s/config" }, "SetDistConfig" : { 'method' : "PUT", 'resource' : "/%(dist_id)s/config" }, "Invalidate" : { 'method' : "POST", 'resource' : "/%(dist_id)s/invalidation" }, "GetInvalList" : { 'method' : "GET", 'resource' : "/%(dist_id)s/invalidation" }, "GetInvalInfo" : { 'method' : "GET", 'resource' : "/%(dist_id)s/invalidation/%(request_id)s" }, } ## Maximum attempts of re-issuing failed requests _max_retries = 5 dist_list = None def __init__(self, config): self.config = config ## -------------------------------------------------- ## Methods implementing CloudFront API ## 
-------------------------------------------------- def GetList(self): response = self.send_request("GetList") response['dist_list'] = DistributionList(response['data']) if response['dist_list'].info['IsTruncated']: raise NotImplementedError("List is truncated. Ask s3cmd author to add support.") ## TODO: handle Truncated return response def CreateDistribution(self, uri, cnames_add = [], comment = None, logging = None, default_root_object = None): dist_config = DistributionConfig() dist_config.info['Enabled'] = True dist_config.info['S3Origin']['DNSName'] = uri.host_name() dist_config.info['CallerReference'] = str(uri) dist_config.info['DefaultRootObject'] = default_root_object if comment == None: dist_config.info['Comment'] = uri.public_url() else: dist_config.info['Comment'] = comment for cname in cnames_add: if dist_config.info['CNAME'].count(cname) == 0: dist_config.info['CNAME'].append(cname) if logging: dist_config.info['Logging'] = S3UriS3(logging) request_body = str(dist_config) debug("CreateDistribution(): request_body: %s" % request_body) response = self.send_request("CreateDist", body = request_body) response['distribution'] = Distribution(response['data']) return response def ModifyDistribution(self, cfuri, cnames_add = [], cnames_remove = [], comment = None, enabled = None, logging = None, default_root_object = None): if cfuri.type != "cf": raise ValueError("Expected CFUri instead of: %s" % cfuri) # Get current dist status (enabled/disabled) and Etag info("Checking current status of %s" % cfuri) response = self.GetDistConfig(cfuri) dc = response['dist_config'] if enabled != None: dc.info['Enabled'] = enabled if comment != None: dc.info['Comment'] = comment if default_root_object != None: dc.info['DefaultRootObject'] = default_root_object for cname in cnames_add: if dc.info['CNAME'].count(cname) == 0: dc.info['CNAME'].append(cname) for cname in cnames_remove: while dc.info['CNAME'].count(cname) > 0: dc.info['CNAME'].remove(cname) if logging != None: if logging == False: dc.info['Logging'] = False else: dc.info['Logging'] = S3UriS3(logging) response = self.SetDistConfig(cfuri, dc, response['headers']['etag']) return response def DeleteDistribution(self, cfuri): if cfuri.type != "cf": raise ValueError("Expected CFUri instead of: %s" % cfuri) # Get current dist status (enabled/disabled) and Etag info("Checking current status of %s" % cfuri) response = self.GetDistConfig(cfuri) if response['dist_config'].info['Enabled']: info("Distribution is ENABLED. 
Disabling first.") response['dist_config'].info['Enabled'] = False response = self.SetDistConfig(cfuri, response['dist_config'], response['headers']['etag']) warning("Waiting for Distribution to become disabled.") warning("This may take several minutes, please wait.") while True: response = self.GetDistInfo(cfuri) d = response['distribution'] if d.info['Status'] == "Deployed" and d.info['Enabled'] == False: info("Distribution is now disabled") break warning("Still waiting...") time.sleep(10) headers = {} headers['if-match'] = response['headers']['etag'] response = self.send_request("DeleteDist", dist_id = cfuri.dist_id(), headers = headers) return response def GetDistInfo(self, cfuri): if cfuri.type != "cf": raise ValueError("Expected CFUri instead of: %s" % cfuri) response = self.send_request("GetDistInfo", dist_id = cfuri.dist_id()) response['distribution'] = Distribution(response['data']) return response def GetDistConfig(self, cfuri): if cfuri.type != "cf": raise ValueError("Expected CFUri instead of: %s" % cfuri) response = self.send_request("GetDistConfig", dist_id = cfuri.dist_id()) response['dist_config'] = DistributionConfig(response['data']) return response def SetDistConfig(self, cfuri, dist_config, etag = None): if etag == None: debug("SetDistConfig(): Etag not set. Fetching it first.") etag = self.GetDistConfig(cfuri)['headers']['etag'] debug("SetDistConfig(): Etag = %s" % etag) request_body = str(dist_config) debug("SetDistConfig(): request_body: %s" % request_body) headers = {} headers['if-match'] = etag response = self.send_request("SetDistConfig", dist_id = cfuri.dist_id(), body = request_body, headers = headers) return response def InvalidateObjects(self, uri, paths, default_index_file, invalidate_default_index_on_cf, invalidate_default_index_root_on_cf): # joseprio: if the user doesn't want to invalidate the default index # path, or if the user wants to invalidate the root of the default # index, we need to process those paths if default_index_file is not None and (not invalidate_default_index_on_cf or invalidate_default_index_root_on_cf): new_paths = [] default_index_suffix = '/' + default_index_file for path in paths: if path.endswith(default_index_suffix) or path == default_index_file: if invalidate_default_index_on_cf: new_paths.append(path) if invalidate_default_index_root_on_cf: new_paths.append(path[:-len(default_index_file)]) else: new_paths.append(path) paths = new_paths # uri could be either cf:// or s3:// uri cfuri = self.get_dist_name_for_bucket(uri) if len(paths) > 999: try: tmp_filename = Utils.mktmpfile() f = open(deunicodise(tmp_filename), "w") f.write(deunicodise("\n".join(paths)+"\n")) f.close() warning("Request to invalidate %d paths (max 999 supported)" % len(paths)) warning("All the paths are now saved in: %s" % tmp_filename) except: pass raise ParameterError("Too many paths to invalidate") invalbatch = InvalidationBatch(distribution = cfuri.dist_id(), paths = paths) debug("InvalidateObjects(): request_body: %s" % invalbatch) response = self.send_request("Invalidate", dist_id = cfuri.dist_id(), body = str(invalbatch)) response['dist_id'] = cfuri.dist_id() if response['status'] == 201: inval_info = Invalidation(response['data']).info response['request_id'] = inval_info['Id'] debug("InvalidateObjects(): response: %s" % response) return response def GetInvalList(self, cfuri): if cfuri.type != "cf": raise ValueError("Expected CFUri instead of: %s" % cfuri) response = self.send_request("GetInvalList", dist_id = cfuri.dist_id()) response['inval_list'] = 
InvalidationList(response['data']) return response def GetInvalInfo(self, cfuri): if cfuri.type != "cf": raise ValueError("Expected CFUri instead of: %s" % cfuri) if cfuri.request_id() is None: raise ValueError("Expected CFUri with Request ID") response = self.send_request("GetInvalInfo", dist_id = cfuri.dist_id(), request_id = cfuri.request_id()) response['inval_status'] = Invalidation(response['data']) return response ## -------------------------------------------------- ## Low-level methods for handling CloudFront requests ## -------------------------------------------------- def send_request(self, op_name, dist_id = None, request_id = None, body = None, headers = {}, retries = _max_retries): operation = self.operations[op_name] if body: headers['content-type'] = 'text/plain' request = self.create_request(operation, dist_id, request_id, headers) conn = self.get_connection() debug("send_request(): %s %s" % (request['method'], request['resource'])) conn.c.request(request['method'], request['resource'], body, request['headers']) http_response = conn.c.getresponse() response = {} response["status"] = http_response.status response["reason"] = http_response.reason response["headers"] = dict(http_response.getheaders()) response["data"] = http_response.read() ConnMan.put(conn) debug("CloudFront: response: %r" % response) if response["status"] >= 500: e = CloudFrontError(response) if retries: warning(u"Retrying failed request: %s" % op_name) warning(unicode(e)) warning("Waiting %d sec..." % self._fail_wait(retries)) time.sleep(self._fail_wait(retries)) return self.send_request(op_name, dist_id, body, retries = retries - 1) else: raise e if response["status"] < 200 or response["status"] > 299: raise CloudFrontError(response) return response def create_request(self, operation, dist_id = None, request_id = None, headers = None): resource = cloudfront_resource + ( operation['resource'] % { 'dist_id' : dist_id, 'request_id' : request_id }) if not headers: headers = {} if headers.has_key("date"): if not headers.has_key("x-amz-date"): headers["x-amz-date"] = headers["date"] del(headers["date"]) if not headers.has_key("x-amz-date"): headers["x-amz-date"] = time.strftime("%a, %d %b %Y %H:%M:%S +0000", time.gmtime()) if len(self.config.access_token)>0: self.config.role_refresh() headers['x-amz-security-token']=self.config.access_token signature = self.sign_request(headers) headers["Authorization"] = "AWS "+self.config.access_key+":"+signature request = {} request['resource'] = resource request['headers'] = headers request['method'] = operation['method'] return request def sign_request(self, headers): string_to_sign = headers['x-amz-date'] signature = sign_string_v2(string_to_sign) debug(u"CloudFront.sign_request('%s') = %s" % (string_to_sign, signature)) return signature def get_connection(self): conn = ConnMan.get(self.config.cloudfront_host, ssl = True) return conn def _fail_wait(self, retries): # Wait a few seconds. The more it fails the more we wait. 
return (self._max_retries - retries + 1) * 3 def get_dist_name_for_bucket(self, uri): if (uri.type == "cf"): return uri if (uri.type != "s3"): raise ParameterError("CloudFront or S3 URI required instead of: %s" % uri) debug("_get_dist_name_for_bucket(%r)" % uri) if CloudFront.dist_list is None: response = self.GetList() CloudFront.dist_list = {} for d in response['dist_list'].dist_summs: if d.info.has_key("S3Origin"): CloudFront.dist_list[getBucketFromHostname(d.info['S3Origin']['DNSName'])[0]] = d.uri() elif d.info.has_key("CustomOrigin"): # Aral: This used to skip over distributions with CustomOrigin, however, we mustn't # do this since S3 buckets that are set up as websites use custom origins. # Thankfully, the custom origin URLs they use start with the URL of the # S3 bucket. Here, we make use this naming convention to support this use case. distListIndex = getBucketFromHostname(d.info['CustomOrigin']['DNSName'])[0]; distListIndex = distListIndex[:len(uri.bucket())] CloudFront.dist_list[distListIndex] = d.uri() else: # Aral: I'm not sure when this condition will be reached, but keeping it in there. continue debug("dist_list: %s" % CloudFront.dist_list) try: return CloudFront.dist_list[uri.bucket()] except Exception, e: debug(e) raise ParameterError("Unable to translate S3 URI to CloudFront distribution name: %s" % uri) class Cmd(object): """ Class that implements CloudFront commands """ class Options(object): cf_cnames_add = [] cf_cnames_remove = [] cf_comment = None cf_enable = None cf_logging = None cf_default_root_object = None def option_list(self): return [opt for opt in dir(self) if opt.startswith("cf_")] def update_option(self, option, value): setattr(Cmd.options, option, value) options = Options() @staticmethod def _parse_args(args): cf = CloudFront(Config()) cfuris = [] for arg in args: uri = cf.get_dist_name_for_bucket(S3Uri(arg)) cfuris.append(uri) return cfuris @staticmethod def info(args): cf = CloudFront(Config()) if not args: response = cf.GetList() for d in response['dist_list'].dist_summs: if d.info.has_key("S3Origin"): origin = S3UriS3.httpurl_to_s3uri(d.info['S3Origin']['DNSName']) elif d.info.has_key("CustomOrigin"): origin = "http://%s/" % d.info['CustomOrigin']['DNSName'] else: origin = "" pretty_output("Origin", origin) pretty_output("DistId", d.uri()) pretty_output("DomainName", d.info['DomainName']) if d.info.has_key("CNAME"): pretty_output("CNAMEs", ", ".join(d.info['CNAME'])) pretty_output("Status", d.info['Status']) pretty_output("Enabled", d.info['Enabled']) output("") else: cfuris = Cmd._parse_args(args) for cfuri in cfuris: response = cf.GetDistInfo(cfuri) d = response['distribution'] dc = d.info['DistributionConfig'] if dc.info.has_key("S3Origin"): origin = S3UriS3.httpurl_to_s3uri(dc.info['S3Origin']['DNSName']) elif dc.info.has_key("CustomOrigin"): origin = "http://%s/" % dc.info['CustomOrigin']['DNSName'] else: origin = "" pretty_output("Origin", origin) pretty_output("DistId", d.uri()) pretty_output("DomainName", d.info['DomainName']) if dc.info.has_key("CNAME"): pretty_output("CNAMEs", ", ".join(dc.info['CNAME'])) pretty_output("Status", d.info['Status']) pretty_output("Comment", dc.info['Comment']) pretty_output("Enabled", dc.info['Enabled']) pretty_output("DfltRootObject", dc.info['DefaultRootObject']) pretty_output("Logging", dc.info['Logging'] or "Disabled") pretty_output("Etag", response['headers']['etag']) @staticmethod def create(args): cf = CloudFront(Config()) buckets = [] for arg in args: uri = S3Uri(arg) if uri.type != "s3": raise 
ParameterError("Distribution can only be created from a s3:// URI instead of: %s" % arg) if uri.object(): raise ParameterError("Use s3:// URI with a bucket name only instead of: %s" % arg) if not uri.is_dns_compatible(): raise ParameterError("CloudFront can only handle lowercase-named buckets.") buckets.append(uri) if not buckets: raise ParameterError("No valid bucket names found") for uri in buckets: info("Creating distribution from: %s" % uri) response = cf.CreateDistribution(uri, cnames_add = Cmd.options.cf_cnames_add, comment = Cmd.options.cf_comment, logging = Cmd.options.cf_logging, default_root_object = Cmd.options.cf_default_root_object) d = response['distribution'] dc = d.info['DistributionConfig'] output("Distribution created:") pretty_output("Origin", S3UriS3.httpurl_to_s3uri(dc.info['S3Origin']['DNSName'])) pretty_output("DistId", d.uri()) pretty_output("DomainName", d.info['DomainName']) pretty_output("CNAMEs", ", ".join(dc.info['CNAME'])) pretty_output("Comment", dc.info['Comment']) pretty_output("Status", d.info['Status']) pretty_output("Enabled", dc.info['Enabled']) pretty_output("DefaultRootObject", dc.info['DefaultRootObject']) pretty_output("Etag", response['headers']['etag']) @staticmethod def delete(args): cf = CloudFront(Config()) cfuris = Cmd._parse_args(args) for cfuri in cfuris: response = cf.DeleteDistribution(cfuri) if response['status'] >= 400: error("Distribution %s could not be deleted: %s" % (cfuri, response['reason'])) output("Distribution %s deleted" % cfuri) @staticmethod def modify(args): cf = CloudFront(Config()) if len(args) > 1: raise ParameterError("Too many parameters. Modify one Distribution at a time.") try: cfuri = Cmd._parse_args(args)[0] except IndexError: raise ParameterError("No valid Distribution URI found.") response = cf.ModifyDistribution(cfuri, cnames_add = Cmd.options.cf_cnames_add, cnames_remove = Cmd.options.cf_cnames_remove, comment = Cmd.options.cf_comment, enabled = Cmd.options.cf_enable, logging = Cmd.options.cf_logging, default_root_object = Cmd.options.cf_default_root_object) if response['status'] >= 400: error("Distribution %s could not be modified: %s" % (cfuri, response['reason'])) output("Distribution modified: %s" % cfuri) response = cf.GetDistInfo(cfuri) d = response['distribution'] dc = d.info['DistributionConfig'] pretty_output("Origin", S3UriS3.httpurl_to_s3uri(dc.info['S3Origin']['DNSName'])) pretty_output("DistId", d.uri()) pretty_output("DomainName", d.info['DomainName']) pretty_output("Status", d.info['Status']) pretty_output("CNAMEs", ", ".join(dc.info['CNAME'])) pretty_output("Comment", dc.info['Comment']) pretty_output("Enabled", dc.info['Enabled']) pretty_output("DefaultRootObject", dc.info['DefaultRootObject']) pretty_output("Etag", response['headers']['etag']) @staticmethod def invalinfo(args): cf = CloudFront(Config()) cfuris = Cmd._parse_args(args) requests = [] for cfuri in cfuris: if cfuri.request_id(): requests.append(str(cfuri)) else: inval_list = cf.GetInvalList(cfuri) try: for i in inval_list['inval_list'].info['InvalidationSummary']: requests.append("/".join(["cf:/", cfuri.dist_id(), i["Id"]])) except: continue for req in requests: cfuri = S3Uri(req) inval_info = cf.GetInvalInfo(cfuri) st = inval_info['inval_status'].info pretty_output("URI", str(cfuri)) pretty_output("Status", st['Status']) pretty_output("Created", st['CreateTime']) pretty_output("Nr of paths", len(st['InvalidationBatch']['Path'])) pretty_output("Reference", st['InvalidationBatch']['CallerReference']) output("") # vim:et:ts=4:sts=4:ai 
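The Cmd class above is only the thin layer that the s3cmd command line drives; the CloudFront wrapper itself can be used directly from Python. Below is a minimal, illustrative sketch, not part of the distribution: it assumes the Config singleton has already been loaded with valid AWS credentials (as s3cmd does at start-up), that a distribution whose S3 origin is the hypothetical bucket "example-bucket" already exists, and that the S3 package is importable (e.g. after "python setup.py install").

# Hedged example (assumptions: populated Config singleton, existing matching distribution).
from S3.Config import Config
from S3.S3Uri import S3Uri
from S3.CloudFront import CloudFront

cf = CloudFront(Config())                  # reuse the already-populated Config singleton
uri = S3Uri("s3://example-bucket")         # s3:// URI; mapped to its cf:// distribution internally
response = cf.InvalidateObjects(uri, ["/index.html", "/css/site.css"],
                                default_index_file = None,
                                invalidate_default_index_on_cf = False,
                                invalidate_default_index_root_on_cf = False)
if response['status'] == 201:
    print "Invalidation %s accepted for distribution %s" % (response['request_id'], response['dist_id'])

Note that InvalidateObjects() refuses more than 999 paths per request (it dumps the list to a temporary file and raises ParameterError), so batching larger invalidations remains the caller's responsibility.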
s3cmd-1.6.1/S3/AccessLog.py0000664000175000017500000000564412647745544016375 0ustar mdomschmdomsch00000000000000# -*- coding: utf-8 -*- ## Amazon S3 - Access Control List representation ## Author: Michal Ludvig ## http://www.logix.cz/michal ## License: GPL Version 2 ## Copyright: TGRMN Software and contributors import S3Uri from Exceptions import ParameterError from Utils import getTreeFromXml from ACL import GranteeAnonRead try: import xml.etree.ElementTree as ET except ImportError: import elementtree.ElementTree as ET __all__ = [] class AccessLog(object): LOG_DISABLED = "" LOG_TEMPLATE = "" def __init__(self, xml = None): if not xml: xml = self.LOG_DISABLED self.tree = getTreeFromXml(xml) self.tree.attrib['xmlns'] = "http://doc.s3.amazonaws.com/2006-03-01" def isLoggingEnabled(self): return (self.tree.find(".//LoggingEnabled") is not None) def disableLogging(self): el = self.tree.find(".//LoggingEnabled") if el: self.tree.remove(el) def enableLogging(self, target_prefix_uri): el = self.tree.find(".//LoggingEnabled") if not el: el = getTreeFromXml(self.LOG_TEMPLATE) self.tree.append(el) el.find(".//TargetBucket").text = target_prefix_uri.bucket() el.find(".//TargetPrefix").text = target_prefix_uri.object() def targetPrefix(self): if self.isLoggingEnabled(): target_prefix = u"s3://%s/%s" % ( self.tree.find(".//LoggingEnabled//TargetBucket").text, self.tree.find(".//LoggingEnabled//TargetPrefix").text) return S3Uri.S3Uri(target_prefix) else: return "" def setAclPublic(self, acl_public): le = self.tree.find(".//LoggingEnabled") if le is None: raise ParameterError("Logging not enabled, can't set default ACL for logs") tg = le.find(".//TargetGrants") if not acl_public: if not tg: ## All good, it's not been there return else: le.remove(tg) else: # acl_public == True anon_read = GranteeAnonRead().getElement() if not tg: tg = ET.SubElement(le, "TargetGrants") ## What if TargetGrants already exists? We should check if ## AnonRead is there before appending a new one. Later... 
tg.append(anon_read) def isAclPublic(self): raise NotImplementedError() def __str__(self): return ET.tostring(self.tree) __all__.append("AccessLog") if __name__ == "__main__": log = AccessLog() print log log.enableLogging(S3Uri.S3Uri(u"s3://targetbucket/prefix/log-")) print log log.setAclPublic(True) print log log.setAclPublic(False) print log log.disableLogging() print log # vim:et:ts=4:sts=4:ai s3cmd-1.6.1/S3/ConnMan.py0000664000175000017500000002027212647745544016055 0ustar mdomschmdomsch00000000000000# -*- coding: utf-8 -*- ## Amazon S3 manager ## Author: Michal Ludvig ## http://www.logix.cz/michal ## License: GPL Version 2 ## Copyright: TGRMN Software and contributors import sys import httplib import ssl from threading import Semaphore from logging import debug from Config import Config from Exceptions import ParameterError if not 'CertificateError ' in ssl.__dict__: class CertificateError(Exception): pass ssl.CertificateError = CertificateError __all__ = [ "ConnMan" ] class http_connection(object): context = None context_set = False @staticmethod def _ssl_verified_context(cafile): cfg = Config() context = None try: context = ssl.create_default_context(cafile=cafile) except AttributeError: # no ssl.create_default_context pass if context and not cfg.check_ssl_hostname: context.check_hostname = False debug(u'Disabling SSL certificate hostname checking') return context @staticmethod def _ssl_unverified_context(cafile): debug(u'Disabling SSL certificate checking') context = None try: context = ssl._create_unverified_context(cafile=cafile, cert_reqs=ssl.CERT_NONE) except AttributeError: # no ssl._create_unverified_context pass return context @staticmethod def _ssl_context(): if http_connection.context_set: return http_connection.context cfg = Config() cafile = cfg.ca_certs_file if cafile == "": cafile = None debug(u"Using ca_certs_file %s" % cafile) if cfg.check_ssl_certificate: context = http_connection._ssl_verified_context(cafile) else: context = http_connection._ssl_unverified_context(cafile) http_connection.context = context http_connection.context_set = True return context def match_hostname_aws(self, cert, e): """ Wildcard matching for *.s3.amazonaws.com and similar per region. Per http://docs.aws.amazon.com/AmazonS3/latest/dev/BucketRestrictions.html: "We recommend that all bucket names comply with DNS naming conventions." Per http://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html: "When using virtual hosted-style buckets with SSL, the SSL wild card certificate only matches buckets that do not contain periods. To work around this, use HTTP or write your own certificate verification logic." Therefore, we need a custom validation routine that allows mybucket.example.com.s3.amazonaws.com to be considered a valid hostname for the *.s3.amazonaws.com wildcard cert, and for the region-specific *.s3-[region].amazonaws.com wildcard cert. """ debug(u'checking SSL subjectAltName against amazonaws.com') san = cert.get('subjectAltName', ()) for key, value in san: if key == 'DNS': if value.startswith('*.s3') and \ (value.endswith('.amazonaws.com') and self.hostname.endswith('.amazonaws.com')) or \ (value.endswith('.amazonaws.com.cn') and self.hostname.endswith('.amazonaws.com.cn')): return raise e def match_hostname(self): cert = self.c.sock.getpeercert() try: ssl.match_hostname(cert, self.hostname) except AttributeError: # old ssl module doesn't have this function return except ValueError: # empty SSL cert means underlying SSL library didn't validate it, we don't either. 
return except ssl.CertificateError, e: self.match_hostname_aws(cert, e) @staticmethod def _https_connection(hostname, port=None): check_hostname = True try: context = http_connection._ssl_context() # S3's wildcart certificate doesn't work with DNS-style named buckets. if (hostname.endswith('.amazonaws.com') or hostname.endswith('.amazonaws.com.cn')): # this merely delays running the hostname check until # after the connection is made and we get control # back. We then run the same check, relaxed for S3's # wildcard certificates. debug(u'Recognized AWS S3 host, disabling initial SSL hostname check') check_hostname = False if context: context.check_hostname = False conn = httplib.HTTPSConnection(hostname, port, context=context, check_hostname=check_hostname) debug(u'httplib.HTTPSConnection() has both context and check_hostname') except TypeError: try: # in case check_hostname parameter is not present try again conn = httplib.HTTPSConnection(hostname, port, context=context) debug(u'httplib.HTTPSConnection() has only context') except TypeError: # in case even context parameter is not present try one last time conn = httplib.HTTPSConnection(hostname, port) debug(u'httplib.HTTPSConnection() has neither context nor check_hostname') return conn def __init__(self, id, hostname, ssl, cfg): self.ssl = ssl self.id = id self.counter = 0 self.hostname = hostname if not ssl: if cfg.proxy_host != "": self.c = httplib.HTTPConnection(cfg.proxy_host, cfg.proxy_port) debug(u'proxied HTTPConnection(%s, %s)' % (cfg.proxy_host, cfg.proxy_port)) else: self.c = httplib.HTTPConnection(hostname) debug(u'non-proxied HTTPConnection(%s)' % hostname) else: if cfg.proxy_host != "": self.c = http_connection._https_connection(cfg.proxy_host, cfg.proxy_port) self.c.set_tunnel(hostname) debug(u'proxied HTTPSConnection(%s, %s)' % (cfg.proxy_host, cfg.proxy_port)) debug(u'tunnel to %s' % hostname) else: self.c = http_connection._https_connection(hostname) debug(u'non-proxied HTTPSConnection(%s)' % hostname) class ConnMan(object): conn_pool_sem = Semaphore() conn_pool = {} conn_max_counter = 800 ## AWS closes connection after some ~90 requests @staticmethod def get(hostname, ssl = None): cfg = Config() if ssl == None: ssl = cfg.use_https conn = None if cfg.proxy_host != "": if ssl and sys.hexversion < 0x02070000: raise ParameterError("use_https=True can't be used with proxy on Python <2.7") conn_id = "proxy://%s:%s" % (cfg.proxy_host, cfg.proxy_port) else: conn_id = "http%s://%s" % (ssl and "s" or "", hostname) ConnMan.conn_pool_sem.acquire() if not ConnMan.conn_pool.has_key(conn_id): ConnMan.conn_pool[conn_id] = [] if len(ConnMan.conn_pool[conn_id]): conn = ConnMan.conn_pool[conn_id].pop() debug("ConnMan.get(): re-using connection: %s#%d" % (conn.id, conn.counter)) ConnMan.conn_pool_sem.release() if not conn: debug("ConnMan.get(): creating new connection: %s" % conn_id) conn = http_connection(conn_id, hostname, ssl, cfg) conn.c.connect() if conn.ssl and cfg.check_ssl_certificate and cfg.check_ssl_hostname: conn.match_hostname() conn.counter += 1 return conn @staticmethod def put(conn): if conn.id.startswith("proxy://"): conn.c.close() debug("ConnMan.put(): closing proxy connection (keep-alive not yet supported)") return if conn.counter >= ConnMan.conn_max_counter: conn.c.close() debug("ConnMan.put(): closing over-used connection") return ConnMan.conn_pool_sem.acquire() ConnMan.conn_pool[conn.id].append(conn) ConnMan.conn_pool_sem.release() debug("ConnMan.put(): connection put back to pool (%s#%d)" % (conn.id, conn.counter)) 
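ConnMan keeps one pool of keep-alive connections per endpoint, and CloudFront.send_request() above shows the canonical get()/request()/getresponse()/put() cycle. The short sketch below is illustrative only and not part of the package; it assumes the Config defaults (no proxy, default CA bundle) are sufficient and uses the public S3 endpoint merely as a reachable host, so the unauthenticated request will simply come back as a redirect or an S3 error document.

# Hedged example (assumptions: default Config values, direct network access).
from S3.Config import Config
from S3.ConnMan import ConnMan

Config()                                   # make sure the singleton exists; defaults suffice here
conn = ConnMan.get("s3.amazonaws.com")     # ssl defaults to cfg.use_https
conn.c.request("GET", "/", None, {})       # bare, unauthenticated GET
resp = conn.c.getresponse()
print resp.status, resp.reason
resp.read()                                # drain the body so the socket can be reused
ConnMan.put(conn)                          # hand the connection back to the pool

The read() before put() matters: httplib will not reuse a keep-alive socket whose previous response has not been consumed, and send_request() likewise reads the whole body before returning the connection.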
s3cmd-1.6.1/S3/ACL.py0000664000175000017500000001631712647745544015130 0ustar mdomschmdomsch00000000000000# -*- coding: utf-8 -*- ## Amazon S3 - Access Control List representation ## Author: Michal Ludvig ## http://www.logix.cz/michal ## License: GPL Version 2 ## Copyright: TGRMN Software and contributors from Utils import getTreeFromXml try: import xml.etree.ElementTree as ET except ImportError: import elementtree.ElementTree as ET class Grantee(object): ALL_USERS_URI = "http://acs.amazonaws.com/groups/global/AllUsers" LOG_DELIVERY_URI = "http://acs.amazonaws.com/groups/s3/LogDelivery" def __init__(self): self.xsi_type = None self.tag = None self.name = None self.display_name = None self.permission = None def __repr__(self): return 'Grantee("%(tag)s", "%(name)s", "%(permission)s")' % { "tag" : self.tag, "name" : self.name, "permission" : self.permission } def isAllUsers(self): return self.tag == "URI" and self.name == Grantee.ALL_USERS_URI def isAnonRead(self): return self.isAllUsers() and (self.permission == "READ" or self.permission == "FULL_CONTROL") def getElement(self): el = ET.Element("Grant") grantee = ET.SubElement(el, "Grantee", { 'xmlns:xsi' : 'http://www.w3.org/2001/XMLSchema-instance', 'xsi:type' : self.xsi_type }) name = ET.SubElement(grantee, self.tag) name.text = self.name permission = ET.SubElement(el, "Permission") permission.text = self.permission return el class GranteeAnonRead(Grantee): def __init__(self): Grantee.__init__(self) self.xsi_type = "Group" self.tag = "URI" self.name = Grantee.ALL_USERS_URI self.permission = "READ" class GranteeLogDelivery(Grantee): def __init__(self, permission): """ permission must be either READ_ACP or WRITE """ Grantee.__init__(self) self.xsi_type = "Group" self.tag = "URI" self.name = Grantee.LOG_DELIVERY_URI self.permission = permission class ACL(object): EMPTY_ACL = "" def __init__(self, xml = None): if not xml: xml = ACL.EMPTY_ACL self.grantees = [] self.owner_id = "" self.owner_nick = "" tree = getTreeFromXml(xml) self.parseOwner(tree) self.parseGrants(tree) def parseOwner(self, tree): self.owner_id = tree.findtext(".//Owner//ID") self.owner_nick = tree.findtext(".//Owner//DisplayName") def parseGrants(self, tree): for grant in tree.findall(".//Grant"): grantee = Grantee() g = grant.find(".//Grantee") grantee.xsi_type = g.attrib['{http://www.w3.org/2001/XMLSchema-instance}type'] grantee.permission = grant.find('Permission').text for el in g: if el.tag == "DisplayName": grantee.display_name = el.text else: grantee.tag = el.tag grantee.name = el.text self.grantees.append(grantee) def getGrantList(self): acl = [] for grantee in self.grantees: if grantee.display_name: user = grantee.display_name elif grantee.isAllUsers(): user = "*anon*" else: user = grantee.name acl.append({'grantee': user, 'permission': grantee.permission}) return acl def getOwner(self): return { 'id' : self.owner_id, 'nick' : self.owner_nick } def isAnonRead(self): for grantee in self.grantees: if grantee.isAnonRead(): return True return False def grantAnonRead(self): if not self.isAnonRead(): self.appendGrantee(GranteeAnonRead()) def revokeAnonRead(self): self.grantees = [g for g in self.grantees if not g.isAnonRead()] def appendGrantee(self, grantee): self.grantees.append(grantee) def hasGrant(self, name, permission): name = name.lower() permission = permission.upper() for grantee in self.grantees: if grantee.name.lower() == name: if grantee.permission == "FULL_CONTROL": return True elif grantee.permission.upper() == permission: return True return False; def 
grant(self, name, permission): if self.hasGrant(name, permission): return permission = permission.upper() if "ALL" == permission: permission = "FULL_CONTROL" if "FULL_CONTROL" == permission: self.revoke(name, "ALL") grantee = Grantee() grantee.name = name grantee.permission = permission if '@' in name: grantee.name = grantee.name.lower() grantee.xsi_type = "AmazonCustomerByEmail" grantee.tag = "EmailAddress" elif 'http://acs.amazonaws.com/groups/' in name: grantee.xsi_type = "Group" grantee.tag = "URI" else: grantee.name = grantee.name.lower() grantee.xsi_type = "CanonicalUser" grantee.tag = "ID" self.appendGrantee(grantee) def revoke(self, name, permission): name = name.lower() permission = permission.upper() if "ALL" == permission: self.grantees = [g for g in self.grantees if not (g.name.lower() == name or g.display_name.lower() == name)] else: self.grantees = [g for g in self.grantees if not ((g.display_name.lower() == name and g.permission.upper() == permission)\ or (g.name.lower() == name and g.permission.upper() == permission))] def __str__(self): tree = getTreeFromXml(ACL.EMPTY_ACL) tree.attrib['xmlns'] = "http://s3.amazonaws.com/doc/2006-03-01/" owner = tree.find(".//Owner//ID") owner.text = self.owner_id acl = tree.find(".//AccessControlList") for grantee in self.grantees: acl.append(grantee.getElement()) return ET.tostring(tree) if __name__ == "__main__": xml = """ 12345678901234567890 owner-nickname 12345678901234567890 owner-nickname FULL_CONTROL http://acs.amazonaws.com/groups/global/AllUsers READ """ acl = ACL(xml) print "Grants:", acl.getGrantList() acl.revokeAnonRead() print "Grants:", acl.getGrantList() acl.grantAnonRead() print "Grants:", acl.getGrantList() print acl # vim:et:ts=4:sts=4:ai s3cmd-1.6.1/S3/S3Uri.py0000664000175000017500000001507312647745544015474 0ustar mdomschmdomsch00000000000000# -*- coding: utf-8 -*- ## Amazon S3 manager ## Author: Michal Ludvig ## http://www.logix.cz/michal ## License: GPL Version 2 ## Copyright: TGRMN Software and contributors import os import re import sys from Utils import unicodise, deunicodise, check_bucket_name_dns_support import Config class S3Uri(object): type = None _subclasses = None def __new__(self, string): if not self._subclasses: ## Generate a list of all subclasses of S3Uri self._subclasses = [] dict = sys.modules[__name__].__dict__ for something in dict: if type(dict[something]) is not type(self): continue if issubclass(dict[something], self) and dict[something] != self: self._subclasses.append(dict[something]) for subclass in self._subclasses: try: instance = object.__new__(subclass) instance.__init__(string) return instance except ValueError: continue raise ValueError("%s: not a recognized URI" % string) def __str__(self): return self.uri() def __unicode__(self): return self.uri() def __repr__(self): return "<%s: %s>" % (self.__class__.__name__, self.__unicode__()) def public_url(self): raise ValueError("This S3 URI does not have Anonymous URL representation") def basename(self): return self.__unicode__().split("/")[-1] class S3UriS3(S3Uri): type = "s3" _re = re.compile("^s3:///*([^/]*)/?(.*)", re.IGNORECASE | re.UNICODE) def __init__(self, string): match = self._re.match(string) if not match: raise ValueError("%s: not a S3 URI" % string) groups = match.groups() self._bucket = groups[0] self._object = groups[1] def bucket(self): return self._bucket def object(self): return self._object def has_bucket(self): return bool(self._bucket) def has_object(self): return bool(self._object) def uri(self): return 
u"/".join([u"s3:/", self._bucket, self._object]) def is_dns_compatible(self): return check_bucket_name_dns_support(Config.Config().host_bucket, self._bucket) def public_url(self): if self.is_dns_compatible(): return "http://%s.%s/%s" % (self._bucket, Config.Config().host_base, self._object) else: return "http://%s/%s/%s" % (Config.Config().host_base, self._bucket, self._object) def host_name(self): if self.is_dns_compatible(): return "%s.s3.amazonaws.com" % (self._bucket) else: return "s3.amazonaws.com" @staticmethod def compose_uri(bucket, object = ""): return u"s3://%s/%s" % (bucket, object) @staticmethod def httpurl_to_s3uri(http_url): m=re.match("(https?://)?([^/]+)/?(.*)", http_url, re.IGNORECASE | re.UNICODE) hostname, object = m.groups()[1:] hostname = hostname.lower() if hostname == "s3.amazonaws.com": ## old-style url: http://s3.amazonaws.com/bucket/object if object.count("/") == 0: ## no object given bucket = object object = "" else: ## bucket/object bucket, object = object.split("/", 1) elif hostname.endswith(".s3.amazonaws.com"): ## new-style url: http://bucket.s3.amazonaws.com/object bucket = hostname[:-(len(".s3.amazonaws.com"))] else: raise ValueError("Unable to parse URL: %s" % http_url) return S3Uri(u"s3://%(bucket)s/%(object)s" % { 'bucket' : bucket, 'object' : object }) class S3UriS3FS(S3Uri): type = "s3fs" _re = re.compile("^s3fs:///*([^/]*)/?(.*)", re.IGNORECASE | re.UNICODE) def __init__(self, string): match = self._re.match(string) if not match: raise ValueError("%s: not a S3fs URI" % string) groups = match.groups() self._fsname = groups[0] self._path = groups[1].split("/") def fsname(self): return self._fsname def path(self): return "/".join(self._path) def uri(self): return "/".join([u"s3fs:/", self._fsname, self.path()]) class S3UriFile(S3Uri): type = "file" _re = re.compile("^(\w+://)?(.*)", re.UNICODE) def __init__(self, string): match = self._re.match(string) groups = match.groups() if groups[0] not in (None, "file://"): raise ValueError("%s: not a file:// URI" % string) self._path = groups[1].split("/") def path(self): return "/".join(self._path) def uri(self): return "/".join(["file:/", self.path()]) def isdir(self): return os.path.isdir(deunicodise(self.path())) def dirname(self): return unicodise(os.path.dirname(deunicodise(self.path()))) class S3UriCloudFront(S3Uri): type = "cf" _re = re.compile("^cf://([^/]*)/*(.*)", re.IGNORECASE | re.UNICODE) def __init__(self, string): match = self._re.match(string) if not match: raise ValueError("%s: not a CloudFront URI" % string) groups = match.groups() self._dist_id = groups[0] self._request_id = groups[1] != "/" and groups[1] or None def dist_id(self): return self._dist_id def request_id(self): return self._request_id def uri(self): uri = "cf://" + self.dist_id() if self.request_id(): uri += "/" + self.request_id() return uri if __name__ == "__main__": uri = S3Uri("s3://bucket/object") print "type() =", type(uri) print "uri =", uri print "uri.type=", uri.type print "bucket =", uri.bucket() print "object =", uri.object() print uri = S3Uri("s3://bucket") print "type() =", type(uri) print "uri =", uri print "uri.type=", uri.type print "bucket =", uri.bucket() print uri = S3Uri("s3fs://filesystem1/path/to/remote/file.txt") print "type() =", type(uri) print "uri =", uri print "uri.type=", uri.type print "path =", uri.path() print uri = S3Uri("/path/to/local/file.txt") print "type() =", type(uri) print "uri =", uri print "uri.type=", uri.type print "path =", uri.path() print uri = S3Uri("cf://1234567890ABCD/") print 
"type() =", type(uri) print "uri =", uri print "uri.type=", uri.type print "dist_id =", uri.dist_id() print # vim:et:ts=4:sts=4:ai s3cmd-1.6.1/S3/SortedDict.py0000664000175000017500000000424112647745544016566 0ustar mdomschmdomsch00000000000000# -*- coding: utf-8 -*- ## Amazon S3 manager ## Author: Michal Ludvig ## http://www.logix.cz/michal ## License: GPL Version 2 ## Copyright: TGRMN Software and contributors from BidirMap import BidirMap class SortedDictIterator(object): def __init__(self, sorted_dict, keys): self.sorted_dict = sorted_dict self.keys = keys def next(self): try: return self.keys.pop(0) except IndexError: raise StopIteration class SortedDict(dict): def __init__(self, mapping = {}, ignore_case = True, **kwargs): """ WARNING: SortedDict() with ignore_case==True will drop entries differing only in capitalisation! Eg: SortedDict({'auckland':1, 'Auckland':2}).keys() => ['Auckland'] With ignore_case==False it's all right """ dict.__init__(self, mapping, **kwargs) self.ignore_case = ignore_case def keys(self): keys = dict.keys(self) if self.ignore_case: # Translation map xlat_map = BidirMap() for key in keys: xlat_map[key.lower()] = key # Lowercase keys lc_keys = xlat_map.keys() lc_keys.sort() return [xlat_map[k] for k in lc_keys] else: keys.sort() return keys def __iter__(self): return SortedDictIterator(self, self.keys()) def __getslice__(self, i=0, j=-1): keys = self.keys()[i:j] r = SortedDict(ignore_case = self.ignore_case) for k in keys: r[k] = self[k] return r if __name__ == "__main__": d = { 'AWS' : 1, 'Action' : 2, 'america' : 3, 'Auckland' : 4, 'America' : 5 } sd = SortedDict(d) print "Wanted: Action, america, Auckland, AWS, [ignore case]" print "Got: ", for key in sd: print "%s," % key, print " [used: __iter__()]" d = SortedDict(d, ignore_case = False) print "Wanted: AWS, Action, Auckland, america, [case sensitive]" print "Got: ", for key in d.keys(): print "%s," % key, print " [used: keys()]" # vim:et:ts=4:sts=4:ai s3cmd-1.6.1/s3cmd.egg-info/0000775000175000017500000000000012647747124016361 5ustar mdomschmdomsch00000000000000s3cmd-1.6.1/s3cmd.egg-info/dependency_links.txt0000664000175000017500000000000112647747123022426 0ustar mdomschmdomsch00000000000000 s3cmd-1.6.1/s3cmd.egg-info/top_level.txt0000664000175000017500000000000312647747123021103 0ustar mdomschmdomsch00000000000000S3 s3cmd-1.6.1/s3cmd.egg-info/SOURCES.txt0000664000175000017500000000076312647747123020252 0ustar mdomschmdomsch00000000000000INSTALL MANIFEST.in NEWS README.md s3cmd s3cmd.1 setup.cfg setup.py S3/ACL.py S3/AccessLog.py S3/BidirMap.py S3/CloudFront.py S3/Config.py S3/ConnMan.py S3/Crypto.py S3/Exceptions.py S3/ExitCodes.py S3/FileDict.py S3/FileLists.py S3/HashCache.py S3/MultiPart.py S3/PkgInfo.py S3/Progress.py S3/S3.py S3/S3Uri.py S3/SortedDict.py S3/Utils.py S3/__init__.py s3cmd.egg-info/PKG-INFO s3cmd.egg-info/SOURCES.txt s3cmd.egg-info/dependency_links.txt s3cmd.egg-info/requires.txt s3cmd.egg-info/top_level.txts3cmd-1.6.1/s3cmd.egg-info/PKG-INFO0000664000175000017500000000274512647747123017465 0ustar mdomschmdomsch00000000000000Metadata-Version: 1.1 Name: s3cmd Version: 1.6.1 Summary: Command line tool for managing Amazon S3 and CloudFront services Home-page: http://s3tools.org Author: github.com/mdomsch, github.com/matteobar Author-email: s3tools-bugs@lists.sourceforge.net License: GNU GPL v2+ Description: S3cmd lets you copy files from/to Amazon S3 (Simple Storage Service) using a simple to use command line client. Supports rsync-like backup, GPG encryption, and more. 
Also supports management of Amazon's CloudFront content delivery network. Authors: -------- Michal Ludvig Platform: UNKNOWN Classifier: Development Status :: 5 - Production/Stable Classifier: Environment :: Console Classifier: Environment :: MacOS X Classifier: Environment :: Win32 (MS Windows) Classifier: Intended Audience :: End Users/Desktop Classifier: Intended Audience :: System Administrators Classifier: License :: OSI Approved :: GNU General Public License v2 or later (GPLv2+) Classifier: Natural Language :: English Classifier: Operating System :: MacOS :: MacOS X Classifier: Operating System :: Microsoft :: Windows Classifier: Operating System :: POSIX Classifier: Operating System :: Unix Classifier: Programming Language :: Python :: 2.6 Classifier: Programming Language :: Python :: 2.7 Classifier: Programming Language :: Python :: 2 :: Only Classifier: Topic :: System :: Archiving Classifier: Topic :: Utilities s3cmd-1.6.1/s3cmd.egg-info/requires.txt0000664000175000017500000000003512647747123020756 0ustar mdomschmdomsch00000000000000python-dateutil python-magic s3cmd-1.6.1/s3cmd.10000664000175000017500000005031612647746204014754 0ustar mdomschmdomsch00000000000000 .\" !!! IMPORTANT: This file is generated from s3cmd \-\-help output using format-manpage.pl .\" !!! Do your changes either in s3cmd file or in 'format\-manpage.pl' otherwise .\" !!! they will be overwritten! .TH s3cmd 1 .SH NAME s3cmd \- tool for managing Amazon S3 storage space and Amazon CloudFront content delivery network .SH SYNOPSIS .B s3cmd [\fIOPTIONS\fR] \fICOMMAND\fR [\fIPARAMETERS\fR] .SH DESCRIPTION .PP .B s3cmd is a command line client for copying files to/from Amazon S3 (Simple Storage Service) and performing other related tasks, for instance creating and removing buckets, listing objects, etc. .SH COMMANDS .PP .B s3cmd can do several \fIactions\fR specified by the following \fIcommands\fR. .TP s3cmd \fBmb\fR \fIs3://BUCKET\fR Make bucket .TP s3cmd \fBrb\fR \fIs3://BUCKET\fR Remove bucket .TP s3cmd \fBls\fR \fI[s3://BUCKET[/PREFIX]]\fR List objects or buckets .TP s3cmd \fBla\fR \fI\fR List all object in all buckets .TP s3cmd \fBput\fR \fIFILE [FILE...] 
s3://BUCKET[/PREFIX]\fR Put file into bucket .TP s3cmd \fBget\fR \fIs3://BUCKET/OBJECT LOCAL_FILE\fR Get file from bucket .TP s3cmd \fBdel\fR \fIs3://BUCKET/OBJECT\fR Delete file from bucket .TP s3cmd \fBrm\fR \fIs3://BUCKET/OBJECT\fR Delete file from bucket (alias for del) .TP s3cmd \fBrestore\fR \fIs3://BUCKET/OBJECT\fR Restore file from Glacier storage .TP s3cmd \fBsync\fR \fILOCAL_DIR s3://BUCKET[/PREFIX] or s3://BUCKET[/PREFIX] LOCAL_DIR\fR Synchronize a directory tree to S3 (checks files freshness using size and md5 checksum, unless overridden by options, see below) .TP s3cmd \fBdu\fR \fI[s3://BUCKET[/PREFIX]]\fR Disk usage by buckets .TP s3cmd \fBinfo\fR \fIs3://BUCKET[/OBJECT]\fR Get various information about Buckets or Files .TP s3cmd \fBcp\fR \fIs3://BUCKET1/OBJECT1 s3://BUCKET2[/OBJECT2]\fR Copy object .TP s3cmd \fBmodify\fR \fIs3://BUCKET1/OBJECT\fR Modify object metadata .TP s3cmd \fBmv\fR \fIs3://BUCKET1/OBJECT1 s3://BUCKET2[/OBJECT2]\fR Move object .TP s3cmd \fBsetacl\fR \fIs3://BUCKET[/OBJECT]\fR Modify Access control list for Bucket or Files .TP s3cmd \fBsetpolicy\fR \fIFILE s3://BUCKET\fR Modify Bucket Policy .TP s3cmd \fBdelpolicy\fR \fIs3://BUCKET\fR Delete Bucket Policy .TP s3cmd \fBsetcors\fR \fIFILE s3://BUCKET\fR Modify Bucket CORS .TP s3cmd \fBdelcors\fR \fIs3://BUCKET\fR Delete Bucket CORS .TP s3cmd \fBpayer\fR \fIs3://BUCKET\fR Modify Bucket Requester Pays policy .TP s3cmd \fBmultipart\fR \fIs3://BUCKET [Id]\fR Show multipart uploads .TP s3cmd \fBabortmp\fR \fIs3://BUCKET/OBJECT Id\fR Abort a multipart upload .TP s3cmd \fBlistmp\fR \fIs3://BUCKET/OBJECT Id\fR List parts of a multipart upload .TP s3cmd \fBaccesslog\fR \fIs3://BUCKET\fR Enable/disable bucket access logging .TP s3cmd \fBsign\fR \fISTRING\-TO\-SIGN\fR Sign arbitrary string using the secret key .TP s3cmd \fBsignurl\fR \fIs3://BUCKET/OBJECT \fR Sign an S3 URL to provide limited public access with expiry .TP s3cmd \fBfixbucket\fR \fIs3://BUCKET[/PREFIX]\fR Fix invalid file names in a bucket .TP s3cmd \fBexpire\fR \fIs3://BUCKET\fR Set or delete expiration rule for the bucket .TP s3cmd \fBsetlifecycle\fR \fIFILE s3://BUCKET\fR Upload a lifecycle policy for the bucket .TP s3cmd \fBdellifecycle\fR \fIs3://BUCKET\fR Remove a lifecycle policy for the bucket .PP Commands for static WebSites configuration .TP s3cmd \fBws\-create\fR \fIs3://BUCKET\fR Create Website from bucket .TP s3cmd \fBws\-delete\fR \fIs3://BUCKET\fR Delete Website .TP s3cmd \fBws\-info\fR \fIs3://BUCKET\fR Info about Website .PP Commands for CloudFront management .TP s3cmd \fBcflist\fR \fI\fR List CloudFront distribution points .TP s3cmd \fBcfinfo\fR \fI[cf://DIST_ID]\fR Display CloudFront distribution point parameters .TP s3cmd \fBcfcreate\fR \fIs3://BUCKET\fR Create CloudFront distribution point .TP s3cmd \fBcfdelete\fR \fIcf://DIST_ID\fR Delete CloudFront distribution point .TP s3cmd \fBcfmodify\fR \fIcf://DIST_ID\fR Change CloudFront distribution point parameters .TP s3cmd \fBcfinvalinfo\fR \fIcf://DIST_ID[/INVAL_ID]\fR Display CloudFront invalidation request(s) status .SH OPTIONS .PP Some of the below specified options can have their default values set in .B s3cmd config file (by default $HOME/.s3cmd). As it's a simple text file feel free to open it with your favorite text editor and do any changes you like. .TP \fB\-h\fR, \fB\-\-help\fR show this help message and exit .TP \fB\-\-configure\fR Invoke interactive (re)configuration tool. 
Optionally use as '\fB\-\-configure\fR s3://some\-bucket' to test access to a specific bucket instead of attempting to list them all. .TP \fB\-c\fR FILE, \fB\-\-config\fR=FILE Config file name. Defaults to $HOME/.s3cfg .TP \fB\-\-dump\-config\fR Dump current configuration after parsing config files and command line options and exit. .TP \fB\-\-access_key\fR=ACCESS_KEY AWS Access Key .TP \fB\-\-secret_key\fR=SECRET_KEY AWS Secret Key .TP \fB\-n\fR, \fB\-\-dry\-run\fR Only show what should be uploaded or downloaded but don't actually do it. May still perform S3 requests to get bucket listings and other information though (only for file transfer commands) .TP \fB\-s\fR, \fB\-\-ssl\fR Use HTTPS connection when communicating with S3. (default) .TP \fB\-\-no\-ssl\fR Don't use HTTPS. .TP \fB\-e\fR, \fB\-\-encrypt\fR Encrypt files before uploading to S3. .TP \fB\-\-no\-encrypt\fR Don't encrypt files. .TP \fB\-f\fR, \fB\-\-force\fR Force overwrite and other dangerous operations. .TP \fB\-\-continue\fR Continue getting a partially downloaded file (only for [get] command). .TP \fB\-\-continue\-put\fR Continue uploading partially uploaded files or multipart upload parts. Restarts/parts files that don't have matching size and md5. Skips files/parts that do. Note: md5sum checks are not always sufficient to check (part) file equality. Enable this at your own risk. .TP \fB\-\-upload\-id\fR=UPLOAD_ID UploadId for Multipart Upload, in case you want continue an existing upload (equivalent to \fB\-\-continue\-\fR put) and there are multiple partial uploads. Use s3cmd multipart [URI] to see what UploadIds are associated with the given URI. .TP \fB\-\-skip\-existing\fR Skip over files that exist at the destination (only for [get] and [sync] commands). .TP \fB\-r\fR, \fB\-\-recursive\fR Recursive upload, download or removal. .TP \fB\-\-check\-md5\fR Check MD5 sums when comparing files for [sync]. (default) .TP \fB\-\-no\-check\-md5\fR Do not check MD5 sums when comparing files for [sync]. Only size will be compared. May significantly speed up transfer but may also miss some changed files. .TP \fB\-P\fR, \fB\-\-acl\-public\fR Store objects with ACL allowing read for anyone. .TP \fB\-\-acl\-private\fR Store objects with default ACL allowing access for you only. .TP \fB\-\-acl\-grant\fR=PERMISSION:EMAIL or USER_CANONICAL_ID Grant stated permission to a given amazon user. Permission is one of: read, write, read_acp, write_acp, full_control, all .TP \fB\-\-acl\-revoke\fR=PERMISSION:USER_CANONICAL_ID Revoke stated permission for a given amazon user. Permission is one of: read, write, read_acp, wr ite_acp, full_control, all .TP \fB\-D\fR NUM, \fB\-\-restore\-days\fR=NUM Number of days to keep restored file available (only for 'restore' command). .TP \fB\-\-delete\-removed\fR Delete remote objects with no corresponding local file [sync] .TP \fB\-\-no\-delete\-removed\fR Don't delete remote objects. .TP \fB\-\-delete\-after\fR Perform deletes after new uploads [sync] .TP \fB\-\-delay\-updates\fR *OBSOLETE* Put all updated files into place at end [sync] .TP \fB\-\-max\-delete\fR=NUM Do not delete more than NUM files. [del] and [sync] .TP \fB\-\-add\-destination\fR=ADDITIONAL_DESTINATIONS Additional destination for parallel uploads, in addition to last arg. May be repeated. .TP \fB\-\-delete\-after\-fetch\fR Delete remote objects after fetching to local file (only for [get] and [sync] commands). .TP \fB\-p\fR, \fB\-\-preserve\fR Preserve filesystem attributes (mode, ownership, timestamps). Default for [sync] command. 
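.PP
As an illustration only (bucket name and paths below are placeholders), several of
the transfer options above can be combined in one invocation:
.nf
s3cmd sync \-\-dry\-run \-\-delete\-removed /local/path/ s3://test\-bucket/backup/
.fi
.PP
Dropping \fB\-\-dry\-run\fR then performs the same transfer for real.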
.TP \fB\-\-no\-preserve\fR Don't store FS attributes .TP \fB\-\-exclude\fR=GLOB Filenames and paths matching GLOB will be excluded from sync .TP \fB\-\-exclude\-from\fR=FILE Read --exclude GLOBs from FILE .TP \fB\-\-rexclude\fR=REGEXP Filenames and paths matching REGEXP (regular expression) will be excluded from sync .TP \fB\-\-rexclude\-from\fR=FILE Read --rexclude REGEXPs from FILE .TP \fB\-\-include\fR=GLOB Filenames and paths matching GLOB will be included even if previously excluded by one of \fB\-\-(r)exclude(\-from)\fR patterns .TP \fB\-\-include\-from\fR=FILE Read --include GLOBs from FILE .TP \fB\-\-rinclude\fR=REGEXP Same as --include but uses REGEXP (regular expression) instead of GLOB .TP \fB\-\-rinclude\-from\fR=FILE Read --rinclude REGEXPs from FILE .TP \fB\-\-files\-from\fR=FILE Read list of source-file names from FILE. Use - to read from stdin. .TP \fB\-\-region\fR=REGION, \fB\-\-bucket\-location\fR=REGION Region to create bucket in. As of now the regions are: us\-east\-1, us\-west\-1, us\-west\-2, eu\-west\-1, eu\- central\-1, ap\-northeast\-1, ap\-southeast\-1, ap\- southeast\-2, sa\-east\-1 .TP \fB\-\-host\fR=HOSTNAME HOSTNAME:PORT for S3 endpoint (default: s3.amazonaws.com, alternatives such as s3\-eu\- west\-1.amazonaws.com). You should also set \fB\-\-host\-\fR bucket. .TP \fB\-\-host\-bucket\fR=HOST_BUCKET DNS\-style bucket+hostname:port template for accessing a bucket (default: %(bucket)s.s3.amazonaws.com) .TP \fB\-\-reduced\-redundancy\fR, \fB\-\-rr\fR Store object with 'Reduced redundancy'. Lower per\-GB price. [put, cp, mv] .TP \fB\-\-no\-reduced\-redundancy\fR, \fB\-\-no\-rr\fR Store object without 'Reduced redundancy'. Higher per\- GB price. [put, cp, mv] .TP \fB\-\-storage\-class\fR=CLASS Store object with specified CLASS (STANDARD, STANDARD_IA, or REDUCED_REDUNDANCY). Lower per\-GB price. [put, cp, mv] .TP \fB\-\-access\-logging\-target\-prefix\fR=LOG_TARGET_PREFIX Target prefix for access logs (S3 URI) (for [cfmodify] and [accesslog] commands) .TP \fB\-\-no\-access\-logging\fR Disable access logging (for [cfmodify] and [accesslog] commands) .TP \fB\-\-default\-mime\-type\fR=DEFAULT_MIME_TYPE Default MIME\-type for stored objects. Application default is binary/octet\-stream. .TP \fB\-M\fR, \fB\-\-guess\-mime\-type\fR Guess MIME\-type of files by their extension or mime magic. Fall back to default MIME\-Type as specified by \fB\-\-default\-mime\-type\fR option .TP \fB\-\-no\-guess\-mime\-type\fR Don't guess MIME-type and use the default type instead. .TP \fB\-\-no\-mime\-magic\fR Don't use mime magic when guessing MIME-type. .TP \fB\-m\fR MIME/TYPE, \fB\-\-mime\-type\fR=MIME/TYPE Force MIME\-type. Override both \fB\-\-default\-mime\-type\fR and \fB\-\-guess\-mime\-type\fR. .TP \fB\-\-add\-header\fR=NAME:VALUE Add a given HTTP header to the upload request. Can be used multiple times. For instance set 'Expires' or \&'Cache\-Control' headers (or both) using this option. .TP \fB\-\-remove\-header\fR=NAME Remove a given HTTP header. Can be used multiple times. For instance, remove 'Expires' or 'Cache\- Control' headers (or both) using this option. [modify] .TP \fB\-\-server\-side\-encryption\fR Specifies that server\-side encryption will be used when putting objects. [put, sync, cp, modify] .TP \fB\-\-server\-side\-encryption\-kms\-id\fR=KMS_KEY Specifies the key id used for server\-side encryption with AWS KMS\-Managed Keys (SSE\-KMS) when putting objects. 
[put, sync, cp, modify] .TP \fB\-\-encoding\fR=ENCODING Override autodetected terminal and filesystem encoding (character set). Autodetected: UTF\-8 .TP \fB\-\-add\-encoding\-exts\fR=EXTENSIONs Add encoding to these comma delimited extensions i.e. (css,js,html) when uploading to S3 ) .TP \fB\-\-verbatim\fR Use the S3 name as given on the command line. No pre- processing, encoding, etc. Use with caution! .TP \fB\-\-disable\-multipart\fR Disable multipart upload on files bigger than \fB\-\-multipart\-chunk\-size\-mb\fR .TP \fB\-\-multipart\-chunk\-size\-mb\fR=SIZE Size of each chunk of a multipart upload. Files bigger than SIZE are automatically uploaded as multithreaded\- multipart, smaller files are uploaded using the traditional method. SIZE is in Mega\-Bytes, default chunk size is 15MB, minimum allowed chunk size is 5MB, maximum is 5GB. .TP \fB\-\-list\-md5\fR Include MD5 sums in bucket listings (only for 'ls' command). .TP \fB\-H\fR, \fB\-\-human\-readable\-sizes\fR Print sizes in human readable form (eg 1kB instead of 1234). .TP \fB\-\-ws\-index\fR=WEBSITE_INDEX Name of index\-document (only for [ws\-create] command) .TP \fB\-\-ws\-error\fR=WEBSITE_ERROR Name of error\-document (only for [ws\-create] command) .TP \fB\-\-expiry\-date\fR=EXPIRY_DATE Indicates when the expiration rule takes effect. (only for [expire] command) .TP \fB\-\-expiry\-days\fR=EXPIRY_DAYS Indicates the number of days after object creation the expiration rule takes effect. (only for [expire] command) .TP \fB\-\-expiry\-prefix\fR=EXPIRY_PREFIX Identifying one or more objects with the prefix to which the expiration rule applies. (only for [expire] command) .TP \fB\-\-progress\fR Display progress meter (default on TTY). .TP \fB\-\-no\-progress\fR Don't display progress meter (default on non-TTY). .TP \fB\-\-stats\fR Give some file-transfer stats. .TP \fB\-\-enable\fR Enable given CloudFront distribution (only for [cfmodify] command) .TP \fB\-\-disable\fR Enable given CloudFront distribution (only for [cfmodify] command) .TP \fB\-\-cf\-invalidate\fR Invalidate the uploaded filed in CloudFront. Also see [cfinval] command. .TP \fB\-\-cf\-invalidate\-default\-index\fR When using Custom Origin and S3 static website, invalidate the default index file. .TP \fB\-\-cf\-no\-invalidate\-default\-index\-root\fR When using Custom Origin and S3 static website, don't invalidate the path to the default index file. .TP \fB\-\-cf\-add\-cname\fR=CNAME Add given CNAME to a CloudFront distribution (only for [cfcreate] and [cfmodify] commands) .TP \fB\-\-cf\-remove\-cname\fR=CNAME Remove given CNAME from a CloudFront distribution (only for [cfmodify] command) .TP \fB\-\-cf\-comment\fR=COMMENT Set COMMENT for a given CloudFront distribution (only for [cfcreate] and [cfmodify] commands) .TP \fB\-\-cf\-default\-root\-object\fR=DEFAULT_ROOT_OBJECT Set the default root object to return when no object is specified in the URL. Use a relative path, i.e. default/index.html instead of /default/index.html or s3://bucket/default/index.html (only for [cfcreate] and [cfmodify] commands) .TP \fB\-v\fR, \fB\-\-verbose\fR Enable verbose output. .TP \fB\-d\fR, \fB\-\-debug\fR Enable debug output. .TP \fB\-\-version\fR Show s3cmd version (1.6.1) and exit. 
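.PP
As a rough illustration of the multipart options above (file and bucket names are
placeholders): with the default 15MB chunk size a 1GB file is uploaded in 69 parts,
whereas
.nf
s3cmd put \-\-multipart\-chunk\-size\-mb=100 backup.tar s3://test\-bucket/
.fi
uploads the same 1GB file in 11 parts.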
.TP \fB\-F\fR, \fB\-\-follow\-symlinks\fR Follow symbolic links as if they are regular files .TP \fB\-\-cache\-file\fR=FILE Cache FILE containing local source MD5 values .TP \fB\-q\fR, \fB\-\-quiet\fR Silence output on stdout .TP \fB\-\-ca\-certs\fR=CA_CERTS_FILE Path to SSL CA certificate FILE (instead of system default) .TP \fB\-\-check\-certificate\fR Check SSL certificate validity .TP \fB\-\-no\-check\-certificate\fR Do not check SSL certificate validity .TP \fB\-\-check\-hostname\fR Check SSL certificate hostname validity .TP \fB\-\-no\-check\-hostname\fR Do not check SSL certificate hostname validity .TP \fB\-\-signature\-v2\fR Use AWS Signature version 2 instead of newer signature methods. Helpful for S3\-like systems that don't have AWS Signature v4 yet. .TP \fB\-\-limit\-rate\fR=LIMITRATE Limit the upload or download speed to amount bytes per second. Amount may be expressed in bytes, kilobytes with the k suffix, or megabytes with the m suffix .TP \fB\-\-requester\-pays\fR Set the REQUESTER PAYS flag for operations .TP \fB\-l\fR, \fB\-\-long\-listing\fR Produce long listing [ls] .TP \fB\-\-stop\-on\-error\fR stop if error in transfer .TP \fB\-\-content\-disposition\fR=CONTENT_DISPOSITION Provide a Content\-Disposition for signed URLs, e.g., "inline; filename=myvideo.mp4" .TP \fB\-\-content\-type\fR=CONTENT_TYPE Provide a Content\-Type for signed URLs, e.g., "video/mp4" .SH EXAMPLES One of the most powerful commands of \fIs3cmd\fR is \fBs3cmd sync\fR used for synchronising complete directory trees to or from remote S3 storage. To some extent \fBs3cmd put\fR and \fBs3cmd get\fR share a similar behaviour with \fBsync\fR. .PP Basic usage common in backup scenarios is as simple as: .nf s3cmd sync /local/path/ s3://test\-bucket/backup/ .fi .PP This command will find all files under /local/path directory and copy them to corresponding paths under s3://test\-bucket/backup on the remote side. For example: .nf /local/path/\fBfile1.ext\fR \-> s3://bucket/backup/\fBfile1.ext\fR /local/path/\fBdir123/file2.bin\fR \-> s3://bucket/backup/\fBdir123/file2.bin\fR .fi .PP However if the local path doesn't end with a slash the last directory's name is used on the remote side as well. Compare these with the previous example: .nf s3cmd sync /local/path s3://test\-bucket/backup/ .fi will sync: .nf /local/\fBpath/file1.ext\fR \-> s3://bucket/backup/\fBpath/file1.ext\fR /local/\fBpath/dir123/file2.bin\fR \-> s3://bucket/backup/\fBpath/dir123/file2.bin\fR .fi .PP To retrieve the files back from S3 use inverted syntax: .nf s3cmd sync s3://test\-bucket/backup/ ~/restore/ .fi that will download files: .nf s3://bucket/backup/\fBfile1.ext\fR \-> ~/restore/\fBfile1.ext\fR s3://bucket/backup/\fBdir123/file2.bin\fR \-> ~/restore/\fBdir123/file2.bin\fR .fi .PP Without the trailing slash on source the behaviour is similar to what has been demonstrated with upload: .nf s3cmd sync s3://test\-bucket/backup ~/restore/ .fi will download the files as: .nf s3://bucket/\fBbackup/file1.ext\fR \-> ~/restore/\fBbackup/file1.ext\fR s3://bucket/\fBbackup/dir123/file2.bin\fR \-> ~/restore/\fBbackup/dir123/file2.bin\fR .fi .PP All source file names, the bold ones above, are matched against \fBexclude\fR rules and those that match are then re\-checked against \fBinclude\fR rules to see whether they should be excluded or kept in the source list. .PP For the purpose of \fB\-\-exclude\fR and \fB\-\-include\fR matching only the bold file names above are used. 
For instance only \fBpath/file1.ext\fR is tested against the patterns, not \fI/local/\fBpath/file1.ext\fR .PP Both \fB\-\-exclude\fR and \fB\-\-include\fR work with shell\-style wildcards (a.k.a. GLOB). For a greater flexibility s3cmd provides Regular\-expression versions of the two exclude options named \fB\-\-rexclude\fR and \fB\-\-rinclude\fR. The options with ...\fB\-from\fR suffix (eg \-\-rinclude\-from) expect a filename as an argument. Each line of such a file is treated as one pattern. .PP There is only one set of patterns built from all \fB\-\-(r)exclude(\-from)\fR options and similarly for include variant. Any file excluded with eg \-\-exclude can be put back with a pattern found in \-\-rinclude\-from list. .PP Run s3cmd with \fB\-\-dry\-run\fR to verify that your rules work as expected. Use together with \fB\-\-debug\fR get detailed information about matching file names against exclude and include rules. .PP For example to exclude all files with ".jpg" extension except those beginning with a number use: .PP \-\-exclude '*.jpg' \-\-rinclude '[0\-9].*\.jpg' .PP To exclude all files except "*.jpg" extension, use: .PP \-\-exclude '*' \-\-include '*.jpg' .PP To exclude local directory 'somedir', be sure to use a trailing forward slash, as such: .PP \-\-exclude 'somedir/' .PP .SH SEE ALSO For the most up to date list of options run: .B s3cmd \-\-help .br For more info about usage, examples and other related info visit project homepage at: .B http://s3tools.org .SH AUTHOR Written by Michal Ludvig and contributors .SH CONTACT, SUPPORT Preferred way to get support is our mailing list: .br .I s3tools\-general@lists.sourceforge.net .br or visit the project homepage: .br .B http://s3tools.org .SH REPORTING BUGS Report bugs to .I s3tools\-bugs@lists.sourceforge.net .SH COPYRIGHT Copyright \(co 2007\-2015 TGRMN Software \- http://www.tgrmn.com \- and contributors .br .SH LICENSE This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. .br s3cmd-1.6.1/s3cmd0000775000175000017500000041750112647745544014631 0ustar mdomschmdomsch00000000000000#!/usr/bin/env python2 # -*- coding: utf-8 -*- ## -------------------------------------------------------------------- ## s3cmd - S3 client ## ## Authors : Michal Ludvig and contributors ## Copyright : TGRMN Software - http://www.tgrmn.com - and contributors ## Website : http://s3tools.org ## License : GPL Version 2 ## -------------------------------------------------------------------- ## This program is free software; you can redistribute it and/or modify ## it under the terms of the GNU General Public License as published by ## the Free Software Foundation; either version 2 of the License, or ## (at your option) any later version. ## This program is distributed in the hope that it will be useful, ## but WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ## GNU General Public License for more details. 
## -------------------------------------------------------------------- import sys if float("%d.%d" %(sys.version_info[0], sys.version_info[1])) < 2.6: sys.stderr.write(u"ERROR: Python 2.6 or higher required, sorry.\n") sys.exit(EX_OSFILE) import logging import time import os import re import errno import glob import traceback import codecs import locale import subprocess import htmlentitydefs import socket import shutil import tempfile from copy import copy from optparse import OptionParser, Option, OptionValueError, IndentedHelpFormatter from logging import debug, info, warning, error from distutils.spawn import find_executable from ssl import SSLError def output(message): sys.stdout.write(message + "\n") sys.stdout.flush() def check_args_type(args, type, verbose_type): """NOTE: This function looks like to not be used.""" for arg in args: if S3Uri(arg).type != type: raise ParameterError("Expecting %s instead of '%s'" % (verbose_type, arg)) def cmd_du(args): s3 = S3(Config()) if len(args) > 0: uri = S3Uri(args[0]) if uri.type == "s3" and uri.has_bucket(): subcmd_bucket_usage(s3, uri) return EX_OK subcmd_bucket_usage_all(s3) return EX_OK def subcmd_bucket_usage_all(s3): """ Returns: sum of bucket sizes as integer Raises: S3Error """ response = s3.list_all_buckets() buckets_size = 0 for bucket in response["list"]: size = subcmd_bucket_usage(s3, S3Uri("s3://" + bucket["Name"])) if size != None: buckets_size += size total_size, size_coeff = formatSize(buckets_size, Config().human_readable_sizes) total_size_str = str(total_size) + size_coeff output(u"".rjust(8, "-")) output(u"%s Total" % (total_size_str.ljust(8))) return size def subcmd_bucket_usage(s3, uri): """ Returns: bucket size as integer Raises: S3Error """ bucket_size = 0 object_count = 0 extra_info = u'' try: for _, objects in s3.bucket_list_streaming(uri.bucket(), prefix=uri.object(), recursive=True): for obj in objects: bucket_size += int(obj["Size"]) object_count += 1 except S3Error, e: if S3.codes.has_key(e.info["Code"]): error(S3.codes[e.info["Code"]] % bucket) raise except KeyboardInterrupt, e: extra_info = u' [interrupted]' total_size, size_coeff = formatSize(bucket_size, Config().human_readable_sizes) total_size_str = str(total_size) + size_coeff output(u"%s %s objects %s%s" % (total_size_str.ljust(8), object_count, uri, extra_info)) return bucket_size def cmd_ls(args): s3 = S3(Config()) if len(args) > 0: uri = S3Uri(args[0]) if uri.type == "s3" and uri.has_bucket(): subcmd_bucket_list(s3, uri) return EX_OK # If not a s3 type uri or no bucket was provided, list all the buckets subcmd_all_buckets_list(s3) return EX_OK def subcmd_all_buckets_list(s3): response = s3.list_all_buckets() for bucket in sorted(response["list"], key=lambda b:b["Name"]): output(u"%s s3://%s" % (formatDateTime(bucket["CreationDate"]), bucket["Name"])) def cmd_all_buckets_list_all_content(args): s3 = S3(Config()) response = s3.list_all_buckets() for bucket in response["list"]: subcmd_bucket_list(s3, S3Uri("s3://" + bucket["Name"])) output(u"") return EX_OK def subcmd_bucket_list(s3, uri): bucket = uri.bucket() prefix = uri.object() debug(u"Bucket 's3://%s':" % bucket) if prefix.endswith('*'): prefix = prefix[:-1] try: response = s3.bucket_list(bucket, prefix = prefix) except S3Error, e: if S3.codes.has_key(e.info["Code"]): error(S3.codes[e.info["Code"]] % bucket) raise if cfg.long_listing: format_string = u"%(timestamp)16s %(size)9s%(coeff)1s %(md5)32s %(storageclass)s %(uri)s" elif cfg.list_md5: format_string = u"%(timestamp)16s %(size)9s%(coeff)1s 
%(md5)32s %(uri)s" else: format_string = u"%(timestamp)16s %(size)9s%(coeff)1s %(uri)s" for prefix in response['common_prefixes']: output(format_string % { "timestamp": "", "size": "DIR", "coeff": "", "md5": "", "storageclass": "", "uri": uri.compose_uri(bucket, prefix["Prefix"])}) for object in response["list"]: md5 = object['ETag'].strip('"\'') storageclass = object.get('StorageClass','') if cfg.list_md5: if '-' in md5: # need to get md5 from the object object_uri = uri.compose_uri(bucket, object["Key"]) info_response = s3.object_info(S3Uri(object_uri)) try: md5 = info_response['s3cmd-attrs']['md5'] except KeyError: pass size, size_coeff = formatSize(object["Size"], Config().human_readable_sizes) output(format_string % { "timestamp": formatDateTime(object["LastModified"]), "size" : str(size), "coeff": size_coeff, "md5" : md5, "storageclass" : storageclass, "uri": uri.compose_uri(bucket, object["Key"]), }) def cmd_bucket_create(args): s3 = S3(Config()) for arg in args: uri = S3Uri(arg) if not uri.type == "s3" or not uri.has_bucket() or uri.has_object(): raise ParameterError("Expecting S3 URI with just the bucket name set instead of '%s'" % arg) try: response = s3.bucket_create(uri.bucket(), cfg.bucket_location) output(u"Bucket '%s' created" % uri.uri()) except S3Error, e: if S3.codes.has_key(e.info["Code"]): error(S3.codes[e.info["Code"]] % uri.bucket()) raise return EX_OK def cmd_website_info(args): s3 = S3(Config()) for arg in args: uri = S3Uri(arg) if not uri.type == "s3" or not uri.has_bucket() or uri.has_object(): raise ParameterError("Expecting S3 URI with just the bucket name set instead of '%s'" % arg) try: response = s3.website_info(uri, cfg.bucket_location) if response: output(u"Bucket %s: Website configuration" % uri.uri()) output(u"Website endpoint: %s" % response['website_endpoint']) output(u"Index document: %s" % response['index_document']) output(u"Error document: %s" % response['error_document']) else: output(u"Bucket %s: Unable to receive website configuration." % (uri.uri())) except S3Error, e: if S3.codes.has_key(e.info["Code"]): error(S3.codes[e.info["Code"]] % uri.bucket()) raise return EX_OK def cmd_website_create(args): s3 = S3(Config()) for arg in args: uri = S3Uri(arg) if not uri.type == "s3" or not uri.has_bucket() or uri.has_object(): raise ParameterError("Expecting S3 URI with just the bucket name set instead of '%s'" % arg) try: response = s3.website_create(uri, cfg.bucket_location) output(u"Bucket '%s': website configuration created." % (uri.uri())) except S3Error, e: if S3.codes.has_key(e.info["Code"]): error(S3.codes[e.info["Code"]] % uri.bucket()) raise return EX_OK def cmd_website_delete(args): s3 = S3(Config()) for arg in args: uri = S3Uri(arg) if not uri.type == "s3" or not uri.has_bucket() or uri.has_object(): raise ParameterError("Expecting S3 URI with just the bucket name set instead of '%s'" % arg) try: response = s3.website_delete(uri, cfg.bucket_location) output(u"Bucket '%s': website configuration deleted." % (uri.uri())) except S3Error, e: if S3.codes.has_key(e.info["Code"]): error(S3.codes[e.info["Code"]] % uri.bucket()) raise return EX_OK def cmd_expiration_set(args): s3 = S3(Config()) for arg in args: uri = S3Uri(arg) if not uri.type == "s3" or not uri.has_bucket() or uri.has_object(): raise ParameterError("Expecting S3 URI with just the bucket name set instead of '%s'" % arg) try: response = s3.expiration_set(uri, cfg.bucket_location) if response["status"] is 200: output(u"Bucket '%s': expiration configuration is set." 
% (uri.uri())) elif response["status"] is 204: output(u"Bucket '%s': expiration configuration is deleted." % (uri.uri())) except S3Error, e: if S3.codes.has_key(e.info["Code"]): error(S3.codes[e.info["Code"]] % uri.bucket()) raise return EX_OK def cmd_bucket_delete(args): def _bucket_delete_one(uri, retry=True): try: response = s3.bucket_delete(uri.bucket()) output(u"Bucket '%s' removed" % uri.uri()) except S3Error, e: if e.info['Code'] == 'NoSuchBucket': if cfg.force: return EX_OK else: raise if e.info['Code'] == 'BucketNotEmpty' and retry and (cfg.force or cfg.recursive): warning(u"Bucket is not empty. Removing all the objects from it first. This may take some time...") rc = subcmd_batch_del(uri_str = uri.uri()) if rc == EX_OK: return _bucket_delete_one(uri, False) else: output(u"Bucket was not removed") elif S3.codes.has_key(e.info["Code"]): error(S3.codes[e.info["Code"]] % uri.bucket()) raise return EX_OK s3 = S3(Config()) for arg in args: uri = S3Uri(arg) if not uri.type == "s3" or not uri.has_bucket() or uri.has_object(): raise ParameterError("Expecting S3 URI with just the bucket name set instead of '%s'" % arg) rc = _bucket_delete_one(uri) if rc != EX_OK: return rc return EX_OK def cmd_object_put(args): cfg = Config() s3 = S3(cfg) if len(args) == 0: raise ParameterError("Nothing to upload. Expecting a local file or directory and a S3 URI destination.") ## Normalize URI to convert s3://bkt to s3://bkt/ (trailing slash) destination_base_uri = S3Uri(args.pop()) if destination_base_uri.type != 's3': raise ParameterError("Destination must be S3Uri. Got: %s" % destination_base_uri) destination_base = destination_base_uri.uri() if len(args) == 0: raise ParameterError("Nothing to upload. Expecting a local file or directory.") local_list, single_file_local, exclude_list, total_size_local = fetch_local_list(args, is_src = True) local_count = len(local_list) info(u"Summary: %d local files to upload" % local_count) if local_count == 0: raise ParameterError("Nothing to upload.") if local_count > 0: if not single_file_local and '-' in local_list.keys(): raise ParameterError("Cannot specify multiple local files if uploading from '-' (ie stdin)") elif single_file_local and local_list.keys()[0] == "-" and destination_base.endswith("/"): raise ParameterError("Destination S3 URI must not end with '/' when uploading from stdin.") elif not destination_base.endswith("/"): if not single_file_local: raise ParameterError("Destination S3 URI must end with '/' (ie must refer to a directory on the remote side).") local_list[local_list.keys()[0]]['remote_uri'] = destination_base else: for key in local_list: local_list[key]['remote_uri'] = destination_base + key if cfg.dry_run: for key in exclude_list: output(u"exclude: %s" % key) for key in local_list: if key != "-": nicekey = local_list[key]['full_name'] else: nicekey = "" output(u"upload: '%s' -> '%s'" % (nicekey, local_list[key]['remote_uri'])) warning(u"Exiting now because of --dry-run") return EX_OK seq = 0 ret = EX_OK for key in local_list: seq += 1 uri_final = S3Uri(local_list[key]['remote_uri']) extra_headers = copy(cfg.extra_headers) full_name_orig = local_list[key]['full_name'] full_name = full_name_orig seq_label = "[%d of %d]" % (seq, local_count) if Config().encrypt: gpg_exitcode, full_name, extra_headers["x-amz-meta-s3tools-gpgenc"] = gpg_encrypt(full_name_orig) attr_header = _build_attr_header(local_list, key) debug(u"attr_header: %s" % attr_header) extra_headers.update(attr_header) try: response = s3.object_put(full_name, uri_final, 
extra_headers, extra_label = seq_label) except S3UploadError, exc: error(u"Upload of '%s' failed too many times (Last reason: %s)" % (full_name_orig, exc)) if cfg.stop_on_error: ret = EX_DATAERR error(u"Exiting now because of --stop-on-error") break ret = EX_PARTIAL continue except InvalidFileError, exc: error(u"Upload of '%s' is not possible (Reason: %s)" % (full_name_orig, exc)) ret = EX_PARTIAL if cfg.stop_on_error: ret = EX_OSFILE error(u"Exiting now because of --stop-on-error") break continue if response is not None: speed_fmt = formatSize(response["speed"], human_readable = True, floating_point = True) if not Config().progress_meter: if full_name_orig != "-": nicekey = full_name_orig else: nicekey = "" output(u"upload: '%s' -> '%s' (%d bytes in %0.1f seconds, %0.2f %sB/s) %s" % (nicekey, uri_final, response["size"], response["elapsed"], speed_fmt[0], speed_fmt[1], seq_label)) if Config().acl_public: output(u"Public URL of the object is: %s" % (uri_final.public_url())) if Config().encrypt and full_name != full_name_orig: debug(u"Removing temporary encrypted file: %s" % full_name) os.remove(deunicodise(full_name)) return ret def cmd_object_get(args): cfg = Config() s3 = S3(cfg) ## Check arguments: ## if not --recursive: ## - first N arguments must be S3Uri ## - if the last one is S3 make current dir the destination_base ## - if the last one is a directory: ## - take all 'basenames' of the remote objects and ## make the destination name be 'destination_base'+'basename' ## - if the last one is a file or not existing: ## - if the number of sources (N, above) == 1 treat it ## as a filename and save the object there. ## - if there's more sources -> Error ## if --recursive: ## - first N arguments must be S3Uri ## - for each Uri get a list of remote objects with that Uri as a prefix ## - apply exclude/include rules ## - each list item will have MD5sum, Timestamp and pointer to S3Uri ## used as a prefix. ## - the last arg may be '-' (stdout) ## - the last arg may be a local directory - destination_base ## - if the last one is S3 make current dir the destination_base ## - if the last one doesn't exist check remote list: ## - if there is only one item and its_prefix==its_name ## download that item to the name given in last arg. ## - if there are more remote items use the last arg as a destination_base ## and try to create the directory (incl. all parents). ## ## In both cases we end up with a list mapping remote object names (keys) to local file names. ## Each item will be a dict with the following attributes # {'remote_uri', 'local_filename'} download_list = [] if len(args) == 0: raise ParameterError("Nothing to download. Expecting S3 URI.") if S3Uri(args[-1]).type == 'file': destination_base = args.pop() else: destination_base = "." if len(args) == 0: raise ParameterError("Nothing to download. Expecting S3 URI.") remote_list, exclude_list, remote_total_size = fetch_remote_list(args, require_attribs = False) remote_count = len(remote_list) info(u"Summary: %d remote files to download" % remote_count) if remote_count > 0: if destination_base == "-": ## stdout is ok for multiple remote files! 
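        ## Illustration (hypothetical invocation, not taken from upstream): with the
        ## destination given as "-", e.g.
        ##     s3cmd get s3://bucket/a.txt s3://bucket/b.txt -
        ## every selected object gets local_filename "-" and is later written to
        ## sys.__stdout__ in sequence by the download loop further below.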
for key in remote_list: remote_list[key]['local_filename'] = "-" elif not os.path.isdir(deunicodise(destination_base)): ## We were either given a file name (existing or not) if remote_count > 1: raise ParameterError("Destination must be a directory or stdout when downloading multiple sources.") remote_list[remote_list.keys()[0]]['local_filename'] = destination_base elif os.path.isdir(deunicodise(destination_base)): if destination_base[-1] != os.path.sep: destination_base += os.path.sep for key in remote_list: local_filename = destination_base + key if os.path.sep != "/": local_filename = os.path.sep.join(local_filename.split("/")) remote_list[key]['local_filename'] = local_filename else: raise InternalError("WTF? Is it a dir or not? -- %s" % destination_base) if cfg.dry_run: for key in exclude_list: output(u"exclude: %s" % key) for key in remote_list: output(u"download: '%s' -> '%s'" % (remote_list[key]['object_uri_str'], remote_list[key]['local_filename'])) warning(u"Exiting now because of --dry-run") return EX_OK seq = 0 ret = EX_OK for key in remote_list: seq += 1 item = remote_list[key] uri = S3Uri(item['object_uri_str']) ## Encode / Decode destination with "replace" to make sure it's compatible with current encoding destination = unicodise_safe(item['local_filename']) seq_label = "[%d of %d]" % (seq, remote_count) start_position = 0 if destination == "-": ## stdout dst_stream = sys.__stdout__ file_exists = True else: ## File try: file_exists = os.path.exists(deunicodise(destination)) try: dst_stream = open(deunicodise(destination), "ab") except IOError, e: if e.errno == errno.ENOENT: basename = destination[:destination.rindex(os.path.sep)] info(u"Creating directory: %s" % basename) os.makedirs(deunicodise(basename)) dst_stream = open(deunicodise(destination), "ab") else: raise if file_exists: if Config().get_continue: start_position = dst_stream.tell() elif Config().force: start_position = 0L dst_stream.seek(0L) dst_stream.truncate() elif Config().skip_existing: info(u"Skipping over existing file: %s" % (destination)) continue else: dst_stream.close() raise ParameterError(u"File %s already exists. Use either of --force / --continue / --skip-existing or give it a new name." % destination) except IOError, e: error(u"Skipping %s: %s" % (destination, e.strerror)) continue try: response = s3.object_get(uri, dst_stream, destination, start_position = start_position, extra_label = seq_label) except S3DownloadError, e: error(u"%s: Skipping that file. This is usually a transient error, please try again later." % e) if not file_exists: # Delete, only if file didn't exist before! debug(u"object_get failed for '%s', deleting..." % (destination,)) os.unlink(deunicodise(destination)) ret = EX_PARTIAL if cfg.stop_on_error: ret = EX_DATAERR break continue except S3Error, e: if not file_exists: # Delete, only if file didn't exist before! debug(u"object_get failed for '%s', deleting..." 
% (destination,)) os.unlink(deunicodise(destination)) raise if response["headers"].has_key("x-amz-meta-s3tools-gpgenc"): gpg_decrypt(destination, response["headers"]["x-amz-meta-s3tools-gpgenc"]) response["size"] = os.stat(deunicodise(destination))[6] if response["headers"].has_key("last-modified") and destination != "-": last_modified = time.mktime(time.strptime(response["headers"]["last-modified"], "%a, %d %b %Y %H:%M:%S GMT")) os.utime(deunicodise(destination), (last_modified, last_modified)) debug("set mtime to %s" % last_modified) if not Config().progress_meter and destination != "-": speed_fmt = formatSize(response["speed"], human_readable = True, floating_point = True) output(u"download: '%s' -> '%s' (%d bytes in %0.1f seconds, %0.2f %sB/s)" % (uri, destination, response["size"], response["elapsed"], speed_fmt[0], speed_fmt[1])) if Config().delete_after_fetch: s3.object_delete(uri) output(u"File '%s' removed after fetch" % (uri)) return EX_OK def cmd_object_del(args): recursive = Config().recursive for uri_str in args: uri = S3Uri(uri_str) if uri.type != "s3": raise ParameterError("Expecting S3 URI instead of '%s'" % uri_str) if not uri.has_object(): if recursive and not Config().force: raise ParameterError("Please use --force to delete ALL contents of %s" % uri_str) elif not recursive: raise ParameterError("File name required, not only the bucket name. Alternatively use --recursive") if not recursive: rc = subcmd_object_del_uri(uri_str) elif Config().exclude or cfg.max_delete > 0: # subcmd_batch_del_iterative does not support file exclusion and can't # accurately know how many total files will be deleted, so revert to batch delete. rc = subcmd_batch_del(uri_str = uri_str) else: rc = subcmd_batch_del_iterative(uri_str = uri_str) if not rc: return rc return EX_OK def subcmd_batch_del_iterative(uri_str = None, bucket = None): """ Streaming version of batch deletion (doesn't realize whole list in memory before deleting). 
Differences from subcmd_batch_del: - Does not obey --exclude directives or obey cfg.max_delete (use subcmd_batch_del in those cases) """ if bucket and uri_str: raise ValueError("Pass only one of uri_str or bucket") if bucket: # bucket specified uri_str = "s3://%s" % bucket s3 = S3(cfg) uri = S3Uri(uri_str) bucket = uri.bucket() deleted_bytes = deleted_count = 0 for _, to_delete in s3.bucket_list_streaming(bucket, prefix=uri.object(), recursive=True): if not to_delete: continue if not cfg.dry_run: response = s3.object_batch_delete_uri_strs([uri.compose_uri(bucket, item['Key']) for item in to_delete]) deleted_bytes += sum(int(item["Size"]) for item in to_delete) deleted_count += len(to_delete) output('\n'.join(u"delete: '%s'" % uri.compose_uri(bucket, p['Key']) for p in to_delete)) if deleted_count: # display summary data of deleted files if cfg.stats: stats_info = StatsInfo() stats_info.files_deleted = deleted_count stats_info.size_deleted = deleted_bytes output(stats_info.format_output()) else: total_size, size_coeff = formatSize(deleted_bytes, Config().human_readable_sizes) total_size_str = str(total_size) + size_coeff info(u"Deleted %s objects (%s) from %s" % (deleted_count, total_size_str, uri)) else: warning(u"Remote list is empty.") return EX_OK def subcmd_batch_del(uri_str = None, bucket = None, remote_list = None): """ Returns: EX_OK Raises: ValueError """ def _batch_del(remote_list): s3 = S3(cfg) to_delete = remote_list[:1000] remote_list = remote_list[1000:] while len(to_delete): debug(u"Batch delete %d, remaining %d" % (len(to_delete), len(remote_list))) if not cfg.dry_run: response = s3.object_batch_delete(to_delete) output('\n'.join((u"delete: '%s'" % to_delete[p]['object_uri_str']) for p in to_delete)) to_delete = remote_list[:1000] remote_list = remote_list[1000:] if remote_list is not None and len(remote_list) == 0: return False if len([item for item in [uri_str, bucket, remote_list] if item]) != 1: raise ValueError("One and only one of 'uri_str', 'bucket', 'remote_list' can be specified.") if bucket: # bucket specified uri_str = "s3://%s" % bucket if remote_list is None: # uri_str specified remote_list, exclude_list, remote_total_size = fetch_remote_list(uri_str, require_attribs = False) if len(remote_list) == 0: warning(u"Remote list is empty.") return EX_OK if cfg.max_delete > 0 and len(remote_list) > cfg.max_delete: warning(u"delete: maximum requested number of deletes would be exceeded, none performed.") return EX_OK _batch_del(remote_list) if cfg.dry_run: warning(u"Exiting now because of --dry-run") return EX_OK def subcmd_object_del_uri(uri_str, recursive = None): """ Returns: True if XXX, False if XXX Raises: ValueError """ s3 = S3(cfg) if recursive is None: recursive = cfg.recursive remote_list, exclude_list, remote_total_size = fetch_remote_list(uri_str, require_attribs = False, recursive = recursive) remote_count = len(remote_list) info(u"Summary: %d remote files to delete" % remote_count) if cfg.max_delete > 0 and remote_count > cfg.max_delete: warning(u"delete: maximum requested number of deletes would be exceeded, none performed.") return False if cfg.dry_run: for key in exclude_list: output(u"exclude: %s" % key) for key in remote_list: output(u"delete: %s" % remote_list[key]['object_uri_str']) warning(u"Exiting now because of --dry-run") return True for key in remote_list: item = remote_list[key] response = s3.object_delete(S3Uri(item['object_uri_str'])) output(u"delete: '%s'" % item['object_uri_str']) return True def cmd_object_restore(args): s3 = S3(cfg) if 
cfg.restore_days < 1: raise ParameterError("You must restore a file for 1 or more days") remote_list, exclude_list, remote_total_size = fetch_remote_list(args, require_attribs = False, recursive = cfg.recursive) remote_count = len(remote_list) info(u"Summary: Restoring %d remote files for %d days" % (remote_count, cfg.restore_days)) if cfg.dry_run: for key in exclude_list: output(u"exclude: %s" % key) for key in remote_list: output(u"restore: '%s'" % remote_list[key]['object_uri_str']) warning(u"Exiting now because of --dry-run") return EX_OK for key in remote_list: item = remote_list[key] uri = S3Uri(item['object_uri_str']) if not item['object_uri_str'].endswith("/"): try: response = s3.object_restore(S3Uri(item['object_uri_str'])) output(u"restore: '%s'" % item['object_uri_str']) except S3Error, e: if e.code == "RestoreAlreadyInProgress": warning("%s: %s" % (e.message, item['object_uri_str'])) else: raise e else: debug(u"Skipping directory since only files may be restored") return EX_OK def subcmd_cp_mv(args, process_fce, action_str, message): if action_str != 'modify' and len(args) < 2: raise ParameterError("Expecting two or more S3 URIs for " + action_str) if action_str == 'modify' and len(args) < 1: raise ParameterError("Expecting one or more S3 URIs for " + action_str) if action_str != 'modify': dst_base_uri = S3Uri(args.pop()) else: dst_base_uri = S3Uri(args[-1]) scoreboard = ExitScoreboard() if dst_base_uri.type != "s3": raise ParameterError("Destination must be S3 URI. To download a file use 'get' or 'sync'.") destination_base = dst_base_uri.uri() remote_list, exclude_list, remote_total_size = fetch_remote_list(args, require_attribs = False) remote_count = len(remote_list) info(u"Summary: %d remote files to %s" % (remote_count, action_str)) if cfg.recursive: if not destination_base.endswith("/"): destination_base += "/" for key in remote_list: remote_list[key]['dest_name'] = destination_base + key else: for key in remote_list: if destination_base.endswith("/"): remote_list[key]['dest_name'] = destination_base + key else: remote_list[key]['dest_name'] = destination_base if cfg.dry_run: for key in exclude_list: output(u"exclude: %s" % key) for key in remote_list: output(u"%s: '%s' -> '%s'" % (action_str, remote_list[key]['object_uri_str'], remote_list[key]['dest_name'])) warning(u"Exiting now because of --dry-run") return EX_OK seq = 0 for key in remote_list: seq += 1 seq_label = "[%d of %d]" % (seq, remote_count) item = remote_list[key] src_uri = S3Uri(item['object_uri_str']) dst_uri = S3Uri(item['dest_name']) extra_headers = copy(cfg.extra_headers) try: response = process_fce(src_uri, dst_uri, extra_headers) output(message % { "src" : src_uri, "dst" : dst_uri }) if Config().acl_public: info(u"Public URL is: %s" % dst_uri.public_url()) scoreboard.success() except S3Error, e: if e.code == "NoSuchKey": scoreboard.notfound() warning(u"Key not found %s" % item['object_uri_str']) else: scoreboard.failed() if cfg.stop_on_error: break return scoreboard.rc() def cmd_cp(args): s3 = S3(Config()) return subcmd_cp_mv(args, s3.object_copy, "copy", u"remote copy: '%(src)s' -> '%(dst)s'") def cmd_modify(args): s3 = S3(Config()) return subcmd_cp_mv(args, s3.object_modify, "modify", u"modify: '%(src)s'") def cmd_mv(args): s3 = S3(Config()) return subcmd_cp_mv(args, s3.object_move, "move", u"move: '%(src)s' -> '%(dst)s'") def cmd_info(args): s3 = S3(Config()) while (len(args)): uri_arg = args.pop(0) uri = S3Uri(uri_arg) if uri.type != "s3" or not uri.has_bucket(): raise ParameterError("Expecting S3 
URI instead of '%s'" % uri_arg) try: if uri.has_object(): info = s3.object_info(uri) output(u"%s (object):" % uri.uri()) output(u" File size: %s" % info['headers']['content-length']) output(u" Last mod: %s" % info['headers']['last-modified']) output(u" MIME type: %s" % info['headers'].get('content-type', 'none')) output(u" Storage: %s" % info['headers'].get('x-amz-storage-class', 'STANDARD')) md5 = info['headers']['etag'].strip('"\'') try: md5 = info['s3cmd-attrs']['md5'] except KeyError: pass output(u" MD5 sum: %s" % md5) if 'x-amz-server-side-encryption' in info['headers']: output(u" SSE: %s" % info['headers']['x-amz-server-side-encryption']) else: output(u" SSE: none") else: info = s3.bucket_info(uri) output(u"%s (bucket):" % uri.uri()) output(u" Location: %s" % info['bucket-location']) output(u" Payer: %s" % info['requester-pays']) try: expiration = s3.expiration_info(uri, cfg.bucket_location) expiration_desc = "Expiration Rule: " if expiration['prefix'] == "": expiration_desc += "all objects in this bucket " else: expiration_desc += "objects with key prefix '" + expiration['prefix'] + "' " expiration_desc += "will expire in '" if expiration['days']: expiration_desc += expiration['days'] + "' day(s) after creation" elif expiration['date']: expiration_desc += expiration['date'] + "' " output(u" %s" % expiration_desc) except: output(u" Expiration Rule: none") acl = s3.get_acl(uri) acl_grant_list = acl.getGrantList() try: policy = s3.get_policy(uri) output(u" policy: %s" % policy) except: output(u" policy: none") try: cors = s3.get_cors(uri) output(u" cors: %s" % cors) except: output(u" cors: none") for grant in acl_grant_list: output(u" ACL: %s: %s" % (grant['grantee'], grant['permission'])) if acl.isAnonRead(): output(u" URL: %s" % uri.public_url()) if uri.has_object(): for header, value in info['headers'].iteritems(): if header.startswith('x-amz-meta-'): output(u" %s: %s" % (header, value)) except S3Error, e: if S3.codes.has_key(e.info["Code"]): error(S3.codes[e.info["Code"]] % uri.bucket()) raise return EX_OK def filedicts_to_keys(*args): keys = set() for a in args: keys.update(a.keys()) keys = list(keys) keys.sort() return keys def cmd_sync_remote2remote(args): s3 = S3(Config()) # Normalise s3://uri (e.g. 
assert trailing slash) destination_base = S3Uri(args[-1]).uri() destbase_with_source_list = set() for source_arg in args[:-1]: if source_arg.endswith('/'): destbase_with_source_list.add(destination_base) else: destbase_with_source_list.add(os.path.join(destination_base, os.path.basename(source_arg))) stats_info = StatsInfo() src_list, src_exclude_list, remote_total_size = fetch_remote_list(args[:-1], recursive = True, require_attribs = True) dst_list, dst_exclude_list, _ = fetch_remote_list(destbase_with_source_list, recursive = True, require_attribs = True) src_count = len(src_list) orig_src_count = src_count dst_count = len(dst_list) deleted_count = 0 info(u"Found %d source files, %d destination files" % (src_count, dst_count)) src_list, dst_list, update_list, copy_pairs = compare_filelists(src_list, dst_list, src_remote = True, dst_remote = True) src_count = len(src_list) update_count = len(update_list) dst_count = len(dst_list) print(u"Summary: %d source files to copy, %d files at destination to delete" % (src_count + update_count, dst_count)) ### Populate 'target_uri' only if we've got something to sync from src to dst for key in src_list: src_list[key]['target_uri'] = destination_base + key for key in update_list: update_list[key]['target_uri'] = destination_base + key if cfg.dry_run: keys = filedicts_to_keys(src_exclude_list, dst_exclude_list) for key in keys: output(u"exclude: %s" % key) if cfg.delete_removed: for key in dst_list: output(u"delete: '%s'" % dst_list[key]['object_uri_str']) for key in src_list: output(u"remote copy: '%s' -> '%s'" % (src_list[key]['object_uri_str'], src_list[key]['target_uri'])) for key in update_list: output(u"remote copy: '%s' -> '%s'" % (update_list[key]['object_uri_str'], update_list[key]['target_uri'])) warning(u"Exiting now because of --dry-run") return EX_OK # if there are copy pairs, we can't do delete_before, on the chance # we need one of the to-be-deleted files as a copy source. if len(copy_pairs) > 0: cfg.delete_after = True if cfg.delete_removed and orig_src_count == 0 and len(dst_list) and not cfg.force: warning(u"delete: cowardly refusing to delete because no source files were found. 
Use --force to override.") cfg.delete_removed = False # Delete items in destination that are not in source if cfg.delete_removed and not cfg.delete_after: subcmd_batch_del(remote_list = dst_list) deleted_count = len(dst_list) def _upload(src_list, seq, src_count): file_list = src_list.keys() file_list.sort() ret = EX_OK total_nb_files = 0 total_size = 0 for file in file_list: seq += 1 item = src_list[file] src_uri = S3Uri(item['object_uri_str']) dst_uri = S3Uri(item['target_uri']) seq_label = "[%d of %d]" % (seq, src_count) extra_headers = copy(cfg.extra_headers) try: response = s3.object_copy(src_uri, dst_uri, extra_headers) output("remote copy: '%(src)s' -> '%(dst)s'" % { "src" : src_uri, "dst" : dst_uri }) total_nb_files += 1 total_size += item.get(u'size', 0) except S3Error, e: ret = EX_PARTIAL error("File '%(src)s' could not be copied: %(e)s" % { "src" : src_uri, "e" : e }) if cfg.stop_on_error: raise return ret, seq, total_nb_files, total_size # Perform the synchronization of files timestamp_start = time.time() seq = 0 ret, seq, nb_files, size = _upload(src_list, seq, src_count + update_count) total_files_copied = nb_files total_size_copied = size status, seq, nb_files, size = _upload(update_list, seq, src_count + update_count) if ret == EX_OK: ret = status total_files_copied += nb_files total_size_copied += size n_copied, bytes_saved, failed_copy_files = remote_copy(s3, copy_pairs, destination_base) total_files_copied += n_copied total_size_copied += bytes_saved #process files not copied debug("Process files that was not remote copied") failed_copy_count = len (failed_copy_files) for key in failed_copy_files: failed_copy_files[key]['target_uri'] = destination_base + key status, seq, nb_files, size = _upload(failed_copy_files, seq, src_count + update_count + failed_copy_count) if ret == EX_OK: ret = status total_files_copied += nb_files total_size_copied += size # Delete items in destination that are not in source if cfg.delete_removed and cfg.delete_after: subcmd_batch_del(remote_list = dst_list) deleted_count = len(dst_list) stats_info.files = orig_src_count stats_info.size = remote_total_size stats_info.files_copied = total_files_copied stats_info.size_copied = total_size_copied stats_info.files_deleted = deleted_count total_elapsed = max(1.0, time.time() - timestamp_start) outstr = "Done. Copied %d files in %0.1f seconds, %0.2f files/s." 
% (total_files_copied, total_elapsed, seq/total_elapsed) if cfg.stats: outstr += stats_info.format_output() output(outstr) elif seq > 0: output(outstr) else: info(outstr) return ret def cmd_sync_remote2local(args): def _do_deletes(local_list): total_size = 0 if cfg.max_delete > 0 and len(local_list) > cfg.max_delete: warning(u"delete: maximum requested number of deletes would be exceeded, none performed.") return total_size for key in local_list: os.unlink(deunicodise(local_list[key]['full_name'])) output(u"delete: '%s'" % local_list[key]['full_name']) total_size += local_list[key].get(u'size', 0) return len(local_list), total_size s3 = S3(Config()) destination_base = args[-1] source_args = args[:-1] fetch_source_args = args[:-1] if not destination_base.endswith(os.path.sep): if fetch_source_args[0].endswith(u'/') or len(fetch_source_args) > 1: raise ParameterError("Destination must be a directory and end with '/' when downloading multiple sources.") elif fetch_source_args[0].endswith(u'/'): fetch_source_args[0] += u'/' stats_info = StatsInfo() remote_list, src_exclude_list, remote_total_size = fetch_remote_list(fetch_source_args, recursive = True, require_attribs = True) # - The source path is either like "/myPath/my_src_folder" and # the user want to download this single folder and Optionally only delete # things that have been removed inside this folder. For this case, we only # have to look inside destination_base/my_src_folder and not at the root of # destination_base. # - Or like "/myPath/my_src_folder/" and the user want to have the sync # with the content of this folder destbase_with_source_list = set() for source_arg in fetch_source_args: if source_arg.endswith('/'): if destination_base.endswith(os.path.sep): destbase_with_source_list.add(destination_base) else: destbase_with_source_list.add(destination_base + os.path.sep) else: destbase_with_source_list.add(os.path.join(destination_base, os.path.basename(source_arg))) local_list, single_file_local, dst_exclude_list, local_total_size = fetch_local_list(destbase_with_source_list, is_src = False, recursive = True) local_count = len(local_list) remote_count = len(remote_list) orig_remote_count = remote_count info(u"Found %d remote files, %d local files" % (remote_count, local_count)) remote_list, local_list, update_list, copy_pairs = compare_filelists(remote_list, local_list, src_remote = True, dst_remote = False) local_count = len(local_list) remote_count = len(remote_list) update_count = len(update_list) copy_pairs_count = len(copy_pairs) info(u"Summary: %d remote files to download, %d local files to delete, %d local files to hardlink" % (remote_count + update_count, local_count, copy_pairs_count)) def _set_local_filename(remote_list, destination_base, source_args): if len(remote_list) == 0: return if destination_base.endswith(os.path.sep): if not os.path.exists(deunicodise(destination_base)): if not cfg.dry_run: os.makedirs(deunicodise(destination_base)) if not os.path.isdir(deunicodise(destination_base)): raise ParameterError("Destination is not an existing directory") elif len(remote_list) == 1 and \ source_args[0] == remote_list[remote_list.keys()[0]].get(u'object_uri_str', ''): if os.path.isdir(deunicodise(destination_base)): raise ParameterError("Destination already exists and is a directory") remote_list[remote_list.keys()[0]]['local_filename'] = destination_base return if destination_base[-1] != os.path.sep: destination_base += os.path.sep for key in remote_list: local_filename = destination_base + key if os.path.sep != 
"/": local_filename = os.path.sep.join(local_filename.split("/")) remote_list[key]['local_filename'] = local_filename _set_local_filename(remote_list, destination_base, source_args) _set_local_filename(update_list, destination_base, source_args) if cfg.dry_run: keys = filedicts_to_keys(src_exclude_list, dst_exclude_list) for key in keys: output(u"exclude: %s" % key) if cfg.delete_removed: for key in local_list: output(u"delete: '%s'" % local_list[key]['full_name']) for key in remote_list: output(u"download: '%s' -> '%s'" % (remote_list[key]['object_uri_str'], remote_list[key]['local_filename'])) for key in update_list: output(u"download: '%s' -> '%s'" % (update_list[key]['object_uri_str'], update_list[key]['local_filename'])) warning(u"Exiting now because of --dry-run") return EX_OK # if there are copy pairs, we can't do delete_before, on the chance # we need one of the to-be-deleted files as a copy source. if len(copy_pairs) > 0: cfg.delete_after = True if cfg.delete_removed and orig_remote_count == 0 and len(local_list) and not cfg.force: warning(u"delete: cowardly refusing to delete because no source files were found. Use --force to override.") cfg.delete_removed = False if cfg.delete_removed and not cfg.delete_after: deleted_count, deleted_size = _do_deletes(local_list) else: deleted_count, deleted_size = (0, 0) def _download(remote_list, seq, total, total_size, dir_cache): original_umask = os.umask(0); os.umask(original_umask); file_list = remote_list.keys() file_list.sort() ret = EX_OK for file in file_list: seq += 1 item = remote_list[file] uri = S3Uri(item['object_uri_str']) dst_file = item['local_filename'] is_empty_directory = dst_file.endswith('/') seq_label = "[%d of %d]" % (seq, total) dst_dir = unicodise(os.path.dirname(deunicodise(dst_file))) if not dir_cache.has_key(dst_dir): dir_cache[dst_dir] = Utils.mkdir_with_parents(dst_dir) if dir_cache[dst_dir] == False: if cfg.stop_on_error: error(u"Exiting now because of --stop-on-error") raise OSError("Download of '%s' failed (Reason: %s destination directory is not writable)" % (file, dst_dir)) error(u"Download of '%s' failed (Reason: %s destination directory is not writable)" % (file, dst_dir)) ret = EX_PARTIAL continue try: chkptfname = '' if not is_empty_directory: # ignore empty directory at S3: debug(u"dst_file=%s" % dst_file) # create temporary files (of type .s3cmd.XXXX.tmp) in the same directory # for downloading and then rename once downloaded chkptfd, chkptfname = tempfile.mkstemp(".tmp",".s3cmd.",os.path.dirname(deunicodise(dst_file))) chkptfname = unicodise(chkptfname) debug(u"created chkptfname=%s" % chkptfname) dst_stream = os.fdopen(chkptfd, "wb") response = s3.object_get(uri, dst_stream, dst_file, extra_label = seq_label) dst_stream.close() # download completed, rename the file to destination os.rename(deunicodise(chkptfname), deunicodise(dst_file)) debug(u"renamed chkptfname=%s to dst_file=%s" % (chkptfname, dst_file)) except OSError, exc: if (exc.errno == errno.EISDIR or exc.errno == errno.ETXTBSY or exc.errno == errno.EPERM or exc.errno == errno.EACCES or exc.errno == errno.EBUSY or exc.errno == errno.EFBIG or exc.errno == errno.ENAMETOOLONG): if exc.errno == errno.EISDIR: error(u"Download of '%s' failed (Reason: %s is a directory)" % (file, dst_file)) elif exc.errno == errno.ETXTBSY: error(u"Download of '%s' failed (Reason: %s is currently open for execute, cannot be overwritten)" % (file, dst_file)) elif (exc.errno == errno.EPERM or exc.errno == errno.EACCES): error(u"Download of '%s' failed (Reason: %s 
permission denied)" % (file, dst_file)) elif exc.errno == errno.EBUSY: error(u"Download of '%s' failed (Reason: %s is busy)" % (file, dst_file)) elif exc.errno == errno.EFBIG: error(u"Download of '%s' failed (Reason: %s is too big)" % (file, dst_file)) elif exc.errno == errno.ENAMETOOLONG: error(u"Download of '%s' failed (Reason: File Name is too long)" % file) try: # Try to remove the temp file if it exists if chkptfname: os.unlink(deunicodise(chkptfname)) except: pass if cfg.stop_on_error: ret = EX_OSFILE error(u"Exiting now because of --stop-on-error") raise ret = EX_PARTIAL continue elif (exc.errno == errno.ENOSPC or exc.errno == errno.EDQUOT): error(u"Download of '%s' failed (Reason: No space left)" % file) try: # Try to remove the temp file if it exists if chkptfname: os.unlink(deunicodise(chkptfname)) except: pass raise raise except S3DownloadError, exc: error(u"Download of '%s' failed too many times (Last Reason: %s). " "This is usually a transient error, please try again " "later." % (file, exc)) try: os.unlink(deunicodise(chkptfname)) except Exception, sub_exc: warning(u"Error deleting temporary file %s (Reason: %s)", (chkptfname, sub_exc)) if cfg.stop_on_error: ret = EX_DATAERR error(u"Exiting now because of --stop-on-error") raise ret = EX_PARTIAL continue except S3Error, exc: warning(u"Remote file '%s'. S3Error: %s" % (exc.resource, exc)) try: os.unlink(deunicodise(chkptfname)) except Exception, sub_exc: warning(u"Error deleting temporary file %s (Reason: %s)", (chkptfname, sub_exc)) if cfg.stop_on_error: raise ret = EX_PARTIAL continue try: # set permissions on destination file if not is_empty_directory: # a normal file mode = 0777 - original_umask; else: # an empty directory, make them readable/executable mode = 0775 debug(u"mode=%s" % oct(mode)) os.chmod(deunicodise(dst_file), mode); except: raise # because we don't upload empty directories, # we can continue the loop here, we won't be setting stat info. # if we do start to upload empty directories, we'll have to reconsider this. 
if is_empty_directory: continue try: if response.has_key('s3cmd-attrs') and cfg.preserve_attrs: attrs = response['s3cmd-attrs'] if attrs.has_key('mode'): os.chmod(deunicodise(dst_file), int(attrs['mode'])) if attrs.has_key('mtime') or attrs.has_key('atime'): mtime = attrs.has_key('mtime') and int(attrs['mtime']) or int(time.time()) atime = attrs.has_key('atime') and int(attrs['atime']) or int(time.time()) os.utime(deunicodise(dst_file), (atime, mtime)) if attrs.has_key('uid') and attrs.has_key('gid'): uid = int(attrs['uid']) gid = int(attrs['gid']) os.lchown(deunicodise(dst_file),uid,gid) elif response["headers"].has_key("last-modified"): last_modified = time.mktime(time.strptime(response["headers"]["last-modified"], "%a, %d %b %Y %H:%M:%S GMT")) os.utime(deunicodise(dst_file), (last_modified, last_modified)) debug("set mtime to %s" % last_modified) except OSError, e: try: dst_stream.close() os.remove(deunicodise(chkptfname)) except: pass ret = EX_PARTIAL if e.errno == errno.EEXIST: warning(u"%s exists - not overwriting" % dst_file) continue if e.errno in (errno.EPERM, errno.EACCES): warning(u"%s not writable: %s" % (dst_file, e.strerror)) if cfg.stop_on_error: raise e continue raise e except KeyboardInterrupt: try: dst_stream.close() os.remove(deunicodise(chkptfname)) except: pass warning(u"Exiting after keyboard interrupt") return except Exception, e: try: dst_stream.close() os.remove(deunicodise(chkptfname)) except: pass ret = EX_PARTIAL error(u"%s: %s" % (file, e)) if cfg.stop_on_error: raise OSError, e continue # We have to keep repeating this call because # Python 2.4 doesn't support try/except/finally # construction :-( try: dst_stream.close() os.remove(deunicodise(chkptfname)) except: pass speed_fmt = formatSize(response["speed"], human_readable = True, floating_point = True) if not Config().progress_meter: output(u"download: '%s' -> '%s' (%d bytes in %0.1f seconds, %0.2f %sB/s) %s" % (uri, dst_file, response["size"], response["elapsed"], speed_fmt[0], speed_fmt[1], seq_label)) total_size += response["size"] if Config().delete_after_fetch: s3.object_delete(uri) output(u"File '%s' removed after syncing" % (uri)) return ret, seq, total_size size_transferred = 0 total_elapsed = 0.0 timestamp_start = time.time() dir_cache = {} seq = 0 ret, seq, size_transferred = _download(remote_list, seq, remote_count + update_count, size_transferred, dir_cache) status, seq, size_transferred = _download(update_list, seq, remote_count + update_count, size_transferred, dir_cache) if ret == EX_OK: ret = status n_copies, size_copies, failed_copy_list = local_copy(copy_pairs, destination_base) _set_local_filename(failed_copy_list, destination_base, source_args) status, seq, size_transferred = _download(failed_copy_list, seq, len(failed_copy_list) + remote_count + update_count, size_transferred, dir_cache) if ret == EX_OK: ret = status if cfg.delete_removed and cfg.delete_after: deleted_count, deleted_size = _do_deletes(local_list) total_elapsed = max(1.0, time.time() - timestamp_start) speed_fmt = formatSize(size_transferred/total_elapsed, human_readable = True, floating_point = True) stats_info.files = orig_remote_count stats_info.size = remote_total_size stats_info.files_transferred = len(failed_copy_list) + remote_count + update_count stats_info.size_transferred = size_transferred stats_info.files_copied = n_copies stats_info.size_copied = size_copies stats_info.files_deleted = deleted_count stats_info.size_deleted = deleted_size # Only print out the result if any work has been done or # if the user 
asked for verbose output outstr = "Done. Downloaded %d bytes in %0.1f seconds, %0.2f %sB/s." % (size_transferred, total_elapsed, speed_fmt[0], speed_fmt[1]) if cfg.stats: outstr += stats_info.format_output() output(outstr) elif size_transferred > 0: output(outstr) else: info(outstr) return ret def local_copy(copy_pairs, destination_base): # Do NOT hardlink local files by default, that'd be silly # For instance all empty files would become hardlinked together! saved_bytes = 0 failed_copy_list = FileDict() for (src_obj, dst1, relative_file) in copy_pairs: src_file = os.path.join(destination_base, dst1) dst_file = os.path.join(destination_base, relative_file) dst_dir = os.path.dirname(deunicodise(dst_file)) try: if not os.path.isdir(deunicodise(dst_dir)): debug("MKDIR %s" % dst_dir) os.makedirs(deunicodise(dst_dir)) debug(u"Copying %s to %s" % (src_file, dst_file)) shutil.copy2(deunicodise(src_file), deunicodise(dst_file)) saved_bytes += src_obj.get(u'size', 0) except (IOError, OSError), e: warning(u'Unable to copy or hardlink files %s -> %s (Reason: %s)' % (src_file, dst_file, e)) failed_copy_list[relative_file] = src_obj return len(copy_pairs), saved_bytes, failed_copy_list def remote_copy(s3, copy_pairs, destination_base): saved_bytes = 0 failed_copy_list = FileDict() for (src_obj, dst1, dst2) in copy_pairs: debug(u"Remote Copying from %s to %s" % (dst1, dst2)) dst1_uri = S3Uri(destination_base + dst1) dst2_uri = S3Uri(destination_base + dst2) extra_headers = copy(cfg.extra_headers) try: s3.object_copy(dst1_uri, dst2_uri, extra_headers) saved_bytes += src_obj.get(u'size', 0) output(u"remote copy: '%s' -> '%s'" % (dst1, dst2)) except: warning(u"Unable to remote copy files '%s' -> '%s'" % (dst1_uri, dst2_uri)) failed_copy_list[dst2] = src_obj return (len(copy_pairs), saved_bytes, failed_copy_list) def _build_attr_header(local_list, src): attrs = {} if cfg.preserve_attrs: for attr in cfg.preserve_attrs_list: if attr == 'uname': try: val = Utils.getpwuid_username(local_list[src]['uid']) except (KeyError, TypeError): attr = "uid" val = local_list[src].get('uid') if val: warning(u"%s: Owner username not known. Storing UID=%d instead." % (src, val)) elif attr == 'gname': try: val = Utils.getgrgid_grpname(local_list[src].get('gid')) except (KeyError, TypeError): attr = "gid" val = local_list[src].get('gid') if val: warning(u"%s: Owner groupname not known. Storing GID=%d instead." % (src, val)) elif attr != "md5": try: val = getattr(local_list[src]['sr'], 'st_' + attr) except: val = None if val is not None: attrs[attr] = val if 'md5' in cfg.preserve_attrs_list: try: val = local_list.get_md5(src) if val is not None: attrs['md5'] = val except IOError: pass if attrs: result = "" for k in attrs: result += "%s:%s/" % (k, attrs[k]) return {'x-amz-meta-s3cmd-attrs' : result[:-1]} else: return {} def cmd_sync_local2remote(args): def _single_process(source_args): for dest in destinations: ## Normalize URI to convert s3://bkt to s3://bkt/ (trailing slash) destination_base_uri = S3Uri(dest) if destination_base_uri.type != 's3': raise ParameterError("Destination must be S3Uri. 
Got: %s" % destination_base_uri) destination_base = destination_base_uri.uri() return _child(destination_base, source_args) def _parent(source_args): # Now that we've done all the disk I/O to look at the local file system and # calculate the md5 for each file, fork for each destination to upload to them separately # and in parallel child_pids = [] ret = EX_OK for dest in destinations: ## Normalize URI to convert s3://bkt to s3://bkt/ (trailing slash) destination_base_uri = S3Uri(dest) if destination_base_uri.type != 's3': raise ParameterError("Destination must be S3Uri. Got: %s" % destination_base_uri) destination_base = destination_base_uri.uri() child_pid = os.fork() if child_pid == 0: os._exit(_child(destination_base, source_args)) else: child_pids.append(child_pid) while len(child_pids): (pid, status) = os.wait() child_pids.remove(pid) if ret == EX_OK: ret = os.WEXITSTATUS(status) return ret def _child(destination_base, source_args): def _set_remote_uri(local_list, destination_base, single_file_local): if len(local_list) > 0: ## Populate 'remote_uri' only if we've got something to upload if not destination_base.endswith("/"): if not single_file_local: raise ParameterError("Destination S3 URI must end with '/' (ie must refer to a directory on the remote side).") local_list[local_list.keys()[0]]['remote_uri'] = destination_base else: for key in local_list: local_list[key]['remote_uri'] = destination_base + key def _upload(local_list, seq, total, total_size): file_list = local_list.keys() file_list.sort() ret = EX_OK for file in file_list: seq += 1 item = local_list[file] src = item['full_name'] uri = S3Uri(item['remote_uri']) seq_label = "[%d of %d]" % (seq, total) extra_headers = copy(cfg.extra_headers) try: attr_header = _build_attr_header(local_list, file) debug(u"attr_header: %s" % attr_header) extra_headers.update(attr_header) response = s3.object_put(src, uri, extra_headers, extra_label = seq_label) except S3UploadError, exc: error(u"Upload of '%s' failed too many times (Last reason: %s)" % (item['full_name'], exc)) if cfg.stop_on_error: ret = EX_DATAERR error(u"Exiting now because of --stop-on-error") raise ret = EX_PARTIAL continue except InvalidFileError, exc: error(u"Upload of '%s' is not possible (Reason: %s)" % (item['full_name'], exc)) ret = EX_PARTIAL if cfg.stop_on_error: ret = EX_OSFILE error(u"Exiting now because of --stop-on-error") raise continue speed_fmt = formatSize(response["speed"], human_readable = True, floating_point = True) if not cfg.progress_meter: output(u"upload: '%s' -> '%s' (%d bytes in %0.1f seconds, %0.2f %sB/s) %s" % (item['full_name'], uri, response["size"], response["elapsed"], speed_fmt[0], speed_fmt[1], seq_label)) total_size += response["size"] uploaded_objects_list.append(uri.object()) return ret, seq, total_size stats_info = StatsInfo() local_list, single_file_local, src_exclude_list, local_total_size = fetch_local_list(args[:-1], is_src = True, recursive = True) # - The source path is either like "/myPath/my_src_folder" and # the user want to upload this single folder and optionally only delete # things that have been removed inside this folder. For this case, # we only have to look inside destination_base/my_src_folder and not at # the root of destination_base. 
# - Or like "/myPath/my_src_folder/" and the user want to have the sync # with the content of this folder destbase_with_source_list = set() for source_arg in source_args: if not source_arg.endswith('/'): destbase_with_source_list.add(os.path.join(destination_base, os.path.basename(source_arg))) else: destbase_with_source_list.add(destination_base) remote_list, dst_exclude_list, remote_total_size = fetch_remote_list(destbase_with_source_list, recursive = True, require_attribs = True) local_count = len(local_list) orig_local_count = local_count remote_count = len(remote_list) info(u"Found %d local files, %d remote files" % (local_count, remote_count)) if single_file_local and len(local_list) == 1 and len(remote_list) == 1: ## Make remote_key same as local_key for comparison if we're dealing with only one file remote_list_entry = remote_list[remote_list.keys()[0]] # Flush remote_list, by the way remote_list = FileDict() remote_list[local_list.keys()[0]] = remote_list_entry local_list, remote_list, update_list, copy_pairs = compare_filelists(local_list, remote_list, src_remote = False, dst_remote = True) local_count = len(local_list) update_count = len(update_list) copy_count = len(copy_pairs) remote_count = len(remote_list) upload_count = local_count + update_count info(u"Summary: %d local files to upload, %d files to remote copy, %d remote files to delete" % (upload_count, copy_count, remote_count)) _set_remote_uri(local_list, destination_base, single_file_local) _set_remote_uri(update_list, destination_base, single_file_local) if cfg.dry_run: keys = filedicts_to_keys(src_exclude_list, dst_exclude_list) for key in keys: output(u"exclude: %s" % key) for key in local_list: output(u"upload: '%s' -> '%s'" % (local_list[key]['full_name'], local_list[key]['remote_uri'])) for key in update_list: output(u"upload: '%s' -> '%s'" % (update_list[key]['full_name'], update_list[key]['remote_uri'])) for (src_obj, dst1, dst2) in copy_pairs: output(u"remote copy: '%s' -> '%s'" % (dst1, dst2)) if cfg.delete_removed: for key in remote_list: output(u"delete: '%s'" % remote_list[key]['object_uri_str']) warning(u"Exiting now because of --dry-run") return EX_OK # if there are copy pairs, we can't do delete_before, on the chance # we need one of the to-be-deleted files as a copy source. if len(copy_pairs) > 0: cfg.delete_after = True if cfg.delete_removed and orig_local_count == 0 and len(remote_list) and not cfg.force: warning(u"delete: cowardly refusing to delete because no source files were found. 
Use --force to override.") cfg.delete_removed = False if cfg.delete_removed and not cfg.delete_after and remote_list: subcmd_batch_del(remote_list = remote_list) size_transferred = 0 total_elapsed = 0.0 timestamp_start = time.time() ret, n, size_transferred = _upload(local_list, 0, upload_count, size_transferred) status, n, size_transferred = _upload(update_list, n, upload_count, size_transferred) if ret == EX_OK: ret = status n_copies, saved_bytes, failed_copy_files = remote_copy(s3, copy_pairs, destination_base) # upload files that could not be copied debug("Process files that were not remote copied") failed_copy_count = len(failed_copy_files) _set_remote_uri(failed_copy_files, destination_base, single_file_local) status, n, size_transferred = _upload(failed_copy_files, n, upload_count + failed_copy_count, size_transferred) if ret == EX_OK: ret = status if cfg.delete_removed and cfg.delete_after and remote_list: subcmd_batch_del(remote_list = remote_list) total_elapsed = max(1.0, time.time() - timestamp_start) total_speed = total_elapsed and size_transferred/total_elapsed or 0.0 speed_fmt = formatSize(total_speed, human_readable = True, floating_point = True) stats_info.files = orig_local_count stats_info.size = local_total_size stats_info.files_transferred = upload_count + failed_copy_count stats_info.size_transferred = size_transferred stats_info.files_copied = n_copies stats_info.size_copied = saved_bytes stats_info.files_deleted = remote_count # Only print out the result if any work has been done or # if the user asked for verbose output outstr = "Done. Uploaded %d bytes in %0.1f seconds, %0.2f %sB/s." % (size_transferred, total_elapsed, speed_fmt[0], speed_fmt[1]) if cfg.stats: outstr += stats_info.format_output() output(outstr) elif size_transferred + saved_bytes > 0: output(outstr) else: info(outstr) return ret def _invalidate_on_cf(destination_base_uri): cf = CloudFront(cfg) default_index_file = None if cfg.invalidate_default_index_on_cf or cfg.invalidate_default_index_root_on_cf: info_response = s3.website_info(destination_base_uri, cfg.bucket_location) if info_response: default_index_file = info_response['index_document'] if len(default_index_file) < 1: default_index_file = None result = cf.InvalidateObjects(destination_base_uri, uploaded_objects_list, default_index_file, cfg.invalidate_default_index_on_cf, cfg.invalidate_default_index_root_on_cf) if result['status'] == 201: output("Created invalidation request for %d paths" % len(uploaded_objects_list)) output("Check progress with: s3cmd cfinvalinfo cf://%s/%s" % (result['dist_id'], result['request_id'])) # main execution s3 = S3(cfg) uploaded_objects_list = [] if cfg.encrypt: error(u"S3cmd 'sync' doesn't yet support GPG encryption, sorry.") error(u"Either use unconditional 's3cmd put --recursive'") error(u"or disable encryption with --no-encrypt parameter.") sys.exit(EX_USAGE) for arg in args[:-1]: if not os.path.exists(deunicodise(arg)): raise ParameterError("Invalid source: '%s' is not an existing file or directory" % arg) destinations = [args[-1]] if cfg.additional_destinations: destinations = destinations + cfg.additional_destinations if 'fork' not in os.__all__ or len(destinations) < 2: ret = _single_process(args[:-1]) destination_base_uri = S3Uri(destinations[-1]) if cfg.invalidate_on_cf: if len(uploaded_objects_list) == 0: info("Nothing to invalidate in CloudFront") else: _invalidate_on_cf(destination_base_uri) else: ret = _parent(args[:-1]) if cfg.invalidate_on_cf: error(u"You cannot use both --cf-invalidate and
--add-destination.") return(EX_USAGE) return ret def cmd_sync(args): if (len(args) < 2): raise ParameterError("Too few parameters! Expected: %s" % commands['sync']['param']) if cfg.delay_updates: warning(u"`delay-updates` is obsolete.") for arg in args: if arg == u'-': raise ParameterError("Stdin or stdout ('-') can't be used for a source or a destination with the sync command.") if S3Uri(args[0]).type == "file" and S3Uri(args[-1]).type == "s3": return cmd_sync_local2remote(args) if S3Uri(args[0]).type == "s3" and S3Uri(args[-1]).type == "file": return cmd_sync_remote2local(args) if S3Uri(args[0]).type == "s3" and S3Uri(args[-1]).type == "s3": return cmd_sync_remote2remote(args) raise ParameterError("Invalid source/destination: '%s'" % "' '".join(args)) def cmd_setacl(args): s3 = S3(cfg) set_to_acl = cfg.acl_public and "Public" or "Private" if not cfg.recursive: old_args = args args = [] for arg in old_args: uri = S3Uri(arg) if not uri.has_object(): if cfg.acl_public != None: info("Setting bucket-level ACL for %s to %s" % (uri.uri(), set_to_acl)) else: info("Setting bucket-level ACL for %s" % (uri.uri())) if not cfg.dry_run: update_acl(s3, uri) else: args.append(arg) remote_list, exclude_list, _ = fetch_remote_list(args) remote_count = len(remote_list) info(u"Summary: %d remote files to update" % remote_count) if cfg.dry_run: for key in exclude_list: output(u"exclude: %s" % key) for key in remote_list: output(u"setacl: '%s'" % remote_list[key]['object_uri_str']) warning(u"Exiting now because of --dry-run") return EX_OK seq = 0 for key in remote_list: seq += 1 seq_label = "[%d of %d]" % (seq, remote_count) uri = S3Uri(remote_list[key]['object_uri_str']) update_acl(s3, uri, seq_label) return EX_OK def cmd_setpolicy(args): s3 = S3(cfg) uri = S3Uri(args[1]) policy_file = args[0] policy = open(deunicodise(policy_file), 'r').read() if cfg.dry_run: return EX_OK response = s3.set_policy(uri, policy) #if retsponse['status'] == 200: debug(u"response - %s" % response['status']) if response['status'] == 204: output(u"%s: Policy updated" % uri) return EX_OK def cmd_delpolicy(args): s3 = S3(cfg) uri = S3Uri(args[0]) if cfg.dry_run: return EX_OK response = s3.delete_policy(uri) #if retsponse['status'] == 200: debug(u"response - %s" % response['status']) output(u"%s: Policy deleted" % uri) return EX_OK def cmd_setcors(args): s3 = S3(cfg) uri = S3Uri(args[1]) cors_file = args[0] cors = open(deunicodise(cors_file), 'r').read() if cfg.dry_run: return EX_OK response = s3.set_cors(uri, cors) #if retsponse['status'] == 200: debug(u"response - %s" % response['status']) if response['status'] == 204: output(u"%s: CORS updated" % uri) return EX_OK def cmd_delcors(args): s3 = S3(cfg) uri = S3Uri(args[0]) if cfg.dry_run: return EX_OK response = s3.delete_cors(uri) #if retsponse['status'] == 200: debug(u"response - %s" % response['status']) output(u"%s: CORS deleted" % uri) return EX_OK def cmd_set_payer(args): s3 = S3(cfg) uri = S3Uri(args[0]) if cfg.dry_run: return EX_OK response = s3.set_payer(uri) if response['status'] == 200: output(u"%s: Payer updated" % uri) return EX_OK else: output(u"%s: Payer NOT updated" % uri) return EX_CONFLICT def cmd_setlifecycle(args): s3 = S3(cfg) uri = S3Uri(args[1]) lifecycle_policy_file = args[0] lifecycle_policy = open(deunicodise(lifecycle_policy_file), 'r').read() if cfg.dry_run: return EX_OK response = s3.set_lifecycle_policy(uri, lifecycle_policy) debug(u"response - %s" % response['status']) if response['status'] == 200: output(u"%s: Lifecycle Policy updated" % uri) return 
EX_OK def cmd_dellifecycle(args): s3 = S3(cfg) uri = S3Uri(args[0]) if cfg.dry_run: return EX_OK response = s3.delete_lifecycle_policy(uri) debug(u"response - %s" % response['status']) output(u"%s: Lifecycle Policy deleted" % uri) return EX_OK def cmd_multipart(args): s3 = S3(cfg) uri = S3Uri(args[0]) #id = '' #if(len(args) > 1): id = args[1] response = s3.get_multipart(uri) debug(u"response - %s" % response['status']) output(u"%s" % uri) tree = getTreeFromXml(response['data']) debug(parseNodes(tree)) output(u"Initiated\tPath\tId") for mpupload in parseNodes(tree): try: output("%s\t%s\t%s" % (mpupload['Initiated'], "s3://" + uri.bucket() + "/" + mpupload['Key'], mpupload['UploadId'])) except KeyError: pass return EX_OK def cmd_abort_multipart(args): '''{"cmd":"abortmp", "label":"abort a multipart upload", "param":"s3://BUCKET Id", "func":cmd_abort_multipart, "argc":2},''' s3 = S3(cfg) uri = S3Uri(args[0]) id = args[1] response = s3.abort_multipart(uri, id) debug(u"response - %s" % response['status']) output(u"%s" % uri) return EX_OK def cmd_list_multipart(args): '''{"cmd":"abortmp", "label":"list a multipart upload", "param":"s3://BUCKET Id", "func":cmd_list_multipart, "argc":2},''' s3 = S3(cfg) uri = S3Uri(args[0]) id = args[1] response = s3.list_multipart(uri, id) debug(u"response - %s" % response['status']) tree = getTreeFromXml(response['data']) output(u"LastModified\t\t\tPartNumber\tETag\tSize") for mpupload in parseNodes(tree): try: output("%s\t%s\t%s\t%s" % (mpupload['LastModified'], mpupload['PartNumber'], mpupload['ETag'], mpupload['Size'])) except: pass return EX_OK def cmd_accesslog(args): s3 = S3(cfg) bucket_uri = S3Uri(args.pop()) if bucket_uri.object(): raise ParameterError("Only bucket name is required for [accesslog] command") if cfg.log_target_prefix == False: accesslog, response = s3.set_accesslog(bucket_uri, enable = False) elif cfg.log_target_prefix: log_target_prefix_uri = S3Uri(cfg.log_target_prefix) if log_target_prefix_uri.type != "s3": raise ParameterError("--log-target-prefix must be a S3 URI") accesslog, response = s3.set_accesslog(bucket_uri, enable = True, log_target_prefix_uri = log_target_prefix_uri, acl_public = cfg.acl_public) else: # cfg.log_target_prefix == None accesslog = s3.get_accesslog(bucket_uri) output(u"Access logging for: %s" % bucket_uri.uri()) output(u" Logging Enabled: %s" % accesslog.isLoggingEnabled()) if accesslog.isLoggingEnabled(): output(u" Target prefix: %s" % accesslog.targetPrefix().uri()) #output(u" Public Access: %s" % accesslog.isAclPublic()) return EX_OK def cmd_sign(args): string_to_sign = args.pop() debug("string-to-sign: %r" % string_to_sign) signature = Crypto.sign_string_v2(string_to_sign) output("Signature: %s" % signature) return EX_OK def cmd_signurl(args): expiry = args.pop() url_to_sign = S3Uri(args.pop()) if url_to_sign.type != 's3': raise ParameterError("Must be S3Uri. Got: %s" % url_to_sign) debug("url to sign: %r" % url_to_sign) signed_url = Crypto.sign_url_v2(url_to_sign, expiry) output(signed_url) return EX_OK def cmd_fixbucket(args): def _unescape(text): ## # Removes HTML or XML character references and entities from a text string. # # @param text The HTML (or XML) source text. # @return The plain text, as a Unicode string, if necessary. 
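# Illustrative example (hypothetical key name): a listing entry such as "report&#x08;2016.txt" is
# unescaped here back to the literal character it encodes; cmd_fixbucket() then runs the result
# through replace_nonprintables() to obtain a printable name before renaming the object with
# object_move().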
# # From: http://effbot.org/zone/re-sub.htm#unescape-html def _unescape_fixup(m): text = m.group(0) if not htmlentitydefs.name2codepoint.has_key('apos'): htmlentitydefs.name2codepoint['apos'] = ord("'") if text[:2] == "&#": # character reference try: if text[:3] == "&#x": return unichr(int(text[3:-1], 16)) else: return unichr(int(text[2:-1])) except ValueError: pass else: # named entity try: text = unichr(htmlentitydefs.name2codepoint[text[1:-1]]) except KeyError: pass return text # leave as is text = text.encode('ascii', 'xmlcharrefreplace') return re.sub("&#?\w+;", _unescape_fixup, text) cfg.urlencoding_mode = "fixbucket" s3 = S3(cfg) count = 0 for arg in args: culprit = S3Uri(arg) if culprit.type != "s3": raise ParameterError("Expecting S3Uri instead of: %s" % arg) response = s3.bucket_list_noparse(culprit.bucket(), culprit.object(), recursive = True) r_xent = re.compile("&#x[\da-fA-F]+;") keys = re.findall("(.*?)", response['data'], re.MULTILINE | re.UNICODE) debug("Keys: %r" % keys) for key in keys: if r_xent.search(key): info("Fixing: %s" % key) debug("Step 1: Transforming %s" % key) key_bin = _unescape(key) debug("Step 2: ... to %s" % key_bin) key_new = replace_nonprintables(key_bin) debug("Step 3: ... then to %s" % key_new) src = S3Uri("s3://%s/%s" % (culprit.bucket(), key_bin)) dst = S3Uri("s3://%s/%s" % (culprit.bucket(), key_new)) if cfg.dry_run: output("[--dry-run] File %r would be renamed to %s" % (key_bin, key_new)) continue try: resp_move = s3.object_move(src, dst) if resp_move['status'] == 200: output("File '%r' renamed to '%s'" % (key_bin, key_new)) count += 1 else: error("Something went wrong for: %r" % key) error("Please report the problem to s3tools-bugs@lists.sourceforge.net") except S3Error: error("Something went wrong for: %r" % key) error("Please report the problem to s3tools-bugs@lists.sourceforge.net") if count > 0: warning("Fixed %d files' names. Their ACL were reset to Private." % count) warning("Use 's3cmd setacl --acl-public s3://...' to make") warning("them publicly readable if required.") return EX_OK def resolve_list(lst, args): retval = [] for item in lst: retval.append(item % args) return retval def gpg_command(command, passphrase = ""): debug("GPG command: " + " ".join(command)) command = [deunicodise(cmd_entry) for cmd_entry in command] p = subprocess.Popen(command, stdin = subprocess.PIPE, stdout = subprocess.PIPE, stderr = subprocess.STDOUT, close_fds = True) p_stdout, p_stderr = p.communicate(deunicodise(passphrase) + "\n") debug("GPG output:") for line in p_stdout.split("\n"): debug("GPG: " + line) p_exitcode = p.wait() return p_exitcode def gpg_encrypt(filename): tmp_filename = Utils.mktmpfile() args = { "gpg_command" : cfg.gpg_command, "passphrase_fd" : "0", "input_file" : filename, "output_file" : tmp_filename, } info(u"Encrypting file %s to %s..." % (filename, tmp_filename)) command = resolve_list(cfg.gpg_encrypt.split(" "), args) code = gpg_command(command, cfg.gpg_passphrase) return (code, tmp_filename, "gpg") def gpg_decrypt(filename, gpgenc_header = "", in_place = True): tmp_filename = Utils.mktmpfile(filename) args = { "gpg_command" : cfg.gpg_command, "passphrase_fd" : "0", "input_file" : filename, "output_file" : tmp_filename, } info(u"Decrypting file %s to %s..." 
% (filename, tmp_filename)) command = resolve_list(cfg.gpg_decrypt.split(" "), args) code = gpg_command(command, cfg.gpg_passphrase) if code == 0 and in_place: debug(u"Renaming %s to %s" % (tmp_filename, filename)) os.unlink(deunicodise(filename)) os.rename(deunicodise(tmp_filename), deunicodise(filename)) tmp_filename = filename return (code, tmp_filename) def run_configure(config_file, args): cfg = Config() options = [ ("access_key", "Access Key", "Access key and Secret key are your identifiers for Amazon S3. Leave them empty for using the env variables."), ("secret_key", "Secret Key"), ("bucket_location", "Default Region"), ("gpg_passphrase", "Encryption password", "Encryption password is used to protect your files from reading\nby unauthorized persons while in transfer to S3"), ("gpg_command", "Path to GPG program"), ("use_https", "Use HTTPS protocol", "When using secure HTTPS protocol all communication with Amazon S3\nservers is protected from 3rd party eavesdropping. This method is\nslower than plain HTTP, and can only be proxied with Python 2.7 or newer"), ("proxy_host", "HTTP Proxy server name", "On some networks all internet access must go through a HTTP proxy.\nTry setting it here if you can't connect to S3 directly"), ("proxy_port", "HTTP Proxy server port"), ] ## Option-specfic defaults if getattr(cfg, "gpg_command") == "": setattr(cfg, "gpg_command", find_executable("gpg")) if getattr(cfg, "proxy_host") == "" and os.getenv("http_proxy"): re_match=re.match("(http://)?([^:]+):(\d+)", os.getenv("http_proxy")) if re_match: setattr(cfg, "proxy_host", re_match.groups()[1]) setattr(cfg, "proxy_port", re_match.groups()[2]) try: while 1: output(u"\nEnter new values or accept defaults in brackets with Enter.") output(u"Refer to user manual for detailed description of all options.") for option in options: prompt = option[1] ## Option-specific handling if option[0] == 'proxy_host' and getattr(cfg, 'use_https') == True and sys.hexversion < 0x02070000: setattr(cfg, option[0], "") continue if option[0] == 'proxy_port' and getattr(cfg, 'proxy_host') == "": setattr(cfg, option[0], 0) continue try: val = getattr(cfg, option[0]) if type(val) is bool: val = val and "Yes" or "No" if val not in (None, ""): prompt += " [%s]" % val except AttributeError: pass if len(option) >= 3: output(u"\n%s" % option[2]) val = raw_input(prompt + ": ") if val != "": if type(getattr(cfg, option[0])) is bool: # Turn 'Yes' into True, everything else into False val = val.lower().startswith('y') setattr(cfg, option[0], val) output(u"\nNew settings:") for option in options: output(u" %s: %s" % (option[1], getattr(cfg, option[0]))) val = raw_input("\nTest access with supplied credentials? [Y/n] ") if val.lower().startswith("y") or val == "": try: # Default, we try to list 'all' buckets which requires # ListAllMyBuckets permission if len(args) == 0: output(u"Please wait, attempting to list all buckets...") S3(Config()).bucket_list("", "") else: # If user specified a bucket name directly, we check it and only it. # Thus, access check can succeed even if user only has access to # to a single bucket and not ListAllMyBuckets permission. output(u"Please wait, attempting to list bucket: " + args[0]) uri = S3Uri(args[0]) if uri.type == "s3" and uri.has_bucket(): S3(Config()).bucket_list(uri.bucket(), "") else: raise Exception(u"Invalid bucket uri: " + args[0]) output(u"Success. 
Your access key and secret key worked fine :-)") output(u"\nNow verifying that encryption works...") if not getattr(cfg, "gpg_command") or not getattr(cfg, "gpg_passphrase"): output(u"Not configured. Never mind.") else: if not getattr(cfg, "gpg_command"): raise Exception("Path to GPG program not set") if not os.path.isfile(deunicodise(getattr(cfg, "gpg_command"))): raise Exception("GPG program not found") filename = Utils.mktmpfile() f = open(deunicodise(filename), "w") f.write(os.sys.copyright) f.close() ret_enc = gpg_encrypt(filename) ret_dec = gpg_decrypt(ret_enc[1], ret_enc[2], False) hash = [ Utils.hash_file_md5(filename), Utils.hash_file_md5(ret_enc[1]), Utils.hash_file_md5(ret_dec[1]), ] os.unlink(deunicodise(filename)) os.unlink(deunicodise(ret_enc[1])) os.unlink(deunicodise(ret_dec[1])) if hash[0] == hash[2] and hash[0] != hash[1]: output ("Success. Encryption and decryption worked fine :-)") else: raise Exception("Encryption verification error.") except S3Error, e: error(u"Test failed: %s" % (e)) if e.code == "AccessDenied": error(u"Are you sure your keys have s3:ListAllMyBuckets permissions?") val = raw_input("\nRetry configuration? [Y/n] ") if val.lower().startswith("y") or val == "": continue except Exception, e: error(u"Test failed: %s" % (e)) val = raw_input("\nRetry configuration? [Y/n] ") if val.lower().startswith("y") or val == "": continue val = raw_input("\nSave settings? [y/N] ") if val.lower().startswith("y"): break val = raw_input("Retry configuration? [Y/n] ") if val.lower().startswith("n"): raise EOFError() ## Overwrite existing config file, make it user-readable only old_mask = os.umask(0077) try: os.remove(deunicodise(config_file)) except OSError, e: if e.errno != errno.ENOENT: raise f = open(config_file, "w") os.umask(old_mask) cfg.dump_config(f) f.close() output(u"Configuration saved to '%s'" % config_file) except (EOFError, KeyboardInterrupt): output(u"\nConfiguration aborted. Changes were NOT saved.") return except IOError, e: error(u"Writing config file failed: %s: %s" % (config_file, e.strerror)) sys.exit(EX_IOERR) def process_patterns_from_file(fname, patterns_list): try: fn = open(deunicodise(fname), "rt") except IOError, e: error(e) sys.exit(EX_IOERR) for pattern in fn: pattern = pattern.strip() if re.match("^#", pattern) or re.match("^\s*$", pattern): continue debug(u"%s: adding rule: %s" % (fname, pattern)) patterns_list.append(pattern) return patterns_list def process_patterns(patterns_list, patterns_from, is_glob, option_txt = ""): """ process_patterns(patterns, patterns_from, is_glob, option_txt = "") Process --exclude / --include GLOB and REGEXP patterns. 
'option_txt' is 'exclude' / 'include' / 'rexclude' / 'rinclude' Returns: patterns_compiled, patterns_text """ patterns_compiled = [] patterns_textual = {} if patterns_list is None: patterns_list = [] if patterns_from: ## Append patterns from glob_from for fname in patterns_from: debug(u"processing --%s-from %s" % (option_txt, fname)) patterns_list = process_patterns_from_file(fname, patterns_list) for pattern in patterns_list: debug(u"processing %s rule: %s" % (option_txt, patterns_list)) if is_glob: pattern = glob.fnmatch.translate(pattern) r = re.compile(pattern) patterns_compiled.append(r) patterns_textual[r] = pattern return patterns_compiled, patterns_textual def get_commands_list(): return [ {"cmd":"mb", "label":"Make bucket", "param":"s3://BUCKET", "func":cmd_bucket_create, "argc":1}, {"cmd":"rb", "label":"Remove bucket", "param":"s3://BUCKET", "func":cmd_bucket_delete, "argc":1}, {"cmd":"ls", "label":"List objects or buckets", "param":"[s3://BUCKET[/PREFIX]]", "func":cmd_ls, "argc":0}, {"cmd":"la", "label":"List all object in all buckets", "param":"", "func":cmd_all_buckets_list_all_content, "argc":0}, {"cmd":"put", "label":"Put file into bucket", "param":"FILE [FILE...] s3://BUCKET[/PREFIX]", "func":cmd_object_put, "argc":2}, {"cmd":"get", "label":"Get file from bucket", "param":"s3://BUCKET/OBJECT LOCAL_FILE", "func":cmd_object_get, "argc":1}, {"cmd":"del", "label":"Delete file from bucket", "param":"s3://BUCKET/OBJECT", "func":cmd_object_del, "argc":1}, {"cmd":"rm", "label":"Delete file from bucket (alias for del)", "param":"s3://BUCKET/OBJECT", "func":cmd_object_del, "argc":1}, #{"cmd":"mkdir", "label":"Make a virtual S3 directory", "param":"s3://BUCKET/path/to/dir", "func":cmd_mkdir, "argc":1}, {"cmd":"restore", "label":"Restore file from Glacier storage", "param":"s3://BUCKET/OBJECT", "func":cmd_object_restore, "argc":1}, {"cmd":"sync", "label":"Synchronize a directory tree to S3 (checks files freshness using size and md5 checksum, unless overridden by options, see below)", "param":"LOCAL_DIR s3://BUCKET[/PREFIX] or s3://BUCKET[/PREFIX] LOCAL_DIR", "func":cmd_sync, "argc":2}, {"cmd":"du", "label":"Disk usage by buckets", "param":"[s3://BUCKET[/PREFIX]]", "func":cmd_du, "argc":0}, {"cmd":"info", "label":"Get various information about Buckets or Files", "param":"s3://BUCKET[/OBJECT]", "func":cmd_info, "argc":1}, {"cmd":"cp", "label":"Copy object", "param":"s3://BUCKET1/OBJECT1 s3://BUCKET2[/OBJECT2]", "func":cmd_cp, "argc":2}, {"cmd":"modify", "label":"Modify object metadata", "param":"s3://BUCKET1/OBJECT", "func":cmd_modify, "argc":1}, {"cmd":"mv", "label":"Move object", "param":"s3://BUCKET1/OBJECT1 s3://BUCKET2[/OBJECT2]", "func":cmd_mv, "argc":2}, {"cmd":"setacl", "label":"Modify Access control list for Bucket or Files", "param":"s3://BUCKET[/OBJECT]", "func":cmd_setacl, "argc":1}, {"cmd":"setpolicy", "label":"Modify Bucket Policy", "param":"FILE s3://BUCKET", "func":cmd_setpolicy, "argc":2}, {"cmd":"delpolicy", "label":"Delete Bucket Policy", "param":"s3://BUCKET", "func":cmd_delpolicy, "argc":1}, {"cmd":"setcors", "label":"Modify Bucket CORS", "param":"FILE s3://BUCKET", "func":cmd_setcors, "argc":2}, {"cmd":"delcors", "label":"Delete Bucket CORS", "param":"s3://BUCKET", "func":cmd_delcors, "argc":1}, {"cmd":"payer", "label":"Modify Bucket Requester Pays policy", "param":"s3://BUCKET", "func":cmd_set_payer, "argc":1}, {"cmd":"multipart", "label":"Show multipart uploads", "param":"s3://BUCKET [Id]", "func":cmd_multipart, "argc":1}, {"cmd":"abortmp", "label":"Abort a 
multipart upload", "param":"s3://BUCKET/OBJECT Id", "func":cmd_abort_multipart, "argc":2}, {"cmd":"listmp", "label":"List parts of a multipart upload", "param":"s3://BUCKET/OBJECT Id", "func":cmd_list_multipart, "argc":2}, {"cmd":"accesslog", "label":"Enable/disable bucket access logging", "param":"s3://BUCKET", "func":cmd_accesslog, "argc":1}, {"cmd":"sign", "label":"Sign arbitrary string using the secret key", "param":"STRING-TO-SIGN", "func":cmd_sign, "argc":1}, {"cmd":"signurl", "label":"Sign an S3 URL to provide limited public access with expiry", "param":"s3://BUCKET/OBJECT ", "func":cmd_signurl, "argc":2}, {"cmd":"fixbucket", "label":"Fix invalid file names in a bucket", "param":"s3://BUCKET[/PREFIX]", "func":cmd_fixbucket, "argc":1}, ## Website commands {"cmd":"ws-create", "label":"Create Website from bucket", "param":"s3://BUCKET", "func":cmd_website_create, "argc":1}, {"cmd":"ws-delete", "label":"Delete Website", "param":"s3://BUCKET", "func":cmd_website_delete, "argc":1}, {"cmd":"ws-info", "label":"Info about Website", "param":"s3://BUCKET", "func":cmd_website_info, "argc":1}, ## Lifecycle commands {"cmd":"expire", "label":"Set or delete expiration rule for the bucket", "param":"s3://BUCKET", "func":cmd_expiration_set, "argc":1}, {"cmd":"setlifecycle", "label":"Upload a lifecycle policy for the bucket", "param":"FILE s3://BUCKET", "func":cmd_setlifecycle, "argc":2}, {"cmd":"dellifecycle", "label":"Remove a lifecycle policy for the bucket", "param":"s3://BUCKET", "func":cmd_dellifecycle, "argc":1}, ## CloudFront commands {"cmd":"cflist", "label":"List CloudFront distribution points", "param":"", "func":CfCmd.info, "argc":0}, {"cmd":"cfinfo", "label":"Display CloudFront distribution point parameters", "param":"[cf://DIST_ID]", "func":CfCmd.info, "argc":0}, {"cmd":"cfcreate", "label":"Create CloudFront distribution point", "param":"s3://BUCKET", "func":CfCmd.create, "argc":1}, {"cmd":"cfdelete", "label":"Delete CloudFront distribution point", "param":"cf://DIST_ID", "func":CfCmd.delete, "argc":1}, {"cmd":"cfmodify", "label":"Change CloudFront distribution point parameters", "param":"cf://DIST_ID", "func":CfCmd.modify, "argc":1}, #{"cmd":"cfinval", "label":"Invalidate CloudFront objects", "param":"s3://BUCKET/OBJECT [s3://BUCKET/OBJECT ...]", "func":CfCmd.invalidate, "argc":1}, {"cmd":"cfinvalinfo", "label":"Display CloudFront invalidation request(s) status", "param":"cf://DIST_ID[/INVAL_ID]", "func":CfCmd.invalinfo, "argc":1}, ] def format_commands(progname, commands_list): help = "Commands:\n" for cmd in commands_list: help += " %s\n %s %s %s\n" % (cmd["label"], progname, cmd["cmd"], cmd["param"]) return help def update_acl(s3, uri, seq_label=""): something_changed = False acl = s3.get_acl(uri) debug(u"acl: %s - %r" % (uri, acl.grantees)) if cfg.acl_public == True: if acl.isAnonRead(): info(u"%s: already Public, skipping %s" % (uri, seq_label)) else: acl.grantAnonRead() something_changed = True elif cfg.acl_public == False: # we explicitely check for False, because it could be None if not acl.isAnonRead(): info(u"%s: already Private, skipping %s" % (uri, seq_label)) else: acl.revokeAnonRead() something_changed = True # update acl with arguments # grant first and revoke later, because revoke has priority if cfg.acl_grants: something_changed = True for grant in cfg.acl_grants: acl.grant(**grant) if cfg.acl_revokes: something_changed = True for revoke in cfg.acl_revokes: acl.revoke(**revoke) if not something_changed: return retsponse = s3.set_acl(uri, acl) if retsponse['status'] == 
200: if cfg.acl_public in (True, False): set_to_acl = cfg.acl_public and "Public" or "Private" output(u"%s: ACL set to %s %s" % (uri, set_to_acl, seq_label)) else: output(u"%s: ACL updated" % uri) class OptionMimeType(Option): def check_mimetype(option, opt, value): if re.compile("^[a-z0-9]+/[a-z0-9+\.-]+(;.*)?$", re.IGNORECASE).match(value): return value raise OptionValueError("option %s: invalid MIME-Type format: %r" % (opt, value)) class OptionS3ACL(Option): def check_s3acl(option, opt, value): permissions = ('read', 'write', 'read_acp', 'write_acp', 'full_control', 'all') try: permission, grantee = re.compile("^(\w+):(.+)$", re.IGNORECASE).match(value).groups() if not permission or not grantee: raise if permission in permissions: return { 'name' : grantee, 'permission' : permission.upper() } else: raise OptionValueError("option %s: invalid S3 ACL permission: %s (valid values: %s)" % (opt, permission, ", ".join(permissions))) except: raise OptionValueError("option %s: invalid S3 ACL format: %r" % (opt, value)) class OptionAll(OptionMimeType, OptionS3ACL): TYPE_CHECKER = copy(Option.TYPE_CHECKER) TYPE_CHECKER["mimetype"] = OptionMimeType.check_mimetype TYPE_CHECKER["s3acl"] = OptionS3ACL.check_s3acl TYPES = Option.TYPES + ("mimetype", "s3acl") class MyHelpFormatter(IndentedHelpFormatter): def format_epilog(self, epilog): if epilog: return "\n" + epilog + "\n" else: return "" def main(): global cfg cfg = Config() commands_list = get_commands_list() commands = {} ## Populate "commands" from "commands_list" for cmd in commands_list: if cmd.has_key("cmd"): commands[cmd["cmd"]] = cmd optparser = OptionParser(option_class=OptionAll, formatter=MyHelpFormatter()) #optparser.disable_interspersed_args() config_file = None if os.getenv("S3CMD_CONFIG"): config_file = os.getenv("S3CMD_CONFIG") elif os.name == "nt" and os.getenv("USERPROFILE"): config_file = os.path.join(os.getenv("USERPROFILE").decode('mbcs'), os.getenv("APPDATA").decode('mbcs') or 'Application Data', "s3cmd.ini") else: from os.path import expanduser config_file = os.path.join(expanduser("~"), ".s3cfg") autodetected_encoding = locale.getpreferredencoding() or "UTF-8" optparser.set_defaults(config = config_file) optparser.add_option( "--configure", dest="run_configure", action="store_true", help="Invoke interactive (re)configuration tool. Optionally use as '--configure s3://some-bucket' to test access to a specific bucket instead of attempting to list them all.") optparser.add_option("-c", "--config", dest="config", metavar="FILE", help="Config file name. Defaults to $HOME/.s3cfg") optparser.add_option( "--dump-config", dest="dump_config", action="store_true", help="Dump current configuration after parsing config files and command line options and exit.") optparser.add_option( "--access_key", dest="access_key", help="AWS Access Key") optparser.add_option( "--secret_key", dest="secret_key", help="AWS Secret Key") optparser.add_option("-n", "--dry-run", dest="dry_run", action="store_true", help="Only show what should be uploaded or downloaded but don't actually do it. May still perform S3 requests to get bucket listings and other information though (only for file transfer commands)") optparser.add_option("-s", "--ssl", dest="use_https", action="store_true", help="Use HTTPS connection when communicating with S3. 
(default)") optparser.add_option( "--no-ssl", dest="use_https", action="store_false", help="Don't use HTTPS.") optparser.add_option("-e", "--encrypt", dest="encrypt", action="store_true", help="Encrypt files before uploading to S3.") optparser.add_option( "--no-encrypt", dest="encrypt", action="store_false", help="Don't encrypt files.") optparser.add_option("-f", "--force", dest="force", action="store_true", help="Force overwrite and other dangerous operations.") optparser.add_option( "--continue", dest="get_continue", action="store_true", help="Continue getting a partially downloaded file (only for [get] command).") optparser.add_option( "--continue-put", dest="put_continue", action="store_true", help="Continue uploading partially uploaded files or multipart upload parts. Restarts/parts files that don't have matching size and md5. Skips files/parts that do. Note: md5sum checks are not always sufficient to check (part) file equality. Enable this at your own risk.") optparser.add_option( "--upload-id", dest="upload_id", help="UploadId for Multipart Upload, in case you want continue an existing upload (equivalent to --continue-put) and there are multiple partial uploads. Use s3cmd multipart [URI] to see what UploadIds are associated with the given URI.") optparser.add_option( "--skip-existing", dest="skip_existing", action="store_true", help="Skip over files that exist at the destination (only for [get] and [sync] commands).") optparser.add_option("-r", "--recursive", dest="recursive", action="store_true", help="Recursive upload, download or removal.") optparser.add_option( "--check-md5", dest="check_md5", action="store_true", help="Check MD5 sums when comparing files for [sync]. (default)") optparser.add_option( "--no-check-md5", dest="check_md5", action="store_false", help="Do not check MD5 sums when comparing files for [sync]. Only size will be compared. May significantly speed up transfer but may also miss some changed files.") optparser.add_option("-P", "--acl-public", dest="acl_public", action="store_true", help="Store objects with ACL allowing read for anyone.") optparser.add_option( "--acl-private", dest="acl_public", action="store_false", help="Store objects with default ACL allowing access for you only.") optparser.add_option( "--acl-grant", dest="acl_grants", type="s3acl", action="append", metavar="PERMISSION:EMAIL or USER_CANONICAL_ID", help="Grant stated permission to a given amazon user. Permission is one of: read, write, read_acp, write_acp, full_control, all") optparser.add_option( "--acl-revoke", dest="acl_revokes", type="s3acl", action="append", metavar="PERMISSION:USER_CANONICAL_ID", help="Revoke stated permission for a given amazon user. 
Permission is one of: read, write, read_acp, wr ite_acp, full_control, all") optparser.add_option("-D", "--restore-days", dest="restore_days", action="store", help="Number of days to keep restored file available (only for 'restore' command).", metavar="NUM") optparser.add_option( "--delete-removed", dest="delete_removed", action="store_true", help="Delete remote objects with no corresponding local file [sync]") optparser.add_option( "--no-delete-removed", dest="delete_removed", action="store_false", help="Don't delete remote objects.") optparser.add_option( "--delete-after", dest="delete_after", action="store_true", help="Perform deletes after new uploads [sync]") optparser.add_option( "--delay-updates", dest="delay_updates", action="store_true", help="*OBSOLETE* Put all updated files into place at end [sync]") # OBSOLETE optparser.add_option( "--max-delete", dest="max_delete", action="store", help="Do not delete more than NUM files. [del] and [sync]", metavar="NUM") optparser.add_option( "--add-destination", dest="additional_destinations", action="append", help="Additional destination for parallel uploads, in addition to last arg. May be repeated.") optparser.add_option( "--delete-after-fetch", dest="delete_after_fetch", action="store_true", help="Delete remote objects after fetching to local file (only for [get] and [sync] commands).") optparser.add_option("-p", "--preserve", dest="preserve_attrs", action="store_true", help="Preserve filesystem attributes (mode, ownership, timestamps). Default for [sync] command.") optparser.add_option( "--no-preserve", dest="preserve_attrs", action="store_false", help="Don't store FS attributes") optparser.add_option( "--exclude", dest="exclude", action="append", metavar="GLOB", help="Filenames and paths matching GLOB will be excluded from sync") optparser.add_option( "--exclude-from", dest="exclude_from", action="append", metavar="FILE", help="Read --exclude GLOBs from FILE") optparser.add_option( "--rexclude", dest="rexclude", action="append", metavar="REGEXP", help="Filenames and paths matching REGEXP (regular expression) will be excluded from sync") optparser.add_option( "--rexclude-from", dest="rexclude_from", action="append", metavar="FILE", help="Read --rexclude REGEXPs from FILE") optparser.add_option( "--include", dest="include", action="append", metavar="GLOB", help="Filenames and paths matching GLOB will be included even if previously excluded by one of --(r)exclude(-from) patterns") optparser.add_option( "--include-from", dest="include_from", action="append", metavar="FILE", help="Read --include GLOBs from FILE") optparser.add_option( "--rinclude", dest="rinclude", action="append", metavar="REGEXP", help="Same as --include but uses REGEXP (regular expression) instead of GLOB") optparser.add_option( "--rinclude-from", dest="rinclude_from", action="append", metavar="FILE", help="Read --rinclude REGEXPs from FILE") optparser.add_option( "--files-from", dest="files_from", action="append", metavar="FILE", help="Read list of source-file names from FILE. Use - to read from stdin.") optparser.add_option( "--region", "--bucket-location", metavar="REGION", dest="bucket_location", help="Region to create bucket in. As of now the regions are: us-east-1, us-west-1, us-west-2, eu-west-1, eu-central-1, ap-northeast-1, ap-southeast-1, ap-southeast-2, sa-east-1") optparser.add_option( "--host", metavar="HOSTNAME", dest="host_base", help="HOSTNAME:PORT for S3 endpoint (default: %s, alternatives such as s3-eu-west-1.amazonaws.com). 
You should also set --host-bucket." % (cfg.host_base)) optparser.add_option( "--host-bucket", dest="host_bucket", help="DNS-style bucket+hostname:port template for accessing a bucket (default: %s)" % (cfg.host_bucket)) optparser.add_option( "--reduced-redundancy", "--rr", dest="reduced_redundancy", action="store_true", help="Store object with 'Reduced redundancy'. Lower per-GB price. [put, cp, mv]") optparser.add_option( "--no-reduced-redundancy", "--no-rr", dest="reduced_redundancy", action="store_false", help="Store object without 'Reduced redundancy'. Higher per-GB price. [put, cp, mv]") optparser.add_option( "--storage-class", dest="storage_class", action="store", metavar="CLASS", help="Store object with specified CLASS (STANDARD, STANDARD_IA, or REDUCED_REDUNDANCY). Lower per-GB price. [put, cp, mv]") optparser.add_option( "--access-logging-target-prefix", dest="log_target_prefix", help="Target prefix for access logs (S3 URI) (for [cfmodify] and [accesslog] commands)") optparser.add_option( "--no-access-logging", dest="log_target_prefix", action="store_false", help="Disable access logging (for [cfmodify] and [accesslog] commands)") optparser.add_option( "--default-mime-type", dest="default_mime_type", type="mimetype", action="store", help="Default MIME-type for stored objects. Application default is binary/octet-stream.") optparser.add_option("-M", "--guess-mime-type", dest="guess_mime_type", action="store_true", help="Guess MIME-type of files by their extension or mime magic. Fall back to default MIME-Type as specified by --default-mime-type option") optparser.add_option( "--no-guess-mime-type", dest="guess_mime_type", action="store_false", help="Don't guess MIME-type and use the default type instead.") optparser.add_option( "--no-mime-magic", dest="use_mime_magic", action="store_false", help="Don't use mime magic when guessing MIME-type.") optparser.add_option("-m", "--mime-type", dest="mime_type", type="mimetype", metavar="MIME/TYPE", help="Force MIME-type. Override both --default-mime-type and --guess-mime-type.") optparser.add_option( "--add-header", dest="add_header", action="append", metavar="NAME:VALUE", help="Add a given HTTP header to the upload request. Can be used multiple times. For instance set 'Expires' or 'Cache-Control' headers (or both) using this option.") optparser.add_option( "--remove-header", dest="remove_headers", action="append", metavar="NAME", help="Remove a given HTTP header. Can be used multiple times. For instance, remove 'Expires' or 'Cache-Control' headers (or both) using this option. [modify]") optparser.add_option( "--server-side-encryption", dest="server_side_encryption", action="store_true", help="Specifies that server-side encryption will be used when putting objects. [put, sync, cp, modify]") optparser.add_option( "--server-side-encryption-kms-id", dest="kms_key", action="store", help="Specifies the key id used for server-side encryption with AWS KMS-Managed Keys (SSE-KMS) when putting objects. [put, sync, cp, modify]") optparser.add_option( "--encoding", dest="encoding", metavar="ENCODING", help="Override autodetected terminal and filesystem encoding (character set). Autodetected: %s" % autodetected_encoding) optparser.add_option( "--add-encoding-exts", dest="add_encoding_exts", metavar="EXTENSIONs", help="Add encoding to these comma delimited extensions i.e. 
(css,js,html) when uploading to S3 )") optparser.add_option( "--verbatim", dest="urlencoding_mode", action="store_const", const="verbatim", help="Use the S3 name as given on the command line. No pre-processing, encoding, etc. Use with caution!") optparser.add_option( "--disable-multipart", dest="enable_multipart", action="store_false", help="Disable multipart upload on files bigger than --multipart-chunk-size-mb") optparser.add_option( "--multipart-chunk-size-mb", dest="multipart_chunk_size_mb", type="int", action="store", metavar="SIZE", help="Size of each chunk of a multipart upload. Files bigger than SIZE are automatically uploaded as multithreaded-multipart, smaller files are uploaded using the traditional method. SIZE is in Mega-Bytes, default chunk size is 15MB, minimum allowed chunk size is 5MB, maximum is 5GB.") optparser.add_option( "--list-md5", dest="list_md5", action="store_true", help="Include MD5 sums in bucket listings (only for 'ls' command).") optparser.add_option("-H", "--human-readable-sizes", dest="human_readable_sizes", action="store_true", help="Print sizes in human readable form (eg 1kB instead of 1234).") optparser.add_option( "--ws-index", dest="website_index", action="store", help="Name of index-document (only for [ws-create] command)") optparser.add_option( "--ws-error", dest="website_error", action="store", help="Name of error-document (only for [ws-create] command)") optparser.add_option( "--expiry-date", dest="expiry_date", action="store", help="Indicates when the expiration rule takes effect. (only for [expire] command)") optparser.add_option( "--expiry-days", dest="expiry_days", action="store", help="Indicates the number of days after object creation the expiration rule takes effect. (only for [expire] command)") optparser.add_option( "--expiry-prefix", dest="expiry_prefix", action="store", help="Identifying one or more objects with the prefix to which the expiration rule applies. (only for [expire] command)") optparser.add_option( "--progress", dest="progress_meter", action="store_true", help="Display progress meter (default on TTY).") optparser.add_option( "--no-progress", dest="progress_meter", action="store_false", help="Don't display progress meter (default on non-TTY).") optparser.add_option( "--stats", dest="stats", action="store_true", help="Give some file-transfer stats.") optparser.add_option( "--enable", dest="enable", action="store_true", help="Enable given CloudFront distribution (only for [cfmodify] command)") optparser.add_option( "--disable", dest="enable", action="store_false", help="Disable given CloudFront distribution (only for [cfmodify] command)") optparser.add_option( "--cf-invalidate", dest="invalidate_on_cf", action="store_true", help="Invalidate the uploaded files in CloudFront. 
Also see [cfinval] command.") # joseprio: adding options to invalidate the default index and the default # index root optparser.add_option( "--cf-invalidate-default-index", dest="invalidate_default_index_on_cf", action="store_true", help="When using Custom Origin and S3 static website, invalidate the default index file.") optparser.add_option( "--cf-no-invalidate-default-index-root", dest="invalidate_default_index_root_on_cf", action="store_false", help="When using Custom Origin and S3 static website, don't invalidate the path to the default index file.") optparser.add_option( "--cf-add-cname", dest="cf_cnames_add", action="append", metavar="CNAME", help="Add given CNAME to a CloudFront distribution (only for [cfcreate] and [cfmodify] commands)") optparser.add_option( "--cf-remove-cname", dest="cf_cnames_remove", action="append", metavar="CNAME", help="Remove given CNAME from a CloudFront distribution (only for [cfmodify] command)") optparser.add_option( "--cf-comment", dest="cf_comment", action="store", metavar="COMMENT", help="Set COMMENT for a given CloudFront distribution (only for [cfcreate] and [cfmodify] commands)") optparser.add_option( "--cf-default-root-object", dest="cf_default_root_object", action="store", metavar="DEFAULT_ROOT_OBJECT", help="Set the default root object to return when no object is specified in the URL. Use a relative path, i.e. default/index.html instead of /default/index.html or s3://bucket/default/index.html (only for [cfcreate] and [cfmodify] commands)") optparser.add_option("-v", "--verbose", dest="verbosity", action="store_const", const=logging.INFO, help="Enable verbose output.") optparser.add_option("-d", "--debug", dest="verbosity", action="store_const", const=logging.DEBUG, help="Enable debug output.") optparser.add_option( "--version", dest="show_version", action="store_true", help="Show s3cmd version (%s) and exit." % (PkgInfo.version)) optparser.add_option("-F", "--follow-symlinks", dest="follow_symlinks", action="store_true", default=False, help="Follow symbolic links as if they are regular files") optparser.add_option( "--cache-file", dest="cache_file", action="store", default="", metavar="FILE", help="Cache FILE containing local source MD5 values") optparser.add_option("-q", "--quiet", dest="quiet", action="store_true", default=False, help="Silence output on stdout") optparser.add_option( "--ca-certs", dest="ca_certs_file", action="store", default=None, help="Path to SSL CA certificate FILE (instead of system default)") optparser.add_option( "--check-certificate", dest="check_ssl_certificate", action="store_true", help="Check SSL certificate validity") optparser.add_option( "--no-check-certificate", dest="check_ssl_certificate", action="store_false", help="Do not check SSL certificate validity") optparser.add_option( "--check-hostname", dest="check_ssl_hostname", action="store_true", help="Check SSL certificate hostname validity") optparser.add_option( "--no-check-hostname", dest="check_ssl_hostname", action="store_false", help="Do not check SSL certificate hostname validity") optparser.add_option( "--signature-v2", dest="signature_v2", action="store_true", help="Use AWS Signature version 2 instead of newer signature methods. Helpful for S3-like systems that don't have AWS Signature v4 yet.") optparser.add_option( "--limit-rate", dest="limitrate", action="store", type="string", help="Limit the upload or download speed to amount bytes per second. 
Amount may be expressed in bytes, kilobytes with the k suffix, or megabytes with the m suffix") optparser.add_option( "--requester-pays", dest="requester_pays", action="store_true", help="Set the REQUESTER PAYS flag for operations") optparser.add_option("-l", "--long-listing", dest="long_listing", action="store_true", help="Produce long listing [ls]") optparser.add_option( "--stop-on-error", dest="stop_on_error", action="store_true", help="stop if error in transfer") optparser.add_option( "--content-disposition", dest="content_disposition", action="store", help="Provide a Content-Disposition for signed URLs, e.g., \"inline; filename=myvideo.mp4\"") optparser.add_option( "--content-type", dest="content_type", action="store", help="Provide a Content-Type for signed URLs, e.g., \"video/mp4\"") optparser.set_usage(optparser.usage + " COMMAND [parameters]") optparser.set_description('S3cmd is a tool for managing objects in '+ 'Amazon S3 storage. It allows for making and removing '+ '"buckets" and uploading, downloading and removing '+ '"objects" from these buckets.') optparser.epilog = format_commands(optparser.get_prog_name(), commands_list) optparser.epilog += ("\nFor more information, updates and news, visit the s3cmd website:\n%s\n" % PkgInfo.url) (options, args) = optparser.parse_args() ## Some mucking with logging levels to enable ## debugging/verbose output for config file parser on request logging.basicConfig(level=options.verbosity or Config().verbosity, format='%(levelname)s: %(message)s', stream = sys.stderr) if options.show_version: output(u"s3cmd version %s" % PkgInfo.version) sys.exit(EX_OK) debug(u"s3cmd version %s" % PkgInfo.version) if options.quiet: try: f = open("/dev/null", "w") sys.stdout.close() sys.stdout = f except IOError: warning(u"Unable to open /dev/null: --quiet disabled.") ## Now finally parse the config file if not options.config: error(u"Can't find a config file. Please use --config option.") sys.exit(EX_CONFIG) try: cfg = Config(options.config, options.access_key, options.secret_key) except IOError, e: if options.run_configure: cfg = Config() else: error(u"%s: %s" % (options.config, e.strerror)) error(u"Configuration file not available.") error(u"Consider using --configure parameter to create one.") sys.exit(EX_CONFIG) # allow commandline verbosity config to override config file if options.verbosity is not None: cfg.verbosity = options.verbosity logging.root.setLevel(cfg.verbosity) ## Unsupported features on Win32 platform if os.name == "nt": if cfg.preserve_attrs: error(u"Option --preserve is not yet supported on MS Windows platform. Assuming --no-preserve.") cfg.preserve_attrs = False if cfg.progress_meter: error(u"Option --progress is not yet supported on MS Windows platform. 
Assuming --no-progress.") cfg.progress_meter = False ## Pre-process --add-header's and put them to Config.extra_headers SortedDict() if options.add_header: for hdr in options.add_header: try: key, val = hdr.split(":", 1) except ValueError: raise ParameterError("Invalid header format: %s" % hdr) key_inval = re.sub("[a-zA-Z0-9-.]", "", key) if key_inval: key_inval = key_inval.replace(" ", "") key_inval = key_inval.replace("\t", "") raise ParameterError("Invalid character(s) in header name '%s': \"%s\"" % (key, key_inval)) debug(u"Updating Config.Config extra_headers[%s] -> %s" % (key.strip().lower(), val.strip())) cfg.extra_headers[key.strip().lower()] = val.strip() # Process --remove-header if options.remove_headers: cfg.remove_headers = options.remove_headers ## --acl-grant/--acl-revoke arguments are pre-parsed by OptionS3ACL() if options.acl_grants: for grant in options.acl_grants: cfg.acl_grants.append(grant) if options.acl_revokes: for grant in options.acl_revokes: cfg.acl_revokes.append(grant) ## Process --(no-)check-md5 if options.check_md5 == False: try: cfg.sync_checks.remove("md5") cfg.preserve_attrs_list.remove("md5") except Exception: pass if options.check_md5 == True: if cfg.sync_checks.count("md5") == 0: cfg.sync_checks.append("md5") if cfg.preserve_attrs_list.count("md5") == 0: cfg.preserve_attrs_list.append("md5") ## Update Config with other parameters for option in cfg.option_list(): try: if getattr(options, option) != None: debug(u"Updating Config.Config %s -> %s" % (option, getattr(options, option))) cfg.update_option(option, getattr(options, option)) except AttributeError: ## Some Config() options are not settable from command line pass ## Special handling for tri-state options (True, False, None) cfg.update_option("enable", options.enable) if options.acl_public is not None: cfg.update_option("acl_public", options.acl_public) ## Check multipart chunk constraints if cfg.multipart_chunk_size_mb < MultiPartUpload.MIN_CHUNK_SIZE_MB: raise ParameterError("Chunk size %d MB is too small, must be >= %d MB. Please adjust --multipart-chunk-size-mb" % (cfg.multipart_chunk_size_mb, MultiPartUpload.MIN_CHUNK_SIZE_MB)) if cfg.multipart_chunk_size_mb > MultiPartUpload.MAX_CHUNK_SIZE_MB: raise ParameterError("Chunk size %d MB is too large, must be <= %d MB. 
Please adjust --multipart-chunk-size-mb" % (cfg.multipart_chunk_size_mb, MultiPartUpload.MAX_CHUNK_SIZE_MB)) ## If an UploadId was provided, set put_continue True if options.upload_id is not None: cfg.upload_id = options.upload_id cfg.put_continue = True if cfg.upload_id and not cfg.multipart_chunk_size_mb: raise ParameterError("Must have --multipart-chunk-size-mb if using --put-continue or --upload-id") ## CloudFront's cf_enable and Config's enable share the same --enable switch options.cf_enable = options.enable ## CloudFront's cf_logging and Config's log_target_prefix share the same --log-target-prefix switch options.cf_logging = options.log_target_prefix ## Update CloudFront options if some were set for option in CfCmd.options.option_list(): try: if getattr(options, option) != None: debug(u"Updating CloudFront.Cmd %s -> %s" % (option, getattr(options, option))) CfCmd.options.update_option(option, getattr(options, option)) except AttributeError: ## Some CloudFront.Cmd.Options() options are not settable from command line pass if options.additional_destinations: cfg.additional_destinations = options.additional_destinations if options.files_from: cfg.files_from = options.files_from ## Set output and filesystem encoding for printing out filenames. sys.stdout = codecs.getwriter(cfg.encoding)(sys.stdout, "replace") sys.stderr = codecs.getwriter(cfg.encoding)(sys.stderr, "replace") ## Process --exclude and --exclude-from patterns_list, patterns_textual = process_patterns(options.exclude, options.exclude_from, is_glob = True, option_txt = "exclude") cfg.exclude.extend(patterns_list) cfg.debug_exclude.update(patterns_textual) ## Process --rexclude and --rexclude-from patterns_list, patterns_textual = process_patterns(options.rexclude, options.rexclude_from, is_glob = False, option_txt = "rexclude") cfg.exclude.extend(patterns_list) cfg.debug_exclude.update(patterns_textual) ## Process --include and --include-from patterns_list, patterns_textual = process_patterns(options.include, options.include_from, is_glob = True, option_txt = "include") cfg.include.extend(patterns_list) cfg.debug_include.update(patterns_textual) ## Process --rinclude and --rinclude-from patterns_list, patterns_textual = process_patterns(options.rinclude, options.rinclude_from, is_glob = False, option_txt = "rinclude") cfg.include.extend(patterns_list) cfg.debug_include.update(patterns_textual) ## Set socket read()/write() timeout socket.setdefaulttimeout(cfg.socket_timeout) if cfg.encrypt and cfg.gpg_passphrase == "": error(u"Encryption requested but no passphrase set in config file.") error(u"Please re-run 's3cmd --configure' and supply it.") sys.exit(EX_CONFIG) if options.dump_config: cfg.dump_config(sys.stdout) sys.exit(EX_OK) if options.run_configure: # 'args' may contain the test-bucket URI run_configure(options.config, args) sys.exit(EX_OK) ## set config if stop_on_error is set if options.stop_on_error: cfg.stop_on_error = options.stop_on_error if options.content_disposition: cfg.content_disposition = options.content_disposition if options.content_type: cfg.content_type = options.content_type if len(args) < 1: optparser.print_help() sys.exit(EX_USAGE) ## Unicodise all remaining arguments: args = [unicodise(arg) for arg in args] command = args.pop(0) try: debug(u"Command: %s" % commands[command]["cmd"]) ## We must do this lookup in extra step to ## avoid catching all KeyError exceptions ## from inner functions. 
cmd_func = commands[command]["func"] except KeyError, e: error(u"Invalid command: %s", command) sys.exit(EX_USAGE) if len(args) < commands[command]["argc"]: error(u"Not enough parameters for command '%s'" % command) sys.exit(EX_USAGE) rc = cmd_func(args) if rc is None: # if we missed any cmd_*() returns rc = EX_GENERAL return rc def report_exception(e, msg=u''): alert_header = u""" !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! An unexpected error has occurred. Please try reproducing the error using the latest s3cmd code from the git master branch found at: https://github.com/s3tools/s3cmd and have a look at the known issues list: https://github.com/s3tools/s3cmd/wiki/Common-known-issues-and-their-solutions If the error persists, please report the %s (removing any private info as necessary) to: s3tools-bugs@lists.sourceforge.net%s !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! """ sys.stderr.write(alert_header % (u"following lines", u"\n\n" + msg)) tb = traceback.format_exc(sys.exc_info()) try: s = u' '.join([unicodise(a) for a in sys.argv]) except NameError: s = u' '.join([(a) for a in sys.argv]) sys.stderr.write(u"Invoked as: %s\n" % s) e_class = str(e.__class__) e_class = e_class[e_class.rfind(".")+1 : -2] sys.stderr.write(u"Problem: %s: %s\n" % (e_class, e)) try: sys.stderr.write(u"S3cmd: %s\n" % PkgInfo.version) except NameError: sys.stderr.write(u"S3cmd: unknown version. Module import problem?\n") sys.stderr.write(u"python: %s\n" % sys.version) sys.stderr.write(u"environment LANG=%s\n" % os.getenv("LANG")) sys.stderr.write(u"\n") sys.stderr.write(unicode(tb, errors="replace")) if type(e) == ImportError: sys.stderr.write("\n") sys.stderr.write("Your sys.path contains these entries:\n") for path in sys.path: sys.stderr.write(u"\t%s\n" % path) sys.stderr.write("Now the question is where have the s3cmd modules been installed?\n") sys.stderr.write(alert_header % (u"above lines", u"")) if __name__ == '__main__': try: ## Our modules ## Keep them in try/except block to ## detect any syntax errors in there from S3.ExitCodes import * from S3.Exceptions import * from S3 import PkgInfo from S3.S3 import S3 from S3.Config import Config from S3.SortedDict import SortedDict from S3.FileDict import FileDict from S3.S3Uri import S3Uri from S3 import Utils from S3 import Crypto from S3.Utils import * from S3.Progress import Progress, StatsInfo from S3.CloudFront import Cmd as CfCmd from S3.CloudFront import CloudFront from S3.FileLists import * from S3.MultiPart import MultiPartUpload except Exception as e: report_exception(e, "Error loading some components of s3cmd (Import Error)") # 1 = EX_GENERAL but be safe in that situation sys.exit(1) try: rc = main() sys.exit(rc) except ImportError, e: report_exception(e) sys.exit(EX_GENERAL) except (ParameterError, InvalidFileError), e: error(u"Parameter problem: %s" % e) sys.exit(EX_USAGE) except (S3DownloadError, S3UploadError, S3RequestError), e: error(u"S3 Temporary Error: %s. Please try again later." % e) sys.exit(EX_TEMPFAIL) except S3Error, e: error(u"S3 error: %s" % e) sys.exit(e.get_error_code()) except (S3Exception, S3ResponseError, CloudFrontError), e: report_exception(e) sys.exit(EX_SOFTWARE) except SystemExit, e: sys.exit(e.code) except KeyboardInterrupt: sys.stderr.write("See ya!\n") sys.exit(EX_BREAK) except SSLError, e: # SSLError is a subtype of IOError error("SSL certificate verification failure: %s" % e) sys.exit(EX_ACCESSDENIED) except IOError, e: if e.errno == errno.EPIPE: # Fail silently on SIGPIPE. 
This likely means we wrote to a closed # pipe and user does not care for any more output. sys.exit(EX_IOERR) report_exception(e) sys.exit(EX_IOERR) except OSError, e: error(e) sys.exit(EX_OSERR) except MemoryError: msg = """ MemoryError! You have exceeded the amount of memory available for this process. This usually occurs when syncing >750,000 files on a 32-bit python instance. The solutions to this are: 1) sync several smaller subtrees; or 2) use a 64-bit python on a 64-bit OS with >8GB RAM """ sys.stderr.write(msg) sys.exit(EX_OSERR) except UnicodeEncodeError, e: lang = os.getenv("LANG") msg = """ You have encountered a UnicodeEncodeError. Your environment variable LANG=%s may not specify a Unicode encoding (e.g. UTF-8). Please set LANG=en_US.UTF-8 or similar in your environment before invoking s3cmd. """ % lang report_exception(e, msg) sys.exit(EX_GENERAL) except Exception, e: report_exception(e) sys.exit(EX_GENERAL) # vim:et:ts=4:sts=4:ai s3cmd-1.6.1/README.md0000664000175000017500000003512412647745544015146 0ustar mdomschmdomsch00000000000000## S3cmd tool for Amazon Simple Storage Service (S3) * Author: Michal Ludvig, michal@logix.cz * [Project homepage](http://s3tools.org) * (c) [TGRMN Software](http://www.tgrmn.com) and contributors S3tools / S3cmd mailing lists: * Announcements of new releases: s3tools-announce@lists.sourceforge.net * General questions and discussion: s3tools-general@lists.sourceforge.net * Bug reports: s3tools-bugs@lists.sourceforge.net ### What is S3cmd S3cmd (`s3cmd`) is a free command line tool and client for uploading, retrieving and managing data in Amazon S3 and other cloud storage service providers that use the S3 protocol, such as Google Cloud Storage or DreamHost DreamObjects. It is best suited for power users who are familiar with command line programs. It is also ideal for batch scripts and automated backup to S3, triggered from cron, etc. S3cmd is written in Python. It's an open source project available under GNU Public License v2 (GPLv2) and is free for both commercial and private use. You will only have to pay Amazon for using their storage. Lots of features and options have been added to S3cmd, since its very first release in 2008.... we recently counted more than 60 command line options, including multipart uploads, encryption, incremental backup, s3 sync, ACL and Metadata management, S3 bucket size, bucket policies, and more! ### What is Amazon S3 Amazon S3 provides a managed internet-accessible storage service where anyone can store any amount of data and retrieve it later again. S3 is a paid service operated by Amazon. Before storing anything into S3 you must sign up for an "AWS" account (where AWS = Amazon Web Services) to obtain a pair of identifiers: Access Key and Secret Key. You will need to give these keys to S3cmd. Think of them as if they were a username and password for your S3 account. 
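As noted above, s3cmd is well suited to batch scripts and cron-driven backups. Below is a minimal, purely illustrative sketch of a Python wrapper around it (not part of s3cmd itself); the local directory and bucket URI are hypothetical placeholders, and it only distinguishes a zero from a non-zero exit status rather than relying on specific exit codes.

```
# Minimal sketch: call s3cmd from Python automation.
# Assumes s3cmd is installed and already configured (~/.s3cfg exists).
# The directory and bucket below are hypothetical placeholders.
import subprocess
import sys

def backup(local_dir, bucket_uri):
    """Run 's3cmd sync' and report whether it succeeded."""
    rc = subprocess.call(["s3cmd", "sync", local_dir, bucket_uri])
    if rc != 0:
        # Non-zero exit status means the transfer failed (or was only partial).
        sys.stderr.write("s3cmd exited with status %d\n" % rc)
        return False
    return True

if __name__ == "__main__":
    backup("/var/backups/", "s3://my-backup-bucket/backups/")
```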
### Amazon S3 pricing explained At the time of this writing, the costs of using S3 are (in USD): $0.03 per GB per month of storage space used plus $0.00 per GB - all data uploaded plus $0.000 per GB - first 1GB / month data downloaded $0.090 per GB - up to 10 TB / month data downloaded $0.085 per GB - next 40 TB / month data downloaded $0.070 per GB - data downloaded / month over 50 TB plus $0.005 per 1,000 PUT or COPY or LIST requests $0.004 per 10,000 GET and all other requests If, for instance, on the 1st of January you upload 2GB of photos in JPEG from your holiday in New Zealand, at the end of January you will be charged $0.06 for using 2GB of storage space for a month, $0.0 for uploading 2GB of data, and a few cents for requests. That comes to slightly over $0.06 for a complete backup of your precious holiday pictures. In February you don't touch it. Your data are still on S3 servers so you pay $0.06 for those two gigabytes, but not a single cent will be charged for any transfer. That comes to $0.06 as an ongoing cost of your backup. Not too bad. In March you allow anonymous read access to some of your pictures and your friends download, say, 1500MB of them. As the files are owned by you, you are responsible for the costs incurred. That means at the end of March you'll be charged $0.06 for storage plus $0.045 for the download traffic generated by your friends. There is no minimum monthly contract or a setup fee. What you use is what you pay for. At the beginning my bill used to be like US$0.03 or even nil. That's the pricing model of Amazon S3 in a nutshell. Check the [Amazon S3 homepage](http://aws.amazon.com/s3/pricing/) for more details. Needless to say, all this money is charged by Amazon itself; there is obviously no payment for using S3cmd :-) ### Amazon S3 basics Files stored in S3 are called "objects" and their names are officially called "keys". Since this is sometimes confusing for users, we often refer to the objects as "files" or "remote files". Each object belongs to exactly one "bucket". To describe objects in S3 storage we invented a URI-like schema in the following form: ``` s3://BUCKET ``` or ``` s3://BUCKET/OBJECT ``` ### Buckets Buckets are sort of like directories or folders with some restrictions: 1. each user can only have 100 buckets at the most, 2. bucket names must be unique amongst all users of S3, 3. buckets cannot be nested into a deeper hierarchy and 4. a name of a bucket can only consist of basic alphanumeric characters plus dot (.) and dash (-). No spaces, no accented or UTF-8 letters, etc. It is a good idea to use DNS-compatible bucket names. That for instance means you should not use upper case characters. While DNS compliance is not strictly required, some features described below are not available for DNS-incompatible named buckets. A step further is using a fully qualified domain name (FQDN) for a bucket - that has even more benefits. * For example "s3://--My-Bucket--" is not DNS compatible. * On the other hand "s3://my-bucket" is DNS compatible but is not FQDN. * Finally "s3://my-bucket.s3tools.org" is DNS compatible and FQDN provided you own the s3tools.org domain and can create the domain record for "my-bucket.s3tools.org". Look for "Virtual Hosts" later in this text for more details regarding FQDN named buckets. ### Objects (files stored in Amazon S3) Unlike for buckets, there are almost no restrictions on object names. These can be any UTF-8 strings of up to 1024 bytes long. 
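To make the URI form and the bucket naming conventions above concrete, here is a small illustrative sketch (not s3cmd's own parsing or validation code) that splits an `s3://BUCKET/OBJECT` URI and applies a rough DNS-compatibility check to the bucket part; the object key in the example is a hypothetical placeholder, and real S3 enforces stricter rules, so treat this purely as an illustration.

```
# Illustration only -- not the parsing/validation code s3cmd itself uses.
import re

def split_s3_uri(uri):
    """Split 's3://BUCKET/OBJECT' into (bucket, object_key)."""
    if not uri.startswith("s3://"):
        raise ValueError("not an s3:// URI: %s" % uri)
    bucket, _, key = uri[len("s3://"):].partition("/")
    return bucket, key

def looks_dns_compatible(bucket):
    """Rough check for the DNS-friendly naming convention described above."""
    return re.match(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$", bucket) is not None

bucket, key = split_s3_uri("s3://my-bucket.s3tools.org/holiday/photo.jpg")
print("bucket=%s dns_ok=%s key=%s" % (bucket, looks_dns_compatible(bucket), key))
```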
Interestingly enough, the object name can contain the forward slash character (/), thus `my/funny/picture.jpg` is a valid object name. Note that there are no directories or buckets called `my` and `funny` - it is really a single object name called `my/funny/picture.jpg` and S3 does not care at all that it _looks_ like a directory structure. The full URI of such an image could be, for example: ``` s3://my-bucket/my/funny/picture.jpg ``` ### Public vs Private files The files stored in S3 can be either Private or Public. The Private ones are readable only by the user who uploaded them, while the Public ones can be read by anyone. Additionally, the Public files can be accessed using the HTTP protocol, not only using `s3cmd` or a similar tool. The ACL (Access Control List) of a file can be set at the time of upload using `--acl-public` or `--acl-private` options with `s3cmd put` or `s3cmd sync` commands (see below). Alternatively the ACL can be altered for existing remote files with `s3cmd setacl --acl-public` (or `--acl-private`) command. ### Simple s3cmd HowTo 1) Register for Amazon AWS / S3 Go to http://aws.amazon.com/s3, click the "Sign up for web service" button in the right column and work through the registration. You will have to supply your Credit Card details in order to allow Amazon to charge you for S3 usage. At the end you should have your Access and Secret Keys. If you set up a separate IAM user, that user's access key must have at least the following permissions to do anything: - s3:ListAllMyBuckets - s3:GetBucketLocation - s3:ListBucket Other example policies can be found at https://docs.aws.amazon.com/AmazonS3/latest/dev/example-policies-s3.html 2) Run `s3cmd --configure` You will be asked for the two keys - copy and paste them from your confirmation email or from your Amazon account page. Be careful when copying them! They are case sensitive and must be entered accurately, or you'll keep getting errors about invalid signatures or similar. Remember to add s3:ListAllMyBuckets permissions to the keys or you will get an AccessDenied error while testing access. 3) Run `s3cmd ls` to list all your buckets. As you just started using S3, there are no buckets owned by you as of now. So the output will be empty. 4) Make a bucket with `s3cmd mb s3://my-new-bucket-name` As mentioned above, the bucket names must be unique amongst _all_ users of S3. That means the simple names like "test" or "asdf" are already taken and you must make up something more original. To demonstrate as many features as possible, let's create a FQDN-named bucket `s3://public.s3tools.org`: ``` $ s3cmd mb s3://public.s3tools.org Bucket 's3://public.s3tools.org' created ``` 5) List your buckets again with `s3cmd ls` Now you should see your freshly created bucket: ``` $ s3cmd ls 2009-01-28 12:34 s3://public.s3tools.org ``` 6) List the contents of the bucket: ``` $ s3cmd ls s3://public.s3tools.org $ ``` It's empty, indeed. 
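Coming back to the minimal IAM permissions listed in step 1, a policy document granting them might look roughly like the sketch below (rendered here as a Python dictionary and dumped with the json module purely for illustration; the bucket ARN simply reuses the example bucket name from step 4 as a placeholder, and AWS's IAM documentation remains the authority on policy syntax):

```
# Illustration of a minimal IAM policy for the permissions from step 1.
# The bucket name in the second ARN is a placeholder -- adapt it to your own bucket.
import json

minimal_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListAllMyBuckets", "s3:GetBucketLocation"],
            "Resource": "arn:aws:s3:::*",
        },
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": "arn:aws:s3:::my-new-bucket-name",
        },
    ],
}

print(json.dumps(minimal_policy, indent=2))
```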
7) Upload a single file into the bucket: ``` $ s3cmd put some-file.xml s3://public.s3tools.org/somefile.xml some-file.xml -> s3://public.s3tools.org/somefile.xml [1 of 1] 123456 of 123456 100% in 2s 51.75 kB/s done ``` Upload a two-directory tree into the bucket's virtual 'directory': ``` $ s3cmd put --recursive dir1 dir2 s3://public.s3tools.org/somewhere/ File 'dir1/file1-1.txt' stored as 's3://public.s3tools.org/somewhere/dir1/file1-1.txt' [1 of 5] File 'dir1/file1-2.txt' stored as 's3://public.s3tools.org/somewhere/dir1/file1-2.txt' [2 of 5] File 'dir1/file1-3.log' stored as 's3://public.s3tools.org/somewhere/dir1/file1-3.log' [3 of 5] File 'dir2/file2-1.bin' stored as 's3://public.s3tools.org/somewhere/dir2/file2-1.bin' [4 of 5] File 'dir2/file2-2.txt' stored as 's3://public.s3tools.org/somewhere/dir2/file2-2.txt' [5 of 5] ``` As you can see, we didn't have to create the `/somewhere` 'directory'. In fact it's only a filename prefix, not a real directory, and it doesn't have to be created in any way beforehand. Instead of using `put` with the `--recursive` option, you could also use the `sync` command: ``` $ s3cmd sync dir1 dir2 s3://public.s3tools.org/somewhere/ ``` 8) Now list the bucket's contents again: ``` $ s3cmd ls s3://public.s3tools.org DIR s3://public.s3tools.org/somewhere/ 2009-02-10 05:10 123456 s3://public.s3tools.org/somefile.xml ``` Use --recursive (or -r) to list all the remote files: ``` $ s3cmd ls --recursive s3://public.s3tools.org 2009-02-10 05:10 123456 s3://public.s3tools.org/somefile.xml 2009-02-10 05:13 18 s3://public.s3tools.org/somewhere/dir1/file1-1.txt 2009-02-10 05:13 8 s3://public.s3tools.org/somewhere/dir1/file1-2.txt 2009-02-10 05:13 16 s3://public.s3tools.org/somewhere/dir1/file1-3.log 2009-02-10 05:13 11 s3://public.s3tools.org/somewhere/dir2/file2-1.bin 2009-02-10 05:13 8 s3://public.s3tools.org/somewhere/dir2/file2-2.txt ``` 9) Retrieve one of the files back and verify that it hasn't been corrupted: ``` $ s3cmd get s3://public.s3tools.org/somefile.xml some-file-2.xml s3://public.s3tools.org/somefile.xml -> some-file-2.xml [1 of 1] 123456 of 123456 100% in 3s 35.75 kB/s done ``` ``` $ md5sum some-file.xml some-file-2.xml 39bcb6992e461b269b95b3bda303addf some-file.xml 39bcb6992e461b269b95b3bda303addf some-file-2.xml ``` The checksum of the original file matches that of the retrieved one. Looks like it worked :-) To retrieve a whole 'directory tree' from S3, use a recursive get: ``` $ s3cmd get --recursive s3://public.s3tools.org/somewhere File s3://public.s3tools.org/somewhere/dir1/file1-1.txt saved as './somewhere/dir1/file1-1.txt' File s3://public.s3tools.org/somewhere/dir1/file1-2.txt saved as './somewhere/dir1/file1-2.txt' File s3://public.s3tools.org/somewhere/dir1/file1-3.log saved as './somewhere/dir1/file1-3.log' File s3://public.s3tools.org/somewhere/dir2/file2-1.bin saved as './somewhere/dir2/file2-1.bin' File s3://public.s3tools.org/somewhere/dir2/file2-2.txt saved as './somewhere/dir2/file2-2.txt' ``` Since the destination directory wasn't specified, `s3cmd` saved the directory structure in the current working directory ('.'). There is an important difference between: ``` get s3://public.s3tools.org/somewhere ``` and ``` get s3://public.s3tools.org/somewhere/ ``` (note the trailing slash) `s3cmd` always uses the last path part, i.e. the word after the last slash, for naming files. In the case of `s3://.../somewhere` the last path part is 'somewhere' and therefore the recursive get names the local files as somewhere/dir1, somewhere/dir2, etc. 
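The 'last path part' rule just described can be sketched in a couple of lines of Python (an illustration only, not s3cmd's real implementation); the trailing-slash case shown in the second call is discussed further in the next paragraph:

```
# Illustration of the "last path part" rule used when naming downloaded files.
def last_path_part(source_uri):
    """Return the part of the URI after the last slash."""
    return source_uri.split("/")[-1]

print(last_path_part("s3://public.s3tools.org/somewhere"))   # 'somewhere' -> files land under somewhere/
print(last_path_part("s3://public.s3tools.org/somewhere/"))  # empty -> no extra prefix is created
```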
On the other hand, in `s3://.../somewhere/` the last path part is empty and s3cmd will only create 'dir1' and 'dir2' without the 'somewhere/' prefix: ``` $ s3cmd get --recursive s3://public.s3tools.org/somewhere/ ~/ File s3://public.s3tools.org/somewhere/dir1/file1-1.txt saved as '~/dir1/file1-1.txt' File s3://public.s3tools.org/somewhere/dir1/file1-2.txt saved as '~/dir1/file1-2.txt' File s3://public.s3tools.org/somewhere/dir1/file1-3.log saved as '~/dir1/file1-3.log' File s3://public.s3tools.org/somewhere/dir2/file2-1.bin saved as '~/dir2/file2-1.bin' ``` See? It's `~/dir1` and not `~/somewhere/dir1` as it was in the previous example. 10) Clean up - delete the remote files and remove the bucket: Remove everything under s3://public.s3tools.org/somewhere/ ``` $ s3cmd del --recursive s3://public.s3tools.org/somewhere/ File s3://public.s3tools.org/somewhere/dir1/file1-1.txt deleted File s3://public.s3tools.org/somewhere/dir1/file1-2.txt deleted ... ``` Now try to remove the bucket: ``` $ s3cmd rb s3://public.s3tools.org ERROR: S3 error: 409 (BucketNotEmpty): The bucket you tried to delete is not empty ``` Ouch, we forgot about `s3://public.s3tools.org/somefile.xml`. We can force the bucket removal anyway: ``` $ s3cmd rb --force s3://public.s3tools.org/ WARNING: Bucket is not empty. Removing all the objects from it first. This may take some time... File s3://public.s3tools.org/somefile.xml deleted Bucket 's3://public.s3tools.org/' removed ``` ### Hints The basic usage is as simple as described in the previous section. You can increase the level of verbosity with the `-v` option, and if you're really keen to know what the program does under its bonnet, run it with `-d` to see all 'debugging' output. After configuring it with `--configure`, all available options are written into your `~/.s3cfg` file. It's a text file ready to be modified in your favourite text editor. The Transfer commands (put, get, cp, mv, and sync) continue transferring even if an object fails. If a failure occurs, the failure is output to stderr and the exit status will be EX_PARTIAL (2). If the option `--stop-on-error` is specified, or the config option stop_on_error is true, the transfers stop and an appropriate error code is returned. For more information refer to the [S3cmd / S3tools homepage](http://s3tools.org). ### License Copyright (C) 2007-2015 TGRMN Software - http://www.tgrmn.com - and contributors This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. 
s3cmd-1.6.1/setup.py0000664000175000017500000000657212647745544015406 0ustar mdomschmdomsch00000000000000#!/usr/bin/env python2 # -*- coding=utf-8 -*- import sys import os from setuptools import setup import S3.PkgInfo if float("%d.%d" % sys.version_info[:2]) < 2.6: sys.stderr.write("Your Python version %d.%d.%d is not supported.\n" % sys.version_info[:3]) sys.stderr.write("S3cmd requires Python 2.6 or newer.\n") sys.exit(1) try: import xml.etree.ElementTree as ET print "Using xml.etree.ElementTree for XML processing" except ImportError, e: sys.stderr.write(str(e) + "\n") try: import elementtree.ElementTree as ET print "Using elementtree.ElementTree for XML processing" except ImportError, e: sys.stderr.write(str(e) + "\n") sys.stderr.write("Please install ElementTree module from\n") sys.stderr.write("http://effbot.org/zone/element-index.htm\n") sys.exit(1) try: ## Remove 'MANIFEST' file to force ## distutils to recreate it. ## Only in "sdist" stage. Otherwise ## it makes life difficult to packagers. if sys.argv[1] == "sdist": os.unlink("MANIFEST") except: pass ## Re-create the manpage ## (Beware! Perl script on the loose!!) if sys.argv[1] == "sdist": if os.stat_result(os.stat("s3cmd.1")).st_mtime < os.stat_result(os.stat("s3cmd")).st_mtime: sys.stderr.write("Re-create man page first!\n") sys.stderr.write("Run: ./s3cmd --help | ./format-manpage.pl > s3cmd.1\n") sys.exit(1) ## Don't install manpages and docs when $S3CMD_PACKAGING is set ## This was a requirement of Debian package maintainer. if not os.getenv("S3CMD_PACKAGING"): man_path = os.getenv("S3CMD_INSTPATH_MAN") or "share/man" doc_path = os.getenv("S3CMD_INSTPATH_DOC") or "share/doc/packages" data_files = [ (doc_path+"/s3cmd", [ "README.md", "INSTALL", "NEWS" ]), (man_path+"/man1", [ "s3cmd.1" ] ), ] else: data_files = None ## Main distutils info setup( ## Content description name = S3.PkgInfo.package, version = S3.PkgInfo.version, packages = [ 'S3' ], scripts = ['s3cmd'], data_files = data_files, ## Packaging details author = "Michal Ludvig", author_email = "michal@logix.cz", maintainer = "github.com/mdomsch, github.com/matteobar", maintainer_email = "s3tools-bugs@lists.sourceforge.net", url = S3.PkgInfo.url, license = S3.PkgInfo.license, description = S3.PkgInfo.short_description, long_description = """ %s Authors: -------- Michal Ludvig """ % (S3.PkgInfo.long_description), classifiers = [ 'Development Status :: 5 - Production/Stable', 'Environment :: Console', 'Environment :: MacOS X', 'Environment :: Win32 (MS Windows)', 'Intended Audience :: End Users/Desktop', 'Intended Audience :: System Administrators', 'License :: OSI Approved :: GNU General Public License v2 or later (GPLv2+)', 'Natural Language :: English', 'Operating System :: MacOS :: MacOS X', 'Operating System :: Microsoft :: Windows', 'Operating System :: POSIX', 'Operating System :: Unix', 'Programming Language :: Python :: 2.6', 'Programming Language :: Python :: 2.7', 'Programming Language :: Python :: 2 :: Only', 'Topic :: System :: Archiving', 'Topic :: Utilities', ], install_requires = ["python-dateutil", "python-magic"] ) # vim:et:ts=4:sts=4:ai s3cmd-1.6.1/PKG-INFO0000664000175000017500000000274512647747124014763 0ustar mdomschmdomsch00000000000000Metadata-Version: 1.1 Name: s3cmd Version: 1.6.1 Summary: Command line tool for managing Amazon S3 and CloudFront services Home-page: http://s3tools.org Author: github.com/mdomsch, github.com/matteobar Author-email: s3tools-bugs@lists.sourceforge.net License: GNU GPL v2+ Description: S3cmd lets you copy files from/to 
Amazon S3 (Simple Storage Service) using a simple to use command line client. Supports rsync-like backup, GPG encryption, and more. Also supports management of Amazon's CloudFront content delivery network. Authors: -------- Michal Ludvig Platform: UNKNOWN Classifier: Development Status :: 5 - Production/Stable Classifier: Environment :: Console Classifier: Environment :: MacOS X Classifier: Environment :: Win32 (MS Windows) Classifier: Intended Audience :: End Users/Desktop Classifier: Intended Audience :: System Administrators Classifier: License :: OSI Approved :: GNU General Public License v2 or later (GPLv2+) Classifier: Natural Language :: English Classifier: Operating System :: MacOS :: MacOS X Classifier: Operating System :: Microsoft :: Windows Classifier: Operating System :: POSIX Classifier: Operating System :: Unix Classifier: Programming Language :: Python :: 2.6 Classifier: Programming Language :: Python :: 2.7 Classifier: Programming Language :: Python :: 2 :: Only Classifier: Topic :: System :: Archiving Classifier: Topic :: Utilities s3cmd-1.6.1/MANIFEST.in0000664000175000017500000000005712647745544015422 0ustar mdomschmdomsch00000000000000include INSTALL README.md NEWS include s3cmd.1 s3cmd-1.6.1/NEWS0000664000175000017500000003067112647746765014376 0ustar mdomschmdomsch00000000000000s3cmd-1.6.1 - 2016-01-20 =============== * Added --host and --host-bucket * Added --stats * Fix for newer python 2.7.x SSL library updates * Many other bug fixes s3cmd-1.6.0 - 2015-09-18 =============== * Support signed URL content disposition type * Added 'ls -l' long listing including storage class * Added --limit-rate=RATE * Added --server-side-encryption-kms-id=KEY_ID * Added --storage-class=CLASS * Added --requester-pays, [payer] command * Added --[no-]check-hostname * Added --stop-on-error, removed --ignore-failed-copy * Added [setcors], [delcors] commands * Added support for cn-north-1 region hostname checks * Output strings may have changed. Scripts calling s3cmd expecting specific text may need to be updated. * HTTPS is now the default * Many unicode fixes * Many other bug fixes s3cmd-1.5.2 - 2015-02-08 =============== * Handle unvalidated SSL certificate. Necessary on Ubuntu 14.04 for SSL to function at all. * packaging fixes (require python-magic, drop ez_setup) s3cmd-1.5.1.2 - 2015-02-04 =============== * fix PyPi install s3cmd-1.5.1 - 2015-02-04 =============== * Sort s3cmd ls output by bucket name (Andrew Gaul) * Support relative expiry times in signurl. (Chris Lamb) * Fixed issue with mixed path separators with s3cmd get --recursive on Windows. (Luke Winslow) * fix S3 wildcard certificate checking * Handle headers with spaces in their values properly (#460) * Fix lack of SSL certificate checking libraries on older python * set content-type header for stdin from command line or Config() * fix uploads from stdin (#464) * Fix directory exclusions (#467) * fix signurl * Don't retry in response to HTTP 405 error (#422) * Don't crash when a proxy returns an invalid XML error document s3cmd-1.5.0 - 2015-01-12 =============== * add support for newer regions such as Frankfurt that require newer authorization signature v4 support (Vasileios Mitrousis, Michal Ludvig, Matt Domsch) * drop support for python 2.4 due to signature v4 code. python 2.6 is now the minimum, and python 3 is still not supported. * handle redirects to the "right" region for a bucket. 
* add --ca-cert=FILE for self-signed certs (Matt Domsch) * allow proxied SSL connections with python >= 2.7 (Damian Gerow) * add --remove-headers for [modify] command (Matt Domsch) * add -s/--ssl and --no-ssl options (Viktor Szakáts) * add --signature-v2 for backwards compatibility with S3 clones. * bugfixes by 17 contributors s3cmd 1.5.0-rc1 - 2014-06-29 =============== * add environment variable S3CMD_CONFIG (Devon Jones), access key and secre keys (Vasileios Mitrousis) * added modify command (Francois Gaudin) * better debug messages (Matt Domsch) * faster batch deletes (Matt Domsch) * Added support for restoring files from Glacier storage (Robert Palmer) * Add and remove full lifecycle policies (Sam Rudge) * Add support for object expiration (hrchu) * bugfixes by 26 contributors s3cmd 1.5.0-beta1 - 2013-12-02 ================= * Brougt to you by Matt Domsch and contributors, thanks guys! :) * Multipart upload improvements (Eugene Brevdo, UENISHI Kota) * Allow --acl-grant on AWS groups (Dale Lovelace) * Added Server-Side Encryption support (Kevin Daub) * Improved MIME types detections and content encoding (radomir, Eric Drechsel, George Melika) * Various smaller changes and bugfixes from many contributors s3cmd 1.5.0-alpha3 - 2013-03-11 ================== * Persistent HTTP/HTTPS connections for massive speedup (Michal Ludvig) * New switch --quiet for suppressing all output (Siddarth Prakash) * Honour "umask" on file downloads (Jason Dalton) * Various bugfixes from many contributors s3cmd 1.5.0-alpha2 - 2013-03-04 ================== * IAM roles support (David Kohen, Eric Dowd) * Manage bucket policies (Kota Uenishi) * Various bugfixes from many contributors s3cmd 1.5.0-alpha1 - 2013-02-19 ================== * Server-side copy for hardlinks/softlinks to improve performance (Matt Domsch) * New [signurl] command (Craig Ringer) * Improved symlink-loop detection (Michal Ludvig) * Add --delete-after option for sync (Matt Domsch) * Handle empty return bodies when processing S3 errors. (Kelly McLaughlin) * Upload from STDIN (Eric Connell) * Updated bucket locations (Stefhen Hovland) * Support custom HTTP headers (Brendan O'Connor, Karl Matthias) * Improved MIME support (Karsten Sperling, Christopher Noyes) * Added support for --acl-grant/--acl-revoke to 'sync' command (Michael Tyson) * CloudFront: Support default index and default root invalidation (Josep del Rio) * Command line options for access/secret keys (Matt Sweeney) * Support [setpolicy] for setting bucket policies (Joe Fiorini) * Respect the $TZ environment variable (James Brown) * Reduce memory consumption for [s3cmd du] (Charlie Schluting) * Rate limit progress updates (Steven Noonan) * Download from S3 to a temp file first (Sumit Kumar) * Reuse a single connection when doing a bucket list (Kelly McLaughlin) * Delete empty files if object_get() failed (Oren Held) s3cmd 1.1.0 - (never released) =========== * MultiPart upload enabled for both [put] and [sync]. Default chunk size is 15MB. * CloudFront invalidation via [sync --cf-invalidate] and [cfinvalinfo]. * Increased socket_timeout from 10 secs to 5 mins. * Added "Static WebSite" support [ws-create / ws-delete / ws-info] (contributed by Jens Braeuer) * Force MIME type with --mime-type=abc/xyz, also --guess-mime-type is now on by default, -M is no longer shorthand for --guess-mime-type * Allow parameters in MIME types, for example: --mime-type="text/plain; charset=utf-8" * MIME type can be guessed by python-magic which is a lot better than relying on the extension. 
Contributed by Karsten Sperling. * Support for environment variables as config values. For instance in ~/.s3cmd put "access_key=$S3_ACCESS_KEY". Contributed by Ori Bar. * Support for --configure checking access to a specific bucket instead of listing all buckets. Listing buckets requires the S3 ListAllMyBuckets permission which is typically not available to delegated IAM accounts. With this change, s3cmd --configure accepts an (optional) bucket uri as a parameter and if it's provided, the access check will just verify access to this bucket individually. Contributed by Mike Repass. * Allow STDOUT as a destination even for downloading multiple files. They will be output one after another without any delimiters! Contributed by Rob Wills. s3cmd 1.0.0 - 2011-01-18 =========== * [sync] now supports --no-check-md5 * Network connections now have 10s timeout * [sync] now supports bucket-to-bucket synchronisation * Added [accesslog] command. * Added access logging for CloudFront distributions using [cfmodify --log] * Added --acl-grant and --acl-revoke [Timothee Groleau] * Allow s3:// URI as well as cf:// URI as a distribution name for most CloudFront related commands. * Support for Reduced Redundancy Storage (--reduced-redundancy) * Follow symlinks in [put] and [sync] with --follow-symlinks * Support for CloudFront DefaultRootObject [Luke Andrew] s3cmd 0.9.9.91 - 2009-10-08 ============== * Fixed invalid reference to a variable in failed upload handling. s3cmd 0.9.9.90 - 2009-10-06 ============== * New command 'sign' for signing e.g. POST upload policies. * Fixed handling of filenames that differ only in capitalisation (eg blah.txt vs Blah.TXT). * Added --verbatim mode, preventing most filenames pre-processing. Good for fixing unreadable buckets. * Added --recursive support for [cp] and [mv], including multiple-source arguments, --include/--exclude, --dry-run, etc. * Added --exclude/--include and --dry-run for [del], [setacl]. * Neutralise characters that are invalid in XML to avoid ExpatErrors. http://boodebr.org/main/python/all-about-python-and-unicode * New command [fixbucket] for for fixing invalid object names in a given Bucket. For instance names with  in them (not sure how people manage to upload them but they do). s3cmd 0.9.9 - 2009-02-17 =========== New commands: * Commands for copying and moving objects, within or between buckets: [cp] and [mv] (Andrew Ryan) * CloudFront support through [cfcreate], [cfdelete], [cfmodify] and [cfinfo] commands. (sponsored by Joseph Denne) * New command [setacl] for setting ACL on existing objects, use together with --acl-public/--acl-private (sponsored by Joseph Denne) Other major features: * Improved source dirname handling for [put], [get] and [sync]. * Recursive and wildcard support for [put], [get] and [del]. * Support for non-recursive [ls]. * Enabled --dry-run for [put], [get] and [sync]. * Allowed removal of non-empty buckets with [rb --force]. * Implemented progress meter (--progress / --no-progress) * Added --include / --rinclude / --(r)include-from options to override --exclude exclusions. * Added --add-header option for [put], [sync], [cp] and [mv]. Good for setting e.g. Expires or Cache-control headers. * Added --list-md5 option for [ls]. * Continue [get] partially downloaded files with --continue * New option --skip-existing for [get] and [sync]. Minor features and bugfixes: * Fixed GPG (--encrypt) compatibility with Python 2.6. * Always send Content-Length header to satisfy some http proxies. * Fixed installation on Windows and Mac OS X. 
* Don't print nasty backtrace on KeyboardInterrupt. * Should work fine on non-UTF8 systems, provided all the files are in current system encoding. * System encoding can be overridden using --encoding. * Improved resistance to communication errors (Connection reset by peer, etc.) s3cmd 0.9.8.4 - 2008-11-07 ============= * Stabilisation / bugfix release: * Restored access to upper-case named buckets. * Improved handling of filenames with Unicode characters. * Avoid ZeroDivisionError on ultrafast links (for instance on Amazon EC2) * Re-issue failed requests (e.g. connection errors, internal server errors, etc). * Sync skips over files that can't be open instead of terminating the sync completely. * Doesn't run out of open files quota on sync with lots of files. s3cmd 0.9.8.3 - 2008-07-29 ============= * Bugfix release. Avoid running out-of-memory in MD5'ing large files. s3cmd 0.9.8.2 - 2008-06-27 ============= * Bugfix release. Re-upload file if Amazon doesn't send ETag back. s3cmd 0.9.8.1 - 2008-06-27 ============= * Bugfix release. Fixed 'mb' and 'rb' commands again. s3cmd 0.9.8 - 2008-06-23 =========== * Added --exclude / --rexclude options for sync command. * Doesn't require $HOME env variable to be set anymore. * Better checking of bucket names to Amazon S3 rules. s3cmd 0.9.7 - 2008-06-05 =========== * Implemented 'sync' from S3 back to local folder, including file attribute restoration. * Failed uploads are retried on lower speed to improve error resilience. * Compare MD5 of the uploaded file, compare with checksum reported by S3 and re-upload on mismatch. s3cmd 0.9.6 - 2008-02-28 =========== * Support for setting / guessing MIME-type of uploaded file * Correctly follow redirects when accessing buckets created in Europe. * Introduced 'info' command both for buckets and objects * Correctly display public URL on uploads * Updated TODO list for everyone to see where we're heading * Various small fixes. See ChangeLog for details. s3cmd 0.9.5 - 2007-11-13 =========== * Support for buckets created in Europe * Initial 'sync' support, for now local to s3 direction only * Much better handling of multiple args to put, get and del * Tries to use ElementTree from any available module * Support for buckets with over 1000 objects. s3cmd 0.9.4 - 2007-08-13 =========== * Support for transparent GPG encryption of uploaded files * HTTP proxy support * HTTPS protocol support * Support for non-ASCII characters in uploaded filenames s3cmd 0.9.3 - 2007-05-26 =========== * New command "du" for displaying size of your data in S3. (Basil Shubin) s3cmd 0.9.2 - 2007-04-09 =========== * Lots of new documentation * Allow "get" to stdout (use "-" in place of destination file to get the file contents on stdout) * Better compatibility with Python 2.4 * Output public HTTP URL for objects stored with Public ACL * Various bugfixes and improvements s3cmd 0.9.1 - 2007-02-06 =========== * All commands now use S3-URIs * Removed hard dependency on Python 2.5 * Experimental support for Python 2.4 (requires external ElementTree module) s3cmd 0.9.0 - 2007-01-18 =========== * First public release brings support for all basic Amazon S3 operations: Creation and Removal of buckets, Upload (put), Download (get) and Removal (del) of files/objects.