s3cmd-2.4.0/NEWS

s3cmd-2.4.0 - 2023-12-12
===============
* Added "setversioning" command for versioning configuration (Kuan-Chun Wang)
* Added "settagging", "gettagging", and "deltagging" commands for bucket/object tagging (Kuan-Chun Wang)
* Added "setobjectretention" and "setobjectlegalhold" commands (Etienne Adam/Withings SAS)
* Added "setownership" and "setblockpublicaccess" commands
* Added "cfinval" command to request Cloudfront to invalidate paths (#1256)
* Added "--keep-dirs" option to have the folder structure preserved on remote side
* Added --skip-destination-validation option for "setnotification" command (Kuan-Chun Wang)
* Added "--max-retries" flag and "max_retries" config option (#914)
* Added FIPS support (Michael Roth)
* Added "object ownership" and "block public access" values to "info" command output for buckets
* Added to "ls" command a "DIROBJ" tag for directory objects in S3 remote
* Added server profiles to run-tests.py to skip tests depending on the server type
* Fixed "TypeError: sequence item 1: expected str instance, bytes found" error with Python 3.12 (#1343)
* Fixed a missing return for "object_batch_delete" of S3.py (James Hewitt)
* Fixed "object is not callable" error because of md5 FIPS test (#1005)
* Fixed "compute_content_md5 is not defined" error for "setversioning" (#1312) (Gavin John)
* Fixed list objects to use NextMarker when only prefixes are returned (Albin Parou)
* Fixed upload to not retry when an S3 compatible server is full
* Fixed recursive delete of objects named with whitespace (#976)
* Fixed the mime type when uploading directories to be "application/x-directory"
* Fixed "string indices must be integers" error for sync when in dry-run mode (#1313) (Shohei Tanaka)
* Fixed SignatureDoesNotMatch error when modifying an object on Cloudflare R2 (#1332) (Philip C Nilsson)
* Fixed Cloudfront invalidation issue for paths with wildcard or special characters
* Fixed Cloudfront crash because of error reporting for retries
* Fixed Cloudfront "unable to parse URL" error (#1292)
* Improved the handling of "empty" files on the remote side to sync with local folders
* Improved "abortmp" command by requiring an object to avoid bad accidents when using Ceph (Joshua Haas)
* Improved file download by retrying when encountering SlowDown or TooManyRequests errors (Robin Geiger)
* Improved error messages in case of connection error or host unreachable
* Improved error messages to be more explicit for upload errors after retries
* Improved remote2local attributes setting code
* Improved remote2local with more explicit error messages when setting attributes (#1288)
* Improved remote2local output messages by using the "mkdir" prefix instead of "make dir"
* Improved the SortedDict class
* Improved run-test.py by using "--include" when calling Curl instead of "-include" (Matthew James Kraai)
* Improved GitHub CI by enabling pip cache in actions/setup-python (Anton Yakutovich)
* Improved GitHub CI by adding a "codespell" check on push and PRs (Yaroslav Halchenko)
* Updated GitHub CI tests to use more recent versions of Minio and Python
* Upgraded GitHub actions (Anton Yakutovich)
* Cleanup and update of copyright headers, docs, comments and setup.py
* Cleanup to fix "invalid escape sequence" syntax warnings
* Many other bug fixes and cleanups

s3cmd-2.3.0 - 2022-10-03
===============
* Added "getnotification",
"setnotification", and "delnotification" commands for notification policies (hrchu) * Added support for AWS_STS_REGIONAL_ENDPOINTS (#1218, #1228) (Johan Lanzrein) * Added ConnectionRefused [111] exit code to handle connection errors (Salar Nosrati-Ershad) * Added support for IMDSv2. Should work automatically on ec2 (Anthony Foiani) * Added --list-allow-unordered to list objects unordered. Only supported by Ceph based s3-compatible services (#1269) (Salar Nosrati-Ershad) * Fixed --exclude dir behavior for python >= 3.6 (Daniil Tararukhin) * Fixed Cloudfront invalidate retry issue (Yuan-Hsiang Lee) * Fixed 0 byte cache files crashing s3cmd (#1234) (Carlos Laviola) * Fixed --continue behavior for the "get" command (#1009) (Anton Ustyugov) * Fixed unicode issue with fixbucket (#1259) * Fixed CannotSendRequest and ConnectionRefusedError errors at startup (#1261) * Fixed error reporting for object info when the object does not exist * Fixed "setup.py test" to do nothing to avoid failure that could be problematic for distribution packaging (#996) * Improved expire command to use Rule/Filter/Prefix for LifecycleConfiguration (#1247) * Improved PASS/CHECK/INCLUDE/EXCLUDE debug log messages * Improved setup.py with python 3.9 and 3.10 support info(Ori Avtalion) * Many other bug fixes s3cmd-2.2.0 - 2021-09-27 =============== * Added support for metadata modification of files bigger than 5 GiB * Added support for remote copy of files bigger than 5 GiB using MultiPart copy (Damian Martinez, Florent Viard) * Added progress info output for multipart copy and current-total info in output for cp, mv and modify * Added support for all special/foreign character names in object names to cp/mv/modify * Added support for SSL authentication (Aleksandr Chazov) * Added the http error 429 to the list of retryable errors (#1096) * Added support for listing and resuming of multipart uploads of more than 1000 parts (#346) * Added time based expiration for idle pool connections in order to avoid random broken pipe errors (#1114) * Added support for STS webidentity authentication (ie AssumeRole and AssumeRoleWithWebIdentity) (Samskeyti, Florent Viard) * Added support for custom headers to the mb command (#1197) (Sébastien Vajda) * Improved MultiPart copy to preserve acl and metadata of objects * Improved the server errors catching and reporting for cp/mv/modify commands * Improved resiliency against servers sending garbage responses (#1088, #1090, #1093) * Improved remote copy to have consistent copy of metadata in all cases: multipart or not, aws or not * Improved security by revoking public-write acl when private acl is set (#1151) (ruanzitao) * Improved speed when running on an EC2 instance (#1117) (Patrick Allain) * Reduced connection_max_age to 5s to avoid broken pipes as AWS closes https conns after around 6s (#1114) * Ensure that KeyboardInterrupt are always properly raised (#1089) * Changed sized of multipart copy chunks to 1 GiB * Fixed ValueError when using more than one ":" inside add_header in config file (#1087) * Fixed extra label issue when stdin used as source of a MultiPart upload * Fixed remote copy to allow changing the mime-type (ie content-type) of the new object * Fixed remote_copy to ensure that meta-s3cmd-attrs will be set based on the real source and not on the copy source * Fixed deprecation warnings due to invalid escape sequences (Karthikeyan Singaravelan) * Fixed getbucketinfo that was broken when the bucket lifecycle uses the filter element (Liu Lan) * Fixed RestoreRequest XML namespace URL 
  (#1203) (Akete)
* Fixed PARTIAL exit code that was not properly set when needed for object_get (#1190)
* Fixed a possible infinite loop when a file is truncated during hashsum or upload (#1125) (Matthew Krokosz, Florent Viard)
* Fixed report_exception wrong error when LANG env var was not set (#1113)
* Fixed wrong wiki url in error messages (Alec Barrett)
* Py3: Fixed an AttributeError when using the "files-from" option
* Py3: Fixed compatibility issues due to the removal of getchildren() from ElementTree in python 3.9 (#1146, #1157, #1162, #1182, #1210) (Ondřej Budai)
* Py3: Fixed compatibility issues due to the removal of encodestring() in python 3.9 (#1161, #1174) (Kentaro Kaneki)
* Fixed a crash when the AWS_ACCESS_KEY env var is set but not AWS_SECRET_KEY (#1201)
* Cleanup of check_md5 (Riccardo Magliocchetti)
* Removed legacy code for dreamhost that should not be necessary anymore
* Migrated CI tests to use github actions (Arnaud Jaffre)
* Improved README with a link to INSTALL.md (Sia Karamalegos)
* Improved help content (Dmitrii Korostelev, Roland Van Laar)
* Improvements for setup and build configurations
* Many other bug fixes

s3cmd-2.1.0 - 2020-04-07
===============
* Changed size reporting using k instead of K as it is a multiple of 1024 (#956)
* Added "public_url_use_https" config to generate public url using https (#551, #666) (Jukka Nousiainen)
* Added option to make connection pooling configurable and improvements (Arto Jantunen)
* Added support for path-style bucket access to signurl (Zac Medico)
* Added docker configuration and help to run test cases with multiple Python versions (Doug Crozier)
* Relaxed limitation on special chars for --add-header key names (#1054)
* Fixed all regions that were automatically converted to lower case (Harshavardhana)
* Fixed size and alignment of DU and LS output reporting (#956)
* Fixes for SignatureDoesNotMatch error when host port 80 or 443 is specified, due to stupid servers (#1059)
* Fixed the useless retries of requests that fail because of ssl cert checks
* Fixed a possible crash when a file disappears during cache generation (#377)
* Fixed unicode issues with IAM (#987)
* Fixed unicode errors with bucket Policy/CORS requests (#847) (Alex Offshore)
* Fixed unicode issues when loading aws_credential_file (#989)
* Fixed an issue with the tenant feature of CephRGW. Url encode bucket_name for path-style requests (#1080)
* Fixed signature v2 always used when bucket_name had special chars (#1081)
* Allow to use signature v4 only, even for commands without buckets specified (#1082)
* Fixed small open file descriptor leaks.
* Py3: Fixed hash-bang in headers to not force using python2 when setup/s3cmd/run-test scripts are executed directly.
* Py3: Fixed unicode issues with Cloudfront (#1006) * Py3: Fixed http.client.RemoteDisconnected errors (#1014) (Ryan Huddleston) * Py3: Fixed 'dictionary changed size during iteration' error when using a cache-file (#945) (Doug Crozier) * Py3: Fixed the display of file sizes (Vlad Presnyak) * Py3: Python 3.8 compatibility fixes (Konstantin Shalygin) * Py2: Fixed unicode errors sometimes crashing remote2remote sync (#847) * Added s3cmd.egg-info to .gitignore (Philip Dubé) * Improved run-test script to not use hard-coded bucket names(#1066) (Doug Crozier) * Renamed INSTALL to INSTALL.md and improvements (Nitro, Prabhakar Gupta) * Improved the restore command help (Hrchu) * Updated the storage-class command help with the recent aws s3 classes (#1020) * Fixed typo in the --continue-put help message (Pengyu Chen) * Fixed typo (#1062) (Tim Gates) * Improvements for setup and build configurations * Many other bug fixes s3cmd-2.0.2 - 2018-07-15 =============== * Fixed unexpected timeouts encountered during requests or transfers due to AWS strange connection short timeouts (#941) * Fixed a throttle issue slowing down too much transfers in some cases (#913) * Added support for $AWS_PROFILE (#966) (Taras Postument) * Added clarification comment for the socket_timeout configuration value OS limit * Avoid distutils usage at runtime (Matthias Klose) * Python 2 compatibility: Fixed import error of which with fallback code (Gianfranco Costamagna) * Fixed Python 3 bytes string encoding when getting IAM credentials (Alexander Allakhverdiyev) * Fixed handling of config tri-state bool values (like acl_public) (Brian C. Lane) * Fixed V2 signature when restore command is used (Jan Kasiak) * Fixed setting full_control on objects with public read access (Matthew Vernon) * Fixed a bug when only one path is supplied with Cloudfront. 
(Mikael Svensson) * Fixed signature errors with 'modify' requests (Radek Simko) * Fixed #936 - Fix setacl command exception (Robert Moucha) * Fixed error reporting if deleting a source object failed after a move (#929) * Many other bug fixes (#525, #933, #940, #947, #957, #958, #960, #967) Important info: AWS S3 doesn't allow anymore uppercases and underscores in bucket names since march 1, 2018 s3cmd-2.0.1 - 2017-10-21 =============== * Support for Python 3 is now stable * Fixed signature issues due to upper cases in hostname (#920) * Improved support for Minio Azure gateway (Julien Maitrehenry, Harshavardhana) * Added signurl_use_https option to use https prefix for signurl (Julien Recurt) * Fixed a lot of remaining issues and regressions for Python 3 (#922, #921, #908) * Fixed --configure option with Python 3 * Fixed non string cmdline parameters being ignored * Windows support fixes (#922) * Don't force anymore to have a / on last parameter for the "modify" command (#886) * Removed the python3 support warning * Detect and report error 403 in getpolicy for info command (#894) * Added a specific error message when getting policy by non owner (#885) * Many other bug fixes (#905, #892, #890, #888, #889, #887) s3cmd-2.0.0 - 2017-06-26 =============== * Added support for Python 3 (Shaform, Florent Viard) * Added getlifecycle command (Daniel Gryniewicz) * Added --cf-inval for invalidating multiple CF distributions (Joe Mifsud) * Added --limit to "ls" and "la" commands to return the specified number of objects (Masashi Ozawa) * Added --token-refresh and --no-token-refresh and get the access token from the environment (Marco Jakob) * Added --restore-priority and --restore-days for S3 Glacier (Robert Palmer) * Fixed requester pays header with HEAD requests (Christian Rodriguez) * Don't allow mv/cp of multiple files to single file (Guy Gur-Ari) * Generalize wildcard certificate forgiveness (Mark Titorenko) * Multiple fixes for SSL connections and proxies * Added support for HTTP 100-CONTINUE * Fixes for s3-like servers * Big cleanup and many unicode fixes * Many other bug fixes s3cmd-1.6.1 - 2016-01-20 =============== * Added --host and --host-bucket * Added --stats * Fix for newer python 2.7.x SSL library updates * Many other bug fixes s3cmd-1.6.0 - 2015-09-18 =============== * Support signed URL content disposition type * Added 'ls -l' long listing including storage class * Added --limit-rate=RATE * Added --server-side-encryption-kms-id=KEY_ID * Added --storage-class=CLASS * Added --requester-pays, [payer] command * Added --[no-]check-hostname * Added --stop-on-error, removed --ignore-failed-copy * Added [setcors], [delcors] commands * Added support for cn-north-1 region hostname checks * Output strings may have changed. Scripts calling s3cmd expecting specific text may need to be updated. * HTTPS is now the default * Many unicode fixes * Many other bug fixes s3cmd-1.5.2 - 2015-02-08 =============== * Handle unvalidated SSL certificate. Necessary on Ubuntu 14.04 for SSL to function at all. * packaging fixes (require python-magic, drop ez_setup) s3cmd-1.5.1.2 - 2015-02-04 =============== * fix PyPi install s3cmd-1.5.1 - 2015-02-04 =============== * Sort s3cmd ls output by bucket name (Andrew Gaul) * Support relative expiry times in signurl. (Chris Lamb) * Fixed issue with mixed path separators with s3cmd get --recursive on Windows. 
  (Luke Winslow)
* fix S3 wildcard certificate checking
* Handle headers with spaces in their values properly (#460)
* Fix lack of SSL certificate checking libraries on older python
* set content-type header for stdin from command line or Config()
* fix uploads from stdin (#464)
* Fix directory exclusions (#467)
* fix signurl
* Don't retry in response to HTTP 405 error (#422)
* Don't crash when a proxy returns an invalid XML error document

s3cmd-1.5.0 - 2015-01-12
===============
* add support for newer regions such as Frankfurt that require newer authorization signature v4 support (Vasileios Mitrousis, Michal Ludvig, Matt Domsch)
* drop support for python 2.4 due to signature v4 code. python 2.6 is now the minimum, and python 3 is still not supported.
* handle redirects to the "right" region for a bucket.
* add --ca-cert=FILE for self-signed certs (Matt Domsch)
* allow proxied SSL connections with python >= 2.7 (Damian Gerow)
* add --remove-headers for [modify] command (Matt Domsch)
* add -s/--ssl and --no-ssl options (Viktor Szakáts)
* add --signature-v2 for backwards compatibility with S3 clones.
* bugfixes by 17 contributors

s3cmd 1.5.0-rc1 - 2014-06-29
===============
* add environment variable S3CMD_CONFIG (Devon Jones), access key and secret keys (Vasileios Mitrousis)
* added modify command (Francois Gaudin)
* better debug messages (Matt Domsch)
* faster batch deletes (Matt Domsch)
* Added support for restoring files from Glacier storage (Robert Palmer)
* Add and remove full lifecycle policies (Sam Rudge)
* Add support for object expiration (hrchu)
* bugfixes by 26 contributors

s3cmd 1.5.0-beta1 - 2013-12-02
=================
* Brought to you by Matt Domsch and contributors, thanks guys! :)
* Multipart upload improvements (Eugene Brevdo, UENISHI Kota)
* Allow --acl-grant on AWS groups (Dale Lovelace)
* Added Server-Side Encryption support (Kevin Daub)
* Improved MIME types detections and content encoding (radomir, Eric Drechsel, George Melika)
* Various smaller changes and bugfixes from many contributors

s3cmd 1.5.0-alpha3 - 2013-03-11
==================
* Persistent HTTP/HTTPS connections for massive speedup (Michal Ludvig)
* New switch --quiet for suppressing all output (Siddarth Prakash)
* Honour "umask" on file downloads (Jason Dalton)
* Various bugfixes from many contributors

s3cmd 1.5.0-alpha2 - 2013-03-04
==================
* IAM roles support (David Kohen, Eric Dowd)
* Manage bucket policies (Kota Uenishi)
* Various bugfixes from many contributors

s3cmd 1.5.0-alpha1 - 2013-02-19
==================
* Server-side copy for hardlinks/softlinks to improve performance (Matt Domsch)
* New [signurl] command (Craig Ringer)
* Improved symlink-loop detection (Michal Ludvig)
* Add --delete-after option for sync (Matt Domsch)
* Handle empty return bodies when processing S3 errors.
(Kelly McLaughlin) * Upload from STDIN (Eric Connell) * Updated bucket locations (Stefhen Hovland) * Support custom HTTP headers (Brendan O'Connor, Karl Matthias) * Improved MIME support (Karsten Sperling, Christopher Noyes) * Added support for --acl-grant/--acl-revoke to 'sync' command (Michael Tyson) * CloudFront: Support default index and default root invalidation (Josep del Rio) * Command line options for access/secret keys (Matt Sweeney) * Support [setpolicy] for setting bucket policies (Joe Fiorini) * Respect the $TZ environment variable (James Brown) * Reduce memory consumption for [s3cmd du] (Charlie Schluting) * Rate limit progress updates (Steven Noonan) * Download from S3 to a temp file first (Sumit Kumar) * Reuse a single connection when doing a bucket list (Kelly McLaughlin) * Delete empty files if object_get() failed (Oren Held) s3cmd 1.1.0 - (never released) =========== * MultiPart upload enabled for both [put] and [sync]. Default chunk size is 15MB. * CloudFront invalidation via [sync --cf-invalidate] and [cfinvalinfo]. * Increased socket_timeout from 10 secs to 5 mins. * Added "Static WebSite" support [ws-create / ws-delete / ws-info] (contributed by Jens Braeuer) * Force MIME type with --mime-type=abc/xyz, also --guess-mime-type is now on by default, -M is no longer shorthand for --guess-mime-type * Allow parameters in MIME types, for example: --mime-type="text/plain; charset=utf-8" * MIME type can be guessed by python-magic which is a lot better than relying on the extension. Contributed by Karsten Sperling. * Support for environment variables as config values. For instance in ~/.s3cmd put "access_key=$S3_ACCESS_KEY". Contributed by Ori Bar. * Support for --configure checking access to a specific bucket instead of listing all buckets. Listing buckets requires the S3 ListAllMyBuckets permission which is typically not available to delegated IAM accounts. With this change, s3cmd --configure accepts an (optional) bucket uri as a parameter and if it's provided, the access check will just verify access to this bucket individually. Contributed by Mike Repass. * Allow STDOUT as a destination even for downloading multiple files. They will be output one after another without any delimiters! Contributed by Rob Wills. s3cmd 1.0.0 - 2011-01-18 =========== * [sync] now supports --no-check-md5 * Network connections now have 10s timeout * [sync] now supports bucket-to-bucket synchronisation * Added [accesslog] command. * Added access logging for CloudFront distributions using [cfmodify --log] * Added --acl-grant and --acl-revoke [Timothee Groleau] * Allow s3:// URI as well as cf:// URI as a distribution name for most CloudFront related commands. * Support for Reduced Redundancy Storage (--reduced-redundancy) * Follow symlinks in [put] and [sync] with --follow-symlinks * Support for CloudFront DefaultRootObject [Luke Andrew] s3cmd 0.9.9.91 - 2009-10-08 ============== * Fixed invalid reference to a variable in failed upload handling. s3cmd 0.9.9.90 - 2009-10-06 ============== * New command 'sign' for signing e.g. POST upload policies. * Fixed handling of filenames that differ only in capitalisation (eg blah.txt vs Blah.TXT). * Added --verbatim mode, preventing most filenames pre-processing. Good for fixing unreadable buckets. * Added --recursive support for [cp] and [mv], including multiple-source arguments, --include/--exclude, --dry-run, etc. * Added --exclude/--include and --dry-run for [del], [setacl]. * Neutralise characters that are invalid in XML to avoid ExpatErrors. 
http://boodebr.org/main/python/all-about-python-and-unicode * New command [fixbucket] for for fixing invalid object names in a given Bucket. For instance names with  in them (not sure how people manage to upload them but they do). s3cmd 0.9.9 - 2009-02-17 =========== New commands: * Commands for copying and moving objects, within or between buckets: [cp] and [mv] (Andrew Ryan) * CloudFront support through [cfcreate], [cfdelete], [cfmodify] and [cfinfo] commands. (sponsored by Joseph Denne) * New command [setacl] for setting ACL on existing objects, use together with --acl-public/--acl-private (sponsored by Joseph Denne) Other major features: * Improved source dirname handling for [put], [get] and [sync]. * Recursive and wildcard support for [put], [get] and [del]. * Support for non-recursive [ls]. * Enabled --dry-run for [put], [get] and [sync]. * Allowed removal of non-empty buckets with [rb --force]. * Implemented progress meter (--progress / --no-progress) * Added --include / --rinclude / --(r)include-from options to override --exclude exclusions. * Added --add-header option for [put], [sync], [cp] and [mv]. Good for setting e.g. Expires or Cache-control headers. * Added --list-md5 option for [ls]. * Continue [get] partially downloaded files with --continue * New option --skip-existing for [get] and [sync]. Minor features and bugfixes: * Fixed GPG (--encrypt) compatibility with Python 2.6. * Always send Content-Length header to satisfy some http proxies. * Fixed installation on Windows and Mac OS X. * Don't print nasty backtrace on KeyboardInterrupt. * Should work fine on non-UTF8 systems, provided all the files are in current system encoding. * System encoding can be overridden using --encoding. * Improved resistance to communication errors (Connection reset by peer, etc.) s3cmd 0.9.8.4 - 2008-11-07 ============= * Stabilisation / bugfix release: * Restored access to upper-case named buckets. * Improved handling of filenames with Unicode characters. * Avoid ZeroDivisionError on ultrafast links (for instance on Amazon EC2) * Re-issue failed requests (e.g. connection errors, internal server errors, etc). * Sync skips over files that can't be open instead of terminating the sync completely. * Doesn't run out of open files quota on sync with lots of files. s3cmd 0.9.8.3 - 2008-07-29 ============= * Bugfix release. Avoid running out-of-memory in MD5'ing large files. s3cmd 0.9.8.2 - 2008-06-27 ============= * Bugfix release. Re-upload file if Amazon doesn't send ETag back. s3cmd 0.9.8.1 - 2008-06-27 ============= * Bugfix release. Fixed 'mb' and 'rb' commands again. s3cmd 0.9.8 - 2008-06-23 =========== * Added --exclude / --rexclude options for sync command. * Doesn't require $HOME env variable to be set anymore. * Better checking of bucket names to Amazon S3 rules. s3cmd 0.9.7 - 2008-06-05 =========== * Implemented 'sync' from S3 back to local folder, including file attribute restoration. * Failed uploads are retried on lower speed to improve error resilience. * Compare MD5 of the uploaded file, compare with checksum reported by S3 and re-upload on mismatch. s3cmd 0.9.6 - 2008-02-28 =========== * Support for setting / guessing MIME-type of uploaded file * Correctly follow redirects when accessing buckets created in Europe. * Introduced 'info' command both for buckets and objects * Correctly display public URL on uploads * Updated TODO list for everyone to see where we're heading * Various small fixes. See ChangeLog for details. 
s3cmd 0.9.5 - 2007-11-13
===========
* Support for buckets created in Europe
* Initial 'sync' support, for now local to s3 direction only
* Much better handling of multiple args to put, get and del
* Tries to use ElementTree from any available module
* Support for buckets with over 1000 objects.

s3cmd 0.9.4 - 2007-08-13
===========
* Support for transparent GPG encryption of uploaded files
* HTTP proxy support
* HTTPS protocol support
* Support for non-ASCII characters in uploaded filenames

s3cmd 0.9.3 - 2007-05-26
===========
* New command "du" for displaying size of your data in S3. (Basil Shubin)

s3cmd 0.9.2 - 2007-04-09
===========
* Lots of new documentation
* Allow "get" to stdout (use "-" in place of destination file to get the file contents on stdout)
* Better compatibility with Python 2.4
* Output public HTTP URL for objects stored with Public ACL
* Various bugfixes and improvements

s3cmd 0.9.1 - 2007-02-06
===========
* All commands now use S3-URIs
* Removed hard dependency on Python 2.5
* Experimental support for Python 2.4 (requires external ElementTree module)

s3cmd 0.9.0 - 2007-01-18
===========
* First public release brings support for all basic Amazon S3 operations: Creation and Removal of buckets, Upload (put), Download (get) and Removal (del) of files/objects.

s3cmd-2.4.0/setup.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from __future__ import print_function

import sys
import os

try:
    import xml.etree.ElementTree
    print("Using xml.etree.ElementTree for XML processing")
except ImportError as e:
    sys.stderr.write(str(e) + "\n")
    try:
        import elementtree.ElementTree
        print("Using elementtree.ElementTree for XML processing")
    except ImportError as e:
        sys.stderr.write(str(e) + "\n")
        sys.stderr.write("Please install ElementTree module from\n")
        sys.stderr.write("http://effbot.org/zone/element-index.htm\n")
        sys.exit(1)

from setuptools import setup

import S3.PkgInfo

if float("%d.%d" % sys.version_info[:2]) < 2.6:
    sys.stderr.write("Your Python version %d.%d.%d is not supported.\n" % sys.version_info[:3])
    sys.stderr.write("S3cmd requires Python 2.6 or newer.\n")
    sys.exit(1)

## Remove 'MANIFEST' file to force
## distutils to recreate it.
## Only in "sdist" stage. Otherwise
## it makes life difficult to packagers.
if len(sys.argv) > 1 and sys.argv[1] == "sdist":
    try:
        os.unlink("MANIFEST")
    except OSError as e:
        pass

## Re-create the manpage
## (Beware! Perl script on the loose!!)
if len(sys.argv) > 1 and sys.argv[1] == "sdist":
    if os.stat_result(os.stat("s3cmd.1")).st_mtime \
       < os.stat_result(os.stat("s3cmd")).st_mtime:
        sys.stderr.write("Re-create man page first!\n")
        sys.stderr.write("Run: ./s3cmd --help | ./format-manpage.pl > s3cmd.1\n")
        sys.exit(1)

## Don't install manpages and docs when $S3CMD_PACKAGING is set
## This was a requirement of Debian package maintainer.
if not os.getenv("S3CMD_PACKAGING"): man_path = os.getenv("S3CMD_INSTPATH_MAN") or "share/man" doc_path = os.getenv("S3CMD_INSTPATH_DOC") or "share/doc/packages" data_files = [ (doc_path+"/s3cmd", ["README.md", "INSTALL.md", "LICENSE", "NEWS"]), (man_path+"/man1", ["s3cmd.1"]), ] else: data_files = None ## Main distutils info setup( ## Content description name=S3.PkgInfo.package, version=S3.PkgInfo.version, packages=['S3'], scripts=['s3cmd'], data_files=data_files, test_suite='S3.PkgInfo', ## Packaging details author="Michal Ludvig", author_email="michal@logix.cz", maintainer="github.com/fviard, github.com/matteobar", maintainer_email="s3tools-bugs@lists.sourceforge.net", url=S3.PkgInfo.url, license=S3.PkgInfo.license, description=S3.PkgInfo.short_description, long_description=""" %s Authors: -------- Florent Viard Michal Ludvig Matt Domsch (github.com/mdomsch) """ % (S3.PkgInfo.long_description), classifiers=[ 'Development Status :: 5 - Production/Stable', 'Environment :: Console', 'Environment :: MacOS X', 'Environment :: Win32 (MS Windows)', 'Intended Audience :: End Users/Desktop', 'Intended Audience :: System Administrators', 'License :: OSI Approved :: GNU General Public License v2 or later (GPLv2+)', 'Natural Language :: English', 'Operating System :: MacOS :: MacOS X', 'Operating System :: Microsoft :: Windows', 'Operating System :: POSIX', 'Operating System :: Unix', 'Programming Language :: Python :: 2', 'Programming Language :: Python :: 2.6', 'Programming Language :: Python :: 2.7', 'Programming Language :: Python :: 3', 'Programming Language :: Python :: 3.3', 'Programming Language :: Python :: 3.4', 'Programming Language :: Python :: 3.5', 'Programming Language :: Python :: 3.6', 'Programming Language :: Python :: 3.7', 'Programming Language :: Python :: 3.8', 'Programming Language :: Python :: 3.9', 'Programming Language :: Python :: 3.10', 'Programming Language :: Python :: 3.11', 'Programming Language :: Python :: 3.12', 'Topic :: System :: Archiving', 'Topic :: Utilities', ], install_requires=["python-dateutil", "python-magic"] ) # vim:et:ts=4:sts=4:ai s3cmd-2.4.0/s3cmd0000775000175100017510000050752714535730271013102 0ustar floflo00000000000000#!/usr/bin/env python # -*- coding: utf-8 -*- ## -------------------------------------------------------------------- ## s3cmd - S3 client ## ## Authors : Michal Ludvig (https://www.logix.cz/michal) ## Florent Viard (https://www.sodria.com) ## Copyright : TGRMN Software, Sodria SAS and contributors ## License : GPL Version 2 ## Website : https://s3tools.org ## -------------------------------------------------------------------- ## This program is free software; you can redistribute it and/or modify ## it under the terms of the GNU General Public License as published by ## the Free Software Foundation; either version 2 of the License, or ## (at your option) any later version. ## This program is distributed in the hope that it will be useful, ## but WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ## GNU General Public License for more details. 
## -------------------------------------------------------------------- from __future__ import absolute_import, print_function, division import sys if sys.version_info < (2, 6): sys.stderr.write(u"ERROR: Python 2.6 or higher required, sorry.\n") # 72 == EX_OSFILE sys.exit(72) PY3 = (sys.version_info >= (3, 0)) import codecs import errno import glob import io import locale import logging import os import re import shutil import socket import subprocess import tempfile import datetime import time import traceback from copy import copy from optparse import OptionParser, Option, OptionValueError, IndentedHelpFormatter from logging import debug, info, warning, error try: import htmlentitydefs except Exception: # python 3 support import html.entities as htmlentitydefs try: unicode except NameError: # python 3 support # In python 3, unicode -> str, and str -> bytes unicode = str try: unichr except NameError: # python 3 support # In python 3, unichr was removed as chr can now do the job unichr = chr try: from shutil import which except ImportError: # python2 fallback code from distutils.spawn import find_executable as which if not PY3: # ConnectionRefusedError does not exist in python2 class ConnectionError(OSError): pass class ConnectionRefusedError(ConnectionError): pass def output(message): sys.stdout.write(message + "\n") sys.stdout.flush() def check_args_type(args, type, verbose_type): """NOTE: This function looks like to not be used.""" for arg in args: if S3Uri(arg).type != type: raise ParameterError("Expecting %s instead of '%s'" % (verbose_type, arg)) def cmd_du(args): s3 = S3(Config()) if len(args) > 0: uri = S3Uri(args[0]) if uri.type == "s3" and uri.has_bucket(): subcmd_bucket_usage(s3, uri) return EX_OK subcmd_bucket_usage_all(s3) return EX_OK def subcmd_bucket_usage_all(s3): """ Returns: sum of bucket sizes as integer Raises: S3Error """ cfg = Config() response = s3.list_all_buckets() buckets_size = 0 for bucket in response["list"]: size = subcmd_bucket_usage(s3, S3Uri("s3://" + bucket["Name"])) if size != None: buckets_size += size total_size, size_coeff = formatSize(buckets_size, cfg.human_readable_sizes) total_size_str = str(total_size) + size_coeff output(u"".rjust(12, "-")) output(u"%s Total" % (total_size_str.ljust(12))) return size def subcmd_bucket_usage(s3, uri): """ Returns: bucket size as integer Raises: S3Error """ bucket_size = 0 object_count = 0 extra_info = u'' bucket = uri.bucket() prefix = uri.object() try: for _, _, objects in s3.bucket_list_streaming(bucket, prefix=prefix, recursive=True): for obj in objects: bucket_size += int(obj["Size"]) object_count += 1 except S3Error as e: if e.info["Code"] in S3.codes: error(S3.codes[e.info["Code"]] % bucket) raise except KeyboardInterrupt as e: extra_info = u' [interrupted]' total_size_str = u"%d%s" % formatSize(bucket_size, Config().human_readable_sizes) if Config().human_readable_sizes: total_size_str = total_size_str.rjust(5) else: total_size_str = total_size_str.rjust(12) output(u"%s %7s objects %s%s" % (total_size_str, object_count, uri, extra_info)) return bucket_size def cmd_ls(args): cfg = Config() s3 = S3(cfg) if len(args) > 0: uri = S3Uri(args[0]) if uri.type == "s3" and uri.has_bucket(): subcmd_bucket_list(s3, uri, cfg.limit) return EX_OK # If not a s3 type uri or no bucket was provided, list all the buckets subcmd_all_buckets_list(s3) return EX_OK def subcmd_all_buckets_list(s3): response = s3.list_all_buckets() for bucket in sorted(response["list"], key=lambda b:b["Name"]): output(u"%s s3://%s" % 
(formatDateTime(bucket["CreationDate"]), bucket["Name"])) def cmd_all_buckets_list_all_content(args): cfg = Config() s3 = S3(cfg) response = s3.list_all_buckets() for bucket in response["list"]: subcmd_bucket_list(s3, S3Uri("s3://" + bucket["Name"]), cfg.limit) output(u"") return EX_OK def subcmd_bucket_list(s3, uri, limit): cfg = Config() bucket = uri.bucket() prefix = uri.object() debug(u"Bucket 's3://%s':" % bucket) if prefix.endswith('*'): prefix = prefix[:-1] try: response = s3.bucket_list(bucket, prefix = prefix, limit = limit) except S3Error as e: if e.info["Code"] in S3.codes: error(S3.codes[e.info["Code"]] % bucket) raise # md5 are 32 char long, but for multipart there could be a suffix if Config().human_readable_sizes: # %(size)5s%(coeff)1s format_size = u"%5d%1s" dir_str = u"DIR".rjust(6) dirobj_str = u"DIROBJ".rjust(6) else: format_size = u"%12d%s" dir_str = u"DIR".rjust(12) dirobj_str = u"DIROBJ".rjust(12) if cfg.long_listing: format_string = u"%(timestamp)16s %(size)s %(md5)-35s %(storageclass)-11s %(uri)s" elif cfg.list_md5: format_string = u"%(timestamp)16s %(size)s %(md5)-35s %(uri)s" else: format_string = u"%(timestamp)16s %(size)s %(uri)s" for prefix in response['common_prefixes']: output(format_string % { "timestamp": "", "size": dir_str, "md5": "", "storageclass": "", "uri": uri.compose_uri(bucket, prefix["Prefix"])}) for object in response["list"]: md5 = object.get('ETag', '').strip('"\'') storageclass = object.get('StorageClass','') object_key = object['Key'] if cfg.list_md5: if '-' in md5: # need to get md5 from the object object_uri = uri.compose_uri(bucket, object_key) info_response = s3.object_info(S3Uri(object_uri)) try: md5 = info_response['s3cmd-attrs']['md5'] except KeyError: pass if object_key[-1] == '/': size_str = dirobj_str else: size_and_coeff = formatSize(object["Size"], Config().human_readable_sizes) size_str = format_size % size_and_coeff output(format_string % { "timestamp": formatDateTime(object["LastModified"]), "size" : size_str, "md5" : md5, "storageclass" : storageclass, "uri": uri.compose_uri(bucket, object_key), }) if response["truncated"]: warning(u"The list is truncated because the settings limit was reached.") def cmd_bucket_create(args): cfg = Config() s3 = S3(cfg) for arg in args: uri = S3Uri(arg) if not uri.type == "s3" or not uri.has_bucket() or uri.has_object(): raise ParameterError("Expecting S3 URI with just the bucket name set instead of '%s'" % arg) try: response = s3.bucket_create(uri.bucket(), cfg.bucket_location, cfg.extra_headers) output(u"Bucket '%s' created" % uri.uri()) except S3Error as e: if e.info["Code"] in S3.codes: error(S3.codes[e.info["Code"]] % uri.bucket()) raise return EX_OK def cmd_website_info(args): cfg = Config() s3 = S3(cfg) for arg in args: uri = S3Uri(arg) if not uri.type == "s3" or not uri.has_bucket() or uri.has_object(): raise ParameterError("Expecting S3 URI with just the bucket name set instead of '%s'" % arg) try: response = s3.website_info(uri, cfg.bucket_location) if response: output(u"Bucket %s: Website configuration" % uri.uri()) output(u"Website endpoint: %s" % response['website_endpoint']) output(u"Index document: %s" % response['index_document']) output(u"Error document: %s" % response['error_document']) else: output(u"Bucket %s: No website configuration found." 
% (uri.uri())) except S3Error as e: if e.info["Code"] in S3.codes: error(S3.codes[e.info["Code"]] % uri.bucket()) raise return EX_OK def cmd_website_create(args): cfg = Config() s3 = S3(cfg) for arg in args: uri = S3Uri(arg) if not uri.type == "s3" or not uri.has_bucket() or uri.has_object(): raise ParameterError("Expecting S3 URI with just the bucket name set instead of '%s'" % arg) try: response = s3.website_create(uri, cfg.bucket_location) output(u"Bucket '%s': website configuration created." % (uri.uri())) except S3Error as e: if e.info["Code"] in S3.codes: error(S3.codes[e.info["Code"]] % uri.bucket()) raise return EX_OK def cmd_website_delete(args): cfg = Config() s3 = S3(cfg) for arg in args: uri = S3Uri(arg) if not uri.type == "s3" or not uri.has_bucket() or uri.has_object(): raise ParameterError("Expecting S3 URI with just the bucket name set instead of '%s'" % arg) try: response = s3.website_delete(uri, cfg.bucket_location) output(u"Bucket '%s': website configuration deleted." % (uri.uri())) except S3Error as e: if e.info["Code"] in S3.codes: error(S3.codes[e.info["Code"]] % uri.bucket()) raise return EX_OK def cmd_expiration_set(args): cfg = Config() s3 = S3(cfg) for arg in args: uri = S3Uri(arg) if not uri.type == "s3" or not uri.has_bucket() or uri.has_object(): raise ParameterError("Expecting S3 URI with just the bucket name set instead of '%s'" % arg) try: response = s3.expiration_set(uri, cfg.bucket_location) if response["status"] == 200: output(u"Bucket '%s': expiration configuration is set." % (uri.uri())) elif response["status"] == 204: output(u"Bucket '%s': expiration configuration is deleted." % (uri.uri())) except S3Error as e: if e.info["Code"] in S3.codes: error(S3.codes[e.info["Code"]] % uri.bucket()) raise return EX_OK def cmd_bucket_delete(args): cfg = Config() s3 = S3(cfg) def _bucket_delete_one(uri, retry=True): try: response = s3.bucket_delete(uri.bucket()) output(u"Bucket '%s' removed" % uri.uri()) except S3Error as e: if e.info['Code'] == 'NoSuchBucket': if cfg.force: return EX_OK else: raise if e.info['Code'] == 'BucketNotEmpty' and retry and (cfg.force or cfg.recursive): warning(u"Bucket is not empty. Removing all the objects from it first. This may take some time...") rc = subcmd_batch_del(uri_str = uri.uri()) if rc == EX_OK: return _bucket_delete_one(uri, False) else: output(u"Bucket was not removed") elif e.info["Code"] in S3.codes: error(S3.codes[e.info["Code"]] % uri.bucket()) raise return EX_OK for arg in args: uri = S3Uri(arg) if not uri.type == "s3" or not uri.has_bucket() or uri.has_object(): raise ParameterError("Expecting S3 URI with just the bucket name set instead of '%s'" % arg) rc = _bucket_delete_one(uri) if rc != EX_OK: return rc return EX_OK def cmd_object_put(args): cfg = Config() s3 = S3(cfg) if len(args) == 0: raise ParameterError("Nothing to upload. Expecting a local file or directory and a S3 URI destination.") ## Normalize URI to convert s3://bkt to s3://bkt/ (trailing slash) destination_base_uri = S3Uri(args.pop()) if destination_base_uri.type != 's3': raise ParameterError("Destination must be S3Uri. Got: %s" % destination_base_uri) destination_base = destination_base_uri.uri() if len(args) == 0: raise ParameterError("Nothing to upload. 
Expecting a local file or directory.") local_list, single_file_local, exclude_list, total_size_local = fetch_local_list( args, is_src=True, with_dirs=cfg.keep_dirs) local_count = len(local_list) info(u"Summary: %d local files to upload" % local_count) if local_count == 0: raise ParameterError("Nothing to upload.") if local_count > 0: if not single_file_local and '-' in local_list.keys(): raise ParameterError("Cannot specify multiple local files if uploading from '-' (ie stdin)") elif single_file_local and local_list.keys()[0] == "-" and destination_base.endswith("/"): raise ParameterError("Destination S3 URI must not end with '/' when uploading from stdin.") elif not destination_base.endswith("/"): if not single_file_local: raise ParameterError("Destination S3 URI must end with '/' (ie must refer to a directory on the remote side).") local_list[local_list.keys()[0]]['remote_uri'] = destination_base else: for key in local_list: local_list[key]['remote_uri'] = destination_base + key if cfg.dry_run: for key in exclude_list: output(u"exclude: %s" % key) for key in local_list: if key != "-": nicekey = local_list[key]['full_name'] else: nicekey = "" output(u"upload: '%s' -> '%s'" % (nicekey, local_list[key]['remote_uri'])) warning(u"Exiting now because of --dry-run") return EX_OK seq = 0 ret = EX_OK for key in local_list: seq += 1 uri_final = S3Uri(local_list[key]['remote_uri']) try: src_md5 = local_list.get_md5(key) except IOError: src_md5 = None extra_headers = copy(cfg.extra_headers) full_name_orig = local_list[key]['full_name'] full_name = full_name_orig seq_label = "[%d of %d]" % (seq, local_count) if Config().encrypt: gpg_exitcode, full_name, extra_headers["x-amz-meta-s3tools-gpgenc"] = gpg_encrypt(full_name_orig) attr_header = _build_attr_header(local_list[key], key, src_md5) debug(u"attr_header: %s" % attr_header) extra_headers.update(attr_header) try: response = s3.object_put(full_name, uri_final, extra_headers, extra_label = seq_label) except S3UploadError as exc: error(u"Upload of '%s' failed too many times (Last reason: %s)" % (full_name_orig, exc)) if cfg.stop_on_error: ret = EX_DATAERR error(u"Exiting now because of --stop-on-error") break ret = EX_PARTIAL continue except InvalidFileError as exc: error(u"Upload of '%s' is not possible (Reason: %s)" % (full_name_orig, exc)) ret = EX_PARTIAL if cfg.stop_on_error: ret = EX_OSFILE error(u"Exiting now because of --stop-on-error") break continue if response is not None: speed_fmt = formatSize(response["speed"], human_readable = True, floating_point = True) if not Config().progress_meter: if full_name_orig != "-": nicekey = full_name_orig else: nicekey = "" output(u"upload: '%s' -> '%s' (%d bytes in %0.1f seconds, %0.2f %sB/s) %s" % (nicekey, uri_final, response["size"], response["elapsed"], speed_fmt[0], speed_fmt[1], seq_label)) if Config().acl_public: output(u"Public URL of the object is: %s" % (uri_final.public_url())) if Config().encrypt and full_name != full_name_orig: debug(u"Removing temporary encrypted file: %s" % full_name) os.remove(deunicodise(full_name)) return ret def cmd_object_get(args): cfg = Config() s3 = S3(cfg) ## Check arguments: ## if not --recursive: ## - first N arguments must be S3Uri ## - if the last one is S3 make current dir the destination_base ## - if the last one is a directory: ## - take all 'basenames' of the remote objects and ## make the destination name be 'destination_base'+'basename' ## - if the last one is a file or not existing: ## - if the number of sources (N, above) == 1 treat it ## as a filename 
and save the object there. ## - if there's more sources -> Error ## if --recursive: ## - first N arguments must be S3Uri ## - for each Uri get a list of remote objects with that Uri as a prefix ## - apply exclude/include rules ## - each list item will have MD5sum, Timestamp and pointer to S3Uri ## used as a prefix. ## - the last arg may be '-' (stdout) ## - the last arg may be a local directory - destination_base ## - if the last one is S3 make current dir the destination_base ## - if the last one doesn't exist check remote list: ## - if there is only one item and its_prefix==its_name ## download that item to the name given in last arg. ## - if there are more remote items use the last arg as a destination_base ## and try to create the directory (incl. all parents). ## ## In both cases we end up with a list mapping remote object names (keys) to local file names. ## Each item will be a dict with the following attributes # {'remote_uri', 'local_filename'} download_list = [] if len(args) == 0: raise ParameterError("Nothing to download. Expecting S3 URI.") if S3Uri(args[-1]).type == 'file': destination_base = args.pop() else: destination_base = "." if len(args) == 0: raise ParameterError("Nothing to download. Expecting S3 URI.") try: remote_list, exclude_list, remote_total_size = fetch_remote_list( args, require_attribs = True) except S3Error as exc: if exc.code == 'NoSuchKey': raise ParameterError("Source object '%s' does not exist." % exc.resource) raise remote_count = len(remote_list) info(u"Summary: %d remote files to download" % remote_count) if remote_count > 0: if destination_base == "-": ## stdout is ok for multiple remote files! for key in remote_list: remote_list[key]['local_filename'] = "-" elif not os.path.isdir(deunicodise(destination_base)): ## We were either given a file name (existing or not) if remote_count > 1: raise ParameterError("Destination must be a directory or stdout when downloading multiple sources.") remote_list[remote_list.keys()[0]]['local_filename'] = destination_base else: if destination_base[-1] != os.path.sep: destination_base += os.path.sep for key in remote_list: local_filename = destination_base + key if os.path.sep != "/": local_filename = os.path.sep.join(local_filename.split("/")) remote_obj = remote_list[key] remote_obj['local_filename'] = local_filename if cfg.dry_run: for key in exclude_list: output(u"exclude: %s" % key) for key in remote_list: output(u"download: '%s' -> '%s'" % (remote_list[key]['object_uri_str'], remote_list[key]['local_filename'])) warning(u"Exiting now because of --dry-run") return EX_OK seq = 0 ret = EX_OK for key in remote_list: seq += 1 item = remote_list[key] uri = S3Uri(item['object_uri_str']) ## Encode / Decode destination with "replace" to make sure it's compatible with current encoding destination = unicodise_safe(item['local_filename']) destination_bytes = deunicodise(destination) last_modified_ts = item['timestamp'] seq_label = "[%d of %d]" % (seq, remote_count) is_dir_obj = item['is_dir'] response = None start_position = 0 if destination == "-": ## stdout dst_stream = io.open(sys.__stdout__.fileno(), mode='wb', closefd=False) dst_stream.stream_name = u'' file_exists = True elif is_dir_obj: ## Folder try: file_exists = os.path.exists(destination_bytes) if not file_exists: info(u"Creating directory: %s" % destination) os.makedirs(destination_bytes) except IOError as e: # If dir was created at the same time by a race condition, it is ok. 
if e.errno != errno.EEXIST: error(u"Creation of directory '%s' failed (Reason: %s)" % (destination, e.strerror)) if cfg.stop_on_error: error(u"Exiting now because of --stop-on-error") raise ret = EX_PARTIAL continue if file_exists and not cfg.force: # Directory already exists and we don't want to update metadata continue dst_stream = None else: ## File try: file_exists = os.path.exists(destination_bytes) try: dst_stream = io.open(destination_bytes, mode='ab') dst_stream.stream_name = destination except IOError as e: if e.errno != errno.ENOENT: raise dst_dir_bytes = os.path.dirname(destination) info(u"Creating directory: %s" % unicodise(dst_dir_bytes)) os.makedirs(dst_dir_bytes) dst_stream = io.open(destination_bytes, mode='ab') dst_stream.stream_name = destination if file_exists: force = False skip = False if cfg.get_continue: start_position = dst_stream.tell() item_size = item['size'] if start_position == item_size: skip = True elif start_position > item_size: info(u"Download forced for '%s' as source is " "smaller than local file" % destination) force = True elif cfg.force: force = True elif cfg.skip_existing: skip = True else: dst_stream.close() raise ParameterError( u"File '%s' already exists. Use either of --force /" " --continue / --skip-existing or give it a new" " name." % destination ) if skip: dst_stream.close() info(u"Skipping over existing file: '%s'" % destination) continue if force: start_position = 0 dst_stream.seek(0) dst_stream.truncate() except IOError as e: error(u"Creation of file '%s' failed (Reason: %s)" % (destination, e.strerror)) if cfg.stop_on_error: error(u"Exiting now because of --stop-on-error") raise ret = EX_PARTIAL continue try: # Retrieve the file content if dst_stream: try: response = s3.object_get(uri, dst_stream, destination, start_position=start_position, extra_label=seq_label) finally: dst_stream.close() except S3DownloadError as e: error(u"Download of '%s' failed (Reason: %s)" % (destination, e)) # Delete, only if file didn't exist before! if not file_exists: debug(u"object_get failed for '%s', deleting..." % (destination,)) os.unlink(destination_bytes) if cfg.stop_on_error: error(u"Exiting now because of --stop-on-error") raise ret = EX_PARTIAL continue except S3Error as e: error(u"Download of '%s' failed (Reason: %s)" % (destination, e)) if not file_exists: # Delete, only if file didn't exist before! debug(u"object_get failed for '%s', deleting..." 
% (destination,)) os.unlink(destination_bytes) raise """ # TODO Enable once we add restoring s3cmd-attrs in get command if is_dir_obj and cfg.preserve_attrs: # Retrieve directory info to restore s3cmd-attrs metadata try: response = s3.object_info(uri) except S3Error as exc: error(u"Retrieving directory metadata for '%s' failed (Reason: %s)" % (destination, exc)) if cfg.stop_on_error: error(u"Exiting now because of --stop-on-error") raise ret = EX_PARTIAL continue """ if response: if "x-amz-meta-s3tools-gpgenc" in response["headers"]: gpg_decrypt(destination, response["headers"]["x-amz-meta-s3tools-gpgenc"]) response["size"] = os.stat(destination_bytes)[6] if "last-modified" in response["headers"]: last_modified_ts = time.mktime(time.strptime(response["headers"]["last-modified"], "%a, %d %b %Y %H:%M:%S GMT")) if last_modified_ts and destination != "-": os.utime(destination_bytes, (last_modified_ts, last_modified_ts)) debug("set mtime to %s" % last_modified_ts) if not Config().progress_meter and destination != "-" and not is_dir_obj: speed_fmt = formatSize(response["speed"], human_readable = True, floating_point = True) output(u"download: '%s' -> '%s' (%d bytes in %0.1f seconds, %0.2f %sB/s)" % (uri, destination, response["size"], response["elapsed"], speed_fmt[0], speed_fmt[1])) if Config().delete_after_fetch: s3.object_delete(uri) output(u"File '%s' removed after fetch" % (uri)) return ret def cmd_object_del(args): cfg = Config() recursive = cfg.recursive for uri_str in args: uri = S3Uri(uri_str) if uri.type != "s3": raise ParameterError("Expecting S3 URI instead of '%s'" % uri_str) if not uri.has_object(): if recursive and not cfg.force: raise ParameterError("Please use --force to delete ALL contents of %s" % uri_str) elif not recursive: raise ParameterError("File name required, not only the bucket name. Alternatively use --recursive") if not recursive: rc = subcmd_object_del_uri(uri_str) elif cfg.exclude or cfg.include or cfg.max_delete > 0: # subcmd_batch_del_iterative does not support file exclusion and can't # accurately know how many total files will be deleted, so revert to batch delete. rc = subcmd_batch_del(uri_str = uri_str) else: rc = subcmd_batch_del_iterative(uri_str = uri_str) if not rc: return rc return EX_OK def subcmd_batch_del_iterative(uri_str = None, bucket = None): """ Streaming version of batch deletion (doesn't realize whole list in memory before deleting). 
Differences from subcmd_batch_del: - Does not obey --exclude directives or obey cfg.max_delete (use subcmd_batch_del in those cases) """ if bucket and uri_str: raise ValueError("Pass only one of uri_str or bucket") if bucket: # bucket specified uri_str = "s3://%s" % bucket cfg = Config() s3 = S3(cfg) uri = S3Uri(uri_str) bucket = uri.bucket() deleted_bytes = deleted_count = 0 for _, _, to_delete in s3.bucket_list_streaming(bucket, prefix=uri.object(), recursive=True): if not to_delete: continue if not cfg.dry_run: response = s3.object_batch_delete_uri_strs([uri.compose_uri(bucket, item['Key']) for item in to_delete]) deleted_bytes += sum(int(item["Size"]) for item in to_delete) deleted_count += len(to_delete) output(u'\n'.join(u"delete: '%s'" % uri.compose_uri(bucket, p['Key']) for p in to_delete)) if deleted_count: # display summary data of deleted files if cfg.stats: stats_info = StatsInfo() stats_info.files_deleted = deleted_count stats_info.size_deleted = deleted_bytes output(stats_info.format_output()) else: total_size, size_coeff = formatSize(deleted_bytes, Config().human_readable_sizes) total_size_str = str(total_size) + size_coeff info(u"Deleted %s objects (%s) from %s" % (deleted_count, total_size_str, uri)) else: warning(u"Remote list is empty.") return EX_OK def subcmd_batch_del(uri_str = None, bucket = None, remote_list = None): """ Returns: EX_OK Raises: ValueError """ cfg = Config() s3 = S3(cfg) def _batch_del(remote_list): to_delete = remote_list[:1000] remote_list = remote_list[1000:] while len(to_delete): debug(u"Batch delete %d, remaining %d" % (len(to_delete), len(remote_list))) if not cfg.dry_run: response = s3.object_batch_delete(to_delete) output(u'\n'.join((u"delete: '%s'" % to_delete[p]['object_uri_str']) for p in to_delete)) to_delete = remote_list[:1000] remote_list = remote_list[1000:] if remote_list is not None and len(remote_list) == 0: return False if len([item for item in [uri_str, bucket, remote_list] if item]) != 1: raise ValueError("One and only one of 'uri_str', 'bucket', 'remote_list' can be specified.") if bucket: # bucket specified uri_str = "s3://%s" % bucket if remote_list is None: # uri_str specified remote_list, exclude_list, remote_total_size = fetch_remote_list(uri_str, require_attribs = False) if len(remote_list) == 0: warning(u"Remote list is empty.") return EX_OK if cfg.max_delete > 0 and len(remote_list) > cfg.max_delete: warning(u"delete: maximum requested number of deletes would be exceeded, none performed.") return EX_OK _batch_del(remote_list) if cfg.dry_run: warning(u"Exiting now because of --dry-run") return EX_OK def subcmd_object_del_uri(uri_str, recursive = None): """ Returns: True if XXX, False if XXX Raises: ValueError """ cfg = Config() s3 = S3(cfg) if recursive is None: recursive = cfg.recursive remote_list, exclude_list, remote_total_size = fetch_remote_list(uri_str, require_attribs = False, recursive = recursive) remote_count = len(remote_list) info(u"Summary: %d remote files to delete" % remote_count) if cfg.max_delete > 0 and remote_count > cfg.max_delete: warning(u"delete: maximum requested number of deletes would be exceeded, none performed.") return False if cfg.dry_run: for key in exclude_list: output(u"exclude: %s" % key) for key in remote_list: output(u"delete: %s" % remote_list[key]['object_uri_str']) warning(u"Exiting now because of --dry-run") return True for key in remote_list: item = remote_list[key] response = s3.object_delete(S3Uri(item['object_uri_str'])) output(u"delete: '%s'" % item['object_uri_str']) return 
True def cmd_object_restore(args): cfg = Config() s3 = S3(cfg) if cfg.restore_days < 1: raise ParameterError("You must restore a file for 1 or more days") # accept case-insensitive argument but fix it to match S3 API if cfg.restore_priority.title() not in ['Standard', 'Expedited', 'Bulk']: raise ParameterError("Valid restoration priorities: bulk, standard, expedited") else: cfg.restore_priority = cfg.restore_priority.title() remote_list, exclude_list, remote_total_size = fetch_remote_list(args, require_attribs = False, recursive = cfg.recursive) remote_count = len(remote_list) info(u"Summary: Restoring %d remote files for %d days at %s priority" % (remote_count, cfg.restore_days, cfg.restore_priority)) if cfg.dry_run: for key in exclude_list: output(u"exclude: %s" % key) for key in remote_list: output(u"restore: '%s'" % remote_list[key]['object_uri_str']) warning(u"Exiting now because of --dry-run") return EX_OK for key in remote_list: item = remote_list[key] uri = S3Uri(item['object_uri_str']) if not item['object_uri_str'].endswith("/"): try: response = s3.object_restore(S3Uri(item['object_uri_str'])) output(u"restore: '%s'" % item['object_uri_str']) except S3Error as e: if e.code == "RestoreAlreadyInProgress": warning("%s: %s" % (e.message, item['object_uri_str'])) else: raise e else: debug(u"Skipping directory since only files may be restored") return EX_OK def subcmd_cp_mv(args, process_fce, action_str, message): cfg = Config() if action_str == 'modify': if len(args) < 1: raise ParameterError("Expecting one or more S3 URIs for " + action_str) destination_base = None else: if len(args) < 2: raise ParameterError("Expecting two or more S3 URIs for " + action_str) dst_base_uri = S3Uri(args.pop()) if dst_base_uri.type != "s3": raise ParameterError("Destination must be S3 URI. To download a " "file use 'get' or 'sync'.") destination_base = dst_base_uri.uri() scoreboard = ExitScoreboard() remote_list, exclude_list, remote_total_size = \ fetch_remote_list(args, require_attribs=False) remote_count = len(remote_list) info(u"Summary: %d remote files to %s" % (remote_count, action_str)) if destination_base: # Trying to mv dir1/ to dir2 will not pass a test in S3.FileLists, # so we don't need to test for it here. 
if not destination_base.endswith('/') \ and (len(remote_list) > 1 or cfg.recursive): raise ParameterError("Destination must be a directory and end with" " '/' when acting on a folder content or on " "multiple sources.") if cfg.recursive: for key in remote_list: remote_list[key]['dest_name'] = destination_base + key else: for key in remote_list: if destination_base.endswith("/"): remote_list[key]['dest_name'] = destination_base + key else: remote_list[key]['dest_name'] = destination_base else: for key in remote_list: remote_list[key]['dest_name'] = remote_list[key]['object_uri_str'] if cfg.dry_run: for key in exclude_list: output(u"exclude: %s" % key) for key in remote_list: output(u"%s: '%s' -> '%s'" % (action_str, remote_list[key]['object_uri_str'], remote_list[key]['dest_name'])) warning(u"Exiting now because of --dry-run") return EX_OK seq = 0 for key in remote_list: seq += 1 seq_label = "[%d of %d]" % (seq, remote_count) item = remote_list[key] src_uri = S3Uri(item['object_uri_str']) dst_uri = S3Uri(item['dest_name']) src_size = item.get('size') extra_headers = copy(cfg.extra_headers) try: response = process_fce(src_uri, dst_uri, extra_headers, src_size=src_size, extra_label=seq_label) output(message % {"src": src_uri, "dst": dst_uri, "extra": seq_label}) if Config().acl_public: info(u"Public URL is: %s" % dst_uri.public_url()) scoreboard.success() except (S3Error, S3UploadError) as exc: if isinstance(exc, S3Error) and exc.code == "NoSuchKey": scoreboard.notfound() warning(u"Key not found %s" % item['object_uri_str']) else: scoreboard.failed() error(u"Copy failed for: '%s' (%s)", item['object_uri_str'], exc) if cfg.stop_on_error: break return scoreboard.rc() def cmd_cp(args): s3 = S3(Config()) return subcmd_cp_mv(args, s3.object_copy, "copy", u"remote copy: '%(src)s' -> '%(dst)s' %(extra)s") def cmd_modify(args): s3 = S3(Config()) return subcmd_cp_mv(args, s3.object_modify, "modify", u"modify: '%(src)s' %(extra)s") def cmd_mv(args): s3 = S3(Config()) return subcmd_cp_mv(args, s3.object_move, "move", u"move: '%(src)s' -> '%(dst)s' %(extra)s") def cmd_info(args): cfg = Config() s3 = S3(cfg) while (len(args)): uri_arg = args.pop(0) uri = S3Uri(uri_arg) if uri.type != "s3" or not uri.has_bucket(): raise ParameterError("Expecting S3 URI instead of '%s'" % uri_arg) try: if uri.has_object(): info = s3.object_info(uri) output(u"%s (object):" % uri.uri()) output(u" File size: %s" % info['headers']['content-length']) output(u" Last mod: %s" % info['headers']['last-modified']) output(u" MIME type: %s" % info['headers'].get('content-type', 'none')) output(u" Storage: %s" % info['headers'].get('x-amz-storage-class', 'STANDARD')) md5 = info['headers'].get('etag', '').strip('"\'') try: md5 = info['s3cmd-attrs']['md5'] except KeyError: pass output(u" MD5 sum: %s" % md5) if 'x-amz-server-side-encryption' in info['headers']: output(u" SSE: %s" % info['headers']['x-amz-server-side-encryption']) else: output(u" SSE: none") else: info = s3.bucket_info(uri) output(u"%s (bucket):" % uri.uri()) output(u" Location: %s" % (info['bucket-location'] or 'none')) output(u" Payer: %s" % (info['requester-pays'] or 'none')) output(u" Ownership: %s" % (info['ownership'] or 'none')) output(u" Versioning:%s" % (info['versioning'] or 'none')) expiration = s3.expiration_info(uri, cfg.bucket_location) if expiration and expiration['prefix'] is not None: expiration_desc = "Expiration Rule: " if expiration['prefix'] == "": expiration_desc += "all objects in this bucket " elif expiration['prefix'] is not None: expiration_desc += 
"objects with key prefix '" + expiration['prefix'] + "' " expiration_desc += "will expire in '" if expiration['days']: expiration_desc += expiration['days'] + "' day(s) after creation" elif expiration['date']: expiration_desc += expiration['date'] + "' " output(u" %s" % expiration_desc) else: output(u" Expiration rule: none") public_access_block = ','.join([ key for key, val in info['public-access-block'].items() if val ]) output(u" Block Public Access: %s" % (public_access_block or 'none')) try: policy = s3.get_policy(uri) output(u" Policy: %s" % policy) except S3Error as exc: # Ignore the exception and don't fail the info # if the server doesn't support setting ACLs if exc.status == 403: output(u" Policy: Not available: GetPolicy permission is needed to read the policy") elif exc.status == 405: output(u" Policy: Not available: Only the bucket owner can read the policy") elif exc.status not in [404, 501]: raise exc else: output(u" Policy: none") try: cors = s3.get_cors(uri) output(u" CORS: %s" % cors) except S3Error as exc: # Ignore the exception and don't fail the info # if the server doesn't support setting ACLs if exc.status not in [404, 501]: raise exc output(u" CORS: none") try: acl = s3.get_acl(uri) acl_grant_list = acl.getGrantList() for grant in acl_grant_list: output(u" ACL: %s: %s" % (grant['grantee'], grant['permission'])) if acl.isAnonRead(): output(u" URL: %s" % uri.public_url()) except S3Error as exc: # Ignore the exception and don't fail the info # if the server doesn't support setting ACLs if exc.status not in [404, 501]: raise exc else: output(u" ACL: none") if uri.has_object(): # Temporary hack for performance + python3 compatibility if PY3: info_headers_iter = info['headers'].items() else: info_headers_iter = info['headers'].iteritems() for header, value in info_headers_iter: if header.startswith('x-amz-meta-'): output(u" %s: %s" % (header, value)) except S3Error as e: if e.info["Code"] in S3.codes: error(S3.codes[e.info["Code"]] % uri.bucket()) raise return EX_OK def filedicts_to_keys(*args): keys = set() for a in args: keys.update(a.keys()) keys = list(keys) keys.sort() return keys def cmd_sync_remote2remote(args): cfg = Config() s3 = S3(cfg) # Normalise s3://uri (e.g. 
assert trailing slash) destination_base = S3Uri(args[-1]).uri() destbase_with_source_list = set() for source_arg in args[:-1]: if source_arg.endswith('/'): destbase_with_source_list.add(destination_base) else: destbase_with_source_list.add(s3path.join( destination_base, s3path.basename(source_arg) )) stats_info = StatsInfo() src_list, src_exclude_list, remote_total_size = fetch_remote_list(args[:-1], recursive = True, require_attribs = True) dst_list, dst_exclude_list, _ = fetch_remote_list(destbase_with_source_list, recursive = True, require_attribs = True) src_count = len(src_list) orig_src_count = src_count dst_count = len(dst_list) deleted_count = 0 info(u"Found %d source files, %d destination files" % (src_count, dst_count)) src_list, dst_list, update_list, copy_pairs = compare_filelists(src_list, dst_list, src_remote = True, dst_remote = True) src_count = len(src_list) update_count = len(update_list) dst_count = len(dst_list) print(u"Summary: %d source files to copy, %d files at destination to delete" % (src_count + update_count, dst_count)) ### Populate 'target_uri' only if we've got something to sync from src to dst for key in src_list: src_list[key]['target_uri'] = destination_base + key for key in update_list: update_list[key]['target_uri'] = destination_base + key if cfg.dry_run: keys = filedicts_to_keys(src_exclude_list, dst_exclude_list) for key in keys: output(u"exclude: %s" % key) if cfg.delete_removed: for key in dst_list: output(u"delete: '%s'" % dst_list[key]['object_uri_str']) for key in src_list: output(u"remote copy: '%s' -> '%s'" % (src_list[key]['object_uri_str'], src_list[key]['target_uri'])) for key in update_list: output(u"remote copy: '%s' -> '%s'" % (update_list[key]['object_uri_str'], update_list[key]['target_uri'])) warning(u"Exiting now because of --dry-run") return EX_OK # if there are copy pairs, we can't do delete_before, on the chance # we need one of the to-be-deleted files as a copy source. if len(copy_pairs) > 0: cfg.delete_after = True if cfg.delete_removed and orig_src_count == 0 and len(dst_list) and not cfg.force: warning(u"delete: cowardly refusing to delete because no source files were found. 
Use --force to override.") cfg.delete_removed = False # Delete items in destination that are not in source if cfg.delete_removed and not cfg.delete_after: subcmd_batch_del(remote_list = dst_list) deleted_count = len(dst_list) def _upload(src_list, seq, src_count): file_list = src_list.keys() file_list.sort() ret = EX_OK total_nb_files = 0 total_size = 0 for file in file_list: seq += 1 item = src_list[file] src_uri = S3Uri(item['object_uri_str']) dst_uri = S3Uri(item['target_uri']) src_size = item.get('size') seq_label = "[%d of %d]" % (seq, src_count) extra_headers = copy(cfg.extra_headers) try: response = s3.object_copy(src_uri, dst_uri, extra_headers, src_size=src_size, extra_label=seq_label) output(u"remote copy: '%s' -> '%s' %s" % (src_uri, dst_uri, seq_label)) total_nb_files += 1 total_size += item.get(u'size', 0) except (S3Error, S3UploadError) as exc: ret = EX_PARTIAL error(u"File '%s' could not be copied: %s", src_uri, exc) if cfg.stop_on_error: raise return ret, seq, total_nb_files, total_size # Perform the synchronization of files timestamp_start = time.time() seq = 0 ret, seq, nb_files, size = _upload(src_list, seq, src_count + update_count) total_files_copied = nb_files total_size_copied = size status, seq, nb_files, size = _upload(update_list, seq, src_count + update_count) if ret == EX_OK: ret = status total_files_copied += nb_files total_size_copied += size n_copied, bytes_saved, failed_copy_files = remote_copy( s3, copy_pairs, destination_base, None, False) total_files_copied += n_copied total_size_copied += bytes_saved #process files not copied debug("Process files that were not remotely copied") failed_copy_count = len(failed_copy_files) for key in failed_copy_files: failed_copy_files[key]['target_uri'] = destination_base + key status, seq, nb_files, size = _upload(failed_copy_files, seq, src_count + update_count + failed_copy_count) if ret == EX_OK: ret = status total_files_copied += nb_files total_size_copied += size # Delete items in destination that are not in source if cfg.delete_removed and cfg.delete_after: subcmd_batch_del(remote_list = dst_list) deleted_count = len(dst_list) stats_info.files = orig_src_count stats_info.size = remote_total_size stats_info.files_copied = total_files_copied stats_info.size_copied = total_size_copied stats_info.files_deleted = deleted_count total_elapsed = max(1.0, time.time() - timestamp_start) outstr = "Done. Copied %d files in %0.1f seconds, %0.2f files/s." 
% (total_files_copied, total_elapsed, seq / total_elapsed) if cfg.stats: outstr += stats_info.format_output() output(outstr) elif seq > 0: output(outstr) else: info(outstr) return ret def cmd_sync_remote2local(args): cfg = Config() s3 = S3(cfg) def _do_deletes(local_list): total_size = 0 if cfg.max_delete > 0 and len(local_list) > cfg.max_delete: warning(u"delete: maximum requested number of deletes would be exceeded, none performed.") return total_size # Reverse used to delete children before parent folders for key in reversed(local_list): item = local_list[key] full_path = item['full_name'] if item.get('is_dir', True): os.rmdir(deunicodise(full_path)) else: os.unlink(deunicodise(full_path)) output(u"delete: '%s'" % full_path) total_size += item.get(u'size', 0) return len(local_list), total_size destination_base = args[-1] source_args = args[:-1] fetch_source_args = args[:-1] if not destination_base.endswith(os.path.sep): if fetch_source_args[0].endswith(u'/') or len(fetch_source_args) > 1: raise ParameterError("Destination must be a directory and end with '/' when downloading multiple sources.") stats_info = StatsInfo() remote_list, src_exclude_list, remote_total_size = fetch_remote_list(fetch_source_args, recursive = True, require_attribs = True) # - The source path is either like "/myPath/my_src_folder" and # the user want to download this single folder and Optionally only delete # things that have been removed inside this folder. For this case, we only # have to look inside destination_base/my_src_folder and not at the root of # destination_base. # - Or like "/myPath/my_src_folder/" and the user want to have the sync # with the content of this folder destbase_with_source_list = set() for source_arg in fetch_source_args: if source_arg.endswith('/'): if destination_base.endswith(os.path.sep): destbase_with_source_list.add(destination_base) else: destbase_with_source_list.add(destination_base + os.path.sep) else: destbase_with_source_list.add(os.path.join(destination_base, os.path.basename(source_arg))) # with_dirs is True, as we always want to compare source with the actual full local content local_list, single_file_local, dst_exclude_list, local_total_size = fetch_local_list( destbase_with_source_list, is_src=False, recursive=True, with_dirs=True ) local_count = len(local_list) remote_count = len(remote_list) orig_remote_count = remote_count info(u"Found %d remote file objects, %d local files and directories" % (remote_count, local_count)) remote_list, local_list, update_list, copy_pairs = compare_filelists(remote_list, local_list, src_remote = True, dst_remote = False) dir_cache = {} def _set_local_filename(remote_list, destination_base, source_args, dir_cache): if len(remote_list) == 0: return if destination_base.endswith(os.path.sep): if not os.path.exists(deunicodise(destination_base)): if not cfg.dry_run: os.makedirs(deunicodise(destination_base)) if not os.path.isdir(deunicodise(destination_base)): raise ParameterError("Destination is not an existing directory") elif len(remote_list) == 1 and \ source_args[0] == remote_list[remote_list.keys()[0]].get(u'object_uri_str', ''): if os.path.isdir(deunicodise(destination_base)): raise ParameterError("Destination already exists and is a directory") remote_list[remote_list.keys()[0]]['local_filename'] = destination_base return if destination_base[-1] != os.path.sep: destination_base += os.path.sep for key in remote_list: local_filename = destination_base + key if os.path.sep != "/": local_filename = 
os.path.sep.join(local_filename.split("/")) item = remote_list[key] item['local_filename'] = local_filename # Create parent folders if needed # Extract key dirname key_dir_path = key.rsplit('/', 1)[0] dst_dir = None if key_dir_path not in dir_cache: if cfg.dry_run: mkdir_ret = True else: dst_dir = unicodise(os.path.dirname(deunicodise(local_filename))) mkdir_ret = Utils.mkdir_with_parents(dst_dir) # Also add to cache, all the parent dirs path = key_dir_path while path and path not in dir_cache: dir_cache[path] = mkdir_ret last_slash_idx = path.rfind('/') if last_slash_idx in [-1, 0]: break path = path[:last_slash_idx] if dir_cache[key_dir_path] == False: if not dst_dir: dst_dir = unicodise(os.path.dirname(deunicodise(local_filename))) if cfg.stop_on_error: error(u"Exiting now because of --stop-on-error") raise OSError("Download of '%s' failed (Reason: %s destination directory is not writable)" % (key, dst_dir)) error(u"Download of '%s' failed (Reason: %s destination directory is not writable)" % (key, dst_dir)) item['mark_failed'] = True ret = EX_PARTIAL continue _set_local_filename(remote_list, destination_base, source_args, dir_cache) _set_local_filename(update_list, destination_base, source_args, dir_cache) local_count = len(local_list) remote_count = len(remote_list) update_count = len(update_list) copy_pairs_count = len(copy_pairs) info(u"Summary: %d remote files to download, %d local files to delete, %d local files to hardlink" % (remote_count + update_count, local_count, copy_pairs_count)) if cfg.dry_run: keys = filedicts_to_keys(src_exclude_list, dst_exclude_list) for key in keys: output(u"exclude: %s" % key) if cfg.delete_removed: for key in local_list: output(u"delete: '%s'" % local_list[key]['full_name']) for key in remote_list: output(u"download: '%s' -> '%s'" % (remote_list[key]['object_uri_str'], remote_list[key]['local_filename'])) for key in update_list: output(u"download: '%s' -> '%s'" % (update_list[key]['object_uri_str'], update_list[key]['local_filename'])) warning(u"Exiting now because of --dry-run") return EX_OK # if there are copy pairs, we can't do delete_before, on the chance # we need one of the to-be-deleted files as a copy source. if len(copy_pairs) > 0: cfg.delete_after = True if cfg.delete_removed and orig_remote_count == 0 and len(local_list) and not cfg.force: warning(u"delete: cowardly refusing to delete because no source files were found. Use --force to override.") cfg.delete_removed = False if cfg.delete_removed and not cfg.delete_after: deleted_count, deleted_size = _do_deletes(local_list) else: deleted_count, deleted_size = (0, 0) def _download(remote_list, seq, total, total_size): original_umask = os.umask(0) os.umask(original_umask) file_list = remote_list.keys() file_list.sort() ret = EX_OK for file in file_list: seq += 1 item = remote_list[file] uri = S3Uri(item['object_uri_str']) dst_file = item['local_filename'] last_modified_ts = item['timestamp'] is_dir = item['is_dir'] seq_label = "[%d of %d]" % (seq, total) if item.get('mark_failed', False): # Item is skipped because there was previously an issue with # its destination directory. 
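                # (For items that are not skipped, the code below streams the object
                # into a temporary '.s3cmd.*.tmp' file created in the destination
                # directory and renames it over the final path only after the GET
                # completes, so an interrupted download never leaves a truncated
                # file under its real name.)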
continue response = None dst_files_b = deunicodise(dst_file) try: chkptfname_b = '' # ignore empty directory at S3: if not is_dir: debug(u"dst_file=%s" % dst_file) # create temporary files (of type .s3cmd.XXXX.tmp) in the same directory # for downloading and then rename once downloaded # unicode provided to mkstemp argument chkptfd, chkptfname_b = tempfile.mkstemp( u".tmp", u".s3cmd.", os.path.dirname(dst_file) ) with io.open(chkptfd, mode='wb') as dst_stream: dst_stream.stream_name = unicodise(chkptfname_b) debug(u"created chkptfname=%s" % dst_stream.stream_name) response = s3.object_get(uri, dst_stream, dst_file, extra_label = seq_label) # download completed, rename the file to destination if os.name == "nt": # Windows is really a bad OS. Rename can't overwrite an existing file try: os.unlink(dst_files_b) except OSError: pass os.rename(chkptfname_b, dst_files_b) debug(u"renamed chkptfname=%s to dst_file=%s" % (dst_stream.stream_name, dst_file)) except OSError as exc: allow_partial = True if exc.errno == errno.EISDIR: error(u"Download of '%s' failed (Reason: %s is a directory)" % (file, dst_file)) elif os.name != "nt" and exc.errno == errno.ETXTBSY: error(u"Download of '%s' failed (Reason: %s is currently open for execute, cannot be overwritten)" % (file, dst_file)) elif exc.errno == errno.EPERM or exc.errno == errno.EACCES: error(u"Download of '%s' failed (Reason: %s permission denied)" % (file, dst_file)) elif exc.errno == errno.EBUSY: error(u"Download of '%s' failed (Reason: %s is busy)" % (file, dst_file)) elif exc.errno == errno.EFBIG: error(u"Download of '%s' failed (Reason: %s is too big)" % (file, dst_file)) elif exc.errno == errno.ENAMETOOLONG: error(u"Download of '%s' failed (Reason: File Name is too long)" % file) elif (exc.errno == errno.ENOSPC or (os.name != "nt" and exc.errno == errno.EDQUOT)): error(u"Download of '%s' failed (Reason: No space left)" % file) allow_partial = False else: error(u"Download of '%s' failed (Reason: Unknown OsError %d)" % (file, exc.errno or 0)) allow_partial = False try: # Try to remove the temp file if it exists if chkptfname_b: os.unlink(chkptfname_b) except Exception: pass if allow_partial and not cfg.stop_on_error: ret = EX_PARTIAL continue ret = EX_OSFILE if allow_partial: error(u"Exiting now because of --stop-on-error") else: error(u"Exiting now because of fatal error") raise except S3DownloadError as exc: error(u"Download of '%s' failed too many times (Last Reason: %s). " "This is usually a transient error, please try again " "later." % (file, exc)) try: os.unlink(chkptfname_b) except Exception as sub_exc: warning(u"Error deleting temporary file %s (Reason: %s)", (dst_stream.stream_name, sub_exc)) if cfg.stop_on_error: ret = EX_DATAERR error(u"Exiting now because of --stop-on-error") raise ret = EX_PARTIAL continue except S3Error as exc: warning(u"Remote file '%s'. 
S3Error: %s" % (exc.resource, exc)) try: os.unlink(chkptfname_b) except Exception as sub_exc: warning(u"Error deleting temporary file %s (Reason: %s)", (dst_stream.stream_name, sub_exc)) if cfg.stop_on_error: raise ret = EX_PARTIAL continue try: # set permissions on destination file if not is_dir: # a normal file mode = 0o777 - original_umask else: # an empty directory, make them readable/executable mode = 0o775 debug(u"mode=%s" % oct(mode)) os.chmod(dst_files_b, mode) except: raise # We can't get metadata for directories from an object_get, so we have to # request them explicitly if is_dir and cfg.preserve_attrs: try: response = s3.object_info(uri) except S3Error as exc: error(u"Retrieving directory metadata for '%s' failed (Reason: %s)" % (dst_file, exc)) if cfg.stop_on_error: error(u"Exiting now because of --stop-on-error") raise ret = EX_PARTIAL continue try: if response and 's3cmd-attrs' in response and cfg.preserve_attrs: attrs = response['s3cmd-attrs'] attr_mode = attrs.get('mode') attr_mtime = attrs.get('mtime') attr_atime = attrs.get('atime') attr_uid = attrs.get('uid') attr_gid = attrs.get('gid') if attr_mode is not None: os.chmod(dst_files_b, int(attr_mode)) if attr_mtime is not None or attr_atime is not None: default_time = int(time.time()) mtime = attr_mtime is not None and int(attr_mtime) or default_time atime = attr_atime is not None and int(attr_atime) or default_time os.utime(dst_files_b, (atime, mtime)) if attr_uid is not None and attr_gid is not None: uid = int(attr_uid) gid = int(attr_gid) try: os.lchown(dst_files_b, uid, gid) except Exception as exc: exc.failed_step = 'lchown' raise else: if response and 'last-modified' in response['headers']: last_modified_ts = time.mktime(time.strptime( response["headers"]["last-modified"], "%a, %d %b %Y %H:%M:%S GMT" )) if last_modified_ts: os.utime(dst_files_b, (last_modified_ts, last_modified_ts)) debug("set mtime to %s" % last_modified_ts) except OSError as e: ret = EX_PARTIAL if e.errno == errno.EEXIST: warning(u"'%s' exists - not overwriting" % dst_file) continue if e.errno in (errno.EPERM, errno.EACCES): if getattr(e, 'failed_step') == 'lchown': warning(u"Can't set owner/group: '%s' (%s)" % (dst_file, e.strerror)) else: warning(u"Attrs not writable: '%s' (%s)" % (dst_file, e.strerror)) if cfg.stop_on_error: raise e continue raise e except KeyboardInterrupt: warning(u"Exiting after keyboard interrupt") return except Exception as e: ret = EX_PARTIAL error(u"%s: %s" % (file, e)) if cfg.stop_on_error: raise OSError(e) continue if is_dir: output(u"mkdir: '%s' -> '%s' %s" % (uri, dst_file, seq_label)) else: speed_fmt = formatSize(response["speed"], human_readable = True, floating_point = True) if not Config().progress_meter: output(u"download: '%s' -> '%s' (%d bytes in %0.1f seconds, %0.2f %sB/s) %s" % (uri, dst_file, response["size"], response["elapsed"], speed_fmt[0], speed_fmt[1], seq_label)) total_size += response["size"] if Config().delete_after_fetch: s3.object_delete(uri) output(u"File '%s' removed after syncing" % (uri)) return ret, seq, total_size size_transferred = 0 total_elapsed = 0.0 timestamp_start = time.time() seq = 0 ret, seq, size_transferred = _download(remote_list, seq, remote_count + update_count, size_transferred) remote_list = None status, seq, size_transferred = _download(update_list, seq, remote_count + update_count, size_transferred) if ret == EX_OK: ret = status update_list = None _set_local_filename(copy_pairs, destination_base, source_args, dir_cache) n_copies, size_copies, failed_copy_list = 
local_copy(copy_pairs, destination_base) copy_pairs = None dir_cache = None # Download files that failed during local_copy status, seq, size_transferred = _download(failed_copy_list, seq, len(failed_copy_list) + remote_count + update_count, size_transferred) if ret == EX_OK: ret = status if cfg.delete_removed and cfg.delete_after: deleted_count, deleted_size = _do_deletes(local_list) total_elapsed = max(1.0, time.time() - timestamp_start) speed_fmt = formatSize(size_transferred / total_elapsed, human_readable = True, floating_point = True) stats_info.files = orig_remote_count stats_info.size = remote_total_size stats_info.files_transferred = len(failed_copy_list) + remote_count + update_count stats_info.size_transferred = size_transferred stats_info.files_copied = n_copies stats_info.size_copied = size_copies stats_info.files_deleted = deleted_count stats_info.size_deleted = deleted_size # Only print out the result if any work has been done or # if the user asked for verbose output outstr = "Done. Downloaded %d bytes in %0.1f seconds, %0.2f %sB/s." % (size_transferred, total_elapsed, speed_fmt[0], speed_fmt[1]) if cfg.stats: outstr += stats_info.format_output() output(outstr) elif size_transferred > 0: output(outstr) else: info(outstr) return ret def local_copy(copy_pairs, destination_base): # Do NOT hardlink local files by default, that'd be silly # For instance all empty files would become hardlinked together! saved_bytes = 0 failed_copy_list = FileDict() if destination_base[-1] != os.path.sep: destination_base += os.path.sep for relative_file, src_obj in copy_pairs.items(): src_file = destination_base + src_obj['copy_src'] if os.path.sep != "/": src_file = os.path.sep.join(src_file.split("/")) dst_file = src_obj['local_filename'] try: debug(u"Copying %s to %s" % (src_file, dst_file)) shutil.copy2(deunicodise(src_file), deunicodise(dst_file)) saved_bytes += src_obj.get(u'size', 0) except (IOError, OSError) as e: warning(u'Unable to copy or hardlink files %s -> %s (Reason: %s)' % (src_file, dst_file, e)) failed_copy_list[relative_file] = src_obj return len(copy_pairs), saved_bytes, failed_copy_list def remote_copy(s3, copy_pairs, destination_base, uploaded_objects_list=None, metadata_update=False): cfg = Config() saved_bytes = 0 failed_copy_list = FileDict() seq = 0 src_count = len(copy_pairs) for relative_file, src_obj in copy_pairs.items(): copy_src_file = src_obj['copy_src'] src_md5 = src_obj['md5'] seq += 1 debug(u"Remote Copying from %s to %s" % (copy_src_file, relative_file)) src_uri = S3Uri(destination_base + copy_src_file) dst_uri = S3Uri(destination_base + relative_file) src_obj_size = src_obj.get(u'size', 0) seq_label = "[%d of %d]" % (seq, src_count) extra_headers = copy(cfg.extra_headers) if metadata_update: # source is a real local file with its own personal metadata attr_header = _build_attr_header(src_obj, relative_file, src_md5) debug(u"attr_header: %s" % attr_header) extra_headers.update(attr_header) extra_headers['content-type'] = \ s3.content_type(filename=src_obj['full_name']) try: s3.object_copy(src_uri, dst_uri, extra_headers, src_size=src_obj_size, extra_label=seq_label) output(u"remote copy: '%s' -> '%s' %s" % (copy_src_file, relative_file, seq_label)) saved_bytes += src_obj_size if uploaded_objects_list is not None: uploaded_objects_list.append(relative_file) except Exception: warning(u"Unable to remote copy files '%s' -> '%s'" % (src_uri, dst_uri)) failed_copy_list[relative_file] = src_obj return (len(copy_pairs), saved_bytes, failed_copy_list) def 
_build_attr_header(src_obj, src_relative_name, md5=None): cfg = Config() attrs = {} if cfg.preserve_attrs: for attr in cfg.preserve_attrs_list: val = None if attr == 'uname': try: val = Utils.urlencode_string(Utils.getpwuid_username(src_obj['uid']), unicode_output=True) except (KeyError, TypeError): attr = "uid" val = src_obj.get('uid') if val: warning(u"%s: Owner username not known. Storing UID=%d instead." % (src_relative_name, val)) elif attr == 'gname': try: val = Utils.urlencode_string(Utils.getgrgid_grpname(src_obj.get('gid')), unicode_output=True) except (KeyError, TypeError): attr = "gid" val = src_obj.get('gid') if val: warning(u"%s: Owner groupname not known. Storing GID=%d instead." % (src_relative_name, val)) elif attr != "md5": try: val = getattr(src_obj['sr'], 'st_' + attr) except Exception: val = None if val is not None: attrs[attr] = val if 'md5' in cfg.preserve_attrs_list and md5: attrs['md5'] = md5 if attrs: attr_str_list = [] for k in sorted(attrs.keys()): attr_str_list.append(u"%s:%s" % (k, attrs[k])) attr_header = {'x-amz-meta-s3cmd-attrs': u'/'.join(attr_str_list)} else: attr_header = {} return attr_header def cmd_sync_local2remote(args): cfg = Config() s3 = S3(cfg) def _single_process(source_args): for dest in destinations: ## Normalize URI to convert s3://bkt to s3://bkt/ (trailing slash) destination_base_uri = S3Uri(dest) if destination_base_uri.type != 's3': raise ParameterError("Destination must be S3Uri. Got: %s" % destination_base_uri) destination_base = destination_base_uri.uri() return _child(destination_base, source_args) def _parent(source_args): # Now that we've done all the disk I/O to look at the local file system and # calculate the md5 for each file, fork for each destination to upload to them separately # and in parallel child_pids = [] ret = EX_OK for dest in destinations: ## Normalize URI to convert s3://bkt to s3://bkt/ (trailing slash) destination_base_uri = S3Uri(dest) if destination_base_uri.type != 's3': raise ParameterError("Destination must be S3Uri. 
Got: %s" % destination_base_uri) destination_base = destination_base_uri.uri() child_pid = os.fork() if child_pid == 0: os._exit(_child(destination_base, source_args)) else: child_pids.append(child_pid) while len(child_pids): (pid, status) = os.wait() child_pids.remove(pid) if ret == EX_OK: ret = os.WEXITSTATUS(status) return ret def _child(destination_base, source_args): def _set_remote_uri(local_list, destination_base, single_file_local): if len(local_list) > 0: ## Populate 'remote_uri' only if we've got something to upload if not destination_base.endswith("/"): if not single_file_local: raise ParameterError("Destination S3 URI must end with '/' (ie must refer to a directory on the remote side).") local_list[local_list.keys()[0]]['remote_uri'] = destination_base else: for key in local_list: local_list[key]['remote_uri'] = destination_base + key def _upload(local_list, seq, total, total_size): file_list = local_list.keys() file_list.sort() ret = EX_OK for file in file_list: seq += 1 item = local_list[file] src = item['full_name'] try: src_md5 = local_list.get_md5(file) except IOError: src_md5 = None uri = S3Uri(item['remote_uri']) seq_label = "[%d of %d]" % (seq, total) extra_headers = copy(cfg.extra_headers) try: attr_header = _build_attr_header(local_list[file], file, src_md5) debug(u"attr_header: %s" % attr_header) extra_headers.update(attr_header) response = s3.object_put(src, uri, extra_headers, extra_label = seq_label) except S3UploadError as exc: error(u"Upload of '%s' failed too many times (Last reason: %s)" % (item['full_name'], exc)) if cfg.stop_on_error: ret = EX_DATAERR error(u"Exiting now because of --stop-on-error") raise ret = EX_PARTIAL continue except InvalidFileError as exc: error(u"Upload of '%s' is not possible (Reason: %s)" % (item['full_name'], exc)) if cfg.stop_on_error: ret = EX_OSFILE error(u"Exiting now because of --stop-on-error") raise ret = EX_PARTIAL continue speed_fmt = formatSize(response["speed"], human_readable = True, floating_point = True) if not cfg.progress_meter: output(u"upload: '%s' -> '%s' (%d bytes in %0.1f seconds, %0.2f %sB/s) %s" % (item['full_name'], uri, response["size"], response["elapsed"], speed_fmt[0], speed_fmt[1], seq_label)) total_size += response["size"] uploaded_objects_list.append(uri.object()) return ret, seq, total_size stats_info = StatsInfo() local_list, single_file_local, src_exclude_list, local_total_size = fetch_local_list( args[:-1], is_src=True, recursive=True, with_dirs=cfg.keep_dirs ) # - The source path is either like "/myPath/my_src_folder" and # the user want to upload this single folder and optionally only delete # things that have been removed inside this folder. For this case, # we only have to look inside destination_base/my_src_folder and not at # the root of destination_base. # - Or like "/myPath/my_src_folder/" and the user want to have the sync # with the content of this folder # Special case, "." for current folder. destbase_with_source_list = set() for source_arg in source_args: if not source_arg.endswith('/') and os.path.basename(source_arg) != '.' 
\ and not single_file_local: destbase_with_source_list.add(s3path.join( destination_base, os.path.basename(source_arg) )) else: destbase_with_source_list.add(destination_base) remote_list, dst_exclude_list, remote_total_size = fetch_remote_list(destbase_with_source_list, recursive = True, require_attribs = True) local_count = len(local_list) orig_local_count = local_count remote_count = len(remote_list) info(u"Found %d local files, %d remote files" % (local_count, remote_count)) if single_file_local and len(local_list) == 1 and len(remote_list) == 1: ## Make remote_key same as local_key for comparison if we're dealing with only one file remote_list_entry = remote_list[remote_list.keys()[0]] # Flush remote_list, by the way remote_list = FileDict() remote_list[local_list.keys()[0]] = remote_list_entry local_list, remote_list, update_list, copy_pairs = compare_filelists(local_list, remote_list, src_remote = False, dst_remote = True) local_count = len(local_list) update_count = len(update_list) copy_count = len(copy_pairs) remote_count = len(remote_list) upload_count = local_count + update_count info(u"Summary: %d local files to upload, %d files to remote copy, %d remote files to delete" % (upload_count, copy_count, remote_count)) _set_remote_uri(local_list, destination_base, single_file_local) _set_remote_uri(update_list, destination_base, single_file_local) if cfg.dry_run: keys = filedicts_to_keys(src_exclude_list, dst_exclude_list) for key in keys: output(u"exclude: %s" % key) for key in local_list: output(u"upload: '%s' -> '%s'" % (local_list[key]['full_name'], local_list[key]['remote_uri'])) for key in update_list: output(u"upload: '%s' -> '%s'" % (update_list[key]['full_name'], update_list[key]['remote_uri'])) for relative_file, item in copy_pairs.items(): output(u"remote copy: '%s' -> '%s'" % (item['copy_src'], relative_file)) if cfg.delete_removed: for key in remote_list: output(u"delete: '%s'" % remote_list[key]['object_uri_str']) warning(u"Exiting now because of --dry-run") return EX_OK # if there are copy pairs, we can't do delete_before, on the chance # we need one of the to-be-deleted files as a copy source. if len(copy_pairs) > 0: cfg.delete_after = True if cfg.delete_removed and orig_local_count == 0 and len(remote_list) and not cfg.force: warning(u"delete: cowardly refusing to delete because no source files were found. 
Use --force to override.") cfg.delete_removed = False if cfg.delete_removed and not cfg.delete_after and remote_list: subcmd_batch_del(remote_list = remote_list) size_transferred = 0 total_elapsed = 0.0 timestamp_start = time.time() ret, n, size_transferred = _upload(local_list, 0, upload_count, size_transferred) status, n, size_transferred = _upload(update_list, n, upload_count, size_transferred) if ret == EX_OK: ret = status # uploaded_objects_list reference is passed so it can be filled with # destination object of successful copies so that they can be # invalidated by cf n_copies, saved_bytes, failed_copy_files = remote_copy( s3, copy_pairs, destination_base, uploaded_objects_list, True) #upload file that could not be copied debug("Process files that were not remotely copied") failed_copy_count = len(failed_copy_files) _set_remote_uri(failed_copy_files, destination_base, single_file_local) status, n, size_transferred = _upload(failed_copy_files, n, upload_count + failed_copy_count, size_transferred) if ret == EX_OK: ret = status if cfg.delete_removed and cfg.delete_after and remote_list: subcmd_batch_del(remote_list = remote_list) total_elapsed = max(1.0, time.time() - timestamp_start) total_speed = total_elapsed and size_transferred / total_elapsed or 0.0 speed_fmt = formatSize(total_speed, human_readable = True, floating_point = True) stats_info.files = orig_local_count stats_info.size = local_total_size stats_info.files_transferred = upload_count + failed_copy_count stats_info.size_transferred = size_transferred stats_info.files_copied = n_copies stats_info.size_copied = saved_bytes stats_info.files_deleted = remote_count # Only print out the result if any work has been done or # if the user asked for verbose output outstr = "Done. Uploaded %d bytes in %0.1f seconds, %0.2f %sB/s." 
% (size_transferred, total_elapsed, speed_fmt[0], speed_fmt[1]) if cfg.stats: outstr += stats_info.format_output() output(outstr) elif size_transferred + saved_bytes > 0: output(outstr) else: info(outstr) return ret def _invalidate_on_cf(destination_base_uri): cf = CloudFront(cfg) default_index_file = None if cfg.invalidate_default_index_on_cf or cfg.invalidate_default_index_root_on_cf: info_response = s3.website_info(destination_base_uri, cfg.bucket_location) if info_response: default_index_file = info_response['index_document'] if len(default_index_file) < 1: default_index_file = None results = cf.InvalidateObjects(destination_base_uri, uploaded_objects_list, default_index_file, cfg.invalidate_default_index_on_cf, cfg.invalidate_default_index_root_on_cf) for result in results: if result['status'] == 201: output(u"Created invalidation request for %d paths" % len(uploaded_objects_list)) output(u"Check progress with: s3cmd cfinvalinfo cf://%s/%s" % (result['dist_id'], result['request_id'])) # main execution uploaded_objects_list = [] if cfg.encrypt: error(u"S3cmd 'sync' doesn't yet support GPG encryption, sorry.") error(u"Either use unconditional 's3cmd put --recursive'") error(u"or disable encryption with --no-encrypt parameter.") sys.exit(EX_USAGE) for arg in args[:-1]: if not os.path.exists(deunicodise(arg)): raise ParameterError("Invalid source: '%s' is not an existing file or directory" % arg) destinations = [args[-1]] if cfg.additional_destinations: destinations = destinations + cfg.additional_destinations if 'fork' not in os.__all__ or len(destinations) < 2: ret = _single_process(args[:-1]) destination_base_uri = S3Uri(destinations[-1]) if cfg.invalidate_on_cf: if len(uploaded_objects_list) == 0: info("Nothing to invalidate in CloudFront") else: _invalidate_on_cf(destination_base_uri) else: ret = _parent(args[:-1]) if cfg.invalidate_on_cf: error(u"You cannot use both --cf-invalidate and --add-destination.") return(EX_USAGE) return ret def cmd_sync(args): cfg = Config() if (len(args) < 2): syntax_msg = '' commands_list = get_commands_list() for cmd in commands_list: if cmd.get('cmd') == 'sync': syntax_msg = cmd.get('param', '') break raise ParameterError("Too few parameters! 
Expected: %s" % syntax_msg) if cfg.delay_updates: warning(u"`delay-updates` is obsolete.") for arg in args: if arg == u'-': raise ParameterError("Stdin or stdout ('-') can't be used for a source or a destination with the sync command.") if S3Uri(args[0]).type == "file" and S3Uri(args[-1]).type == "s3": return cmd_sync_local2remote(args) if S3Uri(args[0]).type == "s3" and S3Uri(args[-1]).type == "file": return cmd_sync_remote2local(args) if S3Uri(args[0]).type == "s3" and S3Uri(args[-1]).type == "s3": return cmd_sync_remote2remote(args) raise ParameterError("Invalid source/destination: '%s'" % "' '".join(args)) def cmd_setacl(args): cfg = Config() s3 = S3(cfg) set_to_acl = cfg.acl_public and "Public" or "Private" if not cfg.recursive: old_args = args args = [] for arg in old_args: uri = S3Uri(arg) if not uri.has_object(): if cfg.acl_public != None: info("Setting bucket-level ACL for %s to %s" % (uri.uri(), set_to_acl)) else: info("Setting bucket-level ACL for %s" % (uri.uri())) if not cfg.dry_run: update_acl(s3, uri) else: args.append(arg) remote_list, exclude_list, _ = fetch_remote_list(args) remote_count = len(remote_list) info(u"Summary: %d remote files to update" % remote_count) if cfg.dry_run: for key in exclude_list: output(u"exclude: %s" % key) for key in remote_list: output(u"setacl: '%s'" % remote_list[key]['object_uri_str']) warning(u"Exiting now because of --dry-run") return EX_OK seq = 0 for key in remote_list: seq += 1 seq_label = "[%d of %d]" % (seq, remote_count) uri = S3Uri(remote_list[key]['object_uri_str']) update_acl(s3, uri, seq_label) return EX_OK def cmd_setobjectlegalhold(args): cfg = Config() s3 = S3(cfg) legal_hold_status = args[0] uri = S3Uri(args[1]) if legal_hold_status not in ["ON", "OFF"]: raise ParameterError("Incorrect status") if cfg.dry_run: return EX_OK response = s3.set_object_legal_hold(uri, legal_hold_status) debug(u"response - %s" % response['status']) if response['status'] == 204: output(u"%s: Legal Hold updated" % uri) return EX_OK def cmd_setobjectretention(args): cfg = Config() s3 = S3(cfg) mode = args[0] retain_until_date = args[1] uri = S3Uri(args[2]) if mode not in ["COMPLIANCE", "GOVERNANCE"]: raise ParameterError("Incorrect mode") try: datetime.datetime.strptime(retain_until_date, '%Y-%m-%dT%H:%M:%SZ') except ValueError: raise ParameterError("Incorrect data format, should be YYYY-MM-DDTHH:MM:SSZ") if cfg.dry_run: return EX_OK response = s3.set_object_retention(uri, mode, retain_until_date) debug(u"response - %s" % response['status']) if response['status'] == 204: output(u"%s: Retention updated" % uri) def cmd_setversioning(args): cfg = Config() s3 = S3(cfg) bucket_uri = S3Uri(args[0]) if bucket_uri.object(): raise ParameterError("Only bucket name is required for [setversioning] command") status = args[1] if status not in ["enable", "disable"]: raise ParameterError("Must be 'enable' or 'disable'. 
Got: %s" % status) enabled = True if status == "enable" else False response = s3.set_versioning(bucket_uri, enabled) debug(u"response - %s" % response['status']) if response['status'] == 200: output(u"%s: Versioning status updated" % bucket_uri) return EX_OK def cmd_setownership(args): cfg = Config() s3 = S3(cfg) bucket_uri = S3Uri(args[0]) if bucket_uri.object(): raise ParameterError("Only bucket name is required for [setownership] command") valid_values = {x.lower():x for x in [ 'BucketOwnerPreferred', 'BucketOwnerEnforced', 'ObjectWriter' ]} value = valid_values.get(args[1].lower()) if not value: choices = " or ".join(['%s' % x for x in valid_values.keys()]) raise ParameterError("Must be %s. Got: %s" % (choices, args[1])) response = s3.set_bucket_ownership(bucket_uri, value) debug(u"response - %s" % response['status']) if response['status'] == 200: output(u"%s: Bucket Object Ownership updated" % bucket_uri) return EX_OK def cmd_setblockpublicaccess(args): cfg = Config() s3 = S3(cfg) bucket_uri = S3Uri(args[0]) if bucket_uri.object(): raise ParameterError("Only bucket name is required for [setblockpublicaccess] command") valid_values = {x.lower():x for x in [ 'BlockPublicAcls', 'IgnorePublicAcls', 'BlockPublicPolicy', 'RestrictPublicBuckets' ]} flags = {} raw_flags = args[1].split(',') for raw_value in raw_flags: if not raw_value: continue value = valid_values.get(raw_value.lower()) if not value: choices = " or ".join(['%s' % x for x in valid_values.keys()]) raise ParameterError("Must be %s. Got: %s" % (choices, raw_value)) flags[value] = True response = s3.set_bucket_public_access_block(bucket_uri, flags) debug(u"response - %s" % response['status']) if response['status'] == 200: output(u"%s: Block Public Access updated" % bucket_uri) return EX_OK def cmd_setpolicy(args): cfg = Config() s3 = S3(cfg) uri = S3Uri(args[1]) policy_file = args[0] with open(deunicodise(policy_file), 'r') as fp: policy = fp.read() if cfg.dry_run: return EX_OK response = s3.set_policy(uri, policy) #if retsponse['status'] == 200: debug(u"response - %s" % response['status']) if response['status'] == 204: output(u"%s: Policy updated" % uri) return EX_OK def cmd_delpolicy(args): cfg = Config() s3 = S3(cfg) uri = S3Uri(args[0]) if cfg.dry_run: return EX_OK response = s3.delete_policy(uri) #if retsponse['status'] == 200: debug(u"response - %s" % response['status']) output(u"%s: Policy deleted" % uri) return EX_OK def cmd_setcors(args): cfg = Config() s3 = S3(cfg) uri = S3Uri(args[1]) cors_file = args[0] with open(deunicodise(cors_file), 'r') as fp: cors = fp.read() if cfg.dry_run: return EX_OK response = s3.set_cors(uri, cors) #if retsponse['status'] == 200: debug(u"response - %s" % response['status']) if response['status'] == 204: output(u"%s: CORS updated" % uri) return EX_OK def cmd_delcors(args): cfg = Config() s3 = S3(cfg) uri = S3Uri(args[0]) if cfg.dry_run: return EX_OK response = s3.delete_cors(uri) #if retsponse['status'] == 200: debug(u"response - %s" % response['status']) output(u"%s: CORS deleted" % uri) return EX_OK def cmd_set_payer(args): cfg = Config() s3 = S3(cfg) uri = S3Uri(args[0]) if cfg.dry_run: return EX_OK response = s3.set_payer(uri) if response['status'] == 200: output(u"%s: Payer updated" % uri) return EX_OK else: output(u"%s: Payer NOT updated" % uri) return EX_CONFLICT def cmd_setlifecycle(args): cfg = Config() s3 = S3(cfg) uri = S3Uri(args[1]) lifecycle_policy_file = args[0] with open(deunicodise(lifecycle_policy_file), 'r') as fp: lifecycle_policy = fp.read() if cfg.dry_run: return EX_OK 
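    # Illustrative content for the lifecycle policy file (a minimal, hypothetical
    # example of the XML the S3 lifecycle configuration API expects):
    #   <LifecycleConfiguration>
    #     <Rule>
    #       <ID>expire-logs</ID>
    #       <Filter><Prefix>logs/</Prefix></Filter>
    #       <Status>Enabled</Status>
    #       <Expiration><Days>30</Days></Expiration>
    #     </Rule>
    #   </LifecycleConfiguration>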
response = s3.set_lifecycle_policy(uri, lifecycle_policy) debug(u"response - %s" % response['status']) if response['status'] == 200: output(u"%s: Lifecycle Policy updated" % uri) return EX_OK def cmd_getlifecycle(args): cfg = Config() s3 = S3(cfg) uri = S3Uri(args[0]) response = s3.get_lifecycle_policy(uri) output(u"%s" % getPrettyFromXml(response['data'])) return EX_OK def cmd_dellifecycle(args): cfg = Config() s3 = S3(cfg) uri = S3Uri(args[0]) if cfg.dry_run: return EX_OK response = s3.delete_lifecycle_policy(uri) debug(u"response - %s" % response['status']) output(u"%s: Lifecycle Policy deleted" % uri) return EX_OK def cmd_setnotification(args): s3 = S3(Config()) uri = S3Uri(args[1]) notification_policy_file = args[0] with open(deunicodise(notification_policy_file), 'r') as fp: notification_policy = fp.read() response = s3.set_notification_policy(uri, notification_policy) debug(u"response - %s" % response['status']) if response['status'] == 200: output(u"%s: Notification Policy updated" % uri) return EX_OK def cmd_getnotification(args): s3 = S3(Config()) uri = S3Uri(args[0]) response = s3.get_notification_policy(uri) output(getPrettyFromXml(response['data'])) return EX_OK def cmd_delnotification(args): s3 = S3(Config()) uri = S3Uri(args[0]) response = s3.delete_notification_policy(uri) debug(u"response - %s" % response['status']) output(u"%s: Notification Policy deleted" % uri) return EX_OK def cmd_settagging(args): s3 = S3(Config()) uri = S3Uri(args[0]) tag_set_string = args[1] tagsets = [] for tagset in tag_set_string.split("&"): keyval = tagset.split("=", 1) key = keyval[0] if not key: raise ParameterError("Tag key should not be empty") value = len(keyval) > 1 and keyval[1] or "" tagsets.append((key, value)) debug(tagsets) response = s3.set_tagging(uri, tagsets) debug(u"response - %s" % response['status']) if response['status'] in [200, 204]: output(u"%s: Tagging updated" % uri) return EX_OK def cmd_gettagging(args): s3 = S3(Config()) uri = S3Uri(args[0]) tagsets = s3.get_tagging(uri) if uri.has_object(): output(u"%s (object):" % uri) else: output(u"%s (bucket):" % uri) debug(tagsets) for tag in tagsets: try: output(u"\t%s:\t%s" % ( tag['Key'], tag['Value'])) except KeyError: pass return EX_OK def cmd_deltagging(args): s3 = S3(Config()) uri = S3Uri(args[0]) response = s3.delete_tagging(uri) debug(u"response - %s" % response['status']) output(u"%s: Tagging deleted" % uri) return EX_OK def cmd_multipart(args): cfg = Config() s3 = S3(cfg) uri = S3Uri(args[0]) #id = '' #if(len(args) > 1): id = args[1] upload_list = s3.get_multipart(uri) output(u"%s" % uri) debug(upload_list) output(u"Initiated\tPath\tId") for mpupload in upload_list: try: output(u"%s\t%s\t%s" % ( mpupload['Initiated'], "s3://" + uri.bucket() + "/" + mpupload['Key'], mpupload['UploadId'])) except KeyError: pass return EX_OK def cmd_abort_multipart(args): '''{"cmd":"abortmp", "label":"abort a multipart upload", "param":"s3://BUCKET/OBJECT Id", "func":cmd_abort_multipart, "argc":2},''' cfg = Config() s3 = S3(cfg) uri = S3Uri(args[0]) if not uri.object(): raise ParameterError(u"Expecting S3 URI with a filename: %s" % uri.uri()) id = args[1] response = s3.abort_multipart(uri, id) debug(u"response - %s" % response['status']) output(u"%s" % uri) return EX_OK def cmd_list_multipart(args): '''{"cmd":"abortmp", "label":"list a multipart upload", "param":"s3://BUCKET Id", "func":cmd_list_multipart, "argc":2},''' cfg = Config() s3 = S3(cfg) uri = S3Uri(args[0]) id = args[1] part_list = s3.list_multipart(uri, id) 
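    # Illustrative invocation (bucket, object and upload Id are hypothetical):
    #   s3cmd listmp s3://mybucket/backup.tar 2~exampleUploadId
    # Each entry returned in part_list is expected to provide the LastModified,
    # PartNumber, ETag and Size keys printed below.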
    output(u"LastModified\t\t\tPartNumber\tETag\tSize")
    for mpupload in part_list:
        try:
            output(u"%s\t%s\t%s\t%s" % (mpupload['LastModified'], mpupload['PartNumber'], mpupload['ETag'], mpupload['Size']))
        except KeyError:
            pass
    return EX_OK

def cmd_accesslog(args):
    cfg = Config()
    s3 = S3(cfg)
    bucket_uri = S3Uri(args.pop())
    if bucket_uri.object():
        raise ParameterError("Only bucket name is required for [accesslog] command")
    if cfg.log_target_prefix == False:
        accesslog, response = s3.set_accesslog(bucket_uri, enable = False)
    elif cfg.log_target_prefix:
        log_target_prefix_uri = S3Uri(cfg.log_target_prefix)
        if log_target_prefix_uri.type != "s3":
            raise ParameterError("--log-target-prefix must be a S3 URI")
        accesslog, response = s3.set_accesslog(bucket_uri, enable = True, log_target_prefix_uri = log_target_prefix_uri, acl_public = cfg.acl_public)
    else: # cfg.log_target_prefix == None
        accesslog = s3.get_accesslog(bucket_uri)
    output(u"Access logging for: %s" % bucket_uri.uri())
    output(u" Logging Enabled: %s" % accesslog.isLoggingEnabled())
    if accesslog.isLoggingEnabled():
        output(u" Target prefix: %s" % accesslog.targetPrefix().uri())
        #output(u" Public Access: %s" % accesslog.isAclPublic())
    return EX_OK

def cmd_sign(args):
    string_to_sign = args.pop()
    debug(u"string-to-sign: %r" % string_to_sign)
    signature = sign_string_v2(encode_to_s3(string_to_sign))
    output(u"Signature: %s" % decode_from_s3(signature))
    return EX_OK

def cmd_signurl(args):
    expiry = args.pop()
    url_to_sign = S3Uri(args.pop())
    if url_to_sign.type != 's3':
        raise ParameterError("Must be S3Uri. Got: %s" % url_to_sign)
    debug("url to sign: %r" % url_to_sign)
    signed_url = sign_url_v2(url_to_sign, expiry)
    output(signed_url)
    return EX_OK

def cmd_fixbucket(args):
    def _unescape(text):
        ##
        # Removes HTML or XML character references and entities from a text string.
        #
        # @param text The HTML (or XML) source text.
        # @return The plain text, as a Unicode string, if necessary.
        #
        # From: http://effbot.org/zone/re-sub.htm#unescape-html
        def _unescape_fixup(m):
            text = m.group(0)
            if not 'apos' in htmlentitydefs.name2codepoint:
                htmlentitydefs.name2codepoint['apos'] = ord("'")
            if text[:2] == "&#":
                # character reference
                try:
                    if text[:3] == "&#x":
                        return unichr(int(text[3:-1], 16))
                    else:
                        return unichr(int(text[2:-1]))
                except ValueError:
                    pass
            else:
                # named entity
                try:
                    text = unichr(htmlentitydefs.name2codepoint[text[1:-1]])
                except KeyError:
                    pass
            return text # leave as is
        text = text.encode('ascii', 'xmlcharrefreplace')
        return re.sub(r"&#?\w+;", _unescape_fixup, text)

    cfg = Config()
    cfg.urlencoding_mode = "fixbucket"
    s3 = S3(cfg)
    count = 0
    for arg in args:
        culprit = S3Uri(arg)
        if culprit.type != "s3":
            raise ParameterError("Expecting S3Uri instead of: %s" % arg)
        response = s3.bucket_list_noparse(culprit.bucket(), culprit.object(), recursive = True)
        r_xent = re.compile(r"&#x[\da-fA-F]+;")
        data = decode_from_s3(response['data'])
        keys = re.findall("<Key>(.*?)</Key>", data, re.MULTILINE | re.UNICODE)
        debug("Keys: %r" % keys)
        for key in keys:
            if r_xent.search(key):
                info("Fixing: %s" % key)
                debug("Step 1: Transforming %s" % key)
                key_bin = _unescape(key)
                debug("Step 2: ... to %s" % key_bin)
                key_new = replace_nonprintables(key_bin)
                debug("Step 3: ... 
then to %s" % key_new) src = S3Uri("s3://%s/%s" % (culprit.bucket(), key_bin)) dst = S3Uri("s3://%s/%s" % (culprit.bucket(), key_new)) if cfg.dry_run: output(u"[--dry-run] File %r would be renamed to %s" % (key_bin, key_new)) continue try: resp_move = s3.object_move(src, dst) if resp_move['status'] == 200: output(u"File '%r' renamed to '%s'" % (key_bin, key_new)) count += 1 else: error(u"Something went wrong for: %r" % key) error(u"Please report the problem to s3tools-bugs@lists.sourceforge.net") except S3Error: error(u"Something went wrong for: %r" % key) error(u"Please report the problem to s3tools-bugs@lists.sourceforge.net") if count > 0: warning(u"Fixed %d files' names. Their ACL were reset to Private." % count) warning(u"Use 's3cmd setacl --acl-public s3://...' to make") warning(u"them publicly readable if required.") return EX_OK def resolve_list(lst, args): retval = [] for item in lst: retval.append(item % args) return retval def gpg_command(command, passphrase = ""): debug(u"GPG command: " + " ".join(command)) command = [deunicodise(cmd_entry) for cmd_entry in command] p = subprocess.Popen(command, stdin = subprocess.PIPE, stdout = subprocess.PIPE, stderr = subprocess.STDOUT, close_fds = True) p_stdout, p_stderr = p.communicate(deunicodise(passphrase + "\n")) debug(u"GPG output:") for line in unicodise(p_stdout).split("\n"): debug(u"GPG: " + line) p_exitcode = p.wait() return p_exitcode def gpg_encrypt(filename): cfg = Config() tmp_filename = Utils.mktmpfile() args = { "gpg_command" : cfg.gpg_command, "passphrase_fd" : "0", "input_file" : filename, "output_file" : tmp_filename, } info(u"Encrypting file %s to %s..." % (filename, tmp_filename)) command = resolve_list(cfg.gpg_encrypt.split(" "), args) code = gpg_command(command, cfg.gpg_passphrase) return (code, tmp_filename, "gpg") def gpg_decrypt(filename, gpgenc_header = "", in_place = True): cfg = Config() tmp_filename = Utils.mktmpfile(filename) args = { "gpg_command" : cfg.gpg_command, "passphrase_fd" : "0", "input_file" : filename, "output_file" : tmp_filename, } info(u"Decrypting file %s to %s..." % (filename, tmp_filename)) command = resolve_list(cfg.gpg_decrypt.split(" "), args) code = gpg_command(command, cfg.gpg_passphrase) if code == 0 and in_place: debug(u"Renaming %s to %s" % (tmp_filename, filename)) os.unlink(deunicodise(filename)) os.rename(deunicodise(tmp_filename), deunicodise(filename)) tmp_filename = filename return (code, tmp_filename) def run_configure(config_file, args): cfg = Config() options = [ ("access_key", "Access Key", "Access key and Secret key are your identifiers for Amazon S3. Leave them empty for using the env variables."), ("secret_key", "Secret Key"), ("bucket_location", "Default Region"), ("host_base", "S3 Endpoint", "Use \"s3.amazonaws.com\" for S3 Endpoint and not modify it to the target Amazon S3."), ("host_bucket", "DNS-style bucket+hostname:port template for accessing a bucket", "Use \"%(bucket)s.s3.amazonaws.com\" to the target Amazon S3. \"%(bucket)s\" and \"%(location)s\" vars can be used\nif the target S3 system supports dns based buckets."), ("gpg_passphrase", "Encryption password", "Encryption password is used to protect your files from reading\nby unauthorized persons while in transfer to S3"), ("gpg_command", "Path to GPG program"), ("use_https", "Use HTTPS protocol", "When using secure HTTPS protocol all communication with Amazon S3\nservers is protected from 3rd party eavesdropping. 
This method is\nslower than plain HTTP, and can only be proxied with Python 2.7 or newer"), ("proxy_host", "HTTP Proxy server name", "On some networks all internet access must go through a HTTP proxy.\nTry setting it here if you can't connect to S3 directly"), ("proxy_port", "HTTP Proxy server port"), ] ## Option-specfic defaults if getattr(cfg, "gpg_command") == "": setattr(cfg, "gpg_command", which("gpg")) if getattr(cfg, "proxy_host") == "" and os.getenv("http_proxy"): autodetected_encoding = locale.getpreferredencoding() or "UTF-8" re_match=re.match(r"(http://)?([^:]+):(\d+)", unicodise_s(os.getenv("http_proxy"), autodetected_encoding)) if re_match: setattr(cfg, "proxy_host", re_match.groups()[1]) setattr(cfg, "proxy_port", re_match.groups()[2]) try: # Support for python3 # raw_input only exists in py2 and was renamed to input in py3 global input input = raw_input except NameError: pass try: while True: output(u"\nEnter new values or accept defaults in brackets with Enter.") output(u"Refer to user manual for detailed description of all options.") for option in options: prompt = option[1] ## Option-specific handling if option[0] == 'proxy_host' and getattr(cfg, 'use_https') == True and sys.hexversion < 0x02070000: setattr(cfg, option[0], "") continue if option[0] == 'proxy_port' and getattr(cfg, 'proxy_host') == "": setattr(cfg, option[0], 0) continue try: val = getattr(cfg, option[0]) if type(val) is bool: val = val and "Yes" or "No" if val not in (None, ""): prompt += " [%s]" % val except AttributeError: pass if len(option) >= 3: output(u"\n%s" % option[2]) val = unicodise_s(input(prompt + ": ")) if val != "": if type(getattr(cfg, option[0])) is bool: # Turn 'Yes' into True, everything else into False val = val.lower().startswith('y') setattr(cfg, option[0], val) output(u"\nNew settings:") for option in options: output(u" %s: %s" % (option[1], getattr(cfg, option[0]))) val = input("\nTest access with supplied credentials? [Y/n] ") if val.lower().startswith("y") or val == "": try: # Default, we try to list 'all' buckets which requires # ListAllMyBuckets permission if len(args) == 0: output(u"Please wait, attempting to list all buckets...") S3(Config()).bucket_list("", "") else: # If user specified a bucket name directly, we check it and only it. # Thus, access check can succeed even if user only has access to # to a single bucket and not ListAllMyBuckets permission. output(u"Please wait, attempting to list bucket: " + args[0]) uri = S3Uri(args[0]) if uri.type == "s3" and uri.has_bucket(): S3(Config()).bucket_list(uri.bucket(), "") else: raise Exception(u"Invalid bucket uri: " + args[0]) output(u"Success. Your access key and secret key worked fine :-)") output(u"\nNow verifying that encryption works...") if not getattr(cfg, "gpg_command") or not getattr(cfg, "gpg_passphrase"): output(u"Not configured. Never mind.") else: if not getattr(cfg, "gpg_command"): raise Exception("Path to GPG program not set") if not os.path.isfile(deunicodise(getattr(cfg, "gpg_command"))): raise Exception("GPG program not found") filename = Utils.mktmpfile() with open(deunicodise(filename), "w") as fp: fp.write(os.sys.copyright) ret_enc = gpg_encrypt(filename) ret_dec = gpg_decrypt(ret_enc[1], ret_enc[2], False) hash = [ hash_file_md5(filename), hash_file_md5(ret_enc[1]), hash_file_md5(ret_dec[1]), ] os.unlink(deunicodise(filename)) os.unlink(deunicodise(ret_enc[1])) os.unlink(deunicodise(ret_dec[1])) if hash[0] == hash[2] and hash[0] != hash[1]: output(u"Success. 
Encryption and decryption worked fine :-)") else: raise Exception("Encryption verification error.") except S3Error as e: error(u"Test failed: %s" % (e)) if e.code == "AccessDenied": error(u"Are you sure your keys have s3:ListAllMyBuckets permissions?") val = input("\nRetry configuration? [Y/n] ") if val.lower().startswith("y") or val == "": continue except Exception as e: error(u"Test failed: %s" % (e)) val = input("\nRetry configuration? [Y/n] ") if val.lower().startswith("y") or val == "": continue val = input("\nSave settings? [y/N] ") if val.lower().startswith("y"): break val = input("Retry configuration? [Y/n] ") if val.lower().startswith("n"): raise EOFError() ## Overwrite existing config file, make it user-readable only old_mask = os.umask(0o077) try: os.remove(deunicodise(config_file)) except OSError as e: if e.errno != errno.ENOENT: raise try: with io.open(deunicodise(config_file), "w", encoding=cfg.encoding) as fp: cfg.dump_config(fp) finally: os.umask(old_mask) output(u"Configuration saved to '%s'" % config_file) except (EOFError, KeyboardInterrupt): output(u"\nConfiguration aborted. Changes were NOT saved.") return except IOError as e: error(u"Writing config file failed: %s: %s" % (config_file, e.strerror)) sys.exit(EX_IOERR) def process_patterns_from_file(fname, patterns_list): try: with open(deunicodise(fname), "rt") as fn: for pattern in fn: pattern = unicodise(pattern).strip() if re.match("^#", pattern) or re.match(r"^\s*$", pattern): continue debug(u"%s: adding rule: %s" % (fname, pattern)) patterns_list.append(pattern) except IOError as e: error(e) sys.exit(EX_IOERR) return patterns_list def process_patterns(patterns_list, patterns_from, is_glob, option_txt = ""): r""" process_patterns(patterns, patterns_from, is_glob, option_txt = "") Process --exclude / --include GLOB and REGEXP patterns. 'option_txt' is 'exclude' / 'include' / 'rexclude' / 'rinclude' Returns: patterns_compiled, patterns_text Note: process_patterns_from_file will ignore lines starting with # as these are comments. To target escape the initial #, to use it in a file name, one can use: "[#]" (for exclude) or "\#" (for rexclude). """ patterns_compiled = [] patterns_textual = {} if patterns_list is None: patterns_list = [] if patterns_from: ## Append patterns from glob_from for fname in patterns_from: debug(u"processing --%s-from %s" % (option_txt, fname)) patterns_list = process_patterns_from_file(fname, patterns_list) for pattern in patterns_list: debug(u"processing %s rule: %s" % (option_txt, patterns_list)) if is_glob: pattern = glob.fnmatch.translate(pattern) r = re.compile(pattern) patterns_compiled.append(r) patterns_textual[r] = pattern return patterns_compiled, patterns_textual def get_commands_list(): return [ {"cmd":"mb", "label":"Make bucket", "param":"s3://BUCKET", "func":cmd_bucket_create, "argc":1}, {"cmd":"rb", "label":"Remove bucket", "param":"s3://BUCKET", "func":cmd_bucket_delete, "argc":1}, {"cmd":"ls", "label":"List objects or buckets", "param":"[s3://BUCKET[/PREFIX]]", "func":cmd_ls, "argc":0}, {"cmd":"la", "label":"List all object in all buckets", "param":"", "func":cmd_all_buckets_list_all_content, "argc":0}, {"cmd":"put", "label":"Put file into bucket", "param":"FILE [FILE...] 
s3://BUCKET[/PREFIX]", "func":cmd_object_put, "argc":2}, {"cmd":"get", "label":"Get file from bucket", "param":"s3://BUCKET/OBJECT LOCAL_FILE", "func":cmd_object_get, "argc":1}, {"cmd":"del", "label":"Delete file from bucket", "param":"s3://BUCKET/OBJECT", "func":cmd_object_del, "argc":1}, {"cmd":"rm", "label":"Delete file from bucket (alias for del)", "param":"s3://BUCKET/OBJECT", "func":cmd_object_del, "argc":1}, #{"cmd":"mkdir", "label":"Make a virtual S3 directory", "param":"s3://BUCKET/path/to/dir", "func":cmd_mkdir, "argc":1}, {"cmd":"restore", "label":"Restore file from Glacier storage", "param":"s3://BUCKET/OBJECT", "func":cmd_object_restore, "argc":1}, {"cmd":"sync", "label":"Synchronize a directory tree to S3 (checks files freshness using size and md5 checksum, unless overridden by options, see below)", "param":"LOCAL_DIR s3://BUCKET[/PREFIX] or s3://BUCKET[/PREFIX] LOCAL_DIR or s3://BUCKET[/PREFIX] s3://BUCKET[/PREFIX]", "func":cmd_sync, "argc":2}, {"cmd":"du", "label":"Disk usage by buckets", "param":"[s3://BUCKET[/PREFIX]]", "func":cmd_du, "argc":0}, {"cmd":"info", "label":"Get various information about Buckets or Files", "param":"s3://BUCKET[/OBJECT]", "func":cmd_info, "argc":1}, {"cmd":"cp", "label":"Copy object", "param":"s3://BUCKET1/OBJECT1 s3://BUCKET2[/OBJECT2]", "func":cmd_cp, "argc":2}, {"cmd":"modify", "label":"Modify object metadata", "param":"s3://BUCKET1/OBJECT", "func":cmd_modify, "argc":1}, {"cmd":"mv", "label":"Move object", "param":"s3://BUCKET1/OBJECT1 s3://BUCKET2[/OBJECT2]", "func":cmd_mv, "argc":2}, {"cmd":"setacl", "label":"Modify Access control list for Bucket or Files", "param":"s3://BUCKET[/OBJECT]", "func":cmd_setacl, "argc":1}, {"cmd":"setversioning", "label":"Modify Bucket Versioning", "param":"s3://BUCKET enable|disable", "func":cmd_setversioning, "argc":2}, {"cmd":"setownership", "label":"Modify Bucket Object Ownership", "param":"s3://BUCKET BucketOwnerPreferred|BucketOwnerEnforced|ObjectWriter", "func":cmd_setownership, "argc":2}, {"cmd":"setblockpublicaccess", "label":"Modify Block Public Access rules", "param":"s3://BUCKET BlockPublicAcls,IgnorePublicAcls,BlockPublicPolicy,RestrictPublicBuckets", "func":cmd_setblockpublicaccess, "argc":2}, {"cmd":"setobjectlegalhold", "label":"Modify Object Legal Hold", "param":"STATUS s3://BUCKET/OBJECT", "func":cmd_setobjectlegalhold, "argc":2}, {"cmd":"setobjectretention", "label":"Modify Object Retention", "param":"MODE RETAIN_UNTIL_DATE s3://BUCKET/OBJECT", "func":cmd_setobjectretention, "argc":3}, {"cmd":"setpolicy", "label":"Modify Bucket Policy", "param":"FILE s3://BUCKET", "func":cmd_setpolicy, "argc":2}, {"cmd":"delpolicy", "label":"Delete Bucket Policy", "param":"s3://BUCKET", "func":cmd_delpolicy, "argc":1}, {"cmd":"setcors", "label":"Modify Bucket CORS", "param":"FILE s3://BUCKET", "func":cmd_setcors, "argc":2}, {"cmd":"delcors", "label":"Delete Bucket CORS", "param":"s3://BUCKET", "func":cmd_delcors, "argc":1}, {"cmd":"payer", "label":"Modify Bucket Requester Pays policy", "param":"s3://BUCKET", "func":cmd_set_payer, "argc":1}, {"cmd":"multipart", "label":"Show multipart uploads", "param":"s3://BUCKET [Id]", "func":cmd_multipart, "argc":1}, {"cmd":"abortmp", "label":"Abort a multipart upload", "param":"s3://BUCKET/OBJECT Id", "func":cmd_abort_multipart, "argc":2}, {"cmd":"listmp", "label":"List parts of a multipart upload", "param":"s3://BUCKET/OBJECT Id", "func":cmd_list_multipart, "argc":2}, {"cmd":"accesslog", "label":"Enable/disable bucket access logging", "param":"s3://BUCKET", 
"func":cmd_accesslog, "argc":1}, {"cmd":"sign", "label":"Sign arbitrary string using the secret key", "param":"STRING-TO-SIGN", "func":cmd_sign, "argc":1}, {"cmd":"signurl", "label":"Sign an S3 URL to provide limited public access with expiry", "param":"s3://BUCKET/OBJECT ", "func":cmd_signurl, "argc":2}, {"cmd":"fixbucket", "label":"Fix invalid file names in a bucket", "param":"s3://BUCKET[/PREFIX]", "func":cmd_fixbucket, "argc":1}, ## Tagging commands {"cmd":"settagging", "label":"Modify tagging for Bucket or Files", "param":"s3://BUCKET[/OBJECT] \"KEY=VALUE[&KEY=VALUE ...]\"", "func":cmd_settagging, "argc":2}, {"cmd":"gettagging", "label":"Get tagging for Bucket or Files", "param":"s3://BUCKET[/OBJECT]", "func":cmd_gettagging, "argc":1}, {"cmd":"deltagging", "label":"Delete tagging for Bucket or Files", "param":"s3://BUCKET[/OBJECT]", "func":cmd_deltagging, "argc":1}, ## Website commands {"cmd":"ws-create", "label":"Create Website from bucket", "param":"s3://BUCKET", "func":cmd_website_create, "argc":1}, {"cmd":"ws-delete", "label":"Delete Website", "param":"s3://BUCKET", "func":cmd_website_delete, "argc":1}, {"cmd":"ws-info", "label":"Info about Website", "param":"s3://BUCKET", "func":cmd_website_info, "argc":1}, ## Lifecycle commands {"cmd":"expire", "label":"Set or delete expiration rule for the bucket", "param":"s3://BUCKET", "func":cmd_expiration_set, "argc":1}, {"cmd":"setlifecycle", "label":"Upload a lifecycle policy for the bucket", "param":"FILE s3://BUCKET", "func":cmd_setlifecycle, "argc":2}, {"cmd":"getlifecycle", "label":"Get a lifecycle policy for the bucket", "param":"s3://BUCKET", "func":cmd_getlifecycle, "argc":1}, {"cmd":"dellifecycle", "label":"Remove a lifecycle policy for the bucket", "param":"s3://BUCKET", "func":cmd_dellifecycle, "argc":1}, ## Notification commands {"cmd":"setnotification", "label":"Upload a notification policy for the bucket", "param":"FILE s3://BUCKET", "func":cmd_setnotification, "argc":2}, {"cmd":"getnotification", "label":"Get a notification policy for the bucket", "param":"s3://BUCKET", "func":cmd_getnotification, "argc":1}, {"cmd":"delnotification", "label":"Remove a notification policy for the bucket", "param":"s3://BUCKET", "func":cmd_delnotification, "argc":1}, ## CloudFront commands {"cmd":"cflist", "label":"List CloudFront distribution points", "param":"", "func":CfCmd.info, "argc":0}, {"cmd":"cfinfo", "label":"Display CloudFront distribution point parameters", "param":"[cf://DIST_ID]", "func":CfCmd.info, "argc":0}, {"cmd":"cfcreate", "label":"Create CloudFront distribution point", "param":"s3://BUCKET", "func":CfCmd.create, "argc":1}, {"cmd":"cfdelete", "label":"Delete CloudFront distribution point", "param":"cf://DIST_ID", "func":CfCmd.delete, "argc":1}, {"cmd":"cfmodify", "label":"Change CloudFront distribution point parameters", "param":"cf://DIST_ID", "func":CfCmd.modify, "argc":1}, {"cmd":"cfinval", "label":"Invalidate CloudFront objects", "param":"s3://BUCKET/OBJECT [s3://BUCKET/OBJECT ...]", "func":CfCmd.invalidate, "argc":1}, {"cmd":"cfinvalinfo", "label":"Display CloudFront invalidation request(s) status", "param":"cf://DIST_ID[/INVAL_ID]", "func":CfCmd.invalinfo, "argc":1}, ] def format_commands(progname, commands_list): help = "Commands:\n" for cmd in commands_list: help += " %s\n %s %s %s\n" % (cmd["label"], progname, cmd["cmd"], cmd["param"]) return help def update_acl(s3, uri, seq_label=""): cfg = Config() something_changed = False acl = s3.get_acl(uri) debug(u"acl: %s - %r" % (uri, acl.grantees)) if cfg.acl_public == 
True: if acl.isAnonRead(): info(u"%s: already Public, skipping %s" % (uri, seq_label)) else: acl.grantAnonRead() something_changed = True elif cfg.acl_public == False: # we explicitly check for False, because it could be None if not acl.isAnonRead() and not acl.isAnonWrite(): info(u"%s: already Private, skipping %s" % (uri, seq_label)) else: acl.revokeAnonRead() acl.revokeAnonWrite() something_changed = True # update acl with arguments # grant first and revoke later, because revoke has priority if cfg.acl_grants: something_changed = True for grant in cfg.acl_grants: acl.grant(**grant) if cfg.acl_revokes: something_changed = True for revoke in cfg.acl_revokes: acl.revoke(**revoke) if not something_changed: return retsponse = s3.set_acl(uri, acl) if retsponse['status'] == 200: if cfg.acl_public in (True, False): set_to_acl = cfg.acl_public and "Public" or "Private" output(u"%s: ACL set to %s %s" % (uri, set_to_acl, seq_label)) else: output(u"%s: ACL updated" % uri) class OptionMimeType(Option): def check_mimetype(self, opt, value): if re.compile(r"^[a-z0-9]+/[a-z0-9+\.-]+(;.*)?$", re.IGNORECASE).match(value): return value raise OptionValueError("option %s: invalid MIME-Type format: %r" % (opt, value)) class OptionS3ACL(Option): def check_s3acl(self, opt, value): permissions = ('read', 'write', 'read_acp', 'write_acp', 'full_control', 'all') try: permission, grantee = re.compile(r"^(\w+):(.+)$", re.IGNORECASE).match(value).groups() if not permission or not grantee: raise OptionValueError("option %s: invalid S3 ACL format: %r" % (opt, value)) if permission in permissions: return { 'name' : grantee, 'permission' : permission.upper() } else: raise OptionValueError("option %s: invalid S3 ACL permission: %s (valid values: %s)" % (opt, permission, ", ".join(permissions))) except OptionValueError: raise except Exception: raise OptionValueError("option %s: invalid S3 ACL format: %r" % (opt, value)) class OptionAll(OptionMimeType, OptionS3ACL): TYPE_CHECKER = copy(Option.TYPE_CHECKER) TYPE_CHECKER["mimetype"] = OptionMimeType.check_mimetype TYPE_CHECKER["s3acl"] = OptionS3ACL.check_s3acl TYPES = Option.TYPES + ("mimetype", "s3acl") class MyHelpFormatter(IndentedHelpFormatter): def format_epilog(self, epilog): if epilog: return "\n" + epilog + "\n" else: return "" def main(): cfg = Config() commands_list = get_commands_list() commands = {} ## Populate "commands" from "commands_list" for cmd in commands_list: if 'cmd' in cmd: commands[cmd['cmd']] = cmd optparser = OptionParser(option_class=OptionAll, formatter=MyHelpFormatter()) #optparser.disable_interspersed_args() autodetected_encoding = locale.getpreferredencoding() or "UTF-8" config_file = None if os.getenv("S3CMD_CONFIG"): config_file = unicodise_s(os.getenv("S3CMD_CONFIG"), autodetected_encoding) elif os.name == "nt" and os.getenv("USERPROFILE"): config_file = os.path.join( unicodise_s(os.getenv("USERPROFILE"), autodetected_encoding), os.getenv("APPDATA") and unicodise_s(os.getenv("APPDATA"), autodetected_encoding) or 'Application Data', "s3cmd.ini") else: from os.path import expanduser config_file = os.path.join(expanduser("~"), ".s3cfg") optparser.set_defaults(config = config_file) optparser.add_option( "--configure", dest="run_configure", action="store_true", help="Invoke interactive (re)configuration tool. Optionally use as '--configure s3://some-bucket' to test access to a specific bucket instead of attempting to list them all.") optparser.add_option("-c", "--config", dest="config", metavar="FILE", help="Config file name. 
Defaults to $HOME/.s3cfg") optparser.add_option( "--dump-config", dest="dump_config", action="store_true", help="Dump current configuration after parsing config files and command line options and exit.") optparser.add_option( "--access_key", dest="access_key", help="AWS Access Key") optparser.add_option( "--secret_key", dest="secret_key", help="AWS Secret Key") optparser.add_option( "--access_token", dest="access_token", help="AWS Access Token") optparser.add_option("-n", "--dry-run", dest="dry_run", action="store_true", help="Only show what should be uploaded or downloaded but don't actually do it. May still perform S3 requests to get bucket listings and other information though (only for file transfer commands)") optparser.add_option("-s", "--ssl", dest="use_https", action="store_true", help="Use HTTPS connection when communicating with S3. (default)") optparser.add_option( "--no-ssl", dest="use_https", action="store_false", help="Don't use HTTPS.") optparser.add_option("-e", "--encrypt", dest="encrypt", action="store_true", help="Encrypt files before uploading to S3.") optparser.add_option( "--no-encrypt", dest="encrypt", action="store_false", help="Don't encrypt files.") optparser.add_option("-f", "--force", dest="force", action="store_true", help="Force overwrite and other dangerous operations.") optparser.add_option( "--continue", dest="get_continue", action="store_true", help="Continue getting a partially downloaded file (only for [get] command).") optparser.add_option( "--continue-put", dest="put_continue", action="store_true", help="Continue uploading partially uploaded files or multipart upload parts. Restarts parts/files that don't have matching size and md5. Skips files/parts that do. Note: md5sum checks are not always sufficient to check (part) file equality. Enable this at your own risk.") optparser.add_option( "--upload-id", dest="upload_id", help="UploadId for Multipart Upload, in case you want continue an existing upload (equivalent to --continue-put) and there are multiple partial uploads. Use s3cmd multipart [URI] to see what UploadIds are associated with the given URI.") optparser.add_option( "--skip-existing", dest="skip_existing", action="store_true", help="Skip over files that exist at the destination (only for [get] and [sync] commands).") optparser.add_option("-r", "--recursive", dest="recursive", action="store_true", help="Recursive upload, download or removal.") optparser.add_option( "--check-md5", dest="check_md5", action="store_true", help="Check MD5 sums when comparing files for [sync]. (default)") optparser.add_option( "--no-check-md5", dest="check_md5", action="store_false", help="Do not check MD5 sums when comparing files for [sync]. Only size will be compared. May significantly speed up transfer but may also miss some changed files.") optparser.add_option("-P", "--acl-public", dest="acl_public", action="store_true", help="Store objects with ACL allowing read for anyone.") optparser.add_option( "--acl-private", dest="acl_public", action="store_false", help="Store objects with default ACL allowing access for you only.") optparser.add_option( "--acl-grant", dest="acl_grants", type="s3acl", action="append", metavar="PERMISSION:EMAIL or USER_CANONICAL_ID", help="Grant stated permission to a given amazon user. 
Permission is one of: read, write, read_acp, write_acp, full_control, all") optparser.add_option( "--acl-revoke", dest="acl_revokes", type="s3acl", action="append", metavar="PERMISSION:USER_CANONICAL_ID", help="Revoke stated permission for a given amazon user. Permission is one of: read, write, read_acp, write_acp, full_control, all") optparser.add_option("-D", "--restore-days", dest="restore_days", action="store", help="Number of days to keep restored file available (only for 'restore' command). Default is 1 day.", metavar="NUM") optparser.add_option( "--restore-priority", dest="restore_priority", action="store", choices=['standard', 'expedited', 'bulk'], help="Priority for restoring files from S3 Glacier (only for 'restore' command). Choices available: bulk, standard, expedited") optparser.add_option( "--delete-removed", dest="delete_removed", action="store_true", help="Delete destination objects with no corresponding source file [sync]") optparser.add_option( "--no-delete-removed", dest="delete_removed", action="store_false", help="Don't delete destination objects [sync]") optparser.add_option( "--delete-after", dest="delete_after", action="store_true", help="Perform deletes AFTER new uploads when delete-removed is enabled [sync]") optparser.add_option( "--delay-updates", dest="delay_updates", action="store_true", help="*OBSOLETE* Put all updated files into place at end [sync]") # OBSOLETE optparser.add_option( "--max-delete", dest="max_delete", action="store", help="Do not delete more than NUM files. [del] and [sync]", metavar="NUM") optparser.add_option( "--limit", dest="limit", action="store", help="Limit number of objects returned in the response body (only for [ls] and [la] commands)", metavar="NUM") optparser.add_option( "--add-destination", dest="additional_destinations", action="append", help="Additional destination for parallel uploads, in addition to last arg. May be repeated.") optparser.add_option( "--delete-after-fetch", dest="delete_after_fetch", action="store_true", help="Delete remote objects after fetching to local file (only for [get] and [sync] commands).") optparser.add_option("-p", "--preserve", dest="preserve_attrs", action="store_true", help="Preserve filesystem attributes (mode, ownership, timestamps). Default for [sync] command.") optparser.add_option( "--no-preserve", dest="preserve_attrs", action="store_false", help="Don't store FS attributes") optparser.add_option( "--keep-dirs", dest="keep_dirs", action="store_true", help="Preserve all local directories as remote objects including empty directories. 
Experimental feature.") optparser.add_option( "--exclude", dest="exclude", action="append", metavar="GLOB", help="Filenames and paths matching GLOB will be excluded from sync") optparser.add_option( "--exclude-from", dest="exclude_from", action="append", metavar="FILE", help="Read --exclude GLOBs from FILE") optparser.add_option( "--rexclude", dest="rexclude", action="append", metavar="REGEXP", help="Filenames and paths matching REGEXP (regular expression) will be excluded from sync") optparser.add_option( "--rexclude-from", dest="rexclude_from", action="append", metavar="FILE", help="Read --rexclude REGEXPs from FILE") optparser.add_option( "--include", dest="include", action="append", metavar="GLOB", help="Filenames and paths matching GLOB will be included even if previously excluded by one of --(r)exclude(-from) patterns") optparser.add_option( "--include-from", dest="include_from", action="append", metavar="FILE", help="Read --include GLOBs from FILE") optparser.add_option( "--rinclude", dest="rinclude", action="append", metavar="REGEXP", help="Same as --include but uses REGEXP (regular expression) instead of GLOB") optparser.add_option( "--rinclude-from", dest="rinclude_from", action="append", metavar="FILE", help="Read --rinclude REGEXPs from FILE") optparser.add_option( "--files-from", dest="files_from", action="append", metavar="FILE", help="Read list of source-file names from FILE. Use - to read from stdin.") optparser.add_option( "--region", "--bucket-location", metavar="REGION", dest="bucket_location", help="Region to create bucket in. As of now the regions are: us-east-1, us-west-1, us-west-2, eu-west-1, eu-central-1, ap-northeast-1, ap-southeast-1, ap-southeast-2, sa-east-1") optparser.add_option( "--host", metavar="HOSTNAME", dest="host_base", help="HOSTNAME:PORT for S3 endpoint (default: %s, alternatives such as s3-eu-west-1.amazonaws.com). You should also set --host-bucket." % (cfg.host_base)) optparser.add_option( "--host-bucket", dest="host_bucket", help="DNS-style bucket+hostname:port template for accessing a bucket (default: %s)" % (cfg.host_bucket)) optparser.add_option( "--reduced-redundancy", "--rr", dest="reduced_redundancy", action="store_true", help="Store object with 'Reduced redundancy'. Lower per-GB price. [put, cp, mv]") optparser.add_option( "--no-reduced-redundancy", "--no-rr", dest="reduced_redundancy", action="store_false", help="Store object without 'Reduced redundancy'. Higher per-GB price. [put, cp, mv]") optparser.add_option( "--storage-class", dest="storage_class", action="store", metavar="CLASS", help="Store object with specified CLASS (STANDARD, STANDARD_IA, ONEZONE_IA, INTELLIGENT_TIERING, GLACIER or DEEP_ARCHIVE). [put, cp, mv]") optparser.add_option( "--access-logging-target-prefix", dest="log_target_prefix", help="Target prefix for access logs (S3 URI) (for [cfmodify] and [accesslog] commands)") optparser.add_option( "--no-access-logging", dest="log_target_prefix", action="store_false", help="Disable access logging (for [cfmodify] and [accesslog] commands)") optparser.add_option( "--default-mime-type", dest="default_mime_type", type="mimetype", action="store", help="Default MIME-type for stored objects. Application default is binary/octet-stream.") optparser.add_option("-M", "--guess-mime-type", dest="guess_mime_type", action="store_true", help="Guess MIME-type of files by their extension or mime magic. 
Fall back to default MIME-Type as specified by --default-mime-type option") optparser.add_option( "--no-guess-mime-type", dest="guess_mime_type", action="store_false", help="Don't guess MIME-type and use the default type instead.") optparser.add_option( "--no-mime-magic", dest="use_mime_magic", action="store_false", help="Don't use mime magic when guessing MIME-type.") optparser.add_option("-m", "--mime-type", dest="mime_type", type="mimetype", metavar="MIME/TYPE", help="Force MIME-type. Override both --default-mime-type and --guess-mime-type.") optparser.add_option( "--add-header", dest="add_header", action="append", metavar="NAME:VALUE", help="Add a given HTTP header to the upload request. Can be used multiple times. For instance set 'Expires' or 'Cache-Control' headers (or both) using this option.") optparser.add_option( "--remove-header", dest="remove_headers", action="append", metavar="NAME", help="Remove a given HTTP header. Can be used multiple times. For instance, remove 'Expires' or 'Cache-Control' headers (or both) using this option. [modify]") optparser.add_option( "--server-side-encryption", dest="server_side_encryption", action="store_true", help="Specifies that server-side encryption will be used when putting objects. [put, sync, cp, modify]") optparser.add_option( "--server-side-encryption-kms-id", dest="kms_key", action="store", help="Specifies the key id used for server-side encryption with AWS KMS-Managed Keys (SSE-KMS) when putting objects. [put, sync, cp, modify]") optparser.add_option( "--encoding", dest="encoding", metavar="ENCODING", help="Override autodetected terminal and filesystem encoding (character set). Autodetected: %s" % autodetected_encoding) optparser.add_option( "--add-encoding-exts", dest="add_encoding_exts", metavar="EXTENSIONs", help="Add encoding to these comma delimited extensions i.e. (css,js,html) when uploading to S3 )") optparser.add_option( "--verbatim", dest="urlencoding_mode", action="store_const", const="verbatim", help="Use the S3 name as given on the command line. No pre-processing, encoding, etc. Use with caution!") optparser.add_option( "--disable-multipart", dest="enable_multipart", action="store_false", help="Disable multipart upload on files bigger than --multipart-chunk-size-mb") optparser.add_option( "--multipart-chunk-size-mb", dest="multipart_chunk_size_mb", type="int", action="store", metavar="SIZE", help="Size of each chunk of a multipart upload. Files bigger than SIZE are automatically uploaded as multithreaded-multipart, smaller files are uploaded using the traditional method. SIZE is in Mega-Bytes, default chunk size is 15MB, minimum allowed chunk size is 5MB, maximum is 5GB.") optparser.add_option( "--list-md5", dest="list_md5", action="store_true", help="Include MD5 sums in bucket listings (only for 'ls' command).") optparser.add_option( "--list-allow-unordered", dest="list_allow_unordered", action="store_true", help="Not an AWS standard. Allow the listing results to be returned in unsorted order. 
This may be faster when listing very large buckets.") optparser.add_option("-H", "--human-readable-sizes", dest="human_readable_sizes", action="store_true", help="Print sizes in human readable form (eg 1kB instead of 1234).") optparser.add_option( "--ws-index", dest="website_index", action="store", help="Name of index-document (only for [ws-create] command)") optparser.add_option( "--ws-error", dest="website_error", action="store", help="Name of error-document (only for [ws-create] command)") optparser.add_option( "--expiry-date", dest="expiry_date", action="store", help="Indicates when the expiration rule takes effect. (only for [expire] command)") optparser.add_option( "--expiry-days", dest="expiry_days", action="store", help="Indicates the number of days after object creation the expiration rule takes effect. (only for [expire] command)") optparser.add_option( "--expiry-prefix", dest="expiry_prefix", action="store", help="Identifying one or more objects with the prefix to which the expiration rule applies. (only for [expire] command)") optparser.add_option( "--skip-destination-validation", dest="skip_destination_validation", action="store_true", help="Skips validation of Amazon SQS, Amazon SNS, and AWS Lambda destinations when applying notification configuration. (only for [setnotification] command)") optparser.add_option( "--progress", dest="progress_meter", action="store_true", help="Display progress meter (default on TTY).") optparser.add_option( "--no-progress", dest="progress_meter", action="store_false", help="Don't display progress meter (default on non-TTY).") optparser.add_option( "--stats", dest="stats", action="store_true", help="Give some file-transfer stats.") optparser.add_option( "--enable", dest="enable", action="store_true", help="Enable given CloudFront distribution (only for [cfmodify] command)") optparser.add_option( "--disable", dest="enable", action="store_false", help="Disable given CloudFront distribution (only for [cfmodify] command)") optparser.add_option( "--cf-invalidate", dest="invalidate_on_cf", action="store_true", help="Invalidate the uploaded filed in CloudFront. Also see [cfinval] command.") # joseprio: adding options to invalidate the default index and the default # index root optparser.add_option( "--cf-invalidate-default-index", dest="invalidate_default_index_on_cf", action="store_true", help="When using Custom Origin and S3 static website, invalidate the default index file.") optparser.add_option( "--cf-no-invalidate-default-index-root", dest="invalidate_default_index_root_on_cf", action="store_false", help="When using Custom Origin and S3 static website, don't invalidate the path to the default index file.") optparser.add_option( "--cf-add-cname", dest="cf_cnames_add", action="append", metavar="CNAME", help="Add given CNAME to a CloudFront distribution (only for [cfcreate] and [cfmodify] commands)") optparser.add_option( "--cf-remove-cname", dest="cf_cnames_remove", action="append", metavar="CNAME", help="Remove given CNAME from a CloudFront distribution (only for [cfmodify] command)") optparser.add_option( "--cf-comment", dest="cf_comment", action="store", metavar="COMMENT", help="Set COMMENT for a given CloudFront distribution (only for [cfcreate] and [cfmodify] commands)") optparser.add_option( "--cf-default-root-object", dest="cf_default_root_object", action="store", metavar="DEFAULT_ROOT_OBJECT", help="Set the default root object to return when no object is specified in the URL. Use a relative path, i.e. 
default/index.html instead of /default/index.html or s3://bucket/default/index.html (only for [cfcreate] and [cfmodify] commands)") optparser.add_option("-v", "--verbose", dest="verbosity", action="store_const", const=logging.INFO, help="Enable verbose output.") optparser.add_option("-d", "--debug", dest="verbosity", action="store_const", const=logging.DEBUG, help="Enable debug output.") optparser.add_option( "--version", dest="show_version", action="store_true", help="Show s3cmd version (%s) and exit." % (PkgInfo.version)) optparser.add_option("-F", "--follow-symlinks", dest="follow_symlinks", action="store_true", default=False, help="Follow symbolic links as if they are regular files") optparser.add_option( "--cache-file", dest="cache_file", action="store", default="", metavar="FILE", help="Cache FILE containing local source MD5 values") optparser.add_option("-q", "--quiet", dest="quiet", action="store_true", default=False, help="Silence output on stdout") optparser.add_option( "--ca-certs", dest="ca_certs_file", action="store", default=None, help="Path to SSL CA certificate FILE (instead of system default)") optparser.add_option( "--ssl-cert", dest="ssl_client_cert_file", action="store", default=None, help="Path to client own SSL certificate CRT_FILE") optparser.add_option( "--ssl-key", dest="ssl_client_key_file", action="store", default=None, help="Path to client own SSL certificate private key KEY_FILE") optparser.add_option( "--check-certificate", dest="check_ssl_certificate", action="store_true", help="Check SSL certificate validity") optparser.add_option( "--no-check-certificate", dest="check_ssl_certificate", action="store_false", help="Do not check SSL certificate validity") optparser.add_option( "--check-hostname", dest="check_ssl_hostname", action="store_true", help="Check SSL certificate hostname validity") optparser.add_option( "--no-check-hostname", dest="check_ssl_hostname", action="store_false", help="Do not check SSL certificate hostname validity") optparser.add_option( "--signature-v2", dest="signature_v2", action="store_true", help="Use AWS Signature version 2 instead of newer signature methods. Helpful for S3-like systems that don't have AWS Signature v4 yet.") optparser.add_option( "--limit-rate", dest="limitrate", action="store", type="string", help="Limit the upload or download speed to amount bytes per second. Amount may be expressed in bytes, kilobytes with the k suffix, or megabytes with the m suffix") optparser.add_option( "--no-connection-pooling", dest="connection_pooling", action="store_false", help="Disable connection reuse") optparser.add_option( "--requester-pays", dest="requester_pays", action="store_true", help="Set the REQUESTER PAYS flag for operations") optparser.add_option("-l", "--long-listing", dest="long_listing", action="store_true", help="Produce long listing [ls]") optparser.add_option( "--stop-on-error", dest="stop_on_error", action="store_true", help="stop if error in transfer") optparser.add_option( "--max-retries", dest="max_retries", action="store", help="Maximum number of times to retry a failed request before giving up. 
Default is 5", metavar="NUM") optparser.add_option( "--content-disposition", dest="content_disposition", action="store", help="Provide a Content-Disposition for signed URLs, e.g., \"inline; filename=myvideo.mp4\"") optparser.add_option( "--content-type", dest="content_type", action="store", help="Provide a Content-Type for signed URLs, e.g., \"video/mp4\"") optparser.set_usage(optparser.usage + " COMMAND [parameters]") optparser.set_description('S3cmd is a tool for managing objects in '+ 'Amazon S3 storage. It allows for making and removing '+ '"buckets" and uploading, downloading and removing '+ '"objects" from these buckets.') optparser.epilog = format_commands(optparser.get_prog_name(), commands_list) optparser.epilog += ("\nFor more information, updates and news, visit the s3cmd website:\n%s\n" % PkgInfo.url) (options, args) = optparser.parse_args() ## Some mucking with logging levels to enable ## debugging/verbose output for config file parser on request logging.basicConfig(level=options.verbosity or Config().verbosity, format='%(levelname)s: %(message)s', stream = sys.stderr) if options.show_version: output(u"s3cmd version %s" % PkgInfo.version) sys.exit(EX_OK) debug(u"s3cmd version %s" % PkgInfo.version) if options.quiet: try: f = open("/dev/null", "w") sys.stdout = f except IOError: warning(u"Unable to open /dev/null: --quiet disabled.") ## Now finally parse the config file if not options.config: error(u"Can't find a config file. Please use --config option.") sys.exit(EX_CONFIG) try: cfg = Config(options.config, options.access_key, options.secret_key, options.access_token) except ValueError as exc: raise ParameterError(unicode(exc)) except IOError as e: if options.run_configure: cfg = Config() else: error(u"%s: %s" % (options.config, e.strerror)) error(u"Configuration file not available.") error(u"Consider using --configure parameter to create one.") sys.exit(EX_CONFIG) # allow commandline verbosity config to override config file if options.verbosity is not None: cfg.verbosity = options.verbosity logging.root.setLevel(cfg.verbosity) ## Unsupported features on Win32 platform if os.name == "nt": if cfg.preserve_attrs: error(u"Option --preserve is not yet supported on MS Windows platform. Assuming --no-preserve.") cfg.preserve_attrs = False if cfg.progress_meter: error(u"Option --progress is not yet supported on MS Windows platform. 
Assuming --no-progress.") cfg.progress_meter = False ## Pre-process --add-header's and put them to Config.extra_headers SortedDict() if options.add_header: for hdr in options.add_header: try: key, val = unicodise_s(hdr).split(":", 1) except ValueError: raise ParameterError("Invalid header format: %s" % unicodise_s(hdr)) # key char restrictions of the http headers name specification key_inval = re.sub(r"[a-zA-Z0-9\-.!#$%&*+^_|]", "", key) if key_inval: key_inval = key_inval.replace(" ", "") key_inval = key_inval.replace("\t", "") raise ParameterError("Invalid character(s) in header name '%s'" ": \"%s\"" % (key, key_inval)) debug(u"Updating Config.Config extra_headers[%s] -> %s" % (key.strip().lower(), val.strip())) cfg.extra_headers[key.strip().lower()] = val.strip() # Process --remove-header if options.remove_headers: cfg.remove_headers = options.remove_headers ## --acl-grant/--acl-revoke arguments are pre-parsed by OptionS3ACL() if options.acl_grants: for grant in options.acl_grants: cfg.acl_grants.append(grant) if options.acl_revokes: for grant in options.acl_revokes: cfg.acl_revokes.append(grant) ## Process --(no-)check-md5 if options.check_md5 == False: if "md5" in cfg.sync_checks: cfg.sync_checks.remove("md5") if "md5" in cfg.preserve_attrs_list: cfg.preserve_attrs_list.remove("md5") elif options.check_md5 == True: if "md5" not in cfg.sync_checks: cfg.sync_checks.append("md5") if "md5" not in cfg.preserve_attrs_list: cfg.preserve_attrs_list.append("md5") ## Update Config with other parameters for option in cfg.option_list(): try: value = getattr(options, option) if value != None: if type(value) == type(b''): value = unicodise_s(value) debug(u"Updating Config.Config %s -> %s" % (option, value)) cfg.update_option(option, value) except AttributeError: ## Some Config() options are not settable from command line pass ## Special handling for tri-state options (True, False, None) cfg.update_option("enable", options.enable) if options.acl_public is not None: cfg.update_option("acl_public", options.acl_public) ## Check multipart chunk constraints if cfg.multipart_chunk_size_mb < MultiPartUpload.MIN_CHUNK_SIZE_MB: raise ParameterError("Chunk size %d MB is too small, must be >= %d MB. Please adjust --multipart-chunk-size-mb" % (cfg.multipart_chunk_size_mb, MultiPartUpload.MIN_CHUNK_SIZE_MB)) if cfg.multipart_chunk_size_mb > MultiPartUpload.MAX_CHUNK_SIZE_MB: raise ParameterError("Chunk size %d MB is too large, must be <= %d MB. 
Please adjust --multipart-chunk-size-mb" % (cfg.multipart_chunk_size_mb, MultiPartUpload.MAX_CHUNK_SIZE_MB)) ## If an UploadId was provided, set put_continue True if options.upload_id: cfg.upload_id = options.upload_id cfg.put_continue = True if cfg.upload_id and not cfg.multipart_chunk_size_mb: raise ParameterError("Must have --multipart-chunk-size-mb if using --put-continue or --upload-id") ## CloudFront's cf_enable and Config's enable share the same --enable switch options.cf_enable = options.enable ## CloudFront's cf_logging and Config's log_target_prefix share the same --log-target-prefix switch options.cf_logging = options.log_target_prefix ## Update CloudFront options if some were set for option in CfCmd.options.option_list(): try: value = getattr(options, option) if value != None: if type(value) == type(b''): value = unicodise_s(value) if value != None: debug(u"Updating CloudFront.Cmd %s -> %s" % (option, value)) CfCmd.options.update_option(option, value) except AttributeError: ## Some CloudFront.Cmd.Options() options are not settable from command line pass if options.additional_destinations: cfg.additional_destinations = options.additional_destinations if options.files_from: cfg.files_from = options.files_from ## Set output and filesystem encoding for printing out filenames. try: # Support for Python 3 # Codecs are not strictly needed if the output already uses the # system encoding, but wrap it anyway just in case. # For that, we need to use the binary buffer # of stdout/stderr directly. sys.stdout = codecs.getwriter(cfg.encoding)(sys.stdout.buffer, "replace") sys.stderr = codecs.getwriter(cfg.encoding)(sys.stderr.buffer, "replace") # getwriter() creates a stream wrapper that lacks the "encoding" attribute; # better to add it back so that functions like "input" keep working. 
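# A minimal standalone sketch of the same wrapping technique (illustrative
# only; assumes a UTF-8 terminal and is not part of s3cmd's control flow):
#
#     import codecs, sys
#     out = codecs.getwriter("utf-8")(sys.stdout.buffer, "replace")
#     out.encoding = "utf-8"  # getwriter() wrappers lack this attribute, so restore it
#     out.write(u"non-ASCII filename: r\u00e9sum\u00e9.txt\n")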
sys.stdout.encoding = cfg.encoding sys.stderr.encoding = cfg.encoding except AttributeError: sys.stdout = codecs.getwriter(cfg.encoding)(sys.stdout, "replace") sys.stderr = codecs.getwriter(cfg.encoding)(sys.stderr, "replace") ## Process --exclude and --exclude-from patterns_list, patterns_textual = process_patterns(options.exclude, options.exclude_from, is_glob = True, option_txt = "exclude") cfg.exclude.extend(patterns_list) cfg.debug_exclude.update(patterns_textual) ## Process --rexclude and --rexclude-from patterns_list, patterns_textual = process_patterns(options.rexclude, options.rexclude_from, is_glob = False, option_txt = "rexclude") cfg.exclude.extend(patterns_list) cfg.debug_exclude.update(patterns_textual) ## Process --include and --include-from patterns_list, patterns_textual = process_patterns(options.include, options.include_from, is_glob = True, option_txt = "include") cfg.include.extend(patterns_list) cfg.debug_include.update(patterns_textual) ## Process --rinclude and --rinclude-from patterns_list, patterns_textual = process_patterns(options.rinclude, options.rinclude_from, is_glob = False, option_txt = "rinclude") cfg.include.extend(patterns_list) cfg.debug_include.update(patterns_textual) ## Set socket read()/write() timeout socket.setdefaulttimeout(cfg.socket_timeout) if cfg.encrypt and cfg.gpg_passphrase == "": error(u"Encryption requested but no passphrase set in config file.") error(u"Please re-run 's3cmd --configure' and supply it.") sys.exit(EX_CONFIG) if options.dump_config: cfg.dump_config(sys.stdout) sys.exit(EX_OK) if options.run_configure: # 'args' may contain the test-bucket URI run_configure(options.config, args) sys.exit(EX_OK) ## set config if stop_on_error is set if options.stop_on_error: cfg.stop_on_error = options.stop_on_error if options.content_disposition: cfg.content_disposition = options.content_disposition if options.content_type: cfg.content_type = options.content_type if len(args) < 1: optparser.print_help() sys.exit(EX_USAGE) ## Unicodise all remaining arguments: args = [unicodise(arg) for arg in args] command = args.pop(0) try: debug(u"Command: %s" % commands[command]["cmd"]) ## We must do this lookup in extra step to ## avoid catching all KeyError exceptions ## from inner functions. cmd_func = commands[command]["func"] except KeyError as e: error(u"Invalid command: %s", command) sys.exit(EX_USAGE) if len(args) < commands[command]["argc"]: error(u"Not enough parameters for command '%s'" % command) sys.exit(EX_USAGE) rc = cmd_func(args) if rc is None: # if we missed any cmd_*() returns rc = EX_GENERAL return rc def report_exception(e, msg=u''): alert_header = u""" !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! An unexpected error has occurred. Please try reproducing the error using the latest s3cmd code from the git master branch found at: https://github.com/s3tools/s3cmd and have a look at the known issues list: https://github.com/s3tools/s3cmd/wiki/Common-known-issues-and-their-solutions-(FAQ) If the error persists, please report the %s (removing any private info as necessary) to: s3tools-bugs@lists.sourceforge.net%s !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 
""" sys.stderr.write(alert_header % (u"following lines", u"\n\n" + msg)) tb = traceback.format_exc() try: s = u' '.join([unicodise(a) for a in sys.argv]) except NameError: # Error happened before Utils module was yet imported to provide # unicodise try: s = u' '.join([(a) for a in sys.argv]) except UnicodeDecodeError: s = u'[encoding safe] ' + u' '.join([('%r'%a) for a in sys.argv]) sys.stderr.write(u"Invoked as: %s\n" % s) e_class = str(e.__class__) e_class = e_class[e_class.rfind(".")+1 : -2] try: sys.stderr.write(u"Problem: %s: %s\n" % (e_class, e)) except UnicodeDecodeError: sys.stderr.write(u"Problem: [encoding safe] %r: %r\n" % (e_class, e)) try: sys.stderr.write(u"S3cmd: %s\n" % PkgInfo.version) except NameError: sys.stderr.write(u"S3cmd: unknown version." "Module import problem?\n") sys.stderr.write(u"python: %s\n" % sys.version) try: sys.stderr.write(u"environment LANG=%s\n" % unicodise_s(os.getenv("LANG", "NOTSET"), 'ascii')) except NameError: # Error happened before Utils module was yet imported to provide # unicodise sys.stderr.write(u"environment LANG=%s\n" % os.getenv("LANG", "NOTSET")) sys.stderr.write(u"\n") if type(tb) == unicode: sys.stderr.write(tb) else: sys.stderr.write(unicode(tb, errors="replace")) if type(e) == ImportError: sys.stderr.write("\n") sys.stderr.write("Your sys.path contains these entries:\n") for path in sys.path: sys.stderr.write(u"\t%s\n" % path) sys.stderr.write("Now the question is where have the s3cmd modules" " been installed?\n") sys.stderr.write(alert_header % (u"above lines", u"")) if __name__ == '__main__': try: ## Our modules ## Keep them in try/except block to ## detect any syntax errors in there from S3.ExitCodes import * from S3.Exceptions import * from S3 import PkgInfo from S3.S3 import S3 from S3.Config import Config from S3.SortedDict import SortedDict from S3.FileDict import FileDict from S3.S3Uri import S3Uri from S3 import Utils from S3.BaseUtils import (formatDateTime, getPrettyFromXml, encode_to_s3, decode_from_s3, s3path) from S3.Crypto import hash_file_md5, sign_string_v2, sign_url_v2 from S3.Utils import (formatSize, unicodise_safe, unicodise_s, unicodise, deunicodise, replace_nonprintables) from S3.Progress import Progress, StatsInfo from S3.CloudFront import Cmd as CfCmd from S3.CloudFront import CloudFront from S3.FileLists import * from S3.MultiPart import MultiPartUpload except Exception as e: report_exception(e, "Error loading some components of s3cmd (Import Error)") # 1 = EX_GENERAL but be safe in that situation sys.exit(1) try: rc = main() sys.exit(rc) except ImportError as e: report_exception(e) sys.exit(EX_GENERAL) except (ParameterError, InvalidFileError) as e: error(u"Parameter problem: %s" % e) sys.exit(EX_USAGE) except (S3DownloadError, S3UploadError, S3RequestError) as e: error(u"S3 Temporary Error: %s. Please try again later." 
% e) sys.exit(EX_TEMPFAIL) except S3Error as e: error(u"S3 error: %s" % e) sys.exit(e.get_error_code()) except (S3Exception, S3ResponseError, CloudFrontError) as e: report_exception(e) sys.exit(EX_SOFTWARE) except SystemExit as e: sys.exit(e.code) except KeyboardInterrupt: sys.stderr.write("See ya!\n") sys.exit(EX_BREAK) except (S3SSLError, S3SSLCertificateError) as e: # SSLError is a subtype of IOError error("SSL certificate verification failure: %s" % e) sys.exit(EX_ACCESSDENIED) except ConnectionRefusedError as e: error("Could not connect to server: %s" % e) sys.exit(EX_CONNECTIONREFUSED) # typically encountered error is: # ERROR: [Errno 111] Connection refused except socket.gaierror as e: # gaierror is a subset of IOError # typically encountered error is: # gaierror: [Errno -2] Name or service not known error(e) error("Connection Error: Error resolving a server hostname.\n" "Please check the servers address specified in 'host_base', 'host_bucket', 'cloudfront_host', 'website_endpoint'") sys.exit(EX_IOERR) except IOError as e: if e.errno in (errno.ECONNREFUSED, errno.EHOSTUNREACH): # Python2 does not have ConnectionRefusedError error("Could not connect to server: %s" % e) sys.exit(EX_CONNECTIONREFUSED) if e.errno == errno.EPIPE: # Fail silently on SIGPIPE. This likely means we wrote to a closed # pipe and user does not care for any more output. sys.exit(EX_IOERR) report_exception(e) sys.exit(EX_IOERR) except OSError as e: error(e) sys.exit(EX_OSERR) except MemoryError: msg = """ MemoryError! You have exceeded the amount of memory available for this process. This usually occurs when syncing >750,000 files on a 32-bit python instance. The solutions to this are: 1) sync several smaller subtrees; or 2) use a 64-bit python on a 64-bit OS with >8GB RAM """ sys.stderr.write(msg) sys.exit(EX_OSERR) except UnicodeEncodeError as e: lang = unicodise_s(os.getenv("LANG", "NOTSET"), 'ascii') msg = """ You have encountered a UnicodeEncodeError. Your environment variable LANG=%s may not specify a Unicode encoding (e.g. UTF-8). Please set LANG=en_US.UTF-8 or similar in your environment before invoking s3cmd. """ % lang report_exception(e, msg) sys.exit(EX_GENERAL) except Exception as e: report_exception(e) sys.exit(EX_GENERAL) # vim:et:ts=4:sts=4:ai s3cmd-2.4.0/MANIFEST.in0000664000175100017510000000007214534034713013655 0ustar floflo00000000000000include INSTALL.md README.md LICENSE NEWS include s3cmd.1 s3cmd-2.4.0/LICENSE0000664000175100017510000003556414534034713013142 0ustar floflo00000000000000 GNU GENERAL PUBLIC LICENSE Version 2, June 1991 Copyright (C) 1989, 1991 Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Preamble The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Lesser General Public License instead.) You can apply it to your programs, too. When we speak of free software, we are referring to freedom, not price. 
Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things. To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights. We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations. Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all. The precise terms and conditions for copying, distribution and modification follow. GNU GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you". Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does. 1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. 
You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.) These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program. In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following: a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.) The source code for a work means the preferred form of the work for making modifications to it. 
For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code. 4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it. 6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License. 7. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. 
Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 8. If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 9. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation. 10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. 
END OF TERMS AND CONDITIONS s3cmd-2.4.0/setup.cfg0000664000175100017510000000014014535744737013753 0ustar floflo00000000000000[sdist]
formats = gztar,zip

[bdist_wheel]
universal = 1

[egg_info]
tag_build =
tag_date = 0
s3cmd-2.4.0/INSTALL.md0000664000175100017510000000625514534034713013560 0ustar floflo00000000000000Installation of s3cmd package
=============================

Copyright: TGRMN Software and contributors

S3tools / S3cmd project homepage: http://s3tools.org

!!!
!!! Please consult the README file for setup, usage and examples!
!!!

Package formats
---------------
S3cmd is distributed in two formats:

1) Prebuilt RPM file - should work on most RPM-based distributions

2) Source .tar.gz package

Installation of Brew package
----------------------------
```
brew install s3cmd
```

Installation of RPM package
---------------------------
As user "root" run:

```
rpm -ivh s3cmd-X.Y.Z.noarch.rpm
```

where X.Y.Z is the most recent s3cmd release version. You may be informed
about missing dependencies on Python or some libraries. Please consult your
distribution documentation on ways to solve the problem.

Installation from PyPI (Python Package Index)
---------------------------------------------
S3cmd can be installed from PyPI using pip (the recommended installer for
Python packages).

1) Confirm you have pip installed. The pip home page is here:
https://pypi.python.org/pypi/pip. Example install on a RHEL yum-based machine:

```
sudo yum install python-pip
```

2) Install with pip:

```
sudo pip install s3cmd
```

Installation from zip file or tarball
-------------------------------------
There are three options to run s3cmd from the source package:

1) The S3cmd program, as distributed in s3cmd-X.Y.Z.tar.gz on SourceForge or
in master.zip on GitHub, can be run directly from where you unzipped the
package.

2) Or you may want to move the "s3cmd" file and the "S3" subdirectory to some
other path. Make sure that the "S3" subdirectory ends up in the same place
where you move the "s3cmd" file. For instance, if you decide to move s3cmd to
your $HOME/bin, you will have the $HOME/bin/s3cmd file and the $HOME/bin/S3
directory with a number of support files.

3) The cleanest and most recommended approach is to unzip the package and then
just run:

`python setup.py install`

You will, however, need the Python "distutils" module for this to work. It is
often part of the core python package (e.g. in the OpenSuse Python 2.5 package)
or it can be installed using your package manager, e.g. in Debian use
`apt-get install python-setuptools`.

Again, consult your distribution documentation to find out the actual package
name and how to install it.

Note that on Linux, if you are not "root" already, you may need to run
`sudo python setup.py install` instead.

Note to distribution package maintainers
----------------------------------------
Define the shell environment variable S3CMD_PACKAGING=yes if you don't want
setup.py to install manpages and doc files. You'll have to install them
manually in your .spec or similar package build scripts.

On the other hand, if you want setup.py to install manpages and docs, but to a
path other than the default, define the environment variables
$S3CMD_INSTPATH_MAN and $S3CMD_INSTPATH_DOC. Check out setup.py for details
and default values.
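For illustration, here is a minimal sketch of how a distribution build script
might use these variables; the build root and install paths below are
hypothetical and will differ between distributions:

```
# Skip manpages/docs entirely and package them yourself from the source tree
S3CMD_PACKAGING=yes python setup.py install --root=/tmp/s3cmd-buildroot --prefix=/usr

# Or let setup.py install them, but into non-default locations
S3CMD_INSTPATH_MAN=/usr/share/man S3CMD_INSTPATH_DOC=/usr/share/doc/s3cmd \
    python setup.py install --root=/tmp/s3cmd-buildroot --prefix=/usr
```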
Where to get help
-----------------
If in doubt, or if something doesn't work as expected, get back to us via the
mailing list:

```
s3tools-general@lists.sourceforge.net
```

or visit the S3cmd / S3tools homepage at:

[http://s3tools.org](http://s3tools.org)
s3cmd-2.4.0/README.md0000664000175100017510000003611714534034713013407 0ustar floflo00000000000000## S3cmd tool for Amazon Simple Storage Service (S3)

[![Build Status](https://github.com/s3tools/s3cmd/actions/workflows/test.yml/badge.svg)](https://github.com/s3tools/s3cmd/actions/workflows/test.yml)

* Authors: Michal Ludvig (michal@logix.cz), Florent Viard (florent@sodria.com)
* [Project homepage](https://s3tools.org)
* (c) [TGRMN Software](http://www.tgrmn.com), [Sodria SAS](http://www.sodria.com) and contributors

S3tools / S3cmd mailing lists:

* Announcements of new releases: s3tools-announce@lists.sourceforge.net
* General questions and discussion: s3tools-general@lists.sourceforge.net
* Bug reports: s3tools-bugs@lists.sourceforge.net

S3cmd requires Python 2.6 or newer. Python 3+ is also supported starting with
S3cmd version 2.

See [installation instructions](https://github.com/s3tools/s3cmd/blob/master/INSTALL.md).

### What is S3cmd

S3cmd (`s3cmd`) is a free command line tool and client for uploading,
retrieving and managing data in Amazon S3 and other cloud storage service
providers that use the S3 protocol, such as Google Cloud Storage or DreamHost
DreamObjects. It is best suited for power users who are familiar with command
line programs. It is also ideal for batch scripts and automated backup to S3,
triggered from cron, etc.

S3cmd is written in Python. It's an open source project available under the
GNU General Public License v2 (GPLv2) and is free for both commercial and
private use. You will only have to pay Amazon for using their storage.

Lots of features and options have been added to S3cmd since its very first
release in 2008; we recently counted more than 60 command line options,
including multipart uploads, encryption, incremental backup, s3 sync, ACL and
Metadata management, S3 bucket size, bucket policies, and more!

### What is Amazon S3

Amazon S3 provides a managed internet-accessible storage service where anyone
can store any amount of data and retrieve it again later. S3 is a paid service
operated by Amazon. Before storing anything in S3 you must sign up for an
"AWS" account (where AWS = Amazon Web Services) to obtain a pair of
identifiers: Access Key and Secret Key. You will need to give these keys to
S3cmd. Think of them as if they were a username and password for your S3
account.

### Amazon S3 pricing explained

At the time of this writing the costs of using S3 are (in USD):

$0.023 per GB per month of storage space used

plus

$0.00 per GB - all data uploaded

plus

$0.000 per GB - first 1GB / month data downloaded
$0.090 per GB - up to 10 TB / month data downloaded
$0.085 per GB - next 40 TB / month data downloaded
$0.070 per GB - next 100 TB / month data downloaded
$0.050 per GB - data downloaded / month over 150 TB

plus

$0.005 per 1,000 PUT or COPY or LIST requests
$0.004 per 10,000 GET and all other requests

If, for instance, on the 1st of January you upload 2GB of photos in JPEG from
your holiday in New Zealand, at the end of January you will be charged $0.05
for using 2GB of storage space for a month, $0.0 for uploading 2GB of data,
and a few cents for requests. That comes to slightly over $0.06 for a complete
backup of your precious holiday pictures.

In February you don't touch it.
Your data is still on the S3 servers, so you pay $0.05 for those two
gigabytes, but not a single cent will be charged for any transfer. That comes
to $0.05 as an ongoing cost of your backup. Not too bad.

In March you allow anonymous read access to some of your pictures and your
friends download, say, 1500MB of them. As the files are owned by you, you are
responsible for the costs incurred. That means at the end of March you'll be
charged $0.05 for storage plus $0.045 for the download traffic generated by
your friends.

There is no minimum monthly contract or setup fee. What you use is what you
pay for. At the beginning my bill used to be something like US$0.03 or even
nil.

That's the pricing model of Amazon S3 in a nutshell. Check the
[Amazon S3 homepage](https://aws.amazon.com/s3/pricing/) for more details.

Needless to say, all this money is charged by Amazon itself; there is
obviously no charge for using S3cmd :-)

### Amazon S3 basics

Files stored in S3 are called "objects" and their names are officially called
"keys". Since this is sometimes confusing for users, we often refer to the
objects as "files" or "remote files". Each object belongs to exactly one
"bucket".

To describe objects in S3 storage we invented a URI-like schema in the
following form:

```
s3://BUCKET
```
or

```
s3://BUCKET/OBJECT
```

### Buckets

Buckets are sort of like directories or folders with some restrictions:

1. each user can only have 100 buckets at most,
2. bucket names must be unique amongst all users of S3,
3. buckets cannot be nested into a deeper hierarchy and
4. a bucket name can only consist of basic alphanumeric characters plus dot
   (.) and dash (-). No spaces, no accented or UTF-8 letters, etc.

It is a good idea to use DNS-compatible bucket names. That, for instance,
means you should not use upper-case characters. While DNS compliance is not
strictly required, some features described below are not available for
buckets with DNS-incompatible names. One step further is using a fully
qualified domain name (FQDN) for a bucket - that has even more benefits.

* For example "s3://--My-Bucket--" is not DNS compatible.
* On the other hand "s3://my-bucket" is DNS compatible but is not an FQDN.
* Finally "s3://my-bucket.s3tools.org" is DNS compatible and an FQDN, provided
  you own the s3tools.org domain and can create the domain record for
  "my-bucket.s3tools.org".

Look for "Virtual Hosts" later in this text for more details regarding
FQDN-named buckets.

### Objects (files stored in Amazon S3)

Unlike bucket names, object names have almost no restrictions: they can be any
UTF-8 string up to 1024 bytes long. Interestingly enough, an object name can
contain the forward slash character (/), thus `my/funny/picture.jpg` is a
valid object name. Note that there are no directories or buckets called `my`
and `funny` - it is really a single object named `my/funny/picture.jpg` and S3
does not care at all that it _looks_ like a directory structure.

The full URI of such an image could be, for example:

```
s3://my-bucket/my/funny/picture.jpg
```

### Public vs Private files

The files stored in S3 can be either Private or Public. The Private ones are
readable only by the user who uploaded them while the Public ones can be read
by anyone. Additionally, Public files can be accessed using the HTTP protocol,
not only with `s3cmd` or a similar tool.

The ACL (Access Control List) of a file can be set at the time of upload using
the `--acl-public` or `--acl-private` options with the `s3cmd put` or
`s3cmd sync` commands (see below).
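For example, to upload a single picture that anyone can then fetch over plain
HTTP (a minimal sketch; the bucket and file names are just placeholders):

```
$ s3cmd put --acl-public image.jpg s3://my-bucket/photos/image.jpg
```

A public object like this is typically reachable at a URL of the form
http://my-bucket.s3.amazonaws.com/photos/image.jpg, while a private object can
only be read through authenticated requests, e.g. with `s3cmd get` or via a
time-limited link generated by `s3cmd signurl`.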
Alternatively, the ACL of existing remote files can be altered with the
`s3cmd setacl --acl-public` (or `--acl-private`) command.

### Simple s3cmd HowTo

1) Register for Amazon AWS / S3

Go to https://aws.amazon.com/s3, click the "Sign up for web service" button in
the right column and work through the registration. You will have to supply
your Credit Card details in order to allow Amazon to charge you for S3 usage.
At the end you should have your Access and Secret Keys.

If you set up a separate IAM user, that user's access key must have at least
the following permissions to do anything:

- s3:ListAllMyBuckets
- s3:GetBucketLocation
- s3:ListBucket

Other example policies can be found at
https://docs.aws.amazon.com/AmazonS3/latest/dev/example-policies-s3.html

2) Run `s3cmd --configure`

You will be asked for the two keys - copy and paste them from your
confirmation email or from your Amazon account page. Be careful when copying
them! They are case sensitive and must be entered accurately or you'll keep
getting errors about invalid signatures or similar. Remember to grant the
s3:ListAllMyBuckets permission to the keys or you will get an AccessDenied
error while testing access.

3) Run `s3cmd ls` to list all your buckets.

As you have just started using S3, there are no buckets owned by you yet, so
the output will be empty.

4) Make a bucket with `s3cmd mb s3://my-new-bucket-name`

As mentioned above, bucket names must be unique amongst _all_ users of S3.
That means simple names like "test" or "asdf" are already taken and you must
make up something more original. To demonstrate as many features as possible,
let's create an FQDN-named bucket `s3://public.s3tools.org`:

```
$ s3cmd mb s3://public.s3tools.org
Bucket 's3://public.s3tools.org' created
```

5) List your buckets again with `s3cmd ls`

Now you should see your freshly created bucket:

```
$ s3cmd ls
2009-01-28 12:34  s3://public.s3tools.org
```

6) List the contents of the bucket:

```
$ s3cmd ls s3://public.s3tools.org
$
```

It's empty, indeed.

7) Upload a single file into the bucket:

```
$ s3cmd put some-file.xml s3://public.s3tools.org/somefile.xml
some-file.xml -> s3://public.s3tools.org/somefile.xml  [1 of 1]
 123456 of 123456   100% in   2s   51.75 kB/s  done
```

Upload a two-directory tree into the bucket's virtual 'directory':

```
$ s3cmd put --recursive dir1 dir2 s3://public.s3tools.org/somewhere/
File 'dir1/file1-1.txt' stored as 's3://public.s3tools.org/somewhere/dir1/file1-1.txt' [1 of 5]
File 'dir1/file1-2.txt' stored as 's3://public.s3tools.org/somewhere/dir1/file1-2.txt' [2 of 5]
File 'dir1/file1-3.log' stored as 's3://public.s3tools.org/somewhere/dir1/file1-3.log' [3 of 5]
File 'dir2/file2-1.bin' stored as 's3://public.s3tools.org/somewhere/dir2/file2-1.bin' [4 of 5]
File 'dir2/file2-2.txt' stored as 's3://public.s3tools.org/somewhere/dir2/file2-2.txt' [5 of 5]
```

As you can see, we didn't have to create the `/somewhere` 'directory'. In fact
it's only a filename prefix, not a real directory, and it doesn't have to be
created in any way beforehand.
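The same applies to a single upload: a file can go straight into an
arbitrarily deep prefix without any preparation (a minimal sketch; the local
file name and the extra path components are just placeholders):

```
$ s3cmd put report.pdf s3://public.s3tools.org/somewhere/2009/reports/report.pdf
```

Here `somewhere/2009/reports/` is simply part of the object's key; nothing has
to be created beforehand and nothing is left behind once the object is deleted.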
Instead of using `put` with the `--recursive` option, you could also use the
`sync` command:

```
$ s3cmd sync dir1 dir2 s3://public.s3tools.org/somewhere/
```

8) Now list the bucket's contents again:

```
$ s3cmd ls s3://public.s3tools.org
                      DIR  s3://public.s3tools.org/somewhere/
2009-02-10 05:10   123456  s3://public.s3tools.org/somefile.xml
```

Use `--recursive` (or `-r`) to list all the remote files:

```
$ s3cmd ls --recursive s3://public.s3tools.org
2009-02-10 05:10   123456  s3://public.s3tools.org/somefile.xml
2009-02-10 05:13       18  s3://public.s3tools.org/somewhere/dir1/file1-1.txt
2009-02-10 05:13        8  s3://public.s3tools.org/somewhere/dir1/file1-2.txt
2009-02-10 05:13       16  s3://public.s3tools.org/somewhere/dir1/file1-3.log
2009-02-10 05:13       11  s3://public.s3tools.org/somewhere/dir2/file2-1.bin
2009-02-10 05:13        8  s3://public.s3tools.org/somewhere/dir2/file2-2.txt
```

9) Retrieve one of the files back and verify that it hasn't been corrupted:

```
$ s3cmd get s3://public.s3tools.org/somefile.xml some-file-2.xml
s3://public.s3tools.org/somefile.xml -> some-file-2.xml  [1 of 1]
 123456 of 123456   100% in   3s   35.75 kB/s  done
```

```
$ md5sum some-file.xml some-file-2.xml
39bcb6992e461b269b95b3bda303addf  some-file.xml
39bcb6992e461b269b95b3bda303addf  some-file-2.xml
```

The checksum of the retrieved file matches that of the original. Looks like it
worked :-)

To retrieve a whole 'directory tree' from S3, use recursive get:

```
$ s3cmd get --recursive s3://public.s3tools.org/somewhere
File s3://public.s3tools.org/somewhere/dir1/file1-1.txt saved as './somewhere/dir1/file1-1.txt'
File s3://public.s3tools.org/somewhere/dir1/file1-2.txt saved as './somewhere/dir1/file1-2.txt'
File s3://public.s3tools.org/somewhere/dir1/file1-3.log saved as './somewhere/dir1/file1-3.log'
File s3://public.s3tools.org/somewhere/dir2/file2-1.bin saved as './somewhere/dir2/file2-1.bin'
File s3://public.s3tools.org/somewhere/dir2/file2-2.txt saved as './somewhere/dir2/file2-2.txt'
```

Since the destination directory wasn't specified, `s3cmd` saved the directory
structure in the current working directory ('.').

There is an important difference between:

```
get s3://public.s3tools.org/somewhere
```

and

```
get s3://public.s3tools.org/somewhere/
```

(note the trailing slash)

`s3cmd` always uses the last path part, i.e. the word after the last slash,
for naming files.

In the case of `s3://.../somewhere` the last path part is 'somewhere' and
therefore the recursive get names the local files as somewhere/dir1,
somewhere/dir2, etc.

On the other hand, in `s3://.../somewhere/` the last path part is empty and
s3cmd will only create 'dir1' and 'dir2' without the 'somewhere/' prefix:

```
$ s3cmd get --recursive s3://public.s3tools.org/somewhere/ ~/
File s3://public.s3tools.org/somewhere/dir1/file1-1.txt saved as '~/dir1/file1-1.txt'
File s3://public.s3tools.org/somewhere/dir1/file1-2.txt saved as '~/dir1/file1-2.txt'
File s3://public.s3tools.org/somewhere/dir1/file1-3.log saved as '~/dir1/file1-3.log'
File s3://public.s3tools.org/somewhere/dir2/file2-1.bin saved as '~/dir2/file2-1.bin'
```

See? It's `~/dir1` and not `~/somewhere/dir1` as it was in the previous
example.

10) Clean up - delete the remote files and remove the bucket:

Remove everything under s3://public.s3tools.org/somewhere/

```
$ s3cmd del --recursive s3://public.s3tools.org/somewhere/
File s3://public.s3tools.org/somewhere/dir1/file1-1.txt deleted
File s3://public.s3tools.org/somewhere/dir1/file1-2.txt deleted
...
```

Now try to remove the bucket:

```
$ s3cmd rb s3://public.s3tools.org
ERROR: S3 error: 409 (BucketNotEmpty): The bucket you tried to delete is not empty
```

Ouch, we forgot about `s3://public.s3tools.org/somefile.xml`. We can force the
bucket removal anyway:

```
$ s3cmd rb --force s3://public.s3tools.org/
WARNING: Bucket is not empty. Removing all the objects from it first. This may take some time...
File s3://public.s3tools.org/somefile.xml deleted
Bucket 's3://public.s3tools.org/' removed
```

### Hints

The basic usage is as simple as described in the previous section.

You can increase the level of verbosity with the `-v` option, and if you're
really keen to know what the program does under its bonnet, run it with `-d`
to see all 'debugging' output.

After configuring it with `--configure`, all available options are written to
your `~/.s3cfg` file. It's a text file ready to be modified in your favourite
text editor.

The transfer commands (put, get, cp, mv, and sync) continue transferring even
if an object fails. If a failure occurs, it is reported on stderr and the exit
status will be EX_PARTIAL (2). If the option `--stop-on-error` is specified,
or the config option stop_on_error is true, the transfers stop and an
appropriate error code is returned. (A small scripted example of checking
these exit codes is shown at the end of this README.)

For more information refer to the [S3cmd / S3tools homepage](https://s3tools.org).

### License

Copyright (C) 2007-2023 TGRMN Software (https://www.tgrmn.com), Sodria SAS
(https://www.sodria.com/) and contributors

This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation; either version 2 of the License, or (at your option) any later
version.

This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
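As mentioned in the Hints section, scripts can branch on s3cmd's exit status.
Below is a minimal sketch of such a wrapper; the local path and bucket name
are placeholders, and the numeric value 2 corresponds to EX_PARTIAL as
described above:

```
#!/bin/sh
# Nightly backup: without --stop-on-error, s3cmd keeps transferring after
# individual failures and reports them with exit status EX_PARTIAL (2).
s3cmd sync /local/path/ s3://test-bucket/backup/
status=$?

if [ "$status" -eq 0 ]; then
    echo "backup OK"
elif [ "$status" -eq 2 ]; then
    echo "backup completed, but some objects failed (EX_PARTIAL)" >&2
else
    echo "backup failed with exit status $status" >&2
fi
```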
s3cmd-2.4.0/s3cmd.egg-info/0000775000175100017510000000000014535744737014642 5ustar floflo00000000000000s3cmd-2.4.0/s3cmd.egg-info/top_level.txt0000664000175100017510000000000314535744737017365 0ustar floflo00000000000000S3 s3cmd-2.4.0/s3cmd.egg-info/requires.txt0000664000175100017510000000003514535744737017240 0ustar floflo00000000000000python-dateutil python-magic s3cmd-2.4.0/s3cmd.egg-info/SOURCES.txt0000664000175100017510000000107414535744737016530 0ustar floflo00000000000000INSTALL.md LICENSE MANIFEST.in NEWS README.md s3cmd s3cmd.1 setup.cfg setup.py S3/ACL.py S3/AccessLog.py S3/BaseUtils.py S3/BidirMap.py S3/CloudFront.py S3/Config.py S3/ConnMan.py S3/Crypto.py S3/Custom_httplib27.py S3/Custom_httplib3x.py S3/Exceptions.py S3/ExitCodes.py S3/FileDict.py S3/FileLists.py S3/HashCache.py S3/MultiPart.py S3/PkgInfo.py S3/Progress.py S3/S3.py S3/S3Uri.py S3/SortedDict.py S3/Utils.py S3/__init__.py s3cmd.egg-info/PKG-INFO s3cmd.egg-info/SOURCES.txt s3cmd.egg-info/dependency_links.txt s3cmd.egg-info/requires.txt s3cmd.egg-info/top_level.txts3cmd-2.4.0/s3cmd.egg-info/dependency_links.txt0000664000175100017510000000000114535744737020710 0ustar floflo00000000000000 s3cmd-2.4.0/s3cmd.egg-info/PKG-INFO0000664000175100017510000000425514535744737015745 0ustar floflo00000000000000Metadata-Version: 1.2 Name: s3cmd Version: 2.4.0 Summary: Command line tool for managing Amazon S3 and CloudFront services Home-page: http://s3tools.org Author: Michal Ludvig Author-email: michal@logix.cz Maintainer: github.com/fviard, github.com/matteobar Maintainer-email: s3tools-bugs@lists.sourceforge.net License: GNU GPL v2+ Description: S3cmd lets you copy files from/to Amazon S3 (Simple Storage Service) using a simple to use command line client. Supports rsync-like backup, GPG encryption, and more. Also supports management of Amazon's CloudFront content delivery network. Authors: -------- Florent Viard Michal Ludvig Matt Domsch (github.com/mdomsch) Platform: UNKNOWN Classifier: Development Status :: 5 - Production/Stable Classifier: Environment :: Console Classifier: Environment :: MacOS X Classifier: Environment :: Win32 (MS Windows) Classifier: Intended Audience :: End Users/Desktop Classifier: Intended Audience :: System Administrators Classifier: License :: OSI Approved :: GNU General Public License v2 or later (GPLv2+) Classifier: Natural Language :: English Classifier: Operating System :: MacOS :: MacOS X Classifier: Operating System :: Microsoft :: Windows Classifier: Operating System :: POSIX Classifier: Operating System :: Unix Classifier: Programming Language :: Python :: 2 Classifier: Programming Language :: Python :: 2.6 Classifier: Programming Language :: Python :: 2.7 Classifier: Programming Language :: Python :: 3 Classifier: Programming Language :: Python :: 3.3 Classifier: Programming Language :: Python :: 3.4 Classifier: Programming Language :: Python :: 3.5 Classifier: Programming Language :: Python :: 3.6 Classifier: Programming Language :: Python :: 3.7 Classifier: Programming Language :: Python :: 3.8 Classifier: Programming Language :: Python :: 3.9 Classifier: Programming Language :: Python :: 3.10 Classifier: Programming Language :: Python :: 3.11 Classifier: Programming Language :: Python :: 3.12 Classifier: Topic :: System :: Archiving Classifier: Topic :: Utilities s3cmd-2.4.0/s3cmd.10000664000175100017510000005552514535731306013232 0ustar floflo00000000000000 .\" !!! IMPORTANT: This file is generated from s3cmd \-\-help output using format-manpage.pl .\" !!! 
Do your changes either in s3cmd file or in 'format\-manpage.pl' otherwise .\" !!! they will be overwritten! .TH s3cmd 1 .SH NAME s3cmd \- tool for managing Amazon S3 storage space and Amazon CloudFront content delivery network .SH SYNOPSIS .B s3cmd [\fIOPTIONS\fR] \fICOMMAND\fR [\fIPARAMETERS\fR] .SH DESCRIPTION .PP .B s3cmd is a command line client for copying files to/from Amazon S3 (Simple Storage Service) and performing other related tasks, for instance creating and removing buckets, listing objects, etc. .SH COMMANDS .PP .B s3cmd can do several \fIactions\fR specified by the following \fIcommands\fR. .TP s3cmd \fBmb\fR \fIs3://BUCKET\fR Make bucket .TP s3cmd \fBrb\fR \fIs3://BUCKET\fR Remove bucket .TP s3cmd \fBls\fR \fI[s3://BUCKET[/PREFIX]]\fR List objects or buckets .TP s3cmd \fBla\fR \fI\fR List all object in all buckets .TP s3cmd \fBput\fR \fIFILE [FILE...] s3://BUCKET[/PREFIX]\fR Put file into bucket .TP s3cmd \fBget\fR \fIs3://BUCKET/OBJECT LOCAL_FILE\fR Get file from bucket .TP s3cmd \fBdel\fR \fIs3://BUCKET/OBJECT\fR Delete file from bucket .TP s3cmd \fBrm\fR \fIs3://BUCKET/OBJECT\fR Delete file from bucket (alias for del) .TP s3cmd \fBrestore\fR \fIs3://BUCKET/OBJECT\fR Restore file from Glacier storage .TP s3cmd \fBsync\fR \fILOCAL_DIR s3://BUCKET[/PREFIX] or s3://BUCKET[/PREFIX] LOCAL_DIR or s3://BUCKET[/PREFIX] s3://BUCKET[/PREFIX]\fR Synchronize a directory tree to S3 (checks files freshness using size and md5 checksum, unless overridden by options, see below) .TP s3cmd \fBdu\fR \fI[s3://BUCKET[/PREFIX]]\fR Disk usage by buckets .TP s3cmd \fBinfo\fR \fIs3://BUCKET[/OBJECT]\fR Get various information about Buckets or Files .TP s3cmd \fBcp\fR \fIs3://BUCKET1/OBJECT1 s3://BUCKET2[/OBJECT2]\fR Copy object .TP s3cmd \fBmodify\fR \fIs3://BUCKET1/OBJECT\fR Modify object metadata .TP s3cmd \fBmv\fR \fIs3://BUCKET1/OBJECT1 s3://BUCKET2[/OBJECT2]\fR Move object .TP s3cmd \fBsetacl\fR \fIs3://BUCKET[/OBJECT]\fR Modify Access control list for Bucket or Files .TP s3cmd \fBsetversioning\fR \fIs3://BUCKET enable|disable\fR Modify Bucket Versioning .TP s3cmd \fBsetownership\fR \fIs3://BUCKET BucketOwnerPreferred|BucketOwnerEnforced|ObjectWriter\fR Modify Bucket Object Ownership .TP s3cmd \fBsetblockpublicaccess\fR \fIs3://BUCKET BlockPublicAcls,IgnorePublicAcls,BlockPublicPolicy,RestrictPublicBuckets\fR Modify Block Public Access rules .TP s3cmd \fBsetobjectlegalhold\fR \fISTATUS s3://BUCKET/OBJECT\fR Modify Object Legal Hold .TP s3cmd \fBsetobjectretention\fR \fIMODE RETAIN_UNTIL_DATE s3://BUCKET/OBJECT\fR Modify Object Retention .TP s3cmd \fBsetpolicy\fR \fIFILE s3://BUCKET\fR Modify Bucket Policy .TP s3cmd \fBdelpolicy\fR \fIs3://BUCKET\fR Delete Bucket Policy .TP s3cmd \fBsetcors\fR \fIFILE s3://BUCKET\fR Modify Bucket CORS .TP s3cmd \fBdelcors\fR \fIs3://BUCKET\fR Delete Bucket CORS .TP s3cmd \fBpayer\fR \fIs3://BUCKET\fR Modify Bucket Requester Pays policy .TP s3cmd \fBmultipart\fR \fIs3://BUCKET [Id]\fR Show multipart uploads .TP s3cmd \fBabortmp\fR \fIs3://BUCKET/OBJECT Id\fR Abort a multipart upload .TP s3cmd \fBlistmp\fR \fIs3://BUCKET/OBJECT Id\fR List parts of a multipart upload .TP s3cmd \fBaccesslog\fR \fIs3://BUCKET\fR Enable/disable bucket access logging .TP s3cmd \fBsign\fR \fISTRING\-TO\-SIGN\fR Sign arbitrary string using the secret key .TP s3cmd \fBsignurl\fR \fIs3://BUCKET/OBJECT \fR Sign an S3 URL to provide limited public access with expiry .TP s3cmd \fBfixbucket\fR \fIs3://BUCKET[/PREFIX]\fR Fix invalid file names in a bucket .TP s3cmd \fBsettagging\fR 
\fIs3://BUCKET[/OBJECT] "KEY=VALUE[&KEY=VALUE ...]"\fR Modify tagging for Bucket or Files .TP s3cmd \fBgettagging\fR \fIs3://BUCKET[/OBJECT]\fR Get tagging for Bucket or Files .TP s3cmd \fBdeltagging\fR \fIs3://BUCKET[/OBJECT]\fR Delete tagging for Bucket or Files .TP s3cmd \fBexpire\fR \fIs3://BUCKET\fR Set or delete expiration rule for the bucket .TP s3cmd \fBsetlifecycle\fR \fIFILE s3://BUCKET\fR Upload a lifecycle policy for the bucket .TP s3cmd \fBgetlifecycle\fR \fIs3://BUCKET\fR Get a lifecycle policy for the bucket .TP s3cmd \fBdellifecycle\fR \fIs3://BUCKET\fR Remove a lifecycle policy for the bucket .TP s3cmd \fBsetnotification\fR \fIFILE s3://BUCKET\fR Upload a notification policy for the bucket .TP s3cmd \fBgetnotification\fR \fIs3://BUCKET\fR Get a notification policy for the bucket .TP s3cmd \fBdelnotification\fR \fIs3://BUCKET\fR Remove a notification policy for the bucket .PP Commands for static WebSites configuration .TP s3cmd \fBws\-create\fR \fIs3://BUCKET\fR Create Website from bucket .TP s3cmd \fBws\-delete\fR \fIs3://BUCKET\fR Delete Website .TP s3cmd \fBws\-info\fR \fIs3://BUCKET\fR Info about Website .PP Commands for CloudFront management .TP s3cmd \fBcflist\fR \fI\fR List CloudFront distribution points .TP s3cmd \fBcfinfo\fR \fI[cf://DIST_ID]\fR Display CloudFront distribution point parameters .TP s3cmd \fBcfcreate\fR \fIs3://BUCKET\fR Create CloudFront distribution point .TP s3cmd \fBcfdelete\fR \fIcf://DIST_ID\fR Delete CloudFront distribution point .TP s3cmd \fBcfmodify\fR \fIcf://DIST_ID\fR Change CloudFront distribution point parameters .TP s3cmd \fBcfinval\fR \fIs3://BUCKET/OBJECT [s3://BUCKET/OBJECT ...]\fR Invalidate CloudFront objects .TP s3cmd \fBcfinvalinfo\fR \fIcf://DIST_ID[/INVAL_ID]\fR Display CloudFront invalidation request(s) status .SH OPTIONS .PP Some of the below specified options can have their default values set in .B s3cmd config file (by default $HOME/.s3cmd). As it's a simple text file feel free to open it with your favorite text editor and do any changes you like. .TP \fB\-h\fR, \fB\-\-help\fR show this help message and exit .TP \fB\-\-configure\fR Invoke interactive (re)configuration tool. Optionally use as '\fB\-\-configure\fR s3://some\-bucket' to test access to a specific bucket instead of attempting to list them all. .TP \fB\-c\fR FILE, \fB\-\-config\fR=FILE Config file name. Defaults to $HOME/.s3cfg .TP \fB\-\-dump\-config\fR Dump current configuration after parsing config files and command line options and exit. .TP \fB\-\-access_key\fR=ACCESS_KEY AWS Access Key .TP \fB\-\-secret_key\fR=SECRET_KEY AWS Secret Key .TP \fB\-\-access_token\fR=ACCESS_TOKEN AWS Access Token .TP \fB\-n\fR, \fB\-\-dry\-run\fR Only show what should be uploaded or downloaded but don't actually do it. May still perform S3 requests to get bucket listings and other information though (only for file transfer commands) .TP \fB\-s\fR, \fB\-\-ssl\fR Use HTTPS connection when communicating with S3. (default) .TP \fB\-\-no\-ssl\fR Don't use HTTPS. .TP \fB\-e\fR, \fB\-\-encrypt\fR Encrypt files before uploading to S3. .TP \fB\-\-no\-encrypt\fR Don't encrypt files. .TP \fB\-f\fR, \fB\-\-force\fR Force overwrite and other dangerous operations. .TP \fB\-\-continue\fR Continue getting a partially downloaded file (only for [get] command). .TP \fB\-\-continue\-put\fR Continue uploading partially uploaded files or multipart upload parts. Restarts parts/files that don't have matching size and md5. Skips files/parts that do. 
Note: md5sum checks are not always sufficient to check (part) file equality. Enable this at your own risk. .TP \fB\-\-upload\-id\fR=UPLOAD_ID UploadId for Multipart Upload, in case you want continue an existing upload (equivalent to \fB\-\-continue\-\fR put) and there are multiple partial uploads. Use s3cmd multipart [URI] to see what UploadIds are associated with the given URI. .TP \fB\-\-skip\-existing\fR Skip over files that exist at the destination (only for [get] and [sync] commands). .TP \fB\-r\fR, \fB\-\-recursive\fR Recursive upload, download or removal. .TP \fB\-\-check\-md5\fR Check MD5 sums when comparing files for [sync]. (default) .TP \fB\-\-no\-check\-md5\fR Do not check MD5 sums when comparing files for [sync]. Only size will be compared. May significantly speed up transfer but may also miss some changed files. .TP \fB\-P\fR, \fB\-\-acl\-public\fR Store objects with ACL allowing read for anyone. .TP \fB\-\-acl\-private\fR Store objects with default ACL allowing access for you only. .TP \fB\-\-acl\-grant\fR=PERMISSION:EMAIL or USER_CANONICAL_ID Grant stated permission to a given amazon user. Permission is one of: read, write, read_acp, write_acp, full_control, all .TP \fB\-\-acl\-revoke\fR=PERMISSION:USER_CANONICAL_ID Revoke stated permission for a given amazon user. Permission is one of: read, write, read_acp, write_acp, full_control, all .TP \fB\-D\fR NUM, \fB\-\-restore\-days\fR=NUM Number of days to keep restored file available (only for 'restore' command). Default is 1 day. .TP \fB\-\-restore\-priority\fR=RESTORE_PRIORITY Priority for restoring files from S3 Glacier (only for 'restore' command). Choices available: bulk, standard, expedited .TP \fB\-\-delete\-removed\fR Delete destination objects with no corresponding source file [sync] .TP \fB\-\-no\-delete\-removed\fR Don't delete destination objects [sync] .TP \fB\-\-delete\-after\fR Perform deletes AFTER new uploads when delete-removed is enabled [sync] .TP \fB\-\-delay\-updates\fR *OBSOLETE* Put all updated files into place at end [sync] .TP \fB\-\-max\-delete\fR=NUM Do not delete more than NUM files. [del] and [sync] .TP \fB\-\-limit\fR=NUM Limit number of objects returned in the response body (only for [ls] and [la] commands) .TP \fB\-\-add\-destination\fR=ADDITIONAL_DESTINATIONS Additional destination for parallel uploads, in addition to last arg. May be repeated. .TP \fB\-\-delete\-after\-fetch\fR Delete remote objects after fetching to local file (only for [get] and [sync] commands). .TP \fB\-p\fR, \fB\-\-preserve\fR Preserve filesystem attributes (mode, ownership, timestamps). Default for [sync] command. .TP \fB\-\-no\-preserve\fR Don't store FS attributes .TP \fB\-\-keep\-dirs\fR Preserve all local directories as remote objects including empty directories. Experimental feature. 
.TP \fB\-\-exclude\fR=GLOB Filenames and paths matching GLOB will be excluded from sync .TP \fB\-\-exclude\-from\fR=FILE Read --exclude GLOBs from FILE .TP \fB\-\-rexclude\fR=REGEXP Filenames and paths matching REGEXP (regular expression) will be excluded from sync .TP \fB\-\-rexclude\-from\fR=FILE Read --rexclude REGEXPs from FILE .TP \fB\-\-include\fR=GLOB Filenames and paths matching GLOB will be included even if previously excluded by one of \fB\-\-(r)exclude(\-from)\fR patterns .TP \fB\-\-include\-from\fR=FILE Read --include GLOBs from FILE .TP \fB\-\-rinclude\fR=REGEXP Same as --include but uses REGEXP (regular expression) instead of GLOB .TP \fB\-\-rinclude\-from\fR=FILE Read --rinclude REGEXPs from FILE .TP \fB\-\-files\-from\fR=FILE Read list of source-file names from FILE. Use - to read from stdin. .TP \fB\-\-region\fR=REGION, \fB\-\-bucket\-location\fR=REGION Region to create bucket in. As of now the regions are: us\-east\-1, us\-west\-1, us\-west\-2, eu\-west\-1, eu\- central\-1, ap\-northeast\-1, ap\-southeast\-1, ap\- southeast\-2, sa\-east\-1 .TP \fB\-\-host\fR=HOSTNAME HOSTNAME:PORT for S3 endpoint (default: s3.amazonaws.com, alternatives such as s3\-eu\- west\-1.amazonaws.com). You should also set \fB\-\-host\-\fR bucket. .TP \fB\-\-host\-bucket\fR=HOST_BUCKET DNS\-style bucket+hostname:port template for accessing a bucket (default: %(bucket)s.s3.amazonaws.com) .TP \fB\-\-reduced\-redundancy\fR, \fB\-\-rr\fR Store object with 'Reduced redundancy'. Lower per\-GB price. [put, cp, mv] .TP \fB\-\-no\-reduced\-redundancy\fR, \fB\-\-no\-rr\fR Store object without 'Reduced redundancy'. Higher per\- GB price. [put, cp, mv] .TP \fB\-\-storage\-class\fR=CLASS Store object with specified CLASS (STANDARD, STANDARD_IA, ONEZONE_IA, INTELLIGENT_TIERING, GLACIER or DEEP_ARCHIVE). [put, cp, mv] .TP \fB\-\-access\-logging\-target\-prefix\fR=LOG_TARGET_PREFIX Target prefix for access logs (S3 URI) (for [cfmodify] and [accesslog] commands) .TP \fB\-\-no\-access\-logging\fR Disable access logging (for [cfmodify] and [accesslog] commands) .TP \fB\-\-default\-mime\-type\fR=DEFAULT_MIME_TYPE Default MIME\-type for stored objects. Application default is binary/octet\-stream. .TP \fB\-M\fR, \fB\-\-guess\-mime\-type\fR Guess MIME\-type of files by their extension or mime magic. Fall back to default MIME\-Type as specified by \fB\-\-default\-mime\-type\fR option .TP \fB\-\-no\-guess\-mime\-type\fR Don't guess MIME-type and use the default type instead. .TP \fB\-\-no\-mime\-magic\fR Don't use mime magic when guessing MIME-type. .TP \fB\-m\fR MIME/TYPE, \fB\-\-mime\-type\fR=MIME/TYPE Force MIME\-type. Override both \fB\-\-default\-mime\-type\fR and \fB\-\-guess\-mime\-type\fR. .TP \fB\-\-add\-header\fR=NAME:VALUE Add a given HTTP header to the upload request. Can be used multiple times. For instance set 'Expires' or \&'Cache\-Control' headers (or both) using this option. .TP \fB\-\-remove\-header\fR=NAME Remove a given HTTP header. Can be used multiple times. For instance, remove 'Expires' or 'Cache\- Control' headers (or both) using this option. [modify] .TP \fB\-\-server\-side\-encryption\fR Specifies that server\-side encryption will be used when putting objects. [put, sync, cp, modify] .TP \fB\-\-server\-side\-encryption\-kms\-id\fR=KMS_KEY Specifies the key id used for server\-side encryption with AWS KMS\-Managed Keys (SSE\-KMS) when putting objects. [put, sync, cp, modify] .TP \fB\-\-encoding\fR=ENCODING Override autodetected terminal and filesystem encoding (character set). 
Autodetected: UTF\-8 .TP \fB\-\-add\-encoding\-exts\fR=EXTENSIONs Add encoding to these comma delimited extensions i.e. (css,js,html) when uploading to S3 ) .TP \fB\-\-verbatim\fR Use the S3 name as given on the command line. No pre- processing, encoding, etc. Use with caution! .TP \fB\-\-disable\-multipart\fR Disable multipart upload on files bigger than \fB\-\-multipart\-chunk\-size\-mb\fR .TP \fB\-\-multipart\-chunk\-size\-mb\fR=SIZE Size of each chunk of a multipart upload. Files bigger than SIZE are automatically uploaded as multithreaded\- multipart, smaller files are uploaded using the traditional method. SIZE is in Mega\-Bytes, default chunk size is 15MB, minimum allowed chunk size is 5MB, maximum is 5GB. .TP \fB\-\-list\-md5\fR Include MD5 sums in bucket listings (only for 'ls' command). .TP \fB\-\-list\-allow\-unordered\fR Not an AWS standard. Allow the listing results to be returned in unsorted order. This may be faster when listing very large buckets. .TP \fB\-H\fR, \fB\-\-human\-readable\-sizes\fR Print sizes in human readable form (eg 1kB instead of 1234). .TP \fB\-\-ws\-index\fR=WEBSITE_INDEX Name of index\-document (only for [ws\-create] command) .TP \fB\-\-ws\-error\fR=WEBSITE_ERROR Name of error\-document (only for [ws\-create] command) .TP \fB\-\-expiry\-date\fR=EXPIRY_DATE Indicates when the expiration rule takes effect. (only for [expire] command) .TP \fB\-\-expiry\-days\fR=EXPIRY_DAYS Indicates the number of days after object creation the expiration rule takes effect. (only for [expire] command) .TP \fB\-\-expiry\-prefix\fR=EXPIRY_PREFIX Identifying one or more objects with the prefix to which the expiration rule applies. (only for [expire] command) .TP \fB\-\-skip\-destination\-validation\fR Skips validation of Amazon SQS, Amazon SNS, and AWS Lambda destinations when applying notification configuration. (only for [setnotification] command) .TP \fB\-\-progress\fR Display progress meter (default on TTY). .TP \fB\-\-no\-progress\fR Don't display progress meter (default on non-TTY). .TP \fB\-\-stats\fR Give some file-transfer stats. .TP \fB\-\-enable\fR Enable given CloudFront distribution (only for [cfmodify] command) .TP \fB\-\-disable\fR Disable given CloudFront distribution (only for [cfmodify] command) .TP \fB\-\-cf\-invalidate\fR Invalidate the uploaded filed in CloudFront. Also see [cfinval] command. .TP \fB\-\-cf\-invalidate\-default\-index\fR When using Custom Origin and S3 static website, invalidate the default index file. .TP \fB\-\-cf\-no\-invalidate\-default\-index\-root\fR When using Custom Origin and S3 static website, don't invalidate the path to the default index file. .TP \fB\-\-cf\-add\-cname\fR=CNAME Add given CNAME to a CloudFront distribution (only for [cfcreate] and [cfmodify] commands) .TP \fB\-\-cf\-remove\-cname\fR=CNAME Remove given CNAME from a CloudFront distribution (only for [cfmodify] command) .TP \fB\-\-cf\-comment\fR=COMMENT Set COMMENT for a given CloudFront distribution (only for [cfcreate] and [cfmodify] commands) .TP \fB\-\-cf\-default\-root\-object\fR=DEFAULT_ROOT_OBJECT Set the default root object to return when no object is specified in the URL. Use a relative path, i.e. default/index.html instead of /default/index.html or s3://bucket/default/index.html (only for [cfcreate] and [cfmodify] commands) .TP \fB\-v\fR, \fB\-\-verbose\fR Enable verbose output. .TP \fB\-d\fR, \fB\-\-debug\fR Enable debug output. .TP \fB\-\-version\fR Show s3cmd version (2.4.0) and exit. 
.TP \fB\-F\fR, \fB\-\-follow\-symlinks\fR Follow symbolic links as if they are regular files .TP \fB\-\-cache\-file\fR=FILE Cache FILE containing local source MD5 values .TP \fB\-q\fR, \fB\-\-quiet\fR Silence output on stdout .TP \fB\-\-ca\-certs\fR=CA_CERTS_FILE Path to SSL CA certificate FILE (instead of system default) .TP \fB\-\-ssl\-cert\fR=SSL_CLIENT_CERT_FILE Path to client own SSL certificate CRT_FILE .TP \fB\-\-ssl\-key\fR=SSL_CLIENT_KEY_FILE Path to client own SSL certificate private key KEY_FILE .TP \fB\-\-check\-certificate\fR Check SSL certificate validity .TP \fB\-\-no\-check\-certificate\fR Do not check SSL certificate validity .TP \fB\-\-check\-hostname\fR Check SSL certificate hostname validity .TP \fB\-\-no\-check\-hostname\fR Do not check SSL certificate hostname validity .TP \fB\-\-signature\-v2\fR Use AWS Signature version 2 instead of newer signature methods. Helpful for S3\-like systems that don't have AWS Signature v4 yet. .TP \fB\-\-limit\-rate\fR=LIMITRATE Limit the upload or download speed to amount bytes per second. Amount may be expressed in bytes, kilobytes with the k suffix, or megabytes with the m suffix .TP \fB\-\-no\-connection\-pooling\fR Disable connection reuse .TP \fB\-\-requester\-pays\fR Set the REQUESTER PAYS flag for operations .TP \fB\-l\fR, \fB\-\-long\-listing\fR Produce long listing [ls] .TP \fB\-\-stop\-on\-error\fR stop if error in transfer .TP \fB\-\-max\-retries\fR=NUM Maximum number of times to retry a failed request before giving up. Default is 5 .TP \fB\-\-content\-disposition\fR=CONTENT_DISPOSITION Provide a Content\-Disposition for signed URLs, e.g., "inline; filename=myvideo.mp4" .TP \fB\-\-content\-type\fR=CONTENT_TYPE Provide a Content\-Type for signed URLs, e.g., "video/mp4" .SH EXAMPLES One of the most powerful commands of \fIs3cmd\fR is \fBs3cmd sync\fR used for synchronising complete directory trees to or from remote S3 storage. To some extent \fBs3cmd put\fR and \fBs3cmd get\fR share a similar behaviour with \fBsync\fR. .PP Basic usage common in backup scenarios is as simple as: .nf s3cmd sync /local/path/ s3://test\-bucket/backup/ .fi .PP This command will find all files under /local/path directory and copy them to corresponding paths under s3://test\-bucket/backup on the remote side. For example: .nf /local/path/\fBfile1.ext\fR \-> s3://bucket/backup/\fBfile1.ext\fR /local/path/\fBdir123/file2.bin\fR \-> s3://bucket/backup/\fBdir123/file2.bin\fR .fi .PP However if the local path doesn't end with a slash the last directory's name is used on the remote side as well. 
Compare these with the previous example: .nf s3cmd sync /local/path s3://test\-bucket/backup/ .fi will sync: .nf /local/\fBpath/file1.ext\fR \-> s3://bucket/backup/\fBpath/file1.ext\fR /local/\fBpath/dir123/file2.bin\fR \-> s3://bucket/backup/\fBpath/dir123/file2.bin\fR .fi .PP To retrieve the files back from S3 use inverted syntax: .nf s3cmd sync s3://test\-bucket/backup/ ~/restore/ .fi that will download files: .nf s3://bucket/backup/\fBfile1.ext\fR \-> ~/restore/\fBfile1.ext\fR s3://bucket/backup/\fBdir123/file2.bin\fR \-> ~/restore/\fBdir123/file2.bin\fR .fi .PP Without the trailing slash on source the behaviour is similar to what has been demonstrated with upload: .nf s3cmd sync s3://test\-bucket/backup ~/restore/ .fi will download the files as: .nf s3://bucket/\fBbackup/file1.ext\fR \-> ~/restore/\fBbackup/file1.ext\fR s3://bucket/\fBbackup/dir123/file2.bin\fR \-> ~/restore/\fBbackup/dir123/file2.bin\fR .fi .PP All source file names, the bold ones above, are matched against \fBexclude\fR rules and those that match are then re\-checked against \fBinclude\fR rules to see whether they should be excluded or kept in the source list. .PP For the purpose of \fB\-\-exclude\fR and \fB\-\-include\fR matching only the bold file names above are used. For instance only \fBpath/file1.ext\fR is tested against the patterns, not \fI/local/\fBpath/file1.ext\fR .PP Both \fB\-\-exclude\fR and \fB\-\-include\fR work with shell\-style wildcards (a.k.a. GLOB). For a greater flexibility s3cmd provides Regular\-expression versions of the two exclude options named \fB\-\-rexclude\fR and \fB\-\-rinclude\fR. The options with ...\fB\-from\fR suffix (eg \-\-rinclude\-from) expect a filename as an argument. Each line of such a file is treated as one pattern. .PP There is only one set of patterns built from all \fB\-\-(r)exclude(\-from)\fR options and similarly for include variant. Any file excluded with eg \-\-exclude can be put back with a pattern found in \-\-rinclude\-from list. .PP Run s3cmd with \fB\-\-dry\-run\fR to verify that your rules work as expected. Use together with \fB\-\-debug\fR get detailed information about matching file names against exclude and include rules. .PP For example to exclude all files with ".jpg" extension except those beginning with a number use: .PP \-\-exclude '*.jpg' \-\-rinclude '[0\-9].*\.jpg' .PP To exclude all files except "*.jpg" extension, use: .PP \-\-exclude '*' \-\-include '*.jpg' .PP To exclude local directory 'somedir', be sure to use a trailing forward slash, as such: .PP \-\-exclude 'somedir/' .PP .SH SEE ALSO For the most up to date list of options run: .B s3cmd \-\-help .br For more info about usage, examples and other related info visit project homepage at: .B https://s3tools.org .SH AUTHOR Written by Michal Ludvig, Florent Viard and contributors .SH CONTACT, SUPPORT Preferred way to get support is our mailing list: .br .I s3tools\-general@lists.sourceforge.net .br or visit the project homepage: .br .B https://s3tools.org .SH REPORTING BUGS Report bugs to .I s3tools\-bugs@lists.sourceforge.net .SH COPYRIGHT Copyright \(co 2007\-2023 TGRMN Software (https://www.tgrmn.com), Sodria SAS (https://www.sodria.com) and contributors .br .SH LICENSE This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. 
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. .br s3cmd-2.4.0/PKG-INFO0000664000175100017510000000425514535744737013242 0ustar floflo00000000000000Metadata-Version: 1.2 Name: s3cmd Version: 2.4.0 Summary: Command line tool for managing Amazon S3 and CloudFront services Home-page: http://s3tools.org Author: Michal Ludvig Author-email: michal@logix.cz Maintainer: github.com/fviard, github.com/matteobar Maintainer-email: s3tools-bugs@lists.sourceforge.net License: GNU GPL v2+ Description: S3cmd lets you copy files from/to Amazon S3 (Simple Storage Service) using a simple to use command line client. Supports rsync-like backup, GPG encryption, and more. Also supports management of Amazon's CloudFront content delivery network. Authors: -------- Florent Viard Michal Ludvig Matt Domsch (github.com/mdomsch) Platform: UNKNOWN Classifier: Development Status :: 5 - Production/Stable Classifier: Environment :: Console Classifier: Environment :: MacOS X Classifier: Environment :: Win32 (MS Windows) Classifier: Intended Audience :: End Users/Desktop Classifier: Intended Audience :: System Administrators Classifier: License :: OSI Approved :: GNU General Public License v2 or later (GPLv2+) Classifier: Natural Language :: English Classifier: Operating System :: MacOS :: MacOS X Classifier: Operating System :: Microsoft :: Windows Classifier: Operating System :: POSIX Classifier: Operating System :: Unix Classifier: Programming Language :: Python :: 2 Classifier: Programming Language :: Python :: 2.6 Classifier: Programming Language :: Python :: 2.7 Classifier: Programming Language :: Python :: 3 Classifier: Programming Language :: Python :: 3.3 Classifier: Programming Language :: Python :: 3.4 Classifier: Programming Language :: Python :: 3.5 Classifier: Programming Language :: Python :: 3.6 Classifier: Programming Language :: Python :: 3.7 Classifier: Programming Language :: Python :: 3.8 Classifier: Programming Language :: Python :: 3.9 Classifier: Programming Language :: Python :: 3.10 Classifier: Programming Language :: Python :: 3.11 Classifier: Programming Language :: Python :: 3.12 Classifier: Topic :: System :: Archiving Classifier: Topic :: Utilities s3cmd-2.4.0/S3/0000775000175100017510000000000014535744737012424 5ustar floflo00000000000000s3cmd-2.4.0/S3/Exceptions.py0000664000175100017510000001136414534034713015105 0ustar floflo00000000000000# -*- coding: utf-8 -*- ## -------------------------------------------------------------------- ## Amazon S3 manager - Exceptions library ## ## Authors : Michal Ludvig (https://www.logix.cz/michal) ## Florent Viard (https://www.sodria.com) ## Copyright : TGRMN Software, Sodria SAS and contributors ## License : GPL Version 2 ## Website : https://s3tools.org ## -------------------------------------------------------------------- from __future__ import absolute_import from logging import debug, error import sys import S3.BaseUtils import S3.Utils from . 
import ExitCodes if sys.version_info >= (3, 0): PY3 = True # In python 3, unicode -> str, and str -> bytes unicode = str else: PY3 = False ## External exceptions from ssl import SSLError as S3SSLError try: from ssl import CertificateError as S3SSLCertificateError except ImportError: class S3SSLCertificateError(Exception): pass try: from xml.etree.ElementTree import ParseError as XmlParseError except ImportError: # ParseError was only added in python2.7, before ET was raising ExpatError from xml.parsers.expat import ExpatError as XmlParseError ## s3cmd exceptions class S3Exception(Exception): def __init__(self, message=""): self.message = S3.Utils.unicodise(message) def __str__(self): ## Don't return self.message directly because ## __unicode__() method could be overridden in subclasses! if PY3: return self.__unicode__() else: return S3.Utils.deunicodise(self.__unicode__()) def __unicode__(self): return self.message ## (Base)Exception.message has been deprecated in Python 2.6 def _get_message(self): return self._message def _set_message(self, message): self._message = message message = property(_get_message, _set_message) class S3Error(S3Exception): def __init__(self, response): self.status = response["status"] self.reason = response["reason"] self.info = { "Code": "", "Message": "", "Resource": "" } debug("S3Error: %s (%s)" % (self.status, self.reason)) if "headers" in response: for header in response["headers"]: debug("HttpHeader: %s: %s" % (header, response["headers"][header])) if "data" in response and response["data"]: try: tree = S3.BaseUtils.getTreeFromXml(response["data"]) except XmlParseError: debug("Not an XML response") else: try: self.info.update(self.parse_error_xml(tree)) except Exception as e: error("Error parsing xml: %s. ErrorXML: %s" % (e, response["data"])) self.code = self.info["Code"] self.message = self.info["Message"] self.resource = self.info["Resource"] def __unicode__(self): retval = u"%d " % (self.status) retval += (u"(%s)" % (self.code or self.reason)) error_msg = self.message if error_msg: retval += (u": %s" % error_msg) return retval def get_error_code(self): if self.status in [301, 307]: return ExitCodes.EX_SERVERMOVED elif self.status in [400, 405, 411, 416, 417, 501, 504]: return ExitCodes.EX_SERVERERROR elif self.status == 403: return ExitCodes.EX_ACCESSDENIED elif self.status == 404: return ExitCodes.EX_NOTFOUND elif self.status == 409: return ExitCodes.EX_CONFLICT elif self.status == 412: return ExitCodes.EX_PRECONDITION elif self.status == 500: return ExitCodes.EX_SOFTWARE elif self.status in [429, 503]: return ExitCodes.EX_SERVICE else: return ExitCodes.EX_SOFTWARE @staticmethod def parse_error_xml(tree): info = {} error_node = tree if not error_node.tag == "Error": error_node = tree.find(".//Error") if error_node is not None: for child in error_node: if child.text != "": debug("ErrorXML: " + child.tag + ": " + repr(child.text)) info[child.tag] = child.text else: raise S3ResponseError("Malformed error XML returned from remote server.") return info class CloudFrontError(S3Error): pass class S3UploadError(S3Exception): pass class S3DownloadError(S3Exception): pass class S3RequestError(S3Exception): pass class S3ResponseError(S3Exception): pass class InvalidFileError(S3Exception): pass class ParameterError(S3Exception): pass # vim:et:ts=4:sts=4:ai s3cmd-2.4.0/S3/CloudFront.py0000664000175100017510000011132314534034713015037 0ustar floflo00000000000000# -*- coding: utf-8 -*- ## -------------------------------------------------------------------- ## Amazon 
CloudFront support ## ## Authors : Michal Ludvig (https://www.logix.cz/michal) ## Florent Viard (https://www.sodria.com) ## Copyright : TGRMN Software, Sodria SAS and contributors ## License : GPL Version 2 ## Website : https://s3tools.org ## -------------------------------------------------------------------- from __future__ import absolute_import import sys import time import random from collections import defaultdict from datetime import datetime from logging import debug, info, warning, error try: import xml.etree.ElementTree as ET except ImportError: import elementtree.ElementTree as ET from .S3 import S3 from .Config import Config from .Exceptions import CloudFrontError, ParameterError from .ExitCodes import EX_OK, EX_GENERAL, EX_PARTIAL from .BaseUtils import (getTreeFromXml, appendXmlTextNode, getDictFromTree, dateS3toPython, encode_to_s3, decode_from_s3) from .Utils import (getBucketFromHostname, getHostnameFromBucket, deunicodise, convertHeaderTupleListToDict) from .Crypto import sign_string_v2 from .S3Uri import S3Uri, S3UriS3 from .ConnMan import ConnMan from .SortedDict import SortedDict PY3 = (sys.version_info >= (3, 0)) cloudfront_api_version = "2010-11-01" cloudfront_resource = "/%(api_ver)s/distribution" % { 'api_ver' : cloudfront_api_version } def output(message): sys.stdout.write(message + "\n") def pretty_output(label, message): #label = ("%s " % label).ljust(20, ".") label = ("%s:" % label).ljust(15) output("%s %s" % (label, message)) class DistributionSummary(object): ## Example: ## ## ## 1234567890ABC ## Deployed ## 2009-01-16T11:49:02.189Z ## blahblahblah.cloudfront.net ## ## example.bucket.s3.amazonaws.com ## ## cdn.example.com ## img.example.com ## What Ever ## true ## def __init__(self, tree): if tree.tag != "DistributionSummary": raise ValueError("Expected xml, got: <%s />" % tree.tag) self.parse(tree) def parse(self, tree): self.info = getDictFromTree(tree) self.info['Enabled'] = (self.info['Enabled'].lower() == "true") if "CNAME" in self.info and type(self.info['CNAME']) != list: self.info['CNAME'] = [self.info['CNAME']] def uri(self): return S3Uri(u"cf://%s" % self.info['Id']) class DistributionList(object): ## Example: ## ## ## ## 100 ## false ## ## ... handled by DistributionSummary() class ... ## ## def __init__(self, xml): tree = getTreeFromXml(xml) if tree.tag != "DistributionList": raise ValueError("Expected xml, got: <%s />" % tree.tag) self.parse(tree) def parse(self, tree): self.info = getDictFromTree(tree) ## Normalise some items self.info['IsTruncated'] = (self.info['IsTruncated'].lower() == "true") self.dist_summs = [] for dist_summ in tree.findall(".//DistributionSummary"): self.dist_summs.append(DistributionSummary(dist_summ)) class Distribution(object): ## Example: ## ## ## 1234567890ABC ## InProgress ## 2009-01-16T13:07:11.319Z ## blahblahblah.cloudfront.net ## ## ... handled by DistributionConfig() class ... 
## ## def __init__(self, xml): tree = getTreeFromXml(xml) if tree.tag != "Distribution": raise ValueError("Expected xml, got: <%s />" % tree.tag) self.parse(tree) def parse(self, tree): self.info = getDictFromTree(tree) ## Normalise some items self.info['LastModifiedTime'] = dateS3toPython(self.info['LastModifiedTime']) self.info['DistributionConfig'] = DistributionConfig(tree = tree.find(".//DistributionConfig")) def uri(self): return S3Uri(u"cf://%s" % self.info['Id']) class DistributionConfig(object): ## Example: ## ## ## somebucket.s3.amazonaws.com ## s3://somebucket/ ## http://somebucket.s3.amazonaws.com/ ## true ## ## bu.ck.et ## /cf-somebucket/ ## ## EMPTY_CONFIG = "true" xmlns = "http://cloudfront.amazonaws.com/doc/%(api_ver)s/" % { 'api_ver' : cloudfront_api_version } def __init__(self, xml = None, tree = None): if xml is None: xml = DistributionConfig.EMPTY_CONFIG if tree is None: tree = getTreeFromXml(xml) if tree.tag != "DistributionConfig": raise ValueError("Expected xml, got: <%s />" % tree.tag) self.parse(tree) def parse(self, tree): self.info = getDictFromTree(tree) self.info['Enabled'] = (self.info['Enabled'].lower() == "true") if "CNAME" not in self.info: self.info['CNAME'] = [] if type(self.info['CNAME']) != list: self.info['CNAME'] = [self.info['CNAME']] self.info['CNAME'] = [cname.lower() for cname in self.info['CNAME']] if "Comment" not in self.info: self.info['Comment'] = "" if "DefaultRootObject" not in self.info: self.info['DefaultRootObject'] = "" ## Figure out logging - complex node not parsed by getDictFromTree() logging_nodes = tree.findall(".//Logging") if logging_nodes: logging_dict = getDictFromTree(logging_nodes[0]) logging_dict['Bucket'], success = getBucketFromHostname(logging_dict['Bucket']) if not success: warning("Logging to unparsable bucket name: %s" % logging_dict['Bucket']) self.info['Logging'] = S3UriS3(u"s3://%(Bucket)s/%(Prefix)s" % logging_dict) else: self.info['Logging'] = None def get_printable_tree(self): tree = ET.Element("DistributionConfig") tree.attrib['xmlns'] = DistributionConfig.xmlns ## Retain the order of the following calls! 
s3org = appendXmlTextNode("S3Origin", '', tree) appendXmlTextNode("DNSName", self.info['S3Origin']['DNSName'], s3org) appendXmlTextNode("CallerReference", self.info['CallerReference'], tree) for cname in self.info['CNAME']: appendXmlTextNode("CNAME", cname.lower(), tree) if self.info['Comment']: appendXmlTextNode("Comment", self.info['Comment'], tree) appendXmlTextNode("Enabled", str(self.info['Enabled']).lower(), tree) # don't create a empty DefaultRootObject element as it would result in a MalformedXML error if str(self.info['DefaultRootObject']): appendXmlTextNode("DefaultRootObject", str(self.info['DefaultRootObject']), tree) if self.info['Logging']: logging_el = ET.Element("Logging") appendXmlTextNode("Bucket", getHostnameFromBucket(self.info['Logging'].bucket()), logging_el) appendXmlTextNode("Prefix", self.info['Logging'].object(), logging_el) tree.append(logging_el) return tree def __unicode__(self): return decode_from_s3(ET.tostring(self.get_printable_tree())) def __str__(self): if PY3: # Return unicode return ET.tostring(self.get_printable_tree(), encoding="unicode") else: # Return bytes return ET.tostring(self.get_printable_tree()) class Invalidation(object): ## Example: ## ## ## id ## status ## date ## ## /image1.jpg ## /image2.jpg ## /videos/movie.flv ## my-batch ## ## def __init__(self, xml): tree = getTreeFromXml(xml) if tree.tag != "Invalidation": raise ValueError("Expected xml, got: <%s />" % tree.tag) self.parse(tree) def parse(self, tree): self.info = getDictFromTree(tree) def __str__(self): return str(self.info) class InvalidationList(object): ## Example: ## ## ## ## Invalidation ID ## 2 ## true ## ## [Second Invalidation ID] ## Completed ## ## ## [First Invalidation ID] ## Completed ## ## def __init__(self, xml): tree = getTreeFromXml(xml) if tree.tag != "InvalidationList": raise ValueError("Expected xml, got: <%s />" % tree.tag) self.parse(tree) def parse(self, tree): self.info = getDictFromTree(tree) def __str__(self): return str(self.info) class InvalidationBatch(object): ## Example: ## ## ## /image1.jpg ## /image2.jpg ## /videos/movie.flv ## /sound%20track.mp3 ## my-batch ## def __init__(self, reference = None, distribution = None, paths = []): if reference: self.reference = reference else: if not distribution: distribution="0" self.reference = "%s.%s.%s" % (distribution, datetime.strftime(datetime.now(),"%Y%m%d%H%M%S"), random.randint(1000,9999)) self.paths = [] self.add_objects(paths) def add_objects(self, paths): self.paths.extend(paths) def get_reference(self): return self.reference def get_printable_tree(self): tree = ET.Element("InvalidationBatch") for path in self.paths: if len(path) < 1 or path[0] != "/": path = "/" + path appendXmlTextNode("Path", path, tree) appendXmlTextNode("CallerReference", self.reference, tree) return tree def __unicode__(self): return decode_from_s3(ET.tostring(self.get_printable_tree())) def __str__(self): if PY3: # Return unicode return ET.tostring(self.get_printable_tree(), encoding="unicode") else: # Return bytes return ET.tostring(self.get_printable_tree()) class CloudFront(object): operations = { "CreateDist" : { 'method' : "POST", 'resource' : "" }, "DeleteDist" : { 'method' : "DELETE", 'resource' : "/%(dist_id)s" }, "GetList" : { 'method' : "GET", 'resource' : "" }, "GetDistInfo" : { 'method' : "GET", 'resource' : "/%(dist_id)s" }, "GetDistConfig" : { 'method' : "GET", 'resource' : "/%(dist_id)s/config" }, "SetDistConfig" : { 'method' : "PUT", 'resource' : "/%(dist_id)s/config" }, "Invalidate" : { 'method' : "POST", 
'resource' : "/%(dist_id)s/invalidation" }, "GetInvalList" : { 'method' : "GET", 'resource' : "/%(dist_id)s/invalidation" }, "GetInvalInfo" : { 'method' : "GET", 'resource' : "/%(dist_id)s/invalidation/%(request_id)s" }, } dist_list = None def __init__(self, config): self.config = config ## -------------------------------------------------- ## Methods implementing CloudFront API ## -------------------------------------------------- def GetList(self): response = self.send_request("GetList") response['dist_list'] = DistributionList(response['data']) if response['dist_list'].info['IsTruncated']: raise NotImplementedError("List is truncated. Ask s3cmd author to add support.") ## TODO: handle Truncated return response def CreateDistribution(self, uri, cnames_add = [], comment = None, logging = None, default_root_object = None): dist_config = DistributionConfig() dist_config.info['Enabled'] = True dist_config.info['S3Origin']['DNSName'] = uri.host_name() dist_config.info['CallerReference'] = str(uri) dist_config.info['DefaultRootObject'] = default_root_object if comment == None: dist_config.info['Comment'] = uri.public_url() else: dist_config.info['Comment'] = comment for cname in cnames_add: if dist_config.info['CNAME'].count(cname) == 0: dist_config.info['CNAME'].append(cname) if logging: dist_config.info['Logging'] = S3UriS3(logging) request_body = str(dist_config) debug("CreateDistribution(): request_body: %s" % request_body) response = self.send_request("CreateDist", body = request_body) response['distribution'] = Distribution(response['data']) return response def ModifyDistribution(self, cfuri, cnames_add = [], cnames_remove = [], comment = None, enabled = None, logging = None, default_root_object = None): if cfuri.type != "cf": raise ValueError("Expected CFUri instead of: %s" % cfuri) # Get current dist status (enabled/disabled) and Etag info("Checking current status of %s" % cfuri) response = self.GetDistConfig(cfuri) dc = response['dist_config'] if enabled != None: dc.info['Enabled'] = enabled if comment != None: dc.info['Comment'] = comment if default_root_object != None: dc.info['DefaultRootObject'] = default_root_object for cname in cnames_add: if dc.info['CNAME'].count(cname) == 0: dc.info['CNAME'].append(cname) for cname in cnames_remove: while dc.info['CNAME'].count(cname) > 0: dc.info['CNAME'].remove(cname) if logging != None: if logging == False: dc.info['Logging'] = False else: dc.info['Logging'] = S3UriS3(logging) response = self.SetDistConfig(cfuri, dc, response['headers']['etag']) return response def DeleteDistribution(self, cfuri): if cfuri.type != "cf": raise ValueError("Expected CFUri instead of: %s" % cfuri) # Get current dist status (enabled/disabled) and Etag info("Checking current status of %s" % cfuri) response = self.GetDistConfig(cfuri) if response['dist_config'].info['Enabled']: info("Distribution is ENABLED. 
Disabling first.") response['dist_config'].info['Enabled'] = False response = self.SetDistConfig(cfuri, response['dist_config'], response['headers']['etag']) warning("Waiting for Distribution to become disabled.") warning("This may take several minutes, please wait.") while True: response = self.GetDistInfo(cfuri) d = response['distribution'] if d.info['Status'] == "Deployed" and d.info['Enabled'] == False: info("Distribution is now disabled") break warning("Still waiting...") time.sleep(10) headers = SortedDict(ignore_case = True) headers['if-match'] = response['headers']['etag'] response = self.send_request("DeleteDist", dist_id = cfuri.dist_id(), headers = headers) return response def GetDistInfo(self, cfuri): if cfuri.type != "cf": raise ValueError("Expected CFUri instead of: %s" % cfuri) response = self.send_request("GetDistInfo", dist_id = cfuri.dist_id()) response['distribution'] = Distribution(response['data']) return response def GetDistConfig(self, cfuri): if cfuri.type != "cf": raise ValueError("Expected CFUri instead of: %s" % cfuri) response = self.send_request("GetDistConfig", dist_id = cfuri.dist_id()) response['dist_config'] = DistributionConfig(response['data']) return response def SetDistConfig(self, cfuri, dist_config, etag = None): if etag == None: debug("SetDistConfig(): Etag not set. Fetching it first.") etag = self.GetDistConfig(cfuri)['headers']['etag'] debug("SetDistConfig(): Etag = %s" % etag) request_body = str(dist_config) debug("SetDistConfig(): request_body: %s" % request_body) headers = SortedDict(ignore_case = True) headers['if-match'] = etag response = self.send_request("SetDistConfig", dist_id = cfuri.dist_id(), body = request_body, headers = headers) return response def InvalidateObjects(self, uri, paths, default_index_file, invalidate_default_index_on_cf, invalidate_default_index_root_on_cf): # joseprio: if the user doesn't want to invalidate the default index # path, or if the user wants to invalidate the root of the default # index, we need to process those paths if default_index_file is not None and (not invalidate_default_index_on_cf or invalidate_default_index_root_on_cf): new_paths = [] default_index_suffix = '/' + default_index_file for path in paths: if path.endswith(default_index_suffix) or path == default_index_file: if invalidate_default_index_on_cf: new_paths.append(path) if invalidate_default_index_root_on_cf: new_paths.append(path[:-len(default_index_file)]) else: new_paths.append(path) paths = new_paths # uri could be either cf:// or s3:// uri cfuris = self.get_dist_name_for_bucket(uri) if len(paths) > 999: try: tmp_filename = Utils.mktmpfile() with open(deunicodise(tmp_filename), "w") as fp: fp.write(deunicodise("\n".join(paths)+"\n")) warning("Request to invalidate %d paths (max 999 supported)" % len(paths)) warning("All the paths are now saved in: %s" % tmp_filename) except Exception: pass raise ParameterError("Too many paths to invalidate") responses = [] for cfuri in cfuris: invalbatch = InvalidationBatch(distribution = cfuri.dist_id(), paths = paths) debug("InvalidateObjects(): request_body: %s" % invalbatch) response = self.send_request("Invalidate", dist_id = cfuri.dist_id(), body = str(invalbatch)) response['dist_id'] = cfuri.dist_id() if response['status'] == 201: inval_info = Invalidation(response['data']).info response['request_id'] = inval_info['Id'] debug("InvalidateObjects(): response: %s" % response) responses.append(response) return responses def GetInvalList(self, cfuri): if cfuri.type != "cf": raise ValueError("Expected 
CFUri instead of: %s" % cfuri) response = self.send_request("GetInvalList", dist_id = cfuri.dist_id()) response['inval_list'] = InvalidationList(response['data']) return response def GetInvalInfo(self, cfuri): if cfuri.type != "cf": raise ValueError("Expected CFUri instead of: %s" % cfuri) if cfuri.request_id() is None: raise ValueError("Expected CFUri with Request ID") response = self.send_request("GetInvalInfo", dist_id = cfuri.dist_id(), request_id = cfuri.request_id()) response['inval_status'] = Invalidation(response['data']) return response ## -------------------------------------------------- ## Low-level methods for handling CloudFront requests ## -------------------------------------------------- def send_request(self, op_name, dist_id = None, request_id = None, body = None, headers = None, retries = None): if retries is None: retries = self.config.max_retries if headers is None: headers = SortedDict(ignore_case = True) operation = self.operations[op_name] if body: headers['content-type'] = 'text/plain' request = self.create_request(operation, dist_id, request_id, headers) conn = self.get_connection() debug("send_request(): %s %s" % (request['method'], request['resource'])) conn.c.request(request['method'], request['resource'], body, request['headers']) http_response = conn.c.getresponse() response = {} response["status"] = http_response.status response["reason"] = http_response.reason response["headers"] = convertHeaderTupleListToDict(http_response.getheaders()) response["data"] = http_response.read() ConnMan.put(conn) debug("CloudFront: response: %r" % response) if response["status"] >= 500: e = CloudFrontError(response) if retries: warning(u"Retrying failed request: %s (%s)" % (op_name, e)) warning("Waiting %d sec..." % self._fail_wait(retries)) time.sleep(self._fail_wait(retries)) return self.send_request(op_name, dist_id, body = body, retries = retries - 1) else: raise e if response["status"] < 200 or response["status"] > 299: raise CloudFrontError(response) return response def create_request(self, operation, dist_id = None, request_id = None, headers = None): resource = cloudfront_resource + ( operation['resource'] % { 'dist_id' : dist_id, 'request_id' : request_id }) if not headers: headers = SortedDict(ignore_case = True) if "date" in headers: if "x-amz-date" not in headers: headers["x-amz-date"] = headers["date"] del(headers["date"]) if "x-amz-date" not in headers: headers["x-amz-date"] = time.strftime("%a, %d %b %Y %H:%M:%S +0000", time.gmtime()) if len(self.config.access_token)>0: self.config.role_refresh() headers['x-amz-security-token']=self.config.access_token signature = self.sign_request(headers) headers["Authorization"] = "AWS "+self.config.access_key+":"+signature request = {} request['resource'] = resource request['headers'] = headers request['method'] = operation['method'] return request def sign_request(self, headers): string_to_sign = headers['x-amz-date'] signature = decode_from_s3(sign_string_v2(encode_to_s3(string_to_sign))) debug(u"CloudFront.sign_request('%s') = %s" % (string_to_sign, signature)) return signature def get_connection(self): conn = ConnMan.get(self.config.cloudfront_host, ssl = True) return conn def _fail_wait(self, retries): # Wait a few seconds. The more it fails the more we wait. 
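        # Illustrative note (not in the original source): with the default
        # max_retries of 5 defined in Config, the successive waits computed
        # below are 3, 6, 9, 12 and 15 seconds as "retries" counts down.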
return (self.config.max_retries - retries + 1) * 3 def get_dist_name_for_bucket(self, uri): if uri.type == "cf": return [uri] if uri.type != "s3": raise ParameterError("CloudFront or S3 URI required instead of: %s" % uri) debug("_get_dist_name_for_bucket(%r)" % uri) if CloudFront.dist_list is None: response = self.GetList() CloudFront.dist_list = {} for d in response['dist_list'].dist_summs: distListIndex = "" if "S3Origin" in d.info: distListIndex = getBucketFromHostname(d.info['S3Origin']['DNSName'])[0] elif "CustomOrigin" in d.info: # Aral: This used to skip over distributions with CustomOrigin, however, we mustn't # do this since S3 buckets that are set up as websites use custom origins. # Thankfully, the custom origin URLs they use start with the URL of the # S3 bucket. Here, we make use this naming convention to support this use case. distListIndex = getBucketFromHostname(d.info['CustomOrigin']['DNSName'])[0] distListIndex = distListIndex[:len(uri.bucket())] else: # Aral: I'm not sure when this condition will be reached, but keeping it in there. continue if CloudFront.dist_list.get(distListIndex, None) is None: CloudFront.dist_list[distListIndex] = set() CloudFront.dist_list[distListIndex].add(d.uri()) debug("dist_list: %s" % CloudFront.dist_list) try: return CloudFront.dist_list[uri.bucket()] except Exception as e: debug(e) raise ParameterError("Unable to translate S3 URI to CloudFront distribution name: %s" % uri) class Cmd(object): """ Class that implements CloudFront commands """ class Options(object): cf_cnames_add = [] cf_cnames_remove = [] cf_comment = None cf_enable = None cf_logging = None cf_default_root_object = None def option_list(self): return [opt for opt in dir(self) if opt.startswith("cf_")] def update_option(self, option, value): setattr(Cmd.options, option, value) options = Options() @staticmethod def _parse_args(args): cf = CloudFront(Config()) cfuris = [] for arg in args: uris = cf.get_dist_name_for_bucket(S3Uri(arg)) cfuris.extend(uris) return cfuris @staticmethod def info(args): cf = CloudFront(Config()) if not args: response = cf.GetList() for d in response['dist_list'].dist_summs: if "S3Origin" in d.info: origin = S3UriS3.httpurl_to_s3uri(d.info['S3Origin']['DNSName']) elif "CustomOrigin" in d.info: origin = "http://%s/" % d.info['CustomOrigin']['DNSName'] else: origin = "" pretty_output("Origin", origin) pretty_output("DistId", d.uri()) pretty_output("DomainName", d.info['DomainName']) if "CNAME" in d.info: pretty_output("CNAMEs", ", ".join(d.info['CNAME'])) pretty_output("Status", d.info['Status']) pretty_output("Enabled", d.info['Enabled']) output("") else: cfuris = Cmd._parse_args(args) for cfuri in cfuris: response = cf.GetDistInfo(cfuri) d = response['distribution'] dc = d.info['DistributionConfig'] if "S3Origin" in dc.info: origin = S3UriS3.httpurl_to_s3uri(dc.info['S3Origin']['DNSName']) elif "CustomOrigin" in dc.info: origin = "http://%s/" % dc.info['CustomOrigin']['DNSName'] else: origin = "" pretty_output("Origin", origin) pretty_output("DistId", d.uri()) pretty_output("DomainName", d.info['DomainName']) if "CNAME" in dc.info: pretty_output("CNAMEs", ", ".join(dc.info['CNAME'])) pretty_output("Status", d.info['Status']) pretty_output("Comment", dc.info['Comment']) pretty_output("Enabled", dc.info['Enabled']) pretty_output("DfltRootObject", dc.info['DefaultRootObject']) pretty_output("Logging", dc.info['Logging'] or "Disabled") pretty_output("Etag", response['headers']['etag']) @staticmethod def create(args): cf = CloudFront(Config()) buckets = [] 
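        # Illustrative note (not in the original source): an argument such as
        # "s3://my-bucket" passes the checks below, while a URI with an object
        # part (e.g. "s3://my-bucket/key") or with a bucket name that is not
        # DNS-compatible is rejected with a ParameterError.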
for arg in args: uri = S3Uri(arg) if uri.type != "s3": raise ParameterError("Distribution can only be created from a s3:// URI instead of: %s" % arg) if uri.object(): raise ParameterError("Use s3:// URI with a bucket name only instead of: %s" % arg) if not uri.is_dns_compatible(): raise ParameterError("CloudFront can only handle lowercase-named buckets.") buckets.append(uri) if not buckets: raise ParameterError("No valid bucket names found") for uri in buckets: info("Creating distribution from: %s" % uri) response = cf.CreateDistribution(uri, cnames_add = Cmd.options.cf_cnames_add, comment = Cmd.options.cf_comment, logging = Cmd.options.cf_logging, default_root_object = Cmd.options.cf_default_root_object) d = response['distribution'] dc = d.info['DistributionConfig'] output("Distribution created:") pretty_output("Origin", S3UriS3.httpurl_to_s3uri(dc.info['S3Origin']['DNSName'])) pretty_output("DistId", d.uri()) pretty_output("DomainName", d.info['DomainName']) pretty_output("CNAMEs", ", ".join(dc.info['CNAME'])) pretty_output("Comment", dc.info['Comment']) pretty_output("Status", d.info['Status']) pretty_output("Enabled", dc.info['Enabled']) pretty_output("DefaultRootObject", dc.info['DefaultRootObject']) pretty_output("Etag", response['headers']['etag']) @staticmethod def delete(args): cf = CloudFront(Config()) cfuris = Cmd._parse_args(args) for cfuri in cfuris: response = cf.DeleteDistribution(cfuri) if response['status'] >= 400: error("Distribution %s could not be deleted: %s" % (cfuri, response['reason'])) output("Distribution %s deleted" % cfuri) @staticmethod def modify(args): cf = CloudFront(Config()) if len(args) > 1: raise ParameterError("Too many parameters. Modify one Distribution at a time.") try: cfuri = Cmd._parse_args(args)[0] except IndexError: raise ParameterError("No valid Distribution URI found.") response = cf.ModifyDistribution(cfuri, cnames_add = Cmd.options.cf_cnames_add, cnames_remove = Cmd.options.cf_cnames_remove, comment = Cmd.options.cf_comment, enabled = Cmd.options.cf_enable, logging = Cmd.options.cf_logging, default_root_object = Cmd.options.cf_default_root_object) if response['status'] >= 400: error("Distribution %s could not be modified: %s" % (cfuri, response['reason'])) output("Distribution modified: %s" % cfuri) response = cf.GetDistInfo(cfuri) d = response['distribution'] dc = d.info['DistributionConfig'] pretty_output("Origin", S3UriS3.httpurl_to_s3uri(dc.info['S3Origin']['DNSName'])) pretty_output("DistId", d.uri()) pretty_output("DomainName", d.info['DomainName']) pretty_output("Status", d.info['Status']) pretty_output("CNAMEs", ", ".join(dc.info['CNAME'])) pretty_output("Comment", dc.info['Comment']) pretty_output("Enabled", dc.info['Enabled']) pretty_output("DefaultRootObject", dc.info['DefaultRootObject']) pretty_output("Etag", response['headers']['etag']) @staticmethod def invalinfo(args): cf = CloudFront(Config()) cfuris = Cmd._parse_args(args) requests = [] for cfuri in cfuris: if cfuri.request_id(): requests.append(str(cfuri)) else: inval_list = cf.GetInvalList(cfuri) try: for i in inval_list['inval_list'].info['InvalidationSummary']: requests.append("/".join(["cf:/", cfuri.dist_id(), i["Id"]])) except Exception: continue for req in requests: cfuri = S3Uri(req) inval_info = cf.GetInvalInfo(cfuri) st = inval_info['inval_status'].info paths = st['InvalidationBatch']['Path'] nr_of_paths = len(paths) if isinstance(paths, list) else 1 pretty_output("URI", str(cfuri)) pretty_output("Status", st['Status']) pretty_output("Created", st['CreateTime']) 
pretty_output("Nr of paths", nr_of_paths) pretty_output("Reference", st['InvalidationBatch']['CallerReference']) output("") @staticmethod def invalidate(args): cfg = Config() cf = CloudFront(cfg) s3 = S3(cfg) bucket_paths = defaultdict(list) for arg in args: uri = S3Uri(arg) uobject = uri.object() if not uobject: # If object is not defined, we want to invalidate the whole bucket uobject = '*' elif uobject[-1] == '/': # If object is folder (ie prefix), we want to invalidate the whole content uobject += '*' bucket_paths[uri.bucket()].append(uobject) ret = EX_OK params = [] for bucket, paths in bucket_paths.items(): base_uri = S3Uri(u's3://%s' % bucket) cfuri = next(iter(cf.get_dist_name_for_bucket(base_uri))) default_index_file = None if cfg.invalidate_default_index_on_cf or cfg.invalidate_default_index_root_on_cf: info_response = s3.website_info(base_uri, cfg.bucket_location) if info_response: default_index_file = info_response['index_document'] if not default_index_file: default_index_file = None if cfg.dry_run: fulluri_paths = [S3UriS3.compose_uri(bucket, path) for path in paths] output(u"[--dry-run] Would invalidate %r" % fulluri_paths) continue params.append((bucket, paths, base_uri, cfuri, default_index_file)) if cfg.dry_run: warning(u"Exiting now because of --dry-run") return EX_OK nb_success = 0 first = True for bucket, paths, base_uri, cfuri, default_index_file in params: if not first: output("") else: first = False results = cf.InvalidateObjects( cfuri, paths, default_index_file, cfg.invalidate_default_index_on_cf, cfg.invalidate_default_index_root_on_cf ) dist_id = cfuri.dist_id() pretty_output("URI", str(base_uri)) pretty_output("DistId", dist_id) pretty_output("Nr of paths", len(paths)) for result in results: result_code = result['status'] if result_code != 201: pretty_output("Status", "Failed: %d" % result_code) ret = EX_GENERAL continue request_id = result['request_id'] nb_success += 1 pretty_output("Status", "Created") pretty_output("RequestId", request_id) pretty_output("Info", u"Check progress with: s3cmd cfinvalinfo %s/%s" % (dist_id, request_id)) if ret != EX_OK and cfg.stop_on_error: error(u"Exiting now because of --stop-on-error") break if ret != EX_OK and nb_success: ret = EX_PARTIAL return ret # vim:et:ts=4:sts=4:ai s3cmd-2.4.0/S3/Custom_httplib3x.py0000664000175100017510000002636114534034713016242 0ustar floflo00000000000000from __future__ import absolute_import, print_function import os import sys import http.client as httplib from http.client import (_CS_REQ_SENT, _CS_REQ_STARTED, CONTINUE, UnknownProtocol, CannotSendHeader, NO_CONTENT, NOT_MODIFIED, EXPECTATION_FAILED, HTTPMessage, HTTPException) from io import StringIO from .BaseUtils import encode_to_s3 _METHODS_EXPECTING_BODY = ['PATCH', 'POST', 'PUT'] # Fixed python 2.X httplib to be able to support # Expect: 100-Continue http feature # Inspired by: # http://bugs.python.org/file26357/issue1346874-273.patch def _encode(data, name='data'): """Call data.encode("latin-1") but show a better error message.""" try: return data.encode("latin-1") except UnicodeEncodeError as err: # The following is equivalent to raise Exception() from None # but is still byte-compilable compatible with python 2. exc = UnicodeEncodeError( err.encoding, err.object, err.start, err.end, "%s (%.20r) is not valid Latin-1. Use %s.encode('utf-8') " "if you want to send it encoded in UTF-8." 
% (name.title(), data[err.start:err.end], name)) exc.__cause__ = None raise exc def httpresponse_patched_begin(self): """ Re-implemented httplib begin function to not loop over "100 CONTINUE" status replies but to report it to higher level so it can be processed. """ if self.headers is not None: # we've already started reading the response return # read only one status even if we get a non-100 response version, status, reason = self._read_status() self.code = self.status = status self.reason = reason.strip() if version in ('HTTP/1.0', 'HTTP/0.9'): # Some servers might still return "0.9", treat it as 1.0 anyway self.version = 10 elif version.startswith('HTTP/1.'): self.version = 11 # use HTTP/1.1 code for HTTP/1.x where x>=1 else: raise UnknownProtocol(version) self.headers = self.msg = httplib.parse_headers(self.fp) if self.debuglevel > 0: for hdr in self.headers: print("header:", hdr, end=" ") # are we using the chunked-style of transfer encoding? tr_enc = self.headers.get('transfer-encoding') if tr_enc and tr_enc.lower() == "chunked": self.chunked = True self.chunk_left = None else: self.chunked = False # will the connection close at the end of the response? self.will_close = self._check_close() # do we have a Content-Length? # NOTE: RFC 2616, S4.4, #3 says we ignore this if tr_enc is "chunked" self.length = None length = self.headers.get('content-length') if length and not self.chunked: try: self.length = int(length) except ValueError: self.length = None else: if self.length < 0: # ignore nonsensical negative lengths self.length = None else: self.length = None # does the body have a fixed length? (of zero) if (status == NO_CONTENT or status == NOT_MODIFIED or 100 <= status < 200 or # 1xx codes self._method == 'HEAD'): self.length = 0 # if the connection remains open, and we aren't using chunked, and # a content-length was not provided, then assume that the connection # WILL close. if (not self.will_close and not self.chunked and self.length is None): self.will_close = True # No need to override httplib with this one, as it is only used by send_request def httpconnection_patched_get_content_length(body, method): """## REIMPLEMENTED because new in last httplib but needed by send_request""" """Get the content-length based on the body. If the body is None, we set Content-Length: 0 for methods that expect a body (RFC 7230, Section 3.3.2). We also set the Content-Length for any method if the body is a str or bytes-like object and not a file. """ if body is None: # do an explicit check for not None here to distinguish # between unset and set but empty if method.upper() in _METHODS_EXPECTING_BODY: return 0 else: return None if hasattr(body, 'read'): # file-like object. return None try: # does it implement the buffer protocol (bytes, bytearray, array)? mv = memoryview(body) return mv.nbytes except TypeError: pass if isinstance(body, str): return len(body) return None def httpconnection_patched_send_request(self, method, url, body, headers, encode_chunked=False): # Honor explicitly requested Host: and Accept-Encoding: headers. 
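    # Illustrative note (not in the original source): this patched
    # _send_request mainly exists to support the "Expect: 100-continue"
    # handshake (see the use_http_expect option in Config). When that header
    # is present, the body is only sent after the server answers
    # "100 CONTINUE"; a "417 Expectation Failed" raises ExpectationFailed
    # instead of sending the body.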
header_names = dict.fromkeys([k.lower() for k in headers]) skips = {} if 'host' in header_names: skips['skip_host'] = 1 if 'accept-encoding' in header_names: skips['skip_accept_encoding'] = 1 expect_continue = False for hdr, value in headers.items(): if 'expect' == hdr.lower() and '100-continue' in value.lower(): expect_continue = True self.putrequest(method, url, **skips) # chunked encoding will happen if HTTP/1.1 is used and either # the caller passes encode_chunked=True or the following # conditions hold: # 1. content-length has not been explicitly set # 2. the body is a file or iterable, but not a str or bytes-like # 3. Transfer-Encoding has NOT been explicitly set by the caller if 'content-length' not in header_names: # only chunk body if not explicitly set for backwards # compatibility, assuming the client code is already handling the # chunking if 'transfer-encoding' not in header_names: # if content-length cannot be automatically determined, fall # back to chunked encoding encode_chunked = False content_length = httpconnection_patched_get_content_length(body, method) if content_length is None: if body is not None: if self.debuglevel > 0: print('Unable to determine size of %r' % body) encode_chunked = True self.putheader('Transfer-Encoding', 'chunked') else: self.putheader('Content-Length', str(content_length)) else: encode_chunked = False for hdr, value in headers.items(): self.putheader(encode_to_s3(hdr), encode_to_s3(value)) if isinstance(body, str): # RFC 2616 Section 3.7.1 says that text default has a # default charset of iso-8859-1. body = _encode(body, 'body') # If an Expect: 100-continue was sent, we need to check for a 417 # Expectation Failed to avoid unnecessarily sending the body # See RFC 2616 8.2.3 if not expect_continue: self.endheaders(body, encode_chunked=encode_chunked) else: if not body: raise HTTPException("A body is required when expecting " "100-continue") self.endheaders() resp = self.getresponse() resp.read() self._HTTPConnection__state = _CS_REQ_SENT if resp.status == EXPECTATION_FAILED: raise ExpectationFailed() elif resp.status == CONTINUE: self.wrapper_send_body(body, encode_chunked) def httpconnection_patched_endheaders(self, message_body=None, encode_chunked=False): """REIMPLEMENTED because new argument encode_chunked added after py 3.4""" """Indicate that the last header line has been sent to the server. This method sends the request to the server. The optional message_body argument can be used to pass a message body associated with the request. """ if self._HTTPConnection__state == _CS_REQ_STARTED: self._HTTPConnection__state = _CS_REQ_SENT else: raise CannotSendHeader() self._send_output(message_body, encode_chunked=encode_chunked) def httpconnection_patched_read_readable(self, readable): """REIMPLEMENTED because needed by send_output and added after py 3.4 """ blocksize = 8192 if self.debuglevel > 0: print("sendIng a read()able") encode = self._is_textIO(readable) if encode and self.debuglevel > 0: print("encoding file using iso-8859-1") while True: datablock = readable.read(blocksize) if not datablock: break if encode: datablock = datablock.encode("iso-8859-1") yield datablock def httpconnection_patched_send_output(self, message_body=None, encode_chunked=False): """REIMPLEMENTED because needed by endheaders and parameter encode_chunked was added""" """Send the currently buffered request and clear the buffer. Appends an extra \\r\\n to the buffer. A message_body may be specified, to be appended to the request. 
""" self._buffer.extend((b"", b"")) msg = b"\r\n".join(self._buffer) del self._buffer[:] self.send(msg) if message_body is not None: self.wrapper_send_body(message_body, encode_chunked) class ExpectationFailed(HTTPException): pass # Wrappers # def httpconnection_patched_wrapper_send_body(self, message_body, encode_chunked=False): # create a consistent interface to message_body if hasattr(message_body, 'read'): # Let file-like take precedence over byte-like. This # is needed to allow the current position of mmap'ed # files to be taken into account. chunks = self._read_readable(message_body) else: try: # this is solely to check to see if message_body # implements the buffer API. it /would/ be easier # to capture if PyObject_CheckBuffer was exposed # to Python. memoryview(message_body) except TypeError: try: chunks = iter(message_body) except TypeError: raise TypeError("message_body should be a bytes-like " "object or an iterable, got %r" % type(message_body)) else: # the object implements the buffer interface and # can be passed directly into socket methods chunks = (message_body,) for chunk in chunks: if not chunk: if self.debuglevel > 0: print('Zero length chunk ignored') continue if encode_chunked and self._http_vsn == 11: # chunked encoding chunk = '{:X}\r\n'.format(len(chunk)).encode('ascii') + chunk \ + b'\r\n' self.send(chunk) if encode_chunked and self._http_vsn == 11: # end chunked transfer self.send(b'0\r\n\r\n') httplib.HTTPResponse.begin = httpresponse_patched_begin httplib.HTTPConnection.endheaders = httpconnection_patched_endheaders httplib.HTTPConnection._send_readable = httpconnection_patched_read_readable httplib.HTTPConnection._send_output = httpconnection_patched_send_output httplib.HTTPConnection._send_request = httpconnection_patched_send_request # Interfaces added to httplib.HTTPConnection: httplib.HTTPConnection.wrapper_send_body = httpconnection_patched_wrapper_send_body s3cmd-2.4.0/S3/Config.py0000664000175100017510000007236214534034713014176 0ustar floflo00000000000000# -*- coding: utf-8 -*- ## -------------------------------------------------------------------- ## Amazon S3 manager ## ## Authors : Michal Ludvig (https://www.logix.cz/michal) ## Florent Viard (https://www.sodria.com) ## Copyright : TGRMN Software, Sodria SAS and contributors ## License : GPL Version 2 ## Website : https://s3tools.org ## -------------------------------------------------------------------- from __future__ import absolute_import import logging import datetime import locale import re import os import io import sys import json import time from logging import debug, warning from .ExitCodes import EX_OSFILE try: import dateutil.parser import dateutil.tz except ImportError: sys.stderr.write(u""" !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! ImportError trying to import dateutil.parser and dateutil.tz. Please install the python dateutil module: $ sudo apt-get install python-dateutil or $ sudo yum install python-dateutil or $ pip install python-dateutil !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! """) sys.stderr.flush() sys.exit(EX_OSFILE) try: # python 3 support import httplib except ImportError: import http.client as httplib try: from configparser import (NoOptionError, NoSectionError, MissingSectionHeaderError, ParsingError, ConfigParser as PyConfigParser) except ImportError: # Python2 fallback code from ConfigParser import (NoOptionError, NoSectionError, MissingSectionHeaderError, ParsingError, ConfigParser as PyConfigParser) from . 
import Progress from .SortedDict import SortedDict from .BaseUtils import (s3_quote, getTreeFromXml, getDictFromTree, base_unicodise, dateRFC822toPython) try: unicode except NameError: # python 3 support # In python 3, unicode -> str, and str -> bytes unicode = str def is_bool_true(value): """Check to see if a string is true, yes, on, or 1 value may be a str, or unicode. Return True if it is """ if type(value) == unicode: return value.lower() in ["true", "yes", "on", "1"] elif type(value) == bool and value == True: return True else: return False def is_bool_false(value): """Check to see if a string is false, no, off, or 0 value may be a str, or unicode. Return True if it is """ if type(value) == unicode: return value.lower() in ["false", "no", "off", "0"] elif type(value) == bool and value == False: return True else: return False def is_bool(value): """Check a string value to see if it is bool""" return is_bool_true(value) or is_bool_false(value) class Config(object): _instance = None _parsed_files = [] _doc = {} access_key = u"" secret_key = u"" access_token = u"" _access_token_refresh = True _access_token_expiration = None _access_token_last_update = None host_base = u"s3.amazonaws.com" host_bucket = u"%(bucket)s.s3.amazonaws.com" kms_key = u"" #can't set this and Server Side Encryption at the same time # simpledb_host looks useless, legacy? to remove? simpledb_host = u"sdb.amazonaws.com" cloudfront_host = u"cloudfront.amazonaws.com" verbosity = logging.WARNING progress_meter = sys.stdout.isatty() progress_class = Progress.ProgressCR send_chunk = 64 * 1024 recv_chunk = 64 * 1024 list_md5 = False long_listing = False human_readable_sizes = False extra_headers = SortedDict(ignore_case = True) force = False server_side_encryption = False enable = None get_continue = False put_continue = False upload_id = u"" skip_existing = False recursive = False restore_days = 1 restore_priority = u"Standard" acl_public = None acl_grants = [] acl_revokes = [] proxy_host = u"" proxy_port = 3128 encrypt = False dry_run = False add_encoding_exts = u"" preserve_attrs = True preserve_attrs_list = [ u'uname', # Verbose owner Name (e.g. 'root') u'uid', # Numeric user ID (e.g. 0) u'gname', # Group name (e.g. 'users') u'gid', # Numeric group ID (e.g. 100) u'atime', # Last access timestamp u'mtime', # Modification timestamp u'ctime', # Creation timestamp u'mode', # File mode (e.g. rwxr-xr-x = 755) u'md5', # File MD5 (if known) #u'acl', # Full ACL (not yet supported) ] keep_dirs = False delete_removed = False delete_after = False delete_after_fetch = False max_delete = -1 limit = -1 _doc['delete_removed'] = u"[sync] Remove remote S3 objects when local file has been deleted" delay_updates = False # OBSOLETE gpg_passphrase = u"" gpg_command = u"" gpg_encrypt = u"%(gpg_command)s -c --verbose --no-use-agent --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s %(input_file)s" gpg_decrypt = u"%(gpg_command)s -d --verbose --no-use-agent --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s %(input_file)s" use_https = True ca_certs_file = u"" ssl_client_key_file = u"" ssl_client_cert_file = u"" check_ssl_certificate = True check_ssl_hostname = True bucket_location = u"US" default_mime_type = u"binary/octet-stream" guess_mime_type = True use_mime_magic = True mime_type = u"" enable_multipart = True # Chunk size is at the same time the chunk size and the threshold multipart_chunk_size_mb = 15 # MiB # Maximum chunk size for s3-to-s3 copy is 5 GiB. 
# But, use a lot lower value by default (1GiB) multipart_copy_chunk_size_mb = 1 * 1024 # Maximum chunks on AWS S3, could be different on other S3-compatible APIs multipart_max_chunks = 10000 # List of checks to be performed for 'sync' sync_checks = ['size', 'md5'] # 'weak-timestamp' # List of compiled REGEXPs exclude = [] include = [] # Dict mapping compiled REGEXPs back to their textual form debug_exclude = {} debug_include = {} encoding = locale.getpreferredencoding() or "UTF-8" urlencoding_mode = u"normal" log_target_prefix = u"" reduced_redundancy = False storage_class = u"" follow_symlinks = False # If too big, this value can be overridden by the OS socket timeouts max values. # For example, on Linux, a connection attempt will automatically timeout after 120s. socket_timeout = 300 invalidate_on_cf = False # joseprio: new flags for default index invalidation invalidate_default_index_on_cf = False invalidate_default_index_root_on_cf = True website_index = u"index.html" website_error = u"" website_endpoint = u"http://%(bucket)s.s3-website-%(location)s.amazonaws.com/" additional_destinations = [] files_from = [] cache_file = u"" add_headers = u"" remove_headers = [] expiry_days = u"" expiry_date = u"" expiry_prefix = u"" skip_destination_validation = False signature_v2 = False limitrate = 0 requester_pays = False stop_on_error = False content_disposition = u"" content_type = u"" stats = False # Disabled by default because can create a latency with a CONTINUE status reply # expected for every send file requests. use_http_expect = False signurl_use_https = False # Maximum sleep duration for throttle / limitrate. # s3 will timeout if a request/transfer is stuck for more than a short time throttle_max = 100 public_url_use_https = False connection_pooling = True # How long in seconds a connection can be kept idle in the pool and still # be alive. AWS s3 is supposed to close connections that are idle for 20 # seconds or more, but in real life, undocumented, it closes https conns # after around 6s of inactivity. connection_max_age = 5 # Not an AWS standard # allow the listing results to be returned in unsorted order. # This may be faster when listing very large buckets. list_allow_unordered = False # Maximum attempts of re-issuing failed requests max_retries = 5 ## Creating a singleton def __new__(self, configfile = None, access_key=None, secret_key=None, access_token=None): if self._instance is None: self._instance = object.__new__(self) return self._instance def __init__(self, configfile = None, access_key=None, secret_key=None, access_token=None): if configfile: try: self.read_config_file(configfile) except IOError: if 'AWS_CREDENTIAL_FILE' in os.environ or 'AWS_PROFILE' in os.environ: self.aws_credential_file() # override these if passed on the command-line if access_key and secret_key: self.access_key = access_key self.secret_key = secret_key if access_token: self.access_token = access_token # Do not refresh the IAM role when an access token is provided. self._access_token_refresh = False if len(self.access_key) == 0: env_access_key = os.getenv('AWS_ACCESS_KEY') or os.getenv('AWS_ACCESS_KEY_ID') env_secret_key = os.getenv('AWS_SECRET_KEY') or os.getenv('AWS_SECRET_ACCESS_KEY') env_access_token = os.getenv('AWS_SESSION_TOKEN') or os.getenv('AWS_SECURITY_TOKEN') if env_access_key: if not env_secret_key: raise ValueError( "AWS_ACCESS_KEY environment variable is used but" " AWS_SECRET_KEY variable is missing" ) # py3 getenv returns unicode and py2 returns bytes. 
self.access_key = base_unicodise(env_access_key) self.secret_key = base_unicodise(env_secret_key) if env_access_token: # Do not refresh the IAM role when an access token is provided. self._access_token_refresh = False self.access_token = base_unicodise(env_access_token) else: self.role_config() #TODO check KMS key is valid if self.kms_key and self.server_side_encryption == True: warning('Cannot have server_side_encryption (S3 SSE) and KMS_key set (S3 KMS). KMS encryption will be used. Please set server_side_encryption to False') if self.kms_key and self.signature_v2 == True: raise Exception('KMS encryption requires signature v4. Please set signature_v2 to False') def role_config(self): """ Get credentials from IAM authentication and STS AssumeRole """ try: role_arn = os.environ.get('AWS_ROLE_ARN') if role_arn: role_session_name = 'role-session-%s' % (int(time.time())) params = { 'Action': 'AssumeRole', 'Version': '2011-06-15', 'RoleArn': role_arn, 'RoleSessionName': role_session_name, } web_identity_token_file = os.environ.get('AWS_WEB_IDENTITY_TOKEN_FILE') if web_identity_token_file: with open(web_identity_token_file) as f: web_identity_token = f.read() params['Action'] = 'AssumeRoleWithWebIdentity' params['WebIdentityToken'] = web_identity_token encoded_params = '&'.join([ '%s=%s' % (k, s3_quote(v, unicode_output=True)) for k, v in params.items() ]) sts_endpoint = "sts.amazonaws.com" if os.environ.get("AWS_STS_REGIONAL_ENDPOINTS") == "regional": # Check if the AWS_REGION variable is available to use as a region. region = os.environ.get("AWS_REGION") if not region: # Otherwise use the bucket location region = self.bucket_location sts_endpoint = "sts.%s.amazonaws.com" % region conn = httplib.HTTPSConnection(host=sts_endpoint, timeout=2) conn.request('POST', '/?' + encoded_params) resp = conn.getresponse() resp_content = resp.read() if resp.status == 200 and len(resp_content) > 1: tree = getTreeFromXml(resp_content) result_dict = getDictFromTree(tree) if tree.tag == "AssumeRoleResponse": creds = result_dict['AssumeRoleResult']['Credentials'] elif tree.tag == "AssumeRoleWithWebIdentityResponse": creds = result_dict['AssumeRoleWithWebIdentityResult']['Credentials'] else: raise IOError("Unexpected XML message from STS server: <%s />" % tree.tag) Config().update_option('access_key', creds['AccessKeyId']) Config().update_option('secret_key', creds['SecretAccessKey']) Config().update_option('access_token', creds['SessionToken']) expiration = dateRFC822toPython(base_unicodise(creds['Expiration'])) # Add a timedelta to prevent any expiration if the EC2 machine is not at the right date self._access_token_expiration = expiration - datetime.timedelta(minutes=15) # last update date is not provided in STS responses self._access_token_last_update = datetime.datetime.now(dateutil.tz.tzutc()) # Others variables : Code / Type else: raise IOError else: conn = httplib.HTTPConnection(host='169.254.169.254', timeout=2) # To use Instance Metadata Service (IMDSv2), we first need to obtain a token, then # supply it with every IMDS HTTP call. More info: # # https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-service.html # # 60 seconds is arbitrary, but since we're just pulling small bits of data from the # local instance, it should be plenty of time. # # There's a chance that there are "mostly AWS compatible" systems that might offer # only IMDSv1 emulation, so we make this optional -- if we can't get the token, we # just proceed without. 
# # More discussion at https://github.com/Hyperbase/hyperbase/pull/22259 # imds_auth = {} try: imds_ttl = {"X-aws-ec2-metadata-token-ttl-seconds": "60"} conn.request('PUT', "/latest/api/token", headers=imds_ttl) resp = conn.getresponse() resp_content = resp.read() if resp.status == 200: imds_token = base_unicodise(resp_content) imds_auth = {"X-aws-ec2-metadata-token": imds_token} except Exception: # Ensure to close the connection in case of timeout or # anything. This will avoid CannotSendRequest errors for # the next request. conn.close() conn.request('GET', "/latest/meta-data/iam/security-credentials/", headers=imds_auth) resp = conn.getresponse() files = resp.read() if resp.status == 200 and len(files) > 1: conn.request('GET', "/latest/meta-data/iam/security-credentials/%s" % files.decode('utf-8'), headers=imds_auth) resp=conn.getresponse() if resp.status == 200: resp_content = base_unicodise(resp.read()) creds = json.loads(resp_content) Config().update_option('access_key', base_unicodise(creds['AccessKeyId'])) Config().update_option('secret_key', base_unicodise(creds['SecretAccessKey'])) Config().update_option('access_token', base_unicodise(creds['Token'])) expiration = dateRFC822toPython(base_unicodise(creds['Expiration'])) # Add a timedelta to prevent any expiration if the EC2 machine is not at the right date self._access_token_expiration = expiration - datetime.timedelta(minutes=15) self._access_token_last_update = dateRFC822toPython(base_unicodise(creds['LastUpdated'])) # Others variables : Code / Type else: raise IOError else: raise IOError except Exception: raise def role_refresh(self): if self._access_token_refresh: now = datetime.datetime.now(dateutil.tz.tzutc()) if self._access_token_expiration \ and now < self._access_token_expiration \ and self._access_token_last_update \ and self._access_token_last_update <= now: # current token is still valid. No need to refresh it return try: self.role_config() except Exception: warning("Could not refresh role") def aws_credential_file(self): try: aws_credential_file = os.path.expanduser('~/.aws/credentials') credential_file_from_env = os.environ.get('AWS_CREDENTIAL_FILE') if credential_file_from_env and \ os.path.isfile(credential_file_from_env): aws_credential_file = base_unicodise(credential_file_from_env) elif not os.path.isfile(aws_credential_file): return config = PyConfigParser() debug("Reading AWS credentials from %s" % (aws_credential_file)) with io.open(aws_credential_file, "r", encoding=getattr(self, 'encoding', 'UTF-8')) as fp: config_string = fp.read() try: try: # readfp is replaced by read_file in python3, # but so far readfp it is still available. 
config.readfp(io.StringIO(config_string)) except MissingSectionHeaderError: # if header is missing, this could be deprecated # credentials file format as described here: # https://blog.csanchez.org/2011/05/ # then do the hacky-hack and add default header # to be able to read the file with PyConfigParser() config_string = u'[default]\n' + config_string config.readfp(io.StringIO(config_string)) except ParsingError as exc: raise ValueError( "Error reading aws_credential_file " "(%s): %s" % (aws_credential_file, str(exc))) profile = base_unicodise(os.environ.get('AWS_PROFILE', "default")) debug("Using AWS profile '%s'" % (profile)) # get_key - helper function to read the aws profile credentials # including the legacy ones as described here: # https://blog.csanchez.org/2011/05/ def get_key(profile, key, legacy_key, print_warning=True): result = None try: result = config.get(profile, key) except NoOptionError as e: # we may want to skip warning message for optional keys if print_warning: warning("Couldn't find key '%s' for the AWS Profile " "'%s' in the credentials file '%s'", e.option, e.section, aws_credential_file) # if the legacy_key defined and original one wasn't found, # try read the legacy_key if legacy_key: try: key = legacy_key profile = "default" result = config.get(profile, key) warning( "Legacy configuration key '%s' used, please use" " the standardized config format as described " "here: https://aws.amazon.com/blogs/security/a-new-and-standardized-way-to-manage-credentials-in-the-aws-sdks/", key) except NoOptionError as e: pass if result: debug("Found the configuration option '%s' for the AWS " "Profile '%s' in the credentials file %s", key, profile, aws_credential_file) return result profile_access_key = get_key(profile, "aws_access_key_id", "AWSAccessKeyId") if profile_access_key: Config().update_option('access_key', base_unicodise(profile_access_key)) profile_secret_key = get_key(profile, "aws_secret_access_key", "AWSSecretKey") if profile_secret_key: Config().update_option('secret_key', base_unicodise(profile_secret_key)) profile_access_token = get_key(profile, "aws_session_token", None, False) if profile_access_token: Config().update_option('access_token', base_unicodise(profile_access_token)) except IOError as e: warning("Errno %d accessing credentials file %s", e.errno, aws_credential_file) except NoSectionError as e: warning("Couldn't find AWS Profile '%s' in the credentials file " "'%s'", profile, aws_credential_file) def option_list(self): retval = [] for option in dir(self): ## Skip attributes that start with underscore or are not string, int or bool option_type = type(getattr(Config, option)) if option.startswith("_") or \ not (option_type in ( type(u"string"), # str type(42), # int type(True))): # bool continue retval.append(option) return retval def read_config_file(self, configfile): cp = ConfigParser(configfile) for option in self.option_list(): _option = cp.get(option) if _option is not None: _option = _option.strip() self.update_option(option, _option) # allow acl_public to be set from the config file too, even though by # default it is set to None, and not present in the config file. 
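        # Illustrative .s3cfg entries handled below (hypothetical values,
        # not in the original source):
        #   acl_public = True
        #   add_headers = Cache-Control: max-age=86400, X-Robots-Tag: noindex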
if cp.get('acl_public'): self.update_option('acl_public', cp.get('acl_public')) if cp.get('add_headers'): for option in cp.get('add_headers').split(","): (key, value) = option.split(':', 1) self.extra_headers[key.strip()] = value.strip() self._parsed_files.append(configfile) def dump_config(self, stream): ConfigDumper(stream).dump(u"default", self) def update_option(self, option, value): if value is None: return #### Handle environment reference if unicode(value).startswith("$"): return self.update_option(option, os.getenv(value[1:])) #### Special treatment of some options ## verbosity must be known to "logging" module if option == "verbosity": # support integer verboisities try: value = int(value) except ValueError: try: # otherwise it must be a key known to the logging module try: # python 3 support value = logging._levelNames[value] except AttributeError: value = logging._nameToLevel[value] except KeyError: raise ValueError("Config: verbosity level '%s' is not valid" % value) elif option == "limitrate": #convert kb,mb to bytes if value.endswith("k") or value.endswith("K"): shift = 10 elif value.endswith("m") or value.endswith("M"): shift = 20 else: shift = 0 try: value = shift and int(value[:-1]) << shift or int(value) except Exception: raise ValueError("Config: value of option %s must have suffix m, k, or nothing, not '%s'" % (option, value)) ## allow yes/no, true/false, on/off and 1/0 for boolean options ## Some options default to None, if that's the case check the value to see if it is bool elif (type(getattr(Config, option)) is type(True) or # Config is bool (getattr(Config, option) is None and is_bool(value))): # Config is None and value is bool if is_bool_true(value): value = True elif is_bool_false(value): value = False else: raise ValueError("Config: value of option '%s' must be Yes or No, not '%s'" % (option, value)) elif type(getattr(Config, option)) is type(42): # int try: value = int(value) except ValueError: raise ValueError("Config: value of option '%s' must be an integer, not '%s'" % (option, value)) elif option in ["host_base", "host_bucket", "cloudfront_host"]: if value.startswith("http://"): value = value[7:] elif value.startswith("https://"): value = value[8:] setattr(Config, option, value) class ConfigParser(object): def __init__(self, file, sections = []): self.cfg = {} self.parse_file(file, sections) def parse_file(self, file, sections = []): debug("ConfigParser: Reading file '%s'" % file) if type(sections) != type([]): sections = [sections] in_our_section = True r_comment = re.compile(r'^\s*#.*') r_empty = re.compile(r'^\s*$') r_section = re.compile(r'^\[([^\]]+)\]') r_data = re.compile(r'^\s*(?P\w+)\s*=\s*(?P.*)') r_quotes = re.compile(r'^"(.*)"\s*$') with io.open(file, "r", encoding=self.get('encoding', 'UTF-8')) as fp: for line in fp: if r_comment.match(line) or r_empty.match(line): continue is_section = r_section.match(line) if is_section: section = is_section.groups()[0] in_our_section = (section in sections) or (len(sections) == 0) continue is_data = r_data.match(line) if is_data and in_our_section: data = is_data.groupdict() if r_quotes.match(data["value"]): data["value"] = data["value"][1:-1] self.__setitem__(data["key"], data["value"]) if data["key"] in ("access_key", "secret_key", "gpg_passphrase"): print_value = ("%s...%d_chars...%s") % (data["value"][:2], len(data["value"]) - 3, data["value"][-1:]) else: print_value = data["value"] debug("ConfigParser: %s->%s" % (data["key"], print_value)) continue warning("Ignoring invalid line in '%s': %s" % (file, 
line)) def __getitem__(self, name): return self.cfg[name] def __setitem__(self, name, value): self.cfg[name] = value def get(self, name, default = None): if name in self.cfg: return self.cfg[name] return default class ConfigDumper(object): def __init__(self, stream): self.stream = stream def dump(self, section, config): self.stream.write(u"[%s]\n" % section) for option in config.option_list(): value = getattr(config, option) if option == "verbosity": # we turn level numbers back into strings if possible if isinstance(value, int): try: try: # python 3 support value = logging._levelNames[value] except AttributeError: value = logging._levelToName[value] except KeyError: pass self.stream.write(u"%s = %s\n" % (option, value)) # vim:et:ts=4:sts=4:ai s3cmd-2.4.0/S3/ACL.py0000664000175100017510000002063114534034713013360 0ustar floflo00000000000000# -*- coding: utf-8 -*- ## -------------------------------------------------------------------- ## Amazon S3 - Access Control List representation ## ## Authors : Michal Ludvig (https://www.logix.cz/michal) ## Florent Viard (https://www.sodria.com) ## Copyright : TGRMN Software, Sodria SAS and contributors ## License : GPL Version 2 ## Website : https://s3tools.org ## -------------------------------------------------------------------- from __future__ import absolute_import, print_function import sys from .BaseUtils import getTreeFromXml, encode_to_s3, decode_from_s3 from .Utils import deunicodise try: import xml.etree.ElementTree as ET except ImportError: import elementtree.ElementTree as ET PY3 = (sys.version_info >= (3, 0)) class Grantee(object): ALL_USERS_URI = "http://acs.amazonaws.com/groups/global/AllUsers" LOG_DELIVERY_URI = "http://acs.amazonaws.com/groups/s3/LogDelivery" def __init__(self): self.xsi_type = None self.tag = None self.name = None self.display_name = '' self.permission = None def __repr__(self): return repr('Grantee("%(tag)s", "%(name)s", "%(permission)s")' % { "tag" : self.tag, "name" : self.name, "permission" : self.permission }) def isAllUsers(self): return self.tag == "URI" and self.name == Grantee.ALL_USERS_URI def isAnonRead(self): return self.isAllUsers() and (self.permission == "READ" or self.permission == "FULL_CONTROL") def isAnonWrite(self): return self.isAllUsers() and (self.permission == "WRITE" or self.permission == "FULL_CONTROL") def getElement(self): el = ET.Element("Grant") grantee = ET.SubElement(el, "Grantee", { 'xmlns:xsi' : 'http://www.w3.org/2001/XMLSchema-instance', 'xsi:type' : self.xsi_type }) name = ET.SubElement(grantee, self.tag) name.text = self.name permission = ET.SubElement(el, "Permission") permission.text = self.permission return el class GranteeAnonRead(Grantee): def __init__(self): Grantee.__init__(self) self.xsi_type = "Group" self.tag = "URI" self.name = Grantee.ALL_USERS_URI self.permission = "READ" class GranteeLogDelivery(Grantee): def __init__(self, permission): """ permission must be either READ_ACP or WRITE """ Grantee.__init__(self) self.xsi_type = "Group" self.tag = "URI" self.name = Grantee.LOG_DELIVERY_URI self.permission = permission class ACL(object): EMPTY_ACL = b"" def __init__(self, xml = None): if not xml: xml = ACL.EMPTY_ACL self.grantees = [] self.owner_id = "" self.owner_nick = "" tree = getTreeFromXml(encode_to_s3(xml)) self.parseOwner(tree) self.parseGrants(tree) def parseOwner(self, tree): self.owner_id = tree.findtext(".//Owner//ID") self.owner_nick = tree.findtext(".//Owner//DisplayName") def parseGrants(self, tree): for grant in tree.findall(".//Grant"): grantee = 
Grantee() g = grant.find(".//Grantee") grantee.xsi_type = g.attrib['{http://www.w3.org/2001/XMLSchema-instance}type'] grantee.permission = grant.find('Permission').text for el in g: if el.tag == "DisplayName": grantee.display_name = el.text else: grantee.tag = el.tag grantee.name = el.text self.grantees.append(grantee) def getGrantList(self): acl = [] for grantee in self.grantees: if grantee.display_name: user = grantee.display_name elif grantee.isAllUsers(): user = "*anon*" else: user = grantee.name acl.append({'grantee': user, 'permission': grantee.permission}) return acl def getOwner(self): return { 'id' : self.owner_id, 'nick' : self.owner_nick } def isAnonRead(self): for grantee in self.grantees: if grantee.isAnonRead(): return True return False def isAnonWrite(self): for grantee in self.grantees: if grantee.isAnonWrite(): return True return False def grantAnonRead(self): if not self.isAnonRead(): self.appendGrantee(GranteeAnonRead()) def revokeAnonRead(self): self.grantees = [g for g in self.grantees if not g.isAnonRead()] def revokeAnonWrite(self): self.grantees = [g for g in self.grantees if not g.isAnonWrite()] def appendGrantee(self, grantee): self.grantees.append(grantee) def hasGrant(self, name, permission): name = name.lower() permission = permission.upper() for grantee in self.grantees: if grantee.name.lower() == name: if grantee.permission == "FULL_CONTROL": return True elif grantee.permission.upper() == permission: return True return False def grant(self, name, permission): if self.hasGrant(name, permission): return permission = permission.upper() if "ALL" == permission: permission = "FULL_CONTROL" if "FULL_CONTROL" == permission: self.revoke(name, "ALL") grantee = Grantee() grantee.name = name grantee.permission = permission if '@' in name: grantee.name = grantee.name.lower() grantee.xsi_type = "AmazonCustomerByEmail" grantee.tag = "EmailAddress" elif 'http://acs.amazonaws.com/groups/' in name: grantee.xsi_type = "Group" grantee.tag = "URI" else: grantee.name = grantee.name.lower() grantee.xsi_type = "CanonicalUser" grantee.tag = "ID" self.appendGrantee(grantee) def revoke(self, name, permission): name = name.lower() permission = permission.upper() if "ALL" == permission: self.grantees = [g for g in self.grantees if not (g.name.lower() == name or (g.display_name is not None and g.display_name.lower() == name))] else: self.grantees = [g for g in self.grantees if not (((g.display_name is not None and g.display_name.lower() == name) or g.name.lower() == name) and g.permission.upper() == permission)] def get_printable_tree(self): tree = getTreeFromXml(ACL.EMPTY_ACL) tree.attrib['xmlns'] = "http://s3.amazonaws.com/doc/2006-03-01/" owner = tree.find(".//Owner//ID") owner.text = self.owner_id acl = tree.find(".//AccessControlList") for grantee in self.grantees: acl.append(grantee.getElement()) return tree def __unicode__(self): return decode_from_s3(ET.tostring(self.get_printable_tree())) def __str__(self): if PY3: # Return unicode return ET.tostring(self.get_printable_tree(), encoding="unicode") else: # Return bytes return ET.tostring(self.get_printable_tree()) if __name__ == "__main__": xml = b""" 12345678901234567890 owner-nickname 12345678901234567890 owner-nickname FULL_CONTROL http://acs.amazonaws.com/groups/global/AllUsers READ """ acl = ACL(xml) print("Grants:", acl.getGrantList()) acl.revokeAnonRead() print("Grants:", acl.getGrantList()) acl.grantAnonRead() print("Grants:", acl.getGrantList()) print(acl) # vim:et:ts=4:sts=4:ai 
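# --------------------------------------------------------------------
# Illustrative sketch, not part of the upstream module: a tiny helper
# mirroring the grantee-type dispatch performed by ACL.grant() above.
# The name classify_grantee is hypothetical and exists only to document
# the behaviour.
# --------------------------------------------------------------------
def classify_grantee(name):
    """Return the (xsi_type, tag) pair that ACL.grant() would pick for 'name'."""
    if '@' in name:
        # e-mail style grantee
        return ("AmazonCustomerByEmail", "EmailAddress")
    elif 'http://acs.amazonaws.com/groups/' in name:
        # predefined group, e.g. Grantee.ALL_USERS_URI
        return ("Group", "URI")
    else:
        # canonical user id
        return ("CanonicalUser", "ID")

# For instance:
#   classify_grantee("user@example.com")    -> ("AmazonCustomerByEmail", "EmailAddress")
#   classify_grantee(Grantee.ALL_USERS_URI) -> ("Group", "URI")
#   classify_grantee("1234567890abcdef")    -> ("CanonicalUser", "ID")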
s3cmd-2.4.0/S3/HashCache.py0000664000175100017510000000364514534034713014576 0ustar floflo00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import try: # python 3 support import cPickle as pickle except ImportError: import pickle from .Utils import deunicodise class HashCache(object): def __init__(self): self.inodes = dict() def add(self, dev, inode, mtime, size, md5): if dev == 0 or inode == 0: return # Windows if dev not in self.inodes: self.inodes[dev] = dict() if inode not in self.inodes[dev]: self.inodes[dev][inode] = dict() self.inodes[dev][inode][mtime] = dict(md5=md5, size=size) def md5(self, dev, inode, mtime, size): try: d = self.inodes[dev][inode][mtime] if d['size'] != size: return None except Exception: return None return d['md5'] def mark_all_for_purge(self): for d in tuple(self.inodes): for i in tuple(self.inodes[d]): for c in tuple(self.inodes[d][i]): self.inodes[d][i][c]['purge'] = True def unmark_for_purge(self, dev, inode, mtime, size): try: d = self.inodes[dev][inode][mtime] except KeyError: return if d['size'] == size and 'purge' in d: del self.inodes[dev][inode][mtime]['purge'] def purge(self): for d in tuple(self.inodes): for i in tuple(self.inodes[d]): for m in tuple(self.inodes[d][i]): if 'purge' in self.inodes[d][i][m]: del self.inodes[d][i] break def save(self, f): d = dict(inodes=self.inodes, version=1) with open(deunicodise(f), 'wb') as fp: pickle.dump(d, fp) def load(self, f): with open(deunicodise(f), 'rb') as fp: d = pickle.load(fp) if d.get('version') == 1 and 'inodes' in d: self.inodes = d['inodes'] s3cmd-2.4.0/S3/BaseUtils.py0000664000175100017510000002432414534034713014657 0ustar floflo00000000000000# -*- coding: utf-8 -*- ## -------------------------------------------------------------------- ## Amazon S3 manager ## ## Authors : Michal Ludvig (https://www.logix.cz/michal) ## Florent Viard (https://www.sodria.com) ## Copyright : TGRMN Software, Sodria SAS and contributors ## License : GPL Version 2 ## Website : https://s3tools.org ## -------------------------------------------------------------------- from __future__ import absolute_import, division import functools import re import posixpath import sys from calendar import timegm from hashlib import md5 from logging import debug, warning, error import xml.dom.minidom import xml.etree.ElementTree as ET from .ExitCodes import EX_OSFILE try: import dateutil.parser except ImportError: sys.stderr.write(u""" !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! ImportError trying to import dateutil.parser. Please install the python dateutil module: $ sudo apt-get install python-dateutil or $ sudo yum install python-dateutil or $ pip install python-dateutil !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! """) sys.stderr.flush() sys.exit(EX_OSFILE) try: from urllib import quote except ImportError: # python 3 support from urllib.parse import quote try: unicode = unicode except NameError: # python 3 support # In python 3, unicode -> str, and str -> bytes unicode = str __all__ = [] s3path = posixpath __all__.append("s3path") try: md5() except ValueError as exc: # md5 is disabled for FIPS-compliant Python builds. # Since s3cmd does not use md5 in a security context, # it is safe to allow the use of it by setting useforsecurity to False. 
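# Illustrative HashCache usage (a sketch, not upstream code; the device,
# inode, timestamp and path values below are made up). HashCache, defined
# in S3/HashCache.py above, maps (dev, inode, mtime) -> {md5, size} so that
# sync can skip re-hashing local files that have not changed:
#
#     from S3.HashCache import HashCache
#     cache = HashCache()
#     cache.add(dev=2049, inode=131072, mtime=1700000000, size=4096,
#               md5="d41d8cd98f00b204e9800998ecf8427e")
#     cache.md5(2049, 131072, 1700000000, 4096)   # cache hit -> md5 given above
#     cache.md5(2049, 131072, 1700000000, 8192)   # size mismatch -> None
#     cache.save(u"/tmp/hashcache.pickle")        # pickled as {'version': 1, 'inodes': ...}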
try: md5(usedforsecurity=False) md5 = functools.partial(md5, usedforsecurity=False) except Exception: # "usedforsecurity" is only available on python >= 3.9 or RHEL distributions raise exc __all__.append("md5") RE_S3_DATESTRING = re.compile('\\.[0-9]*(?:[Z\\-\\+]*?)') RE_XML_NAMESPACE = re.compile(b'^(]+?>\\s*|\\s*)(<\\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE) # Date and time helpers def dateS3toPython(date): # Reset milliseconds to 000 date = RE_S3_DATESTRING.sub(".000", date) return dateutil.parser.parse(date, fuzzy=True) __all__.append("dateS3toPython") def dateS3toUnix(date): ## NOTE: This is timezone-aware and return the timestamp regarding GMT return timegm(dateS3toPython(date).utctimetuple()) __all__.append("dateS3toUnix") def dateRFC822toPython(date): """ Convert a string formatted like '2020-06-27T15:56:34Z' into a python datetime """ return dateutil.parser.parse(date, fuzzy=True) __all__.append("dateRFC822toPython") def dateRFC822toUnix(date): return timegm(dateRFC822toPython(date).utctimetuple()) __all__.append("dateRFC822toUnix") def formatDateTime(s3timestamp): date_obj = dateutil.parser.parse(s3timestamp, fuzzy=True) return date_obj.strftime("%Y-%m-%d %H:%M") __all__.append("formatDateTime") # Encoding / Decoding def base_unicodise(string, encoding='UTF-8', errors='replace', silent=False): """ Convert 'string' to Unicode or raise an exception. """ if type(string) == unicode: return string if not silent: debug("Unicodising %r using %s" % (string, encoding)) try: return unicode(string, encoding, errors) except UnicodeDecodeError: raise UnicodeDecodeError("Conversion to unicode failed: %r" % string) __all__.append("base_unicodise") def base_deunicodise(string, encoding='UTF-8', errors='replace', silent=False): """ Convert unicode 'string' to , by default replacing all invalid characters with '?' or raise an exception. """ if type(string) != unicode: return string if not silent: debug("DeUnicodising %r using %s" % (string, encoding)) try: return string.encode(encoding, errors) except UnicodeEncodeError: raise UnicodeEncodeError("Conversion from unicode failed: %r" % string) __all__.append("base_deunicodise") def decode_from_s3(string, errors = "replace"): """ Convert S3 UTF-8 'string' to Unicode or raise an exception. """ return base_unicodise(string, "UTF-8", errors, True) __all__.append("decode_from_s3") def encode_to_s3(string, errors='replace'): """ Convert Unicode to S3 UTF-8 'string', by default replacing all invalid characters with '?' or raise an exception. """ return base_deunicodise(string, "UTF-8", errors, True) __all__.append("encode_to_s3") def s3_quote(param, quote_backslashes=True, unicode_output=False): """ URI encode every byte. UriEncode() must enforce the following rules: - URI encode every byte except the unreserved characters: 'A'-'Z', 'a'-'z', '0'-'9', '-', '.', '_', and '~'. - The space character is a reserved character and must be encoded as "%20" (and not as "+"). - Each URI encoded byte is formed by a '%' and the two-digit hexadecimal value of the byte. - Letters in the hexadecimal value must be uppercase, for example "%1A". - Encode the forward slash character, '/', everywhere except in the object key name. For example, if the object key name is photos/Jan/sample.jpg, the forward slash in the key name is not encoded. 
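    Illustrative examples (assumed, following the rules above and standard
    urllib quote() behaviour):

        s3_quote(u"photos/Jan/my file.jpg", quote_backslashes=False,
                 unicode_output=True)   # -> u"photos/Jan/my%20file.jpg"
        s3_quote(u"photos/Jan/my file.jpg", quote_backslashes=True,
                 unicode_output=True)   # -> u"photos%2FJan%2Fmy%20file.jpg"

    With unicode_output=False (the default), the same values are returned
    as UTF-8 encoded bytes.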
""" if quote_backslashes: safe_chars = "~" else: safe_chars = "~/" param = encode_to_s3(param) param = quote(param, safe=safe_chars) if unicode_output: param = decode_from_s3(param) else: param = encode_to_s3(param) return param __all__.append("s3_quote") def base_urlencode_string(string, urlencoding_mode = None, unicode_output=False): string = encode_to_s3(string) if urlencoding_mode == "verbatim": ## Don't do any pre-processing return string encoded = quote(string, safe="~/") debug("String '%s' encoded to '%s'" % (string, encoded)) if unicode_output: return decode_from_s3(encoded) else: return encode_to_s3(encoded) __all__.append("base_urlencode_string") def base_replace_nonprintables(string, with_message=False): """ replace_nonprintables(string) Replaces all non-printable characters 'ch' in 'string' where ord(ch) <= 26 with ^@, ^A, ... ^Z """ new_string = "" modified = 0 for c in string: o = ord(c) if (o <= 31): new_string += "^" + chr(ord('@') + o) modified += 1 elif (o == 127): new_string += "^?" modified += 1 else: new_string += c if modified and with_message: warning("%d non-printable characters replaced in: %s" % (modified, new_string)) return new_string __all__.append("base_replace_nonprintables") # XML helpers def parseNodes(nodes): ## WARNING: Ignores text nodes from mixed xml/text. ## For instance some textother text ## will be ignore "some text" node ## WARNING 2: Any node at first level without children will also be ignored retval = [] for node in nodes: retval_item = {} for child in node: name = decode_from_s3(child.tag) if len(child): retval_item[name] = parseNodes([child]) else: found_text = node.findtext(".//%s" % child.tag) if found_text is not None: retval_item[name] = decode_from_s3(found_text) else: retval_item[name] = None if retval_item: retval.append(retval_item) return retval __all__.append("parseNodes") def getPrettyFromXml(xmlstr): xmlparser = xml.dom.minidom.parseString(xmlstr) return xmlparser.toprettyxml() __all__.append("getPrettyFromXml") def stripNameSpace(xml): """ removeNameSpace(xml) -- remove top-level AWS namespace Operate on raw byte(utf-8) xml string. (Not unicode) """ xmlns_match = RE_XML_NAMESPACE.match(xml) if xmlns_match: xmlns = xmlns_match.group(3) xml = RE_XML_NAMESPACE.sub(b"\\1\\2", xml, 1) else: xmlns = None return xml, xmlns __all__.append("stripNameSpace") def getTreeFromXml(xml): xml, xmlns = stripNameSpace(encode_to_s3(xml)) try: tree = ET.fromstring(xml) if xmlns: tree.attrib['xmlns'] = xmlns return tree except Exception as e: error("Error parsing xml: %s", e) error(xml) raise __all__.append("getTreeFromXml") def getListFromXml(xml, node): tree = getTreeFromXml(xml) nodes = tree.findall('.//%s' % (node)) return parseNodes(nodes) __all__.append("getListFromXml") def getDictFromTree(tree): ret_dict = {} for child in tree: if len(child): ## Complex-type child. 
Recurse content = getDictFromTree(child) else: content = decode_from_s3(child.text) if child.text is not None else None child_tag = decode_from_s3(child.tag) if child_tag in ret_dict: if not type(ret_dict[child_tag]) == list: ret_dict[child_tag] = [ret_dict[child_tag]] ret_dict[child_tag].append(content or "") else: ret_dict[child_tag] = content or "" return ret_dict __all__.append("getDictFromTree") def getTextFromXml(xml, xpath): tree = getTreeFromXml(xml) if tree.tag.endswith(xpath): return decode_from_s3(tree.text) if tree.text is not None else None else: result = tree.findtext(xpath) return decode_from_s3(result) if result is not None else None __all__.append("getTextFromXml") def getRootTagName(xml): tree = getTreeFromXml(xml) return decode_from_s3(tree.tag) if tree.tag is not None else None __all__.append("getRootTagName") def xmlTextNode(tag_name, text): el = ET.Element(tag_name) el.text = decode_from_s3(text) return el __all__.append("xmlTextNode") def appendXmlTextNode(tag_name, text, parent): """ Creates a new Node and sets its content to 'text'. Then appends the created Node to 'parent' element if given. Returns the newly created Node. """ el = xmlTextNode(tag_name, text) parent.append(el) return el __all__.append("appendXmlTextNode") # vim:et:ts=4:sts=4:ai s3cmd-2.4.0/S3/ExitCodes.py0000664000175100017510000000431614534034713014652 0ustar floflo00000000000000# -*- coding: utf-8 -*- # patterned on /usr/include/sysexits.h EX_OK = 0 EX_GENERAL = 1 EX_PARTIAL = 2 # some parts of the command succeeded, while others failed EX_SERVERMOVED = 10 # 301: Moved permanently & 307: Moved temp EX_SERVERERROR = 11 # 400, 405, 411, 416, 417, 501: Bad request, 504: Gateway Time-out EX_NOTFOUND = 12 # 404: Not found EX_CONFLICT = 13 # 409: Conflict (ex: bucket error) EX_PRECONDITION = 14 # 412: Precondition failed EX_SERVICE = 15 # 503: Service not available or slow down EX_USAGE = 64 # The command was used incorrectly (e.g. bad command line syntax) EX_DATAERR = 65 # Failed file transfer, upload or download EX_SOFTWARE = 70 # internal software error (e.g. S3 error of unknown specificity) EX_OSERR = 71 # system error (e.g. out of memory) EX_OSFILE = 72 # OS error (e.g. invalid Python version) EX_IOERR = 74 # An error occurred while doing I/O on some file. EX_TEMPFAIL = 75 # temporary failure (S3DownloadError or similar, retry later) EX_ACCESSDENIED = 77 # Insufficient permissions to perform the operation on S3 EX_CONFIG = 78 # Configuration file error EX_CONNECTIONREFUSED = 111 # TCP connection refused (e.g. 
connecting to a closed server port) _EX_SIGNAL = 128 _EX_SIGINT = 2 EX_BREAK = _EX_SIGNAL + _EX_SIGINT # Control-C (KeyboardInterrupt raised) class ExitScoreboard(object): """Helper to return best return code""" def __init__(self): self._success = 0 self._notfound = 0 self._failed = 0 def success(self): self._success += 1 def notfound(self): self._notfound += 1 def failed(self): self._failed += 1 def rc(self): if self._success: if not self._failed and not self._notfound: return EX_OK elif self._failed: return EX_PARTIAL else: if self._failed: return EX_GENERAL else: if self._notfound: return EX_NOTFOUND return EX_GENERAL s3cmd-2.4.0/S3/Progress.py0000664000175100017510000002055314534034713014570 0ustar floflo00000000000000# -*- coding: utf-8 -*- ## -------------------------------------------------------------------- ## Amazon S3 manager ## ## Authors : Michal Ludvig (https://www.logix.cz/michal) ## Florent Viard (https://www.sodria.com) ## Copyright : TGRMN Software, Sodria SAS and contributors ## License : GPL Version 2 ## Website : https://s3tools.org ## -------------------------------------------------------------------- from __future__ import absolute_import, division import sys import datetime import time import S3.Utils class Progress(object): _stdout = sys.stdout _last_display = 0 def __init__(self, labels, total_size): self._stdout = sys.stdout self.new_file(labels, total_size) def new_file(self, labels, total_size): self.labels = labels self.total_size = total_size # Set initial_position to something in the # case we're not counting from 0. For instance # when appending to a partially downloaded file. # Setting initial_position will let the speed # be computed right. self.initial_position = 0 self.current_position = self.initial_position self.time_start = datetime.datetime.now() self.time_last = self.time_start self.time_current = self.time_start self.display(new_file = True) def update(self, current_position = -1, delta_position = -1): self.time_last = self.time_current self.time_current = datetime.datetime.now() if current_position > -1: self.current_position = current_position elif delta_position > -1: self.current_position += delta_position #else: # no update, just call display() self.display() def done(self, message): self.display(done_message = message) def output_labels(self): self._stdout.write(u"%(action)s: '%(source)s' -> '%(destination)s' %(extra)s\n" % self.labels) self._stdout.flush() def _display_needed(self): # We only need to update the display every so often. if time.time() - self._last_display > 1: self._last_display = time.time() return True return False def display(self, new_file = False, done_message = None): """ display(new_file = False[/True], done = False[/True]) Override this method to provide a nicer output. 
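        A subclass only needs to override display(); callers drive it roughly
        like this (a sketch, MyProgress being a hypothetical subclass):

            progress = MyProgress(labels, total_size)  # calls display(new_file=True)
            progress.update(delta_position=n)          # calls display()
            progress.done("done")                      # calls display(done_message="done")

        ProgressANSI and ProgressCR below are two such overrides.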
""" if new_file: self.output_labels() self.last_milestone = 0 return if self.current_position == self.total_size: print_size = S3.Utils.formatSize(self.current_position, True) if print_size[1] != "": print_size[1] += "B" timedelta = self.time_current - self.time_start sec_elapsed = timedelta.days * 86400 + timedelta.seconds + float(timedelta.microseconds) / 1000000.0 print_speed = S3.Utils.formatSize((self.current_position - self.initial_position) / sec_elapsed, True, True) self._stdout.write("100%% %s%s in %.2fs (%.2f %sB/s)\n" % (print_size[0], print_size[1], sec_elapsed, print_speed[0], print_speed[1])) self._stdout.flush() return rel_position = (self.current_position * 100) // self.total_size if rel_position >= self.last_milestone: # Move by increments of 5. # NOTE: to check: Looks like to not do what is looks like to be designed to do self.last_milestone = (rel_position // 5) * 5 self._stdout.write("%d%% ", self.last_milestone) self._stdout.flush() return class ProgressANSI(Progress): ## http://en.wikipedia.org/wiki/ANSI_escape_code SCI = '\x1b[' ANSI_hide_cursor = SCI + "?25l" ANSI_show_cursor = SCI + "?25h" ANSI_save_cursor_pos = SCI + "s" ANSI_restore_cursor_pos = SCI + "u" ANSI_move_cursor_to_column = SCI + "%uG" ANSI_erase_to_eol = SCI + "0K" ANSI_erase_current_line = SCI + "2K" def display(self, new_file = False, done_message = None): """ display(new_file = False[/True], done_message = None) """ if new_file: self.output_labels() self._stdout.write(self.ANSI_save_cursor_pos) self._stdout.flush() return # Only display progress every so often if not (new_file or done_message) and not self._display_needed(): return timedelta = self.time_current - self.time_start sec_elapsed = timedelta.days * 86400 + timedelta.seconds + float(timedelta.microseconds)/1000000.0 if (sec_elapsed > 0): print_speed = S3.Utils.formatSize((self.current_position - self.initial_position) / sec_elapsed, True, True) else: print_speed = (0, "") self._stdout.write(self.ANSI_restore_cursor_pos) self._stdout.write(self.ANSI_erase_to_eol) self._stdout.write("%(current)s of %(total)s %(percent)3d%% in %(elapsed)ds %(speed).2f %(speed_coeff)sB/s" % { "current" : str(self.current_position).rjust(len(str(self.total_size))), "total" : self.total_size, "percent" : self.total_size and ((self.current_position * 100) // self.total_size) or 0, "elapsed" : sec_elapsed, "speed" : print_speed[0], "speed_coeff" : print_speed[1] }) if done_message: self._stdout.write(" %s\n" % done_message) self._stdout.flush() class ProgressCR(Progress): ## Uses CR char (Carriage Return) just like other progress bars do. 
CR_char = chr(13) def display(self, new_file = False, done_message = None): """ display(new_file = False[/True], done_message = None) """ if new_file: self.output_labels() return # Only display progress every so often if not (new_file or done_message) and not self._display_needed(): return timedelta = self.time_current - self.time_start sec_elapsed = timedelta.days * 86400 + timedelta.seconds + float(timedelta.microseconds)/1000000.0 if (sec_elapsed > 0): print_speed = S3.Utils.formatSize((self.current_position - self.initial_position) / sec_elapsed, True, True) else: print_speed = (0, "") self._stdout.write(self.CR_char) output = " %(current)s of %(total)s %(percent)3d%% in %(elapsed)4ds %(speed)7.2f %(speed_coeff)sB/s" % { "current" : str(self.current_position).rjust(len(str(self.total_size))), "total" : self.total_size, "percent" : self.total_size and ((self.current_position * 100) // self.total_size) or 0, "elapsed" : sec_elapsed, "speed" : print_speed[0], "speed_coeff" : print_speed[1] } self._stdout.write(output) if done_message: self._stdout.write(" %s\n" % done_message) self._stdout.flush() class StatsInfo(object): """Holding info for stats totals""" def __init__(self): self.files = None self.size = None self.files_transferred = None self.size_transferred = None self.files_copied = None self.size_copied = None self.files_deleted = None self.size_deleted = None def format_output(self): outstr = u"" if self.files is not None: tmp_str = u"Number of files: %d"% self.files if self.size is not None: tmp_str += " (%d bytes) "% self.size outstr += u"\nStats: " + tmp_str if self.files_transferred: tmp_str = u"Number of files transferred: %d"% self.files_transferred if self.size_transferred is not None: tmp_str += " (%d bytes) "% self.size_transferred outstr += u"\nStats: " + tmp_str if self.files_copied: tmp_str = u"Number of files copied: %d"% self.files_copied if self.size_copied is not None: tmp_str += " (%d bytes) "% self.size_copied outstr += u"\nStats: " + tmp_str if self.files_deleted: tmp_str = u"Number of files deleted: %d"% self.files_deleted if self.size_deleted is not None: tmp_str += " (%d bytes) "% self.size_deleted outstr += u"\nStats: " + tmp_str return outstr # vim:et:ts=4:sts=4:ai s3cmd-2.4.0/S3/S3.py0000664000175100017510000031455414535730271013263 0ustar floflo00000000000000# -*- coding: utf-8 -*- ## -------------------------------------------------------------------- ## Amazon S3 manager ## ## Authors : Michal Ludvig (https://www.logix.cz/michal) ## Florent Viard (https://www.sodria.com) ## Copyright : TGRMN Software, Sodria SAS and contributors ## License : GPL Version 2 ## Website : https://s3tools.org ## -------------------------------------------------------------------- from __future__ import absolute_import, division import sys import os import time import errno import mimetypes import io import pprint from xml.sax import saxutils from socket import timeout as SocketTimeoutException from logging import debug, info, warning, error from stat import ST_SIZE, ST_MODE, S_ISDIR, S_ISREG try: # python 3 support from urlparse import urlparse except ImportError: from urllib.parse import urlparse import select from .BaseUtils import (getListFromXml, getTextFromXml, getRootTagName, decode_from_s3, encode_to_s3, md5, s3_quote) from .Utils import (convertHeaderTupleListToDict, unicodise, deunicodise, check_bucket_name, check_bucket_name_dns_support, getHostnameFromBucket) from .SortedDict import SortedDict from .AccessLog import AccessLog from .ACL import ACL, 
GranteeLogDelivery from .BidirMap import BidirMap from .Config import Config from .Exceptions import * from .MultiPart import MultiPartUpload from .S3Uri import S3Uri from .ConnMan import ConnMan from .Crypto import (sign_request_v2, sign_request_v4, checksum_sha256_file, checksum_sha256_buffer, generate_content_md5, hash_file_md5, calculateChecksum, format_param_str) try: from ctypes import ArgumentError import magic try: ## https://github.com/ahupp/python-magic ## Always expect unicode for python 2 ## (has Magic class but no "open()" function) magic_ = magic.Magic(mime=True) def mime_magic_file(file): return magic_.from_file(file) except TypeError: try: ## file-5.11 built-in python bindings ## Sources: http://www.darwinsys.com/file/ ## Expects unicode since version 5.19, encoded strings before ## we can't tell if a given copy of the magic library will take a ## filesystem-encoded string or a unicode value, so try first ## with the unicode, then with the encoded string. ## (has Magic class and "open()" function) magic_ = magic.open(magic.MAGIC_MIME) magic_.load() def mime_magic_file(file): try: return magic_.file(file) except (UnicodeDecodeError, UnicodeEncodeError, ArgumentError): return magic_.file(deunicodise(file)) except AttributeError: ## http://pypi.python.org/pypi/filemagic ## Accept gracefully both unicode and encoded ## (has Magic class but not "mime" argument and no "open()" function ) magic_ = magic.Magic(flags=magic.MAGIC_MIME) def mime_magic_file(file): return magic_.id_filename(file) except AttributeError: ## Older python-magic versions doesn't have a "Magic" method ## Only except encoded strings ## (has no Magic class but "open()" function) magic_ = magic.open(magic.MAGIC_MIME) magic_.load() def mime_magic_file(file): return magic_.file(deunicodise(file)) except (ImportError, OSError) as e: error_str = str(e) if 'magic' in error_str: magic_message = "Module python-magic is not available." else: magic_message = "Module python-magic can't be used (%s)." % error_str magic_message += " Guessing MIME types based on file extensions." 
magic_warned = False def mime_magic_file(file): global magic_warned if (not magic_warned): warning(magic_message) magic_warned = True return mimetypes.guess_type(file)[0] def mime_magic(file): ## NOTE: So far in the code, "file" var is already unicode def _mime_magic(file): magictype = mime_magic_file(file) return magictype result = _mime_magic(file) if result is not None: if isinstance(result, str): if ';' in result: mimetype, charset = result.split(';') charset = charset[len('charset'):] result = (mimetype, charset) else: result = (result, None) if result is None: result = (None, None) return result EXPECT_CONTINUE_TIMEOUT = 2 SIZE_1MB = 1024 * 1024 __all__ = [] class S3Request(object): region_map = {} ## S3 sometimes sends HTTP-301, HTTP-307 response redir_map = {} def __init__(self, s3, method_string, resource, headers, body, params = None): self.s3 = s3 self.headers = SortedDict(headers or {}, ignore_case = True) if len(self.s3.config.access_token)>0: self.s3.config.role_refresh() self.headers['x-amz-security-token']=self.s3.config.access_token self.resource = resource self.method_string = method_string self.params = params or {} self.body = body self.requester_pays() def requester_pays(self): if self.s3.config.requester_pays and self.method_string in ("GET", "POST", "PUT", "HEAD"): self.headers['x-amz-request-payer'] = 'requester' def update_timestamp(self): if "date" in self.headers: del(self.headers["date"]) self.headers["x-amz-date"] = time.strftime("%a, %d %b %Y %H:%M:%S +0000", time.gmtime()) def use_signature_v2(self): if self.s3.endpoint_requires_signature_v4: return False if self.s3.config.signature_v2 or self.s3.fallback_to_signature_v2: return True return False def sign(self): bucket_name = self.resource.get('bucket') if self.use_signature_v2(): debug("Using signature v2") if bucket_name: resource_uri = "/%s%s" % (bucket_name, self.resource['uri']) else: resource_uri = self.resource['uri'] self.headers = sign_request_v2(self.method_string, resource_uri, self.params, self.headers) else: debug("Using signature v4") hostname = self.s3.get_hostname(self.resource['bucket']) ## Default to bucket part of DNS. ## If bucket is not part of DNS assume path style to complete the request. ## Like for format_uri, take care that redirection could be to base path if bucket_name and ( (bucket_name in S3Request.redir_map and not S3Request.redir_map.get(bucket_name, '').startswith("%s."% bucket_name)) or (bucket_name not in S3Request.redir_map and not check_bucket_name_dns_support(Config().host_bucket, bucket_name)) ): resource_uri = "/%s%s" % (bucket_name, self.resource['uri']) else: resource_uri = self.resource['uri'] bucket_region = S3Request.region_map.get(self.resource['bucket'], Config().bucket_location) ## Sign the data. 
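        # Illustrative outcome of the branch above (values are assumed, for a
        # configuration where host_bucket = "%(bucket)s.s3.amazonaws.com"):
        #   - DNS-compatible bucket "mybucket", no redirection recorded:
        #       hostname     = "mybucket.s3.amazonaws.com"
        #       resource_uri = "/some/key"              (virtual-hosted style)
        #   - bucket name that is not DNS-compatible, e.g. "My_Bucket":
        #       hostname     = host_base
        #       resource_uri = "/My_Bucket/some/key"    (path style)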
self.headers = sign_request_v4(self.method_string, hostname, resource_uri, self.params, bucket_region, self.headers, self.body) def get_triplet(self): self.update_timestamp() self.sign() resource = dict(self.resource) ## take a copy # URL Encode the uri for the http request resource['uri'] = s3_quote(resource['uri'], quote_backslashes=False, unicode_output=True) # Get the final uri by adding the uri parameters resource['uri'] += format_param_str(self.params) return (self.method_string, resource, self.headers) class S3(object): http_methods = BidirMap( GET = 0x01, PUT = 0x02, HEAD = 0x04, DELETE = 0x08, POST = 0x10, MASK = 0x1F, ) targets = BidirMap( SERVICE = 0x0100, BUCKET = 0x0200, OBJECT = 0x0400, BATCH = 0x0800, MASK = 0x0700, ) operations = BidirMap( UNDEFINED = 0x0000, LIST_ALL_BUCKETS = targets["SERVICE"] | http_methods["GET"], BUCKET_CREATE = targets["BUCKET"] | http_methods["PUT"], BUCKET_LIST = targets["BUCKET"] | http_methods["GET"], BUCKET_DELETE = targets["BUCKET"] | http_methods["DELETE"], OBJECT_PUT = targets["OBJECT"] | http_methods["PUT"], OBJECT_GET = targets["OBJECT"] | http_methods["GET"], OBJECT_HEAD = targets["OBJECT"] | http_methods["HEAD"], OBJECT_DELETE = targets["OBJECT"] | http_methods["DELETE"], OBJECT_POST = targets["OBJECT"] | http_methods["POST"], BATCH_DELETE = targets["BATCH"] | http_methods["POST"], ) codes = { "NoSuchBucket" : "Bucket '%s' does not exist", "AccessDenied" : "Access to bucket '%s' was denied", "BucketAlreadyExists" : "Bucket '%s' already exists", } def __init__(self, config): self.config = config self.fallback_to_signature_v2 = False self.endpoint_requires_signature_v4 = False self.expect_continue_not_supported = False def storage_class(self): # Note - you cannot specify GLACIER here # https://docs.aws.amazon.com/AmazonS3/latest/dev/storage-class-intro.html cls = 'STANDARD' if self.config.storage_class != "": return self.config.storage_class if self.config.reduced_redundancy: cls = 'REDUCED_REDUNDANCY' return cls def get_hostname(self, bucket): if bucket and bucket in S3Request.redir_map: host = S3Request.redir_map[bucket] elif bucket and check_bucket_name_dns_support(self.config.host_bucket, bucket): host = getHostnameFromBucket(bucket) else: host = self.config.host_base.lower() # The following hack is needed because it looks like that some servers # are not respecting the HTTP spec and so will fail the signature check # if the port is specified in the "Host" header for default ports. # STUPIDIEST THING EVER FOR A SERVER... 
# See: https://github.com/minio/minio/issues/9169 if self.config.use_https: if host.endswith(':443'): host = host[:-4] elif host.endswith(':80'): host = host[:-3] debug('get_hostname(%s): %s' % (bucket, host)) return host def set_hostname(self, bucket, redir_hostname): S3Request.redir_map[bucket] = redir_hostname.lower() def format_uri(self, resource, base_path=None): bucket_name = resource.get('bucket') if bucket_name and ( (bucket_name in S3Request.redir_map and not S3Request.redir_map.get(bucket_name, '').startswith("%s."% bucket_name)) or (bucket_name not in S3Request.redir_map and not check_bucket_name_dns_support(self.config.host_bucket, bucket_name)) ): uri = "/%s%s" % (s3_quote(bucket_name, quote_backslashes=False, unicode_output=True), resource['uri']) else: uri = resource['uri'] if base_path: uri = "%s%s" % (base_path, uri) if self.config.proxy_host != "" and not self.config.use_https: uri = "http://%s%s" % (self.get_hostname(bucket_name), uri) debug('format_uri(): ' + uri) return uri ## Commands / Actions def list_all_buckets(self): request = self.create_request("LIST_ALL_BUCKETS") response = self.send_request(request) response["list"] = getListFromXml(response["data"], "Bucket") return response def bucket_list(self, bucket, prefix = None, recursive = None, uri_params = None, limit = -1): item_list = [] prefixes = [] for truncated, dirs, objects in self.bucket_list_streaming(bucket, prefix, recursive, uri_params, limit): item_list.extend(objects) prefixes.extend(dirs) response = {} response['list'] = item_list response['common_prefixes'] = prefixes response['truncated'] = truncated return response def bucket_list_streaming(self, bucket, prefix = None, recursive = None, uri_params = None, limit = -1): """ Generator that produces , pairs of groups of content of a specified bucket. """ def _list_truncated(data): ## can either be "true" or "false" or be missing completely is_truncated = getTextFromXml(data, ".//IsTruncated") or "false" return is_truncated.lower() != "false" def _get_contents(data): return getListFromXml(data, "Contents") def _get_common_prefixes(data): return getListFromXml(data, "CommonPrefixes") def _get_next_marker(data, current_elts, key): return getTextFromXml(response["data"], "NextMarker") or current_elts[-1][key] uri_params = uri_params and uri_params.copy() or {} truncated = True prefixes = [] num_objects = 0 num_prefixes = 0 max_keys = limit while truncated: response = self.bucket_list_noparse(bucket, prefix, recursive, uri_params, max_keys) current_list = _get_contents(response["data"]) current_prefixes = _get_common_prefixes(response["data"]) num_objects += len(current_list) num_prefixes += len(current_prefixes) if limit > num_objects + num_prefixes: max_keys = limit - (num_objects + num_prefixes) truncated = _list_truncated(response["data"]) if truncated: if limit == -1 or num_objects + num_prefixes < limit: if current_list: uri_params['marker'] = \ _get_next_marker(response["data"], current_list, "Key") elif current_prefixes: uri_params['marker'] = \ _get_next_marker(response["data"], current_prefixes, "Prefix") else: # Unexpectedly, the server lied, and so the previous # response was not truncated. So, no new key to get. 
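            # Illustrative consumer of this generator (a sketch; the bucket
            # name, prefix and printed fields are made-up examples):
            #
            #     s3 = S3(Config())
            #     for truncated, dirs, objects in s3.bucket_list_streaming(
            #             "mybucket", prefix="logs/", limit=1000):
            #         for d in dirs:        # CommonPrefixes entries
            #             print(d["Prefix"])
            #         for o in objects:     # Contents entries
            #             print(o["Key"], o["Size"])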
yield False, current_prefixes, current_list break debug("Listing continues after '%s'" % uri_params['marker']) else: yield truncated, current_prefixes, current_list break yield truncated, current_prefixes, current_list def bucket_list_noparse(self, bucket, prefix = None, recursive = None, uri_params = None, max_keys = -1): if uri_params is None: uri_params = {} if prefix: uri_params['prefix'] = prefix if not self.config.recursive and not recursive: uri_params['delimiter'] = "/" if max_keys != -1: uri_params['max-keys'] = str(max_keys) if self.config.list_allow_unordered: uri_params['allow-unordered'] = "true" request = self.create_request("BUCKET_LIST", bucket = bucket, uri_params = uri_params) response = self.send_request(request) #debug(response) return response def bucket_create(self, bucket, bucket_location = None, extra_headers = None): headers = SortedDict(ignore_case = True) if extra_headers: headers.update(extra_headers) body = "" if bucket_location and bucket_location.strip().upper() != "US" and bucket_location.strip().lower() != "us-east-1": bucket_location = bucket_location.strip() if bucket_location.upper() == "EU": bucket_location = bucket_location.upper() body = "" body += bucket_location body += "" debug("bucket_location: " + body) check_bucket_name(bucket, dns_strict = True) else: check_bucket_name(bucket, dns_strict = False) if self.config.acl_public: headers["x-amz-acl"] = "public-read" # AWS suddenly changed the default "ownership" control value mid 2023. # ACL are disabled by default, so obviously the bucket can't be public. # See: https://aws.amazon.com/fr/blogs/aws/heads-up-amazon-s3-security-changes-are-coming-in-april-of-2023/ # To be noted: "Block Public Access" flags should also be disabled after the bucket creation to be able to set a "public" acl for an object. 
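        # Illustrative follow-up (a sketch; it uses helpers defined further
        # down in this class) to make objects in a freshly created bucket
        # publicly readable on AWS despite the 2023 defaults described above:
        #
        #     s3.set_bucket_ownership(uri, "ObjectWriter")
        #     s3.set_bucket_public_access_block(uri, {
        #         "BlockPublicAcls": False, "IgnorePublicAcls": False,
        #         "BlockPublicPolicy": False, "RestrictPublicBuckets": False,
        #     })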
headers["x-amz-object-ownership"] = 'ObjectWriter' request = self.create_request("BUCKET_CREATE", bucket = bucket, headers = headers, body = body) response = self.send_request(request) return response def bucket_delete(self, bucket): request = self.create_request("BUCKET_DELETE", bucket = bucket) response = self.send_request(request) return response def get_bucket_location(self, uri, force_us_default=False): bucket = uri.bucket() request = self.create_request("BUCKET_LIST", bucket = uri.bucket(), uri_params = {'location': None}) saved_redir_map = S3Request.redir_map.get(bucket, '') saved_region_map = S3Request.region_map.get(bucket, '') try: if force_us_default and not (saved_redir_map and saved_region_map): S3Request.redir_map[bucket] = self.config.host_base S3Request.region_map[bucket] = 'us-east-1' response = self.send_request(request) finally: if bucket in saved_redir_map: S3Request.redir_map[bucket] = saved_redir_map elif bucket in S3Request.redir_map: del S3Request.redir_map[bucket] if bucket in saved_region_map: S3Request.region_map[bucket] = saved_region_map elif bucket in S3Request.region_map: del S3Request.region_map[bucket] location = getTextFromXml(response['data'], "LocationConstraint") if not location or location in [ "", "US" ]: location = "us-east-1" elif location == "EU": location = "eu-west-1" return location def get_bucket_requester_pays(self, uri): request = self.create_request("BUCKET_LIST", bucket=uri.bucket(), uri_params={'requestPayment': None}) response = self.send_request(request) resp_data = response.get('data', '') if resp_data: payer = getTextFromXml(resp_data, "Payer") else: payer = None return payer def set_bucket_ownership(self, uri, ownership): headers = SortedDict(ignore_case=True) body = '' \ '' \ '%s' \ '' \ '' body = body % ownership debug(u"set_bucket_ownership(%s)" % body) headers['content-md5'] = generate_content_md5(body) request = self.create_request("BUCKET_CREATE", uri = uri, headers = headers, body = body, uri_params = {'ownershipControls': None}) response = self.send_request(request) return response def get_bucket_ownership(self, uri): request = self.create_request("BUCKET_LIST", bucket=uri.bucket(), uri_params={'ownershipControls': None}) response = self.send_request(request) resp_data = response.get('data', '') if resp_data: ownership = getTextFromXml(resp_data, ".//Rule//ObjectOwnership") else: ownership = None return ownership def set_bucket_public_access_block(self, uri, flags): headers = SortedDict(ignore_case=True) body = '' for tag in ('BlockPublicAcls', 'IgnorePublicAcls', 'BlockPublicPolicy', 'RestrictPublicBuckets'): val = flags.get(tag, False) and "true" or "false" body += '<%s>%s' % (tag, val, tag) body += '' debug(u"set_bucket_public_access_block(%s)" % body) headers['content-md5'] = generate_content_md5(body) request = self.create_request("BUCKET_CREATE", uri = uri, headers = headers, body = body, uri_params = {'publicAccessBlock': None}) response = self.send_request(request) return response def get_bucket_public_access_block(self, uri): request = self.create_request("BUCKET_LIST", bucket=uri.bucket(), uri_params={'publicAccessBlock': None}) response = self.send_request(request) resp_data = response.get('data', '') if resp_data: flags = { "BlockPublicAcls": getTextFromXml(resp_data, "BlockPublicAcls") == "true", "IgnorePublicAcls": getTextFromXml(resp_data, "IgnorePublicAcls") == "true", "BlockPublicPolicy": getTextFromXml(resp_data, "BlockPublicPolicy") == "true", "RestrictPublicBuckets": getTextFromXml(resp_data, 
"RestrictPublicBuckets") == "true", } else: flags = {} return flags def bucket_info(self, uri): response = {} response['bucket-location'] = self.get_bucket_location(uri) for key, func in (('requester-pays', self.get_bucket_requester_pays), ('versioning', self.get_versioning), ('ownership', self.get_bucket_ownership)): try: response[key] = func(uri) except S3Error as e: response[key] = None try: response['public-access-block'] = self.get_bucket_public_access_block(uri) except S3Error as e: response['public-access-block'] = {} return response def website_info(self, uri, bucket_location = None): bucket = uri.bucket() request = self.create_request("BUCKET_LIST", bucket = bucket, uri_params = {'website': None}) try: response = self.send_request(request) response['index_document'] = getTextFromXml(response['data'], ".//IndexDocument//Suffix") response['error_document'] = getTextFromXml(response['data'], ".//ErrorDocument//Key") response['website_endpoint'] = self.config.website_endpoint % { "bucket" : uri.bucket(), "location" : self.get_bucket_location(uri)} return response except S3Error as e: if e.status == 404: debug("Could not get /?website - website probably not configured for this bucket") return None raise def website_create(self, uri, bucket_location = None): bucket = uri.bucket() body = '' body += ' ' body += (' %s' % self.config.website_index) body += ' ' if self.config.website_error: body += ' ' body += (' %s' % self.config.website_error) body += ' ' body += '' request = self.create_request("BUCKET_CREATE", bucket = bucket, body = body, uri_params = {'website': None}) response = self.send_request(request) debug("Received response '%s'" % (response)) return response def website_delete(self, uri, bucket_location = None): bucket = uri.bucket() request = self.create_request("BUCKET_DELETE", bucket = bucket, uri_params = {'website': None}) response = self.send_request(request) debug("Received response '%s'" % (response)) if response['status'] != 204: raise S3ResponseError("Expected status 204: %s" % response) return response def expiration_info(self, uri, bucket_location = None): bucket = uri.bucket() request = self.create_request("BUCKET_LIST", bucket=bucket, uri_params={'lifecycle': None}) try: response = self.send_request(request) except S3Error as e: if e.status == 404: debug("Could not get /?lifecycle - lifecycle probably not " "configured for this bucket") return None elif e.status == 501: debug("Could not get /?lifecycle - lifecycle support not " "implemented by the server") return None raise root_tag_name = getRootTagName(response['data']) if root_tag_name != "LifecycleConfiguration": debug("Could not get /?lifecycle - unexpected xml response: " "%s", root_tag_name) return None response['prefix'] = getTextFromXml(response['data'], ".//Rule//Prefix") response['date'] = getTextFromXml(response['data'], ".//Rule//Expiration//Date") response['days'] = getTextFromXml(response['data'], ".//Rule//Expiration//Days") return response def expiration_set(self, uri, bucket_location = None): if self.config.expiry_date and self.config.expiry_days: raise ParameterError("Expect either --expiry-day or --expiry-date") if not (self.config.expiry_date or self.config.expiry_days): if self.config.expiry_prefix: raise ParameterError("Expect either --expiry-day or --expiry-date") debug("del bucket lifecycle") bucket = uri.bucket() request = self.create_request("BUCKET_DELETE", bucket = bucket, uri_params = {'lifecycle': None}) else: request = self._expiration_set(uri) response = self.send_request(request) 
debug("Received response '%s'" % (response)) return response def _expiration_set(self, uri): debug("put bucket lifecycle") body = '' body += ' ' body += ' ' body += ' %s' % self.config.expiry_prefix body += ' ' body += ' Enabled' body += ' ' if self.config.expiry_date: body += ' %s' % self.config.expiry_date elif self.config.expiry_days: body += ' %s' % self.config.expiry_days body += ' ' body += ' ' body += '' headers = SortedDict(ignore_case = True) headers['content-md5'] = generate_content_md5(body) bucket = uri.bucket() request = self.create_request("BUCKET_CREATE", bucket = bucket, headers = headers, body = body, uri_params = {'lifecycle': None}) return (request) def _guess_content_type(self, filename): content_type = self.config.default_mime_type content_charset = None if filename == "-" and not self.config.default_mime_type: raise ParameterError("You must specify --mime-type or --default-mime-type for files uploaded from stdin.") if self.config.guess_mime_type: if self.config.follow_symlinks: filename = unicodise(os.path.realpath(deunicodise(filename))) if self.config.use_mime_magic: (content_type, content_charset) = mime_magic(filename) else: (content_type, content_charset) = mimetypes.guess_type(filename) if not content_type: content_type = self.config.default_mime_type return (content_type, content_charset) def stdin_content_type(self): content_type = self.config.mime_type if not content_type: content_type = self.config.default_mime_type content_type += "; charset=" + self.config.encoding.upper() return content_type def content_type(self, filename=None, is_dir=False): # explicit command line argument always wins content_type = self.config.mime_type content_charset = None if filename == u'-': return self.stdin_content_type() if is_dir: content_type = 'application/x-directory' elif not content_type: (content_type, content_charset) = self._guess_content_type(filename) ## add charset to content type if not content_charset: content_charset = self.config.encoding.upper() if self.add_encoding(filename, content_type) and content_charset is not None: content_type = content_type + "; charset=" + content_charset return content_type def add_encoding(self, filename, content_type): if 'charset=' in content_type: return False exts = self.config.add_encoding_exts.split(',') if exts[0]=='': return False parts = filename.rsplit('.',2) if len(parts) < 2: return False ext = parts[1] if ext in exts: return True else: return False def object_put(self, filename, uri, extra_headers = None, extra_label = ""): # TODO TODO # Make it consistent with stream-oriented object_get() if uri.type != "s3": raise ValueError("Expected URI type 's3', got '%s'" % uri.type) try: is_dir = False size = 0 if filename == "-": is_stream = True src_stream = io.open(sys.stdin.fileno(), mode='rb', closefd=False) src_stream.stream_name = u'' else: is_stream = False filename_bytes = deunicodise(filename) stat = os.stat(filename_bytes) mode = stat[ST_MODE] if S_ISDIR(mode): is_dir = True # Dirs are represented as empty objects on S3 src_stream = io.BytesIO(b'') elif not S_ISREG(mode): raise InvalidFileError(u"Not a regular file") else: # Standard normal file src_stream = io.open(filename_bytes, mode='rb') size = stat[ST_SIZE] src_stream.stream_name = filename except (IOError, OSError) as e: raise InvalidFileError(u"%s" % e.strerror) headers = SortedDict(ignore_case=True) if extra_headers: headers.update(extra_headers) ## Set server side encryption if self.config.server_side_encryption: headers["x-amz-server-side-encryption"] = 
"AES256" ## Set kms headers if self.config.kms_key: headers['x-amz-server-side-encryption'] = 'aws:kms' headers['x-amz-server-side-encryption-aws-kms-key-id'] = self.config.kms_key ## MIME-type handling headers["content-type"] = self.content_type(filename=filename, is_dir=is_dir) ## Other Amazon S3 attributes if self.config.acl_public: headers["x-amz-acl"] = "public-read" headers["x-amz-storage-class"] = self.storage_class() ## Multipart decision multipart = False if not self.config.enable_multipart and is_stream: raise ParameterError("Multi-part upload is required to upload from stdin") if self.config.enable_multipart: if size > self.config.multipart_chunk_size_mb * SIZE_1MB or is_stream: multipart = True if size > self.config.multipart_max_chunks * self.config.multipart_chunk_size_mb * SIZE_1MB: raise ParameterError("Chunk size %d MB results in more than %d chunks. Please increase --multipart-chunk-size-mb" % \ (self.config.multipart_chunk_size_mb, self.config.multipart_max_chunks)) if multipart: # Multipart requests are quite different... drop here return self.send_file_multipart(src_stream, headers, uri, size, extra_label) ## Not multipart... if self.config.put_continue: # Note, if input was stdin, we would be performing multipart upload. # So this will always work as long as the file already uploaded was # not uploaded via MultiUpload, in which case its ETag will not be # an md5. try: info = self.object_info(uri) except Exception: info = None if info is not None: remote_size = int(info['headers']['content-length']) remote_checksum = info['headers']['etag'].strip('"\'') if size == remote_size: checksum = calculateChecksum('', src_stream, 0, size, self.config.send_chunk) if remote_checksum == checksum: warning("Put: size and md5sum match for %s, skipping." % uri) return else: warning("MultiPart: checksum (%s vs %s) does not match for %s, reuploading." % (remote_checksum, checksum, uri)) else: warning("MultiPart: size (%d vs %d) does not match for %s, reuploading." 
% (remote_size, size, uri)) headers["content-length"] = str(size) request = self.create_request("OBJECT_PUT", uri = uri, headers = headers) labels = { 'source' : filename, 'destination' : uri.uri(), 'extra' : extra_label } response = self.send_file(request, src_stream, labels) return response def object_get(self, uri, stream, dest_name, start_position = 0, extra_label = ""): if uri.type != "s3": raise ValueError("Expected URI type 's3', got '%s'" % uri.type) request = self.create_request("OBJECT_GET", uri = uri) labels = { 'source' : uri.uri(), 'destination' : dest_name, 'extra' : extra_label } response = self.recv_file(request, stream, labels, start_position) return response def object_batch_delete(self, remote_list): """ Batch delete given a remote_list """ uris = [remote_list[item]['object_uri_str'] for item in remote_list] return self.object_batch_delete_uri_strs(uris) def object_batch_delete_uri_strs(self, uris): """ Batch delete given a list of object uris """ def compose_batch_del_xml(bucket, key_list): body = u"" for key in key_list: uri = S3Uri(key) if uri.type != "s3": raise ValueError("Expected URI type 's3', got '%s'" % uri.type) if not uri.has_object(): raise ValueError("URI '%s' has no object" % key) if uri.bucket() != bucket: raise ValueError("The batch should contain keys from the same bucket") object = saxutils.escape(uri.object()) body += u"%s" % object body += u"" body = encode_to_s3(body) return body batch = uris if len(batch) == 0: raise ValueError("Key list is empty") bucket = S3Uri(batch[0]).bucket() request_body = compose_batch_del_xml(bucket, batch) headers = SortedDict({'content-md5': generate_content_md5(request_body), 'content-type': 'application/xml'}, ignore_case=True) request = self.create_request("BATCH_DELETE", bucket = bucket, headers = headers, body = request_body, uri_params = {'delete': None}) response = self.send_request(request) return response def object_delete(self, uri): if uri.type != "s3": raise ValueError("Expected URI type 's3', got '%s'" % uri.type) request = self.create_request("OBJECT_DELETE", uri = uri) response = self.send_request(request) return response def object_restore(self, uri): if uri.type != "s3": raise ValueError("Expected URI type 's3', got '%s'" % uri.type) if self.config.restore_days < 1: raise ParameterError("You must restore a file for 1 or more days") if self.config.restore_priority not in ['Standard', 'Expedited', 'Bulk']: raise ParameterError("Valid restoration priorities: bulk, standard, expedited") body = '' body += (' %s' % self.config.restore_days) body += ' ' body += (' %s' % self.config.restore_priority) body += ' ' body += '' request = self.create_request("OBJECT_POST", uri = uri, body = body, uri_params = {'restore': None}) response = self.send_request(request) debug("Received response '%s'" % (response)) return response def _sanitize_headers(self, headers): to_remove = [ # from http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html 'date', 'content-length', 'last-modified', 'content-md5', 'x-amz-version-id', 'x-amz-delete-marker', # other headers returned from object_info() we don't want to send 'accept-ranges', 'connection', 'etag', 'server', 'x-amz-id-2', 'x-amz-request-id', # Cloudflare's R2 header we don't want to send 'cf-ray', # Other headers that are not copying by a direct copy 'x-amz-storage-class', ## We should probably also add server-side encryption headers ] for h in to_remove + self.config.remove_headers: if h.lower() in headers: del headers[h.lower()] return headers def object_copy(self, 
src_uri, dst_uri, extra_headers=None, src_size=None, extra_label="", replace_meta=False): """Remote copy an object and eventually set metadata Note: A little memo description of the nightmare for performance here: ** FOR AWS, 2 cases: - COPY will copy the metadata of the source to dest, but you can't modify them. Any additional header will be ignored anyway. - REPLACE will set the additional metadata headers that are provided but will not copy any of the source headers. So, to add to existing meta during copy, you have to do an object_info to get original source headers, then modify, then use REPLACE for the copy operation. ** For Minio and maybe other implementations: - if additional headers are sent, they will be set to the destination on top of source original meta in all cases COPY and REPLACE. It is a nice behavior except that it is different of the aws one. As it was still too easy, there is another catch: In all cases, for multipart copies, metadata data are never copied from the source. """ if src_uri.type != "s3": raise ValueError("Expected URI type 's3', got '%s'" % src_uri.type) if dst_uri.type != "s3": raise ValueError("Expected URI type 's3', got '%s'" % dst_uri.type) if self.config.acl_public is None: try: acl = self.get_acl(src_uri) except S3Error as exc: # Ignore the exception and don't fail the copy # if the server doesn't support setting ACLs if exc.status != 501: raise exc acl = None multipart = False headers = None if extra_headers or self.config.mime_type: # Force replace, that will force getting meta with object_info() replace_meta = True if replace_meta: src_info = self.object_info(src_uri) headers = src_info['headers'] src_size = int(headers["content-length"]) if self.config.enable_multipart: # Get size of remote source only if multipart is enabled and that no # size info was provided src_headers = headers if src_size is None: src_info = self.object_info(src_uri) src_headers = src_info['headers'] src_size = int(src_headers["content-length"]) # If we are over the grand maximum size for a normal copy/modify # (> 5GB) go nuclear and use multipart copy as the only option to # modify an object. # Reason is an aws s3 design bug. See: # https://github.com/aws/aws-sdk-java/issues/367 if src_uri is dst_uri: # optimisation in the case of modify threshold = MultiPartUpload.MAX_CHUNK_SIZE_MB * SIZE_1MB else: threshold = self.config.multipart_copy_chunk_size_mb * SIZE_1MB if src_size > threshold: # Sadly, s3 has a bad logic as metadata will not be copied for # multipart copy unlike what is done for direct copies. # TODO: Optimize by re-using the object_info request done # earlier earlier at fetch remote stage, and preserve headers. if src_headers is None: src_info = self.object_info(src_uri) src_headers = src_info['headers'] src_size = int(src_headers["content-length"]) headers = src_headers multipart = True if headers: self._sanitize_headers(headers) headers = SortedDict(headers, ignore_case=True) else: headers = SortedDict(ignore_case=True) # Following meta data are updated even in COPY by aws if self.config.acl_public: headers["x-amz-acl"] = "public-read" headers["x-amz-storage-class"] = self.storage_class() ## Set server side encryption if self.config.server_side_encryption: headers["x-amz-server-side-encryption"] = "AES256" ## Set kms headers if self.config.kms_key: headers['x-amz-server-side-encryption'] = 'aws:kms' headers['x-amz-server-side-encryption-aws-kms-key-id'] = \ self.config.kms_key # Following meta data are not updated in simple COPY by aws. 
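        # Illustrative recap of the directive chosen below (assumed AWS
        # semantics, as summarised in the docstring of this method):
        #   x-amz-metadata-directive: COPY    -> the source metadata is kept
        #                                        and extra headers sent with
        #                                        the request are ignored
        #   x-amz-metadata-directive: REPLACE -> only the headers sent with
        #                                        the request are stored, which
        #                                        is why the source headers were
        #                                        fetched and merged above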
if extra_headers: headers.update(extra_headers) if self.config.mime_type: headers["content-type"] = self.config.mime_type # "COPY" or "REPLACE" if not replace_meta: headers['x-amz-metadata-directive'] = "COPY" else: headers['x-amz-metadata-directive'] = "REPLACE" if multipart: # Multipart decision. Only do multipart copy for remote s3 files # bigger than the multipart copy threshold. # Multipart requests are quite different... delegate response = self.copy_file_multipart(src_uri, dst_uri, src_size, headers, extra_label) else: # Not multipart... direct request headers['x-amz-copy-source'] = s3_quote( "/%s/%s" % (src_uri.bucket(), src_uri.object()), quote_backslashes=False, unicode_output=True) request = self.create_request("OBJECT_PUT", uri=dst_uri, headers=headers) response = self.send_request(request) if response["data"] and getRootTagName(response["data"]) == "Error": # http://doc.s3.amazonaws.com/proposals/copy.html # Error during copy, status will be 200, so force error code 500 response["status"] = 500 error("Server error during the COPY operation. Overwrite response " "status to 500") raise S3Error(response) if self.config.acl_public is None and acl: try: self.set_acl(dst_uri, acl) except S3Error as exc: # Ignore the exception and don't fail the copy # if the server doesn't support setting ACLs if exc.status != 501: raise exc return response def object_modify(self, src_uri, dst_uri, extra_headers=None, src_size=None, extra_label=""): # dst_uri = src_uri Will optimize by using multipart just in worst case return self.object_copy(src_uri, src_uri, extra_headers, src_size, extra_label, replace_meta=True) def object_move(self, src_uri, dst_uri, extra_headers=None, src_size=None, extra_label=""): response_copy = self.object_copy(src_uri, dst_uri, extra_headers, src_size, extra_label) debug("Object %s copied to %s" % (src_uri, dst_uri)) if not response_copy["data"] \ or getRootTagName(response_copy["data"]) \ in ["CopyObjectResult", "CompleteMultipartUploadResult"]: self.object_delete(src_uri) debug("Object '%s' deleted", src_uri) else: warning("Object '%s' NOT deleted because of an unexpected " "response data content.", src_uri) return response_copy def object_info(self, uri): request = self.create_request("OBJECT_HEAD", uri=uri) try: response = self.send_request(request) except S3Error as exc: # A HEAD request will not have body, even in the case of an error # so we can't get the usual XML error content. # Add fake similar content in such a case if exc.status == 404 and not exc.code: exc.code = 'NoSuchKey' exc.message = 'The specified key does not exist.' 
                exc.resource = uri
            raise exc
        return response

    def get_acl(self, uri):
        if uri.has_object():
            request = self.create_request("OBJECT_GET", uri=uri,
                                          uri_params={'acl': None})
        else:
            request = self.create_request("BUCKET_LIST", bucket=uri.bucket(),
                                          uri_params={'acl': None})

        response = self.send_request(request)
        acl = ACL(response['data'])
        return acl

    def set_acl(self, uri, acl):
        body = u"%s" % acl
        debug(u"set_acl(%s): acl-xml: %s" % (uri, body))
        headers = SortedDict({'content-type': 'application/xml'}, ignore_case=True)
        if uri.has_object():
            request = self.create_request("OBJECT_PUT", uri=uri,
                                          headers=headers, body=body,
                                          uri_params={'acl': None})
        else:
            request = self.create_request("BUCKET_CREATE", bucket=uri.bucket(),
                                          headers=headers, body=body,
                                          uri_params={'acl': None})

        response = self.send_request(request)
        return response

    def set_versioning(self, uri, enabled):
        headers = SortedDict(ignore_case=True)
        status = "Enabled" if enabled is True else "Suspended"
        body = '<VersioningConfiguration>'
        body += '<Status>%s</Status>' % status
        body += '</VersioningConfiguration>'
        debug(u"set_versioning(%s)" % body)
        headers['content-md5'] = generate_content_md5(body)
        request = self.create_request("BUCKET_CREATE", uri=uri,
                                      headers=headers, body=body,
                                      uri_params={'versioning': None})
        response = self.send_request(request)
        return response

    def get_versioning(self, uri):
        request = self.create_request("BUCKET_LIST", uri=uri,
                                      uri_params={'versioning': None})
        response = self.send_request(request)
        return getTextFromXml(response['data'], "Status")

    def get_policy(self, uri):
        request = self.create_request("BUCKET_LIST", bucket=uri.bucket(),
                                      uri_params={'policy': None})
        response = self.send_request(request)
        return decode_from_s3(response['data'])

    def set_object_legal_hold(self, uri, legal_hold_status):
        body = '<LegalHold>'
        body += '<Status>%s</Status>' % legal_hold_status
        body += '</LegalHold>'
        headers = SortedDict(ignore_case=True)
        headers['content-type'] = 'application/xml'
        headers['content-md5'] = generate_content_md5(body)
        request = self.create_request("OBJECT_PUT", uri=uri,
                                      headers=headers, body=body,
                                      uri_params={'legal-hold': None})
        response = self.send_request(request)
        return response

    def set_object_retention(self, uri, mode, retain_until_date):
        body = '<Retention>'
        body += '<Mode>%s</Mode>' % mode
        body += '<RetainUntilDate>%s</RetainUntilDate>' % retain_until_date
        body += '</Retention>'
        headers = SortedDict(ignore_case=True)
        headers['content-type'] = 'application/xml'
        headers['content-md5'] = generate_content_md5(body)
        request = self.create_request("OBJECT_PUT", uri=uri,
                                      headers=headers, body=body,
                                      uri_params={'retention': None})
        response = self.send_request(request)
        return response

    def set_policy(self, uri, policy):
        headers = SortedDict(ignore_case=True)
        # TODO: check that policy is a proper json string
        headers['content-type'] = 'application/json'
        request = self.create_request("BUCKET_CREATE", uri=uri,
                                      headers=headers, body=policy,
                                      uri_params={'policy': None})
        response = self.send_request(request)
        return response

    def delete_policy(self, uri):
        request = self.create_request("BUCKET_DELETE", uri=uri,
                                      uri_params={'policy': None})
        debug(u"delete_policy(%s)" % uri)
        response = self.send_request(request)
        return response

    def get_cors(self, uri):
        request = self.create_request("BUCKET_LIST", bucket=uri.bucket(),
                                      uri_params={'cors': None})
        response = self.send_request(request)
        return decode_from_s3(response['data'])

    def set_cors(self, uri, cors):
        headers = SortedDict(ignore_case=True)
        # TODO: check that cors is a proper xml string
        headers['content-type'] = 'application/xml'
        headers['content-md5'] = generate_content_md5(cors)
        request = self.create_request("BUCKET_CREATE", uri=uri,
                                      headers=headers, body=cors,
                                      uri_params={'cors': None})
        response = self.send_request(request)
        return response

    def delete_cors(self, uri):
        request = self.create_request("BUCKET_DELETE", uri=uri,
                                      uri_params={'cors': None})
        debug(u"delete_cors(%s)" % uri)
        response = self.send_request(request)
        return response

    def set_lifecycle_policy(self, uri, policy):
        headers = SortedDict(ignore_case=True)
        headers['content-md5'] = generate_content_md5(policy)
        request = self.create_request("BUCKET_CREATE", uri=uri,
                                      headers=headers, body=policy,
                                      uri_params={'lifecycle': None})
        debug(u"set_lifecycle_policy(%s): policy-xml: %s" % (uri, policy))
        response = self.send_request(request)
        return response

    def set_payer(self, uri):
        headers = SortedDict(ignore_case=True)
        headers['content-type'] = 'application/xml'
        body = '<RequestPaymentConfiguration>\n'
        if self.config.requester_pays:
            body += '<Payer>Requester</Payer>\n'
        else:
            body += '<Payer>BucketOwner</Payer>\n'
        body += '</RequestPaymentConfiguration>\n'
        request = self.create_request("BUCKET_CREATE", uri=uri, body=body,
                                      uri_params={'requestPayment': None})
        response = self.send_request(request)
        return response

    def get_lifecycle_policy(self, uri):
        request = self.create_request("BUCKET_LIST", bucket=uri.bucket(),
                                      uri_params={'lifecycle': None})
        debug(u"get_lifecycle_policy(%s)" % uri)
        response = self.send_request(request)
        debug(u"%s: Got Lifecycle Policy" % response['status'])
        return response

    def delete_lifecycle_policy(self, uri):
        request = self.create_request("BUCKET_DELETE", uri=uri,
                                      uri_params={'lifecycle': None})
        debug(u"delete_lifecycle_policy(%s)" % uri)
        response = self.send_request(request)
        return response

    def set_notification_policy(self, uri, policy):
        headers = SortedDict(ignore_case=True)
        if self.config.skip_destination_validation:
            headers["x-amz-skip-destination-validation"] = "True"
        request = self.create_request("BUCKET_CREATE", uri=uri,
                                      headers=headers, body=policy,
                                      uri_params={'notification': None})
        debug(u"set_notification_policy(%s): policy-xml: %s" % (uri, policy))
        response = self.send_request(request)
        return response

    def get_notification_policy(self, uri):
        request = self.create_request("BUCKET_LIST", bucket=uri.bucket(),
                                      uri_params={'notification': None})
        debug(u"get_notification_policy(%s)" % uri)
        response = self.send_request(request)
        debug(u"%s: Got notification Policy" % response['status'])
        return response

    def delete_notification_policy(self, uri):
        empty_config = '<NotificationConfiguration></NotificationConfiguration>'
        return self.set_notification_policy(uri, empty_config)

    def set_tagging(self, uri, tagsets):
        if uri.type != "s3":
            raise ValueError("Expected URI type 's3', got '%s'" % uri.type)
        body = '<Tagging>'
        body += '<TagSet>'
        for (key, val) in tagsets:
            body += '<Tag>'
            body += (' <Key>%s</Key>' % key)
            body += (' <Value>%s</Value>' % val)
            body += '</Tag>'
        body += '</TagSet>'
        body += '</Tagging>'
        headers = SortedDict(ignore_case=True)
        headers['content-md5'] = generate_content_md5(body)
        if uri.has_object():
            request = self.create_request("OBJECT_PUT", uri=uri,
                                          headers=headers, body=body,
                                          uri_params={'tagging': None})
        else:
            request = self.create_request("BUCKET_CREATE", bucket=uri.bucket(),
                                          headers=headers, body=body,
                                          uri_params={'tagging': None})
        debug(u"set_tagging(%s): tagset-xml: %s" % (uri, body))
        response = self.send_request(request)
        return response

    def get_tagging(self, uri):
        if uri.has_object():
            request = self.create_request("OBJECT_GET", uri=uri,
                                          uri_params={'tagging': None})
        else:
            request = self.create_request("BUCKET_LIST", bucket=uri.bucket(),
                                          uri_params={'tagging': None})
        debug(u"get_tagging(%s)" % uri)
        response = self.send_request(request)
        xml_data =
response["data"] # extract list of tag sets tagsets = getListFromXml(xml_data, "Tag") debug(u"%s: Got object tagging" % response['status']) return tagsets def delete_tagging(self, uri): if uri.has_object(): request = self.create_request("OBJECT_DELETE", uri=uri, uri_params={'tagging': None}) else: request = self.create_request("BUCKET_DELETE", bucket=uri.bucket(), uri_params={'tagging': None}) debug(u"delete_tagging(%s)" % uri) response = self.send_request(request) return response def get_multipart(self, uri, uri_params=None, limit=-1): upload_list = [] for truncated, uploads in self.get_multipart_streaming(uri, uri_params, limit): upload_list.extend(uploads) return upload_list def get_multipart_streaming(self, uri, uri_params=None, limit=-1): uri_params = uri_params and uri_params.copy() or {} bucket = uri.bucket() truncated = True num_objects = 0 max_keys = limit # It is the "uploads: None" in uri_params that will change the # behavior of bucket_list to return multiparts instead of keys uri_params['uploads'] = None while truncated: response = self.bucket_list_noparse(bucket, recursive=True, uri_params=uri_params, max_keys=max_keys) xml_data = response["data"] # extract list of info of uploads upload_list = getListFromXml(xml_data, "Upload") num_objects += len(upload_list) if limit > num_objects: max_keys = limit - num_objects xml_truncated = getTextFromXml(xml_data, ".//IsTruncated") if not xml_truncated or xml_truncated.lower() == "false": truncated = False if truncated: if limit == -1 or num_objects < limit: if upload_list: next_key = getTextFromXml(xml_data, "NextKeyMarker") if not next_key: next_key = upload_list[-1]["Key"] uri_params['KeyMarker'] = next_key upload_id_marker = getTextFromXml( xml_data, "NextUploadIdMarker") if upload_id_marker: uri_params['UploadIdMarker'] = upload_id_marker elif 'UploadIdMarker' in uri_params: # Clear any pre-existing value del uri_params['UploadIdMarker'] else: # Unexpectedly, the server lied, and so the previous # response was not truncated. So, no new key to get. yield False, upload_list break debug("Listing continues after '%s'" % uri_params['KeyMarker']) else: yield truncated, upload_list break yield truncated, upload_list def list_multipart(self, uri, upload_id, uri_params=None, limit=-1): part_list = [] for truncated, parts in self.list_multipart_streaming(uri, upload_id, uri_params, limit): part_list.extend(parts) return part_list def list_multipart_streaming(self, uri, upload_id, uri_params=None, limit=-1): uri_params = uri_params and uri_params.copy() or {} truncated = True num_objects = 0 max_parts = limit while truncated: response = self.list_multipart_noparse(uri, upload_id, uri_params, max_parts) xml_data = response["data"] # extract list of multipart upload parts part_list = getListFromXml(xml_data, "Part") num_objects += len(part_list) if limit > num_objects: max_parts = limit - num_objects xml_truncated = getTextFromXml(xml_data, ".//IsTruncated") if not xml_truncated or xml_truncated.lower() == "false": truncated = False if truncated: if limit == -1 or num_objects < limit: if part_list: next_part_number = getTextFromXml( xml_data, "NextPartNumberMarker") if not next_part_number: next_part_number = part_list[-1]["PartNumber"] uri_params['part-number-marker'] = next_part_number else: # Unexpectedly, the server lied, and so the previous # response was not truncated. So, no new part to get. 
yield False, part_list break debug("Listing continues after Part '%s'" % uri_params['part-number-marker']) else: yield truncated, part_list break yield truncated, part_list def list_multipart_noparse(self, uri, upload_id, uri_params=None, max_parts=-1): if uri_params is None: uri_params = {} if max_parts != -1: uri_params['max-parts'] = str(max_parts) uri_params['uploadId'] = upload_id request = self.create_request("OBJECT_GET", uri=uri, uri_params=uri_params) response = self.send_request(request) return response def abort_multipart(self, uri, id): request = self.create_request("OBJECT_DELETE", uri = uri, uri_params = {'uploadId': id}) response = self.send_request(request) return response def get_accesslog(self, uri): request = self.create_request("BUCKET_LIST", bucket = uri.bucket(), uri_params = {'logging': None}) response = self.send_request(request) accesslog = AccessLog(response['data']) return accesslog def set_accesslog_acl(self, uri): acl = self.get_acl(uri) debug("Current ACL(%s): %s" % (uri.uri(), acl)) acl.appendGrantee(GranteeLogDelivery("READ_ACP")) acl.appendGrantee(GranteeLogDelivery("WRITE")) debug("Updated ACL(%s): %s" % (uri.uri(), acl)) self.set_acl(uri, acl) def set_accesslog(self, uri, enable, log_target_prefix_uri = None, acl_public = False): accesslog = AccessLog() if enable: accesslog.enableLogging(log_target_prefix_uri) accesslog.setAclPublic(acl_public) else: accesslog.disableLogging() body = "%s" % accesslog debug(u"set_accesslog(%s): accesslog-xml: %s" % (uri, body)) request = self.create_request("BUCKET_CREATE", bucket = uri.bucket(), body = body, uri_params = {'logging': None}) try: response = self.send_request(request) except S3Error as e: if e.info['Code'] == "InvalidTargetBucketForLogging": info("Setting up log-delivery ACL for target bucket.") self.set_accesslog_acl(S3Uri(u"s3://%s" % log_target_prefix_uri.bucket())) response = self.send_request(request) else: raise return accesslog, response def create_request(self, operation, uri = None, bucket = None, object = None, headers = None, body = "", uri_params = None): resource = { 'bucket' : None, 'uri' : "/" } if uri and (bucket or object): raise ValueError("Both 'uri' and either 'bucket' or 'object' parameters supplied") ## If URI is given use that instead of bucket/object parameters if uri: bucket = uri.bucket() object = uri.has_object() and uri.object() or None if bucket: resource['bucket'] = bucket if object: resource['uri'] = "/" + object method_string = S3.http_methods.getkey(S3.operations[operation] & S3.http_methods["MASK"]) request = S3Request(self, method_string, resource, headers, body, uri_params) debug("CreateRequest: resource[uri]=%s", resource['uri']) return request def _fail_wait(self, retries): # Wait a few seconds. The more it fails the more we wait. 
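        # The wait grows linearly as the retry budget shrinks; e.g. with
        # max_retries = 5 the successive waits are 3, 6, 9, 12 and 15 seconds
        # (5 here is just an example value, not necessarily the default).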
return (self.config.max_retries - retries + 1) * 3 def _http_redirection_handler(self, request, response, fn, *args, **kwargs): # Region info might already be available through the x-amz-bucket-region header redir_region = response['headers'].get('x-amz-bucket-region') if 'data' in response and len(response['data']) > 0: redir_bucket = getTextFromXml(response['data'], ".//Bucket") redir_hostname = getTextFromXml(response['data'], ".//Endpoint") self.set_hostname(redir_bucket, redir_hostname) info(u'Redirected to: %s', redir_hostname) if redir_region: S3Request.region_map[redir_bucket] = redir_region info(u'Redirected to region: %s', redir_region) return fn(*args, **kwargs) elif request.method_string == 'HEAD': # Head is a special case, redirection info usually are in the body # but there is no body for an HEAD request. location_url = response['headers'].get('location') if location_url: # Sometimes a "location" http header could be available and # can help us deduce the redirection path. # It is the case of "dns-style" syntax, but not for "path-style" syntax. if location_url.startswith("http://"): location_url = location_url[7:] elif location_url.startswith("https://"): location_url = location_url[8:] location_url = urlparse('https://' + location_url).hostname redir_bucket = request.resource['bucket'] self.set_hostname(redir_bucket, location_url) info(u'Redirected to: %s', location_url) if redir_region: S3Request.region_map[redir_bucket] = redir_region info(u'Redirected to region: %s', redir_region) return fn(*args, **kwargs) warning(u'Redirection error: No info provided by the server to where should be forwarded the request (HEAD request). (Hint target region: %s)', redir_region) raise S3Error(response) def _http_400_handler(self, request, response, fn, *args, **kwargs): """ Returns None if no handler available for the specific error code """ # AWS response AuthorizationHeaderMalformed means we sent the request to the wrong region # get the right region out of the response and send it there. if 'data' in response and len(response['data']) > 0: failureCode = getTextFromXml(response['data'], 'Code') if failureCode == 'AuthorizationHeaderMalformed': # we sent the request to the wrong region region = getTextFromXml(response['data'], 'Region') if region is not None: S3Request.region_map[request.resource['bucket']] = region info('Forwarding request to %s', region) return fn(*args, **kwargs) else: warning(u'Could not determine bucket the location. Please consider using the --region parameter.') elif failureCode == 'InvalidRequest': message = getTextFromXml(response['data'], 'Message') if message == 'The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256.': debug(u'Endpoint requires signature v4') self.endpoint_requires_signature_v4 = True return fn(*args, **kwargs) elif failureCode == 'InvalidArgument': # returned by DreamObjects on send_request and send_file, # which doesn't support signature v4. Retry with signature v2 if not request.use_signature_v2() and not self.fallback_to_signature_v2: # have not tried with v2 yet debug(u'Falling back to signature v2') self.fallback_to_signature_v2 = True return fn(*args, **kwargs) else: # returned by DreamObjects on recv_file, which doesn't support signature v4. 
Retry with signature v2 if not request.use_signature_v2() and not self.fallback_to_signature_v2: # have not tried with v2 yet debug(u'Falling back to signature v2') self.fallback_to_signature_v2 = True return fn(*args, **kwargs) return None def _http_403_handler(self, request, response, fn, *args, **kwargs): if 'data' in response and len(response['data']) > 0: failureCode = getTextFromXml(response['data'], 'Code') if failureCode == 'AccessDenied': # traditional HTTP 403 message = getTextFromXml(response['data'], 'Message') if message == 'AWS authentication requires a valid Date or x-amz-date header': # message from an Eucalyptus walrus server if not request.use_signature_v2() and not self.fallback_to_signature_v2: # have not tried with v2 yet debug(u'Falling back to signature v2') self.fallback_to_signature_v2 = True return fn(*args, **kwargs) raise S3Error(response) def update_region_inner_request(self, request): """Get and update region for the request if needed. Signature v4 needs the region of the bucket or the request will fail with the indication of the correct region. We are trying to avoid this failure by pre-emptively getting the correct region to use, if not provided by the user. """ if request.resource.get('bucket') and not request.use_signature_v2() \ and S3Request.region_map.get( request.resource['bucket'], Config().bucket_location ) == "US": debug("===== SEND Inner request to determine the bucket region " "=====") try: s3_uri = S3Uri(u's3://' + request.resource['bucket']) # "force_us_default" should prevent infinite recursivity because # it will set the region_map dict. region = self.get_bucket_location(s3_uri, force_us_default=True) if region is not None: S3Request.region_map[request.resource['bucket']] = region debug("===== SUCCESS Inner request to determine the bucket " "region (%r) =====", region) except Exception as exc: # Ignore errors, it is just an optimisation, so nothing critical debug("getlocation inner request failure reason: %s", exc) debug("===== FAILED Inner request to determine the bucket " "region =====") def send_request(self, request, retries=None): if retries is None: retries = self.config.max_retries self.update_region_inner_request(request) request.body = encode_to_s3(request.body) headers = request.headers method_string, resource, headers = request.get_triplet() response = {} debug("Processing request, please wait...") conn = None try: conn = ConnMan.get(self.get_hostname(resource['bucket'])) # TODO: Check what was supposed to be the usage of conn.path here # Currently this is always "None" all the time as not defined in ConnMan uri = self.format_uri(resource, conn.path) debug("Sending request method_string=%r, uri=%r, headers=%r, body=(%i bytes)" % (method_string, uri, headers, len(request.body or ""))) conn.c.request(method_string, uri, request.body, headers) http_response = conn.c.getresponse() response["status"] = http_response.status response["reason"] = http_response.reason response["headers"] = convertHeaderTupleListToDict(http_response.getheaders()) response["data"] = http_response.read() if "x-amz-meta-s3cmd-attrs" in response["headers"]: attrs = parse_attrs_header(response["headers"]["x-amz-meta-s3cmd-attrs"]) response["s3cmd-attrs"] = attrs ConnMan.put(conn) except (S3SSLError, S3SSLCertificateError): # In case of failure to validate the certificate for a ssl # connection,no need to retry, abort immediately raise except (IOError, Exception) as e: debug("Response:\n" + pprint.pformat(response)) if ((hasattr(e, 'errno') and e.errno and e.errno 
not in (errno.EPIPE, errno.ECONNRESET, errno.ETIMEDOUT)) or "[Errno 104]" in str(e) or "[Errno 32]" in str(e) ) and not isinstance(e, SocketTimeoutException): raise # When the connection is broken, BadStatusLine is raised with py2 # and RemoteDisconnected is raised by py3 with a trap: # RemoteDisconnected has an errno field with a None value. # close the connection and re-establish ConnMan.close(conn) if retries: warning("Retrying failed request: %s (%s)" % (resource['uri'], e)) warning("Waiting %d sec..." % self._fail_wait(retries)) time.sleep(self._fail_wait(retries)) return self.send_request(request, retries - 1) else: raise S3RequestError("Request failed for: %s" % resource['uri']) except: # Only KeyboardInterrupt and SystemExit will not be covered by Exception debug("Response:\n" + pprint.pformat(response)) raise debug("Response:\n" + pprint.pformat(response)) if response["status"] in [301, 307]: ## RedirectTemporary or RedirectPermanent return self._http_redirection_handler(request, response, self.send_request, request) if response["status"] == 400: handler_fn = self._http_400_handler(request, response, self.send_request, request) if handler_fn: return handler_fn err = S3Error(response) if retries and err.code in ['BadDigest', 'OperationAborted', 'TokenRefreshRequired', 'RequestTimeout']: warning(u"Retrying failed request: %s (%s)" % (resource['uri'], err)) warning("Waiting %d sec..." % self._fail_wait(retries)) time.sleep(self._fail_wait(retries)) return self.send_request(request, retries - 1) raise err if response["status"] == 403: return self._http_403_handler(request, response, self.send_request, request) if response["status"] == 405: # Method Not Allowed. Don't retry. raise S3Error(response) if response["status"] >= 500 or response["status"] == 429: e = S3Error(response) if response["status"] == 501: ## NotImplemented server error - no need to retry retries = 0 if retries: warning(u"Retrying failed request: %s (%s)" % (resource['uri'], e)) warning("Waiting %d sec..." % self._fail_wait(retries)) time.sleep(self._fail_wait(retries)) return self.send_request(request, retries - 1) else: raise e if response["status"] < 200 or response["status"] > 299: raise S3Error(response) return response def send_request_with_progress(self, request, labels, operation_size=0): """Wrapper around send_request for slow requests. To be able to show progression for small requests """ if not self.config.progress_meter: info("Sending slow request, please wait...") return self.send_request(request) if 'action' not in labels: labels[u'action'] = u'request' progress = self.config.progress_class(labels, operation_size) try: response = self.send_request(request) except Exception as exc: progress.done("failed") raise progress.update(current_position=operation_size) progress.done("done") return response def send_file(self, request, stream, labels, buffer = '', throttle = 0, retries = None, offset = 0, chunk_size = -1, use_expect_continue = None): if retries is None: retries = self.config.max_retries self.update_region_inner_request(request) if use_expect_continue is None: use_expect_continue = self.config.use_http_expect if self.expect_continue_not_supported and use_expect_continue: use_expect_continue = False headers = request.headers size_left = size_total = int(headers["content-length"]) filename = stream.stream_name if self.config.progress_meter: labels[u'action'] = u'upload' progress = self.config.progress_class(labels, size_total) else: info("Sending file '%s', please wait..." 
% filename) timestamp_start = time.time() if buffer: sha256_hash = checksum_sha256_buffer(buffer, offset, size_total) else: sha256_hash = checksum_sha256_file(stream, offset, size_total) request.body = sha256_hash if use_expect_continue: if not size_total: use_expect_continue = False else: headers['expect'] = '100-continue' method_string, resource, headers = request.get_triplet() try: conn = ConnMan.get(self.get_hostname(resource['bucket'])) conn.c.putrequest(method_string, self.format_uri(resource, conn.path)) for header in headers.keys(): conn.c.putheader(encode_to_s3(header), encode_to_s3(headers[header])) conn.c.endheaders() except ParameterError as e: raise except Exception as e: if self.config.progress_meter: progress.done("failed") if retries: warning("Retrying failed request: %s (%s)" % (resource['uri'], e)) warning("Waiting %d sec..." % self._fail_wait(retries)) time.sleep(self._fail_wait(retries)) # Connection error -> same throttle value return self.send_file(request, stream, labels, buffer, throttle, retries - 1, offset, chunk_size) else: raise S3UploadError("Upload failed for: %s" % resource['uri']) if buffer == '': stream.seek(offset) md5_hash = md5() try: http_response = None if use_expect_continue: # Wait for the 100-Continue before sending the content readable, writable, exceptional = select.select([conn.c.sock],[], [], EXPECT_CONTINUE_TIMEOUT) if readable: # 100-CONTINUE STATUS RECEIVED, get it before continuing. http_response = conn.c.getresponse() elif not writable and not exceptional: warning("HTTP Expect Continue feature disabled because of no reply of the server in %.2fs.", EXPECT_CONTINUE_TIMEOUT) self.expect_continue_not_supported = True use_expect_continue = False if not use_expect_continue or (http_response and http_response.status == ConnMan.CONTINUE): if http_response: # CONTINUE case. Reset the response http_response.read() conn.c._HTTPConnection__state = ConnMan._CS_REQ_SENT while size_left > 0: #debug("SendFile: Reading up to %d bytes from '%s' - remaining bytes: %s" % (self.config.send_chunk, filename, size_left)) l = min(self.config.send_chunk, size_left) if buffer == '': data = stream.read(l) else: data = buffer if not data: raise InvalidFileError("File smaller than expected. 
Was the file truncated?") if self.config.limitrate > 0: start_time = time.time() md5_hash.update(data) conn.c.wrapper_send_body(data) if self.config.progress_meter: progress.update(delta_position = len(data)) size_left -= len(data) #throttle limitrate_throttle = throttle if self.config.limitrate > 0: real_duration = time.time() - start_time expected_duration = float(l) / self.config.limitrate limitrate_throttle = max(expected_duration - real_duration, limitrate_throttle) if limitrate_throttle: time.sleep(min(limitrate_throttle, self.config.throttle_max)) md5_computed = md5_hash.hexdigest() http_response = conn.c.getresponse() response = {} response["status"] = http_response.status response["reason"] = http_response.reason response["headers"] = convertHeaderTupleListToDict(http_response.getheaders()) response["data"] = http_response.read() response["size"] = size_total ConnMan.put(conn) debug(u"Response:\n" + pprint.pformat(response)) except ParameterError as e: raise except InvalidFileError as e: if self.config.progress_meter: progress.done("failed") raise except Exception as e: if self.config.progress_meter: progress.done("failed") if retries: known_error = False if ((hasattr(e, 'errno') and e.errno and e.errno not in (errno.EPIPE, errno.ECONNRESET, errno.ETIMEDOUT)) or "[Errno 104]" in str(e) or "[Errno 32]" in str(e) ) and not isinstance(e, SocketTimeoutException): # We have to detect these errors by looking at the error string # Connection reset by peer and Broken pipe # The server broke the connection early with an error like # in a HTTP Expect Continue case even if asked nothing. try: http_response = conn.c.getresponse() response = {} response["status"] = http_response.status response["reason"] = http_response.reason response["headers"] = convertHeaderTupleListToDict(http_response.getheaders()) response["data"] = http_response.read() response["size"] = size_total known_error = True except Exception: error("Cannot retrieve any response status before encountering an EPIPE or ECONNRESET exception") if not known_error: warning("Upload failed: %s (%s)" % (resource['uri'], e)) warning("Waiting %d sec..." % self._fail_wait(retries)) time.sleep(self._fail_wait(retries)) # Connection error -> same throttle value return self.send_file(request, stream, labels, buffer, throttle, retries - 1, offset, chunk_size, use_expect_continue) else: debug("Giving up on '%s' %s" % (filename, e)) raise S3UploadError("Upload failed for: %s" % resource['uri']) timestamp_end = time.time() response["elapsed"] = timestamp_end - timestamp_start response["speed"] = response["elapsed"] and float(response["size"]) / response["elapsed"] or float(-1) if self.config.progress_meter: ## Finalising the upload takes some time -> update() progress meter ## to correct the average speed. 
Otherwise people will complain that ## 'progress' and response["speed"] are inconsistent ;-) progress.update() progress.done("done") if response["status"] in [301, 307]: ## RedirectTemporary or RedirectPermanent return self._http_redirection_handler(request, response, self.send_file, request, stream, labels, buffer, offset = offset, chunk_size = chunk_size, use_expect_continue = use_expect_continue) if response["status"] == 400: handler_fn = self._http_400_handler(request, response, self.send_file, request, stream, labels, buffer, offset = offset, chunk_size = chunk_size, use_expect_continue = use_expect_continue) if handler_fn: return handler_fn err = S3Error(response) if err.code not in ['BadDigest', 'OperationAborted', 'TokenRefreshRequired', 'RequestTimeout']: raise err # else the error will be handled later with a retry if response["status"] == 403: return self._http_403_handler(request, response, self.send_file, request, stream, labels, buffer, offset = offset, chunk_size = chunk_size, use_expect_continue = use_expect_continue) if response["status"] == 417 and retries: # Expect 100-continue not supported by proxy/server self.expect_continue_not_supported = True return self.send_file(request, stream, labels, buffer, throttle, retries - 1, offset, chunk_size, use_expect_continue = False) # S3 from time to time doesn't send ETag back in a response :-( # Force re-upload here. if 'etag' not in response['headers']: response['headers']['etag'] = '' if response["status"] < 200 or response["status"] > 299: try_retry = False if response["status"] >= 500: # AWS internal error - retry try_retry = True if response["status"] == 503: ## SlowDown error throttle = throttle and throttle * 5 or 0.01 elif response["status"] == 507: # Not an AWS error, but s3 compatible server possible error: # InsufficientStorage try_retry = False elif response["status"] == 429: # Not an AWS error, but s3 compatible server possible error: # TooManyRequests/Busy/slowdown try_retry = True throttle = throttle and throttle * 5 or 0.01 elif response["status"] >= 400: err = S3Error(response) ## Retriable client error? if err.code in ['BadDigest', 'OperationAborted', 'TokenRefreshRequired', 'RequestTimeout']: try_retry = True err = S3Error(response) if try_retry: if retries: warning("Upload failed: %s (%s)" % (resource['uri'], err)) if throttle: warning("Retrying on lower speed (throttle=%0.2f)" % throttle) warning("Waiting %d sec..." % self._fail_wait(retries)) time.sleep(self._fail_wait(retries)) return self.send_file(request, stream, labels, buffer, throttle, retries - 1, offset, chunk_size, use_expect_continue) else: warning("Too many failures. Giving up on '%s'" % filename) raise S3UploadError("%s" % err) ## Non-recoverable error raise err debug("MD5 sums: computed=%s, received=%s" % (md5_computed, response["headers"].get('etag', '').strip('"\''))) ## when using KMS encryption, MD5 etag value will not match md5_from_s3 = response["headers"].get("etag", "").strip('"\'') if ('-' not in md5_from_s3) and (md5_from_s3 != md5_hash.hexdigest()) and response["headers"].get("x-amz-server-side-encryption") != 'aws:kms': warning("MD5 Sums don't match!") if retries: warning("Retrying upload of %s" % (filename)) return self.send_file(request, stream, labels, buffer, throttle, retries - 1, offset, chunk_size, use_expect_continue) else: warning("Too many failures. 
Giving up on '%s'" % filename) raise S3UploadError("MD5 sums of sent and received files don't match!") return response def send_file_multipart(self, stream, headers, uri, size, extra_label=""): timestamp_start = time.time() upload = MultiPartUpload(self, stream, uri, headers, size) upload.upload_all_parts(extra_label) response = upload.complete_multipart_upload() timestamp_end = time.time() response["elapsed"] = timestamp_end - timestamp_start response["size"] = size response["speed"] = response["elapsed"] and float(response["size"]) / response["elapsed"] or float(-1) if response["data"] and getRootTagName(response["data"]) == "Error": #http://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadComplete.html # Error Complete Multipart UPLOAD, status may be 200 # raise S3UploadError raise S3UploadError(getTextFromXml(response["data"], 'Message')) return response def copy_file_multipart(self, src_uri, dst_uri, size, headers, extra_label=""): return self.send_file_multipart(src_uri, headers, dst_uri, size, extra_label) def recv_file(self, request, stream, labels, start_position=0, retries=None): if retries is None: retries = self.config.max_retries self.update_region_inner_request(request) method_string, resource, headers = request.get_triplet() filename = stream.stream_name if self.config.progress_meter: labels[u'action'] = u'download' progress = self.config.progress_class(labels, 0) else: info("Receiving file '%s', please wait..." % filename) timestamp_start = time.time() conn = None try: conn = ConnMan.get(self.get_hostname(resource['bucket'])) conn.c.putrequest(method_string, self.format_uri(resource, conn.path)) for header in headers.keys(): conn.c.putheader(encode_to_s3(header), encode_to_s3(headers[header])) if start_position > 0: debug("Requesting Range: %d .. end" % start_position) conn.c.putheader("Range", "bytes=%d-" % start_position) conn.c.endheaders() response = {} http_response = conn.c.getresponse() response["status"] = http_response.status response["reason"] = http_response.reason response["headers"] = convertHeaderTupleListToDict(http_response.getheaders()) if "x-amz-meta-s3cmd-attrs" in response["headers"]: attrs = parse_attrs_header(response["headers"]["x-amz-meta-s3cmd-attrs"]) response["s3cmd-attrs"] = attrs debug("Response:\n" + pprint.pformat(response)) except ParameterError as e: raise except (IOError, Exception) as e: if self.config.progress_meter: progress.done("failed") if ((hasattr(e, 'errno') and e.errno and e.errno not in (errno.EPIPE, errno.ECONNRESET, errno.ETIMEDOUT)) or "[Errno 104]" in str(e) or "[Errno 32]" in str(e) ) and not isinstance(e, SocketTimeoutException): raise # close the connection and re-establish ConnMan.close(conn) if retries: warning("Retrying failed request: %s (%s)" % (resource['uri'], e)) warning("Waiting %d sec..." 
% self._fail_wait(retries)) time.sleep(self._fail_wait(retries)) # Connection error -> same throttle value return self.recv_file(request, stream, labels, start_position, retries=retries - 1) else: raise S3DownloadError("Download failed for: %s" % resource['uri']) if response["status"] < 200 or response["status"] > 299: # In case of error, we still need to flush the read buffer to be able to reuse # the connection response['data'] = http_response.read() if response["status"] in [301, 307]: ## RedirectPermanent or RedirectTemporary return self._http_redirection_handler(request, response, self.recv_file, request, stream, labels, start_position) if response["status"] == 400: handler_fn = self._http_400_handler(request, response, self.recv_file, request, stream, labels, start_position) if handler_fn: return handler_fn raise S3Error(response) if response["status"] == 403: return self._http_403_handler(request, response, self.recv_file, request, stream, labels, start_position) if response["status"] < 200 or response["status"] > 299: try_retry = False if response["status"] == 429: # Not an AWS error, but s3 compatible server possible error: # TooManyRequests/Busy/slowdown try_retry = True elif response["status"] == 503: # SlowDown error try_retry = True if try_retry: resource_uri = resource['uri'] if retries: retry_delay = self._fail_wait(retries) warning("Retrying failed request: %s (%s)" % (resource_uri, S3Error(response))) warning("Waiting %d sec..." % retry_delay) time.sleep(retry_delay) return self.recv_file(request, stream, labels, start_position, retries=retries - 1) else: warning("Too many failures. Giving up on '%s'" % resource_uri) raise S3DownloadError("Download failed for: %s" % resource_uri) # Non-recoverable error raise S3Error(response) if start_position == 0: # Only compute MD5 on the fly if we're downloading from beginning # Otherwise we'd get a nonsense. md5_hash = md5() size_left = int(response["headers"]["content-length"]) size_total = start_position + size_left current_position = start_position if self.config.progress_meter: progress.total_size = size_total progress.initial_position = current_position progress.current_position = current_position try: # Fix for issue #432. Even when content size is 0, httplib expect the response to be read. if size_left == 0: data = http_response.read(1) # It is not supposed to be some data returned in that case assert(len(data) == 0) while (current_position < size_total): this_chunk = size_left > self.config.recv_chunk and self.config.recv_chunk or size_left if self.config.limitrate > 0: start_time = time.time() data = http_response.read(this_chunk) if len(data) == 0: raise S3ResponseError("EOF from S3!") #throttle if self.config.limitrate > 0: real_duration = time.time() - start_time expected_duration = float(this_chunk) / self.config.limitrate if expected_duration > real_duration: time.sleep(expected_duration - real_duration) stream.write(data) if start_position == 0: md5_hash.update(data) current_position += len(data) ## Call progress meter from here... 
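                ## (the meter is advanced by the size of the chunk just written,
                ## so the displayed rate also reflects any limitrate sleep above)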
if self.config.progress_meter: progress.update(delta_position = len(data)) ConnMan.put(conn) except OSError: raise except (IOError, Exception) as e: if self.config.progress_meter: progress.done("failed") if ((hasattr(e, 'errno') and e.errno and e.errno not in (errno.EPIPE, errno.ECONNRESET, errno.ETIMEDOUT)) or "[Errno 104]" in str(e) or "[Errno 32]" in str(e) ) and not isinstance(e, SocketTimeoutException): raise # close the connection and re-establish ConnMan.close(conn) if retries: warning("Retrying failed request: %s (%s)" % (resource['uri'], e)) warning("Waiting %d sec..." % self._fail_wait(retries)) time.sleep(self._fail_wait(retries)) # Connection error -> same throttle value return self.recv_file(request, stream, labels, current_position, retries=retries - 1) else: raise S3DownloadError("Download failed for: %s" % resource['uri']) stream.flush() timestamp_end = time.time() if self.config.progress_meter: ## The above stream.flush() may take some time -> update() progress meter ## to correct the average speed. Otherwise people will complain that ## 'progress' and response["speed"] are inconsistent ;-) progress.update() progress.done("done") md5_from_s3 = response["headers"].get("etag", "").strip('"\'') if not 'x-amz-meta-s3tools-gpgenc' in response["headers"]: # we can't trust our stored md5 because we # encrypted the file after calculating it but before # uploading it. try: md5_from_s3 = response["s3cmd-attrs"]["md5"] except KeyError: pass # we must have something to compare against to bother with the calculation if '-' not in md5_from_s3: if start_position == 0: # Only compute MD5 on the fly if we were downloading from the beginning response["md5"] = md5_hash.hexdigest() else: # Otherwise try to compute MD5 of the output file try: response["md5"] = hash_file_md5(filename) except IOError as e: if e.errno != errno.ENOENT: warning("Unable to open file: %s: %s" % (filename, e)) warning("Unable to verify MD5. 
Assume it matches.") response["md5match"] = response.get("md5") == md5_from_s3 response["elapsed"] = timestamp_end - timestamp_start response["size"] = current_position response["speed"] = response["elapsed"] and float(response["size"]) / response["elapsed"] or float(-1) if response["size"] != start_position + int(response["headers"]["content-length"]): warning("Reported size (%s) does not match received size (%s)" % ( start_position + int(response["headers"]["content-length"]), response["size"])) debug("ReceiveFile: Computed MD5 = %s" % response.get("md5")) # avoid ETags from multipart uploads that aren't the real md5 if ('-' not in md5_from_s3 and not response["md5match"]) and (response["headers"].get("x-amz-server-side-encryption") != 'aws:kms'): warning("MD5 signatures do not match: computed=%s, received=%s" % ( response.get("md5"), md5_from_s3)) return response __all__.append("S3") def parse_attrs_header(attrs_header): attrs = {} for attr in attrs_header.split("/"): key, val = attr.split(":") attrs[key] = val return attrs # vim:et:ts=4:sts=4:ai s3cmd-2.4.0/S3/__init__.py0000664000175100017510000000003014534034713014507 0ustar floflo00000000000000# -*- coding: utf-8 -*- s3cmd-2.4.0/S3/Utils.py0000664000175100017510000002543114534034713014064 0ustar floflo00000000000000# -*- coding: utf-8 -*- ## -------------------------------------------------------------------- ## Amazon S3 manager ## ## Authors : Michal Ludvig (https://www.logix.cz/michal) ## Florent Viard (https://www.sodria.com) ## Copyright : TGRMN Software, Sodria SAS and contributors ## License : GPL Version 2 ## Website : https://s3tools.org ## -------------------------------------------------------------------- from __future__ import absolute_import, division import os import time import re import string as string_mod import random import errno from logging import debug try: unicode except NameError: # python 3 support # In python 3, unicode -> str, and str -> bytes unicode = str import S3.Config import S3.Exceptions from S3.BaseUtils import (base_urlencode_string, base_replace_nonprintables, base_unicodise, base_deunicodise, md5) __all__ = [] def formatSize(size, human_readable=False, floating_point=False): size = floating_point and float(size) or int(size) if human_readable: coeffs = ['K', 'M', 'G', 'T'] coeff = "" while size > 2048: size /= 1024 coeff = coeffs.pop(0) return (floating_point and float(size) or int(size), coeff) else: return (size, "") __all__.append("formatSize") def convertHeaderTupleListToDict(list): """ Header keys are always in lowercase in python2 but not in python3. 
""" retval = {} for tuple in list: retval[tuple[0].lower()] = tuple[1] return retval __all__.append("convertHeaderTupleListToDict") _rnd_chars = string_mod.ascii_letters + string_mod.digits _rnd_chars_len = len(_rnd_chars) def rndstr(len): retval = "" while len > 0: retval += _rnd_chars[random.randint(0, _rnd_chars_len-1)] len -= 1 return retval __all__.append("rndstr") def mktmpsomething(prefix, randchars, createfunc): old_umask = os.umask(0o077) tries = 5 while tries > 0: dirname = prefix + rndstr(randchars) try: createfunc(dirname) break except OSError as e: if e.errno != errno.EEXIST: os.umask(old_umask) raise tries -= 1 os.umask(old_umask) return dirname __all__.append("mktmpsomething") def mktmpdir(prefix = os.getenv('TMP','/tmp') + "/tmpdir-", randchars = 10): return mktmpsomething(prefix, randchars, os.mkdir) __all__.append("mktmpdir") def mktmpfile(prefix = os.getenv('TMP','/tmp') + "/tmpfile-", randchars = 20): createfunc = lambda filename : os.close(os.open(deunicodise(filename), os.O_CREAT | os.O_EXCL)) return mktmpsomething(prefix, randchars, createfunc) __all__.append("mktmpfile") def mkdir_with_parents(dir_name): """ mkdir_with_parents(dst_dir) Create directory 'dir_name' with all parent directories Returns True on success, False otherwise. """ pathmembers = dir_name.split(os.sep) tmp_stack = [] while pathmembers and not os.path.isdir(deunicodise(os.sep.join(pathmembers))): tmp_stack.append(pathmembers.pop()) while tmp_stack: pathmembers.append(tmp_stack.pop()) cur_dir = os.sep.join(pathmembers) try: debug("mkdir(%s)" % cur_dir) os.mkdir(deunicodise(cur_dir)) except (OSError, IOError) as e: debug("Can not make directory '%s' (Reason: %s)" % (cur_dir, e.strerror)) return False except Exception as e: debug("Can not make directory '%s' (Reason: %s)" % (cur_dir, e)) return False return True __all__.append("mkdir_with_parents") def unicodise(string, encoding=None, errors='replace', silent=False): if not encoding: encoding = S3.Config.Config().encoding return base_unicodise(string, encoding, errors, silent) __all__.append("unicodise") def unicodise_s(string, encoding=None, errors='replace'): """ Alias to silent version of unicodise """ return unicodise(string, encoding, errors, True) __all__.append("unicodise_s") def deunicodise(string, encoding=None, errors='replace', silent=False): if not encoding: encoding = S3.Config.Config().encoding return base_deunicodise(string, encoding, errors, silent) __all__.append("deunicodise") def deunicodise_s(string, encoding=None, errors='replace'): """ Alias to silent version of deunicodise """ return deunicodise(string, encoding, errors, True) __all__.append("deunicodise_s") def unicodise_safe(string, encoding=None): """ Convert 'string' to Unicode according to current encoding and replace all invalid characters with '?' """ return unicodise(deunicodise(string, encoding), encoding).replace(u'\ufffd', '?') __all__.append("unicodise_safe") ## Low level methods def urlencode_string(string, urlencoding_mode=None, unicode_output=False): if urlencoding_mode is None: urlencoding_mode = S3.Config.Config().urlencoding_mode return base_urlencode_string(string, urlencoding_mode, unicode_output) __all__.append("urlencode_string") def replace_nonprintables(string): """ replace_nonprintables(string) Replaces all non-printable characters 'ch' in 'string' where ord(ch) <= 26 with ^@, ^A, ... 
^Z """ warning_message = (S3.Config.Config().urlencoding_mode != "fixbucket") return base_replace_nonprintables(string, warning_message) __all__.append("replace_nonprintables") def time_to_epoch(t): """Convert time specified in a variety of forms into UNIX epoch time. Accepts datetime.datetime, int, anything that has a strftime() method, and standard time 9-tuples """ if isinstance(t, int): # Already an int return t elif isinstance(t, tuple) or isinstance(t, time.struct_time): # Assume it's a time 9-tuple return int(time.mktime(t)) elif hasattr(t, 'timetuple'): # Looks like a datetime object or compatible return int(time.mktime(t.timetuple())) elif hasattr(t, 'strftime'): # Looks like the object supports standard srftime() return int(t.strftime('%s')) elif isinstance(t, str) or isinstance(t, unicode) or isinstance(t, bytes): # See if it's a string representation of an epoch try: # Support relative times (eg. "+60") if t.startswith('+'): return time.time() + int(t[1:]) return int(t) except ValueError: # Try to parse it as a timestamp string try: return time.strptime(t) except ValueError as ex: # Will fall through debug("Failed to parse date with strptime: %s", ex) pass raise S3.Exceptions.ParameterError('Unable to convert %r to an epoch time. Pass an epoch time. Try `date -d \'now + 1 year\' +%%s` (shell) or time.mktime (Python).' % t) def check_bucket_name(bucket, dns_strict=True): if dns_strict: invalid = re.search(r"([^a-z0-9\.-])", bucket, re.UNICODE) if invalid: raise S3.Exceptions.ParameterError("Bucket name '%s' contains disallowed character '%s'. The only supported ones are: lowercase us-ascii letters (a-z), digits (0-9), dot (.) and hyphen (-)." % (bucket, invalid.groups()[0])) else: invalid = re.search(r"([^A-Za-z0-9\._-])", bucket, re.UNICODE) if invalid: raise S3.Exceptions.ParameterError("Bucket name '%s' contains disallowed character '%s'. The only supported ones are: us-ascii letters (a-z, A-Z), digits (0-9), dot (.), hyphen (-) and underscore (_)." % (bucket, invalid.groups()[0])) if len(bucket) < 3: raise S3.Exceptions.ParameterError("Bucket name '%s' is too short (min 3 characters)" % bucket) if len(bucket) > 255: raise S3.Exceptions.ParameterError("Bucket name '%s' is too long (max 255 characters)" % bucket) if dns_strict: if len(bucket) > 63: raise S3.Exceptions.ParameterError("Bucket name '%s' is too long (max 63 characters)" % bucket) if re.search(r"-\.", bucket, re.UNICODE): raise S3.Exceptions.ParameterError("Bucket name '%s' must not contain sequence '-.' for DNS compatibility" % bucket) if re.search(r"\.\.", bucket, re.UNICODE): raise S3.Exceptions.ParameterError("Bucket name '%s' must not contain sequence '..' 
for DNS compatibility" % bucket) if not re.search(r"^[0-9a-z]", bucket, re.UNICODE): raise S3.Exceptions.ParameterError("Bucket name '%s' must start with a letter or a digit" % bucket) if not re.search(r"[0-9a-z]$", bucket, re.UNICODE): raise S3.Exceptions.ParameterError("Bucket name '%s' must end with a letter or a digit" % bucket) return True __all__.append("check_bucket_name") def check_bucket_name_dns_conformity(bucket): try: return check_bucket_name(bucket, dns_strict = True) except S3.Exceptions.ParameterError: return False __all__.append("check_bucket_name_dns_conformity") def check_bucket_name_dns_support(bucket_host, bucket_name): """ Check whether either the host_bucket support buckets and either bucket name is dns compatible """ if "%(bucket)s" not in bucket_host: return False return check_bucket_name_dns_conformity(bucket_name) __all__.append("check_bucket_name_dns_support") def getBucketFromHostname(hostname): """ bucket, success = getBucketFromHostname(hostname) Only works for hostnames derived from bucket names using Config.host_bucket pattern. Returns bucket name and a boolean success flag. """ if "%(bucket)s" not in S3.Config.Config().host_bucket: return (hostname, False) # Create RE pattern from Config.host_bucket pattern = S3.Config.Config().host_bucket.lower() % { 'bucket' : '(?P.*)' } m = re.match(pattern, hostname, re.UNICODE) if not m: return (hostname, False) return m.group(1), True __all__.append("getBucketFromHostname") def getHostnameFromBucket(bucket): return S3.Config.Config().host_bucket.lower() % { 'bucket' : bucket } __all__.append("getHostnameFromBucket") # Deal with the fact that pwd and grp modules don't exist for Windows try: import pwd def getpwuid_username(uid): """returns a username from the password database for the given uid""" return unicodise_s(pwd.getpwuid(uid).pw_name) except ImportError: import getpass def getpwuid_username(uid): return unicodise_s(getpass.getuser()) __all__.append("getpwuid_username") try: import grp def getgrgid_grpname(gid): """returns a groupname from the group database for the given gid""" return unicodise_s(grp.getgrgid(gid).gr_name) except ImportError: def getgrgid_grpname(gid): return u"nobody" __all__.append("getgrgid_grpname") # vim:et:ts=4:sts=4:ai s3cmd-2.4.0/S3/SortedDict.py0000664000175100017510000000612014534034713015022 0ustar floflo00000000000000# -*- coding: utf-8 -*- ## -------------------------------------------------------------------- ## Amazon S3 manager ## ## Authors : Michal Ludvig (https://www.logix.cz/michal) ## Florent Viard (https://www.sodria.com) ## Copyright : TGRMN Software, Sodria SAS and contributors ## License : GPL Version 2 ## Website : https://s3tools.org ## -------------------------------------------------------------------- from __future__ import absolute_import, print_function from .BidirMap import BidirMap class SortedDictIterator(object): def __init__(self, sorted_dict, keys, reverse=False): self.sorted_dict = sorted_dict self.keys = keys if reverse: self.pop_index = -1 else: self.pop_index = 0 def __iter__(self): return self def __next__(self): try: return self.keys.pop(self.pop_index) except IndexError: raise StopIteration next = __next__ class SortedDict(dict): def __init__(self, mapping = {}, ignore_case = True, **kwargs): """ WARNING: SortedDict() with ignore_case==True will drop entries differing only in capitalisation! 
Eg: SortedDict({'auckland':1, 'Auckland':2}).keys() => ['Auckland'] With ignore_case==False it's all right """ dict.__init__(self, mapping, **kwargs) self.ignore_case = ignore_case def keys(self): # TODO fix # Probably not anymore memory efficient on python2 # as now 2 copies of keys to sort them. keys = dict.keys(self) if self.ignore_case: # Translation map xlat_map = BidirMap() for key in keys: xlat_map[key.lower()] = key # Lowercase keys lc_keys = sorted(xlat_map.keys()) return [xlat_map[k] for k in lc_keys] else: keys = sorted(keys) return keys def __iter__(self): return SortedDictIterator(self, self.keys()) def __reversed__(self): return SortedDictIterator(self, self.keys(), reverse=True) def __getitem__(self, index): """Override to support the "get_slice" for python3 """ if isinstance(index, slice): r = SortedDict(ignore_case = self.ignore_case) for k in self.keys()[index]: r[k] = self[k] else: r = super(SortedDict, self).__getitem__(index) return r if __name__ == "__main__": d = { 'AWS' : 1, 'Action' : 2, 'america' : 3, 'Auckland' : 4, 'America' : 5 } sd = SortedDict(d) print("Wanted: Action, america, Auckland, AWS, [ignore case]") print("Got: ", end=' ') for key in sd: print("%s," % key, end=' ') print(" [used: __iter__()]") d = SortedDict(d, ignore_case = False) print("Wanted: AWS, Action, America, Auckland, america, [case sensitive]") print("Got: ", end=' ') for key in d.keys(): print("%s," % key, end=' ') print(" [used: keys()]") # vim:et:ts=4:sts=4:ai s3cmd-2.4.0/S3/S3Uri.py0000664000175100017510000001730014534034713013725 0ustar floflo00000000000000# -*- coding: utf-8 -*- ## -------------------------------------------------------------------- ## Amazon S3 manager ## ## Authors : Michal Ludvig (https://www.logix.cz/michal) ## Florent Viard (https://www.sodria.com) ## Copyright : TGRMN Software, Sodria SAS and contributors ## License : GPL Version 2 ## Website : https://s3tools.org ## -------------------------------------------------------------------- from __future__ import absolute_import, print_function import os import re import sys from .Utils import unicodise, deunicodise, check_bucket_name_dns_support from . 
import Config PY3 = (sys.version_info >= (3, 0)) class S3Uri(object): type = None _subclasses = None def __new__(self, string): if not self._subclasses: ## Generate a list of all subclasses of S3Uri self._subclasses = [] dict = sys.modules[__name__].__dict__ for something in dict: if type(dict[something]) is not type(self): continue if issubclass(dict[something], self) and dict[something] != self: self._subclasses.append(dict[something]) for subclass in self._subclasses: try: instance = object.__new__(subclass) instance.__init__(string) return instance except ValueError: continue raise ValueError("%s: not a recognized URI" % string) def __str__(self): if PY3: return self.uri() else: return deunicodise(self.uri()) def __unicode__(self): return self.uri() def __repr__(self): return repr("<%s: %s>" % (self.__class__.__name__, self.__unicode__())) def public_url(self): raise ValueError("This S3 URI does not have Anonymous URL representation") def basename(self): return self.__unicode__().split("/")[-1] class S3UriS3(S3Uri): type = "s3" _re = re.compile("^s3:///*([^/]*)/?(.*)", re.IGNORECASE | re.UNICODE) def __init__(self, string): match = self._re.match(string) if not match: raise ValueError("%s: not a S3 URI" % string) groups = match.groups() self._bucket = groups[0] self._object = groups[1] def bucket(self): return self._bucket def object(self): return self._object def has_bucket(self): return bool(self._bucket) def has_object(self): return bool(self._object) def uri(self): return u"/".join([u"s3:/", self._bucket, self._object]) def is_dns_compatible(self): return check_bucket_name_dns_support(Config.Config().host_bucket, self._bucket) def public_url(self): public_url_protocol = "http" if Config.Config().public_url_use_https: public_url_protocol = "https" if self.is_dns_compatible(): return "%s://%s.%s/%s" % (public_url_protocol, self._bucket, Config.Config().host_base, self._object) else: return "%s://%s/%s/%s" % (public_url_protocol, Config.Config().host_base, self._bucket, self._object) def host_name(self): if self.is_dns_compatible(): return "%s.s3.amazonaws.com" % (self._bucket) else: return "s3.amazonaws.com" @staticmethod def compose_uri(bucket, object = ""): return u"s3://%s/%s" % (bucket, object) @staticmethod def httpurl_to_s3uri(http_url): m = re.match("(https?://)?([^/]+)/?(.*)", http_url, re.IGNORECASE | re.UNICODE) hostname, object = m.groups()[1:] hostname = hostname.lower() # Worst case scenario, we would like to be able to match something like # my.website.com.s3-fips.dualstack.us-west-1.amazonaws.com.cn m = re.match("(.*\.)?s3(?:\-[^\.]*)?(?:\.dualstack)?(?:\.[^\.]*)?\.amazonaws\.com(?:\.cn)?$", hostname, re.IGNORECASE | re.UNICODE) if not m: raise ValueError("Unable to parse URL: %s" % http_url) bucket = m.groups()[0] if not bucket: ## old-style url: http://s3.amazonaws.com/bucket/object if "/" not in object: ## no object given bucket = object object = "" else: ## bucket/object bucket, object = object.split("/", 1) else: ## new-style url: http://bucket.s3.amazonaws.com/object bucket = bucket.rstrip('.') return S3Uri( u"s3://%(bucket)s/%(object)s" % { 'bucket' : bucket, 'object' : object } ) class S3UriS3FS(S3Uri): type = "s3fs" _re = re.compile("^s3fs:///*([^/]*)/?(.*)", re.IGNORECASE | re.UNICODE) def __init__(self, string): match = self._re.match(string) if not match: raise ValueError("%s: not a S3fs URI" % string) groups = match.groups() self._fsname = groups[0] self._path = groups[1].split("/") def fsname(self): return self._fsname def path(self): return 
"/".join(self._path) def uri(self): return u"/".join([u"s3fs:/", self._fsname, self.path()]) class S3UriFile(S3Uri): type = "file" _re = re.compile("^(\w+://)?(.*)", re.UNICODE) def __init__(self, string): match = self._re.match(string) groups = match.groups() if groups[0] not in (None, "file://"): raise ValueError("%s: not a file:// URI" % string) if groups[0] is None: self._path = groups[1].split(os.sep) else: self._path = groups[1].split("/") def path(self): return os.sep.join(self._path) def uri(self): return u"/".join([u"file:/"]+self._path) def isdir(self): return os.path.isdir(deunicodise(self.path())) def dirname(self): return unicodise(os.path.dirname(deunicodise(self.path()))) def basename(self): return unicodise(os.path.basename(deunicodise(self.path()))) class S3UriCloudFront(S3Uri): type = "cf" _re = re.compile("^cf://([^/]*)/*(.*)", re.IGNORECASE | re.UNICODE) def __init__(self, string): match = self._re.match(string) if not match: raise ValueError("%s: not a CloudFront URI" % string) groups = match.groups() self._dist_id = groups[0] self._request_id = groups[1] != "/" and groups[1] or None def dist_id(self): return self._dist_id def request_id(self): return self._request_id def uri(self): uri = u"cf://" + self.dist_id() if self.request_id(): uri += u"/" + self.request_id() return uri if __name__ == "__main__": uri = S3Uri("s3://bucket/object") print("type() =", type(uri)) print("uri =", uri) print("uri.type=", uri.type) print("bucket =", uri.bucket()) print("object =", uri.object()) print() uri = S3Uri("s3://bucket") print("type() =", type(uri)) print("uri =", uri) print("uri.type=", uri.type) print("bucket =", uri.bucket()) print() uri = S3Uri("s3fs://filesystem1/path/to/remote/file.txt") print("type() =", type(uri)) print("uri =", uri) print("uri.type=", uri.type) print("path =", uri.path()) print() uri = S3Uri("/path/to/local/file.txt") print("type() =", type(uri)) print("uri =", uri) print("uri.type=", uri.type) print("path =", uri.path()) print() uri = S3Uri("cf://1234567890ABCD/") print("type() =", type(uri)) print("uri =", uri) print("uri.type=", uri.type) print("dist_id =", uri.dist_id()) print() # vim:et:ts=4:sts=4:ai s3cmd-2.4.0/S3/FileDict.py0000664000175100017510000000527014534034713014446 0ustar floflo00000000000000# -*- coding: utf-8 -*- ## -------------------------------------------------------------------- ## Amazon S3 manager ## ## Authors : Michal Ludvig (https://www.logix.cz/michal) ## Florent Viard (https://www.sodria.com) ## Copyright : TGRMN Software, Sodria SAS and contributors ## License : GPL Version 2 ## Website : https://s3tools.org ## -------------------------------------------------------------------- from __future__ import absolute_import import logging from .SortedDict import SortedDict from .Crypto import hash_file_md5 from . import Utils from . 
import Config zero_length_md5 = "d41d8cd98f00b204e9800998ecf8427e" cfg = Config.Config() class FileDict(SortedDict): def __init__(self, mapping = None, ignore_case = True, **kwargs): SortedDict.__init__(self, mapping = mapping or {}, ignore_case = ignore_case, **kwargs) self.hardlinks_md5 = dict() # { dev: { inode : {'md5':, 'relative_files':}}} self.by_md5 = dict() # {md5: set(relative_files)} def record_md5(self, relative_file, md5): if not relative_file: return if md5 is None: return if md5 == zero_length_md5: return if md5 not in self.by_md5: self.by_md5[md5] = relative_file def find_md5_one(self, md5): if not md5: return None return self.by_md5.get(md5, None) def get_md5(self, relative_file): """returns md5 if it can, or raises IOError if file is unreadable""" md5 = None if 'md5' in self[relative_file]: return self[relative_file]['md5'] md5 = self.get_hardlink_md5(relative_file) if md5 is None and 'md5' in cfg.sync_checks: logging.debug(u"doing file I/O to read md5 of %s" % relative_file) md5 = hash_file_md5(self[relative_file]['full_name']) self.record_md5(relative_file, md5) self[relative_file]['md5'] = md5 return md5 def record_hardlink(self, relative_file, dev, inode, md5, size): if md5 is None: return if size == 0: # don't record 0-length files return if dev == 0 or inode == 0: # Windows return if dev not in self.hardlinks_md5: self.hardlinks_md5[dev] = dict() if inode not in self.hardlinks_md5[dev]: self.hardlinks_md5[dev][inode] = md5 def get_hardlink_md5(self, relative_file): try: dev = self[relative_file]['dev'] inode = self[relative_file]['inode'] md5 = self.hardlinks_md5[dev][inode] except KeyError: md5 = None return md5 s3cmd-2.4.0/S3/Crypto.py0000664000175100017510000003117414535730271014250 0ustar floflo00000000000000# -*- coding: utf-8 -*- ## -------------------------------------------------------------------- ## Amazon S3 manager ## ## Authors : Michal Ludvig (https://www.logix.cz/michal) ## Florent Viard (https://www.sodria.com) ## Copyright : TGRMN Software, Sodria SAS and contributors ## License : GPL Version 2 ## Website : https://s3tools.org ## -------------------------------------------------------------------- from __future__ import absolute_import import sys import hmac try: from base64 import encodebytes as encodestring except ImportError: # Python 2 support from base64 import encodestring from . import Config from logging import debug from .BaseUtils import encode_to_s3, decode_from_s3, s3_quote, md5, unicode from .Utils import time_to_epoch, deunicodise, check_bucket_name_dns_support from .SortedDict import SortedDict import datetime from hashlib import sha1, sha256 __all__ = [] def format_param_str(params, always_have_equal=False, limited_keys=None): """ Format URL parameters from a params dict and returns ?parm1=val1&parm2=val2 or an empty string if there are no parameters. Output of this function should be appended directly to self.resource['uri'] - Set "always_have_equal" to always have the "=" char for a param even when there is no value for it. - Set "limited_keys" list to restrict the param string to keys that are defined in it. """ if not params: return "" param_str = "" equal_str = always_have_equal and u'=' or '' for key in sorted(params.keys()): if limited_keys and key not in limited_keys: continue value = params[key] if value in (None, ""): param_str += "&%s%s" % (s3_quote(key, unicode_output=True), equal_str) else: param_str += "&%s=%s" % (key, s3_quote(params[key], unicode_output=True)) return param_str and "?" 
+ param_str[1:] __all__.append("format_param_str") ### AWS Version 2 signing def sign_string_v2(string_to_sign): """Sign a string with the secret key, returning base64 encoded results. By default the configured secret key is used, but may be overridden as an argument. Useful for REST authentication. See http://s3.amazonaws.com/doc/s3-developer-guide/RESTAuthentication.html string_to_sign should be utf-8 "bytes". and returned signature will be utf-8 encoded "bytes". """ secret_key = Config.Config().secret_key signature = encodestring(hmac.new(encode_to_s3(secret_key), string_to_sign, sha1).digest()).strip() return signature __all__.append("sign_string_v2") def sign_request_v2(method='GET', canonical_uri='/', params=None, cur_headers=None): """Sign a string with the secret key, returning base64 encoded results. By default the configured secret key is used, but may be overridden as an argument. Useful for REST authentication. See http://s3.amazonaws.com/doc/s3-developer-guide/RESTAuthentication.html string_to_sign should be utf-8 "bytes". """ # valid sub-resources to be included in sign v2: SUBRESOURCES_TO_INCLUDE = ['acl', 'lifecycle', 'location', 'logging', 'notification', 'partNumber', 'policy', 'requestPayment', 'tagging', 'torrent', 'uploadId', 'uploads', 'versionId', 'versioning', 'versions', 'website', # Missing of aws s3 doc but needed 'delete', 'cors', 'restore'] if cur_headers is None: cur_headers = SortedDict(ignore_case = True) access_key = Config.Config().access_key string_to_sign = method + "\n" string_to_sign += cur_headers.get("content-md5", "") + "\n" string_to_sign += cur_headers.get("content-type", "") + "\n" string_to_sign += cur_headers.get("date", "") + "\n" for header in sorted(cur_headers.keys()): if header.startswith("x-amz-"): string_to_sign += header + ":" + cur_headers[header] + "\n" if header.startswith("x-emc-"): string_to_sign += header + ":"+ cur_headers[header] + "\n" canonical_uri = s3_quote(canonical_uri, quote_backslashes=False, unicode_output=True) canonical_querystring = format_param_str(params, limited_keys=SUBRESOURCES_TO_INCLUDE) # canonical_querystring would be empty if no param given, otherwise it will # starts with a "?" canonical_uri += canonical_querystring string_to_sign += canonical_uri debug("SignHeaders: " + repr(string_to_sign)) signature = decode_from_s3(sign_string_v2(encode_to_s3(string_to_sign))) new_headers = SortedDict(list(cur_headers.items()), ignore_case=True) new_headers["Authorization"] = "AWS " + access_key + ":" + signature return new_headers __all__.append("sign_request_v2") def sign_url_v2(url_to_sign, expiry): """Sign a URL in s3://bucket/object form with the given expiry time. The object will be accessible via the signed URL until the AWS key and secret are revoked or the expiry time is reached, even if the object is otherwise private. See: http://s3.amazonaws.com/doc/s3-developer-guide/RESTAuthentication.html """ return sign_url_base_v2( bucket = url_to_sign.bucket(), object = url_to_sign.object(), expiry = expiry ) __all__.append("sign_url_v2") def sign_url_base_v2(**parms): """Shared implementation of sign_url methods. 
Takes a hash of 'bucket', 'object' and 'expiry' as args.""" content_disposition=Config.Config().content_disposition content_type=Config.Config().content_type parms['expiry']=time_to_epoch(parms['expiry']) parms['access_key']=Config.Config().access_key parms['host_base']=Config.Config().host_base parms['object'] = s3_quote(parms['object'], quote_backslashes=False, unicode_output=True) parms['proto'] = 'http' if Config.Config().signurl_use_https: parms['proto'] = 'https' debug("Expiry interpreted as epoch time %s", parms['expiry']) signtext = 'GET\n\n\n%(expiry)d\n/%(bucket)s/%(object)s' % parms param_separator = '?' if content_disposition: signtext += param_separator + 'response-content-disposition=' + content_disposition param_separator = '&' if content_type: signtext += param_separator + 'response-content-type=' + content_type param_separator = '&' debug("Signing plaintext: %r", signtext) parms['sig'] = s3_quote(sign_string_v2(encode_to_s3(signtext)), unicode_output=True) debug("Urlencoded signature: %s", parms['sig']) if check_bucket_name_dns_support(Config.Config().host_bucket, parms['bucket']): url = "%(proto)s://%(bucket)s.%(host_base)s/%(object)s" else: url = "%(proto)s://%(host_base)s/%(bucket)s/%(object)s" url += "?AWSAccessKeyId=%(access_key)s&Expires=%(expiry)d&Signature=%(sig)s" url = url % parms if content_disposition: url += "&response-content-disposition=" + s3_quote(content_disposition, unicode_output=True) if content_type: url += "&response-content-type=" + s3_quote(content_type, unicode_output=True) return url __all__.append("sign_url_base_v2") def sign(key, msg): return hmac.new(key, encode_to_s3(msg), sha256).digest() def getSignatureKey(key, dateStamp, regionName, serviceName): """ Input: unicode params Output: bytes """ kDate = sign(encode_to_s3('AWS4' + key), dateStamp) kRegion = sign(kDate, regionName) kService = sign(kRegion, serviceName) kSigning = sign(kService, 'aws4_request') return kSigning def sign_request_v4(method='GET', host='', canonical_uri='/', params=None, region='us-east-1', cur_headers=None, body=b''): service = 's3' if cur_headers is None: cur_headers = SortedDict(ignore_case = True) cfg = Config.Config() access_key = cfg.access_key secret_key = cfg.secret_key t = datetime.datetime.utcnow() amzdate = t.strftime('%Y%m%dT%H%M%SZ') datestamp = t.strftime('%Y%m%d') signing_key = getSignatureKey(secret_key, datestamp, region, service) canonical_uri = s3_quote(canonical_uri, quote_backslashes=False, unicode_output=True) canonical_querystring = format_param_str(params, always_have_equal=True).lstrip('?') if type(body) == type(sha256(b'')): payload_hash = decode_from_s3(body.hexdigest()) else: payload_hash = decode_from_s3(sha256(encode_to_s3(body)).hexdigest()) canonical_headers = {'host' : host, 'x-amz-content-sha256': payload_hash, 'x-amz-date' : amzdate } signed_headers = 'host;x-amz-content-sha256;x-amz-date' for header in cur_headers.keys(): # avoid duplicate headers and previous Authorization if header == 'Authorization' or header in signed_headers.split(';'): continue canonical_headers[header.strip()] = cur_headers[header].strip() signed_headers += ';' + header.strip() # sort headers into a string canonical_headers_str = '' for k, v in sorted(canonical_headers.items()): canonical_headers_str += k + ":" + v + "\n" canonical_headers = canonical_headers_str debug(u"canonical_headers = %s" % canonical_headers) signed_headers = ';'.join(sorted(signed_headers.split(';'))) canonical_request = method + '\n' + canonical_uri + '\n' + canonical_querystring + 
'\n' + canonical_headers + '\n' + signed_headers + '\n' + payload_hash debug('Canonical Request:\n%s\n----------------------' % canonical_request) algorithm = 'AWS4-HMAC-SHA256' credential_scope = datestamp + '/' + region + '/' + service + '/' + 'aws4_request' string_to_sign = algorithm + '\n' + amzdate + '\n' + credential_scope + '\n' + decode_from_s3(sha256(encode_to_s3(canonical_request)).hexdigest()) signature = decode_from_s3(hmac.new(signing_key, encode_to_s3(string_to_sign), sha256).hexdigest()) authorization_header = algorithm + ' ' + 'Credential=' + access_key + '/' + credential_scope + ',' + 'SignedHeaders=' + signed_headers + ',' + 'Signature=' + signature new_headers = SortedDict(cur_headers.items()) new_headers.update({'x-amz-date':amzdate, 'Authorization':authorization_header, 'x-amz-content-sha256': payload_hash}) debug("signature-v4 headers: %s" % new_headers) return new_headers __all__.append("sign_request_v4") def checksum_file_descriptor(file_desc, offset=0, size=None, hash_func=sha256): hash = hash_func() if size is None: for chunk in iter(lambda: file_desc.read(8192), b''): hash.update(chunk) else: file_desc.seek(offset) size_left = size while size_left > 0: chunk = file_desc.read(min(8192, size_left)) if not chunk: break size_left -= len(chunk) hash.update(chunk) return hash __all__.append("checksum_file_stream") def checksum_sha256_file(file, offset=0, size=None): if not isinstance(file, unicode): # file is directly a file descriptor return checksum_file_descriptor(file, offset, size, sha256) # Otherwise, we expect file to be a filename with open(deunicodise(file),'rb') as fp: return checksum_file_descriptor(fp, offset, size, sha256) __all__.append("checksum_sha256_file") def checksum_sha256_buffer(buffer, offset=0, size=None): hash = sha256() if size is None: hash.update(buffer) else: hash.update(buffer[offset:offset+size]) return hash __all__.append("checksum_sha256_buffer") def generate_content_md5(body): m = md5(encode_to_s3(body)) base64md5 = encodestring(m.digest()) base64md5 = decode_from_s3(base64md5) if base64md5[-1] == '\n': base64md5 = base64md5[0:-1] return decode_from_s3(base64md5) __all__.append("generate_content_md5") def hash_file_md5(filename): h = md5() with open(deunicodise(filename), "rb") as fp: while True: # Hash 32kB chunks data = fp.read(32*1024) if not data: break h.update(data) return h.hexdigest() __all__.append("hash_file_md5") def calculateChecksum(buffer, mfile, offset, chunk_size, send_chunk): md5_hash = md5() size_left = chunk_size if buffer == '': mfile.seek(offset) while size_left > 0: data = mfile.read(min(send_chunk, size_left)) if not data: break md5_hash.update(data) size_left -= len(data) else: md5_hash.update(buffer) return md5_hash.hexdigest() __all__.append("calculateChecksum") s3cmd-2.4.0/S3/PkgInfo.py0000664000175100017510000000164614535731234014326 0ustar floflo00000000000000# -*- coding: utf-8 -*- ## -------------------------------------------------------------------- ## Amazon S3 manager ## ## Authors : Michal Ludvig (https://www.logix.cz/michal) ## Florent Viard (https://www.sodria.com) ## Copyright : TGRMN Software, Sodria SAS and contributors ## License : GPL Version 2 ## Website : https://s3tools.org ## -------------------------------------------------------------------- package = "s3cmd" version = "2.4.0" url = "http://s3tools.org" license = "GNU GPL v2+" short_description = "Command line tool for managing Amazon S3 and CloudFront services" long_description = """ S3cmd lets you copy files from/to Amazon S3 (Simple 
Storage Service) using a simple to use command line client. Supports rsync-like backup, GPG encryption, and more. Also supports management of Amazon's CloudFront content delivery network. """ # vim:et:ts=4:sts=4:ai s3cmd-2.4.0/S3/ConnMan.py0000664000175100017510000003175514534034713014323 0ustar floflo00000000000000# -*- coding: utf-8 -*- ## -------------------------------------------------------------------- ## Amazon S3 manager ## ## Authors : Michal Ludvig (https://www.logix.cz/michal) ## Florent Viard (https://www.sodria.com) ## Copyright : TGRMN Software, Sodria SAS and contributors ## License : GPL Version 2 ## Website : https://s3tools.org ## -------------------------------------------------------------------- from __future__ import absolute_import import sys if sys.version_info >= (3, 0): from .Custom_httplib3x import httplib else: from .Custom_httplib27 import httplib import ssl from logging import debug from threading import Semaphore from time import time try: # python 3 support from urlparse import urlparse except ImportError: from urllib.parse import urlparse from .Config import Config from .Exceptions import ParameterError, S3SSLCertificateError from .Utils import getBucketFromHostname __all__ = ["ConnMan"] class http_connection(object): context = None context_set = False @staticmethod def _ssl_verified_context(cafile): cfg = Config() context = None try: context = ssl.create_default_context(cafile=cafile) except AttributeError: # no ssl.create_default_context pass if context and not cfg.check_ssl_hostname: context.check_hostname = False debug(u'Disabling SSL certificate hostname checking') return context @staticmethod def _ssl_unverified_context(cafile): debug(u'Disabling SSL certificate checking') context = None try: context = ssl._create_unverified_context(cafile=cafile, cert_reqs=ssl.CERT_NONE) except AttributeError: # no ssl._create_unverified_context pass return context @staticmethod def _ssl_client_auth_context(certfile, keyfile, check_server_cert, cafile): context = None try: cert_reqs = ssl.CERT_REQUIRED if check_server_cert else ssl.CERT_NONE context = ssl._create_unverified_context(cafile=cafile, keyfile=keyfile, certfile=certfile, cert_reqs=cert_reqs) except AttributeError: # no ssl._create_unverified_context pass return context @staticmethod def _ssl_context(): if http_connection.context_set: return http_connection.context cfg = Config() cafile = cfg.ca_certs_file if cafile == "": cafile = None certfile = cfg.ssl_client_cert_file or None keyfile = cfg.ssl_client_key_file or None # the key may be embedded into cert file debug(u"Using ca_certs_file %s", cafile) debug(u"Using ssl_client_cert_file %s", certfile) debug(u"Using ssl_client_key_file %s", keyfile) if certfile is not None: context = http_connection._ssl_client_auth_context(certfile, keyfile, cfg.check_ssl_certificate, cafile) elif cfg.check_ssl_certificate: context = http_connection._ssl_verified_context(cafile) else: context = http_connection._ssl_unverified_context(cafile) http_connection.context = context http_connection.context_set = True return context def forgive_wildcard_cert(self, cert, hostname): """ Wildcard matching for *.s3.amazonaws.com and similar per region. Per http://docs.aws.amazon.com/AmazonS3/latest/dev/BucketRestrictions.html: "We recommend that all bucket names comply with DNS naming conventions." 
Per http://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html: "When using virtual hosted-style buckets with SSL, the SSL wild card certificate only matches buckets that do not contain periods. To work around this, use HTTP or write your own certificate verification logic." Therefore, we need a custom validation routine that allows mybucket.example.com.s3.amazonaws.com to be considered a valid hostname for the *.s3.amazonaws.com wildcard cert, and for the region-specific *.s3-[region].amazonaws.com wildcard cert. We also forgive non-S3 wildcard certificates should the hostname match, to allow compatibility with other S3 API-compatible storage providers. """ debug(u'checking SSL subjectAltName as forgiving wildcard cert') san = cert.get('subjectAltName', ()) hostname = hostname.lower() cleaned_host_bucket_config = urlparse('https://' + Config.host_bucket).hostname for key, value in san: if key == 'DNS': value = value.lower() if value.startswith('*.s3') and \ (value.endswith('.amazonaws.com') and hostname.endswith('.amazonaws.com')) or \ (value.endswith('.amazonaws.com.cn') and hostname.endswith('.amazonaws.com.cn')): return True elif value == cleaned_host_bucket_config % \ {'bucket': '*', 'location': Config.bucket_location.lower()} and \ hostname.endswith(cleaned_host_bucket_config % \ {'bucket': '', 'location': Config.bucket_location.lower()}): return True return False def match_hostname(self): cert = self.c.sock.getpeercert() try: ssl.match_hostname(cert, self.hostname) except AttributeError: # old ssl module doesn't have this function return except ValueError: # empty SSL cert means underlying SSL library didn't validate it, we don't either. return except S3CertificateError as e: if not self.forgive_wildcard_cert(cert, self.hostname): raise e @staticmethod def _https_connection(hostname, port=None): try: context = http_connection._ssl_context() # Wildcard certificates do not work with DNS-style named buckets. bucket_name, success = getBucketFromHostname(hostname) if success and '.' in bucket_name: # this merely delays running the hostname check until # after the connection is made and we get control # back. We then run the same check, relaxed for S3's # wildcard certificates. debug(u'Bucket name contains "." 
character, disabling initial SSL hostname check') check_hostname = False if context: context.check_hostname = False else: if context: check_hostname = context.check_hostname else: # Earliest version of python that don't have context, # don't check hostnames anyway check_hostname = True # Note, we are probably needed to try to set check_hostname because of that bug: # http://bugs.python.org/issue22959 conn = httplib.HTTPSConnection(hostname, port, context=context, check_hostname=check_hostname) debug(u'httplib.HTTPSConnection() has both context and check_hostname') except TypeError: try: # in case check_hostname parameter is not present try again conn = httplib.HTTPSConnection(hostname, port, context=context) debug(u'httplib.HTTPSConnection() has only context') except TypeError: # in case even context parameter is not present try one last time conn = httplib.HTTPSConnection(hostname, port) debug(u'httplib.HTTPSConnection() has neither context nor check_hostname') return conn def __init__(self, id, hostname, ssl, cfg): self.ssl = ssl self.id = id self.counter = 0 # Whatever is the input, ensure to have clean hostname and port parsed_hostname = urlparse('https://' + hostname) self.hostname = parsed_hostname.hostname self.port = parsed_hostname.port if parsed_hostname.path and parsed_hostname.path != '/': self.path = parsed_hostname.path.rstrip('/') debug(u'endpoint path set to %s', self.path) else: self.path = None """ History note: In a perfect world, or in the future: - All http proxies would support CONNECT/tunnel, and so there would be no need for using "absolute URIs" in format_uri. - All s3-like servers would work well whether using relative or ABSOLUTE URIs. But currently, what is currently common: - Proxies without support for CONNECT for http, and so "absolute URIs" have to be used. - Proxies with support for CONNECT for httpS but s3-like servers having issues with "absolute URIs", so relative one still have to be used as the requests will pass as-is, through the proxy because of the CONNECT mode. 
""" if not cfg.proxy_host: if ssl: self.c = http_connection._https_connection(self.hostname, self.port) debug(u'non-proxied HTTPSConnection(%s, %s)', self.hostname, self.port) else: self.c = httplib.HTTPConnection(self.hostname, self.port) debug(u'non-proxied HTTPConnection(%s, %s)', self.hostname, self.port) else: if ssl: self.c = http_connection._https_connection(cfg.proxy_host, cfg.proxy_port) debug(u'proxied HTTPSConnection(%s, %s)', cfg.proxy_host, cfg.proxy_port) port = self.port and self.port or 443 self.c.set_tunnel(self.hostname, port) debug(u'tunnel to %s, %s', self.hostname, port) else: self.c = httplib.HTTPConnection(cfg.proxy_host, cfg.proxy_port) debug(u'proxied HTTPConnection(%s, %s)', cfg.proxy_host, cfg.proxy_port) # No tunnel here for the moment self.last_used_time = time() class ConnMan(object): _CS_REQ_SENT = httplib._CS_REQ_SENT CONTINUE = httplib.CONTINUE conn_pool_sem = Semaphore() conn_pool = {} conn_max_counter = 800 ## AWS closes connection after some ~90 requests @staticmethod def get(hostname, ssl=None): cfg = Config() if ssl is None: ssl = cfg.use_https conn = None if cfg.proxy_host != "": if ssl and sys.hexversion < 0x02070000: raise ParameterError("use_https=True can't be used with proxy on Python <2.7") conn_id = "proxy://%s:%s" % (cfg.proxy_host, cfg.proxy_port) else: conn_id = "http%s://%s" % (ssl and "s" or "", hostname) ConnMan.conn_pool_sem.acquire() if conn_id not in ConnMan.conn_pool: ConnMan.conn_pool[conn_id] = [] while ConnMan.conn_pool[conn_id]: conn = ConnMan.conn_pool[conn_id].pop() cur_time = time() if cur_time < conn.last_used_time + cfg.connection_max_age \ and cur_time >= conn.last_used_time: debug("ConnMan.get(): re-using connection: %s#%d" % (conn.id, conn.counter)) break # Conn is too old or wall clock went back in the past debug("ConnMan.get(): closing expired connection") ConnMan.close(conn) conn = None ConnMan.conn_pool_sem.release() if not conn: debug("ConnMan.get(): creating new connection: %s" % conn_id) conn = http_connection(conn_id, hostname, ssl, cfg) conn.c.connect() if conn.ssl and cfg.check_ssl_certificate and cfg.check_ssl_hostname: conn.match_hostname() conn.counter += 1 return conn @staticmethod def put(conn): if conn.id.startswith("proxy://"): ConnMan.close(conn) debug("ConnMan.put(): closing proxy connection (keep-alive not yet" " supported)") return if conn.counter >= ConnMan.conn_max_counter: ConnMan.close(conn) debug("ConnMan.put(): closing over-used connection") return cfg = Config() if not cfg.connection_pooling: ConnMan.close(conn) debug("ConnMan.put(): closing connection (connection pooling disabled)") return # Update timestamp of conn to record when was its last use conn.last_used_time = time() ConnMan.conn_pool_sem.acquire() ConnMan.conn_pool[conn.id].append(conn) ConnMan.conn_pool_sem.release() debug("ConnMan.put(): connection put back to pool (%s#%d)" % (conn.id, conn.counter)) @staticmethod def close(conn): if conn: conn.c.close() s3cmd-2.4.0/S3/FileLists.py0000664000175100017510000007022514534034713014663 0ustar floflo00000000000000# -*- coding: utf-8 -*- ## -------------------------------------------------------------------- ## Create and compare lists of files/objects ## ## Authors : Michal Ludvig (https://www.logix.cz/michal) ## Florent Viard (https://www.sodria.com) ## Copyright : TGRMN Software, Sodria SAS and contributors ## License : GPL Version 2 ## Website : https://s3tools.org ## -------------------------------------------------------------------- from __future__ import absolute_import from .S3 
import S3 from .Config import Config from .S3Uri import S3Uri from .FileDict import FileDict from .BaseUtils import dateS3toUnix, dateRFC822toUnix, s3path from .Utils import unicodise, deunicodise, deunicodise_s, replace_nonprintables from .Exceptions import ParameterError from .HashCache import HashCache from logging import debug, info, warning import os import sys import glob import re import errno import io from stat import S_ISDIR PY3 = (sys.version_info >= (3, 0)) __all__ = ["fetch_local_list", "fetch_remote_list", "compare_filelists"] def _os_walk_unicode(top): ''' Reimplementation of python's os.walk to nicely support unicode in input as in output. ''' try: names = os.listdir(deunicodise(top)) except Exception: return dirs, nondirs = [], [] for name in names: name = unicodise(name) if os.path.isdir(deunicodise(os.path.join(top, name))): if not handle_exclude_include_walk_dir(top, name): dirs.append(name) else: nondirs.append(name) yield top, dirs, nondirs for name in dirs: new_path = os.path.join(top, name) if not os.path.islink(deunicodise(new_path)): for x in _os_walk_unicode(new_path): yield x def handle_exclude_include_walk_dir(root, dirname): ''' Should this root/dirname directory be excluded? (otherwise included by default) Exclude dir matches in the current directory This prevents us from recursing down trees we know we want to ignore return True for excluding, and False for including ''' cfg = Config() # python versions end their patterns (from globs) differently, test for different styles; check python3.6+ styles first directory_patterns = (u'/)$', u'/)\\Z', u'\\/$', u'\\/\\Z(?ms)') d = os.path.join(root, dirname, '') debug(u"CHECK: '%s'" % d) excluded = False for r in cfg.exclude: if not any(r.pattern.endswith(dp) for dp in directory_patterns): # we only check for directory patterns here continue if r.search(d): excluded = True debug(u"EXCL-MATCH: '%s'" % cfg.debug_exclude[r]) break if excluded: ## No need to check for --include if not excluded for r in cfg.include: if not any(r.pattern.endswith(dp) for dp in directory_patterns): # we only check for directory patterns here continue debug(u"INCL-TEST: '%s' ~ %s" % (d, r.pattern)) if r.search(d): excluded = False debug(u"INCL-MATCH: '%s'" % (cfg.debug_include[r])) break if excluded: ## Still excluded - ok, action it debug(u"EXCLUDE: '%s'" % d) else: debug(u"PASS: '%s'" % d) return excluded def _fswalk_follow_symlinks(path): ''' Walk filesystem, following symbolic links (but without recursion), on python2.4 and later If a symlink directory loop is detected, emit a warning and skip. 
E.g.: dir1/dir2/sym-dir -> ../dir2 ''' assert os.path.isdir(deunicodise(path)) # only designed for directory argument walkdirs = set([path]) for dirpath, dirnames, filenames in _os_walk_unicode(path): real_dirpath = unicodise(os.path.realpath(deunicodise(dirpath))) for dirname in dirnames: current = os.path.join(dirpath, dirname) real_current = unicodise(os.path.realpath(deunicodise(current))) if os.path.islink(deunicodise(current)): if (real_dirpath == real_current or real_dirpath.startswith(real_current + os.path.sep)): warning("Skipping recursively symlinked directory %s" % dirname) else: walkdirs.add(current) for walkdir in walkdirs: for dirpath, dirnames, filenames in _os_walk_unicode(walkdir): yield (dirpath, dirnames, filenames) def _fswalk_no_symlinks(path): ''' Directory tree generator path (str) is the root of the directory tree to walk ''' for dirpath, dirnames, filenames in _os_walk_unicode(path): yield (dirpath, dirnames, filenames) def filter_exclude_include(src_list): debug(u"Applying --exclude/--include") cfg = Config() exclude_list = FileDict(ignore_case = False) for file in src_list.keys(): debug(u"CHECK: '%s'" % file) excluded = False for r in cfg.exclude: if r.search(file): excluded = True debug(u"EXCL-MATCH: '%s'" % cfg.debug_exclude[r]) break if excluded: ## No need to check for --include if not excluded for r in cfg.include: if r.search(file): excluded = False debug(u"INCL-MATCH: '%s'" % cfg.debug_include[r]) break if excluded: ## Still excluded - ok, action it debug(u"EXCLUDE: '%s'" % file) exclude_list[file] = src_list[file] del(src_list[file]) continue else: debug(u"PASS: '%s'" % file) return src_list, exclude_list def _get_filelist_from_file(cfg, local_path): def _append(d, key, value): if key not in d: d[key] = [value] else: d[key].append(value) filelist = {} for fname in cfg.files_from: try: f = None if fname == u'-': f = io.open(sys.stdin.fileno(), mode='r', closefd=False) else: try: f = io.open(deunicodise(fname), mode='r') except IOError as e: warning(u"--files-from input file %s could not be opened for reading (%s), skipping." % (fname, e.strerror)) continue for line in f: line = unicodise(line).strip() line = os.path.normpath(os.path.join(local_path, line)) dirname = unicodise(os.path.dirname(deunicodise(line))) basename = unicodise(os.path.basename(deunicodise(line))) _append(filelist, dirname, basename) finally: if f: f.close() # reformat to match os.walk() result = [] for key in sorted(filelist): values = filelist[key] values.sort() result.append((key, [], values)) return result def fetch_local_list(args, is_src = False, recursive = None, with_dirs=False): def _fetch_local_list_info(loc_list): len_loc_list = len(loc_list) total_size = 0 info(u"Running stat() and reading/calculating MD5 values on %d files, this may take some time..." 
% len_loc_list) counter = 0 for relative_file in loc_list: counter += 1 if counter % 1000 == 0: info(u"[%d/%d]" % (counter, len_loc_list)) if relative_file == '-': continue loc_list_item = loc_list[relative_file] full_name = loc_list_item['full_name'] is_dir = loc_list_item['is_dir'] try: sr = os.stat_result(os.stat(deunicodise(full_name))) except OSError as e: if e.errno == errno.ENOENT: # file was removed async to us getting the list continue else: raise if is_dir: size = 0 else: size = sr.st_size loc_list[relative_file].update({ 'size' : size, 'mtime' : sr.st_mtime, 'dev' : sr.st_dev, 'inode' : sr.st_ino, 'uid' : sr.st_uid, 'gid' : sr.st_gid, 'sr': sr, # save it all, may need it in preserve_attrs_list ## TODO: Possibly more to save here... }) total_size += sr.st_size if is_dir: # A md5 can't be calculated with a directory path continue if 'md5' in cfg.sync_checks: md5 = cache.md5(sr.st_dev, sr.st_ino, sr.st_mtime, sr.st_size) if md5 is None: try: # this does the file I/O md5 = loc_list.get_md5(relative_file) except IOError: continue cache.add(sr.st_dev, sr.st_ino, sr.st_mtime, sr.st_size, md5) loc_list.record_hardlink(relative_file, sr.st_dev, sr.st_ino, md5, sr.st_size) return total_size def _get_filelist_local(loc_list, local_uri, cache, with_dirs): info(u"Compiling list of local files...") if local_uri.basename() == "-": try: uid = os.geteuid() gid = os.getegid() except Exception: uid = 0 gid = 0 loc_list["-"] = { 'full_name' : '-', 'size' : -1, 'mtime' : -1, 'uid' : uid, 'gid' : gid, 'dev' : 0, 'inode': 0, 'is_dir': False, } return loc_list, True if local_uri.isdir(): local_base = local_uri.basename() local_path = local_uri.path() if is_src and len(cfg.files_from): filelist = _get_filelist_from_file(cfg, local_path) single_file = False else: if cfg.follow_symlinks: filelist = _fswalk_follow_symlinks(local_path) else: filelist = _fswalk_no_symlinks(local_path) single_file = False else: local_base = "" local_path = local_uri.dirname() filelist = [( local_path, [], [local_uri.basename()] )] single_file = True for root, dirs, files in filelist: rel_root = root.replace(local_path, local_base, 1) if not with_dirs: iter_elements = ((files, False),) else: iter_elements = ((dirs, True), (files, False)) for elements, is_dir in iter_elements: for f in elements: full_name = os.path.join(root, f) if not is_dir and not os.path.isfile(deunicodise(full_name)): if os.path.exists(deunicodise(full_name)): warning(u"Skipping over non regular file: %s" % full_name) continue if os.path.islink(deunicodise(full_name)): if not cfg.follow_symlinks: warning(u"Skipping over symbolic link: %s" % full_name) continue relative_file = os.path.join(rel_root, f) if os.path.sep != "/": # Convert non-unix dir separators to '/' relative_file = "/".join(relative_file.split(os.path.sep)) if cfg.urlencoding_mode == "normal": relative_file = replace_nonprintables(relative_file) if relative_file.startswith('./'): relative_file = relative_file[2:] if is_dir and relative_file and relative_file[-1] != '/': relative_file += '/' loc_list[relative_file] = { 'full_name' : full_name, 'is_dir': is_dir, } return loc_list, single_file def _maintain_cache(cache, local_list): # if getting the file list from files_from, it is going to be # a subset of the actual tree. We should not purge content # outside of that subset as we don't know if it's valid or # not. Leave it to a non-files_from run to purge. 
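# Illustrative sketch (not part of FileLists.py): the local-list code here
# caches per-file md5 results in s3cmd's HashCache keyed by the stat()
# identity (dev, inode, mtime, size), and _maintain_cache() just below marks
# every entry stale, unmarks the ones still seen on disk, then purges the
# rest.  The minimal standalone model below (a hypothetical TinyHashCache,
# not the real class) shows that mark/unmark/purge pattern in isolation.
class TinyHashCache(object):
    def __init__(self):
        # (dev, inode, mtime, size) -> {'md5': ..., 'purge': bool}
        self._entries = {}

    def add(self, dev, inode, mtime, size, md5):
        self._entries[(dev, inode, mtime, size)] = {'md5': md5, 'purge': False}

    def md5(self, dev, inode, mtime, size):
        entry = self._entries.get((dev, inode, mtime, size))
        return entry['md5'] if entry else None

    def mark_all_for_purge(self):
        for entry in self._entries.values():
            entry['purge'] = True

    def unmark_for_purge(self, dev, inode, mtime, size):
        entry = self._entries.get((dev, inode, mtime, size))
        if entry:
            entry['purge'] = False

    def purge(self):
        self._entries = dict((k, v) for k, v in self._entries.items()
                             if not v['purge'])

# The entry that is unmarked survives purge(); the stale one is dropped.
_cache = TinyHashCache()
_cache.add(1, 2, 3.0, 4, 'd41d8cd98f00b204e9800998ecf8427e')
_cache.add(1, 9, 3.0, 4, '0123456789abcdef0123456789abcdef')
_cache.mark_all_for_purge()
_cache.unmark_for_purge(1, 2, 3.0, 4)
_cache.purge()
assert _cache.md5(1, 2, 3.0, 4) is not None
assert _cache.md5(1, 9, 3.0, 4) is None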
if cfg.cache_file and len(cfg.files_from) == 0: cache.mark_all_for_purge() if PY3: local_list_val_iter = local_list.values() else: local_list_val_iter = local_list.itervalues() for f_info in local_list_val_iter: inode = f_info.get('inode', 0) if not inode: continue cache.unmark_for_purge(f_info['dev'], inode, f_info['mtime'], f_info['size']) cache.purge() cache.save(cfg.cache_file) cfg = Config() cache = HashCache() if cfg.cache_file and os.path.isfile(deunicodise_s(cfg.cache_file)) and os.path.getsize(deunicodise_s(cfg.cache_file)) > 0: cache.load(cfg.cache_file) else: info(u"Cache file not found or empty, creating/populating it.") local_uris = [] local_list = FileDict(ignore_case = False) single_file = False if type(args) not in (list, tuple, set): args = [args] if recursive == None: recursive = cfg.recursive for arg in args: uri = S3Uri(arg) if not uri.type == 'file': raise ParameterError("Expecting filename or directory instead of: %s" % arg) if uri.isdir() and not recursive: raise ParameterError("Use --recursive to upload a directory: %s" % arg) local_uris.append(uri) for uri in local_uris: list_for_uri, single_file = _get_filelist_local(local_list, uri, cache, with_dirs) ## Single file is True if and only if the user ## specified one local URI and that URI represents ## a FILE. Ie it is False if the URI was of a DIR ## and that dir contained only one FILE. That's not ## a case of single_file==True. if len(local_list) > 1: single_file = False local_list, exclude_list = filter_exclude_include(local_list) total_size = _fetch_local_list_info(local_list) _maintain_cache(cache, local_list) return local_list, single_file, exclude_list, total_size def fetch_remote_list(args, require_attribs = False, recursive = None, uri_params = {}): def _get_remote_attribs(uri, remote_item): response = S3(cfg).object_info(uri) if not response.get('headers'): return remote_item.update({ 'size': int(response['headers']['content-length']), 'md5': response['headers']['etag'].strip('"\''), 'timestamp': dateRFC822toUnix(response['headers']['last-modified']) }) try: md5 = response['s3cmd-attrs']['md5'] remote_item.update({'md5': md5}) debug(u"retrieved md5=%s from headers" % md5) except KeyError: pass def _get_filelist_remote(remote_uri, recursive = True): ## If remote_uri ends with '/' then all remote files will have ## the remote_uri prefix removed in the relative path. ## If, on the other hand, the remote_uri ends with something else ## (probably alphanumeric symbol) we'll use the last path part ## in the relative path. ## ## Complicated, eh? See an example: ## _get_filelist_remote("s3://bckt/abc/def") may yield: ## { 'def/file1.jpg' : {}, 'def/xyz/blah.txt' : {} } ## _get_filelist_remote("s3://bckt/abc/def/") will yield: ## { 'file1.jpg' : {}, 'xyz/blah.txt' : {} } ## Furthermore a prefix-magic can restrict the return list: ## _get_filelist_remote("s3://bckt/abc/def/x") yields: ## { 'xyz/blah.txt' : {} } info(u"Retrieving list of remote files for %s ..." 
% remote_uri) total_size = 0 s3 = S3(Config()) response = s3.bucket_list(remote_uri.bucket(), prefix = remote_uri.object(), recursive = recursive, uri_params = uri_params) rem_base_original = rem_base = remote_uri.object() remote_uri_original = remote_uri if rem_base != '' and rem_base[-1] != '/': rem_base = rem_base[:rem_base.rfind('/')+1] remote_uri = S3Uri(u"s3://%s/%s" % (remote_uri.bucket(), rem_base)) rem_base_len = len(rem_base) rem_list = FileDict(ignore_case = False) break_now = False for object in response['list']: object_key = object['Key'] object_size = int(object['Size']) is_dir = (object_key[-1] == '/') if object_key == rem_base_original and not is_dir: ## We asked for one file and we got that file :-) key = s3path.basename(object_key) object_uri_str = remote_uri_original.uri() break_now = True # Remove whatever has already been put to rem_list rem_list = FileDict(ignore_case = False) else: # Beware - this may be '' if object_key==rem_base !! key = object_key[rem_base_len:] object_uri_str = remote_uri.uri() + key if not key: # Objects may exist on S3 with empty names (''), which don't map so well to common filesystems. warning(u"Found empty root object name on S3, ignoring.") continue rem_list[key] = { 'size' : object_size, 'timestamp' : dateS3toUnix(object['LastModified']), ## Sadly it's upload time, not our lastmod time :-( 'md5' : object['ETag'].strip('"\''), 'object_key' : object_key, 'object_uri_str' : object_uri_str, 'base_uri' : remote_uri, 'dev' : None, 'inode' : None, 'is_dir': is_dir, } if '-' in rem_list[key]['md5']: # always get it for multipart uploads _get_remote_attribs(S3Uri(object_uri_str), rem_list[key]) md5 = rem_list[key]['md5'] rem_list.record_md5(key, md5) total_size += object_size if break_now: break return rem_list, total_size cfg = Config() remote_uris = [] remote_list = FileDict(ignore_case = False) if type(args) not in (list, tuple, set): args = [args] if recursive == None: recursive = cfg.recursive for arg in args: uri = S3Uri(arg) if not uri.type == 's3': raise ParameterError("Expecting S3 URI instead of '%s'" % arg) remote_uris.append(uri) total_size = 0 if recursive: for uri in remote_uris: objectlist, tmp_total_size = _get_filelist_remote(uri, recursive = True) total_size += tmp_total_size for key in objectlist: remote_list[key] = objectlist[key] remote_list.record_md5(key, objectlist.get_md5(key)) else: for uri in remote_uris: uri_str = uri.uri() ## Wildcards used in remote URI? ## If yes we'll need a bucket listing... wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1) if len(wildcard_split_result) == 2: ## If wildcards found prefix, rest = wildcard_split_result ## Only request recursive listing if the 'rest' of the URI, ## i.e. 
the part after first wildcard, contains '/' need_recursion = '/' in rest objectlist, tmp_total_size = _get_filelist_remote(S3Uri(prefix), recursive = need_recursion) total_size += tmp_total_size for key in objectlist: ## Check whether the 'key' matches the requested wildcards if glob.fnmatch.fnmatch(objectlist[key]['object_uri_str'], uri_str): remote_list[key] = objectlist[key] else: ## No wildcards - simply append the given URI to the list key = s3path.basename(uri.object()) if not key: raise ParameterError(u"Expecting S3 URI with a filename or --recursive: %s" % uri.uri()) is_dir = (key and key[-1] == '/') remote_item = { 'base_uri': uri, 'object_uri_str': uri.uri(), 'object_key': uri.object(), 'is_dir': is_dir, } if require_attribs: _get_remote_attribs(uri, remote_item) remote_list[key] = remote_item md5 = remote_item.get('md5') if md5: remote_list.record_md5(key, md5) total_size += remote_item.get('size', 0) remote_list, exclude_list = filter_exclude_include(remote_list) return remote_list, exclude_list, total_size def compare_filelists(src_list, dst_list, src_remote, dst_remote): def __direction_str(is_remote): return is_remote and "remote" or "local" def _compare(src_list, dst_lst, src_remote, dst_remote, file): """Return True if src_list[file] matches dst_list[file], else False""" attribs_match = True src_file = src_list.get(file) dst_file = dst_list.get(file) if not src_file or not dst_file: info(u"%s: does not exist in one side or the other: src_list=%s, dst_list=%s" % (file, bool(src_file), bool(dst_file))) return False ## check size first if 'size' in cfg.sync_checks: src_size = src_file.get('size') dst_size = dst_file.get('size') if dst_size is not None and src_size is not None and dst_size != src_size: debug(u"xfer: %s (size mismatch: src=%s dst=%s)" % (file, src_size, dst_size)) attribs_match = False ## check md5 compare_md5 = 'md5' in cfg.sync_checks # Multipart-uploaded files don't have a valid md5 sum - it ends with "...-nn" if compare_md5: if (src_remote == True and '-' in src_file['md5']) or (dst_remote == True and '-' in dst_file['md5']): compare_md5 = False info(u"disabled md5 check for %s" % file) if compare_md5 and src_file['is_dir'] == True: # For directories, nothing to do if they already exist compare_md5 = False if attribs_match and compare_md5: try: src_md5 = src_list.get_md5(file) dst_md5 = dst_list.get_md5(file) except (IOError, OSError): # md5 sum verification failed - ignore that file altogether debug(u"IGNR: %s (disappeared)" % (file)) warning(u"%s: file disappeared, ignoring." % (file)) raise if src_md5 != dst_md5: ## checksums are different. 
attribs_match = False debug(u"XFER: %s (md5 mismatch: src=%s dst=%s)" % (file, src_md5, dst_md5)) return attribs_match # we don't support local->local sync, use 'rsync' or something like that instead ;-) assert(not(src_remote == False and dst_remote == False)) info(u"Verifying attributes...") cfg = Config() ## Items left on src_list will be transferred ## Items left on update_list will be transferred after src_list ## Items left on copy_pairs will be copied from dst1 to dst2 update_list = FileDict(ignore_case = False) ## Items left on dst_list will be deleted copy_pairs = {} debug("Comparing filelists (direction: %s -> %s)" % (__direction_str(src_remote), __direction_str(dst_remote))) src_dir_cache = set() for relative_file in src_list.keys(): debug(u"CHECK: '%s'" % relative_file) if src_remote: # Most of the time, there will not be dir objects on the remote side # we still need to have a "virtual" list of them to not think that there # are unmatched dirs with the local side. dir_idx = relative_file.rfind('/') if dir_idx > 0: path = relative_file[:dir_idx+1] while path and path not in src_dir_cache: src_dir_cache.add(path) # Also add to cache, all the parent dirs try: path = path[:path.rindex('/', 0, -1)+1] except ValueError: continue if relative_file in dst_list: ## Was --skip-existing requested? if cfg.skip_existing: debug(u"IGNR: '%s' (used --skip-existing)" % relative_file) del(src_list[relative_file]) del(dst_list[relative_file]) continue try: same_file = _compare(src_list, dst_list, src_remote, dst_remote, relative_file) except (IOError,OSError): debug(u"IGNR: '%s' (disappeared)" % relative_file) warning(u"%s: file disappeared, ignoring." % relative_file) del(src_list[relative_file]) del(dst_list[relative_file]) continue if same_file: debug(u"IGNR: '%s' (transfer not needed)" % relative_file) del(src_list[relative_file]) del(dst_list[relative_file]) else: # look for matching file in src try: md5 = src_list.get_md5(relative_file) except IOError: md5 = None if md5 is not None and md5 in dst_list.by_md5: # Found one, we want to copy copy_src_file = dst_list.find_md5_one(md5) debug(u"DST COPY src: '%s' -> '%s'" % (copy_src_file, relative_file)) src_item = src_list[relative_file] src_item["md5"] = md5 src_item["copy_src"] = copy_src_file copy_pairs[relative_file] = src_item del(src_list[relative_file]) del(dst_list[relative_file]) else: # record that we will get this file transferred to us (before all the copies), so if we come across it later again, # we can copy from _this_ copy (e.g. we only upload it once, and copy thereafter). dst_list.record_md5(relative_file, md5) update_list[relative_file] = src_list[relative_file] del src_list[relative_file] del dst_list[relative_file] else: # dst doesn't have this file # look for matching file elsewhere in dst try: md5 = src_list.get_md5(relative_file) except IOError: md5 = None copy_src_file = dst_list.find_md5_one(md5) if copy_src_file is not None: # Found one, we want to copy debug(u"DST COPY dst: '%s' -> '%s'" % (copy_src_file, relative_file)) src_item = src_list[relative_file] src_item["md5"] = md5 src_item["copy_src"] = copy_src_file copy_pairs[relative_file] = src_item del(src_list[relative_file]) else: # we don't have this file, and we don't have a copy of this file elsewhere. Get it. # record that we will get this file transferred to us (before all the copies), so if we come across it later again, # we can copy from _this_ copy (e.g. we only upload it once, and copy thereafter). 
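# Illustrative sketch (not part of FileLists.py): the branch here looks up the
# source file's md5 in the destination's md5 index (dst_list.find_md5_one /
# by_md5) so content already stored under another key can be server-side
# copied instead of re-uploaded.  The self-contained toy below uses plain
# dicts (not the real FileDict/SortedDict classes) to show that dedup
# decision on its own.
def plan_transfers(src_md5s, dst_md5_index):
    """src_md5s: {relative_file: md5}; dst_md5_index: {md5: existing_dst_key}.

    Returns (uploads, copy_pairs); copy_pairs maps a new destination key to an
    existing destination key holding identical content.
    """
    uploads = []
    copy_pairs = {}
    for relative_file, md5 in src_md5s.items():
        existing = dst_md5_index.get(md5)
        if existing is not None and existing != relative_file:
            copy_pairs[relative_file] = existing   # reuse remote content
        else:
            uploads.append(relative_file)          # genuinely new content
            dst_md5_index.setdefault(md5, relative_file)
    return uploads, copy_pairs

# 'photos/dup.jpg' matches content already stored as 'orig.jpg', so it becomes
# a copy pair; 'new.txt' has unseen content and must be uploaded.
_ups, _copies = plan_transfers(
    {'photos/dup.jpg': 'aaa111', 'new.txt': 'bbb222'},
    {'aaa111': 'orig.jpg'})
assert _copies == {'photos/dup.jpg': 'orig.jpg'} and _ups == ['new.txt']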
dst_list.record_md5(relative_file, md5) for f in dst_list.keys(): if f in src_list or f in update_list or f in src_dir_cache: # leave only those not on src_list + update_list + src_dir_cache del dst_list[f] return src_list, dst_list, update_list, copy_pairs # vim:et:ts=4:sts=4:ai s3cmd-2.4.0/S3/AccessLog.py0000664000175100017510000000704614534034713014631 0ustar floflo00000000000000# -*- coding: utf-8 -*- ## -------------------------------------------------------------------- ## Amazon S3 - Access Control List representation ## ## Authors : Michal Ludvig (https://www.logix.cz/michal) ## Florent Viard (https://www.sodria.com) ## Copyright : TGRMN Software, Sodria SAS and contributors ## License : GPL Version 2 ## Website : https://s3tools.org ## -------------------------------------------------------------------- from __future__ import absolute_import, print_function import sys from . import S3Uri from .Exceptions import ParameterError from .BaseUtils import getTreeFromXml, decode_from_s3 from .ACL import GranteeAnonRead try: import xml.etree.ElementTree as ET except ImportError: import elementtree.ElementTree as ET PY3 = (sys.version_info >= (3,0)) __all__ = [] class AccessLog(object): LOG_DISABLED = "" LOG_TEMPLATE = "" def __init__(self, xml = None): if not xml: xml = self.LOG_DISABLED self.tree = getTreeFromXml(xml) self.tree.attrib['xmlns'] = "http://doc.s3.amazonaws.com/2006-03-01" def isLoggingEnabled(self): return (self.tree.find(".//LoggingEnabled") is not None) def disableLogging(self): el = self.tree.find(".//LoggingEnabled") if el: self.tree.remove(el) def enableLogging(self, target_prefix_uri): el = self.tree.find(".//LoggingEnabled") if not el: el = getTreeFromXml(self.LOG_TEMPLATE) self.tree.append(el) el.find(".//TargetBucket").text = target_prefix_uri.bucket() el.find(".//TargetPrefix").text = target_prefix_uri.object() def targetPrefix(self): if self.isLoggingEnabled(): target_prefix = u"s3://%s/%s" % ( self.tree.find(".//LoggingEnabled//TargetBucket").text, self.tree.find(".//LoggingEnabled//TargetPrefix").text) return S3Uri.S3Uri(target_prefix) else: return "" def setAclPublic(self, acl_public): le = self.tree.find(".//LoggingEnabled") if le is None: raise ParameterError("Logging not enabled, can't set default ACL for logs") tg = le.find(".//TargetGrants") if not acl_public: if not tg: ## All good, it's not been there return else: le.remove(tg) else: # acl_public == True anon_read = GranteeAnonRead().getElement() if not tg: tg = ET.SubElement(le, "TargetGrants") ## What if TargetGrants already exists? We should check if ## AnonRead is there before appending a new one. Later... 
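# Illustrative sketch (not part of AccessLog.py): the comment above defers a
# check for an already-present anonymous-read grant before tg.append(anon_read)
# runs.  The namespace-free snippet below models such a duplicate check with
# ElementTree; the real S3 ACL XML is namespaced and built via GranteeAnonRead,
# so this is only a sketch of the idea, not a drop-in patch.
import xml.etree.ElementTree as ET

ALL_USERS_URI = "http://acs.amazonaws.com/groups/global/AllUsers"

def has_anon_read(target_grants):
    """Return True if target_grants already holds an AllUsers/READ grant."""
    for grant in target_grants.findall("Grant"):
        if (grant.findtext("Grantee/URI") == ALL_USERS_URI
                and grant.findtext("Permission") == "READ"):
            return True
    return False

_tg = ET.fromstring(
    "<TargetGrants><Grant><Grantee><URI>" + ALL_USERS_URI + "</URI></Grantee>"
    "<Permission>READ</Permission></Grant></TargetGrants>")
assert has_anon_read(_tg)                              # grant already present
assert not has_anon_read(ET.fromstring("<TargetGrants/>"))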
tg.append(anon_read) def isAclPublic(self): raise NotImplementedError() def __unicode__(self): return decode_from_s3(ET.tostring(self.tree)) def __str__(self): if PY3: # Return unicode return ET.tostring(self.tree, encoding="unicode") else: # Return bytes return ET.tostring(self.tree) __all__.append("AccessLog") if __name__ == "__main__": log = AccessLog() print(log) log.enableLogging(S3Uri.S3Uri(u"s3://targetbucket/prefix/log-")) print(log) log.setAclPublic(True) print(log) log.setAclPublic(False) print(log) log.disableLogging() print(log) # vim:et:ts=4:sts=4:ai s3cmd-2.4.0/S3/Custom_httplib27.py0000664000175100017510000001776414534034713016147 0ustar floflo00000000000000from __future__ import absolute_import, print_function import os import httplib from httplib import (_CS_REQ_SENT, _CS_REQ_STARTED, CONTINUE, UnknownProtocol, CannotSendHeader, NO_CONTENT, NOT_MODIFIED, EXPECTATION_FAILED, HTTPMessage, HTTPException) try: from cStringIO import StringIO except ImportError: from StringIO import StringIO from .BaseUtils import encode_to_s3 _METHODS_EXPECTING_BODY = ['PATCH', 'POST', 'PUT'] # Fixed python 2.X httplib to be able to support # Expect: 100-Continue http feature # Inspired by: # http://bugs.python.org/file26357/issue1346874-273.patch def httpresponse_patched_begin(self): """ Re-implemented httplib begin function to not loop over "100 CONTINUE" status replies but to report it to higher level so it can be processed. """ if self.msg is not None: # we've already started reading the response return # read only one status even if we get a non-100 response version, status, reason = self._read_status() self.status = status self.reason = reason.strip() if version == 'HTTP/1.0': self.version = 10 elif version.startswith('HTTP/1.'): self.version = 11 # use HTTP/1.1 code for HTTP/1.x where x>=1 elif version == 'HTTP/0.9': self.version = 9 else: raise UnknownProtocol(version) if self.version == 9: self.length = None self.chunked = 0 self.will_close = 1 self.msg = HTTPMessage(StringIO()) return self.msg = HTTPMessage(self.fp, 0) if self.debuglevel > 0: for hdr in self.msg.headers: print("header:", hdr, end=" ") # don't let the msg keep an fp self.msg.fp = None # are we using the chunked-style of transfer encoding? tr_enc = self.msg.getheader('transfer-encoding') if tr_enc and tr_enc.lower() == "chunked": self.chunked = 1 self.chunk_left = None else: self.chunked = 0 # will the connection close at the end of the response? self.will_close = self._check_close() # do we have a Content-Length? # NOTE: RFC 2616, S4.4, #3 says we ignore this if tr_enc is "chunked" length = self.msg.getheader('content-length') if length and not self.chunked: try: self.length = int(length) except ValueError: self.length = None else: if self.length < 0: # ignore nonsensical negative lengths self.length = None else: self.length = None # does the body have a fixed length? (of zero) if (status == NO_CONTENT or status == NOT_MODIFIED or 100 <= status < 200 or # 1xx codes self._method == 'HEAD'): self.length = 0 # if the connection remains open, and we aren't using chunked, and # a content-length was not provided, then assume that the connection # WILL close. if not self.will_close and \ not self.chunked and \ self.length is None: self.will_close = 1 def httpconnection_patched_set_content_length(self, body, method): ## REIMPLEMENTED because new in last httplib but needed by send_request # Set the content-length based on the body. 
If the body is "empty", we # set Content-Length: 0 for methods that expect a body (RFC 7230, # Section 3.3.2). If the body is set for other methods, we set the # header provided we can figure out what the length is. thelen = None if body is None and method.upper() in _METHODS_EXPECTING_BODY: thelen = '0' elif body is not None: try: thelen = str(len(body)) except (TypeError, AttributeError): # If this is a file-like object, try to # fstat its file descriptor try: thelen = str(os.fstat(body.fileno()).st_size) except (AttributeError, OSError): # Don't send a length if this failed if self.debuglevel > 0: print("Cannot stat!!") if thelen is not None: self.putheader('Content-Length', thelen) def httpconnection_patched_send_request(self, method, url, body, headers): # Honor explicitly requested Host: and Accept-Encoding: headers. header_names = dict.fromkeys([k.lower() for k in headers]) skips = {} if 'host' in header_names: skips['skip_host'] = 1 if 'accept-encoding' in header_names: skips['skip_accept_encoding'] = 1 expect_continue = False for hdr, value in headers.iteritems(): if 'expect' == hdr.lower() and '100-continue' in value.lower(): expect_continue = True url = encode_to_s3(url) self.putrequest(method, url, **skips) if 'content-length' not in header_names: self._set_content_length(body, method) for hdr, value in headers.iteritems(): self.putheader(encode_to_s3(hdr), encode_to_s3(value)) # If an Expect: 100-continue was sent, we need to check for a 417 # Expectation Failed to avoid unnecessarily sending the body # See RFC 2616 8.2.3 if not expect_continue: self.endheaders(body) else: if not body: raise HTTPException("A body is required when expecting " "100-continue") self.endheaders() resp = self.getresponse() resp.read() self._HTTPConnection__state = _CS_REQ_SENT if resp.status == EXPECTATION_FAILED: raise ExpectationFailed() elif resp.status == CONTINUE: self.send(body) def httpconnection_patched_endheaders(self, message_body=None): """Indicate that the last header line has been sent to the server. This method sends the request to the server. The optional message_body argument can be used to pass a message body associated with the request. The message body will be sent in the same packet as the message headers if it is string, otherwise it is sent as a separate packet. """ if self._HTTPConnection__state == _CS_REQ_STARTED: self._HTTPConnection__state = _CS_REQ_SENT else: raise CannotSendHeader() self._send_output(message_body) # TCP Maximum Segment Size (MSS) is determined by the TCP stack on # a per-connection basis. There is no simple and efficient # platform independent mechanism for determining the MSS, so # instead a reasonable estimate is chosen. The getsockopt() # interface using the TCP_MAXSEG parameter may be a suitable # approach on some operating systems. A value of 16KiB is chosen # as a reasonable estimate of the maximum MSS. mss = 16384 def httpconnection_patched_send_output(self, message_body=None): """Send the currently buffered request and clear the buffer. Appends an extra \\r\\n to the buffer. A message_body may be specified, to be appended to the request. """ self._buffer.extend((b"", b"")) msg = b"\r\n".join(self._buffer) del self._buffer[:] msg = encode_to_s3(msg) # If msg and message_body are sent in a single send() call, # it will avoid performance problems caused by the interaction # between delayed ack and the Nagle algorithm. 
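# Illustrative sketch (not part of Custom_httplib27.py): the isinstance()
# branch just below folds a small string body into the same send() as the
# buffered headers, so the whole request leaves in one TCP segment and the
# delayed-ACK/Nagle interaction described above is avoided.  The standalone
# toy below (hypothetical helper names, same 16 KiB MSS estimate) isolates
# that coalescing decision against any object exposing sendall().
MSS_ESTIMATE = 16384  # conservative estimate of the TCP maximum segment size

def send_headers_and_body(sock, header_bytes, body=None):
    """Send headers, folding a small bytes body into the same write."""
    if isinstance(body, bytes) and len(body) < MSS_ESTIMATE:
        sock.sendall(header_bytes + body)  # one segment: no Nagle stall
        return
    sock.sendall(header_bytes)
    if body is not None:
        sock.sendall(body)                 # larger bodies go in a second write

class _RecordingSocket(object):
    def __init__(self):
        self.calls = []
    def sendall(self, data):
        self.calls.append(data)

_sock = _RecordingSocket()
send_headers_and_body(_sock, b"PUT /obj HTTP/1.1\r\n\r\n", b"small payload")
assert len(_sock.calls) == 1   # headers and small body were coalesced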
def httpconnection_patched_endheaders(self, message_body=None):
    """Indicate that the last header line has been sent to the server.

    This method sends the request to the server. The optional
    message_body argument can be used to pass a message body
    associated with the request. The message body will be sent in
    the same packet as the message headers if it is string, otherwise
    it is sent as a separate packet.
    """
    if self._HTTPConnection__state == _CS_REQ_STARTED:
        self._HTTPConnection__state = _CS_REQ_SENT
    else:
        raise CannotSendHeader()
    self._send_output(message_body)


# TCP Maximum Segment Size (MSS) is determined by the TCP stack on
# a per-connection basis. There is no simple and efficient
# platform independent mechanism for determining the MSS, so
# instead a reasonable estimate is chosen. The getsockopt()
# interface using the TCP_MAXSEG parameter may be a suitable
# approach on some operating systems. A value of 16KiB is chosen
# as a reasonable estimate of the maximum MSS.
mss = 16384


def httpconnection_patched_send_output(self, message_body=None):
    """Send the currently buffered request and clear the buffer.

    Appends an extra \\r\\n to the buffer.
    A message_body may be specified, to be appended to the request.
    """
    self._buffer.extend((b"", b""))
    msg = b"\r\n".join(self._buffer)
    del self._buffer[:]

    msg = encode_to_s3(msg)
    # If msg and message_body are sent in a single send() call,
    # it will avoid performance problems caused by the interaction
    # between delayed ack and the Nagle algorithm.
    if isinstance(message_body, str) and len(message_body) < mss:
        msg += message_body
        message_body = None
    self.send(msg)
    if message_body is not None:
        #message_body was not a string (i.e. it is a file) and
        #we must run the risk of Nagle
        self.send(message_body)


class ExpectationFailed(HTTPException):
    pass


# Wrappers #

def httpconnection_patched_wrapper_send_body(self, message_body):
    self.send(message_body)


httplib.HTTPResponse.begin = httpresponse_patched_begin
httplib.HTTPConnection.endheaders = httpconnection_patched_endheaders
httplib.HTTPConnection._send_output = httpconnection_patched_send_output
httplib.HTTPConnection._set_content_length = httpconnection_patched_set_content_length
httplib.HTTPConnection._send_request = httpconnection_patched_send_request

# Interfaces added to httplib.HTTPConnection:
httplib.HTTPConnection.wrapper_send_body = httpconnection_patched_wrapper_send_body

s3cmd-2.4.0/S3/MultiPart.py0000664000175100017510000003252114534034713014703 0ustar floflo00000000000000
# -*- coding: utf-8 -*-

## Amazon S3 Multipart upload support
## Author: Jerome Leclanche
## License: GPL Version 2

from __future__ import absolute_import

import sys
from logging import debug, info, warning, error

from .Crypto import calculateChecksum
from .Exceptions import ParameterError
from .S3Uri import S3UriS3
from .BaseUtils import getTextFromXml, getTreeFromXml, s3_quote, parseNodes
from .Utils import formatSize


SIZE_1MB = 1024 * 1024


class MultiPartUpload(object):
    """Supports MultiPartUpload and MultiPartUpload(Copy) operation"""
    MIN_CHUNK_SIZE_MB = 5             # 5MB
    MAX_CHUNK_SIZE_MB = 5 * 1024      # 5GB
    MAX_FILE_SIZE = 5 * 1024 * 1024   # 5TB

    def __init__(self, s3, src, dst_uri, headers_baseline=None,
                 src_size=None):
        self.s3 = s3
        self.file_stream = None
        self.src_uri = None
        self.src_size = src_size
        self.dst_uri = dst_uri
        self.parts = {}
        self.headers_baseline = headers_baseline or {}

        if isinstance(src, S3UriS3):
            # Source is the uri of an object to s3-to-s3 copy with multipart.
            self.src_uri = src
            if not src_size:
                raise ParameterError("Source size is missing for "
                                     "MultipartUploadCopy operation")
            c_size = self.s3.config.multipart_copy_chunk_size_mb * SIZE_1MB
        else:
            # Source is a file_stream to upload
            self.file_stream = src
            c_size = self.s3.config.multipart_chunk_size_mb * SIZE_1MB

        self.chunk_size = c_size
        self.upload_id = self.initiate_multipart_upload()
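
    # ----------------------------------------------------------------
    # Illustration only (this is NOT the real call site, which lives in
    # S3.py): a rough sketch of how this class is driven. 's3' is an S3
    # connection object, 'dst_uri' an S3Uri destination and 'src' either a
    # file stream or a source S3UriS3; all of them are created elsewhere in
    # s3cmd and are simply assumed here.
    # ----------------------------------------------------------------
    @staticmethod
    def _example_lifecycle(s3, src, dst_uri, src_size=None):
        # __init__ already calls initiate_multipart_upload()
        upload = MultiPartUpload(s3, src, dst_uri, src_size=src_size)
        try:
            upload.upload_all_parts()
        except Exception:
            # abort_upload() is currently a no-op, see its definition below
            upload.abort_upload()
            raise
        return upload.complete_multipart_upload()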
    def get_parts_information(self, uri, upload_id):
        part_list = self.s3.list_multipart(uri, upload_id)

        parts = dict()
        for elem in part_list:
            try:
                parts[int(elem['PartNumber'])] = {
                    'checksum': elem['ETag'],
                    'size': elem['Size']
                }
            except KeyError:
                pass

        return parts

    def get_unique_upload_id(self, uri):
        upload_id = ""
        multipart_list = self.s3.get_multipart(uri)
        for mpupload in multipart_list:
            try:
                mp_upload_id = mpupload['UploadId']
                mp_path = mpupload['Key']
                info("mp_path: %s, object: %s" % (mp_path, uri.object()))
                if mp_path == uri.object():
                    if upload_id:
                        raise ValueError(
                            "More than one UploadId for URI %s. Disable "
                            "multipart upload, or use\n %s multipart %s\n"
                            "to list the Ids, then pass a unique --upload-id "
                            "into the put command."
                            % (uri, sys.argv[0], uri))
                    upload_id = mp_upload_id
            except KeyError:
                pass

        return upload_id

    def initiate_multipart_upload(self):
        """
        Begin a multipart upload
        http://docs.amazonwebservices.com/AmazonS3/latest/API/index.html?mpUploadInitiate.html
        """
        if self.s3.config.upload_id:
            self.upload_id = self.s3.config.upload_id
        elif self.s3.config.put_continue:
            self.upload_id = self.get_unique_upload_id(self.dst_uri)
        else:
            self.upload_id = ""

        if not self.upload_id:
            request = self.s3.create_request(
                "OBJECT_POST", uri=self.dst_uri,
                headers=self.headers_baseline,
                uri_params={'uploads': None})
            response = self.s3.send_request(request)
            data = response["data"]
            self.upload_id = getTextFromXml(data, "UploadId")

        return self.upload_id

    def upload_all_parts(self, extra_label=''):
        """
        Execute a full multipart upload on a file
        Returns the seq/etag dict
        TODO use num_processes to thread it
        """
        if not self.upload_id:
            raise ParameterError("Attempting to use a multipart upload that "
                                 "has not been initiated.")

        remote_statuses = {}

        if self.src_uri:
            filename = self.src_uri.uri()
            # Continue is not possible with multipart copy
        else:
            filename = self.file_stream.stream_name
            if self.s3.config.put_continue:
                remote_statuses = self.get_parts_information(self.dst_uri,
                                                             self.upload_id)

        if extra_label:
            extra_label = u' ' + extra_label

        labels = {
            'source': filename,
            'destination': self.dst_uri.uri(),
        }

        seq = 1
        if self.src_size:
            size_left = self.src_size
            nr_parts = self.src_size // self.chunk_size \
                + (self.src_size % self.chunk_size and 1)
            debug("MultiPart: Uploading %s in %d parts" % (filename, nr_parts))

            while size_left > 0:
                offset = self.chunk_size * (seq - 1)
                current_chunk_size = min(self.src_size - offset,
                                         self.chunk_size)
                size_left -= current_chunk_size
                labels['extra'] = "[part %d of %d, %s]%s" % (
                    seq, nr_parts,
                    "%d%sB" % formatSize(current_chunk_size,
                                         human_readable=True),
                    extra_label)
                try:
                    if self.file_stream:
                        self.upload_part(
                            seq, offset, current_chunk_size, labels,
                            remote_status=remote_statuses.get(seq))
                    else:
                        self.copy_part(
                            seq, offset, current_chunk_size, labels,
                            remote_status=remote_statuses.get(seq))
                except:
                    error(u"\nUpload of '%s' part %d failed. Use\n "
                          "%s abortmp %s %s\nto abort the upload, or\n "
                          "%s --upload-id %s put ...\nto continue the upload."
                          % (filename, seq, sys.argv[0], self.dst_uri,
                             self.upload_id, sys.argv[0], self.upload_id))
                    raise
                seq += 1

            debug("MultiPart: Upload finished: %d parts", seq - 1)
            return

        # Else -> Case of u"<stdin>" source
        debug("MultiPart: Uploading from %s" % filename)
        while True:
            buffer = self.file_stream.read(self.chunk_size)
            offset = 0  # send from start of the buffer
            current_chunk_size = len(buffer)
            labels['extra'] = "[part %d of -, %s]%s" % (
                seq,
                "%d%sB" % formatSize(current_chunk_size,
                                     human_readable=True),
                extra_label)
            if not buffer:
                # EOF
                break
            try:
                self.upload_part(seq, offset, current_chunk_size, labels,
                                 buffer,
                                 remote_status=remote_statuses.get(seq))
            except:
                error(u"\nUpload of '%s' part %d failed. Use\n "
                      "%s abortmp %s %s\nto abort, or\n "
                      "%s --upload-id %s put ...\nto continue the upload."
                      % (filename, seq, sys.argv[0], self.dst_uri,
                         self.upload_id, sys.argv[0], self.upload_id))
                raise
            seq += 1

        debug("MultiPart: Upload finished: %d parts", seq - 1)
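
    # ----------------------------------------------------------------
    # Illustration only (not called anywhere in s3cmd): worked example of the
    # part-count arithmetic used in upload_all_parts above, with hypothetical
    # sizes. Integer division gives the number of full chunks, and the
    # "(remainder and 1)" idiom adds one extra part when a remainder exists.
    # ----------------------------------------------------------------
    @staticmethod
    def _example_nr_parts():
        chunk_size = 15 * SIZE_1MB        # e.g. multipart_chunk_size_mb = 15
        src_size = 100 * SIZE_1MB + 1     # 100 MiB plus one byte
        nr_parts = src_size // chunk_size + (src_size % chunk_size and 1)
        assert nr_parts == 7              # six full 15 MiB parts + one tail part
        return nr_parts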
    def upload_part(self, seq, offset, chunk_size, labels, buffer='',
                    remote_status=None):
        """
        Upload a file chunk
        http://docs.amazonwebservices.com/AmazonS3/latest/API/index.html?mpUploadUploadPart.html
        """
        # TODO implement Content-MD5
        debug("Uploading part %i of %r (%s bytes)"
              % (seq, self.upload_id, chunk_size))

        if remote_status is not None:
            if int(remote_status['size']) == chunk_size:
                checksum = calculateChecksum(buffer, self.file_stream, offset,
                                             chunk_size,
                                             self.s3.config.send_chunk)
                remote_checksum = remote_status['checksum'].strip('"\'')
                if remote_checksum == checksum:
                    warning("MultiPart: size and md5sum match for %s part %d, "
                            "skipping." % (self.dst_uri, seq))
                    self.parts[seq] = remote_status['checksum']
                    return None
                else:
                    warning("MultiPart: checksum (%s vs %s) does not match for"
                            " %s part %d, reuploading."
                            % (remote_checksum, checksum, self.dst_uri, seq))
            else:
                warning("MultiPart: size (%d vs %d) does not match for %s part"
                        " %d, reuploading."
                        % (int(remote_status['size']), chunk_size,
                           self.dst_uri, seq))

        headers = {"content-length": str(chunk_size)}
        query_string_params = {'partNumber': '%s' % seq,
                               'uploadId': self.upload_id}
        request = self.s3.create_request("OBJECT_PUT", uri=self.dst_uri,
                                         headers=headers,
                                         uri_params=query_string_params)
        response = self.s3.send_file(request, self.file_stream, labels,
                                     buffer, offset=offset,
                                     chunk_size=chunk_size)
        self.parts[seq] = response["headers"].get('etag', '').strip('"\'')

        return response

    def copy_part(self, seq, offset, chunk_size, labels, remote_status=None):
        """
        Copy a remote file chunk
        http://docs.amazonwebservices.com/AmazonS3/latest/API/index.html?mpUploadUploadPart.html
        http://docs.amazonwebservices.com/AmazonS3/latest/API/mpUploadUploadPartCopy.html
        """
        debug("Copying part %i of %r (%s bytes)"
              % (seq, self.upload_id, chunk_size))

        # set up headers with copy-params.
        # Examples:
        #   x-amz-copy-source: /source_bucket/sourceObject
        #   x-amz-copy-source-range:bytes=first-last
        #   x-amz-copy-source-if-match: etag
        #   x-amz-copy-source-if-none-match: etag
        #   x-amz-copy-source-if-unmodified-since: time_stamp
        #   x-amz-copy-source-if-modified-since: time_stamp
        headers = {
            "x-amz-copy-source": s3_quote(
                "/%s/%s" % (self.src_uri.bucket(), self.src_uri.object()),
                quote_backslashes=False, unicode_output=True)
        }

        # byte range, with end byte included. A 10 byte file has bytes=0-9
        headers["x-amz-copy-source-range"] = \
            "bytes=%d-%d" % (offset, (offset + chunk_size - 1))

        query_string_params = {'partNumber': '%s' % seq,
                               'uploadId': self.upload_id}
        request = self.s3.create_request("OBJECT_PUT", uri=self.dst_uri,
                                         headers=headers,
                                         uri_params=query_string_params)

        labels[u'action'] = u'remote copy'
        response = self.s3.send_request_with_progress(request, labels,
                                                      chunk_size)

        # NOTE: Amazon sends whitespace while upload progresses, which
        # accumulates in response body and seems to confuse XML parser.
        # Strip newlines to find ETag in XML response data
        #data = response["data"].replace("\n", '')
        self.parts[seq] = getTextFromXml(response['data'], "ETag") or ''

        return response
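
    # ----------------------------------------------------------------
    # Illustration only: the inclusive byte ranges that copy_part above sends
    # as x-amz-copy-source-range, computed for a hypothetical 15 MiB chunk
    # size. The end byte is included, so a chunk of N bytes starting at
    # "offset" ends at offset + N - 1.
    # ----------------------------------------------------------------
    @staticmethod
    def _example_copy_ranges():
        chunk_size = 15 * SIZE_1MB
        ranges = []
        for seq in (1, 2, 3):
            offset = chunk_size * (seq - 1)
            ranges.append("bytes=%d-%d" % (offset, offset + chunk_size - 1))
        assert ranges == ["bytes=0-15728639",
                          "bytes=15728640-31457279",
                          "bytes=31457280-47185919"]
        return ranges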
    def complete_multipart_upload(self):
        """
        Finish a multipart upload
        http://docs.amazonwebservices.com/AmazonS3/latest/API/index.html?mpUploadComplete.html
        """
        debug("MultiPart: Completing upload: %s" % self.upload_id)

        parts_xml = []
        part_xml = "<Part><PartNumber>%i</PartNumber><ETag>%s</ETag></Part>"
        for seq, etag in self.parts.items():
            parts_xml.append(part_xml % (seq, etag))
        body = "<CompleteMultipartUpload>%s</CompleteMultipartUpload>" \
            % "".join(parts_xml)

        headers = {"content-length": str(len(body))}
        request = self.s3.create_request(
            "OBJECT_POST", uri=self.dst_uri, headers=headers, body=body,
            uri_params={'uploadId': self.upload_id})
        response = self.s3.send_request(request)

        return response

    def abort_upload(self):
        """
        Abort multipart upload
        http://docs.amazonwebservices.com/AmazonS3/latest/API/index.html?mpUploadAbort.html
        """
        debug("MultiPart: Aborting upload: %s" % self.upload_id)
        #request = self.s3.create_request("OBJECT_DELETE", uri = self.uri,
        #                                 uri_params = {'uploadId': self.upload_id})
        #response = self.s3.send_request(request)
        response = None
        return response

# vim:et:ts=4:sts=4:ai

s3cmd-2.4.0/S3/BidirMap.py0000664000175100017510000000257314534034713014455 0ustar floflo00000000000000
# -*- coding: utf-8 -*-

## --------------------------------------------------------------------
## Amazon S3 manager
##
## Authors   : Michal Ludvig (https://www.logix.cz/michal)
##             Florent Viard (https://www.sodria.com)
## Copyright : TGRMN Software, Sodria SAS and contributors
## License   : GPL Version 2
## Website   : https://s3tools.org
## --------------------------------------------------------------------

class BidirMap(object):
    def __init__(self, **map):
        self.k2v = {}
        self.v2k = {}
        for key in map:
            self.__setitem__(key, map[key])

    def __setitem__(self, key, value):
        if value in self.v2k:
            if self.v2k[value] != key:
                raise KeyError("Value '"+str(value)+"' already in use with key '"+str(self.v2k[value])+"'")
        try:
            del(self.v2k[self.k2v[key]])
        except KeyError:
            pass
        self.k2v[key] = value
        self.v2k[value] = key

    def __getitem__(self, key):
        return self.k2v[key]

    def __str__(self):
        return self.v2k.__str__()

    def getkey(self, value):
        return self.v2k[value]

    def getvalue(self, key):
        return self.k2v[key]

    def keys(self):
        return [key for key in self.k2v]

    def values(self):
        return [value for value in self.v2k]

# vim:et:ts=4:sts=4:ai
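
# --------------------------------------------------------------------
# Illustration only: a small usage sketch of BidirMap (the sample mapping is
# made up). Lookups work in both directions, and rebinding an existing value
# to a different key is rejected with a KeyError.
# --------------------------------------------------------------------
if __name__ == "__main__":
    methods = BidirMap(GET=0x01, PUT=0x02, DELETE=0x04)
    print(methods["PUT"])         # 2        (key -> value)
    print(methods.getkey(0x04))   # DELETE   (value -> key)
    methods["HEAD"] = 0x08        # a new pair is indexed both ways
    try:
        methods["POST"] = 0x01    # 0x01 is already bound to GET
    except KeyError as e:
        print(e)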