scalene-0.7.5/0000755000077200000240000000000013620052734013401 5ustar emerystaff00000000000000
scalene-0.7.5/PKG-INFO0000644000077200000240000002506313620052734014504 0ustar emerystaff00000000000000
Metadata-Version: 2.1
Name: scalene
Version: 0.7.5
Summary: Scalene: A high-resolution, low-overhead CPU and memory profiler for Python
Home-page: https://github.com/emeryberger/scalene
Author: Emery Berger
Author-email: emery@cs.umass.edu
License: Apache License 2.0
Description: ![scalene](https://github.com/emeryberger/scalene/raw/master/docs/scalene-image.png)

# scalene: a high-performance CPU and memory profiler for Python

by [Emery Berger](https://emeryberger.com)

------------

# About Scalene

Scalene is a high-performance CPU *and* memory profiler for Python that does a few things that other Python profilers do not and cannot do. It runs orders of magnitude faster than other profilers while delivering far more detailed information.

1. Scalene is _fast_. It uses sampling instead of instrumentation or relying on Python's tracing facilities. Its overhead is typically no more than 10-20% (and often less).
1. Scalene is _precise_. Unlike most other Python profilers, Scalene performs CPU profiling _at the line level_, pointing to the specific lines of code that are responsible for the execution time in your program. This level of detail can be much more useful than the function-level profiles returned by most profilers.
1. Scalene separates out time spent running in Python from time spent in native code (including libraries). Most Python programmers aren't going to optimize the performance of native code (which is usually either in the Python implementation or external libraries), so this helps developers focus their optimization efforts on the code they can actually improve.
1. Scalene _profiles memory usage_. In addition to tracking CPU usage, Scalene also points to the specific lines of code responsible for memory growth.
It accomplishes this via an included specialized memory allocator.

## Installation

Scalene is distributed as a `pip` package and works on Linux and Mac OS X platforms. You can install it as follows:

```
% pip install scalene
```

or

```
% python -m pip install scalene
```

_NOTE_: Currently, installing Scalene in this way does not install its memory profiling library, so you will only be able to use it to perform CPU profiling. To take advantage of its memory profiling capability, you will need to download this repository.

**NEW**: You can now install the memory profiling part on Mac OS X using Homebrew.

```
% brew tap emeryberger/scalene
% brew install --head libscalene
```

This will install a `scalene` script you can use (see below).

# Usage

The following command will run Scalene to only perform line-level CPU profiling on a provided example program.

```
% python -m scalene test/testme.py
```

If you have installed the Scalene library with Homebrew, you can just invoke `scalene` to perform both line-level CPU and memory profiling:

```
% scalene test/testme.py
```

Otherwise, you first need to build the specialized memory allocator by running `make`:

```
% make
```

Profiling on a Mac OS X system (without using Homebrew):

```
% DYLD_INSERT_LIBRARIES=$PWD/libscalene.dylib PYTHONMALLOC=malloc python -m scalene test/testme.py
```

Profiling on a Linux system:

```
% LD_PRELOAD=$PWD/libscalene.so PYTHONMALLOC=malloc python -m scalene test/testme.py
```

# Comparison to Other Profilers

## Performance and Features

Below is a table comparing various profilers to scalene, running on an example Python program (`benchmarks/julia1_nopil.py`) from the book _High Performance Python_, by Gorelick and Ozsvald. All of these were run on a 2016 MacBook Pro.

| | Time (seconds) | Slowdown | Line-level? | CPU? | Separates Python from native? | Memory? | Unmodified code? |
| :--- | ---: | ---: | :---: | :---: | :---: | :---: | :---: |
| _original program_ | 6.71s | 1.0x | | | | | |
| | | | | | | | |
| `cProfile` | 11.04s | 1.65x | function-level | :heavy_check_mark: | | | :heavy_check_mark: |
| `Profile` | 202.26s | 30.14x | function-level | :heavy_check_mark: | | | :heavy_check_mark: |
| `pyinstrument` | 9.83s | 1.46x | function-level | :heavy_check_mark: | | | :heavy_check_mark: |
| `line_profiler` | 78.0s | 11.62x | :heavy_check_mark: | :heavy_check_mark: | | | needs `@profile` decorators |
| `pprofile` _(deterministic)_ | 403.67s | 60.16x | :heavy_check_mark: | :heavy_check_mark: | | | :heavy_check_mark: |
| `pprofile` _(statistical)_ | 7.47s | 1.11x | :heavy_check_mark: | :heavy_check_mark: | | | :heavy_check_mark: |
| `yappi` _(CPU)_ | 127.53s | 19.01x | function-level | :heavy_check_mark: | | | :heavy_check_mark: |
| `yappi` _(wallclock)_ | 21.45s | 3.2x | function-level | :heavy_check_mark: | | | :heavy_check_mark: |
| `memory_profiler` | _aborted after 2 hours_ | **>1000x** | line-level | | | :heavy_check_mark: | needs `@profile` decorators |
| | | | | | | | |
| `scalene` _(CPU only)_ | 6.98s | **1.04x** | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | | :heavy_check_mark: |
| `scalene` _(CPU + memory)_ | 7.68s | **1.14x** | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |

## Output

Scalene prints annotated source code for the program being profiled and any modules it uses in the same directory or subdirectories. Here is a snippet from `pystone.py`, just using CPU profiling:

```
benchmarks/pystone.py: % of CPU time = 98.78% out of 3.47s.
       |    CPU % |    CPU % |
  Line | (Python) |      (C) | [benchmarks/pystone.py]
--------------------------------------------------------------------------------
[... lines omitted ...]
   137 |    0.87% |    0.13% | def Proc1(PtrParIn):
   138 |    1.46% |    0.36% |     PtrParIn.PtrComp = NextRecord = PtrGlb.copy()
   139 |          |          |     PtrParIn.IntComp = 5
   140 |    0.87% |    0.04% |     NextRecord.IntComp = PtrParIn.IntComp
   141 |    1.46% |    0.30% |     NextRecord.PtrComp = PtrParIn.PtrComp
   142 |    2.33% |    0.26% |     NextRecord.PtrComp = Proc3(NextRecord.PtrComp)
   143 |    1.46% |   -0.00% |     if NextRecord.Discr == Ident1:
   144 |    0.29% |    0.04% |         NextRecord.IntComp = 6
   145 |    1.75% |    0.40% |         NextRecord.EnumComp = Proc6(PtrParIn.EnumComp)
   146 |    1.75% |    0.29% |         NextRecord.PtrComp = PtrGlb.PtrComp
   147 |    0.58% |    0.12% |         NextRecord.IntComp = Proc7(NextRecord.IntComp, 10)
   148 |          |          |     else:
   149 |          |          |         PtrParIn = NextRecord.copy()
   150 |    0.87% |    0.15% |     NextRecord.PtrComp = None
   151 |          |          |     return PtrParIn
```

And here is an example with memory profiling enabled, running the Julia benchmark.

```
benchmarks/julia1_nopil.py: % of CPU time = 99.22% out of 12.06s.
       |    CPU % |    CPU % | Memory (MB) |
  Line | (Python) |      (C) |             | [benchmarks/julia1_nopil.py]
--------------------------------------------------------------------------------
     1 |          |          |             | # Pasted from Chapter 2, High Performance Python - O'Reilly Media;
     2 |          |          |             | # minor modifications for Python 3 by Emery Berger
     3 |          |          |             |
     4 |          |          |             | """Julia set generator without optional PIL-based image drawing"""
     5 |          |          |             | import time
     6 |          |          |             | # area of complex space to investigate
     7 |          |          |             | x1, x2, y1, y2 = -1.8, 1.8, -1.8, 1.8
     8 |          |          |             | c_real, c_imag = -0.62772, -.42193
     9 |          |          |             |
    10 |          |          |             | #@profile
    11 |          |          |             | def calculate_z_serial_purepython(maxiter, zs, cs):
    12 |          |          |             |     """Calculate output list using Julia update rule"""
    13 |    0.08% |    0.02% |        0.06 |     output = [0] * len(zs)
    14 |    0.25% |    0.01% |        9.50 |     for i in range(len(zs)):
    15 |          |          |             |         n = 0
    16 |    1.34% |    0.05% |       -9.88 |         z = zs[i]
    17 |    0.50% |    0.01% |       -8.44 |         c = cs[i]
    18 |    1.25% |    0.04% |             |         while abs(z) < 2 and n < maxiter:
    19 |   68.67% |    2.27% |       42.50 |             z = z * z + c
    20 |   18.46% |    0.74% |      -33.62 |             n += 1
    21 |          |          |             |         output[i] = n
    22 |          |
|             | return output
```

Positive memory numbers indicate total memory allocation in megabytes; negative memory numbers indicate memory reclamation. Note that because of the way Python's memory management works, frequent allocation and de-allocation (as in lines 19-20 above) show up as high positive memory on one line followed by an (approximately) corresponding negative memory on the following line(s).

# Acknowledgements

Logo created by [Sophia Berger](https://www.linkedin.com/in/sophia-berger/).

Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Description-Content-Type: text/markdown
scalene-0.7.5/README.md0000644000077200000240000002145113615313743014667 0ustar emerystaff00000000000000
![scalene](https://github.com/emeryberger/scalene/raw/master/docs/scalene-image.png)

# scalene: a high-performance CPU and memory profiler for Python

by [Emery Berger](https://emeryberger.com)

------------

# About Scalene

Scalene is a high-performance CPU *and* memory profiler for Python that does a few things that other Python profilers do not and cannot do. It runs orders of magnitude faster than other profilers while delivering far more detailed information.

1. Scalene is _fast_. It uses sampling instead of instrumentation or relying on Python's tracing facilities. Its overhead is typically no more than 10-20% (and often less).
1. Scalene is _precise_. Unlike most other Python profilers, Scalene performs CPU profiling _at the line level_, pointing to the specific lines of code that are responsible for the execution time in your program. This level of detail can be much more useful than the function-level profiles returned by most profilers.
1. Scalene separates out time spent running in Python from time spent in native code (including libraries).
Most Python programmers aren't going to optimize the performance of native code (which is usually either in the Python implementation or external libraries), so this helps developers focus their optimization efforts on the code they can actually improve.
1. Scalene _profiles memory usage_. In addition to tracking CPU usage, Scalene also points to the specific lines of code responsible for memory growth. It accomplishes this via an included specialized memory allocator.

## Installation

Scalene is distributed as a `pip` package and works on Linux and Mac OS X platforms. You can install it as follows:

```
% pip install scalene
```

or

```
% python -m pip install scalene
```

_NOTE_: Currently, installing Scalene in this way does not install its memory profiling library, so you will only be able to use it to perform CPU profiling. To take advantage of its memory profiling capability, you will need to download this repository.

**NEW**: You can now install the memory profiling part on Mac OS X using Homebrew.

```
% brew tap emeryberger/scalene
% brew install --head libscalene
```

This will install a `scalene` script you can use (see below).

# Usage

The following command will run Scalene to only perform line-level CPU profiling on a provided example program.
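The contents of `test/testme.py` are not reproduced in this README; as a stand-in, any small script with both CPU-bound and allocation-heavy lines will produce an interesting profile. A hypothetical example (the function names `busy` and `allocate` are illustrative, not part of the repository):

```python
# A hypothetical stand-in for test/testme.py: enough CPU and allocation
# activity for Scalene's line-level sampling to attribute meaningful time.

def busy(n):
    # CPU-bound: pure-Python arithmetic, attributed line by line.
    return sum(i * i for i in range(n))

def allocate(n):
    # Allocation-heavy: builds a large list, visible to the memory profiler
    # when the sampling allocator is preloaded.
    return [bytes(64) for _ in range(n)]

if __name__ == '__main__':
    print(busy(1_000_000))
    print(len(allocate(100_000)))
```

Profiling a script like this exercises both the CPU columns and (with the memory allocator installed) the memory column of the report.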
```
% python -m scalene test/testme.py
```

If you have installed the Scalene library with Homebrew, you can just invoke `scalene` to perform both line-level CPU and memory profiling:

```
% scalene test/testme.py
```

Otherwise, you first need to build the specialized memory allocator by running `make`:

```
% make
```

Profiling on a Mac OS X system (without using Homebrew):

```
% DYLD_INSERT_LIBRARIES=$PWD/libscalene.dylib PYTHONMALLOC=malloc python -m scalene test/testme.py
```

Profiling on a Linux system:

```
% LD_PRELOAD=$PWD/libscalene.so PYTHONMALLOC=malloc python -m scalene test/testme.py
```

# Comparison to Other Profilers

## Performance and Features

Below is a table comparing various profilers to scalene, running on an example Python program (`benchmarks/julia1_nopil.py`) from the book _High Performance Python_, by Gorelick and Ozsvald. All of these were run on a 2016 MacBook Pro.

| | Time (seconds) | Slowdown | Line-level? | CPU? | Separates Python from native? | Memory? | Unmodified code? |
| :--- | ---: | ---: | :---: | :---: | :---: | :---: | :---: |
| _original program_ | 6.71s | 1.0x | | | | | |
| | | | | | | | |
| `cProfile` | 11.04s | 1.65x | function-level | :heavy_check_mark: | | | :heavy_check_mark: |
| `Profile` | 202.26s | 30.14x | function-level | :heavy_check_mark: | | | :heavy_check_mark: |
| `pyinstrument` | 9.83s | 1.46x | function-level | :heavy_check_mark: | | | :heavy_check_mark: |
| `line_profiler` | 78.0s | 11.62x | :heavy_check_mark: | :heavy_check_mark: | | | needs `@profile` decorators |
| `pprofile` _(deterministic)_ | 403.67s | 60.16x | :heavy_check_mark: | :heavy_check_mark: | | | :heavy_check_mark: |
| `pprofile` _(statistical)_ | 7.47s | 1.11x | :heavy_check_mark: | :heavy_check_mark: | | | :heavy_check_mark: |
| `yappi` _(CPU)_ | 127.53s | 19.01x | function-level | :heavy_check_mark: | | | :heavy_check_mark: |
| `yappi` _(wallclock)_ | 21.45s | 3.2x | function-level | :heavy_check_mark: | | | :heavy_check_mark: |
| `memory_profiler` | _aborted after 2 hours_ | **>1000x** | line-level | | | :heavy_check_mark: | needs `@profile` decorators |
| | | | | | | | |
| `scalene` _(CPU only)_ | 6.98s | **1.04x** | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | | :heavy_check_mark: |
| `scalene` _(CPU + memory)_ | 7.68s | **1.14x** | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |

## Output

Scalene prints annotated source code for the program being profiled and any modules it uses in the same directory or subdirectories. Here is a snippet from `pystone.py`, just using CPU profiling:

```
benchmarks/pystone.py: % of CPU time = 98.78% out of 3.47s.
       |    CPU % |    CPU % |
  Line | (Python) |      (C) | [benchmarks/pystone.py]
--------------------------------------------------------------------------------
[... lines omitted ...]
   137 |    0.87% |    0.13% | def Proc1(PtrParIn):
   138 |    1.46% |    0.36% |     PtrParIn.PtrComp = NextRecord = PtrGlb.copy()
   139 |          |          |     PtrParIn.IntComp = 5
   140 |    0.87% |    0.04% |     NextRecord.IntComp = PtrParIn.IntComp
   141 |    1.46% |    0.30% |     NextRecord.PtrComp = PtrParIn.PtrComp
   142 |    2.33% |    0.26% |     NextRecord.PtrComp = Proc3(NextRecord.PtrComp)
   143 |    1.46% |   -0.00% |     if NextRecord.Discr == Ident1:
   144 |    0.29% |    0.04% |         NextRecord.IntComp = 6
   145 |    1.75% |    0.40% |         NextRecord.EnumComp = Proc6(PtrParIn.EnumComp)
   146 |    1.75% |    0.29% |         NextRecord.PtrComp = PtrGlb.PtrComp
   147 |    0.58% |    0.12% |         NextRecord.IntComp = Proc7(NextRecord.IntComp, 10)
   148 |          |          |     else:
   149 |          |          |         PtrParIn = NextRecord.copy()
   150 |    0.87% |    0.15% |     NextRecord.PtrComp = None
   151 |          |          |     return PtrParIn
```

And here is an example with memory profiling enabled, running the Julia benchmark.

```
benchmarks/julia1_nopil.py: % of CPU time = 99.22% out of 12.06s.
       |    CPU % |    CPU % | Memory (MB) |
  Line | (Python) |      (C) |             | [benchmarks/julia1_nopil.py]
--------------------------------------------------------------------------------
     1 |          |          |             | # Pasted from Chapter 2, High Performance Python - O'Reilly Media;
     2 |          |          |             | # minor modifications for Python 3 by Emery Berger
     3 |          |          |             |
     4 |          |          |             | """Julia set generator without optional PIL-based image drawing"""
     5 |          |          |             | import time
     6 |          |          |             | # area of complex space to investigate
     7 |          |          |             | x1, x2, y1, y2 = -1.8, 1.8, -1.8, 1.8
     8 |          |          |             | c_real, c_imag = -0.62772, -.42193
     9 |          |          |             |
    10 |          |          |             | #@profile
    11 |          |          |             | def calculate_z_serial_purepython(maxiter, zs, cs):
    12 |          |          |             |     """Calculate output list using Julia update rule"""
    13 |    0.08% |    0.02% |        0.06 |     output = [0] * len(zs)
    14 |    0.25% |    0.01% |        9.50 |     for i in range(len(zs)):
    15 |          |          |             |         n = 0
    16 |    1.34% |    0.05% |       -9.88 |         z = zs[i]
    17 |    0.50% |    0.01% |       -8.44 |         c = cs[i]
    18 |    1.25% |    0.04% |             |         while abs(z) < 2 and n < maxiter:
    19 |   68.67% |    2.27% |       42.50 |             z = z * z + c
    20 |   18.46% |    0.74% |      -33.62 |             n += 1
    21 |          |          |             |         output[i] = n
    22 |          |
|             | return output
```

Positive memory numbers indicate total memory allocation in megabytes; negative memory numbers indicate memory reclamation. Note that because of the way Python's memory management works, frequent allocation and de-allocation (as in lines 19-20 above) show up as high positive memory on one line followed by an (approximately) corresponding negative memory on the following line(s).

# Acknowledgements

Logo created by [Sophia Berger](https://www.linkedin.com/in/sophia-berger/).

scalene-0.7.5/scalene/0000755000077200000240000000000013620052734015013 5ustar emerystaff00000000000000
scalene-0.7.5/scalene/__init__.py0000644000077200000240000000000013602276071017114 0ustar emerystaff00000000000000
scalene-0.7.5/scalene/__main__.py0000644000077200000240000000003413602656174017112 0ustar emerystaff00000000000000
from scalene import scalene
scalene-0.7.5/scalene/scalene-both.py0000644000077200000240000006015313620052621017731 0ustar emerystaff00000000000000
"""Scalene: a high-performance, high-precision CPU *and* memory profiler for Python.

Scalene uses interrupt-driven sampling for CPU profiling. For memory
profiling, it uses a similar mechanism but with interrupts generated
by a "sampling memory allocator" that produces signals every time the
heap grows or shrinks by a certain amount. See libscalene.cpp for
details (sampling logic is in include/sampleheap.hpp).
by Emery Berger
https://emeryberger.com

usage:
  # for CPU profiling only
  python -m scalene test/testme.py
  # for CPU and memory profiling (Mac OS X)
  DYLD_INSERT_LIBRARIES=$PWD/libscalene.dylib PYTHONMALLOC=malloc python -m scalene test/testme.py
  # for CPU and memory profiling (Linux)
  LD_PRELOAD=$PWD/libscalene.so PYTHONMALLOC=malloc python -m scalene test/testme.py

"""

# import random
import sys
import atexit
import signal
import math
from collections import defaultdict
import time
from pathlib import Path
import os
import traceback
import argparse
from contextlib import contextmanager
from functools import lru_cache
from textwrap import dedent

# Logic to ignore @profile decorators.
import builtins
try:
    builtins.profile
except AttributeError:
    # No line profiler; provide a pass-through version.
    def profile(func):
        return func
    builtins.profile = profile

the_globals = {
    '__name__': '__main__',
    '__doc__': None,
    '__package__': None,
    '__loader__': globals()['__loader__'],
    '__spec__': None,
    '__annotations__': {},
    '__builtins__': globals()['__builtins__'],
    '__file__': None,
    '__cached__': None,
}

assert sys.version_info[0] == 3 and sys.version_info[1] >= 5, "Scalene requires Python version 3.5 or above."

# Scalene currently only supports Unix-like operating systems; in particular, Linux and Mac OS X.
if sys.platform == 'win32':
    print("Scalene currently does not support Windows, but works on Linux and Mac OS X.")
    sys.exit(-1)

class Scalene():
    """The Scalene profiler itself."""

    # Statistics counters.
    cpu_samples_python = defaultdict(lambda: defaultdict(float))     # CPU samples for each location in the program spent in the interpreter
    cpu_samples_c = defaultdict(lambda: defaultdict(float))          # CPU samples for each location in the program spent in C / libraries / system calls
    memory_malloc_samples = defaultdict(lambda: defaultdict(float))  # malloc samples for each location in the program
    memory_malloc_count = defaultdict(lambda: defaultdict(int))      # number of times samples were added for the above
    memory_free_samples = defaultdict(lambda: defaultdict(float))    # free samples for each location in the program
    memory_free_count = defaultdict(lambda: defaultdict(int))        # number of times samples were added for the above
    memory_max_samples = defaultdict(lambda: defaultdict(float))     # peak-footprint samples for each location in the program
    total_max_samples = 0
    total_cpu_samples = 0.0            # how many CPU samples have been collected.
    total_memory_free_samples = 0      # how many free samples have been collected.
    total_memory_malloc_samples = 0    # how many malloc samples have been collected.
    current_footprint = 0
    max_footprint = 0
    mean_signal_interval = 0.01        # mean seconds between interrupts for CPU sampling.
    last_signal_interval = 0.01        # last number of seconds between interrupts for CPU sampling.
    original_path = ""                 # original working directory.
    program_path = ""                  # path for the program being profiled.
    output_file = ""                   # where we write profile info.
    output_profile_interval = float("inf")  # how long between outputting stats during execution.
    next_output_time = float("inf")    # when do we output the next profile.
    elapsed_time = 0                   # total time spent in program being profiled.

    # Things that need to be in sync with include/sampleheap.hpp:
    malloc_signal_filename = "/tmp/scalene-malloc-signal"  # file to communicate the number of malloc samples
    free_signal_filename = "/tmp/scalene-free-signal"      # file to communicate the number of free samples
    #malloc_sampling_rate = 16777259 # 1048583 # we get signals after this many bytes are allocated.
#free_sampling_rate = 16777289 # 1048589 # as above, for frees. # NB: the below MUST BE IN SYNC WITH include/sampleheap.hpp! malloc_sampling_rate = 1048583 # 33554467 # 16777259 # 1048583 # we get signals after this many bytes are allocated. free_sampling_rate = 1048589 # 33554473 # 16777289 # 1048589 # as above, for frees. # The specific signals we use. Malloc and free signals are generated by include/sampleheap.hpp. cpu_signal = signal.SIGVTALRM malloc_signal = signal.SIGXCPU free_signal = signal.SIGPROF # Program-specific information. program_being_profiled = "" # the name of the program being profiled. program_path = "" # the path " " " " " @staticmethod def set_timer_signal(use_wallclock_time = False): if use_wallclock_time: Scalene.cpu_timer_signal = signal.ITIMER_REAL else: Scalene.cpu_timer_signal = signal.ITIMER_VIRTUAL # Now set the appropriate timer signal. if Scalene.cpu_timer_signal == signal.ITIMER_REAL: Scalene.cpu_signal = signal.SIGALRM elif Scalene.cpu_timer_signal == signal.ITIMER_VIRTUAL: Scalene.cpu_signal = signal.SIGVTALRM elif Scalene.cpu_timer_signal == signal.ITIMER_PROF: # NOT SUPPORTED assert False, "ITIMER_PROF is not currently supported." @staticmethod def enable_signals(): # Set up the signal handler to handle periodic timer interrupts (for CPU). signal.signal(Scalene.cpu_signal, Scalene.cpu_signal_handler) # Set up the signal handler to handle malloc/free interrupts (for memory allocations). signal.signal(Scalene.malloc_signal, Scalene.alloc_event_signal_handler) signal.signal(Scalene.free_signal, Scalene.alloc_event_signal_handler) # signal.signal(Scalene.malloc_signal, Scalene.malloc_signal_handler) # signal.signal(Scalene.free_signal, Scalene.free_signal_handler) # Turn on the CPU profiling timer to run every signal_interval seconds. 
signal.setitimer(Scalene.cpu_timer_signal, Scalene.mean_signal_interval, Scalene.mean_signal_interval) Scalene.last_signal_time = Scalene.gettime() @staticmethod def gettime(): """High-precision timer of time spent running in or on behalf of this process.""" return time.process_time() def __init__(self, program_being_profiled): # Register the exit handler to run when the program terminates or we quit. atexit.register(Scalene.exit_handler) # Store relevant names (program, path). Scalene.program_being_profiled = os.path.abspath(program_being_profiled) Scalene.program_path = os.path.dirname(Scalene.program_being_profiled) # Set signal handlers. @staticmethod def cpu_signal_handler(_, frame): """Handle interrupts for CPU profiling.""" # Record how long it has been since we received a timer # before. See the logic below. now = Scalene.gettime() # If it's time to print some profiling info, do so. if now >= Scalene.next_output_time: # Print out the profile. # Set the next output time, stop signals, print the profile, and then start signals again. Scalene.next_output_time += Scalene.output_profile_interval Scalene.stop() Scalene.output_profiles() Scalene.start() fname = frame.f_code.co_filename # Record samples only for files we care about. if (len(fname)) == 0: # 'eval/compile' gives no f_code.co_filename. # We have to look back into the outer frame in order to check the co_filename. fname = frame.f_back.f_code.co_filename if not Scalene.should_trace(fname): Scalene.last_signal_time = Scalene.gettime() # Currently disabled: random sampling for CPU timing. Just use the same interval all the time. # Scalene.last_signal_interval = random.uniform(Scalene.mean_signal_interval / 2, Scalene.mean_signal_interval * 3 / 2) # signal.setitimer(Scalene.cpu_timer_signal, Scalene.last_signal_interval, Scalene.last_signal_interval) return # Here we take advantage of an apparent limitation of Python: # it only delivers signals after the interpreter has given up # control. 
This seems to mean that sampling is limited to code # running purely in the interpreter, and in fact, that was a limitation # of the first version of Scalene. # # (cf. https://docs.python.org/3.9/library/signal.html#execution-of-python-signal-handlers) # # However: lemons -> lemonade: this "problem" is in fact # an effective way to separate out time spent in # Python vs. time spent in native code "for free"! If we get # the signal immediately, we must be running in the # interpreter. On the other hand, if it was delayed, that means # we are running code OUTSIDE the interpreter, e.g., # native code (be it inside of Python or in a library). We # account for this time by tracking the elapsed (process) time # and compare it to the interval, and add any computed delay # (as if it were sampled) to the C counter. python_time = Scalene.last_signal_interval c_time = now - Scalene.last_signal_time - Scalene.last_signal_interval Scalene.cpu_samples_python[fname][frame.f_lineno] += python_time Scalene.cpu_samples_c[fname][frame.f_lineno] += c_time Scalene.total_cpu_samples += python_time + c_time # disabled randomness for now # Scalene.last_signal_interval = random.uniform(Scalene.mean_signal_interval / 2, Scalene.mean_signal_interval * 3 / 2) # signal.setitimer(Scalene.cpu_timer_signal, Scalene.last_signal_interval, Scalene.last_signal_interval) Scalene.last_signal_time = Scalene.gettime() return @staticmethod def malloc_signal_handler(_, frame): """Handle interrupts for memory profiling (mallocs).""" fname = frame.f_code.co_filename # Record samples only for files we care about. 
if not Scalene.should_trace(fname): return count = 1.0 lineno = frame.f_lineno read_something = False try: with open(Scalene.malloc_signal_filename, "r") as f: for l, count_str in enumerate(f, 1): read_something = True count_str = count_str.rstrip() count = float(count_str) Scalene.memory_malloc_samples[fname][lineno] += count # print(str(lineno) + " - SCALENE (" + str(l) + ") : malloc = " + str(count) + ", " + str(Scalene.memory_malloc_samples[fname][lineno])) Scalene.total_memory_malloc_samples += count Scalene.current_footprint += count if Scalene.current_footprint > Scalene.max_footprint: # Scalene.memory_max_samples[fname][lineno] += count # Scalene.total_max_samples += count Scalene.max_footprint = Scalene.current_footprint os.remove(Scalene.malloc_signal_filename) except Exception as e: # print(e) pass if read_something: Scalene.memory_malloc_count[fname][lineno] += 1 return @staticmethod def free_signal_handler(_, frame): """Handle interrupts for memory profiling (frees).""" fname = frame.f_code.co_filename # Record samples only for files we care about. if not Scalene.should_trace(fname): return count = 1.0 lineno = frame.f_lineno read_something = False try: with open(Scalene.free_signal_filename, "r") as f: for l, count_str in enumerate(f, 1): read_something = True count_str = count_str.rstrip() count = float(count_str) Scalene.memory_free_samples[fname][lineno] += count # print(str(lineno) + " - SCALENE (" + str(l) + ") : free = " + str(count) + ", " + str(Scalene.memory_free_samples[fname][lineno])) Scalene.total_memory_free_samples += count Scalene.current_footprint -= count # print("free " + str(count)) if Scalene.current_footprint < 0: # In principle, this should not happen, but with sampling, it's _possible_, so we handle it here. 
Scalene.current_footprint = 0 os.remove(Scalene.free_signal_filename) except Exception as e: # print(e) pass if read_something: Scalene.memory_free_count[fname][lineno] += 1 return @staticmethod def alloc_event_signal_handler(_, frame): Scalene.malloc_signal_handler(True, frame) Scalene.free_signal_handler(True, frame) @staticmethod @lru_cache(128) def should_trace(filename): """Return true if the filename is one we should trace.""" # Profile anything in the program's directory or a child directory, # but nothing else. if filename[0] == '<': # Don't profile Python internals. return False if 'scalene.py' in filename: # Don't profile the profiler. return False filename = os.path.abspath(filename) return Scalene.program_path in filename @staticmethod def start(): """Initiate profiling.""" os.chdir(Scalene.program_path) Scalene.enable_signals() Scalene.elapsed_time = Scalene.gettime() @staticmethod def stop(): """Complete profiling.""" Scalene.disable_signals() Scalene.elapsed_time = Scalene.gettime() - Scalene.elapsed_time os.chdir(Scalene.original_path) @staticmethod @contextmanager def file_or_stdout(file_name): """Returns a file handle for writing; if no argument is passed, returns stdout.""" # from https://stackoverflow.com/questions/9836370/fallback-to-stdout-if-no-file-name-provided if file_name is None: yield sys.stdout else: with open(file_name, 'w') as out_file: yield out_file @staticmethod def output_profiles(): """Write the profile out (currently to stdout).""" # If we've collected any samples, dump them. if Scalene.total_cpu_samples == 0 and Scalene.total_memory_malloc_samples == 0 and Scalene.total_memory_free_samples == 0: # Nothing to output. return False # If I have at least one memory sample, then we are profiling memory. did_sample_memory = (Scalene.total_memory_free_samples + Scalene.total_memory_malloc_samples) > 1 # Collect all instrumented filenames. 
        all_instrumented_files = list(set(list(Scalene.cpu_samples_python.keys()) + list(Scalene.memory_free_samples.keys()) + list(Scalene.memory_malloc_samples.keys())))
        with Scalene.file_or_stdout(Scalene.output_file) as out:
            for fname in sorted(all_instrumented_files):
                this_cpu_samples = sum(Scalene.cpu_samples_c[fname].values()) + sum(Scalene.cpu_samples_python[fname].values())
                try:
                    percent_cpu_time = 100 * this_cpu_samples / Scalene.total_cpu_samples
                except ZeroDivisionError:
                    percent_cpu_time = 0
                # percent_cpu_time = 100 * this_cpu_samples * Scalene.mean_signal_interval / Scalene.elapsed_time
                print("%s: %% of CPU time = %6.2f%% out of %6.2fs." % (fname, percent_cpu_time, Scalene.elapsed_time), file=out)
                print(" \t | %9s | %9s | %s %s " % ('CPU %', 'CPU %', 'Avg memory |' if did_sample_memory else '', 'Memory |' if did_sample_memory else ''), file=out)
                print(" Line\t | %9s | %9s | %s%s [%s]" % ('(Python)', '(native)', 'growth (MB) |' if did_sample_memory else '', ' usage (%) |' if did_sample_memory else '', fname), file=out)
                print("-" * 80, file=out)
                with open(fname, 'r') as source_file:
                    for line_no, line in enumerate(source_file, 1):
                        line = line.rstrip()  # Strip newline
                        # Prepare output values.
                        n_cpu_samples_c = Scalene.cpu_samples_c[fname][line_no]
                        # Correct for negative CPU sample counts. This can happen
                        # because of floating point inaccuracies, since we perform
                        # subtraction to compute it.
                        if n_cpu_samples_c < 0:
                            n_cpu_samples_c = 0
                        n_cpu_samples_python = Scalene.cpu_samples_python[fname][line_no]
                        # Compute percentages of CPU time.
                        if Scalene.total_cpu_samples != 0:
                            n_cpu_percent_c = n_cpu_samples_c * 100 / Scalene.total_cpu_samples
                            n_cpu_percent_python = n_cpu_samples_python * 100 / Scalene.total_cpu_samples
                        else:
                            n_cpu_percent_c = 0
                            n_cpu_percent_python = 0
                        # Now, memory stats.
                        n_free_mb = (Scalene.memory_free_samples[fname][line_no] * Scalene.free_sampling_rate) / (1024 * 1024)
                        n_free_count = Scalene.memory_free_count[fname][line_no]
                        n_avg_free_mb = 0 if n_free_count == 0 else n_free_mb / n_free_count
                        n_malloc_mb = (Scalene.memory_malloc_samples[fname][line_no] * Scalene.malloc_sampling_rate) / (1024 * 1024)
                        n_malloc_count = Scalene.memory_malloc_count[fname][line_no]
                        n_avg_malloc_mb = 0 if n_malloc_count == 0 else n_malloc_mb / n_malloc_count
                        n_growth_mb = 0 if n_malloc_count == 0 and n_free_count == 0 else n_avg_malloc_mb  # - n_avg_free_mb
                        n_drop_mb = 0 if n_malloc_count == 0 and n_free_count == 0 else n_avg_free_mb
                        n_usage_mb = 0 if Scalene.total_memory_malloc_samples == 0 else Scalene.memory_malloc_samples[fname][line_no] / Scalene.total_memory_malloc_samples
                        # Finally, print results.
                        n_cpu_percent_c_str = "" if n_cpu_percent_c == 0 else '%6.2f%%' % n_cpu_percent_c
                        n_cpu_percent_python_str = "" if n_cpu_percent_python == 0 else '%6.2f%%' % n_cpu_percent_python
                        n_growth_mb_str = "" if (n_growth_mb == 0 and n_usage_mb == 0) else '%11.0f' % n_growth_mb
                        # n_usage_mb_str = "" if n_usage_mb == 0 else '%9.2f%%' % (100 * n_usage_mb)
                        n_usage_mb_str = "" if n_drop_mb == 0 else '%11.0f' % n_drop_mb
                        if did_sample_memory:
                            print("%6d\t | %9s | %9s | %11s | %11s | %s" % (line_no, n_cpu_percent_python_str, n_cpu_percent_c_str, n_growth_mb_str, n_usage_mb_str, line), file=out)
                        else:
                            print("%6d\t | %9s | %9s | %s" % (line_no, n_cpu_percent_python_str, n_cpu_percent_c_str, line), file=out)
                print("", file=out)
        return True

    @staticmethod
    def disable_signals():
        """Turn off the profiling signals."""
        try:
            signal.signal(Scalene.cpu_timer_signal, signal.SIG_IGN)
        except Exception:
            pass
        signal.signal(Scalene.malloc_signal, signal.SIG_IGN)
        signal.signal(Scalene.free_signal, signal.SIG_IGN)
        signal.setitimer(Scalene.cpu_timer_signal, 0)

    @staticmethod
    def exit_handler():
        """When we exit, disable all signals."""
        Scalene.disable_signals()

    @staticmethod
    def main():
        """Invoke the profiler from the command line."""
        usage = dedent("""Scalene: a high-precision CPU and memory profiler.
        https://github.com/emeryberger/Scalene

        for CPU profiling only:
        % python -m scalene yourprogram.py

        for CPU and memory profiling (Mac OS X):
        % DYLD_INSERT_LIBRARIES=$PWD/libscalene.dylib PYTHONMALLOC=malloc python -m scalene yourprogram.py

        for CPU and memory profiling (Linux):
        % LD_PRELOAD=$PWD/libscalene.so PYTHONMALLOC=malloc python -m scalene yourprogram.py
        """)
        parser = argparse.ArgumentParser(prog='scalene', description=usage, formatter_class=argparse.RawTextHelpFormatter)
        parser.add_argument('prog', type=str, help='program to be profiled')
        parser.add_argument('-o', '--outfile', type=str, default=None, help='file to hold profiler output (default: stdout)')
        parser.add_argument('--profile-interval', type=float, default=float("inf"), help='output profiles every so many seconds.')
        parser.add_argument('--wallclock', dest='wallclock', action='store_const', const=True, default=False, help='use wall clock time (default: virtual time)')
        # Parse out all Scalene arguments and jam the remaining ones into argv.
        # See https://stackoverflow.com/questions/35733262/is-there-any-way-to-instruct-argparse-python-2-7-to-remove-found-arguments-fro
        args, left = parser.parse_known_args()
        sys.argv = sys.argv[:1] + left
        Scalene.set_timer_signal(args.wallclock)
        Scalene.output_profile_interval = args.profile_interval
        Scalene.next_output_time = Scalene.gettime() + Scalene.output_profile_interval
        try:
            with open(args.prog, 'rb') as prog_being_profiled:
                Scalene.original_path = os.getcwd()
                # Read in the code and compile it.
                code = compile(prog_being_profiled.read(), args.prog, "exec")
                # Push the program's path.
                program_path = os.path.dirname(os.path.abspath(args.prog))
                sys.path.insert(0, program_path)
                Scalene.program_path = program_path
                # os.chdir(program_path)
                # Set the file being executed.
                the_globals['__file__'] = args.prog
                Scalene.output_file = args.outfile
                # Start the profiler.
                profiler = Scalene(os.path.join(program_path, os.path.basename(args.prog)))
                try:
                    profiler.start()
                    # Run the code being profiled.
                    exec(code, the_globals)
                    profiler.stop()
                    # Go back home.
                    # os.chdir(Scalene.original_path)
                    # If we've collected any samples, dump them.
                    if not profiler.output_profiles():
                        print("Scalene: Program did not run for long enough to profile.")
                except Exception as ex:
                    template = "Scalene: An exception of type {0} occurred. Arguments:\n{1!r}"
                    message = template.format(type(ex).__name__, ex.args)
                    print(message)
                    print(traceback.format_exc())
        except (FileNotFoundError, IOError):
            print("Scalene: could not find input file.")


Scalene.main()

scalene-0.7.5/scalene/scalene.py

"""Scalene: a high-performance, high-precision CPU *and* memory profiler for Python.

  Scalene uses interrupt-driven sampling for CPU profiling. For memory
  profiling, it uses a similar mechanism but with interrupts generated
  by a "sampling memory allocator" that produces signals every time the
  heap grows or shrinks by a certain amount. See libscalene.cpp for
  details (the sampling logic is in include/sampleheap.hpp).

  by Emery Berger
  https://emeryberger.com

  usage: # for CPU profiling only
         python -m scalene test/testme.py
         # for CPU and memory profiling (Mac OS X)
         DYLD_INSERT_LIBRARIES=$PWD/libscalene.dylib PYTHONMALLOC=malloc python -m scalene test/testme.py
         # for CPU and memory profiling (Linux)
         LD_PRELOAD=$PWD/libscalene.so PYTHONMALLOC=malloc python -m scalene test/testme.py
"""

# import random
import sys
import atexit
import signal
import math
from collections import defaultdict
import time
from pathlib import Path
import os
import traceback
import argparse
from contextlib import contextmanager
from functools import lru_cache
from textwrap import dedent

# Logic to ignore @profile decorators.
import builtins

try:
    builtins.profile
except AttributeError:
    # No line profiler is loaded; provide a pass-through version.
    def profile(func):
        return func
    builtins.profile = profile

the_globals = {
    '__name__': '__main__',
    '__doc__': None,
    '__package__': None,
    '__loader__': globals()['__loader__'],
    '__spec__': None,
    '__annotations__': {},
    '__builtins__': globals()['__builtins__'],
    '__file__': None,
    '__cached__': None,
}

assert sys.version_info[0] == 3 and sys.version_info[1] >= 5, "Scalene requires Python version 3.5 or above."

# Scalene currently only supports Unix-like operating systems; in particular, Linux and Mac OS X.
if sys.platform == 'win32':
    print("Scalene currently does not support Windows, but works on Linux and Mac OS X.")
    sys.exit(-1)


class Scalene():
    """The Scalene profiler itself."""

    # Statistics counters.
    cpu_samples_python = defaultdict(lambda: defaultdict(float))     # CPU samples, per program location, spent in the interpreter
    cpu_samples_c = defaultdict(lambda: defaultdict(float))          # CPU samples, per program location, spent in C / libraries / system calls
    memory_malloc_samples = defaultdict(lambda: defaultdict(float))  # malloc samples for each location in the program
    memory_malloc_count = defaultdict(lambda: defaultdict(int))      # number of times malloc samples were added for the above
    memory_free_samples = defaultdict(lambda: defaultdict(float))    # free samples for each location in the program
    memory_free_count = defaultdict(lambda: defaultdict(int))        # number of times free samples were added for the above
    memory_max_samples = defaultdict(lambda: defaultdict(float))     # max-footprint samples for each location in the program
    total_max_samples = 0
    total_cpu_samples = 0.0          # how many CPU samples have been collected.
    total_memory_malloc_samples = 0  # how many malloc samples have been collected.
    total_memory_free_samples = 0    # how many free samples have been collected.
    current_footprint = 0
    max_footprint = 0
    mean_signal_interval = 0.01      # mean seconds between interrupts for CPU sampling.
    last_signal_interval = 0.01      # most recent interval (seconds) between interrupts for CPU sampling.
    original_path = ""               # original working directory.
    program_path = ""                # path for the program being profiled.
    output_file = ""                 # where we write profile info.
    output_profile_interval = float("inf")  # how long between outputting stats during execution.
    next_output_time = float("inf")  # when we output the next profile.
    elapsed_time = 0                 # total time spent in the program being profiled.

    # Things that need to be in sync with include/sampleheap.hpp:
    malloc_signal_filename = "/tmp/scalene-malloc-signal"  # file used to communicate the number of malloc samples
    free_signal_filename = "/tmp/scalene-free-signal"      # file used to communicate the number of free samples
    # NB: the values below MUST BE IN SYNC WITH include/sampleheap.hpp!
    malloc_sampling_rate = 1048583   # we get a signal after this many bytes are allocated.
    free_sampling_rate = 1048589     # as above, for frees.

    # The specific signals we use. Malloc and free signals are generated by include/sampleheap.hpp.
    cpu_signal = signal.SIGVTALRM
    malloc_signal = signal.SIGXCPU
    free_signal = signal.SIGPROF

    # Program-specific information.
    program_being_profiled = ""      # the name of the program being profiled.

    @staticmethod
    def set_timer_signal(use_wallclock_time=False):
        if use_wallclock_time:
            Scalene.cpu_timer_signal = signal.ITIMER_REAL
        else:
            Scalene.cpu_timer_signal = signal.ITIMER_VIRTUAL
        # Now set the appropriate timer signal.
        if Scalene.cpu_timer_signal == signal.ITIMER_REAL:
            Scalene.cpu_signal = signal.SIGALRM
        elif Scalene.cpu_timer_signal == signal.ITIMER_VIRTUAL:
            Scalene.cpu_signal = signal.SIGVTALRM
        elif Scalene.cpu_timer_signal == signal.ITIMER_PROF:
            # NOT SUPPORTED
            assert False, "ITIMER_PROF is not currently supported."
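    # The timer/signal pairing above follows POSIX setitimer() semantics:
    # ITIMER_REAL delivers SIGALRM, ITIMER_VIRTUAL delivers SIGVTALRM, and
    # ITIMER_PROF delivers SIGPROF -- which Scalene already reserves for free
    # signals, hence the assertion that ITIMER_PROF is unsupported.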
    @staticmethod
    def enable_signals():
        # Set up the signal handler to handle periodic timer interrupts (for CPU).
        signal.signal(Scalene.cpu_signal, Scalene.cpu_signal_handler)
        # Set up the signal handlers to handle malloc/free interrupts (for memory allocations).
        signal.signal(Scalene.malloc_signal, Scalene.alloc_event_signal_handler)
        signal.signal(Scalene.free_signal, Scalene.alloc_event_signal_handler)
        # signal.signal(Scalene.malloc_signal, Scalene.malloc_signal_handler)
        # signal.signal(Scalene.free_signal, Scalene.free_signal_handler)
        # Turn on the CPU profiling timer to run every mean_signal_interval seconds.
        signal.setitimer(Scalene.cpu_timer_signal, Scalene.mean_signal_interval, Scalene.mean_signal_interval)
        Scalene.last_signal_time = Scalene.gettime()

    @staticmethod
    def gettime():
        """High-precision timer of time spent running in or on behalf of this process."""
        return time.process_time()

    def __init__(self, program_being_profiled):
        # Register the exit handler to run when the program terminates or we quit.
        atexit.register(Scalene.exit_handler)
        # Store relevant names (program, path).
        Scalene.program_being_profiled = os.path.abspath(program_being_profiled)
        Scalene.program_path = os.path.dirname(Scalene.program_being_profiled)

    # Signal handlers.

    @staticmethod
    def cpu_signal_handler(_, frame):
        """Handle interrupts for CPU profiling."""
        # Record how long it has been since we last received a timer
        # interrupt. See the logic below.
        now = Scalene.gettime()
        # If it's time to print some profiling info, do so: set the next
        # output time, stop signals, print the profile, and then start
        # signals again.
        if now >= Scalene.next_output_time:
            Scalene.next_output_time += Scalene.output_profile_interval
            Scalene.stop()
            Scalene.output_profiles()
            Scalene.start()
        fname = frame.f_code.co_filename
        # Record samples only for files we care about.
        if len(fname) == 0:
            # 'eval/compile' gives no f_code.co_filename, so we have to
            # look back into the outer frame to check its co_filename.
            fname = frame.f_back.f_code.co_filename
        if not Scalene.should_trace(fname):
            Scalene.last_signal_time = Scalene.gettime()
            # Currently disabled: random sampling for CPU timing. Just use the same interval all the time.
            # Scalene.last_signal_interval = random.uniform(Scalene.mean_signal_interval / 2, Scalene.mean_signal_interval * 3 / 2)
            # signal.setitimer(Scalene.cpu_timer_signal, Scalene.last_signal_interval, Scalene.last_signal_interval)
            return
        # Here we take advantage of an apparent limitation of Python: it
        # only delivers signals after the interpreter has given up control.
        # This seems to mean that sampling is limited to code running purely
        # in the interpreter, and in fact, that was a limitation of the
        # first version of Scalene.
        #
        # (cf. https://docs.python.org/3.9/library/signal.html#execution-of-python-signal-handlers)
        #
        # However: lemons -> lemonade: this "problem" is in fact an
        # effective way to separate out time spent in Python vs. time spent
        # in native code, "for free"! If we get the signal immediately, we
        # must be running in the interpreter. On the other hand, if it was
        # delayed, that means we are running code OUTSIDE the interpreter,
        # e.g., native code (be it inside of Python or in a library). We
        # account for this time by tracking the elapsed (process) time,
        # comparing it to the interval, and adding any computed delay (as
        # if it were sampled) to the C counter.
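        # Worked example (illustrative numbers only): with a sampling
        # interval of 0.01s, a signal that arrives 0.035s after the
        # previous one attributes 0.01s to Python and the remaining
        # 0.035 - 0.01 = 0.025s of delay to native code on this line.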
        python_time = Scalene.last_signal_interval
        c_time = now - Scalene.last_signal_time - Scalene.last_signal_interval
        Scalene.cpu_samples_python[fname][frame.f_lineno] += python_time
        Scalene.cpu_samples_c[fname][frame.f_lineno] += c_time
        Scalene.total_cpu_samples += python_time + c_time
        # Randomized sampling intervals are disabled for now.
        # Scalene.last_signal_interval = random.uniform(Scalene.mean_signal_interval / 2, Scalene.mean_signal_interval * 3 / 2)
        # signal.setitimer(Scalene.cpu_timer_signal, Scalene.last_signal_interval, Scalene.last_signal_interval)
        Scalene.last_signal_time = Scalene.gettime()
        return

    @staticmethod
    def malloc_signal_handler(_, frame):
        """Handle interrupts for memory profiling (mallocs)."""
        fname = frame.f_code.co_filename
        # Record samples only for files we care about.
        if not Scalene.should_trace(fname):
            return
        count = 1.0
        lineno = frame.f_lineno
        read_something = False
        try:
            with open(Scalene.malloc_signal_filename, "r") as f:
                for l, count_str in enumerate(f, 1):
                    read_something = True
                    count_str = count_str.rstrip()
                    count = float(count_str)
                    Scalene.memory_malloc_samples[fname][lineno] += count
                    # print("SCALENE (" + str(l) + ") : malloc = " + str(count) + ", " + str(Scalene.memory_malloc_samples[fname][lineno]))
                    Scalene.total_memory_malloc_samples += count
                    Scalene.current_footprint += count
                    if Scalene.current_footprint > Scalene.max_footprint:
                        # Scalene.memory_max_samples[fname][lineno] += count
                        # Scalene.total_max_samples += count
                        Scalene.max_footprint = Scalene.current_footprint
            os.remove(Scalene.malloc_signal_filename)
        except Exception as e:
            # print(e)
            pass
        if read_something:
            Scalene.memory_malloc_count[fname][lineno] += 1
        return

    @staticmethod
    def free_signal_handler(_, frame):
        """Handle interrupts for memory profiling (frees)."""
        fname = frame.f_code.co_filename
        # Record samples only for files we care about.
        if not Scalene.should_trace(fname):
            return
        count = 1.0
        lineno = frame.f_lineno
        read_something = False
        try:
            with open(Scalene.free_signal_filename, "r") as f:
                for l, count_str in enumerate(f, 1):
                    read_something = True
                    count_str = count_str.rstrip()
                    count = float(count_str)
                    Scalene.memory_free_samples[fname][lineno] += count
                    # print("SCALENE (" + str(l) + ") : free = " + str(count) + ", " + str(Scalene.memory_free_samples[fname][lineno]))
                    Scalene.total_memory_free_samples += count
                    Scalene.current_footprint -= count
                    if Scalene.current_footprint < 0:
                        # In principle, this should not happen, but with sampling, it's _possible_, so we handle it here.
                        Scalene.current_footprint = 0
            os.remove(Scalene.free_signal_filename)
        except Exception as e:
            # print(e)
            pass
        if read_something:
            Scalene.memory_free_count[fname][lineno] += 1
        return

    @staticmethod
    def alloc_event_signal_handler(_, frame):
        """Handle both malloc and free interrupts."""
        Scalene.malloc_signal_handler(True, frame)
        Scalene.free_signal_handler(True, frame)

    @staticmethod
    @lru_cache(128)
    def should_trace(filename):
        """Return true if the filename is one we should trace."""
        # Profile anything in the program's directory or a child directory,
        # but nothing else.
        if filename[0] == '<':
            # Don't profile Python internals.
            return False
        if 'scalene.py' in filename:
            # Don't profile the profiler.
            return False
        filename = os.path.abspath(filename)
        return Scalene.program_path in filename

    @staticmethod
    def start():
        """Initiate profiling."""
        os.chdir(Scalene.program_path)
        Scalene.enable_signals()
        Scalene.elapsed_time = Scalene.gettime()

    @staticmethod
    def stop():
        """Complete profiling."""
        Scalene.disable_signals()
        Scalene.elapsed_time = Scalene.gettime() - Scalene.elapsed_time
        os.chdir(Scalene.original_path)

    @staticmethod
    @contextmanager
    def file_or_stdout(file_name):
        """Return a file handle for writing; if no argument is passed, return stdout."""
        # from https://stackoverflow.com/questions/9836370/fallback-to-stdout-if-no-file-name-provided
        if file_name is None:
            yield sys.stdout
        else:
            with open(file_name, 'w') as out_file:
                yield out_file

    @staticmethod
    def output_profiles():
        """Write the profile out (currently to stdout)."""
        # If we haven't collected any samples, there is nothing to output.
        if Scalene.total_cpu_samples == 0 and Scalene.total_memory_malloc_samples == 0 and Scalene.total_memory_free_samples == 0:
            return False
        # If we have at least one memory sample, then we are profiling memory.
        did_sample_memory = (Scalene.total_memory_free_samples + Scalene.total_memory_malloc_samples) > 1
        # Collect all instrumented filenames.
        all_instrumented_files = list(set(list(Scalene.cpu_samples_python.keys()) + list(Scalene.memory_free_samples.keys()) + list(Scalene.memory_malloc_samples.keys())))
        with Scalene.file_or_stdout(Scalene.output_file) as out:
            for fname in sorted(all_instrumented_files):
                this_cpu_samples = sum(Scalene.cpu_samples_c[fname].values()) + sum(Scalene.cpu_samples_python[fname].values())
                try:
                    percent_cpu_time = 100 * this_cpu_samples / Scalene.total_cpu_samples
                except ZeroDivisionError:
                    percent_cpu_time = 0
                # percent_cpu_time = 100 * this_cpu_samples * Scalene.mean_signal_interval / Scalene.elapsed_time
                print("%s: %% of CPU time = %6.2f%% out of %6.2fs." % (fname, percent_cpu_time, Scalene.elapsed_time), file=out)
                print(" \t | %9s | %9s | %s %s " % ('CPU %', 'CPU %', 'Avg memory |' if did_sample_memory else '', 'Memory |' if did_sample_memory else ''), file=out)
                print(" Line\t | %9s | %9s | %s%s [%s]" % ('(Python)', '(native)', 'growth (MB) |' if did_sample_memory else '', ' usage (%) |' if did_sample_memory else '', fname), file=out)
                print("-" * 80, file=out)
                with open(fname, 'r') as source_file:
                    for line_no, line in enumerate(source_file, 1):
                        line = line.rstrip()  # Strip newline
                        # Prepare output values.
                        n_cpu_samples_c = Scalene.cpu_samples_c[fname][line_no]
                        # Correct for negative CPU sample counts. This can happen
                        # because of floating point inaccuracies, since we perform
                        # subtraction to compute it.
                        if n_cpu_samples_c < 0:
                            n_cpu_samples_c = 0
                        n_cpu_samples_python = Scalene.cpu_samples_python[fname][line_no]
                        # Compute percentages of CPU time.
                        if Scalene.total_cpu_samples != 0:
                            n_cpu_percent_c = n_cpu_samples_c * 100 / Scalene.total_cpu_samples
                            n_cpu_percent_python = n_cpu_samples_python * 100 / Scalene.total_cpu_samples
                        else:
                            n_cpu_percent_c = 0
                            n_cpu_percent_python = 0
                        # Now, memory stats.
                        n_free_mb = (Scalene.memory_free_samples[fname][line_no] * Scalene.free_sampling_rate) / (1024 * 1024)
                        n_free_count = Scalene.memory_free_count[fname][line_no]
                        n_avg_free_mb = 0 if n_free_count == 0 else n_free_mb / n_free_count
                        n_malloc_mb = (Scalene.memory_malloc_samples[fname][line_no] * Scalene.malloc_sampling_rate) / (1024 * 1024)
                        n_malloc_count = Scalene.memory_malloc_count[fname][line_no]
                        n_avg_malloc_mb = 0 if n_malloc_count == 0 else n_malloc_mb / n_malloc_count
                        n_growth_mb = 0 if n_malloc_count == 0 and n_free_count == 0 else n_avg_malloc_mb - n_avg_free_mb
                        n_usage_mb = 0 if Scalene.total_memory_malloc_samples == 0 else Scalene.memory_malloc_samples[fname][line_no] / Scalene.total_memory_malloc_samples
                        # Finally, print results.
                        n_cpu_percent_c_str = "" if n_cpu_percent_c == 0 else '%6.2f%%' % n_cpu_percent_c
                        n_cpu_percent_python_str = "" if n_cpu_percent_python == 0 else '%6.2f%%' % n_cpu_percent_python
                        n_growth_mb_str = "" if (n_growth_mb == 0 and n_usage_mb == 0) else '%11.0f' % n_growth_mb
                        n_usage_mb_str = "" if n_usage_mb == 0 else '%9.2f%%' % (100 * n_usage_mb)
                        if did_sample_memory:
                            print("%6d\t | %9s | %9s | %11s | %11s | %s" % (line_no, n_cpu_percent_python_str, n_cpu_percent_c_str, n_growth_mb_str, n_usage_mb_str, line), file=out)
                        else:
                            print("%6d\t | %9s | %9s | %s" % (line_no, n_cpu_percent_python_str, n_cpu_percent_c_str, line), file=out)
                print("", file=out)
        return True

    @staticmethod
    def disable_signals():
        """Turn off the profiling signals."""
        try:
            signal.signal(Scalene.cpu_timer_signal, signal.SIG_IGN)
        except Exception:
            pass
        signal.signal(Scalene.malloc_signal, signal.SIG_IGN)
        signal.signal(Scalene.free_signal, signal.SIG_IGN)
        signal.setitimer(Scalene.cpu_timer_signal, 0)

    @staticmethod
    def exit_handler():
        """When we exit, disable all signals."""
        Scalene.disable_signals()

    @staticmethod
    def main():
        """Invoke the profiler from the command line."""
        usage = dedent("""Scalene: a high-precision CPU and memory profiler.
        https://github.com/emeryberger/Scalene

        for CPU profiling only:
        % python -m scalene yourprogram.py

        for CPU and memory profiling (Mac OS X):
        % DYLD_INSERT_LIBRARIES=$PWD/libscalene.dylib PYTHONMALLOC=malloc python -m scalene yourprogram.py

        for CPU and memory profiling (Linux):
        % LD_PRELOAD=$PWD/libscalene.so PYTHONMALLOC=malloc python -m scalene yourprogram.py
        """)
        parser = argparse.ArgumentParser(prog='scalene', description=usage, formatter_class=argparse.RawTextHelpFormatter)
        parser.add_argument('prog', type=str, help='program to be profiled')
        parser.add_argument('-o', '--outfile', type=str, default=None, help='file to hold profiler output (default: stdout)')
        parser.add_argument('--profile-interval', type=float, default=float("inf"), help='output profiles every so many seconds.')
        parser.add_argument('--wallclock', dest='wallclock', action='store_const', const=True, default=False, help='use wall clock time (default: virtual time)')
        # Parse out all Scalene arguments and jam the remaining ones into argv.
        # See https://stackoverflow.com/questions/35733262/is-there-any-way-to-instruct-argparse-python-2-7-to-remove-found-arguments-fro
        args, left = parser.parse_known_args()
        sys.argv = sys.argv[:1] + left
        Scalene.set_timer_signal(args.wallclock)
        Scalene.output_profile_interval = args.profile_interval
        Scalene.next_output_time = Scalene.gettime() + Scalene.output_profile_interval
        try:
            with open(args.prog, 'rb') as prog_being_profiled:
                Scalene.original_path = os.getcwd()
                # Read in the code and compile it.
                code = compile(prog_being_profiled.read(), args.prog, "exec")
                # Push the program's path.
                program_path = os.path.dirname(os.path.abspath(args.prog))
                sys.path.insert(0, program_path)
                Scalene.program_path = program_path
                # os.chdir(program_path)
                # Set the file being executed.
                the_globals['__file__'] = args.prog
                Scalene.output_file = args.outfile
                # Start the profiler.
                profiler = Scalene(os.path.join(program_path, os.path.basename(args.prog)))
                try:
                    profiler.start()
                    # Run the code being profiled.
                    exec(code, the_globals)
                    profiler.stop()
                    # Go back home.
                    # os.chdir(Scalene.original_path)
                    # If we've collected any samples, dump them.
                    if not profiler.output_profiles():
                        print("Scalene: Program did not run for long enough to profile.")
                except Exception as ex:
                    template = "Scalene: An exception of type {0} occurred. Arguments:\n{1!r}"
                    message = template.format(type(ex).__name__, ex.args)
                    print(message)
                    print(traceback.format_exc())
        except (FileNotFoundError, IOError):
            print("Scalene: could not find input file.")


Scalene.main()

scalene-0.7.5/scalene.egg-info/PKG-INFO

Metadata-Version: 2.1
Name: scalene
Version: 0.7.5
Summary: Scalene: A high-resolution, low-overhead CPU and memory profiler for Python
Home-page: https://github.com/emeryberger/scalene
Author: Emery Berger
Author-email: emery@cs.umass.edu
License: Apache License 2.0
Description: ![scalene](https://github.com/emeryberger/scalene/raw/master/docs/scalene-image.png)

# scalene: a high-performance CPU and memory profiler for Python

by [Emery Berger](https://emeryberger.com)

------------

# About Scalene

Scalene is a high-performance CPU *and* memory profiler for Python that does a few things that other Python profilers do not and cannot do. It runs orders of magnitude faster than other profilers while delivering far more detailed information.

1. Scalene is _fast_. It uses sampling instead of instrumentation or relying on Python's tracing facilities. Its overhead is typically no more than 10-20% (and often less).

1. Scalene is _precise_.
Unlike most other Python profilers, Scalene performs CPU profiling _at the line level_, pointing to the specific lines of code that are responsible for the execution time in your program. This level of detail can be much more useful than the function-level profiles returned by most profilers.

1. Scalene separates out time spent running in Python from time spent in native code (including libraries). Most Python programmers aren't going to optimize the performance of native code (which is usually either in the Python implementation or external libraries), so this helps developers focus their optimization efforts on the code they can actually improve.

1. Scalene _profiles memory usage_. In addition to tracking CPU usage, Scalene also points to the specific lines of code responsible for memory growth. It accomplishes this via an included specialized memory allocator.

## Installation

Scalene is distributed as a `pip` package and works on Linux and Mac OS X platforms. You can install it as follows:

```
% pip install scalene
```

or

```
% python -m pip install scalene
```

_NOTE_: Currently, installing Scalene in this way does not install its memory profiling library, so you will only be able to use it to perform CPU profiling. To take advantage of its memory profiling capability, you will need to download this repository.

**NEW**: You can now install the memory profiling part on Mac OS X using Homebrew.

```
% brew tap emeryberger/scalene
% brew install --head libscalene
```

This will install a `scalene` script you can use (see below).

# Usage

The following command will run Scalene to perform line-level CPU profiling only on a provided example program.

```
% python -m scalene test/testme.py
```

If you have installed the Scalene library with Homebrew, you can just invoke `scalene` to perform both line-level CPU and memory profiling:

```
% scalene test/testme.py
```

Otherwise, you first need to build the specialized memory allocator by running `make`:

```
% make
```

Profiling on a Mac OS X system (without using Homebrew):

```
% DYLD_INSERT_LIBRARIES=$PWD/libscalene.dylib PYTHONMALLOC=malloc python -m scalene test/testme.py
```

Profiling on a Linux system:

```
% LD_PRELOAD=$PWD/libscalene.so PYTHONMALLOC=malloc python -m scalene test/testme.py
```

# Comparison to Other Profilers

## Performance and Features

Below is a table comparing various profilers to scalene, running on an example Python program (`benchmarks/julia1_nopil.py`) from the book _High Performance Python_, by Gorelick and Ozsvald. All of these were run on a 2016 MacBook Pro.

| | Time (seconds) | Slowdown | Line-level? | CPU? | Separates Python from native? | Memory? | Unmodified code? |
| :--- | ---: | ---: | :---: | :---: | :---: | :---: | :---: |
| _original program_ | 6.71s | 1.0x | | | | | |
| | | | | | | | |
| `cProfile` | 11.04s | 1.65x | function-level | :heavy_check_mark: | | | :heavy_check_mark: |
| `Profile` | 202.26s | 30.14x | function-level | :heavy_check_mark: | | | :heavy_check_mark: |
| `pyinstrument` | 9.83s | 1.46x | function-level | :heavy_check_mark: | | | :heavy_check_mark: |
| `line_profiler` | 78.0s | 11.62x | :heavy_check_mark: | :heavy_check_mark: | | | needs `@profile` decorators |
| `pprofile` _(deterministic)_ | 403.67s | 60.16x | :heavy_check_mark: | :heavy_check_mark: | | | :heavy_check_mark: |
| `pprofile` _(statistical)_ | 7.47s | 1.11x | :heavy_check_mark: | :heavy_check_mark: | | | :heavy_check_mark: |
| `yappi` _(CPU)_ | 127.53s | 19.01x | function-level | :heavy_check_mark: | | | :heavy_check_mark: |
| `yappi` _(wallclock)_ | 21.45s | 3.2x | function-level | :heavy_check_mark: | | | :heavy_check_mark: |
| `memory_profiler` | _aborted after 2 hours_ | **>1000x** | line-level | | | :heavy_check_mark: | needs `@profile` decorators |
| | | | | | | | |
| `scalene` _(CPU only)_ | 6.98s | **1.04x** | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | | :heavy_check_mark: |
| `scalene` _(CPU + memory)_ | 7.68s | **1.14x** | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |

## Output

Scalene prints annotated source code for the program being profiled and any modules it uses in the same directory or subdirectories. Here is a snippet from `pystone.py`, just using CPU profiling:

```
benchmarks/pystone.py: % of CPU time =  98.78% out of   3.47s.
        |    CPU % |    CPU % |
  Line  | (Python) |      (C) | [benchmarks/pystone.py]
--------------------------------------------------------------------------------
[... lines omitted ...]
   137  |    0.87% |    0.13% | def Proc1(PtrParIn):
   138  |    1.46% |    0.36% |     PtrParIn.PtrComp = NextRecord = PtrGlb.copy()
   139  |          |          |     PtrParIn.IntComp = 5
   140  |    0.87% |    0.04% |     NextRecord.IntComp = PtrParIn.IntComp
   141  |    1.46% |    0.30% |     NextRecord.PtrComp = PtrParIn.PtrComp
   142  |    2.33% |    0.26% |     NextRecord.PtrComp = Proc3(NextRecord.PtrComp)
   143  |    1.46% |   -0.00% |     if NextRecord.Discr == Ident1:
   144  |    0.29% |    0.04% |         NextRecord.IntComp = 6
   145  |    1.75% |    0.40% |         NextRecord.EnumComp = Proc6(PtrParIn.EnumComp)
   146  |    1.75% |    0.29% |         NextRecord.PtrComp = PtrGlb.PtrComp
   147  |    0.58% |    0.12% |         NextRecord.IntComp = Proc7(NextRecord.IntComp, 10)
   148  |          |          |     else:
   149  |          |          |         PtrParIn = NextRecord.copy()
   150  |    0.87% |    0.15% |     NextRecord.PtrComp = None
   151  |          |          |     return PtrParIn
```

And here is an example with memory profiling enabled, running the Julia benchmark.

```
benchmarks/julia1_nopil.py: % of CPU time =  99.22% out of  12.06s.
        |    CPU % |    CPU % | Memory (MB) |
  Line  | (Python) |      (C) |             | [benchmarks/julia1_nopil.py]
--------------------------------------------------------------------------------
     1  |          |          |             | # Pasted from Chapter 2, High Performance Python - O'Reilly Media;
     2  |          |          |             | # minor modifications for Python 3 by Emery Berger
     3  |          |          |             |
     4  |          |          |             | """Julia set generator without optional PIL-based image drawing"""
     5  |          |          |             | import time
     6  |          |          |             | # area of complex space to investigate
     7  |          |          |             | x1, x2, y1, y2 = -1.8, 1.8, -1.8, 1.8
     8  |          |          |             | c_real, c_imag = -0.62772, -.42193
     9  |          |          |             |
    10  |          |          |             | #@profile
    11  |          |          |             | def calculate_z_serial_purepython(maxiter, zs, cs):
    12  |          |          |             |     """Calculate output list using Julia update rule"""
    13  |    0.08% |    0.02% |        0.06 |     output = [0] * len(zs)
    14  |    0.25% |    0.01% |        9.50 |     for i in range(len(zs)):
    15  |          |          |             |         n = 0
    16  |    1.34% |    0.05% |       -9.88 |         z = zs[i]
    17  |    0.50% |    0.01% |       -8.44 |         c = cs[i]
    18  |    1.25% |    0.04% |             |         while abs(z) < 2 and n < maxiter:
    19  |   68.67% |    2.27% |       42.50 |             z = z * z + c
    20  |   18.46% |    0.74% |      -33.62 |             n += 1
    21  |          |          |             |         output[i] = n
    22  |          |          |             |     return output
```

Positive memory numbers indicate total memory allocation in megabytes; negative memory numbers indicate memory reclamation. Note that because of the way Python's memory management works, frequent allocation and deallocation (as in lines 19-20 above) show up as high positive memory on one line followed by an (approximately) corresponding negative memory on the following line(s).

# Acknowledgements

Logo created by [Sophia Berger](https://www.linkedin.com/in/sophia-berger/).

Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Description-Content-Type: text/markdown

scalene-0.7.5/scalene.egg-info/SOURCES.txt

README.md
setup.py
scalene/__init__.py
scalene/__main__.py
scalene/scalene-both.py
scalene/scalene.py
scalene.egg-info/PKG-INFO
scalene.egg-info/SOURCES.txt
scalene.egg-info/dependency_links.txt
scalene.egg-info/top_level.txt
test/testme-memory_profiler.py
test/testme.py
test/testme1.py

scalene-0.7.5/scalene.egg-info/dependency_links.txt

scalene-0.7.5/scalene.egg-info/top_level.txt

scalene

scalene-0.7.5/setup.cfg

[egg_info]
tag_build = 
tag_date = 0

scalene-0.7.5/setup.py

from setuptools import setup, find_packages
from os import path

this_directory = path.abspath(path.dirname(__file__))
with open(path.join(this_directory, 'README.md'), encoding='utf-8') as f:
    long_description = f.read()

setup(name='scalene',
      version='0.7.5',
      description='Scalene: A high-resolution, low-overhead CPU and memory profiler for Python',
      long_description=long_description,
      long_description_content_type='text/markdown',
      url='https://github.com/emeryberger/scalene',
      author='Emery Berger',
      author_email='emery@cs.umass.edu',
      license='Apache License 2.0',
      classifiers=[
          "Programming Language :: Python :: 3",
          "License :: OSI Approved :: Apache Software License",
          "Operating System :: OS Independent",
      ],
      packages=find_packages())

scalene-0.7.5/test/testme-memory_profiler.py

import os
import numpy as np
# import math
from numpy import linalg as LA

arr = [i for i in range(1, 50000)]

@profile
def doit1(x):
    # x = [i*i for i in range(1,1000)][0]
    y = 1
    # w, v = LA.eig(np.diag(arr))  # (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)))
    x = [i*i for i in range(0, 100000)][99999]
    y1 = [i*i for i in range(0, 200000)][199999]
    z1 = [i for i in range(0, 300000)][299999]
    z = x * y
    # z = np.multiply(x, y)
    return z

@profile
def doit2(x):
    i = 0
    # zarr = [math.cos(13) for i in range(1,100000)]
    # z = zarr[0]
    z = 0.1
    while i < 100000:
        # z = math.cos(13)
        # z = np.multiply(x,x)
        # z = np.multiply(z,z)
        # z = np.multiply(z,z)
        z = z * z
        z = x * x
        z = z * z
        z = z * z
        i += 1
    return z

@profile
def doit3(x):
    z = x + 1
    z = x + 1
    z = x + 1
    z = x + z
    z = x + z
    # z = np.cos(x)
    return z

def stuff():
    y = np.random.randint(1, 100, size=5000000)[4999999]
    x = 1.01
    for i in range(1, 2):  # 10):
        for j in range(1, 2):  # 10):
            x = doit1(x)
            x = doit2(x)
            x = doit3(x)
            x = 1.01
    return x

stuff()

scalene-0.7.5/test/testme.py

import os
import numpy as np
# import math
from numpy import linalg as LA

arr = [i for i in range(1, 1000)]

def doit1(x):
    # x = [i*i for i in range(1,1000)][0]
    y = 1
    # w, v = LA.eig(np.diag(arr))  # (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)))
    x = [i*i for i in range(0, 100000)][99999]
    y1 = [i*i for i in range(0, 200000)][199999]
    z1 = [i for i in range(0, 300000)][299999]
    z = x * y
    # z = np.multiply(x, y)
    return z

def doit2(x):
    i = 0
    # zarr = [math.cos(13) for i in range(1,100000)]
    # z = zarr[0]
    z = 0.1
    while i < 100000:
        # z = math.cos(13)
        # z = np.multiply(x,x)
        # z = np.multiply(z,z)
        # z = np.multiply(z,z)
        z = z * z
        z = x * x
        z = z * z
        z = z * z
        i += 1
    return z

def doit3(x):
    z = x + 1
    z = x + 1
    z = x + 1
    z = x + z
    z = x + z
    # z = np.cos(x)
    return z

def stuff():
    y = np.random.randint(1, 100, size=5000000)[4999999]
    x = 1.01
    for i in range(1, 10):
        for j in range(1, 10):
            x = doit1(x)
            x = doit2(x)
            x = doit3(x)
            x = 1.01
    return x

stuff()

scalene-0.7.5/test/testme1.py

# import os
# import numpy as np
# import math
import sys
# from numpy import linalg as LA

arr = list([i for i in range(1, 50000)])
print("arr size = " + str(sys.getsizeof(arr)))

def doit1(x):
    # x = [i*i for i in range(1,1000)][0]
    y = 1
    for i in range(1, 1000):
        arr = list([i for i in range(1, 50000)])
        del arr
        arr = []
    # w, v = LA.eig(np.diag(arr))  # (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)))
    l = list([i*i for i in range(0, 100000)])
    print(sys.getsizeof(l))
    x = [i*i for i in range(0, 100000)][99999]
    y1 = [i*i for i in range(0, 200000)][199999]
    z1 = [i for i in range(0, 300000)][299999]
    z = x * y
    # z = np.multiply(x, y)
    return z

def doit2(x):
    i = 0
    # zarr = [math.cos(13) for i in range(1,100000)]
    # z = zarr[0]
    z = 0.1
    while i < 100000:
        # z = math.cos(13)
        # z = np.multiply(x,x)
        # z = np.multiply(z,z)
        # z = np.multiply(z,z)
        z = z * z
        z = x * x
        z = z * z
        z = z * z
        i += 1
    return z

def doit3(x):
    z = x + 1
    z = x + 1
    z = x + 1
    z = x + z
    z = x + z
    # z = np.cos(x)
    return z

def stuff():
    # y = np.random.randint(1, 100, size=5000000)[4999999]
    x = 1.01
    for i in range(1, 2):  # 10):
        for j in range(1, 2):  # 10):
            x = doit1(x)
            x = doit2(x)
            x = doit3(x)
            x = 1.01
    return x

if __name__ == "__main__":
    import sys
    print(sys.argv)
    stuff()