profiling — Python profilers

Added in version 3.15.

Source code: Lib/profiling/


Introduction to profiling

A profile is a set of statistics that describes how often and for how long various parts of a program execute. These statistics help identify performance bottlenecks and guide optimization efforts. Python provides two fundamentally different approaches to collecting this information: statistical sampling and deterministic tracing.

The profiling package organizes Python’s built-in profiling tools under a single namespace. It contains two submodules, each implementing a different profiling methodology:

profiling.sampling

A statistical profiler that periodically samples the call stack. It can run scripts directly or attach to running processes by PID. It provides multiple output formats (flamegraphs, heatmaps, Firefox Profiler), GIL analysis, GC tracking, and multiple profiling modes (wall-clock, CPU, GIL), all with virtually no overhead.

profiling.tracing

A deterministic profiler that traces every function call, return, and exception event. Provides exact call counts and precise timing information, capturing every invocation including very fast functions.

Note

The profiler modules are designed to provide an execution profile for a given program, not for benchmarking purposes. For benchmarking, use the timeit module, which provides reasonably accurate timing measurements. This distinction is particularly important when comparing Python code against C code: deterministic profilers introduce overhead for Python code but not for C-level functions, which can skew comparisons.

Choosing a profiler

For most performance analysis, use the statistical profiler (profiling.sampling). It has minimal overhead, works for both development and production, and provides rich visualization and analysis options, including flamegraphs, heatmaps, and GIL analysis.

Use the deterministic profiler (profiling.tracing) when you need exact call counts and cannot afford to miss any function calls. Since it instruments every function call and return, it will capture even very fast functions that complete between sampling intervals. The tradeoff is higher overhead.
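The tradeoff can be illustrated with a toy in-process sampler. This is only a sketch of the sampling idea, not how profiling.sampling is implemented; the helper names (run_with_sampler, busy_wait, quick) are hypothetical, and the code relies on sys._current_frames(), which is specific to CPython.

```python
import sys
import threading
import time
from collections import Counter


def _sample_loop(thread_id, interval, stop_event, counts):
    """Periodically record which function the target thread is executing."""
    while not stop_event.is_set():
        frame = sys._current_frames().get(thread_id)
        if frame is not None:
            counts[frame.f_code.co_name] += 1
        time.sleep(interval)


def run_with_sampler(func, *args, interval=0.001):
    """Call func(*args) while a background thread samples the calling thread."""
    counts = Counter()
    stop_event = threading.Event()
    sampler = threading.Thread(
        target=_sample_loop,
        args=(threading.get_ident(), interval, stop_event, counts),
        daemon=True,
    )
    sampler.start()
    try:
        func(*args)
    finally:
        stop_event.set()
        sampler.join()
    return counts


def busy_wait(seconds):
    """Spin for roughly the given duration so that samples can land here."""
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        pass


def quick():
    """Finishes far faster than one sampling interval."""
    return 1 + 1


def workload():
    quick()         # one very short call: a sampler will usually miss it
    busy_wait(0.2)  # long-running: expected to dominate the samples


samples = run_with_sampler(workload)
print(samples.most_common())
```

The sampler only sees whatever happens to be running when it wakes up, so busy_wait is reported many times while quick typically does not appear at all. A deterministic profiler would record exactly one call to each.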

The following table summarizes the key differences:

Feature          Statistical sampling (profiling.sampling)       Deterministic (profiling.tracing)
---------------  ----------------------------------------------  ---------------------------------
Overhead         Virtually none                                  Moderate
Accuracy         Statistical estimate                            Exact call counts
Output formats   pstats, flamegraph, heatmap, gecko, collapsed   pstats
Profiling modes  Wall-clock, CPU, GIL                            Wall-clock
Special frames   GC, native (C extensions)                       N/A
Attach to PID    Yes                                             No

When to use statistical sampling

The statistical profiler (profiling.sampling) is recommended for most performance analysis tasks. Like profiling.tracing, it is run from the command line with python -m; the run subcommand profiles a script:

python -m profiling.sampling run script.py

One of the main strengths of the sampling profiler is its variety of output formats. Beyond traditional pstats tables, it can generate interactive flamegraphs that visualize call hierarchies, line-level source heatmaps that show exactly where time is spent in your code, and Firefox Profiler output for timeline-based analysis.

The profiler also provides insight into Python interpreter behavior that deterministic profiling cannot capture. Use --mode gil to identify GIL contention in multi-threaded code, --mode cpu to measure actual CPU time excluding I/O waits, or inspect <GC> frames to understand garbage collection overhead. The --native option reveals time spent in C extensions, helping distinguish Python overhead from library performance.

For multi-threaded applications, the -a option samples all threads simultaneously, showing how work is distributed. For production debugging, the attach command connects to any running Python process by PID without requiring a restart or code changes.

When to use deterministic tracing

The deterministic profiler (profiling.tracing) instruments every function call and return. This approach has higher overhead than sampling, but guarantees complete coverage of program execution.

The primary reason to choose deterministic tracing is the need for exact call counts. Statistical profiling estimates frequency based on sampling, which may undercount short-lived functions that complete between samples. If you need to verify that an optimization actually reduced the number of function calls, or if you want to trace the complete call graph to understand caller-callee relationships, deterministic tracing is the right choice.

Deterministic tracing also excels at capturing functions that execute in microseconds. Such functions may not appear frequently enough in statistical samples, but deterministic tracing records every invocation regardless of duration.
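As a minimal sketch of verifying that an optimization reduced the number of calls, the following compares a naive and a memoized Fibonacci function. The functions and the count_calls helper are hypothetical examples. The code assumes profiling.tracing exposes the same Profile class as cProfile (the text describes cProfile as an alias) and falls back to cProfile on earlier Python versions.

```python
import pstats

try:
    import profiling.tracing as tracing  # Python 3.15 and later
except ImportError:
    import cProfile as tracing  # fallback for earlier Python versions


def fib_naive(n):
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)


def fib_memo(n, cache):
    if n < 2:
        return n
    if n not in cache:
        cache[n] = fib_memo(n - 1, cache) + fib_memo(n - 2, cache)
    return cache[n]


def count_calls(func, *args):
    """Return the exact number of calls to func made while running func(*args)."""
    profiler = tracing.Profile()
    profiler.runcall(func, *args)
    profile = pstats.Stats(profiler).get_stats_profile()
    # For recursive functions ncalls is a "total/primitive" string.
    ncalls = profile.func_profiles[func.__name__].ncalls
    return int(ncalls.split("/")[0])


naive_calls = count_calls(fib_naive, 10)
memo_calls = count_calls(fib_memo, 10, {})
print(f"fib_naive: {naive_calls} calls, fib_memo: {memo_calls} calls")
```

Because every call is recorded, the two counts can be compared directly; a sampling profiler could only suggest that the memoized version spends less time in the function.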

Quick start

This section provides the minimal steps needed to start profiling. For complete documentation, see the dedicated pages for each profiler.

Statistical profiling

To profile a script, use the profiling.sampling module with the run command:

python -m profiling.sampling run script.py
python -m profiling.sampling run -m mypackage.module

This runs the script under the profiler and prints a summary of where time was spent. For an interactive flamegraph:

python -m profiling.sampling run --flamegraph script.py

To profile an already-running process, use the attach command with the process ID:

python -m profiling.sampling attach 1234

For custom settings, specify the sampling interval (in microseconds) and duration (in seconds):

python -m profiling.sampling run -i 50 -d 30 script.py
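When the profiler has to be launched from a Python program (for example, from a test harness), the same command line can be assembled and run with subprocess. This is a sketch only: the helper names are hypothetical, it uses only the run subcommand and the -i, -d, and --flamegraph options shown above, and it assumes those options can be combined in a single invocation.

```python
import importlib.util
import subprocess
import sys


def sampling_profiler_available():
    """Return True if this interpreter provides profiling.sampling."""
    try:
        return importlib.util.find_spec("profiling.sampling") is not None
    except ImportError:
        # Raised when the parent "profiling" package itself is missing.
        return False


def build_run_command(script, interval_us=None, duration_s=None, flamegraph=False):
    """Assemble the argument list for 'python -m profiling.sampling run'."""
    command = [sys.executable, "-m", "profiling.sampling", "run"]
    if interval_us is not None:
        command += ["-i", str(interval_us)]  # sampling interval, microseconds
    if duration_s is not None:
        command += ["-d", str(duration_s)]   # profiling duration, seconds
    if flamegraph:
        command.append("--flamegraph")
    command.append(script)
    return command


def profile_script(script, **options):
    """Run script under the sampling profiler, if it is available."""
    if not sampling_profiler_available():
        raise RuntimeError("profiling.sampling is not available in this Python")
    return subprocess.run(build_run_command(script, **options), check=True)


print(build_run_command("script.py", interval_us=50, duration_s=30))
```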

Deterministic profiling

To profile a script from the command line:

python -m profiling.tracing script.py

To profile a piece of code programmatically:

import profiling.tracing
profiling.tracing.run('my_function()')

This executes the given code under the profiler and prints a summary showing exact function call counts and timing.
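In cProfile, run() executes the string in the namespace of the __main__ module, so my_function must be defined there. When that is inconvenient, a Profile object can wrap the code directly. The sketch below assumes profiling.tracing provides the same Profile class as cProfile and falls back to cProfile on earlier Python versions; my_function is a placeholder workload.

```python
import io
import pstats

try:
    import profiling.tracing as tracing  # Python 3.15 and later
except ImportError:
    import cProfile as tracing  # fallback for earlier Python versions


def my_function():
    return sum(i * i for i in range(10_000))


# Profile only the code inside the with block.
with tracing.Profile() as profiler:
    my_function()

# Format the five most expensive entries by cumulative time into a string.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.strip_dirs().sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

Writing the report to a StringIO buffer instead of standard output makes it easy to log or post-process.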

Understanding profile output

Both profilers collect function-level statistics, though they present them in different formats. The sampling profiler offers multiple visualizations (flamegraphs, heatmaps, Firefox Profiler, pstats tables), while the deterministic profiler produces pstats-compatible output. Regardless of format, the underlying concepts are the same.

Key profiling concepts:

Direct time (also called self time or tottime)

Time spent executing code in the function itself, excluding time spent in functions it called. High direct time indicates the function contains expensive operations.

Cumulative time (also called total time or cumtime)

Time spent in the function and all functions it called. This measures the total cost of calling a function, including its entire call subtree.

Call count (also called ncalls or samples)

How many times the function was called (deterministic) or sampled (statistical). In deterministic profiling, this is exact. In statistical profiling, it represents the number of times the function appeared in a stack sample.

Primitive calls

Calls that are not induced by recursion. When a function recurses, the total call count includes the recursive invocations, while the primitive call count includes only the initial entry. Displayed as total/primitive (for example, 3/1 means three total calls, one primitive).

Caller/Callee relationships

Which functions called a given function (callers) and which functions it called (callees). Flamegraphs visualize this as nested rectangles; pstats can display it via the print_callers() and print_callees() methods.
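The concepts above can be observed on a small, hypothetical workload. This sketch uses pstats.Stats.get_stats_profile() to read the numbers programmatically and print_callers() to show caller relationships; it assumes profiling.tracing provides the same Profile class as cProfile and falls back to cProfile on earlier Python versions.

```python
import io
import pstats

try:
    import profiling.tracing as tracing  # Python 3.15 and later
except ImportError:
    import cProfile as tracing  # fallback for earlier Python versions


def leaf():
    total = 0
    for i in range(200_000):  # the expensive work happens here
        total += i
    return total


def parent():
    return leaf() + leaf()  # cheap itself, expensive through its callees


def countdown(n):
    if n > 0:
        countdown(n - 1)  # recursion: total calls exceed primitive calls


def workload():
    parent()
    countdown(2)


profiler = tracing.Profile()
profiler.runcall(workload)

buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
profile = stats.get_stats_profile()
for name in ("parent", "leaf", "countdown"):
    fp = profile.func_profiles[name]
    print(f"{name:10} ncalls={fp.ncalls:5} "
          f"tottime={fp.tottime:.3f} cumtime={fp.cumtime:.3f}")

# Caller relationships for functions whose name matches "leaf".
stats.print_callers("leaf")
callers_report = buffer.getvalue()
print(callers_report)
```

Here parent should show a small direct time but a cumulative time at least as large as that of leaf, and countdown is reported in the total/primitive form because two of its three calls are recursive.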

Legacy compatibility

For backward compatibility, the cProfile module remains available as an alias to profiling.tracing. Existing code using import cProfile will continue to work without modification in all future Python versions.

Deprecated since version 3.15: The pure Python profile module is deprecated and will be removed in Python 3.17. Use profiling.tracing (or its alias cProfile) instead. See profile for migration guidance.

See also

profiling.sampling

Statistical sampling profiler with flamegraphs, heatmaps, and GIL analysis. Recommended for most users.

profiling.tracing

Deterministic tracing profiler for exact call counts.

pstats

Statistics analysis and formatting for profile data.

timeit

Module for measuring execution time of small code snippets.
