runtime/trace: execution tracer overhaul

# Execution tracer overhaul

Authored by mknyszek@google.com with a mountain of input from others.

In no particular order, thank you to Felix Geisendorfer, Nick Ripley, Michael Pratt, Austin Clements, Rhys Hiltner, thepudds, Dominik Honnef, and Bryan Boreham for your invaluable feedback.

## Background

[Original design document](https://docs.google.com/document/d/1FP5apqzBgr7ahCCgFO-yoVhk4YZrNIDNf9RybngBc14/pub).

Go execution traces provide a moment-to-moment view of what happens in a Go program over some duration. This information is invaluable in understanding program behavior over time and can be leveraged to achieve significant performance improvements. Because Go has a runtime, it can provide deep information about program execution without any external dependencies, making traces particularly attractive for large deployments.
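For readers unfamiliar with how a trace is collected, the following is a minimal sketch using the existing `runtime/trace` package. The helper `collectTrace` and the workload `busyWork` are hypothetical names introduced only for illustration; they are not part of the proposal.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/trace"
	"sync"
)

// collectTrace (hypothetical helper) runs work with the execution tracer
// enabled and returns the raw trace bytes.
func collectTrace(work func()) ([]byte, error) {
	var buf bytes.Buffer
	// trace.Start fails if tracing is already enabled.
	if err := trace.Start(&buf); err != nil {
		return nil, err
	}
	work()
	// trace.Stop returns only after all trace data has been written to buf.
	trace.Stop()
	return buf.Bytes(), nil
}

// busyWork (hypothetical workload) spawns a few goroutines so that the
// trace contains some scheduling activity.
func busyWork() {
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sum := 0
			for j := 0; j < 1_000_000; j++ {
				sum += j
			}
			_ = sum
		}()
	}
	wg.Wait()
}

func main() {
	data, err := collectTrace(busyWork)
	if err != nil {
		fmt.Println("trace failed:", err)
		return
	}
	fmt.Printf("collected %d bytes of trace data\n", len(data))
}
```

Written to a file instead of a buffer, the same output can be inspected with `go tool trace`.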

Unfortunately, limitations in the trace implementation prevent widespread use.

For example, the process of analyzing execution traces scales poorly with the size of the trace. Traces need to be parsed in their entirety to do anything useful with them, making them impossible to stream. As a result, trace parsing and validation have very high memory requirements for large traces.

Also, Go execution traces are designed to be internally consistent, but don't provide any way to align with other kinds of traces, for example OpenTelemetry traces and Linux sched traces. Alignment with higher-level tracing mechanisms is critical to connecting business-level tasks with resource costs. Meanwhile, alignment with lower-level traces enables a fully vertical view of application performance to root out the most difficult and subtle issues.

Lastly, the implementation of the execution tracer has evolved organically over time and it shows. The codebase also has many old warts and some [age-old](https://github.com/golang/go/issues/16755) bugs that make collecting traces difficult and make tracing seem broken. Furthermore, many significant decisions were made over the years but weren't thoroughly documented; those decisions largely exist solely in old commit messages and breadcrumbs left in comments within the codebase itself.

Thanks to work in the Go 1.21 cycle, the execution tracer's run-time overhead was reduced from about -10% throughput and +10% request latency in web services to about 1% in both for most applications. This reduced overhead, in conjunction with making traces more scalable, enables some exciting and powerful new opportunities for traces.

## Goals

The goal of this document is to define an alternative implementation for Go execution traces that scales up to large Go deployments.

Specifically, the design presented aims to achieve:

- Trace parsing that requires a small fraction of the memory it requires today.
- Streamable traces, to enable analysis without storage.
- Partially self-describing traces, to reduce the upgrade burden on trace consumers.
- Fixes for age-old bugs and a path to cleaning up the implementation.

This document also describes the existing state of the tracer in detail and explains how we got there.

## Design

[Link to design document.](https://github.com/golang/proposal/blob/master/design/60773-execution-tracer-overhaul.md)

CC @felixge @nsrip-dd @prattmic @aclements @rhysh @dominikh @bboreham @thepudds 
