runtime/trace: flight recording #63185
I agree with all of the advantages of having flight recording in the runtime. As for triggers:

I think that program logic will be the far more useful trigger for the majority of use cases, and certainly easier to make use of. On top of that, this flight recorder will be much more straightforward to integrate with existing telemetry solutions, compared to external solutions, which will either not be designed for that (Gotraceui) or will have to be written first.
Just to say the most frequent "I wish I knew what happened just before" for me is OOM, which isn't (AFAIK) something you can trigger on once the kernel makes its decision. Maybe we could trigger on hitting GOMEMLIMIT?
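Not something proposed in the thread, but as an illustration of a best-effort version of that idea: a sketch that polls runtime/metrics and snapshots once memory usage approaches the limit reported by debug.SetMemoryLimit(-1). The metric name, threshold, and polling interval are arbitrary choices for the example.

```go
package main

import (
	"fmt"
	"math"
	"os"
	"runtime/debug"
	"runtime/metrics"
	"time"
)

// watchMemoryLimit calls snapshot once the runtime's total mapped memory
// exceeds 90% of GOMEMLIMIT. It does nothing if no limit is configured.
func watchMemoryLimit(snapshot func()) {
	limit := debug.SetMemoryLimit(-1) // a negative input only reads the current limit
	if limit <= 0 || limit == math.MaxInt64 {
		return // no effective limit set
	}
	sample := []metrics.Sample{{Name: "/memory/classes/total:bytes"}}
	for range time.Tick(time.Second) {
		metrics.Read(sample)
		if sample[0].Value.Uint64() > uint64(limit)/10*9 {
			snapshot()
			return // a real trigger would rate-limit instead of stopping
		}
	}
}

func main() {
	go watchMemoryLimit(func() {
		fmt.Fprintln(os.Stderr, "approaching GOMEMLIMIT: would call FlightRecorder.WriteTo here")
	})
	select {} // stand-in for the real program
}
```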
Nice! How can we get CPU profile samples to show up in the flight-recorder data stream? The existing APIs for execution traces and CPU profiles have a start time and an end time, so those two pair well together. For this it seems that we'd need a way to ask for The panics in
You've said that the data from
It turns out that's quite difficult to do, because Linux provides no opportunity to dump any information when an OOM occurs; the OOM killer simply SIGKILLs processes.

With that being said, there are still some best-effort things we might be able to do. Programs that don't use

Either way though, it's still best-effort. Unless the snapshotting fully stops the world, the program may continue executing and OOM before the trace actually gets fully dumped. (Even then, it's still possible (but less likely) that it OOMs before the runtime successfully stops every running goroutine.)

Once upon a time there was a patch proposed to Linux to allow for halting a process when it hit container limits, so that another process in a tiny memory reservation could inspect it. One could imagine that if this other process created a core dump of the halted process, we could write some code to extract any active trace buffers from the core dump into a trace.

I'm not sure where this leaves us. Perhaps it suggests that
If you happen to have CPU profiling running, it'll just work, but you make a good point that there's no good way to have it included all the time (i.e. no intention of producing a CPU profile). It seems to me like that should maybe be another option, either on the
Hm, that's a good point. I'll update the API to return an error on
There's currently no list of all the Ms per generation because we don't have M events, but yes, everything within a partition is namespaced by that partition's "generation number." The proposed trace parsing API exposes partition changes as a Sync event.

FWIW, the trace parsing API already does this exact kind of "stitching." Every partition is an entirely self-contained trace, which means all goroutines (and Ps) and their statuses get named in every partition. The trace parser uses this information to validate the stitching: new partitions' goroutine statuses need to match where that goroutine left off in the previous partition.
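For illustration (not code from the thread), here is a minimal sketch of consuming those per-generation boundaries, assuming the golang.org/x/exp/trace reader API (NewReader, ReadEvent, and a Sync event kind); check the package docs for the exact names.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"os"

	"golang.org/x/exp/trace"
)

func main() {
	f, err := os.Open("flight.trace")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	r, err := trace.NewReader(f)
	if err != nil {
		log.Fatal(err)
	}

	// Sync events mark the boundaries between self-contained partitions
	// (generations); count the events observed in each one.
	gen, events := 0, 0
	for {
		ev, err := r.ReadEvent()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		if ev.Kind() == trace.EventSync {
			fmt.Printf("generation %d: %d events\n", gen, events)
			gen, events = gen+1, 0
			continue
		}
		events++
	}
	fmt.Printf("generation %d: %d events\n", gen, events)
}
```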
This proposal has been added to the active column of the proposals project
I think this is waiting on an implementation and experience from using that implementation.
That's correct. I plan to have one in golang.org/x/exp/trace soon.
I made a mistake in the commit message so gopherbot couldn't connect the CL to this issue, but the experimental API has now landed via https://go.dev/cl/550257. It is available in golang.org/x/exp/trace for Go 1.22 only. There are a few caveats with this implementation that will not be true with a real runtime implementation.
I don't think any of these are significant enough to detract from the usefulness of the experiment, but I wanted to bring it up in case one of these does become an issue. We can also explore ways to improve the experiment to make it more representative, if one of them is indeed a problem. Please give it a try!
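As a rough illustration of what trying it out might look like (not code from the thread): a sketch that dumps the recorder's window to a file when the process receives SIGUSR1, assuming the experimental package exposes a no-argument constructor plus Start, Stop, and WriteTo; check the golang.org/x/exp/trace docs for the exact signatures.

```go
package main

import (
	"log"
	"os"
	"os/signal"
	"syscall"

	"golang.org/x/exp/trace"
)

func main() {
	fr := trace.NewFlightRecorder()
	if err := fr.Start(); err != nil {
		log.Fatal("flight recorder: ", err)
	}
	defer fr.Stop()

	sig := make(chan os.Signal, 1)
	signal.Notify(sig, syscall.SIGUSR1)

	// On each SIGUSR1, snapshot the last few seconds of execution to a file.
	for range sig {
		f, err := os.Create("flightrecord.trace")
		if err != nil {
			log.Print("create: ", err)
			continue
		}
		if _, err := fr.WriteTo(f); err != nil {
			log.Print("snapshot: ", err)
		}
		f.Close()
	}
}
```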
One limitation I encountered is that the flight recorder cannot be active at the same time as another trace consumer (e.g. trace.Start).

As for preserving traces after OOM, one possible approach could be to use a separate agent process with a shared memory buffer to write the traces into. That agent could then detect that the main process crashed and write the contents of the memory buffer into a file or other storage. However, this would require keeping all the flight recorder data in a single pre-allocated buffer.
Thanks for trying it out!
Yeah, that's a limitation of the experiment which lives out-of-tree for convenience, but it technically shouldn't be necessary if it's implemented in the runtime. I'd hoped to allow one of each type of consumer (one active flight recorder, one active trace.Start).
@mknyszek, we've discussed allowing the caller to control the set of events that the trace includes. That would enable runtime maintainers to use the execution trace to grant access to events that are niche and expensive, or in the other direction would enable low-overhead access to a subset of events. Maybe this is how we build/improve delta profiles for #67942, and how we give access to goroutine tags/labels in monotonic profiles (heap, mutex, etc) for #23458. If that's the direction we go, I think we'll want either a/ multiple
The only way I can imagine this working efficiently is via something like https://go.dev/cl/594595. TL;DR: We have some light infrastructure for emitting "experimental" events now, and this CL formalizes that a bit better by allowing us to split the event stream. It's at least a step toward that.

If we have separate buffer streams, you could imagine a new registration consists of an

It's certainly not off the table, but I think we're a little ways off still.
I'd like to point out that in Linux it's now possible to be actively informed by the kernel through an epoll event if a control group is under memory pressure through the PSI subsystem. https://systemd.io/MEMORY_PRESSURE/ This could be used as a signal to dump a trace and/or to trigger GC proactively.
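For illustration (not from the thread), a sketch of wiring that up on Linux: register a PSI trigger on the cgroup's memory.pressure file and poll for POLLPRI. The cgroup path, stall threshold, and window below are assumptions for the example.

```go
package main

import (
	"fmt"
	"log"
	"os"

	"golang.org/x/sys/unix"
)

// watchMemoryPressure registers a PSI trigger on the cgroup and calls dump
// each time the kernel reports sustained memory stalls.
func watchMemoryPressure(dump func()) error {
	// Assumes a cgroup v2 hierarchy mounted at the usual location.
	f, err := os.OpenFile("/sys/fs/cgroup/memory.pressure", os.O_RDWR, 0)
	if err != nil {
		return err
	}
	defer f.Close()

	// Fire when tasks stall on memory for >=150ms within a 1s window.
	if _, err := f.WriteString("some 150000 1000000"); err != nil {
		return err
	}

	fds := []unix.PollFd{{Fd: int32(f.Fd()), Events: unix.POLLPRI}}
	for {
		if _, err := unix.Poll(fds, -1); err != nil {
			if err == unix.EINTR {
				continue
			}
			return err
		}
		if fds[0].Revents&unix.POLLERR != 0 {
			return fmt.Errorf("memory.pressure trigger no longer available")
		}
		if fds[0].Revents&unix.POLLPRI != 0 {
			dump() // e.g. call FlightRecorder.WriteTo here
		}
	}
}

func main() {
	if err := watchMemoryPressure(func() {
		log.Print("memory pressure: would snapshot the flight recorder here")
	}); err != nil {
		log.Fatal(err)
	}
}
```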
This has been "waiting for implementation experience". Do we have enough experience to think about whether to add this to Go 1.24?
We've had some experience with flight recording inside Google, and we've also gotten some good feedback from @martin-sucha. I think a few things are clear from our experiences so far:
Much of this can be refined and improved once we have an implementation that lives fully inside the runtime. Lastly, from my point of view, I don't know what other feedback we would be waiting on at this point. CC @cagedmantis
Yes, it cuts off an on-going generation immediately so you have up-to-date data. I don't think that's necessary to document, really, since I think that's the most intuitive and useful behavior it could have. Though I guess we could say something like "best-effort up-to-date data as of when WriteTo returns" so nobody has to second-guess it.
Calling multiple

Multiple
@mknyszek, would you mind posting the latest version of the full proposed API as a self-contained comment? Thanks!
This commit adds the ability to capture execution traces from the past few seconds of execution when something seems wrong. Often when a timer fires and we detect something is wrong, the relevant information is already lost. The new flight recorder in go golang/go#63185 creates a ring buffer that enables capturing these traces. This commit adds the capability to capture traces but doesn't enable it anywhere. There is a small performance cost of having the flight recorder always enabled, so some performance testing is required to determine if we need to protect this behind a cluster setting. Epic: none Release note: None
Bump 😊
package trace
// FlightRecorder represents a single consumer of a Go execution
// trace.
// It tracks a moving window over the execution trace produced by
// the runtime, always containing the most recent trace data.
//
// At most one flight recorder may be active at any given time,
// though flight recording is allowed to be concurrently active
// with a trace consumer using trace.Start.
// This restriction of only a single flight recorder may be removed
// in the future.
type FlightRecorder struct {
...
}
// NewFlightRecorder creates a new flight recorder from the provided configuration.
func NewFlightRecorder(cfg FlightRecorderConfig) *FlightRecorder
// Start activates the flight recorder and begins recording trace data.
// Currently, only one flight recorder and one call to trace.Start may be
// active simultaneously.
// This restriction may be lifted in the future.
// Returns an error if Start is called multiple times on the same
// FlightRecorder, or if Start would cause there to be more
// simultaneously active trace consumers than is currently supported.
func (*FlightRecorder) Start() error
// Stop ends recording of trace data. It blocks until any concurrent WriteTo calls complete.
func (*FlightRecorder) Stop()
// Enabled returns true if the flight recorder is active.
// Specifically, it will return true if Start did not return an error, and Stop has not yet been called.
// It is safe to call from multiple goroutines simultaneously.
func (*FlightRecorder) Enabled() bool
// WriteTo snapshots the moving window tracked by the flight recorder.
// The snapshot is expected to contain data that is up-to-date as of when WriteTo is called,
// though this is not a hard guarantee.
// Only one goroutine may execute WriteTo at a time.
// An error is returned upon failure to write to w, if another WriteTo call is already in-progress,
// or if the flight recorder is inactive.
func (*FlightRecorder) WriteTo(w io.Writer) (n int64, err error)
type FlightRecorderConfig struct {
// MinAge is a lower bound on the age of an event in the flight recorder's window.
//
// The flight recorder will strive to promptly discard events older than the minimum age,
// but older events may appear in the window snapshot. The age setting will always be
// overridden by MaxBytes.
//
// If this is 0, the minimum age is implementation defined, but can be assumed to be on the order
// of seconds.
MinAge time.Duration
// MaxBytes is an upper bound on the size of the window in bytes.
//
// This setting takes precedence over MinAge.
// However, it does not make any guarantees on the size of the data WriteTo will write,
// nor does it guarantee memory overheads will always stay below MaxBytes. Treat it
// as a hint.
//
// If this is 0, the maximum size is implementation defined.
MaxBytes uint64
}
Looks good, thanks! Just a nitpick, there is
I wonder what a user would do with the error from |
I'm not sure what this means. Given that you snapshot a flight recorder after something unexpected has happened, it needs to already be running; how can you have a flight recorder for each behavior?
"Consuming" seems like an implementation detail. "Recording"?
I could imagine other error categories from
I think there's a misunderstanding here. It was just supposed to be some advice as to how to actually use it, but it imagines a world where you can have multiple independent flight recorders.

Anyway, such a world will not exist to begin with, so I'll remove that from the API doc.
Sure, will update.
I agree. I'll remove the error.
I have updated #63185 (comment) to include the latest feedback. |
I loosened up the language in the proposed API docs for
A minor point: would it be better for
Based on the discussion above, this proposal seems like a likely accept. The proposal is to add support for a trace flight recorder to the runtime/trace package:

package trace
// FlightRecorder represents a single consumer of a Go execution
// trace.
// It tracks a moving window over the execution trace produced by
// the runtime, always containing the most recent trace data.
//
// At most one flight recorder may be active at any given time,
// though flight recording is allowed to be concurrently active
// with a trace consumer using trace.Start.
// This restriction of only a single flight recorder may be removed
// in the future.
type FlightRecorder struct {
...
}
// NewFlightRecorder creates a new flight recorder from the provided configuration.
func NewFlightRecorder(cfg FlightRecorderConfig) *FlightRecorder
// Start activates the flight recorder and begins recording trace data.
// Only one call to trace.Start may be active at any given time.
// In addition, currently only one flight recorder may be active in the program.
// Returns an error if the flight recorder cannot be started or is already started.
func (*FlightRecorder) Start() error
// Stop ends recording of trace data. It blocks until any concurrent WriteTo calls complete.
func (*FlightRecorder) Stop()
// Enabled returns true if the flight recorder is active.
// Specifically, it will return true if Start did not return an error, and Stop has not yet been called.
// It is safe to call from multiple goroutines simultaneously.
func (*FlightRecorder) Enabled() bool
// WriteTo snapshots the moving window tracked by the flight recorder.
// The snapshot is expected to contain data that is up-to-date as of when WriteTo is called,
// though this is not a hard guarantee.
// Only one goroutine may execute WriteTo at a time.
// An error is returned upon failure to write to w, if another WriteTo call is already in-progress,
// or if the flight recorder is inactive.
func (*FlightRecorder) WriteTo(w io.Writer) (n int64, err error)
type FlightRecorderConfig struct {
// MinAge is a lower bound on the age of an event in the flight recorder's window.
//
// The flight recorder will strive to promptly discard events older than the minimum age,
// but older events may appear in the window snapshot. The age setting will always be
// overridden by MaxBytes.
//
// If this is 0, the minimum age is implementation defined, but can be assumed to be on the order
// of seconds.
MinAge time.Duration
// MaxBytes is an upper bound on the size of the window in bytes.
//
// This setting takes precedence over MinAge.
// However, it does not make any guarantees on the size of the data WriteTo will write,
// nor does it guarantee memory overheads will always stay below MaxBytes. Treat it
// as a hint.
//
// If this is 0, the maximum size is implementation defined.
MaxBytes uint64
}
My feeling is that we tend to use

And of course you're right, this is totally in-memory. I don't feel very strongly about this either way, since I don't have a strong sense of what the norms are. Signed integers are a little annoying because then we have to say what negative means (probably also "implementation defined"), but we use
I don't feel strongly about it either. uint64 is probably fine.
No change in consensus, so accepted. 🎉 The proposal is to add support for a trace flight recorder to the runtime/trace package:

package trace
// FlightRecorder represents a single consumer of a Go execution
// trace.
// It tracks a moving window over the execution trace produced by
// the runtime, always containing the most recent trace data.
//
// At most one flight recorder may be active at any given time,
// though flight recording is allowed to be concurrently active
// with a trace consumer using trace.Start.
// This restriction of only a single flight recorder may be removed
// in the future.
type FlightRecorder struct {
...
}
// NewFlightRecorder creates a new flight recorder from the provided configuration.
func NewFlightRecorder(cfg FlightRecorderConfig) *FlightRecorder
// Start activates the flight recorder and begins recording trace data.
// Only one call to trace.Start may be active at any given time.
// In addition, currently only one flight recorder may be active in the program.
// Returns an error if the flight recorder cannot be started or is already started.
func (*FlightRecorder) Start() error
// Stop ends recording of trace data. It blocks until any concurrent WriteTo calls complete.
func (*FlightRecorder) Stop()
// Enabled returns true if the flight recorder is active.
// Specifically, it will return true if Start did not return an error, and Stop has not yet been called.
// It is safe to call from multiple goroutines simultaneously.
func (*FlightRecorder) Enabled() bool
// WriteTo snapshots the moving window tracked by the flight recorder.
// The snapshot is expected to contain data that is up-to-date as of when WriteTo is called,
// though this is not a hard guarantee.
// Only one goroutine may execute WriteTo at a time.
// An error is returned upon failure to write to w, if another WriteTo call is already in-progress,
// or if the flight recorder is inactive.
func (*FlightRecorder) WriteTo(w io.Writer) (n int64, err error)
type FlightRecorderConfig struct {
// MinAge is a lower bound on the age of an event in the flight recorder's window.
//
// The flight recorder will strive to promptly discard events older than the minimum age,
// but older events may appear in the window snapshot. The age setting will always be
// overridden by MaxBytes.
//
// If this is 0, the minimum age is implementation defined, but can be assumed to be on the order
// of seconds.
MinAge time.Duration
// MaxBytes is an upper bound on the size of the window in bytes.
//
// This setting takes precedence over MinAge.
// However, it does not make any guarantees on the size of the data WriteTo will write,
// nor does it guarantee memory overheads will always stay below MaxBytes. Treat it
// as a hint.
//
// If this is 0, the maximum size is implementation defined.
MaxBytes uint64
}
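To make the intended usage concrete, here is a hypothetical sketch written against the accepted API, as if it already existed in runtime/trace: a server that keeps roughly the last ten seconds of trace data and snapshots it when a request runs unusually long. The handler, thresholds, and file name are illustrative.

```go
package main

import (
	"log"
	"net/http"
	"os"
	"runtime/trace"
	"sync/atomic"
	"time"
)

var (
	fr           *trace.FlightRecorder
	snapshotting atomic.Bool
)

// maybeSnapshot writes the flight recorder's window to a file, ensuring only
// one snapshot is in flight at a time (WriteTo would also return an error in
// that case).
func maybeSnapshot() {
	if !snapshotting.CompareAndSwap(false, true) {
		return
	}
	defer snapshotting.Store(false)
	f, err := os.Create("slow-request.trace")
	if err != nil {
		log.Print("create trace file: ", err)
		return
	}
	defer f.Close()
	if _, err := fr.WriteTo(f); err != nil {
		log.Print("flight recorder snapshot: ", err)
	}
}

func main() {
	fr = trace.NewFlightRecorder(trace.FlightRecorderConfig{
		MinAge:   10 * time.Second, // keep roughly the last 10s of trace data
		MaxBytes: 16 << 20,         // a hint, per the config docs
	})
	if err := fr.Start(); err != nil {
		log.Fatal(err)
	}
	defer fr.Stop()

	http.HandleFunc("/work", func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		handleWork(w, r) // placeholder for real request handling
		if time.Since(start) > 500*time.Millisecond {
			maybeSnapshot()
		}
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}

func handleWork(w http.ResponseWriter, r *http.Request) {
	time.Sleep(300 * time.Millisecond) // simulate work
	w.Write([]byte("done\n"))
}
```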
Proposal: runtime/trace flight recording
Updated: 23 September 2023
Background
"Flight recording" is a technique in which trace data is kept in a conceptual circular buffer, flushed upon request. The purpose of this technique is to capture traces of interesting program behavior, even when one does not know ahead of time when that will happen. For instance, if the web service fails a health check, or the web service takes an abnormally long time to handle a request. Specifically, the web service can identify such conditions when they happen, but the programmer setting up the environment can't predict when exactly they will occur. Starting tracing after something interesting happens also tends not to be useful, because the program has already executed the interesting part.
The Java ecosystem has had this for years through Java's flight recorder. Once the JVM's flight recorder is enabled, the JVM can obtain a trace representing the last few seconds of time. This trace can come from triggers set up in JMX, or by passing a flag to the JVM that dumps a trace on exit.
With the implementation of #60773 now approaching a stable point, hopefully in Go 1.22 we'll have all traces become a series of self-contained partitions. This implementation change presents an opportunity to easily add something similar to the Go execution tracer by always retaining at least one partition that can be snapshotted at any time.
This is also enabled by work in the Go 1.21 release to make traces dramatically cheaper. Because flight recording relies on waiting until something interesting happens, tracing needs to be enabled for a much longer period of time. Enabling flight recording across, for example, a small portion of a production fleet, becomes much more palatable when the tracing itself isn't too expensive.
Design
The core of the design is a new API in the runtime/trace package to enable flight recording. This means that programs can be instrumented with their own triggers.

Implementation
Most of the implementation work has already been done. I think it would be OK to ship the implementation in golang.org/x/exp/trace, even though it has some inefficiencies (like having to copy buffers outside the runtime). We could make this more efficient by reference-counting the runtime buffers to avoid a copy.
The main thing that definitely needs to change, however, is that the flight recorder needs to be able to run simultaneously with a call to trace.Start, which is currently not possible, since golang.org/x/exp/trace uses trace.Start itself. This can be implemented by having the trace reader goroutine (misnomer) write buffers to multiple writers. All we need to do is call traceAdvance simultaneously with adding a new writer, so that the new writer always starts receiving data on a generation boundary.

Discussion
SetMinAge and SetMaxBytes could give more rigid guarantees, but it's both complex to implement and not terribly useful. The primary use-case for SetMinAge is to allow users to ask for longer traces (for example, if a web service's "long request" means something much longer than a handful of seconds). Meanwhile, the primary use-case of SetMaxBytes is to control memory overheads and limit the damage caused by a large SetMinAge.

WriteTo could allow multiple goroutines to call it, since it could easily serialize them internally. However, this can create some bad situations. For instance, consider some snapshot trigger condition that causes multiple goroutines to call WriteTo. The call is heavyweight and they'll queue up behind each other; the longest one will likely take quite a while to resolve, and the application will be significantly disrupted. It'll also produce traces that aren't very useful (consisting of short partitions corresponding approximately to the duration of the last WriteTo call), unless we also allow multiple goroutines to read the same partition's buffers. However, that's going to be fairly complicated to implement, and it also doesn't add much value, since it's just duplicate data. The current design reduces the risk of run-time panics while also side-stepping these issues by returning an error in this case.

Alternatives considered
External circular buffer
@dominikh has suggested adding a similar feature to gotraceui. Because the partitioning is actually baked into the trace's structure, it's possible for trace consumers to implement something similar themselves. The only thing missing is a standard streaming endpoint for execution traces (see follow-up work).
However, there are a few advantages to adding support directly to the standard library and runtime.
Programs don't need to rely on the runtime/trace package's annotations, but can instead decide when to grab a snapshot directly. The runtime can also expose knobs for the recorder's behavior (SetPeriod and SetSize); any external solution will be at the mercy of the runtime's defaults.

Despite these advantages, it's likely worth pursuing support for such a use-case even if the API described in this proposal is made available. A shortcoming of this document's proposal is that there's no way to trigger a snapshot explicitly against trace data, only program logic. Handling traces externally also means the ability to perform ad-hoc analyses without the need for additional instrumentation.
Follow-up work
Add support for trace streaming endpoints to net/http/pprof

As stated in the discussion of the "external circular buffer" alternative, we could support that alternative easily and well by just adding a standard debugging endpoint for streaming trace data. It probably makes the most sense to just add new query parameters to the existing trace endpoint; the details of that can be hashed out elsewhere.
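To sketch what such an endpoint could look like (nothing like this exists in net/http/pprof today), here is a hypothetical handler that serves a snapshot of a flight recorder's window, again written against the proposed runtime/trace API; the endpoint path and port are illustrative.

```go
package main

import (
	"log"
	"net/http"
	"runtime/trace"
	"time"
)

var fr = trace.NewFlightRecorder(trace.FlightRecorderConfig{MinAge: 10 * time.Second})

func main() {
	if err := fr.Start(); err != nil {
		log.Fatal(err)
	}
	defer fr.Stop()

	mux := http.NewServeMux()
	// Hypothetical endpoint name; not part of net/http/pprof today.
	mux.HandleFunc("/debug/fr-trace", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/octet-stream")
		w.Header().Set("Content-Disposition", `attachment; filename="flight.trace"`)
		if _, err := fr.WriteTo(w); err != nil {
			// The response may be partially written already; just log it.
			log.Print("flight recorder snapshot: ", err)
		}
	})
	log.Fatal(http.ListenAndServe("localhost:6060", mux))
}
```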