-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/compile: reduce PGO profile processing overhead #58102
Comments
Change https://go.dev/cl/464575 mentions this issue: |
#62400 has a prototype that significantly reduce build time overhead for PGO. It is too late to be integrated in for Go 1.22. Bump to Go 1.23. |
Just as a practical example of this in the wild: I just tried to pull a production profile (from GCP cloud profiler; 16MB gzip compressed, 60MB uncompressed) of one of our services and drop it into its repo as This caused the build time (GCP Cloudbuild, e2-highcpu-32) to skyrocket from 2.5 minutes to It's clear we won't really be able to enable PGO until the PGO build overhead is lowered to more reasonable levels. |
@CAFxX Thanks for the report! https://go.dev/cl/529738 is nearly ready to go in and aims to address the majority of this issue. If you are able, it would be great if you could try it out and report whether it addresses the problem. That CL doesn't yet plumb through
Note: make sure you don't have a Edit: I submitted the CL, so now you can just pull tip. |
Change https://go.dev/cl/557458 mentions this issue: |
Change https://go.dev/cl/557460 mentions this issue: |
…file." This reverts CL 529738. Reason for revert: Breaking longtest builders For #58102. Fixes #65220. Change-Id: Id295e3249da9d82f6a9e4fc571760302a1362def Reviewed-on: https://go-review.googlesource.com/c/go/+/557460 Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Bryan Mills <bcmills@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Will try as soon as I have a bit of more time to dedicate. update: I see it has been reverted. A couple of further notes in the meantime:
|
Change https://go.dev/cl/560035 mentions this issue: |
isPreProfileFile reads the entire file into memory just to check the first few bytes, and then throws it all away. We can avoid this by just peeking at the beginning. For #58102. Change-Id: Id2c2844e5e44a2f3a9c7cdb9a027d94d26bdf71d Reviewed-on: https://go-review.googlesource.com/c/go/+/560035 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Pratt <mpratt@google.com>
…file." This reverts CL 529738. Reason for revert: Breaking longtest builders For golang#58102. Fixes golang#65220. Change-Id: Id295e3249da9d82f6a9e4fc571760302a1362def Reviewed-on: https://go-review.googlesource.com/c/go/+/557460 Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Bryan Mills <bcmills@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
isPreProfileFile reads the entire file into memory just to check the first few bytes, and then throws it all away. We can avoid this by just peeking at the beginning. For golang#58102. Change-Id: Id2c2844e5e44a2f3a9c7cdb9a027d94d26bdf71d Reviewed-on: https://go-review.googlesource.com/c/go/+/560035 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Pratt <mpratt@google.com>
Change https://go.dev/cl/569336 mentions this issue: |
Change https://go.dev/cl/569338 mentions this issue: |
Change https://go.dev/cl/569335 mentions this issue: |
Change https://go.dev/cl/569337 mentions this issue: |
Change https://go.dev/cl/569425 mentions this issue: |
Change https://go.dev/cl/569424 mentions this issue: |
Change https://go.dev/cl/569423 mentions this issue: |
This CL adjusts error handling to be a bit more idiomatic. The processing function returns errors, leaving main to log and exit on error. This CL contains no functional changes. For #58102. Change-Id: I9074127cc675e177d046474b7f01fbc37d0bd4c0 Reviewed-on: https://go-review.googlesource.com/c/go/+/569335 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
This check serves only to provide a more descriptive error if the output directory doesn't exist. That isn't useless, but I don't see why this tool specifically should do this when no other part of the toolchain does. For #58102. Change-Id: I01cf9db2cc1dad85c3afd8a6b008c53f26cb877a Reviewed-on: https://go-review.googlesource.com/c/go/+/569336 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
The processing performed in cmd/preprofile is a simple version of the same initial processing performed by cmd/compile/internal/pgo. Refactor this processing into the new IR-independent cmd/internal/pgo package. Now cmd/preprofile and cmd/compile run the same code for initial processing of a pprof profile, guaranteeing that they always stay in sync. Since it is now trivial, this CL makes one change to the serialization format: the entries are ordered by weight. This allows us to avoid sorting ByWeight on deserialization. Impact on PGO parsing when compiling cmd/compile with PGO: * Without preprocessing: PGO parsing ~13.7% of CPU time * With preprocessing (unsorted): ~2.9% of CPU time (sorting ~1.7%) * With preprocessing (sorted): ~1.3% of CPU time The remaining 1.3% of CPU time approximately breaks down as: * ~0.5% parsing the preprocessed profile * ~0.7% building weighted IR call graph * ~0.5% walking function IR to find direct calls * ~0.2% performing lookups for indirect calls targets For #58102. Change-Id: Iaba425ea30b063ca195fb2f7b29342961c8a64c2 Reviewed-on: https://go-review.googlesource.com/c/go/+/569337 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
…pgoir This helps reduce confusion with cmd/internal/pgo, which performs compilation-independent analysis. pgoir associates that data with the IR from the current package compilation. For #58102. Change-Id: I9ef1c8bc41db466d3340f41f6d071b95c09566de Reviewed-on: https://go-review.googlesource.com/c/go/+/569338 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
This CL adjusts error handling to be a bit more idiomatic. The processing function returns errors, leaving main to log and exit on error. This CL contains no functional changes. For golang#58102. Change-Id: I9074127cc675e177d046474b7f01fbc37d0bd4c0 Reviewed-on: https://go-review.googlesource.com/c/go/+/569335 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
This check serves only to provide a more descriptive error if the output directory doesn't exist. That isn't useless, but I don't see why this tool specifically should do this when no other part of the toolchain does. For golang#58102. Change-Id: I01cf9db2cc1dad85c3afd8a6b008c53f26cb877a Reviewed-on: https://go-review.googlesource.com/c/go/+/569336 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
The processing performed in cmd/preprofile is a simple version of the same initial processing performed by cmd/compile/internal/pgo. Refactor this processing into the new IR-independent cmd/internal/pgo package. Now cmd/preprofile and cmd/compile run the same code for initial processing of a pprof profile, guaranteeing that they always stay in sync. Since it is now trivial, this CL makes one change to the serialization format: the entries are ordered by weight. This allows us to avoid sorting ByWeight on deserialization. Impact on PGO parsing when compiling cmd/compile with PGO: * Without preprocessing: PGO parsing ~13.7% of CPU time * With preprocessing (unsorted): ~2.9% of CPU time (sorting ~1.7%) * With preprocessing (sorted): ~1.3% of CPU time The remaining 1.3% of CPU time approximately breaks down as: * ~0.5% parsing the preprocessed profile * ~0.7% building weighted IR call graph * ~0.5% walking function IR to find direct calls * ~0.2% performing lookups for indirect calls targets For golang#58102. Change-Id: Iaba425ea30b063ca195fb2f7b29342961c8a64c2 Reviewed-on: https://go-review.googlesource.com/c/go/+/569337 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
…pgoir This helps reduce confusion with cmd/internal/pgo, which performs compilation-independent analysis. pgoir associates that data with the IR from the current package compilation. For golang#58102. Change-Id: I9ef1c8bc41db466d3340f41f6d071b95c09566de Reviewed-on: https://go-review.googlesource.com/c/go/+/569338 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
The new go tool preprofile preprocesses a PGO pprof profile into an intermediate representation that is more efficient for the compiler to consume. Performing preprocessing avoids having every single compile process from duplicating the same processing. This CL prepares the initial plumbing to support automatic preprocessing by cmd/go. Each compile action takes a new dependency on a new "preprocess PGO profile" action. The same action instance is shared by all compile actions (assuming they have the same input profile), so the action only executes once. Builder.build retrieves the file to pass to -pgofile from the output of the preprocessing action, rather than directly from p.Internal.PGOProfile. Builder.buildActionID also uses the preprocess output as the PGO component of the cache key, rather than the original source. This doesn't matter for normal toolchain releases, as the two files are semantically equivalent, but it is useful for correct cache invalidation in development. For example, if _only_ go tool preprofile changes (potentially changing the output), then we must regenerate the output and then rebuild all packages. This CL does not actually invoke go tool preprocess. That will come in the next CL. For now, it just copies the input pprof profile. This CL shouldn't be submitted on its own, only with the children. Since the new action doesn't yet use the build cache, every build (even fully cached builds) unconditionally run the PGO action. For #58102. Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest Change-Id: I594417cfb0164cd39439a03977c904e4c0c83b8b Reviewed-on: https://go-review.googlesource.com/c/go/+/569423 Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Following the previous CL, now actually run the preprofile tool to create the preprocessed output. There is still no build cache integration, so the tool will run on every build even if nothing has changed. For #58102. Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest Change-Id: I0414377a956889f457e50898737fcaa8a698658d Reviewed-on: https://go-review.googlesource.com/c/go/+/569424 Auto-Submit: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
Change https://go.dev/cl/613375 mentions this issue: |
As https://go.dev/doc/comment#package says, every package should have a package comment. Command cmd/preprofile had one, it was just not being recognized due to a blank line. For #51430. For #58102. Change-Id: I73e31158e0f244f6453728ab68c5c8da4cfb38b1 Reviewed-on: https://go-review.googlesource.com/c/go/+/613375 Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Currently, each cmd/compile invocation parses the full PGO pprof profile, builds a full weight graph, and then determines what is relevant to that package. This is a lot of work that scales poorly with the size of the profile and the size of the build (number of packages). For particularly large profiles, this can lead to extremely long build times.
Follow-up to #55022.
cc @cherrymui @aclements
The text was updated successfully, but these errors were encountered: