Description
Original Title: proposal: support gradual code repair while moving a type between packages
Go should add the ability to create alternate equivalent names for types, in order to enable gradual code repair during codebase refactoring. This was the target of the Go 1.8 alias feature, proposed in #16339 but held back from Go 1.8. Because we did not solve the problem for Go 1.8, it remains a problem, and I hope we can solve it for Go 1.9.
In the discussion of the alias proposal, there were many questions about why this ability to create alternate names for types in particular is important. As a fresh attempt to answer those questions, I wrote and posted an article, “Codebase Refactoring (with help from Go).” Please read that article if you have questions about the motivation. (For an alternate, shorter presentation, see Robert's Gophercon lightning talk. Unfortunately, that video wasn't available online until October 9. Update, Dec 16: here's my GothamGo talk, which was essentially the first draft of the article.)
This issue is not proposing a specific solution. Instead, I want to gather feedback from the Go community about the space of possible solutions. One possible avenue is to limit aliases to types, as mentioned at the end of the article. There may be others we should consider as well.
Please post thoughts about type aliases or other solutions as comments here.
Thank you.
Update, Dec 16: Design doc for type aliases posted.
Update, Jan 9: Proposal accepted, dev.typealias repository created, implementation due at the start of the Go 1.9 cycle for experimentation.
Discussion summary (last updated 2017-02-02)
Do we expect to need a general solution that works for all declarations?
If type aliases are 100% necessary, then var aliases are maybe 10% necessary, func aliases are 1% necessary, and const aliases are 0% necessary. Because const already has = and func could plausibly use = too, the key question is whether var aliases are important enough to plan for or implement.
As argued by @rogpeppe (#16339 (comment)) and @ianlancetaylor (#16339 (comment)) in the original alias proposal and as mentioned in the article, a mutating global var is usually a mistake. It probably doesn't make sense to complicate the solution to accommodate what is usually a bug. (In fact, if we can figure out how, it would not surprise me if in the long term Go moves toward requiring global vars to be immutable.)
Because richer var aliases are likely not important enough to plan for, it seems like the right choice here is to focus only on type aliases. Most of the comments here seem to agree. I won't list everyone.
Do we need a new syntax (= vs => vs export)?
The strongest argument for new syntax is the need to support var aliases, either now or in the future (#18130 (comment) by @Merovius). It seems okay to plan not to have var aliases (see previous section).
Without var aliases, reusing = is simpler than introducing new syntax, whether => like in the alias proposal, ~ (#18130 (comment) by @joegrasse), or export (#18130 (comment) by @cznic).
Using = would also exactly match the syntax of type aliases in Pascal and Rust. To the extent that other languages have the same concepts, it's nice to use the same syntax.
Looking ahead, there could be a future Go in which func aliases exist too (see #18130 (comment) by @nigeltao), and then all declarations would permit the same form:
const C2 = C1
func F2 = F1
type T2 = T1
var V2 = V1
The only one of these that wouldn't establish a true alias would be the var declaration, because V2 and V1 can be redefined independently as the program executes (unlike the const, func, and type declarations which are immutable). Since one main reason for variables is to allow them to vary, that exception would at least be easy to explain. If Go moves toward immutable global vars, then even that exception would disappear.
To be clear, I am not suggesting func aliases or immutable global vars here, just working through the implications of such future additions.
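As a minimal sketch of the difference described above, assuming the `type T2 = T1` form under discussion (the form that Go 1.9 and later accept): the type declaration creates a second name for one type, while the var declaration creates a second, independent variable. The names `T1`, `T2`, `V1`, `V2` follow the text; `double` is a hypothetical helper added only for illustration.

```go
package main

import "fmt"

// T1 is an ordinary defined type; T2 is a second name for the same type.
type T1 struct{ N int }
type T2 = T1

// V1 and V2 are separate variables. V2 is initialized with V1's value
// but is not bound to V1 afterwards.
var V1 = 1
var V2 = V1

// double is a hypothetical helper that takes a T1. Because T2 denotes the
// same type, a T2 value can be passed without any conversion.
func double(t T1) int { return 2 * t.N }

func main() {
	var t T2 = T1{N: 7} // assignable in either direction, no conversion
	fmt.Println(double(t))

	V1 = 100            // reassigning V1 leaves V2 alone
	fmt.Println(V1, V2) // prints "100 1"
}
```

Only the type line is a true alias here; the var line is an ordinary initialization, which is the exception the text describes.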
@jimmyfrasche suggested (#18130 (comment)) aliases for everything except consts, so that const would be the exception instead of var:
const C2 = C1 // no => form
func F2 => F1
type T2 => T1
var V2 => V1
var V2 = V1 // different from => form
Having inconsistencies with both const and var seems more difficult to explain than having just an inconsistency for var.
Can this be a tooling- or compiler-only change instead of a language change?
It's certainly worth asking whether gradual code repair can be enabled purely by side information supplied to the compiler (for example, #18130 (comment) by @btracey).
Or maybe if the compiler can apply some kind of rule-based preprocessing to transform input files before compilation (for example, #18130 (comment) by @tux21b).
Unfortunately, no, the change really can't be confined that way. There are at least two compilers (gc and gccgo) that would need to coordinate, but so would any other tools that analyze programs, like go vet, guru, goimports, gocode (code completion), and others.
As @bcmills said (#18130 (comment)), “a ‘non-language-change’ mechanism which must be supported by all implementations is a de facto language change — it’s just one with poorer documentation.”
What other uses might aliases have?
We know of the following. Given that type aliases in particular were deemed important enough for inclusion in Pascal and Rust, there are likely others.
- Aliases (or just type aliases) would enable creating drop-in replacements that expand other packages. For example, see https://go-review.googlesource.com/#/c/32145/, especially the explanation in the commit message.
- Aliases (or just type aliases) would enable structuring a package with a small API surface but a large implementation as a collection of packages for better internal structure but still present just one package to be imported and used by clients. There's a somewhat abstract example described at #16339 (comment).
- Protocol buffers have an "import public" feature whose semantics is trivial to implement in generated C++ code but impossible to implement in generated Go code. This causes frustration for authors of protocol buffer definitions shared between C++ and Go clients. Type aliases would provide a way for Go to implement this feature. In fact, the original use case for import public was gradual code repair. Similar issues may arise in other kinds of code generators.
- Abbreviating long names. Local (unexported or not-package-scoped) aliases might be handy to abbreviate a long type name without introducing the overhead of a whole new type. As with all these uses, the clarity of the final code would strongly influence whether this is a suggested use.
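The abbreviation use in the last item might look like the following sketch. It assumes the `type X = Y` form is accepted inside a function body; `countWords` and `index` are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// countWords is a hypothetical function whose result type is long enough
// that a short local name helps readability.
func countWords(docs map[string][]string) map[string]map[string]int {
	// index is a function-local alias. It names the existing map type
	// rather than defining a new, distinct type.
	type index = map[string]map[string]int

	out := index{}
	for name, words := range docs {
		if out[name] == nil {
			out[name] = map[string]int{}
		}
		for _, w := range words {
			out[name][w]++
		}
	}
	// No conversion is needed: index and the declared result type are
	// the same type.
	return out
}

func main() {
	counts := countWords(map[string][]string{"a": {"x", "x", "y"}})
	fmt.Println(counts["a"]["x"], counts["a"]["y"])
}
```

Whether this reads better than spelling out the map type is a judgment call, as the item itself notes.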
What other issues does a proposal for type aliases need to address?
Listing these for reference. Not attempting to solve or discuss them in this section, although a few were discussed later and are summarized in separate sections below.
- Handling in godoc. (#18130 (comment) by @nigeltao and #18130 (comment) by @jimmyfrasche)
- Can methods be defined on types named by alias? (#18130 (comment) by @ulikunitz)
- If aliases to aliases are allowed, how do we handle alias cycles? (#18130 (comment) by @thwd)
- Should aliases be able to export unexported identifiers? (#18130 (comment) by @thwd)
- What happens when you embed an alias (how do you access the embedded field)? (#18130 (comment) by @thwd, also spec: embedding a type alias is confusing #17746)
- Are aliases available as symbols in the built program? (#18130 (comment) by @thwd)
- Ldflags string injection: what if we refer to an alias? (#18130 (comment) by @thwd; this only arises if there are var aliases.)
Is versioning a solution by itself?
"In that case maybe versioning is the whole answer, not type aliases."
(#18130 (comment) by @iainmerrick)
As noted in the article, I think versioning is a complementary concern. Support for gradual code repair, such as with type aliases, gives a versioning system more flexibility in how it builds a large program, which can be the difference between being able to build the program and not.
Can the larger refactoring problem be solved instead?
In #18130 (comment), @niemeyer points out that there were actually two changes for moving os.Error to error: the name changed but so did the definition (the current Error method used to be a String method).
@niemeyer suggests that perhaps we can find a solution to the broader refactoring problem that fixes types moving between packages as a special case but also handles things like method names changing, and he proposes a solution built around "adapters".
There is a fair amount of discussion in the comments that I can't easily summarize here. The discussion isn't over, but so far it is unclear whether "adapters" can fit into the language or be implemented in practice. It does seem clear that adapters are at least one order of magnitude more complex than type aliases.
Adapters need a coherent solution to the subtyping problems noted below as well.
Can methods be declared on alias types?
Certainly aliases do not allow bypassing the usual method definition restrictions: if a package defines type T1 = otherpkg.T2, it cannot define methods on T1, just as it cannot define methods directly on otherpkg.T2. That is, if type T1 = otherpkg.T2, then func (T1) M() is equivalent to func (otherpkg.T2) M(), which is invalid today and remains invalid. However, if a package defines type T1 = T2 (both in the same package), then the answer is less clear. In this case, func (T1) M() would be equivalent to func (T2) M(); since the latter is allowed, there is an argument to allow the former. The current design doc does not impose a restriction here (in keeping with the general avoidance of restrictions), so that func (T1) M() is valid in this situation.
In #18130 (comment), @jimmyfrasche suggests that instead defining "no use of aliases in method definitions" would be a clear rule and avoid needing to know what T is defined as to know if func (T) M() is valid. In #18130 (comment), @rsc points out that even today there are certain T for which func (T) M() is not valid: https://play.golang.org/p/bci2qnldej. In practice this doesn't come up because people write reasonable code.
We will keep this possible restriction in mind but wait until there is strong evidence that it is needed before introducing it.
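For the same-package case discussed above, a minimal sketch, assuming the unrestricted design in which `func (T1) M()` is accepted when `T1` and `T2` are declared in the same package. The method name `Greet` and the field `Name` are hypothetical.

```go
package main

import "fmt"

// T2 is the defined type.
type T2 struct{ Name string }

// T1 is an alias for T2, declared in the same package.
type T1 = T2

// Greet is written with the alias as its receiver. Since T1 and T2 are
// the same type, this declares a method on T2.
func (t T1) Greet() string { return "hello, " + t.Name }

func main() {
	v := T2{Name: "gopher"} // constructed under the original name
	fmt.Println(v.Greet())  // prints "hello, gopher"
}
```

If `T1` instead named a type from another package, the method declaration would be rejected, exactly as it would be when written against that type directly.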
Is there a cleaner way to handle embedding and, more generally, field renames?
In #18130 (comment), @Merovius points out that an embedded type that changes its name during a package move will cause problems when that new name must eventually be adopted at the use sites. For example if user type U has an embedded io.ByteBuffer that moves to bytes.Buffer, then while U embeds io.ByteBuffer the field name is U.ByteBuffer, but when U is updated to refer to bytes.Buffer, the field name necessarily changes to U.Buffer.
In #18130 (comment), @neild points out that there is at least a workaround if references to io.ByteBuffer must be excised: the package P that defines U can also define 'type ByteBuffer = bytes.Buffer' and embed that type into U. Then U still has a U.ByteBuffer, even after io.ByteBuffer is gone entirely.
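@neild's workaround can be sketched roughly as follows. The single-file layout and the helper `contents` are assumptions for illustration; in the scenario described, `ByteBuffer` and `U` would live in the user's package P.

```go
package main

import (
	"bytes"
	"fmt"
)

// ByteBuffer is a package-local alias that keeps the old field name
// available after the type's canonical name has become bytes.Buffer.
type ByteBuffer = bytes.Buffer

// U embeds the alias, so the embedded field is named ByteBuffer.
type U struct {
	ByteBuffer
}

// contents is a hypothetical helper exercising both access paths.
func contents(u *U) string {
	u.ByteBuffer.WriteString("old name, ") // explicit field access
	u.WriteString("promoted method")       // promoted from bytes.Buffer
	return u.String()
}

func main() {
	fmt.Println(contents(&U{}))
}
```

Existing code that refers to `U.ByteBuffer` keeps compiling, at the cost of an extra top-level name in P.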
In #18130 (comment), @bcmills suggests the idea of field aliases, to allow a field to have multiple names during a gradual repair. Field aliases would allow defining something like type U struct { bytes.Buffer; ByteBuffer = Buffer } instead of having to create the top-level type alias.
In #18130 (comment), @rsc raises yet another possibility: some syntax for 'embed this type with this name', so that it is possible to embed a bytes.Buffer as the field name ByteBuffer, without needing a top-level type or an alternate name. If that existed, then the type name could be updated from io.ByteBuffer to bytes.Buffer while preserving the original name (and not introducing a second, nor a clumsy exported type).
These all seem worth exploring once we have more evidence of large-scale refactorings blocked by problems with fields changing names. As @rsc wrote, "If type aliases help us get to the point where lack of field aliases is the next big roadblock for large-scale refactorings, that will be progress!"
There was a suggestion of restricting the use of aliases in embedded fields or changing the embedded name to use the target type's name, but those make the alias introduction break existing definitions that must then be fixed atomically, essentially preventing any gradual repair. @rsc: "We discussed this at some length in #17746. I was originally on the side of the name of an embedded io.ByteBuffer alias being Buffer, but the above argument convinced me I was wrong. @jimmyfrasche in particular made some good arguments about the code not changing depending on the definition of the embedded thing. I don't think it's tenable to disallow embedded aliases completely."
What is the effect on programs using reflection?
Programs using reflection see through aliases. In #18130 (comment), @atdiar points out that if a program is using reflection to, for example, find the package in which a type is defined or even the name of a type, it will observe the change when the type is moved, even if a forwarding alias is left behind. In #18130 (comment), @rsc confirmed this and wrote "Like the situation with embedding, it's not perfect. Unlike the situation with embedding, I don't have any answers except maybe code shouldn't be written using reflect to be quite that sensitive to those details."
The use of vendored packages today also changes package import paths seen by reflect, and we have not been made aware of significant problems caused by that ambiguity. This suggests that programs are not commonly inspecting reflect.Type.PkgPath in ways that would be broken by use of aliases. Even so, it's a potential gap, just like embedding.
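A small sketch of what "see through" means in practice, assuming a forwarding alias has been left behind under an old name. `OldBuffer` and `describe` are hypothetical names.

```go
package main

import (
	"bytes"
	"fmt"
	"reflect"
)

// OldBuffer stands in for a forwarding alias left behind after a move.
type OldBuffer = bytes.Buffer

// describe reports the name and package path that reflection observes
// for a value declared through the alias.
func describe() (name, pkgPath string) {
	t := reflect.TypeOf(OldBuffer{})
	return t.Name(), t.PkgPath()
}

func main() {
	name, pkg := describe()
	fmt.Println(name, pkg) // prints "Buffer bytes"
}
```

Reflection reports the target type's name and package, not the alias's, so code that keys on those strings would observe the move.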
What is the effect on separate compilation of programs and plugins?
In #18130 (comment), @atdiar raises the question of the effect on object files and separate compilation. In #18130 (comment), @rsc replies that there should be no need to make changes here: if X imports Y and Y changes and is recompiled, then X needs to be recompiled too. That's true today without aliases, and it will remain true with aliases. Separate compilation means being able to compile X and Y in distinct steps (the compiler does not have to process them in the same invocation), not that it is possible to change Y without recompiling X.
Would sum types or some kind of subtyping be an alternative solution?
In #18130 (comment), @iand suggests "substitutable types", "a list of types that may be substituted for the named type in function arguments, return values etc.". In #18130 (comment), @j7b suggests using algebraic types "so we also get an empty interface equivalent with compile time type checking as a bonus". Other names for this concept are sum types and variant types.
In general this does not suffice to allow moving types with gradual code repair. There are two ways to think about this.
In #18130 (comment), @bcmills takes the concrete way, pointing out that algebraic types have a different representation than the original, which makes it not possible to treat the sum and the original as interchangeable: the latter has type tags.
In #18130 (comment), @rsc takes the theoretical way, expanding on #18130 (comment) by @gri pointing out that in a gradual code repair, sometimes you need T1 to be a subtype of T2 and sometimes vice versa. The only way for both to be subtypes of each other is for them to be the same type, which not coincidentally is what type aliases do.
As a side tangent, in addition to not solving the gradual code repair problem, algebraic types / sum types / union types / variant types are by themselves hard to add to Go. See the FAQ answer and the Go 1.6 AMA discussion for more.
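The "both directions" requirement can be illustrated with a sketch like the one below, which assumes the `type Old = New` alias form. `New`, `Old`, `Defined`, `sumOld`, and `makeNew` are hypothetical names; `Defined` is included only for contrast with an ordinary type definition.

```go
package main

import "fmt"

// New is the type at its new location or name.
type New struct{ ID int }

// Old is an alias: the same type as New under a second name.
type Old = New

// Defined is a distinct type with the same underlying type, for contrast.
type Defined New

// sumOld is written against the old name.
func sumOld(xs []Old) int {
	total := 0
	for _, x := range xs {
		total += x.ID
	}
	return total
}

// makeNew is written against the new name.
func makeNew() []New { return []New{{ID: 1}, {ID: 2}} }

func main() {
	// New flows where Old is expected, even inside a slice type.
	fmt.Println(sumOld(makeNew()))

	// Old flows where New is expected.
	var back []New = []Old{{ID: 3}}

	// A defined type converts value by value, but []Defined is not
	// assignable to []New, so the next line would not compile:
	//   var bad []New = []Defined{{ID: 4}}
	d := Defined{ID: 4}
	fmt.Println(len(back), New(d).ID)
}
```

With an alias, unrepaired and repaired code can exchange values, slices, and function signatures freely; with a distinct type, each composite use would need an explicit copy or conversion.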
In #18130 (comment), @thwd suggests that since Go has a subtyping relationship between concrete types and interfaces (bytes.Buffer can be seen as a subtype of io.Reader) and between interfaces (io.ReadWriter is a subtype of io.Reader in the same way), making interfaces "recursively covariant (according to the current variance rules) down to their method arguments" would solve the problem provided that all future packages only use interfaces, never concrete types like structs ("encourages good design, too").
There are three problems with that as a solution. First, it has the subtyping issues above, so it doesn't solve gradual code repair. Second, it doesn't apply to existing code, as @thwd noted in this suggestion. Third, forcing the use of interfaces everywhere may not actually be good design and introduces performance overheads (see for example #18130 (comment) by @Merovius and #18130 (comment) by @zombiezen).
Restrictions
This section collects proposed restrictions for reference, but keep in mind that restrictions add complexity. As I wrote in #18130 (comment), "we should probably only implement those restrictions after actual experience with the unrestricted, simpler design helps us understand whether the restriction would bring enough benefits to pay for its cost."
Put another way, any restriction would need to be justified by evidence that it would prevent some serious misuse or confusion. Since we haven't implemented a solution yet, there is no such evidence. If experience does provide that evidence, these will be worth returning to.
Restriction? Aliases of standard library types can only be declared in standard library.
(#18130 (comment) and #18130 (comment) by @iand)
The concern is "code that has renamed standard library concepts to fit a custom naming convention", or "long spaghetti chains of aliases across multiple packages that end up back at the standard library", or "aliasing things like interface{} and error".
As stated, the restriction would disallow the "extension package" case described above involving x/image/draw.
It's unclear why the standard library should be special: the problems would exist with any code. Also, neither interface{} nor error is a type from the standard library. Rephrasing the restriction as "aliasing predefined types" would disallow aliasing error, but the need to alias error was one of the motivating examples in the article.
Restriction? Alias target must be package-qualified identifier.
(#18130 (comment) by @jba)
This would make it impossible to make an alias when renaming a type within a package, which may be used widely enough to necessitate a gradual repair (#18130 (comment) by @bcmills).
It would also disallow aliasing error as in the article.
Restriction? Alias target must be package-qualified identifier with same name as alias.
(proposed during alias discussion in Go 1.8)
In addition to the problems of the previous section with limiting to package-qualified identifiers, forcing the name to stay the same would disallow the conversion from io.ByteBuffer to bytes.Buffer in the article.
Restriction? Aliases should be discouraged in some way.
"How about hiding aliases behind an import, just like for "C" and “unsafe”, to further discourage it's usage? In the same vein, I would like the aliases syntax to be verbose and stand out as a scaffold for on going refactoring." - #18130 (comment) by @xiegeo
"Should we also automatically infer that an aliased type is legacy and should be replaced by the new type? If we enforce golint, godoc and similar tools to visualize the old type as deprecated, it would limit the abuse of type aliasing very significantly. And the final concern of aliasing feature being abused would be resolved." - #18130 (comment) by @rakyll
Until we know that they will be used wrong, it seems premature to discourage usage. There may be good, non-temporary uses (see above).
Even in the case of code repair, either the old or new type may be the alias during the transition, depending on the constraints imposed by the import graph. Being an alias does not mean the name is deprecated.
There is already a mechanism for marking certain declarations as deprecated (see #18130 (comment) by @jimmyfrasche).
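For the case where the alias does happen to be the name being retired, a sketch of marking it explicitly, assuming the existing convention of a doc-comment paragraph beginning with "Deprecated:". `Client`, `Conn`, and `dial` are hypothetical names.

```go
package main

import "fmt"

// Client is the current name for the type.
type Client struct{ Addr string }

// Conn is the old name for Client, kept during a gradual repair.
//
// Deprecated: use Client instead.
type Conn = Client

// dial is a hypothetical constructor written against the new name.
func dial(addr string) *Client { return &Client{Addr: addr} }

func main() {
	// Unrepaired callers can keep using the old name in the meantime.
	var c *Conn = dial("localhost:0")
	fmt.Println(c.Addr)
}
```

The deprecation is stated by the comment, not implied by the `=`; an alias that is meant to stay would simply omit that paragraph.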
Restriction? Aliases must target named types.
"Aliases shouldn't not apply to unnamed type. Their is no "code repair" story in moving from one unnamed type to another. Allowing aliases on unnamed types means I can no longer teach Go as simply named and unnamed types." - #18130 (comment) by @davecheney
Until we know that they will be used wrong, it seems premature to discourage usage. There may be good uses with unnamed targets (see above).
As noted in the design doc, we do expect to change the terminology to make the situation clearer.