Description
Proposal
a = slices.Delete(a, i, j)
should zero the (j
- i
) elements after the length of the returned slice
TL;DR zeroing the elements is error-prone, it should be taken care of by the stdlib function.
Chronology
2012 Go 1.0 released. It doesn’t have generics.
2022 Go 1.18 released. It has generics.
2022 The golang.org/x/exp/slices package has Delete.
2023 Delete is promoted to the new stdlib package slices.
Current implementation
slices.Delete
achieves its job in-place by
- shifting
M
=len(a)-j
tail elements to the left, by copy - returning a slice of the original slice, having smaller or equal length
See its source.
Problem: memory
If the elements have a reference type (pointer, map, slice, chan, string), or contain fields having a reference type, then (j
-i
) elements at the end of the original slice are left untouched and keep referencing some objects. They will not be garbage collected, and they may enjoy a lifetime far exceeding the intent of the developer.
This is a memory leak.
Impact
Slices containing only value types are not impacted. Their memory footprint is the same, regardless of any zeroing.
Some programs will have a memory leak without noticing and without serious consequences, when the objects are few and small.
Some programs will have serious memory leaks, and find it difficult to properly diagnose. The consequences are crashes (OOM), increased GC pressure, overall bad performance, higher costs associated with more memory utilization and more server instances, and higher costs associated with debugging and fixing the leak.
Indirect impact
Some developers do want to address this and write zero values themselves (see documentation recommendation below). But they risk writing incorrect code, turning a mild performance concern into a serious correctness problem.
Partial solution: documentation
The documentation of slices.Delete
explicitly states:
Delete might not modify the elements s[len(s)-(j-i):len(s)]. If those elements contain pointers you might consider zeroing those elements so that objects they reference can be garbage collected.
Problem: zeroing is difficult
In order to properly prevent a memory leak, the burden is on the developer to:
- figure out that there is a risk (memory leak) that requires action on their part,
- determine if they should do some zeroing before calling Delete, or after calling Delete,
- properly compute the indices of the elements to be zeroed,
- figure out how to properly access the elements beyond the length of the returned slice.
This is error-prone.
When dealing with pointer zeroing, the Delete API is both hard to use and easy to misuse.
Cognitive burden
Incorrect code may be written in good faith in a context where several non-trivial concepts must be properly understood by the developers.
-
correct utilization: the developers need to understand what a slice is and how slice headers work, in order to understand the API design “pass a slice as argument, get a slice as return value”
- this is similar to the API of append
-
developers are implicitly expecting the stdlib to “do the right thing” (principle of least astonishment)
-
the perceived “right thing” is not universal: some want performance by default, some want correctness and safety by default
- objects not being GCed (memory leak) feels like a correctness/safety issue, but the consequences are mostly a performance issue.
-
-
even for skilled developers, the list of conditions (listed in previous section) to properly address the problem will too often not be fully met
-
the need for such manual workarounds partially defeats the purpose of having a stdlib function.
Suggested implementation
// Delete removes the elements s[i:j] from s, returning the modified slice.
// Delete panics if s[i:j] is not a valid slice of s.
// Delete is O(len(s)-i), so if many items must be deleted, it is better to
// make a single call deleting them all together than to delete one at a time.
func Delete[S ~[]E, E any](s S, i, j int) S {
_ = s[i:j] // bounds check
s = append(s[:i], s[j:]...)
clear(s[len(s):len(s)+j-i]) // zero/nil out the obsolete elements, for GC
return s
}
Impacted API
Same changes to these 4 5 functions:
- slices.Delete
- slices.DeleteFunc
- slices.Compact
- slices.CompactFunc
- (edit) slices.Replace
Backward compatibility
Changing the behavior of slices.Delete
is backward compatible. Its current documentation does not promise to zero or not zero the obsolete elements.
Cost
Runtime overhead of having slices.Delete
zero the obsolete elements, per this benchmark:
When deleting 1 element
~1% (this is a common use case)
When deleting a large range of elements
~30%
The GC work of "collecting" obsolete objects when they will become unreachable is expected to be very small, and is not specifically attributable to slices.Delete
(just normal GC work).
Pointers vs. Values
The proposal is mostly useful for slices whose elements are pointers, or contain pointers.
Should we try to keep the existing behaviour for slices of value elements (e.g. int, or a struct of values)?
Probably not, for 2 reasons:
- Principle of least astonishment: if we zero the pointers, we should zero the values as well.
- It is not possible to express in Go generic code “do this operation, but only if the type parameter E is a pointer or contains pointer”
List ownership
We could argue that any use of a subslice s[i:j]
always leaves elements not set to zero, before i
and after j
.
If slicing doesn’t do any zeroing, should Delete have to do it?
One major difference is that it makes sense to have various parts of a program use various slices (“views”, “windows”) on the same backing array. The parts need to be careful, as changes made by one may affect the other parts. Using append in such a context is subtle, with reallocation happening transparently. The ownership and responsibility are shared. An arbitrary number of agents only reading and slicing is always fine, not mutating the backing array contents.
Delete
is different because it is destructive by nature. No part of the program is supposed to make sense of “a slice where some elements were shifted, and some obsolete elements were left in place” by some other code. Only the slice returned by Delete
makes sense, and any other slice of the same backing array can be regarded as obsolete. The caller of Delete
gets de facto the ownership of the backing array.
What other languages do
Java
- ArrayList.remove() nulls out the tail element: src
- ArrayList.removeRange() nulls out the tail elements: src
C#
- List<T>.RemoveAt() nulls out the tail element src
- List<T>.RemoveRange() nulls out the tail elements: src
C++
- Vector.erase() does not null out the tail element: demo
Edit: precision about a[len(a):cap(a)]
It is out of scope of the current proposal to write anything to the original tail (the elements v17
, v18
, v19
in the diagram).
We must not assume ownership of the elements beyond the original length, and we must not zero them out. They might be used for legit purpose by other parts of the program, in theory.