Skip to content

slices: have Delete and others clear the tail #63393

Closed
@Deleplace

Description

@Deleplace

Proposal

a = slices.Delete(a, i, j)  should zero the (j - i) elements after the length of the returned slice

TL;DR zeroing the elements is error-prone, it should be taken care of by the stdlib function.

Chronology

2012  Go 1.0 released. It doesn’t have generics.

2022  Go 1.18 released. It has generics.

2022  The golang.org/x/exp/slices package has Delete.

2023  Delete is promoted to the new stdlib package slices.

Current implementation

slices.Delete achieves its job in-place by

  • shifting M=len(a)-j tail elements to the left, by copy
  • returning a slice of the original slice, having smaller or equal length

See its source.

Problem: memory

If the elements have a reference type (pointer, map, slice, chan, string), or contain fields having a reference type, then (j-i) elements at the end of the original slice are left untouched and keep referencing some objects. They will not be garbage collected, and they may enjoy a lifetime far exceeding the intent of the developer.

This is a memory leak.

Impact

Slices containing only value types are not impacted. Their memory footprint is the same, regardless of any zeroing.

Some programs will have a memory leak without noticing and without serious consequences, when the objects are few and small.

Some programs will have serious memory leaks, and find it difficult to properly diagnose. The consequences are crashes (OOM), increased GC pressure, overall bad performance, higher costs associated with more memory utilization and more server instances, and higher costs associated with debugging and fixing the leak.

Indirect impact

Some developers do want to address this and write zero values themselves (see documentation recommendation below). But they risk writing incorrect code, turning a mild performance concern into a serious correctness problem.

Partial solution: documentation

The documentation of slices.Delete explicitly states:

Delete might not modify the elements s[len(s)-(j-i):len(s)]. If those elements contain pointers you might consider zeroing those elements so that objects they reference can be garbage collected.

Problem: zeroing is difficult

In order to properly prevent a memory leak, the burden is on the developer to:

  • figure out that there is a risk (memory leak) that requires action on their part,
  • determine if they should do some zeroing before calling Delete, or after calling Delete,
  • properly compute the indices of the elements to be zeroed,
  • figure out how to properly access the elements beyond the length of the returned slice.

This is error-prone.

When dealing with pointer zeroing, the Delete API is both hard to use and easy to misuse.

Cognitive burden

Incorrect code may be written in good faith in a context where several non-trivial concepts must be properly understood by the developers.

  • correct utilization: the developers need to understand what a slice is and how slice headers work, in order to understand the API design “pass a slice as argument, get a slice as return value”

    • this is similar to the API of append
  • developers are implicitly expecting the stdlib to “do the right thing” (principle of least astonishment)

    • the perceived “right thing” is not universal: some want performance by default, some want correctness and safety by default

      • objects not being GCed (memory leak) feels like a correctness/safety issue, but the consequences are mostly a performance issue.
  • even for skilled developers, the list of conditions (listed in previous section) to properly address the problem will too often not be fully met

  • the need for such manual workarounds partially defeats the purpose of having a stdlib function.

Suggested implementation

// Delete removes the elements s[i:j] from s, returning the modified slice.
// Delete panics if s[i:j] is not a valid slice of s.
// Delete is O(len(s)-i), so if many items must be deleted, it is better to
// make a single call deleting them all together than to delete one at a time.
func Delete[S ~[]E, E any](s S, i, j int) S {
    _ = s[i:j] // bounds check

    s = append(s[:i], s[j:]...)
    clear(s[len(s):len(s)+j-i]) // zero/nil out the obsolete elements, for GC
    return s
}

Impacted API

Same changes to these 4 5 functions:

  • slices.Delete
  • slices.DeleteFunc
  • slices.Compact
  • slices.CompactFunc
  • (edit) slices.Replace

Backward compatibility

Changing the behavior of slices.Delete is backward compatible. Its current documentation does not promise to zero or not zero the obsolete elements.

Cost

Runtime overhead of having slices.Delete zero the obsolete elements, per this benchmark:

When deleting 1 element

~1%  (this is a common use case)

When deleting a large range of elements

~30%

The GC work of "collecting" obsolete objects when they will become unreachable is expected to be very small, and is not specifically attributable to slices.Delete (just normal GC work).

Pointers vs. Values

The proposal is mostly useful for slices whose elements are pointers, or contain pointers.

Should we try to keep the existing behaviour for slices of value elements (e.g. int, or a struct of values)?

Probably not, for 2 reasons:

  • Principle of least astonishment: if we zero the pointers, we should zero the values as well.
  • It is not possible to express in Go generic code “do this operation, but only if the type parameter E is a pointer or contains pointer”

List ownership

We could argue that any use of a subslice s[i:j] always leaves elements not set to zero, before i and after j.

If slicing doesn’t do any zeroing, should Delete have to do it?

One major difference is that it makes sense to have various parts of a program use various slices (“views”, “windows”) on the same backing array. The parts need to be careful, as changes made by one may affect the other parts. Using append in such a context is subtle, with reallocation happening transparently. The ownership and responsibility are shared. An arbitrary number of agents only reading and slicing is always fine, not mutating the backing array contents.

Delete is different because it is destructive by nature. No part of the program is supposed to make sense of “a slice where some elements were shifted, and some obsolete elements were left in place” by some other code. Only the slice returned by Delete makes sense, and any other slice of the same backing array can be regarded as obsolete. The caller of Delete gets de facto the ownership of the backing array.

What other languages do

Java

  • ArrayList.remove() nulls out the tail element: src
  • ArrayList.removeRange() nulls out the tail elements: src

C#

  • List<T>.RemoveAt() nulls out the tail element src
  • List<T>.RemoveRange() nulls out the tail elements: src

C++

  • Vector.erase() does not null out the tail element: demo

Edit: precision about a[len(a):cap(a)]

It is out of scope of the current proposal to write anything to the original tail (the elements v17, v18, v19 in the diagram).

We must not assume ownership of the elements beyond the original length, and we must not zero them out. They might be used for legit purpose by other parts of the program, in theory.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions