Heap Allocations
Heap allocations are moderately expensive. The exact details depend on which allocator is in use, but each allocation (and deallocation) typically involves acquiring a global lock, doing some non-trivial data structure manipulation, and possibly executing a system call. Small allocations are not necessarily cheaper than large allocations. It is worth understanding which Rust data structures and operations cause allocations, because avoiding them can greatly improve performance.
The Rust Container Cheat Sheet has visualizations of common Rust types, and is an excellent companion to the following sections.
Profiling
If a general-purpose profiler shows malloc, free, and related functions as
hot, then it is likely worth trying to reduce the allocation rate and/or using
an alternative allocator.
DHAT is an excellent profiler to use when reducing allocation rates. It works on Linux and some other Unixes. It precisely identifies hot allocation sites and their allocation rates. Exact results will vary, but experience with rustc has shown that reducing allocation rates by 10 allocations per million instructions executed can have measurable performance improvements (e.g. ~1%).
Here is some example output from DHAT.
```
AP 1.1/25 (2 children) {
  Total:     54,533,440 bytes (4.02%, 2,714.28/Minstr) in 458,839 blocks (7.72%, 22.84/Minstr), avg size 118.85 bytes, avg lifetime 1,127,259,403.64 instrs (5.61% of program duration)
  At t-gmax: 0 bytes (0%) in 0 blocks (0%), avg size 0 bytes
  At t-end:  0 bytes (0%) in 0 blocks (0%), avg size 0 bytes
  Reads:     15,993,012 bytes (0.29%, 796.02/Minstr), 0.29/byte
  Writes:    20,974,752 bytes (1.03%, 1,043.97/Minstr), 0.38/byte
  Allocated at {
    #1: 0x95CACC9: alloc (alloc.rs:72)
    #2: 0x95CACC9: alloc (alloc.rs:148)
    #3: 0x95CACC9: reserve_internal<syntax::tokenstream::TokenStream,alloc::alloc::Global> (raw_vec.rs:669)
    #4: 0x95CACC9: reserve<syntax::tokenstream::TokenStream,alloc::alloc::Global> (raw_vec.rs:492)
    #5: 0x95CACC9: reserve<syntax::tokenstream::TokenStream> (vec.rs:460)
    #6: 0x95CACC9: push<syntax::tokenstream::TokenStream> (vec.rs:989)
    #7: 0x95CACC9: parse_token_trees_until_close_delim (tokentrees.rs:27)
    #8: 0x95CACC9: syntax::parse::lexer::tokentrees::<impl syntax::parse::lexer::StringReader<'a>>::parse_token_tree (tokentrees.rs:81)
  }
}
```
It is beyond the scope of this book to describe everything in this example, but it should be clear that DHAT gives a wealth of information about allocations, such as where and how often they happen, how big they are, how long they live for, and how often they are accessed.
Box
Box is the simplest heap-allocated type. A Box<T> value is a T value
that is allocated on the heap.
It is sometimes worth boxing one or more fields of a struct or enum to make the type smaller. (See the Type Sizes chapter for more about this.)
Other than that, Box is straightforward and does not offer much scope for
optimizations.
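One such size-reduction case can be sketched as follows. The `Unboxed`/`Boxed` types here are hypothetical, chosen only to show the effect of boxing a large enum variant:

```rust
use std::mem::size_of;

// The large variant inflates every value of this enum.
enum Unboxed {
    Small(u8),
    Large([u8; 128]),
}

// Boxing the large variant keeps the enum small, at the cost of a
// heap allocation whenever `Large` is constructed.
enum Boxed {
    Small(u8),
    Large(Box<[u8; 128]>),
}

fn main() {
    // The boxed version is much smaller than the unboxed one.
    assert!(size_of::<Boxed>() < size_of::<Unboxed>());
    println!("{} vs {} bytes", size_of::<Unboxed>(), size_of::<Boxed>());
}
```

Whether this is a win depends on how often the large variant occurs; boxing trades per-value size for an allocation on each construction of that variant.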
Rc/Arc
Rc/Arc are similar to Box, but the value on the heap is accompanied by
two reference counts. They allow value sharing, which can be an effective way
to reduce memory usage.
However, if used for values that are rarely shared, they can increase allocation rates by heap allocating values that might otherwise not be heap-allocated. Example.
Unlike Box, calling clone on an Rc/Arc value does not involve an
allocation. Instead, it merely increments a reference count.
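This can be seen directly with `Rc` (the `Arc` behaviour is the same):

```rust
use std::rc::Rc;

fn main() {
    let a: Rc<Vec<u32>> = Rc::new(vec![1, 2, 3]); // allocates
    let b = Rc::clone(&a); // no allocation: just bumps the reference count
    assert_eq!(Rc::strong_count(&a), 2);
    assert!(Rc::ptr_eq(&a, &b)); // both point at the same heap value
}
```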
Vec
Vec is a heap-allocated type with a great deal of scope for optimizing the
number of allocations, and/or minimizing the amount of wasted space. To do this
requires understanding how its elements are stored.
A Vec contains three words: a length, a capacity, and a pointer. The pointer
will point to heap-allocated memory if the capacity is nonzero and the element
size is nonzero; otherwise, it will not point to allocated memory.
Even if the Vec itself is not heap-allocated, the elements (if present and
nonzero-sized) always will be. If nonzero-sized elements are present, the
memory holding those elements may be larger than necessary, providing space for
additional future elements. The number of elements present is the length, and
the number of elements that could be held without reallocating is the capacity.
When the vector needs to grow beyond its current capacity, the elements will be copied into a larger heap allocation, and the old heap allocation will be freed.
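The three-word representation and the length/capacity distinction can be observed directly:

```rust
use std::mem::size_of;

fn main() {
    // Three words: a pointer, a capacity, and a length.
    assert_eq!(size_of::<Vec<u32>>(), 3 * size_of::<usize>());

    let mut v: Vec<u32> = Vec::new();
    assert_eq!((v.len(), v.capacity()), (0, 0)); // no heap allocation yet

    v.push(1);
    assert!(v.capacity() >= v.len()); // capacity now reflects a heap allocation
}
```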
Vec Growth
A new, empty Vec created by common means (vec![], Vec::new, or
Vec::default) has a length and capacity of zero, and no
heap allocation is required. If you repeatedly push individual elements onto
the end of the Vec, it will periodically reallocate. The growth strategy is
not specified, but at the time of writing it uses a quasi-doubling strategy
resulting in the following capacities: 0, 4, 8, 16, 32, 64, and so on. (It
skips directly from 0 to 4, instead of going via 1 and 2, because this avoids
many allocations in practice.) As a vector grows, the frequency of
reallocations will decrease exponentially, but the amount of possibly-wasted
excess capacity will increase exponentially.
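The growth sequence can be observed by recording the capacity after each push. The exact values are unspecified and may change between Rust versions, so this sketch only checks that the recorded capacities increase:

```rust
fn main() {
    let mut v: Vec<u32> = Vec::new();
    let mut caps = vec![v.capacity()];
    for i in 0..100 {
        v.push(i);
        if v.capacity() != *caps.last().unwrap() {
            caps.push(v.capacity()); // record each reallocation
        }
    }
    // At the time of writing this prints [0, 4, 8, 16, 32, 64, 128]
    // for a u32 element type, but the sequence is not guaranteed.
    println!("{:?}", caps);
    assert!(caps.windows(2).all(|w| w[0] < w[1]));
}
```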
This growth strategy is typical for growable data structures and reasonable in
the general case, but if you know in advance the likely length of a vector you
can often do better. If you have a hot vector allocation site (e.g. a hot
Vec::push call), it is worth using eprintln! to print the vector length
at that site and then doing some post-processing (e.g. with counts) to
determine the length distribution. For example, you might have many short
vectors, or you might have a smaller number of very long vectors, and the best
way to optimize the allocation site will vary accordingly.
Short Vecs
If you have many short vectors, you can use the SmallVec type from the
smallvec crate. SmallVec<[T; N]> is a drop-in replacement for Vec that
can store N elements within the SmallVec itself, and then switches to a
heap allocation if the number of elements exceeds that. (Note also that
vec![] literals must be replaced with smallvec![] literals.)
Example 1,
Example 2.
SmallVec reliably reduces the allocation rate when used appropriately, but
its use does not guarantee improved performance. It is slightly slower than
Vec for normal operations because it must always check if the elements are
heap-allocated or not. Also, if N is high or T is large, then the
SmallVec<[T; N]> itself can be larger than Vec<T>, and copying of
SmallVec values will be slower. As always, benchmarking is required to
confirm that an optimization is effective.
If you have many short vectors and you precisely know their maximum length,
ArrayVec from the arrayvec crate is a better choice than SmallVec. It
does not require the fallback to heap allocation, which makes it a little
faster.
Example.
Longer Vecs
If you know the minimum or exact size of a vector, you can reserve a specific
capacity with Vec::with_capacity, Vec::reserve, or
Vec::reserve_exact. For example, if you know a vector will grow to have at
least 20 elements, these functions can immediately provide a vector with a
capacity of at least 20 using a single allocation, whereas pushing the items
one at a time would result in four allocations (for capacities of 4, 8, 16, and
32).
Example.
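The difference can be measured by counting capacity changes, each of which corresponds to a reallocation. The `count_reallocs` helper here is hypothetical, written just for this demonstration:

```rust
// Push 20 elements and count how many times the capacity changes.
fn count_reallocs(mut v: Vec<u32>) -> usize {
    let mut reallocs = 0;
    let mut cap = v.capacity();
    for i in 0..20 {
        v.push(i);
        if v.capacity() != cap {
            reallocs += 1;
            cap = v.capacity();
        }
    }
    reallocs
}

fn main() {
    // Pushing one element at a time reallocates several times...
    assert!(count_reallocs(Vec::new()) > 1);
    // ...whereas reserving up front means no reallocations in the loop.
    assert_eq!(count_reallocs(Vec::with_capacity(20)), 0);
}
```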
If you know the maximum length of a vector, the above functions also let you
not allocate excess space unnecessarily. Similarly, Vec::shrink_to_fit can be
used to minimize wasted space, but note that it may cause a reallocation.
String
A String contains heap-allocated bytes. The representation and operations of
String are very similar to those of Vec<u8>. Many Vec methods relating to
growth and capacity have equivalents for String, such as
String::with_capacity.
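For example, reserving capacity up front lets subsequent pushes proceed without reallocating:

```rust
fn main() {
    let mut s = String::with_capacity(10);
    let cap = s.capacity();
    s.push_str("0123456789"); // fits in the reserved space
    assert_eq!(s.capacity(), cap); // no reallocation occurred
}
```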
The SmallString type from the smallstr crate is similar to the SmallVec
type.
The String type from the smartstring crate is a drop-in replacement for
String that avoids heap allocations for strings with less than three words’
worth of characters. On 64-bit platforms, this is any string that is less than
24 bytes, which includes all strings containing 23 or fewer ASCII characters.
Example.
Note that the format! macro produces a String, which means it performs an
allocation. If you can avoid a format! call by using a string literal, that
will avoid this allocation.
Example.
std::format_args and/or the lazy_format crate may help with this.
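When a formatted string is built repeatedly, another std-only option (not mentioned above, but closely related) is to write into a single reusable buffer with std::fmt::Write, rather than allocating a fresh String with format! each time:

```rust
use std::fmt::Write;

fn main() {
    let mut buf = String::new();
    for i in 0..3 {
        buf.clear(); // empties the buffer but keeps its heap allocation
        write!(buf, "item {}", i).unwrap();
        assert_eq!(buf, format!("item {}", i));
    }
}
```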
Hash Tables
HashSet and HashMap are hash tables. Their representation and
operations are similar to those of Vec, in terms of allocations: they have
a single contiguous heap allocation, holding keys and values, which is
reallocated as necessary as the table grows. Many Vec methods relating to
growth and capacity have equivalents for HashSet/HashMap, such as
HashSet::with_capacity.
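As with Vec, reserving capacity up front avoids reallocating (and rehashing) as the table grows:

```rust
use std::collections::HashMap;

fn main() {
    let mut m: HashMap<u32, u32> = HashMap::with_capacity(100);
    let cap = m.capacity();
    for i in 0..100 {
        m.insert(i, i * 2);
    }
    assert_eq!(m.capacity(), cap); // no growth occurred
    assert_eq!(m.len(), 100);
}
```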
clone
Calling clone on a value that contains heap-allocated memory typically
involves additional allocations. For example, calling clone on a non-empty
Vec requires a new allocation for the elements (but note that the capacity of
the new Vec might not be the same as the capacity of the original Vec). The
exception is Rc/Arc, where a clone call just increments the reference
count.
clone_from is an alternative to clone. a.clone_from(&b) is equivalent
to a = b.clone() but may avoid unnecessary allocations. For example, if you
want to clone one Vec over the top of an existing Vec, the existing Vec’s
heap allocation will be reused if possible, as the following example shows.
```rust
let mut v1: Vec<u32> = Vec::with_capacity(99);
let v2: Vec<u32> = vec![1, 2, 3];
v1.clone_from(&v2); // v1's allocation is reused
assert_eq!(v1.capacity(), 99);
```
Although clone usually causes allocations, it is a reasonable thing to use in
many circumstances and can often make code simpler. Use profiling data to see
which clone calls are hot and worth taking the effort to avoid.
Sometimes Rust code ends up containing unnecessary clone calls, due to (a)
programmer error, or (b) changes in the code that render previously-necessary
clone calls unnecessary. If you see a hot clone call that does not seem
necessary, sometimes it can simply be removed.
Example 1,
Example 2,
Example 3.
to_owned
ToOwned::to_owned is implemented for many common types. It creates owned
data from borrowed data, usually by cloning, and therefore often causes heap
allocations. For example, it can be used to create a String from a &str.
Sometimes to_owned calls (and related calls such as clone and to_string)
can be avoided by storing a reference to borrowed data in a struct rather than
an owned copy. This requires lifetime annotations on the struct, complicating
the code, and should only be done when profiling and benchmarking shows that it
is worthwhile.
Example.
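A minimal sketch of the trade-off; the `OwnedConfig`/`BorrowedConfig` names are hypothetical:

```rust
// Owned copy: constructing this allocates and copies the string data.
struct OwnedConfig {
    name: String,
}

// Borrowed: no allocation, at the cost of a lifetime annotation.
struct BorrowedConfig<'a> {
    name: &'a str,
}

fn main() {
    let input = String::from("some configuration");
    let owned = OwnedConfig { name: input.to_owned() }; // allocates
    let borrowed = BorrowedConfig { name: &input };     // does not
    assert_eq!(owned.name, borrowed.name);
}
```

The borrowed version cannot outlive `input`, which is exactly the complication the lifetime annotation expresses.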
Cow
Sometimes code deals with a mixture of borrowed and owned data. Imagine a
vector of error messages, some of which are static string literals and some of
which are constructed with format!. The obvious representation is
Vec<String>, as the following example shows.
```rust
let mut errors: Vec<String> = vec![];
errors.push("something went wrong".to_string());
errors.push(format!("something went wrong on line {}", 100));
```
That requires a to_string call to promote the static string literal to a
String, which incurs an allocation.
Instead you can use the Cow type, which can hold either borrowed or owned
data. A borrowed value x is wrapped with Cow::Borrowed(x), and an owned
value y is wrapped with Cow::Owned(y). Cow also implements the From<T>
trait for various string, slice, and path types, so you can usually use into
as well. (Or Cow::from, which is longer but results in more readable code,
because it makes the type clearer.) The following example puts all this together.
```rust
use std::borrow::Cow;
let mut errors: Vec<Cow<'static, str>> = vec![];
errors.push(Cow::Borrowed("something went wrong"));
errors.push(Cow::Owned(format!("something went wrong on line {}", 100)));
errors.push(Cow::from("something else went wrong"));
errors.push(format!("something else went wrong on line {}", 101).into());
```
errors now holds a mixture of borrowed and owned data without requiring any
extra allocations. This example involves &str/String, but other pairings
such as &[T]/Vec<T> and &Path/PathBuf are also possible.
All of the above applies if the data is immutable. But Cow also allows
borrowed data to be promoted to owned data if it needs to be mutated.
Cow::to_mut will obtain a mutable reference to an owned value, cloning if
necessary. This is called “clone-on-write”, which is where the name Cow comes
from.
This clone-on-write behaviour is useful when you have some borrowed data, such
as a &str, that is mostly read-only but occasionally needs to be modified.
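The promotion from borrowed to owned can be observed directly:

```rust
use std::borrow::Cow;

fn main() {
    let mut msg: Cow<'static, str> = Cow::Borrowed("hello");
    assert!(matches!(msg, Cow::Borrowed(_))); // no allocation yet

    // to_mut clones the borrowed data into an owned String on first use.
    msg.to_mut().push_str(", world");
    assert!(matches!(msg, Cow::Owned(_)));
    assert_eq!(msg, "hello, world");
}
```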
Finally, because Cow implements Deref, you can call methods directly on
the data it encloses.
Cow can be fiddly to get working, but it is often worth the effort.
Reusing Collections
Sometimes you need to build up a collection such as a Vec in stages. It is
usually better to do this by modifying a single Vec than by building multiple
Vecs and then combining them.
For example, if you have a function do_stuff that produces a Vec that might
be called multiple times:
```rust
fn do_stuff(x: u32, y: u32) -> Vec<u32> {
    vec![x, y]
}
```
It might be better to instead modify a passed-in Vec:
```rust
fn do_stuff(x: u32, y: u32, vec: &mut Vec<u32>) {
    vec.push(x);
    vec.push(y);
}
```
Sometimes it is worth keeping around a “workhorse” collection that can be
reused. For example, if a Vec is needed for each iteration of a loop, you
could declare the Vec outside the loop, use it within the loop body, and then
call clear at the end of the loop body (to empty the Vec without affecting
its capacity). This avoids allocations at the cost of obscuring the fact that
each iteration’s usage of the Vec is unrelated to the others.
Example 1,
Example 2.
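The pattern described above can be sketched as:

```rust
fn main() {
    let mut v: Vec<u32> = Vec::new(); // declared once, outside the loop
    for i in 0..3 {
        for j in 0..10 {
            v.push(i * j);
        }
        assert_eq!(v.len(), 10);
        v.clear(); // empties the Vec without affecting its capacity
        assert_eq!(v.len(), 0);
        assert!(v.capacity() >= 10); // the allocation is reused next iteration
    }
}
```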
Similarly, it is sometimes worth keeping a workhorse collection within a struct, to be reused in one or more methods that are called repeatedly.
Reading Lines from a File
BufRead::lines makes it easy to read a file one line at a time:
```rust
use std::io::{self, BufRead};

fn blah() -> Result<(), io::Error> {
    fn process(_: &str) {}
    let lock = io::stdin().lock();
    for line in lock.lines() {
        process(&line?);
    }
    Ok(())
}
```
But the iterator it produces returns io::Result<String>, which means it
allocates for every line in the file.
An alternative is to use a workhorse String in a loop over
BufRead::read_line:
```rust
use std::io::{self, BufRead};

fn blah() -> Result<(), io::Error> {
    fn process(_: &str) {}
    let mut lock = io::stdin().lock();
    let mut line = String::new();
    while lock.read_line(&mut line)? != 0 {
        process(&line);
        line.clear();
    }
    Ok(())
}
```
This reduces the number of allocations to at most a handful, and possibly just
one. (The exact number depends on how many times line needs to be
reallocated, which depends on the distribution of line lengths in the file.)
This will only work if the loop body can operate on a &str, rather than a
String.
Using an Alternative Allocator
It is also possible to improve heap allocation performance without changing your code, simply by using a different allocator. See the Alternative Allocators section for details.
Avoiding Regressions
To ensure the number and/or size of allocations done by your code doesn’t increase unintentionally, you can use the heap usage testing feature of dhat-rs to write tests that check particular code snippets allocate the expected amount of heap memory.