Storing the same data in multiple containers

2021-04-10 by Rahul Singh

Here is a problem I'm sure we've all seen and solved easily before: How do you store a piece of data in multiple containers? An LRUCache is one real world example where this situation comes up. In an LRUCache, the cache key is stored in a map for constant time lookups by key, but also stored in a linked list so its easy to evict cache entries that are Least Recently Used (LRU) once the cache capacity is exceeded.

Storing a single piece of data (the cache key) in multiple containers (the map and the linkedlist) seems trivial, but there is more here than meets the eye when working with Rust. Lets take a concrete example with a data structure we'll call a DualIndexMap. A DualIndexMap is a simple data structure that indexes a key, value pair by both the key and the value. So we have to store the key (and the value) in both maps.

Its easy in a Garbage Collected Language

In a Garbage collected language like Go or Java its easy to build this data structure. Here is the Go version:

Hide CodeGithub
type Key struct {
    k int
}

type Val struct {
    v int
}

type DualIndexMap struct {
    kvMap map[Key]Val
    vkMap map[Val]Key
}

func NewDualIndexMap() *DualIndexMap {
    return &DualIndexMap {
        kvMap: make(map[Key]Val),
        vkMap: make(map[Val]Key),
    }
}

func (m *DualIndexMap) Put(key Key, val Val) {
    m.kvMap[key] = val
    m.vkMap[val] = key
}

func (m *DualIndexMap) GetByKey(key Key) Val {
    return m.kvMap[key]
}

func (m *DualIndexMap) GetByVal(val Val) Key {
    return m.vkMap[val]
}

So what happens if we try the same thing in Rust? As it turns out, it isn't as straightforward.

Hide CodeGithub
pub struct DualIndexMap<K, V> {
    kv_map: HashMap<K, V>,
    vk_map: HashMap<V, K>,
}

impl<K: Hash + Eq, V: Hash + Eq> DualIndexMap<K, V> {
    pub fn put(&mut self, key: K, val: V) {
        self.kv_map.insert(key, val);
        self.vk_map.insert(val, key);
    }
}

error[E0382]: use of moved value: `val`
--> src/maps/dualindexmap.rs:19:28
|
17 |     pub fn put(&mut self, key: K, val: V)
| --- move occurs because `val` has type `V`, which does not implement the `Copy` trait
18 |         self.kv_map.insert(key, val);
|                                    --- value moved here
19 |         self.vk_map.insert(val, key);
|                               ^^^ value used here after move

We get a compile error. The reason this doesn't work is that Rust enforces one very important principle: Strong ownership. In rust, a piece of data can only have one owner. This principle is violated in the above example because both maps are trying to take ownership of the same piece of data.

Here are a few ways to solve this problem:

1. Make a copy of the data with the Copy Trait

We can add trait bounds that constrain the keys and values our data structure will accept.

Hide CodeGithub

//K and V must implement the Copy trait
impl<K: Copy + Hash + Eq, V: Copy + Hash + Eq> DualIndexMap<K, V> {
    pub fn put(&mut self, key: K, val: V) {
        //Key and Value are moved into the kv_map
        self.kv_map.insert(key, val);
        //Key and Value are now copied into vk_map
        self.vk_map.insert(val, key);
    }
}

Here we're saying that key and value must implement the Copy trait (in addition to Hash and Eq). What this means is that when we try to hand over the key to the second map, it is automatically copied, creating a second independent piece of data (with its own memory allocated on the heap) for the vk_map to own. While this works it isn't a great solution if your key or value are large or you're dealing with lots of them. Its making two copies of each pair and thats clearly suboptimal.

2. Make a copy of the data with the Clone Trait

This is similar to the previous solution because it copies the data, but subtly different in that the copy is explicit because we constrain the key and value to implement the Clone trait, which in turn means that we have to call clone to do it.

Hide CodeGithub

//K and V must implement the Clone trait
impl<K: Clone + Hash + Eq, V: Clone + Hash + Eq> DualIndexMap<K, V> {
    pub fn put(&mut self, key: K, val: V) {
        //Key and Value are explicitly copied into kv_map using clone()
        self.kv_map.insert(key.clone(), val.clone());
        //Key and Value are moved into the vk_map
        self.vk_map.insert(val, key);
    }
}

Another subtle detail: Its important to first clone the key and value and hand those to the kv_map and before handing the original values to the vk_map. If you hand the arguments passed into the function to the first map, you won't be able to clone them because they will already have been moved out and hence unavailable to clone.

Both these solutions have two notable drawbacks: a) they make copies of the data and b) clients of your map cannot use your data structure for types that don't implement Copy or Clone. These two issues make it inefficient at best and impossible at worst to use in production code. The recommended approach is to use a smart pointer.

3. Use a Reference Counting Smart Pointer

The efficient way to solve this problem is to use a reference counting smart pointer. The Rust standard library provides Rc that does exactly what we need.

Hide CodeGithub

impl<K: Hash + Eq, V: Hash + Eq> DualIndexMap<K, V> {
    pub fn put(&mut self, key: K, val: V) {
        //Wrap the key and value in an Rc
        let k = Rc::new(key);
        let v = Rc::new(val);
        // Clone the Rc to increment the Ref count
        self.kv_map.insert(Rc::clone(&k), Rc::clone(&v));
        self.vk_map.insert(v, k);
    }
}

Here we give the key and the value to Rc which now owns it. Under the covers Rc allocates memory on the heap and stores the data there. Rc then maintains a reference count that is incremented when you clone it and decremented when you Drop it.

In our example above we first create new ref counted smart pointers - one for the key and one for the value - and then hand over a clone to the first map. The clone simply increments the ref count and doesn't actually copy any memory.

We then hand over the the original pointers to the second map. This approach is efficent because there exists only a single copy of key and value and the ownership rules aren't violated. Rc owns the data and clones of Rc all point to the same piece of memory. Each of the hashmaps then owns a copy of the Rc each of which points to the same piece of memory.

So anytime you encounter a situation where you need to store the same piece of data with multiple owners, use an Rc. However, note that this only works in single threaded environments. If you're working with multiple threads then take a look at Arc.

Fast Algorithms & Data Structures in Rust

privacy about

Copyright (c) 2021, Rahul Singh
Opinions expressed are solely my own and do not express the views or opinions of my employer