Sunday, March 01, 2020

Learning Rust - Day 10 - Smart Pointers

This is the 10th entry of my learning Rust journal...

It captures some of the learning points while going through chapter 15 of the Rust Book. You can read other posts in this series by following the label learning rust.

I enjoyed reading this chapter. I found it particularly interesting because it covers concepts I usually do not need to think about in the other programming languages I have used. It also allowed me to invalidate some wrong assumptions I had picked up along the way and to consolidate some of the concepts I have been learning.

One of the assumptions I had, which I found out was wrong while going through this chapter, has to do with Stack vs Heap. For some strange reason, I had thought that structs and enums are always on the heap. I suspect my familiarity with Java is to blame for this wrong assumption: if you squint hard enough, a struct looks like an object in Java, where using the new keyword always means allocating memory on the heap. But this is not the case in Rust. A struct or an enum does not automatically mean heap memory allocation.



An outcome of this re-evaluation of what goes on the stack or the heap is the realisation that, even though it might be convenient to talk about the stack and the heap as two distinct locations or distinct kinds of memory, this is actually not the case. Stack and heap are the same in the sense that both are addressable memory that can be used to store values. The only difference is that, with the stack, allocating, copying and deallocating is done in tandem with the stack frame lifecycle. With the heap, allocation, copying and deallocation have to be handled some other way. In a language like Java, the GC takes care of this.
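To make this concrete for myself, here is a minimal sketch (the `Point` struct and the function name are made up for illustration). It shows that a struct is stored wherever its owner puts it: inline in the stack frame by default, and on the heap only when it is wrapped in something like `Box`.

```rust
use std::mem::size_of;

// Hypothetical struct, used only for illustration.
struct Point {
    x: i64,
    y: i64,
}

// Builds one Point on the stack and one on the heap, and sums the fields of each.
fn sum_stack_and_heap() -> (i64, i64) {
    // Stored inline in this function's stack frame: no heap allocation is requested.
    let on_stack = Point { x: 1, y: 2 };
    // Box::new moves the value to the heap; the stack only holds a pointer to it.
    let on_heap = Box::new(Point { x: 3, y: 4 });
    (on_stack.x + on_stack.y, on_heap.x + on_heap.y)
}

fn main() {
    // The struct itself is two i64 fields.
    println!("Point: {} bytes", size_of::<Point>());
    // The Box is just a pointer, whatever the size of the value behind it.
    println!("Box<Point>: {} bytes", size_of::<Box<Point>>());
    println!("{:?}", sum_stack_and_heap());
}
```

Both values are cleaned up when `sum_stack_and_heap` returns; for the boxed one, that is when the heap memory is released.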

I also took the time to go over the ownership rules in Rust. I captured my summary of this in another post: Rust Ownership Rules.

This chapter on Smart pointers made me revisit the post on ownership in Rust: Learning Rust - Day 2 - Getting Acquainted with Ownership and it helped me to further ground my understanding of pointers in general.  The mental model I now hold of the idea can be described as follows:

The very first thing one learns when learning how to program is the idea of variables and the fact that they hold values. This means we can retrieve the values from variables. Another thing that can be retrieved from a variable is the memory location of the value it contains. This might not be a thing when working with languages with a managed environment.

This retrieval of the memory location, in Rust, is done by using the &. Hence if the memory address needs to be retrieved from a variable, the variable is prepended with a &. If not, the value would be retrieved.

You now have a pointer if this memory address is put into another variable. Hence a pointer is a variable that holds the memory address of a value. In Rust, we have references, both mutable and immutable. It can then be seen that a reference is a pointer that borrows the value it points to.

Basically, having &var gets a shared reference, which is then stored in a variable. The variable is the pointer. Also having &mut var gets a mutable reference, which can then be stored in a variable. Again, such a variable is a pointer.

If one then has such a variable that holds a memory address, i.e. a pointer, the actual value that the pointer points to can still be retrieved. This is done by dereferencing the pointer. So given a var_pointer which was populated by assigning the result of &var to it, the original value in var can be accessed via *var_pointer.
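The mental model above can be sketched in a few lines. The function names here are my own, chosen for illustration; the `{:p}` format specifier prints the address a reference holds.

```rust
// Reads the value behind a shared reference by dereferencing it.
fn read_through(pointer: &i32) -> i32 {
    *pointer
}

// Changes the value behind a mutable reference by dereferencing it.
fn add_one(pointer: &mut i32) {
    *pointer += 1;
}

fn main() {
    let var = 5;
    // &var produces a shared reference; var_pointer is the pointer.
    let var_pointer = &var;
    // {:p} shows the memory address the reference holds.
    println!("var lives at {:p}", var_pointer);
    // *var_pointer follows the address back to the value.
    assert_eq!(*var_pointer, 5);
    assert_eq!(read_through(var_pointer), 5);

    let mut counter = 10;
    // &mut counter produces a mutable reference.
    add_one(&mut counter);
    assert_eq!(counter, 11);
}
```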

Smart pointers can be seen as another form of pointer (apart from references). One of the main differences between a reference and a smart pointer is that smart pointers usually own the values they point to. They also have metadata (such as their capacity) and extra guarantees (such as String ensuring its data is always valid UTF-8). References do not have ownership and also do not possess extra metadata and guarantees.
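A small sketch of the ownership and metadata points, using String as the smart pointer. The helper names are hypothetical, picked just for this example.

```rust
// Returns the length and capacity metadata that a String carries with its data.
fn length_and_capacity(text: &str) -> (usize, usize) {
    let owned = String::from(text);
    (owned.len(), owned.capacity())
}

// Takes ownership of the String; its memory is freed when this function returns.
fn consume(owned: String) -> usize {
    owned.len()
}

fn main() {
    let owned = String::from("hello");
    // A reference only borrows the data; owned is still the owner.
    let borrowed: &str = &owned;
    println!("{} has length {}", borrowed, borrowed.len());

    let (len, capacity) = length_and_capacity("hello");
    println!("len = {}, capacity = {}", len, capacity);

    // Ownership moves into consume; using owned after this line would not compile.
    println!("{}", consume(owned));
}
```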

I suspect that this mental model is not 100% accurate, as I later found out that a custom smart pointer can be implemented via a struct that implements the Deref trait. I still do not have clarity on this, but will update once I do.

Some of the key points I picked up while reading this chapter include:

  • Box<T> is a smart pointer that ensures values are placed on the Heap
  • A smart pointer like Box<T> is mostly useful when dealing with values whose size cannot be determined at compile time. A common example of this is recursively defined data types.
  • A custom pointer can be implemented with a struct. Such a struct needs to implement the Deref trait in order to support the dereferencing mechanism.
  • Deref coercion is a convenience that Rust performs on arguments to functions and methods. It allows dereferencing to be applied repeatedly until the type that matches the parameter of the function/method is found.
  • The Drop trait, when implemented for a smart pointer, allows for greater control of the memory deallocation process. The code in its drop method is called whenever the value goes out of scope.
  • The method defined in the Drop trait cannot be called directly. Hence, if we need to force a value to be cleaned up early, the std::mem::drop function is used.
  • The Rc<T> smart pointer allows for multiple ownership of a value. Think of a node in a graph that can be owned by multiple edges.
  • The RefCell<T> smart pointer helps in implementing the Interior Mutability Pattern, which covers situations where it is useful for a value to mutate itself in its methods while appearing immutable to other code.
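To illustrate the point about recursively defined data types, here is a minimal sketch of a cons list along the lines of the one used in the chapter. The `sum` helper is my own addition. Without the Box, the compiler cannot work out how much space a `List` needs, because a `List` would contain another `List` directly.

```rust
// A recursive type: Box gives the Cons variant a known, pointer-sized field.
enum List {
    Cons(i32, Box<List>),
    Nil,
}

// Hypothetical helper that walks the list and adds up its values.
fn sum(list: &List) -> i32 {
    match list {
        // rest is a &Box<List>; deref coercion turns it into the &List that sum expects.
        List::Cons(value, rest) => value + sum(rest),
        List::Nil => 0,
    }
}

fn main() {
    let list = List::Cons(
        1,
        Box::new(List::Cons(2, Box::new(List::Cons(3, Box::new(List::Nil))))),
    );
    println!("sum = {}", sum(&list));
}
```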

Code example for creating a custom smart pointer

use std::fmt::Display;
use std::ops::Deref;

fn main() {
    // smart pointer via struct
    struct MyBox<T: Display>(T);

    impl<T: Display> MyBox<T> {
        fn new(x: T) -> MyBox<T> {
            MyBox(x)
        }
    }

    // implementing Deref
    impl<T: Display> Deref for MyBox<T> {
        type Target = T;

        fn deref(&self) -> &T {
            &self.0
        }
    }

    // implementing Drop
    impl<T: Display> Drop for MyBox<T> {
        fn drop(&mut self) {
            println!("Dropping MyBox with data `{}`!", self.0);
        }
    }

    // create a pointer that is based
    // on the custom smart pointer
    let y = MyBox::new(5);
    // Dereference
    // the smart pointer
    assert_eq!(5, *y);
    // y goes out of scope here, so drop runs and prints its message
}
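The MyBox example above only shows drop running at the end of the scope. To see the early cleanup mentioned in the key points, here is a minimal sketch. The `Tracked` type and its shared log are hypothetical, made up so that the order of drops can be observed.

```rust
use std::cell::RefCell;

// Hypothetical type that records its name in a shared log when it is dropped.
struct Tracked<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl<'a> Drop for Tracked<'a> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

// Returns the order in which events were recorded.
fn drop_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let first = Tracked { name: "first", log: &log };
        let _second = Tracked { name: "second", log: &log };
        // Calling first.drop() directly is rejected by the compiler.
        // std::mem::drop (in the prelude as drop) takes ownership, so cleanup runs now.
        drop(first);
        log.borrow_mut().push("end of scope");
    } // _second goes out of scope here and is dropped automatically
    log.into_inner()
}

fn main() {
    // prints ["first", "end of scope", "second"]
    println!("{:?}", drop_order());
}
```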

Code example for showing Deref coercion

Given the code above that defines a custom smart pointer, we can have

// A function defined to take a string slice (&str)
fn hello(name: &str) {
   println!("Hello, {}!", name);
}

// but can be called with our smart pointer
// thanks to Deref coercion
let m = MyBox::new(String::from("Rust"));
hello(&m);
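To see what deref coercion saves us from writing, here is a self-contained sketch. It uses a simplified MyBox (without the Display bound from the earlier listing) and two hypothetical helper functions that produce the same greeting, once with coercion and once spelled out by hand.

```rust
use std::ops::Deref;

// Simplified version of the custom smart pointer, for illustration only.
struct MyBox<T>(T);

impl<T> Deref for MyBox<T> {
    type Target = T;

    fn deref(&self) -> &T {
        &self.0
    }
}

fn greeting(name: &str) -> String {
    format!("Hello, {}!", name)
}

// With deref coercion: &MyBox<String> becomes &String, which becomes &str.
fn greet_coerced(m: &MyBox<String>) -> String {
    greeting(m)
}

// Without deref coercion: dereference to the String, then slice it to get a &str.
fn greet_explicit(m: &MyBox<String>) -> String {
    greeting(&(**m)[..])
}

fn main() {
    let m = MyBox(String::from("Rust"));
    assert_eq!(greet_coerced(&m), greet_explicit(&m));
    println!("{}", greet_coerced(&m));
}
```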

Code example listing different kinds of pointers


let value = 5;

// a pointer based on reference
let pointer_1 = &value;

// a pointer based on Box, which is a smart pointer
let pointer_2 = Box::new(5);

// a pointer based on Rc, which is a smart pointer
use std::rc::Rc;
let pointer_3 = Rc::new(5);

// a RefCell, which wraps a value for interior mutability
use std::cell::RefCell;
let pointer_4 = RefCell::new(5);

// a pointer based on a custom smart pointer
let pointer_5 = MyBox::new(5);

// with a function defined:
fn hello_deref(name: &i32) {
    println!("Hello, {}!", name);
}

// it can be used with the pointers that implement Deref,
// thanks to deref coercion
hello_deref(pointer_1);
hello_deref(&pointer_2);
hello_deref(&pointer_3);
hello_deref(&pointer_5);

// RefCell<T> does not implement Deref itself;
// borrow() returns a Ref<T>, which does
hello_deref(&pointer_4.borrow());
