Sunday, January 12, 2020

Learning Rust - Day 5 - Common Collections and Error Handling

This entry captures the highlights of reading through chapter 8 (Common Collections) and chapter 9 (Error Handling) of The Book. It marks my fourth entry in the Learning Rust journey. For more entries like this, see the posts with the learning rust label.

I hang out in the beginners section of the Rust channel on Discord, and every now and then I see folks come around and say something like "I have been fighting with the borrow checker". I think that phrase came to life for me in this learning session. The last section of chapter 8 has 3 exercises, and while working on them I found myself in a situation where it felt like I was wrestling with the borrow checker! I know, I know, the borrow checker is just trying to save me from my own stupidity, but right now it feels more like a struggle! 😂 And I guess these early encounters with the borrow checker signify the beginning of my initiation rites towards becoming a Rustacean?

Anyway, in this post, as I usually do, I pen down some of the things that stood out for me.

The first thing was that + has a weird API when used for concatenating strings. It requires the second operand to be a borrow, while the first string is moved and becomes unusable after the concatenation.

let hello = String::from("Hello");
let world = String::from(" world");
let hello_world = hello + &world; // world has to be a borrow
println!("{}", hello_world);
// println!("{}", hello); // does not compile: hello was moved into hello_world

In the languages I have worked with previously, + has to be the most natural API for concatenating strings. I am not sure of the explanation for this unusual incarnation in Rust. It seems that, in order to get the more familiar behaviour, the format! macro should be used, which basically provides string interpolation functionality:

let hello = String::from("Hello");
let world = String::from(" world");
let hello_world = format!("{}-{}", hello, world);
println!("{}", hello_world);
println!("{}", hello); // still usable
println!("{}", world); // still usable
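
For what it is worth, the behaviour of + seems to follow from the signature of the method behind it, which the Book gives as fn add(self, s: &str) -> String: the left operand is moved in so its buffer can be reused, while the right operand is only read. Below is a minimal sketch of two ways to keep the left string usable without format!. The helper names join_with_clone and join_with_push are my own, purely for illustration.

```rust
// Hypothetical helpers for illustration; the names are not from the Book.

// Make an owned copy of the left operand so that `+` moves the copy,
// not the caller's string.
fn join_with_clone(left: &str, right: &str) -> String {
    left.to_owned() + right
}

// Build a fresh String and append both parts with push_str,
// which only borrows its argument.
fn join_with_push(left: &str, right: &str) -> String {
    let mut out = String::with_capacity(left.len() + right.len());
    out.push_str(left);
    out.push_str(right);
    out
}

fn main() {
    let hello = String::from("Hello");
    let world = String::from(" world");

    let a = join_with_clone(&hello, &world);
    let b = join_with_push(&hello, &world);

    println!("{}", a);
    println!("{}", b);
    // Both originals are still usable because nothing was moved.
    println!("{}{}", hello, world);
}
```

The trade-off is an extra allocation for the copy, which is exactly what the move in + avoids.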

I also learnt that strings in Rust do not support indexing, which means one cannot access an individual character in a string by referencing its index. That is, this would lead to a compile-time error:

let hello = String::from("Hello world");
println!("{}", hello[0])

This is because Rust stores a String as a vector of bytes, encoded as UTF-8. This means that a character can be represented by more than 1 byte, and in such a case, indexing that returns only a single byte would make no sense. The example given in the book shows this:

print!("{}", String::from("Здравствуйте").len());

The above might look like it is composed of 12 characters, and hence would be represented by Rust with a vector of 12 slots. In reality this is not the case: the string is backed by a vector 24 bytes long, so len() returns 24. Hence a facility for indexing into a string to return a character does not make sense.

What Rust provides instead is the ability to retrieve slices from a String. For example:

println!("{}", &hello[0..1])

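Note that the range is given in bytes, so a range that ends in the middle of a multi-byte character panics at runtime. There is also the ability to turn a String into Unicode scalar values (chars) or bytes, which can then be iterated over.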
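
To make the byte versus character distinction concrete, here is a small sketch using the same Russian greeting. The helper names byte_and_char_counts and first_chars are hypothetical ones I made up; only len(), chars(), bytes() and char_indices() come from the standard library.

```rust
// Count bytes and Unicode scalar values separately.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

// Hypothetical helper: return the first `n` chars of `s`.
// char_indices gives the byte offset of each char, so the slice
// always ends on a valid character boundary.
fn first_chars(s: &str, n: usize) -> &str {
    match s.char_indices().nth(n) {
        Some((byte_index, _)) => &s[..byte_index],
        None => s, // fewer than n chars: return the whole string
    }
}

fn main() {
    let greeting = "Здравствуйте";

    let (bytes, chars) = byte_and_char_counts(greeting);
    println!("{} bytes, {} chars", bytes, chars); // 24 bytes, 12 chars

    // Iterate over Unicode scalar values.
    for c in greeting.chars().take(3) {
        println!("{}", c);
    }

    // Iterate over the raw UTF-8 bytes.
    for b in greeting.bytes().take(4) {
        println!("{}", b);
    }

    println!("{}", first_chars(greeting, 2)); // Зд

    // &greeting[0..1] would panic at runtime here, because byte 1
    // is in the middle of the two-byte character 'З'.
}
```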

I also learnt that Rust has a thing called deref coercion, although it was not covered in these chapters. It will be covered in chapter 15.

While reading through the section on Hash Map, or_insert caught my attention. The form in which it appeared in the book is similar to the following:

use std::collections::HashMap;

let mut map: HashMap<&str, i32> = HashMap::new();
let r = map.entry("1").or_insert(0); // mutable reference to the stored value
*r += 10;
*r += 10;
println!("{}", map.get("1").unwrap()); // prints 20

I do not know enough Rust to know what is idiomatic or not, but I find the UX around the API a little odd. The fact that an operation that inserts a value for a key, when that key is absent, also returns a handle to the value which can be used to mutate it, feels overloaded. Also, one could assume that if map.entry("1").or_insert(0) returns the value which can later be mutated, then map.entry("1") should do the same, with the step of creating it when absent removed. But no, the API does not have that level of symmetry.
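
For comparison, HashMap does have a method that hands back a mutable reference without inserting anything: get_mut, which returns an Option<&mut V>. entry(...) on its own returns an Entry value describing the slot rather than the value itself. A small sketch of both, assuming made-up helper names word_counts and bump_if_present:

```rust
use std::collections::HashMap;

// Count words with entry().or_insert(): insert 0 when the word is
// absent, then increment through the returned mutable reference.
fn word_counts(text: &str) -> HashMap<&str, i32> {
    let mut counts = HashMap::new();
    for word in text.split_whitespace() {
        *counts.entry(word).or_insert(0) += 1;
    }
    counts
}

// Hypothetical helper: mutate the value only when the key already
// exists. Returns true if the key was found.
fn bump_if_present(map: &mut HashMap<&str, i32>, key: &str, by: i32) -> bool {
    match map.get_mut(key) {
        Some(value) => {
            *value += by;
            true
        }
        None => false, // nothing is inserted for a missing key
    }
}

fn main() {
    let mut counts = word_counts("hello world wonderful world");
    println!("{:?}", counts.get("world")); // Some(2)

    let found = bump_if_present(&mut counts, "hello", 10);
    let missing = bump_if_present(&mut counts, "absent", 10);
    println!("{} {}", found, missing); // true false
    println!("{:?}", counts.get("hello")); // Some(11)
}
```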

Also, I know mutation in Rust is safe because of the borrow checker, but I am still not sure that is a license to use mutation all over the place. It might be easier for the compiler to ensure that mutation does not lead to bugs, but I am not sure that level of mutation also makes it easier for a human to understand the logic in the code. I personally still have a preference for the functional programming approach of treating everything as immutable values and dealing with effects instead of mutation and side effects, as I think it also adds to the simplicity and elegance of the code. Fingers crossed, and mind always open: as I learn more and use Rust, I hope to understand its idiomatic use and philosophy.

While going through the chapters for this study session, I had to revisit some of the information provided in Chapter 4 regarding ownership, slice types etc. I realised there are some cogent points about ownership and borrowing that I did not include in my Learning Rust - Day 2 - Getting Acquainted with Ownership post, so I am including them here. They are just some laws that govern how ownership et al. work. The items in bold are a verbatim copy from the Rust book.

  • Each value in Rust has a variable that’s called its owner.
  • This variable can either be a stack variable, that is, it points to data that resides on the stack, or a heap variable, that is, it points to data on the heap.
  • When it points to the heap, you can see the variable as containing the pointer, length, capacity etc.: basically things you can see as ownership-related data.
  • There can only be one owner at a time.
  • When the owner goes out of scope, the value will be dropped.
  • When passing data on the heap to a function, instead of making a copy of the data, a pointer to the location of the data can be passed. This is a reference, and the act of creating one is called borrowing.
  • At any given time, you can have either one mutable reference or any number of immutable references. 
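
The rules above can be sketched in a few lines. This is only an illustration under my own naming (consume, measure and shout are hypothetical functions), not code from the Book.

```rust
// Takes ownership: the caller's variable is moved into `s`.
fn consume(s: String) -> usize {
    s.len()
} // `s` goes out of scope here and the String is dropped

// Borrows immutably: the caller keeps ownership.
fn measure(s: &str) -> usize {
    s.len()
}

// Borrows mutably: the caller keeps ownership, but the callee may modify.
fn shout(s: &mut String) {
    s.push('!');
}

fn main() {
    let mut greeting = String::from("hello");

    // Any number of immutable references may exist at the same time.
    let r1 = &greeting;
    let r2 = &greeting;
    println!("{} {}", measure(r1), measure(r2));

    // One mutable reference is allowed once r1 and r2 are no longer used.
    shout(&mut greeting);
    println!("{}", greeting);

    // Moving the value into a function ends this variable's ownership.
    let n = consume(greeting);
    println!("{}", n);
    // println!("{}", greeting); // would not compile: value was moved
}
```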


Turning to the Error Handling chapter, I learnt about the panic = 'abort' setting that can be added to the appropriate [profile] section of the Cargo.toml file. It has the effect of switching off the unwinding and memory clean up Rust does when a panic occurs; the memory then has to be freed by the operating system. Doing this has the added effect of reducing the size of the generated binary. And by the way, it seems panic is roughly Rust lingo for an exception, reserved for errors that are not meant to be recovered from.

I also learnt about the RUST_BACKTRACE environment variable, which can be used to list the full path of calls that led to the panic, including the portion of user code that triggered it. You can see this as just having a proper stack trace. Interestingly enough, this week I learnt this may be needed less often, as recent changes to the Rust compiler make panic messages point to the location in user code where the panicking function was called, rather than to core's internals.

Up until now, I had never encountered the return keyword. But I now know that Rust has one, and that it is mostly used to explicitly signal an early return out of a function, especially where there could be multiple return paths. Its usage is closely related to the ? operator, which seems to embody the spirit of the do notation in languages like Haskell and PureScript. So far, I have only seen how to use it with the Result type, but not the Option type.
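
To show how ? relates to return, here is a minimal sketch with the same function written both ways, plus a third function using ? on an Option, which is allowed inside a function that itself returns Option. All three function names are hypothetical and only there for illustration.

```rust
use std::num::ParseIntError;

// `?` returns early with the Err if parsing fails;
// otherwise it unwraps the Ok value.
fn parse_and_double(input: &str) -> Result<i32, ParseIntError> {
    let n = input.trim().parse::<i32>()?;
    Ok(n * 2)
}

// The same logic spelled out with match and an explicit return.
fn parse_and_double_verbose(input: &str) -> Result<i32, ParseIntError> {
    let n = match input.trim().parse::<i32>() {
        Ok(n) => n,
        Err(e) => return Err(e),
    };
    Ok(n * 2)
}

// On an Option, `?` returns None early when there is no value.
fn first_char_upper(input: &str) -> Option<char> {
    let c = input.chars().next()?;
    Some(c.to_ascii_uppercase())
}

fn main() {
    println!("{:?}", parse_and_double("21")); // Ok(42)
    println!("{}", parse_and_double("abc").is_err()); // true
    println!("{:?}", parse_and_double_verbose(" 4 ")); // Ok(8)
    println!("{:?}", first_char_upper("rust")); // Some('R')
    println!("{:?}", first_char_upper("")); // None
}
```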

As I mentioned in the beginning, I had to fight with the borrow checker while implementing the exercises listed at the end of chapter 8. But what I found astonishing was how many of these errors were lifetime related!

Here is a list of some of the errors I remembered to capture:
  • these two types are declared with different lifetimes...
  • ^^^ borrowed value does not live long enough
  • ^ expected lifetime parameter
  • help: consider giving it a 'static lifetime: `&'static`
I guess I will be learning a ton when I start the next chapter, which is about Generic Types, Traits... and you guessed it... Lifetimes!!! I can't wait!
