Saturday, August 13, 2022

IntoIterator and the for ... in Syntax in Rust

In the post "Rust Iterator pattern with iter(), into_iter() and iter_mut() methods", I explained why attempting to use a variable holding a Vec after iterating through it using the for … in syntax leads to a compilation error.

The post explains why the following code won't compile:

fn main() {
    let some_ints = vec![1, 2, 3, 4, 5];
    // iterating through a vec
    for i in some_ints {
        dbg!(i);
    }
    // attempting to use the vec after the iteration
    // will lead to a compile error
    dbg!(some_ints);
}

I then showed 3 methods that can be called on the Vec before iterating with for … in, and how 2 of these methods allow the Vec to still be used after the iteration.

These 3 methods are into_iter(), iter(), and iter_mut(). That is:

#[test]
fn into_iter_demo() {
    let v1 = vec![1, 2, 3];
    // the .into_iter() method creates an iterator, v1_iter,
    // which takes ownership of the values being iterated.
    let mut v1_iter = v1.into_iter();

    assert_eq!(v1_iter.next(), Some(1));
    assert_eq!(v1_iter.next(), Some(2));
    assert_eq!(v1_iter.next(), Some(3));
    assert_eq!(v1_iter.next(), None);

    // If the line below is uncommented, the code won't compile anymore.
    // This is because v1 can no longer be used once into_iter()
    // has moved ownership of it into the iterator.
    //dbg!(v1);
}

The two other methods that allow the Vec to still be used after iteration via for … in are:

#[test]
fn iter_demo() {
    let v1 = vec![1, 2, 3];
    // the .iter() method creates an iterator, v1_iter,
    // which borrows the values immutably
    let mut v1_iter = v1.iter();

    // iter() yields references to the elements, hence Some(&1) and not Some(1)
    assert_eq!(v1_iter.next(), Some(&1));
    assert_eq!(v1_iter.next(), Some(&2));
    assert_eq!(v1_iter.next(), Some(&3));
    assert_eq!(v1_iter.next(), None);
    // because the values were borrowed immutably,
    // it is still possible to use
    // the vec after the iteration is done
    dbg!(v1);
}

And

#[test]
fn iter_mut_demo() {
    let mut v1 = vec![1, 2, 3];

    // the .iter_mut() method creates an iterator, 
    // v1_iter which borrows value and can mutate it. 
    let mut v1_iter = v1.iter_mut();

    // access the first item and multiply it by 2
    let item1 = v1_iter.next().unwrap();
    *item1 = *item1 * 2;

    // access the second item and multiply it by 2
    let item2 = v1_iter.next().unwrap();
    *item2 = *item2 * 2;

    // access the third item and multiply it by 2
    let item3 = v1_iter.next().unwrap();
    *item3 = *item3 * 2;

    // end of the iteration
    assert_eq!(v1_iter.next(), None);

    // this will print out [2,4,6]
    dbg!(v1);
}

In this post, we are going to dive a little bit deeper into understanding some of the machinery that makes the above work.

We start again by talking about the Iterator trait.

Iterator pattern and Iterator trait.

An Iterator represents the ability to retrieve elements from another data structure in sequence. In Rust, it is any data structure that implements the Iterator trait.
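To make the idea of "a data structure that implements the Iterator trait" more concrete, here is a minimal sketch. The Countdown type is hypothetical and invented purely for illustration; the only thing taken from the standard library is the Iterator trait itself, whose single required method is next().

```rust
// Hypothetical type for illustration: counts down from `remaining` to 1.
struct Countdown {
    remaining: u32,
}

impl Iterator for Countdown {
    // The type of value produced by each call to next().
    type Item = u32;

    // next() is the only method an implementor has to provide.
    fn next(&mut self) -> Option<Self::Item> {
        if self.remaining == 0 {
            None // the sequence is exhausted
        } else {
            let current = self.remaining;
            self.remaining -= 1;
            Some(current)
        }
    }
}

fn main() {
    let mut countdown = Countdown { remaining: 3 };
    assert_eq!(countdown.next(), Some(3));
    assert_eq!(countdown.next(), Some(2));
    assert_eq!(countdown.next(), Some(1));
    assert_eq!(countdown.next(), None);

    // Adapters such as map() and collect() come for free with the trait.
    let doubled: Vec<u32> = Countdown { remaining: 3 }.map(|n| n * 2).collect();
    assert_eq!(doubled, vec![6, 4, 2]);
}
```

Once next() is defined, the rest of the iteration-related methods are provided by the trait's default implementations.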

It is important to note that the Vec data structure by itself is not an Iterator and hence cannot be iterated.

To make this more obvious, let us forget about the for … in syntax for a second, and try to perform an operation that should be possible on a data structure that supports being iterated.

An example of such an operation is the for_each method.

In the code example below, we attempt to loop directly over a Vec of numbers using for_each and print each item. The code won't compile:

fn main() {
   let some_ints = vec![1,2,3];
   // calling for_each directly on a Vec won't compile
   some_ints.for_each(|item| {
       dbg!(item);
   });
}

The compile error below gives us a clue as to why the code does not compile:

error[E0599]: `Vec<{integer}>` is not an iterator
  --> src/main.rs:60:14
   |
60 |      some_ints.for_each(|item| {
   |                ^^^^^^^^ `Vec<{integer}>` is not an iterator; try calling `.into_iter()` or `.iter()`
   |
   = note: the following trait bounds were not satisfied:
           `Vec<{integer}>: Iterator`
           which is required by `&mut Vec<{integer}>: Iterator`
           `[{integer}]: Iterator`
           which is required by `&mut [{integer}]: Iterator`

For more information about this error, try `rustc --explain E0599`.
error: could not compile `playground` due to a previous error

The compile error contains the line:

Vec<{integer}> is not an iterator; try calling .into_iter() or .iter()

This proves the point that a data structure like Vec is not, by itself, an iterator.

But instead of using the for_each method, we can actually perform an iteration directly on the Vec using the for … in syntax.

For example, the following code compiles and runs as expected:

fn main() {
   let some_ints = vec![1,2,3];
   for item in some_ints {
       dbg!(item);
   }
}

What gives?!

Did we not just prove that a Vec is not an Iterator by itself?

We even showed this by calling a method that should work on an iterator and confirming that it fails. Yet here we are, able to iterate over something that is not an Iterator using the for … in syntax.

How is that possible?

To understand why this works, we need to look into another trait called IntoIterator.

What is the IntoIterator trait?

The IntoIterator is a trait that specifies how a data structure can be converted into an Iterator. The basic structure of the trait looks like this:

pub trait IntoIterator {
    type Item;
    type IntoIter: Iterator<Item = Self::Item>;

    fn into_iter(self) -> Self::IntoIter;
}

As can be seen, the main method specified by the trait is into_iter(), and the result of calling it is an Iterator. That is, the return type Self::IntoIter is an associated type that is constrained to implement Iterator and to yield values of type Self::Item. This is what the bound on IntoIter in the trait above means.

So any data structure that is not by itself an iterator can define how it is transformed into one by implementing IntoIterator.
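As a rough sketch of what such an implementation can look like, consider the example below. The Playlist type and the titles_in_caps function are hypothetical names made up for this illustration. The implementation simply hands out the owning iterator of the Vec it wraps (std::vec::IntoIter), which is one possible design among several.

```rust
// Hypothetical collection type, used only for illustration.
struct Playlist {
    songs: Vec<String>,
}

impl IntoIterator for Playlist {
    type Item = String;
    // Reuse the owning iterator that Vec already provides.
    type IntoIter = std::vec::IntoIter<String>;

    fn into_iter(self) -> Self::IntoIter {
        self.songs.into_iter()
    }
}

// Playlist itself has no map() method; the iterator returned by into_iter() does.
fn titles_in_caps(playlist: Playlist) -> Vec<String> {
    playlist.into_iter().map(|song| song.to_uppercase()).collect()
}

fn main() {
    let playlist = Playlist {
        songs: vec!["intro".to_string(), "outro".to_string()],
    };
    assert_eq!(titles_in_caps(playlist), vec!["INTRO", "OUTRO"]);
}
```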

The Vec data structure defined in the Rust standard library implements IntoIterator, which means it has the method into_iter(), which returns an Iterator when called.

To see this in action, let's go back to the code that did not compile, where we directly called for_each on a Vec, but instead of calling for_each directly, we first call into_iter(), before calling for_each.

fn main() {
   let some_ints = vec![1,2,3];
   // first calling into_iter() works
   some_ints.into_iter().for_each(|item| {
       dbg!(item);
   });
}

This works, because the first call to into_iter() returns an Iterator, which allows iterating over the underlying Vec.

So how does this help answer why it is possible to use for … in directly on a Vec without first turning it into an Iterator by calling into_iter?

Well, the answer is that when the for … in syntax is used, the compiler automagically first calls into_iter(), getting an Iterator back and using that to do the iteration.

According to the documentation, the for … in syntax actually desugars to something like this:

let values = vec![1, 2, 3, 4, 5];
{
    let result = match IntoIterator::into_iter(values) {
        mut iter => loop {
            let next;
            match iter.next() {
                Some(val) => next = val,
                None => break,
            };
            let x = next;
            let () = { println!("{x}"); };
        },
    };
    result
}

Here, into_iter is first called on the Vec value, and then next() is called repeatedly until it returns None, signifying the end of the iteration.

So the into_iter method, which is part of the IntoIterator trait, explains how the for … in syntax can be used for iterating over a Vec: the compiler calls into_iter whenever the for … in syntax is used.
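A simplified way to convince oneself of this is to write the same loop twice, once with for … in and once by hand. The sketch below is a reduced version of the desugaring, not the exact code the compiler generates, and the function names sum_with_for and sum_by_hand are made up for this example.

```rust
// Sums the values with an ordinary for … in loop.
fn sum_with_for(values: Vec<i32>) -> i32 {
    let mut total = 0;
    for x in values {
        total += x;
    }
    total
}

// Does the same work by hand: into_iter() once, then next() until None.
fn sum_by_hand(values: Vec<i32>) -> i32 {
    let mut total = 0;
    let mut iter = IntoIterator::into_iter(values);
    loop {
        match iter.next() {
            Some(x) => total += x,
            None => break,
        }
    }
    total
}

fn main() {
    assert_eq!(sum_with_for(vec![1, 2, 3, 4, 5]), 15);
    assert_eq!(sum_by_hand(vec![1, 2, 3, 4, 5]), 15);
}
```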

But what about the other two similar methods that we saw at the beginning of this post? That is iter() and iter_mut().

It is also possible to use these two methods to turn a Vec into an iterator.

That is:

fn main() {
   let some_ints = vec![1,2,3];
   // first calling iter() works
   some_ints.iter().for_each(|item| {
       dbg!(item);
   });
}

and

fn main() {
   let mut some_ints = vec![1,2,3];
   some_ints.iter_mut().for_each(|item| {
       dbg!(item);
   });
}

What is going on when iter_mut() and iter() are used? And how is this different from the into_iter() that comes from the IntoIterator trait?

3 different kinds of iteration

Iterators can come in different forms. Nothing is stopping a developer from implementing an Iterator that has other custom behavior that defines how it iterates.

In Rust's standard library, most collections have 3 different kinds of Iterators. We can have one which takes ownership of the value being iterated, one that borrows the value immutably, and another that borrows the value and can mutate it.

An iterator that takes ownership can be created by calling into_iter(), one that borrows immutably can be created by calling iter(), and one that borrows the values with the ability to mutate them can be created by calling iter_mut(). This is the crux of the Rust Iterator pattern with iter(), into_iter() and iter_mut() methods post.

It turns out that the Rust compiler by default goes for the into_iter() version when it de-sugars the for … in syntax.
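The difference between the three kinds of iterators shows up in the type of the items they produce. The following sketch illustrates this on a Vec<i32>; the function name three_kinds is hypothetical, and the closure parameters are annotated only to make the item types visible.

```rust
fn three_kinds() -> (i32, Vec<i32>, Vec<i32>) {
    let mut some_ints = vec![1, 2, 3];

    // iter() borrows immutably: every item is a &i32.
    let sum: i32 = some_ints.iter().sum();

    // iter_mut() borrows mutably: every item is a &mut i32.
    some_ints.iter_mut().for_each(|item: &mut i32| *item *= 10);
    let after_mutation = some_ints.clone();

    // into_iter() takes ownership: every item is an i32 and some_ints is moved.
    let owned: Vec<i32> = some_ints.into_iter().map(|item: i32| item + 1).collect();

    (sum, after_mutation, owned)
}

fn main() {
    let (sum, after_mutation, owned) = three_kinds();
    assert_eq!(sum, 6);
    assert_eq!(after_mutation, vec![10, 20, 30]);
    assert_eq!(owned, vec![11, 21, 31]);
}
```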

One important thing to point out here is that what the for … in syntax needs is an implementation of IntoIterator. A custom data structure that implements neither Iterator nor IntoIterator cannot be used with the for … in syntax, because there is no into_iter() method for the syntax to call. A data structure that implements Iterator itself, i.e. has all the familiar iteration-related methods such as map and for_each, does not have this problem: the standard library implements IntoIterator for every Iterator, so it always works with for … in.

Another interesting point is what happens if we manually call any of into_iter(), iter(), or iter_mut() ourselves in the for … in syntax, which is basically what was shown in the Rust Iterator pattern with iter(), into_iter() and iter_mut() methods post.

How come these work:

fn main() {
    let mut some_ints = vec![1, 2, 3];
    // manually calling iter() in a for … in
    for i in some_ints.iter() {
        dbg!(i);
    }

    // manually calling iter_mut() in a for … in
    for i in some_ints.iter_mut() {
        dbg!(i);
    }

    // manually calling into_iter() in a for … in
    for i in some_ints.into_iter() {
        dbg!(i);
    }
}

We are manually converting the Vec ourselves to an iterator by calling iter(), iter_mut(), and into_iter() and yet it works.

Why does this work?

Was it not already stated that the for … in syntax works with anything that implements IntoIterator, because that is what allows it to call into_iter()? Yet here we are, manually converting the Vec into an iterator ourselves, and for … in still works. How come?

The answer lies in a little trick in the standard library, which contains this implementation of IntoIterator:

impl<I: Iterator> IntoIterator for I

This basically means any Iterator implements IntoIterator and the implementation is such that the Iterator returns itself when into_iter() is called. Which makes sense if you think about it. If something is already an Iterator, what else can be done when you attempt to turn it again into an Iterator other than it returning itself?

And this is what happens. Even though the iter(), into_iter() and iter_mut() methods are called directly, for … in still works, because the iterator created by calling any of these methods automatically has an implementation of IntoIterator that returns the iterator itself, which is what the for … in syntax needs.
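The same mechanism applies to iterators defined outside the standard library. In the sketch below, Evens is a hypothetical iterator written only for this illustration. It implements Iterator and nothing else, yet it can be used with for … in, and calling into_iter() on it gives back a value of the very same type.

```rust
// Hypothetical iterator for illustration: yields 0, 2, 4, ... below `limit`.
struct Evens {
    current: u32,
    limit: u32,
}

impl Iterator for Evens {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        if self.current >= self.limit {
            return None;
        }
        let value = self.current;
        self.current += 2;
        Some(value)
    }
}

fn collect_with_for(evens: Evens) -> Vec<u32> {
    let mut seen = Vec::new();
    // No IntoIterator impl was written for Evens;
    // the blanket impl in the standard library supplies it.
    for n in evens {
        seen.push(n);
    }
    seen
}

fn main() {
    assert_eq!(collect_with_for(Evens { current: 0, limit: 7 }), vec![0, 2, 4, 6]);

    // into_iter() on an iterator returns the iterator itself,
    // so the result can be bound to a variable of type Evens.
    let same: Evens = Evens { current: 0, limit: 3 }.into_iter();
    assert_eq!(same.collect::<Vec<u32>>(), vec![0, 2]);
}
```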

The above shows how implementing the IntoIterator trait can be done in such a way as to provide interesting functionalities.

Another interesting utility achieved by providing different implementations of IntoIterator is the ability to use for … in with a collection and still use the collection after the iteration, without having to call iter() or iter_mut().

This is a more succinct syntax than the solution provided in the Rust Iterator pattern with iter(), into_iter() and iter_mut() methods post.

We look at this in the next section.

The 3 Implementations of IntoIterator for Vec

There are three different implementations of IntoIterator for the Vec type. These 3 implementations cover the 3 ways a Vec can be accessed: by value, by immutable reference, and by mutable reference.

That is, there are implementations of IntoIterator for the bare Vec<T> type, the immutable reference type &Vec<T>, and the mutable reference type &'a mut Vec<T>.

The implementation of IntoIterator for bare Vec<T> returns an Iterator that takes ownership of the values as they are iterated. The implementation for &Vec<T> borrows the values being iterated immutably, while the implementation for &'a mut Vec<T> borrows them mutably, which makes it possible to mutate the values as part of the iteration.

This means one can iterate over a Vec and still be able to use it afterward if the iteration is done over &Vec<T> or &'a mut Vec<T>. For example:

fn main() {
   let some_ints = vec![1,2,3];
   for i in &some_ints { // same as calling some_ints.iter()
       dbg!(i);
   }
   // some_ints was only borrowed, so it can still be used here
   dbg!(some_ints);
}

and

fn main() {
   let mut some_ints = vec![1,2,3];
   for i in &mut some_ints { // same as calling some_ints.iter_mut()
       *i = *i * 2;
   }
   dbg!(some_ints);
}

The above syntax can be used as a more succinct way to iterate over a data structure like Vec using the for … in syntax without taking ownership of the Vec.
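To see the three implementations side by side, one can call IntoIterator::into_iter explicitly on a Vec, a &Vec and a &mut Vec, and annotate the type of iterator that comes back. This is only a sketch and the function name three_impls is made up. The iterator types named in the annotations (std::slice::Iter, std::slice::IterMut and std::vec::IntoIter) are the ones the standard library uses for these implementations.

```rust
use std::slice::{Iter, IterMut};
use std::vec::IntoIter;

fn three_impls() -> (Vec<i32>, Vec<i32>) {
    let mut some_ints = vec![1, 2, 3];

    // &Vec<T>: the same iterator type that some_ints.iter() returns.
    let by_ref: Iter<'_, i32> = IntoIterator::into_iter(&some_ints);
    let copied: Vec<i32> = by_ref.copied().collect();

    // &mut Vec<T>: the same iterator type that some_ints.iter_mut() returns.
    let by_mut: IterMut<'_, i32> = IntoIterator::into_iter(&mut some_ints);
    by_mut.for_each(|item| *item *= 2);

    // Vec<T>: the owning iterator; some_ints is moved here.
    let by_value: IntoIter<i32> = IntoIterator::into_iter(some_ints);
    let doubled: Vec<i32> = by_value.collect();

    (copied, doubled)
}

fn main() {
    let (copied, doubled) = three_impls();
    assert_eq!(copied, vec![1, 2, 3]);
    assert_eq!(doubled, vec![2, 4, 6]);
}
```

Writing for i in &some_ints or for i in &mut some_ints makes the compiler pick the first or second of these implementations.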

Summary

  • IntoIterator is a trait that defines how an Iterator can be created for a data structure. It defines an into_iter() method that returns an Iterator when called.
  • The for … in syntax requires an implementation of IntoIterator because the compiler automagically first calls into_iter() to retrieve the Iterator it uses for its iteration.
  • The Rust standard library also contains an implementation of IntoIterator for every Iterator. This implementation just returns the Iterator. This makes sense, because if something is already an iterator, then returning it satisfies the contract defined by IntoIterator.
  • The fact that there is an IntoIterator for Iterator that returns that iterator means that methods like iter() or iter_mut() can still be used within the for … in syntax. The for … in syntax calls into_iter(), gets the Iterator itself back, and uses that for its iteration.
  • The standard library also contains 3 different implementations of IntoIterator for Vec<T>, &Vec<T>, and &'a mut Vec<T>. The implementation for Vec<T> takes ownership of the values being iterated, the implementation for &Vec<T> borrows the values being iterated immutably, and the implementation for &'a mut Vec<T> borrows the values mutably.
  • Given a variable some_var holding a Vec, one can iterate over it using for … in and still be able to use it after the iteration if the iteration is done over &some_var or &mut some_var.


4 comments:

Anonymous said...

Very clear explanation.

There is a small copy/paste error:

fn main() {
let some_ints = vec![1,2,3];
// calling for_each directly on a Vec won't compile
some_ints.iter()(|item| {
dbg!(item);
});
}

As you are trying to demonstrate non-compiling code this should be some_ints.for_each(), as confirmed by the rustc error message below.

dade said...

Thanks for spotting that. I have updated with the correction.

Anonymous said...

There are a few into_inter that should be into_iter.

dade said...

Thanks! Updated!