Async/Await for AVR with Rust

July 25, 2020 · Updated July 26, 2020 · 16 min read

With the recent ability for Rust to compile for AVR microcontrollers, I thought that it's time for me to bring my favorite feature of Rust to Arduino: async/await.

TL;DR: final code is at https://github.com/lights0123/async-avr. Here's an example of a simple task that does two things at once.

Introduction

The age-old question when learning Arduino: "How can I do multiple things at once"? When programming for full operating systems, like any desktop, mobile, or higher-end embedded platforms, the answer is easy—threads. They're built into the OS, and you typically have to use them to take advantage of features like multiple CPU cores. Or, when programming for an embedded target, you may reach for a lightweight RTOS like FreeRTOS, which gives you full threads without much work.

But what if you have a 16MHz microcontroller with 2 KB of RAM?

The Arduino Uno is known for being ubiquitous, cheap, and easy to program. Although there are more and more reasons to use competing products, such as the various ARM Cortex-M and Espressif boards (which I use almost exclusively), the Microchip (formerly Atmel) ATMega series is sometimes still preferred for low power applications, cutting cents off a BOM, and its widespread support.

Typically, one checks in a loop repeatedly to see if the task is ready to activate. This can be checking if there's data in a serial buffer, if a certain time period has passed, or a GPIO pin has toggled. However, I'm going to present an alternative to doing this—async/await.

What even is `async`/`await`, anyways?

Many modern languages are implementing async/await. C# was one of the first big languages to do it, and then Python, JavaScript, Dart, Kotlin quickly followed. The big difference between threads and async functions is that asynchronous code is handled by the program itself, usually cooperatively, while synchronous operations on threads rely on the OS to do it. The first obvious reason for it, then, is when threads aren't available—just like what we're about to get into soon. However, asynchronous code is typically faster than synchronous code because it is able to avoid the overhead of spawning threads and context switches, as well as the memory overhead, as the amount of space needed on the stack can be calculated ahead of time.

Internally, the compiler turns the code into a state machine, typically using the language's generators feature if it has one. Here, every time you write .await and the data is not available immediately, the compiler inserts a return statement, and adds a parameter that allows you to jump right back to where it was. Rust uses Pinning, where it is guaranteed that the generated struct that holds the stack will always be in the same place in memory at all times. This avoids a problem where a pointer is taken to a local variable, the function returns and comes back, and is suddenly pointing to something completely different.

Prior work

C++20 has similar features to Rust's async/await with the co_await keyword, although the available documentation is not quite as good as Rust's. However, there's a total of... 1 search result on the Arduino forums, and it's in German with no code. And even if there was code, it would be largely unaccessable as Arduino ships with an old compiler:

❱ ~/.arduino15/packages/arduino/tools/avr-gcc/*/bin/avr-gcc --version
avr-gcc (GCC) 7.3.0
Copyright (C) 2017 Free Software Foundation, Inc.

GCC 7.3.0 ships with "almost full support" for C++17 (although not by default), but a total of 0 C++20 additions. And if you're on PlatformIO, you'll have even less luck:

❱ ~/.platformio/packages/toolchain-atmelavr/bin/avr-gcc --version
avr-gcc (AVR_8_bit_GNU_Toolchain_3.6.2_1759) 5.4.0
Copyright (C) 2015 Free Software Foundation, Inc.

However, Rust has been able to compile async/await code for almost a year now on the stable compiler build. There's been steadily increasing interest in using these features on embedded platforms with Rust, and many projects have been using it as a result.

Pseudo-`async`/`await`

These libraries actually implement a very similar thing to async/await with a state machine, and jumping to different positions in code based on where it's already been. For example, AceRoutine has a macro, COROUTINE, that internally converts the function into a state machine that jumps to different places using goto. This is actually almost identical to how tools like Babel in the JavaScript world convert async functions to traditional functions that can be used with old browsers.

However, there are a few very critical problems:

Local variables are not preserved across yields.
You can't use certain statements in some libraries (not AceRoutine), such as switch.
Destructors don't run properly, and will free uninitialized memory if not done correctly (due to the first point).
Tasks can't return values.

These limitations are because simple preprocessor macros, which is the only option in C/C++, are only "glorified copy-paste", and have no real computing power. Although async is built into the Rust compiler, it wasn't before, and a (less-than-ideal, but still usable) macro could implement most of the feature itself. Hopefully, new C++20 features will lead to more use in the embedded world.

Putting it together

Getting started

It looks like there's a few packages (known as "crates" in the Rust world) to interface with the hardware:

ruduino, which only supports the Arduino Uno
avr-device, which provides low-level hardware control
avr-hal, which provides higher-level board and MCU support, and depends on avr-device

Although ruduino is by the avr-rust team that worked on getting Rust working on AVR, I ultimately decided to go with avr-hal:

It supports many more boards, and has a structure that allows it to expand further
It supports embedded-hal, which allows us to share code between boards: for example, the apa102_spi crate will let us control APA102 LED strips on an Arduino Uno and a Raspberry Pi running Linux, using the exact same code!
- embedded-hal also has a bigger focus on nonblocking operations, which is exactly what we're looking for
It has a cleaner and safer API

To set that up:

(Update 2020-07-26): this is much simpler now with the latest nightly. All that is required:

rustup install nightly
rustup +nightly component add rust-src

Compiling and Running

We can compile by running

cargo +nightly build -Z build-std=core --release --target avr-atmega328p.json
# or, in my repository which has some helpers configured
cargo +nightly build --release

Then, to upload it to a device, assuming that you have avrdude installed, run:

avrdude -v -patmega328p -carduino -P/dev/ttyACM0 -b115200 -D -Uflash:w:target/avr-atmega328p/release/examples/serial.elf:e

Change the upload path (target/avr-atmega328p/release/examples/serial.elf) to meet what you want to upload.

If you only have the Arduino IDE installed

Enable "Show verbose output during: upload" in the Arduino IDE. Observe the build logs for an avrdude command—it should look something like:

/path/to/.arduino15/packages/arduino/tools/avrdude/6.3.0-arduino17/bin/avrdude -C/path/to/.arduino15/packages/arduino/tools/avrdude/6.3.0-arduino17/etc/avrdude.conf -v -patmega328p -carduino -P/dev/ttyACM0 -b115200 -D -Uflash:w:/tmp/arduino_build_721874/Blink.ino.hex:i

Copy that command, but delete everything after -Uflash:w:. Then, without spaces, add the path to your binary. This will typically be target/avr-atmega328p/release/project_name.elf, or target/avr-atmega328p/release/examples/example_name.elf. Finally, add :e. Your final command will probably look something like:

/path/to/.arduino15/packages/arduino/tools/avrdude/6.3.0-arduino17/bin/avrdude -C/path/to/.arduino15/packages/arduino/tools/avrdude/6.3.0-arduino17/etc/avrdude.conf -v -patmega328p -carduino -P/dev/ttyACM0 -b115200 -D -Uflash:w:target/avr-atmega328p/release/project_name.elf:e

What about converting to hex first?

Arduino typically converts the compiled binary to raw hex, and many AVR-Rust projects have followed that pattern. However, there's generally no need to do that, as avrdude has the ability to upload ELF binaries directly.

Onto `async` operations

As I mentioned earlier, embedded-hal is implemented in such a way that asynchronous operations is quite easy. When you attempt to read or write a value, you can get three results:

WouldBlock: I can't complete this operation right now.
Ok: The operation succeeded. If reading something, this gives you the read value.
Err: An error occurred while performing this operation.

This maps very nicely to Rust's Future trait, which is used to implement asynchronous code. For example, reading from a serial port is implemented like this:

fn poll_read(
    mut self: Pin<&mut Self>,
    cx: &mut Context<'_>,
    buf: &mut [u8],
) -> Poll<Result<usize, T::Error>> {
    if let Some(ptr) = buf.first_mut() {
        match self.0.read() {
            Ok(byte) => {
                *ptr = byte;
                Poll::Ready(Ok(1))
            }
            Err(nb::Error::WouldBlock) => Poll::Pending,
            Err(nb::Error::Other(err)) => Poll::Ready(Err(err)),
        }
    } else {
        Poll::Ready(Ok(0))
    }
}

Although there's a bit of weirdness in terms of checking if the buffer is big enough, for the most part, this works pretty well.

Why do I always return 1 byte read?

I'm mostly mirroring the AsyncRead and AsyncWrite in the futures crate, which typically is used on machines with full operating systems. Typically, the overhead of a syscall is somewhat high, so you might request e.g. 1024 bytes at a time. Obviously, that isn't the case for an 8-bit MCU. However, I'm keeping that pattern for ease of porting and to allow for networking, e.g. the Arduino Ethernet shield.

Then, I copied and pasted the AsyncReadExt and AsyncWriteExt traits from the futures crate as well. The way this is set up is that you just implement the low-level functions, and the Ext traits take care of the more user-friendly functions. This is done instead of default implementations to allow the main trait to be small in case it is added to the standard library, much like how Future is in std/core, but many functions require the extension trait FutureExt from the futures crate. The functions that AsyncReadExt and the like provide allow for tasks like reading and writing a fixed number of bytes, and awaiting until that is complete.

Writing an executor

Thankfully, there's a few places we can get code from—the Rust embedded team has been working on a generic executor, and I have my own executor that I've used for third-party code on TI calculators. I started off with a way to safely share state between interrupts. Typically, this is done with atomic instructions, but AVR does not have those (or necessarily have a need for it, as it is single-core). So, I used volatile operations:

#[derive(Debug)]
#[repr(transparent)]
struct Volatile<T: Copy>(UnsafeCell<T>);

impl<T: Copy> Volatile<T> {
    pub fn new(value: T) -> Volatile<T> {
        Volatile(UnsafeCell::new(value))
    }

    pub fn read(&self) -> T {
        unsafe { ptr::read_volatile(self.0.get()) }
    }

    pub fn write(&self, value: T) {
        unsafe { ptr::write_volatile(self.0.get(), value) };
    }
}

Why not the vcell crate?

The above code is functionally identical to the VolatileCell that vcell provides—in fact, I wrote the above code and realized that it is almost identical to that crate. However, I will probably want more control in the future, as the above code is unsound if used with a type that is bigger than 1 byte. This is because AVR can write or read one byte in one instruction, so if an interrupt occurs, it is either written or not. In the future, I'd like to expand it to disable interrupts when writing more than 1 byte, so state can be safely shared between contexts.

Creating a VTable

We need to tell Rust how it can wake up a task. So, doing some copy-pasting from async-on-embedded:

// NOTE `*const ()` is &Volatile<bool>
static VTABLE: RawWakerVTable = {
    unsafe fn clone(p: *const ()) -> RawWaker {
        RawWaker::new(p, &VTABLE)
    }
    unsafe fn wake(p: *const ()) {
        wake_by_ref(p)
    }
    unsafe fn wake_by_ref(p: *const ()) {
        (*(p as *const Volatile<bool>)).write(true)
    }
    unsafe fn drop(_: *const ()) {
        // no-op
    }

    RawWakerVTable::new(clone, wake, wake_by_ref, drop)
};

Here, I adapted the code to use my own synchronization code instead of AtomicBool, which we don't have access to.

The `block_on` function

Finally, it's time for the executor:

/// Spawns a task and blocks until the future resolves, returning its result.
pub fn block_on<T>(task: impl Future<Output = T>) -> T {
    let ready = Volatile::new(true);
    let waker = unsafe { Waker::from_raw(RawWaker::new(&ready as *const _ as *const _, &VTABLE)) };
    let mut context = Context::from_waker(&waker);
    pin_mut!(task);
    let mut task = task;
    loop {
        while ready.read() {
            match task.as_mut().poll(&mut context) {
                Poll::Ready(val) => {
                    return val;
                }
                Poll::Pending => {
                    ready.write(false);
                }
            }
        }
    }
}

This function will accept a Future, which is typically automatically generated when writing an async block in Rust. Then, it will poll it. The task will either say that it's done right away, or register itself to wake up (via the waker) when it's ready. Then, it gets polled again, continuing the cycle.

However, LLVM has a bug that crashes the MCU when calling a function pointer. Unfortunately, that's entirely how Wakers are implemented, so let's comment out the part where I mark it as not ready:

-    ready.write(false);
+ // ready.write(false);

This will just poll the task in a busy loop. This isn't great for power usage, but is easy to implement.

Writing asynchronous code!

A simple demo

Now that we have the low-level stuff working, we can get on to writing asynchronous code! Let's start with a basic, single-task only example just to make sure everything works:

#[no_mangle]
pub extern "C" fn main() -> ! {
    let dp = arduino_uno::Peripherals::take().unwrap();

    let mut pins = arduino_uno::Pins::new(dp.PORTB, dp.PORTC, dp.PORTD);

    let mut serial = AsyncSerial::new(arduino_uno::Serial::new(
        dp.USART0,
        pins.d0,
        pins.d1.into_output(&mut pins.ddr),
        57600,
    ));


    block_on(async {
        loop {
            serial.write_all(b"Hello World!\n").await.unwrap();
        }
    });
    loop {}
}

Compile and upload it:

❱ cargo +nightly build --example single-task --release && avrdude -v -patmega328p -carduino -P/dev/ttyACM0 -b115200 -D -Uflash:w:target/avr-atmega328p/release/examples/single-task.elf:e
   Compiling async-avr v0.1.0 (/home/lights0123/IdeaProjects/async-avr)
    Finished release [optimized] target(s) in 0.28s

avrdude: Version 6.3-20190619
         Copyright (c) 2000-2005 Brian Dean, http://www.bdmicro.com/
         Copyright (c) 2007-2014 Joerg Wunsch

avrdude: AVR device initialized and ready to accept instructions

Reading | ################################################## | 100% 0.00s

avrdude: Device signature = 0x1e950f (probably m328p)
avrdude: safemode: lfuse reads as 0
avrdude: safemode: hfuse reads as 0
avrdude: safemode: efuse reads as 0
avrdude: reading input file "target/avr-atmega328p/release/examples/single-task.elf"
avrdude: writing flash (380 bytes):

Writing | ################################################## | 100% 0.06s

avrdude: 380 bytes of flash written
avrdude: verifying flash memory against target/avr-atmega328p/release/examples/single-task.elf:
avrdude: load data flash data from input file target/avr-atmega328p/release/examples/single-task.elf:
avrdude: input file target/avr-atmega328p/release/examples/single-task.elf contains 380 bytes
avrdude: reading on-chip flash data:

Reading | ################################################## | 100% 0.05s

avrdude: verifying ...
avrdude: 380 bytes of flash verified

avrdude: safemode: lfuse reads as 0
avrdude: safemode: hfuse reads as 0
avrdude: safemode: efuse reads as 0
avrdude: safemode: Fuses OK (E:00, H:00, L:00)

avrdude done.  Thank you.

And let's check the serial output:

❱ pio device monitor -b 57600
--- Available filters and text transformations: colorize, debug, default, direct, hexlify, log2file, nocontrol, printable, send_on_enter, time
--- More details at http://bit.ly/pio-monitor-filters
--- Miniterm on /dev/ttyACM0  57600,8,N,1 ---
--- Quit: Ctrl+C | Menu: Ctrl+T | Help: Ctrl+T followed by Ctrl+H ---
Hello World!
Hello World!
Hello World!
Hello World!
Hello World!
Hello World!
Hello World!
--- exit ---

A bit more powerful

Great—that worked! Now, let's try doing multiple things. Let's do both SPI communication and reading UART data now. Starting out with getting our peripherals:

let dp = arduino_uno::Peripherals::take().unwrap();

let mut pins = arduino_uno::Pins::new(dp.PORTB, dp.PORTC, dp.PORTD);

let serial = arduino_uno::Serial::new(
    dp.USART0,
    pins.d0,
    pins.d1.into_output(&mut pins.ddr),
    57600,
);

pins.d10.into_output(&mut pins.ddr); // CS must be set to output mode.

// Create SPI interface.
let spi = Spi::new(
    dp.SPI,
    pins.d13.into_output(&mut pins.ddr),
    pins.d11.into_output(&mut pins.ddr),
    pins.d12.into_pull_up_input(&mut pins.ddr),
    Settings::default(),
);

Converting them to async versions:

let mut spi = AsyncSpi::new(spi);

let (rx, tx) = serial.split();
let mut rx = AsyncSerial::new(rx);
let tx = RefCell::new(AsyncSerial::new(tx));

Some basic locking:

let serial_lock = Cell::new(false);

Why do we need to lock?

Because we may have multiple tasks trying to write to the same serial port at the same time, we need to add synchronization so that doesn't happen. Ideally, I would use a Mutex that handles all this for us, but I haven't written one yet (or moreso copied-and-pasted from one that already exists).

Creating a serial loop:

let serial_loop = async {
    loop {
        let mut b = [0];
        rx.read_exact(&mut b).await.unwrap();
        loop {
            if !serial_lock.get() {
                serial_lock.set(true);
                tx.borrow_mut().write_all(b"hello!\n").await.unwrap();
                serial_lock.set(false);
                break;
            }
            Yield::default().await;
        }
    }
};

Note: this will not actually run the loop. That comes later.

Creating an SPI loop:

let spi_loop = async {
    loop {
        spi.write_all(b"a").await.unwrap();
        let mut data = [0; 1];
        spi.read_exact(&mut data).await.unwrap();
        loop {
            if !serial_lock.get() {
                serial_lock.set(true);
                let mut tx_ref = tx.borrow_mut();
                tx_ref.write_all(b"wrote ").await.unwrap();
                tx_ref.write_all(&data).await.unwrap();
                tx_ref.write_all(b"!\n").await.unwrap();
                serial_lock.set(false);
                break;
            }
            Yield::default().await;
        }
        Yield::default().await;
    }
};

What's with all the yielding?

As Rust implements cooperative multitasking, we need to explicitly give control back to other tasks. Otherwise, we'd be in a loop waiting for the other task to be done writing, without ever giving it a chance to write!

Finally running it:

block_on(async { futures_util::join!(serial_loop, spi_loop) });

Compile and upload it:

❱ cargo +nightly build --example serial && avrdude -v -patmega328p -carduino -P/dev/ttyACM0 -b115200 -D -Uflash:w:target/avr-atmega328p/release/examples/serial.elf:e
   Compiling async-avr v0.1.0 (/home/lights0123/IdeaProjects/async-avr)
    Finished release [optimized] target(s) in 0.28s

avrdude: Version 6.3-20190619
         Copyright (c) 2000-2005 Brian Dean, http://www.bdmicro.com/
         Copyright (c) 2007-2014 Joerg Wunsch

avrdude: AVR device initialized and ready to accept instructions

Reading | ################################################## | 100% 0.00s

avrdude: Device signature = 0x1e950f (probably m328p)
avrdude: safemode: lfuse reads as 0
avrdude: safemode: hfuse reads as 0
avrdude: safemode: efuse reads as 0
avrdude: reading input file "target/avr-atmega328p/release/examples/serial.elf"
avrdude: writing flash (2562 bytes):

Writing | ################################################## | 100% 0.43s

avrdude: 2562 bytes of flash written
avrdude: verifying flash memory against target/avr-atmega328p/release/examples/serial.elf:
avrdude: load data flash data from input file target/avr-atmega328p/release/examples/serial.elf:
avrdude: input file target/avr-atmega328p/release/examples/serial.elf contains 2562 bytes
avrdude: reading on-chip flash data:

Reading | ################################################## | 100% 0.34s

avrdude: verifying ...
avrdude: 2562 bytes of flash verified

avrdude: safemode: lfuse reads as 0
avrdude: safemode: hfuse reads as 0
avrdude: safemode: efuse reads as 0
avrdude: safemode: Fuses OK (E:00, H:00, L:00)

avrdude done.  Thank you.

And let's check the serial output:

❱ pio device monitor -b 57600
--- Available filters and text transformations: colorize, debug, default, direct, hexlify, log2file, nocontrol, printable, send_on_enter, time
--- More details at http://bit.ly/pio-monitor-filters
--- Miniterm on /dev/ttyACM0  57600,8,N,1 ---
--- Quit: Ctrl+C | Menu: Ctrl+T | Help: Ctrl+T followed by Ctrl+H ---
wrote a!
wrote a!
wrote a!
wrote a!
hello!
wrote a!
wrote a!
wrote a!
--- exit ---

Here, I pressed a key on my keyboard to send a character, which resulted in the reply of "hello!". This successfully demonstrates multiple tasks running at the same time.

To Do

Right now, this is far from being done and usable. I only have basic for two peripherals: UART and SPI. More work is required for timers, but once that is done, using async/await on an Arduino will be very nice—scheduling a periodic task would be as easy as surrounding it in a timeout function, and waiting for e.g. serial data would not block anything else. Additionally, getting interrupts to work will allow the MCU to go to sleep while waiting for something to occur, saving power.

Currently, effort on this is blocked on an LLVM issue that prevents calling function pointers. Until that is fixed, busy-loop polling is the best we can do.

However, async/await could definitely change the way Arduino code is written. It doesn't have to be in Rust (although I would definitely not complain :) ), but this style could allow for easier to write and more power efficient projects.

Final code

https://github.com/lights0123/async-avr

Thanks to

These are a few resources that have been helpful.

https://github.com/Rahix/avr-hal
https://github.com/avr-rust/ruduino/issues/9
https://github.com/shepmaster/rust-arduino-blink-led-no-core-with-cargo
https://github.com/nh2/quadcopter-simulation/blob/8a3652b8a704877156e91c0691ce8ce37588eb53/arduino-accel-rust/src/main.rs#L300-L327

On this page

Async/Await for AVR with Rust

Introduction

What even is `async`/`await`, anyways?

Prior work

Pseudo-`async`/`await`

Putting it together

Getting started

Compiling and Running

If you only have the Arduino IDE installed

What about converting to hex first?

Onto `async` operations

Why do I always return 1 byte read?

Writing an executor

Why not the `vcell` crate?

Creating a VTable

The `block_on` function

Writing asynchronous code!

A simple demo

A bit more powerful

Why do we need to lock?

What's with all the yielding?

To Do

Final code

Thanks to

On this page

Async/Await for AVR with Rust

Introduction

What even is async/await, anyways?

Prior work

Pseudo-async/await

Putting it together

Getting started

Compiling and Running

If you only have the Arduino IDE installed

What about converting to hex first?

Onto async operations

Why do I always return 1 byte read?

Writing an executor

Why not the vcell crate?

Creating a VTable

The block_on function

Writing asynchronous code!

A simple demo

A bit more powerful

Why do we need to lock?

What's with all the yielding?

To Do

Final code

Thanks to

What even is `async`/`await`, anyways?

Pseudo-`async`/`await`

Onto `async` operations

Why not the `vcell` crate?

The `block_on` function