My experiences going Rust from C++

I’ve been experimenting with Rust for over 6 months now. Most of that time I spent playing around with a C64 emulator I wrote as a first project and initially I thought about creating a series on that topic. However, since there’s so much reading material on the Internet about it already, I figured maybe it would be a good idea to write an intro for C/C++ programmers on Rust. But then I found this article, so I decided to take a completely different route.

In this post I wanted to outline the problems/quirks I ran into when transitioning from C++ to Rust. This by no means indicates that the language is poorly constructed – it’s just a matter of putting yourself in a completely different mindset, since Rust is really more than it seems at first glance and it has traps of its own if you try to code in it “C-style”. At the time of writing this, the latest stable version of the compiler is 1.7.0, so some things might get outdated with time. If you’ve been programming for a while and are considering trying out Rust, here are some things worth being wary of as you start:

1. Data ownership model

The first thing I had to learn is that variables in Rust are not variables at all but rather bindings to specific data. As such, the language introduces the concept of ownership, in the sense that data can be bound to one and only one “variable” at a time. There are good examples in the link above of how that works, so I won’t be going into details here. The reason this caused me so much trouble is that referring to other struct members and recurring function calls have to be thought through very carefully when writing a program in Rust. Gone is the idea of throwing pointers everywhere and reusing them as you see fit – once data in Rust is borrowed, you have to finish what you are doing with it before you can reclaim it somewhere else in the code. It’s an interesting concept, one that surely provides some extra safety measures which other languages lack; nevertheless, it takes a while to get accustomed to it.
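To make the move and borrow rules a bit more concrete, here is a minimal sketch. The `Memory` type and all function names are invented for this illustration and are not taken from the emulator.

```rust
// Hypothetical type used only for this illustration.
struct Memory {
    bytes: Vec<u8>,
}

// Takes ownership: the caller's binding can no longer be used afterwards.
fn consume(mem: Memory) -> usize {
    mem.bytes.len()
}

// Borrows mutably: the caller keeps ownership and can use the value
// again once this call (and with it the borrow) has ended.
fn poke(mem: &mut Memory, addr: usize, value: u8) {
    mem.bytes[addr] = value;
}

// Borrows immutably: any number of shared borrows may coexist.
fn peek(mem: &Memory, addr: usize) -> u8 {
    mem.bytes[addr]
}

fn ownership_demo() -> (u8, usize) {
    let mut ram = Memory { bytes: vec![0; 4] };
    poke(&mut ram, 2, 0xEA);   // the mutable borrow ends with the call
    let value = peek(&ram, 2); // so a shared borrow is fine here
    let moved = ram;           // the data is now bound to `moved`
    // peek(&ram, 2);          // would not compile: `ram` has been moved
    let len = consume(moved);  // `moved` gives the data away for good
    (value, len)
}
```

Uncommenting the `peek(&ram, 2)` line is a quick way to see the kind of compiler error that dominates the first weeks with the language.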

2. No inheritance

The lack of a basic inheritance model in Rust forced me to duplicate some parts of the code. To give an example, the C64 has two timer chips which are essentially the same thing – for emulation purposes differing in only one function. A natural instinct here is to create a single struct and just override that particular function, but Rust has no mechanism for it. The closest thing that met my needs was a trait, but what I really needed was “a trait with properties”. If your design relies heavily on OOP you should either rethink it or choose a different language.
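One way to approximate “a trait with properties” is to keep the shared fields in a plain struct and let the trait reach them through an accessor, with the common behaviour written once as a default method. The sketch below assumes that approach; `TimerState`, `Timer`, `Cia1`, `Cia2` and the reload value are placeholders, not the emulator's actual design.

```rust
// State common to both timer chips. All names here are illustrative.
struct TimerState {
    counter: u16,
    irq_pending: bool,
    nmi_pending: bool,
}

impl TimerState {
    fn new(counter: u16) -> Self {
        TimerState { counter, irq_pending: false, nmi_pending: false }
    }
}

trait Timer {
    // Accessor standing in for "trait fields".
    fn state(&mut self) -> &mut TimerState;

    // The one function that differs between the two chips.
    fn on_underflow(&mut self);

    // Shared behaviour, written once as a default method.
    fn tick(&mut self) {
        let underflow = {
            let st = self.state();
            if st.counter == 0 {
                st.counter = 0xFFFF; // placeholder reload value
                true
            } else {
                st.counter -= 1;
                false
            }
        };
        if underflow {
            self.on_underflow();
        }
    }
}

struct Cia1 { state: TimerState }
struct Cia2 { state: TimerState }

impl Timer for Cia1 {
    fn state(&mut self) -> &mut TimerState { &mut self.state }
    fn on_underflow(&mut self) { self.state.irq_pending = true; }
}

impl Timer for Cia2 {
    fn state(&mut self) -> &mut TimerState { &mut self.state }
    fn on_underflow(&mut self) { self.state.nmi_pending = true; }
}

// Works with either chip through the trait.
fn run_ticks<T: Timer>(timer: &mut T, n: u32) {
    for _ in 0..n {
        timer.tick();
    }
}
```

The accessor is still boilerplate that each implementor has to repeat, so this reduces the duplication rather than removing it.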

3. No cyclic references

Having started to code my emulator “C-style”, I decided to go for a clear structure that would define the entire computer:

struct C64
{
   sid_chip: SID,  // audio   
   vic_chip: VIC,  // graphics
   cpu_chip: CPU,  // processor
   cia1_chip: CIA, // timer 1
   (...)
}

Each member variable is a simple struct type. The design was clean and satisfying, so I happily started hacking at the code implementing each chip in turn.

Halfway through my work I realized I had made a terrible mistake.

It turned out that all components of the C64 struct would have to communicate with each other directly in certain situations, so I needed some sort of a “bus” component. I really wanted to avoid creating an artificial structure for that purpose, which would eventually introduce annoyances of its own. Global variables spread over all modules were not an option I wanted to use either.

It was a problem I couldn’t initially solve for a couple of reasons: Rust doesn’t provide any mechanism for struct field objects to communicate with the parent, and you can’t just pass a reference to the parent since that breaks the Rust ownership rules. Eventually I found a solution by wrapping each chip structure in a RefCell nested inside an Rc. With this I was able to use cloned instances as separate references which I would then pass during each chip’s construction. In the simplest terms, this solution behaves like a reference-counted smart pointer, so even though a clone of the first instance is being referenced, it still deals with the same data as the original. Once all instances are destroyed (or dropped, in Rust terminology) the memory is freed. One caveat: if two chips hold strong Rc references to each other, they form a cycle whose count never reaches zero, and that memory is never released; std::rc::Weak exists to break such cycles.
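A minimal sketch of the shared-instance idea, assuming simplified stand-ins for the chips (`Cpu`, `Vic`, `irq_line` and `raise_irq` are names made up for this example):

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Simplified stand-ins for the real chip structs.
struct Cpu {
    irq_line: bool,
}

struct Vic {
    // A second handle to the same CPU, not a copy of it.
    cpu: Rc<RefCell<Cpu>>,
}

impl Vic {
    fn raise_irq(&self) {
        // borrow_mut() enforces the borrow rules at run time and
        // panics if the CPU is already borrowed elsewhere.
        self.cpu.borrow_mut().irq_line = true;
    }
}

fn shared_demo() -> (bool, usize) {
    let cpu = Rc::new(RefCell::new(Cpu { irq_line: false }));
    // Rc::clone only bumps the reference count; the Cpu is not duplicated.
    let vic = Vic { cpu: Rc::clone(&cpu) };
    vic.raise_irq();

    let irq = cpu.borrow().irq_line;      // change is visible via the original handle
    let handles = Rc::strong_count(&cpu); // number of live Rc handles
    (irq, handles)
}
```

The trade-off is that borrow checking moves from compile time to run time: a double `borrow_mut()` on the same chip compiles fine and panics when it happens.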

4. No explicit NULL value

Being a language set on safety, Rust disallows creating an object without initializing each of its member variables. This means no NULL pointers which you can set at a later time, so I was stuck with a new problem after introducing RefCells:

struct VIC
{
   cpu_ref: Rc<RefCell<CPU>>,  // reference to the CPU object   
   (...)
}

impl VIC
{
    // construction of VIC
    pub fn new_shared() -> Rc<RefCell<VIC>> {
            Rc::new(RefCell::new(VIC {
                cpu_ref: CPU::new_shared(), // creating a shared instance of CPU - because we have to
                (...)
                }))
    }
}


struct CPU
{
   vic_ref: Rc<RefCell<VIC>>,  // reference to the VIC chip object   
   (...)
}

impl CPU
{
    // construction of CPU
    pub fn new_shared() -> Rc<RefCell<CPU>> {
            Rc::new(RefCell::new(CPU {
                vic_ref: VIC::new_shared(),  // this causes a problem!
                (...)
                }))
    }
}

What’s happening above is that once the CPU is constructed it will force the creation of a VIC, which will in turn create another CPU and so on, resulting in infinite recursion. This is where std::option comes into play, being the closest thing to a NULL value in Rust:

struct VIC
{
   cpu_ref: Option<Rc<RefCell<CPU>>>,  // now it's optional
   (...)
}

impl VIC
{
    // construction of VIC
    pub fn new_shared() -> Rc<RefCell<VIC>> {
            Rc::new(RefCell::new(VIC {
                cpu_ref: None,  // no infinite recursion - will set the reference later on
                (...)
                }))
    }
}

My only gripe with this approach was that I had to specifically create a set_references() function for each type and since each struct had different references it couldn’t be neatly solved with a more generic trait.
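The two-phase construction can be sketched roughly as follows. This is a variation rather than the exact code from the emulator: it stores `Weak` handles instead of `Rc` so that the CPU and the VIC do not keep each other alive in a reference cycle. The fields and the `raise_irq`/`has_vic` helpers are hypothetical.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Cpu {
    vic_ref: Option<Weak<RefCell<Vic>>>, // None until wired up
    irq_line: bool,
}

struct Vic {
    cpu_ref: Option<Weak<RefCell<Cpu>>>, // None until wired up
}

impl Cpu {
    fn new_shared() -> Rc<RefCell<Cpu>> {
        Rc::new(RefCell::new(Cpu { vic_ref: None, irq_line: false }))
    }

    fn set_references(&mut self, vic: &Rc<RefCell<Vic>>) {
        self.vic_ref = Some(Rc::downgrade(vic));
    }

    fn has_vic(&self) -> bool {
        self.vic_ref.as_ref().and_then(|weak| weak.upgrade()).is_some()
    }
}

impl Vic {
    fn new_shared() -> Rc<RefCell<Vic>> {
        Rc::new(RefCell::new(Vic { cpu_ref: None }))
    }

    fn set_references(&mut self, cpu: &Rc<RefCell<Cpu>>) {
        self.cpu_ref = Some(Rc::downgrade(cpu));
    }

    fn raise_irq(&self) {
        // upgrade() returns None if the CPU has already been dropped.
        if let Some(cpu) = self.cpu_ref.as_ref().and_then(|weak| weak.upgrade()) {
            cpu.borrow_mut().irq_line = true;
        }
    }
}

// Phase 1: build every chip with empty references.
// Phase 2: connect them once all of them exist.
fn wire_up() -> (Rc<RefCell<Cpu>>, Rc<RefCell<Vic>>) {
    let cpu = Cpu::new_shared();
    let vic = Vic::new_shared();
    cpu.borrow_mut().set_references(&vic);
    vic.borrow_mut().set_references(&cpu);
    (cpu, vic)
}
```

Whoever calls `wire_up()` owns the only strong handles, which in the emulator would presumably be the top-level C64 struct.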

5. Rust macros are not what you think at first!

The natural way of thinking about macros when coming from a C background is “replace this expression with some more elaborate code”. Not surprisingly, Rust takes a completely different approach, deeming (quite rightfully) plain text substitution as unsafe and error prone. After switching to shared RefCell instances I faced the problem of obfuscated syntax when trying to access the actual underlying data:

// attempting to get inside the Rc<RefCell<CPU>> from within VIC struct.
// imagine typing that every single time when you need it!
self.cpu_ref.as_ref().unwrap().borrow_mut().set_vic_irq(true);

Unlike in C, a macro in Rust is treated as a syntactic structure and as such has limitations of its own. The macro body can’t use the self keyword (and therefore the object’s fields) on its own; whatever it needs has to be passed in as an argument, which limits how far you can simplify your code:

macro_rules! as_ref {
    ($x:expr) => ($x.as_ref().unwrap().borrow_mut())
}

(...)

// same code using a Rust macro - as short as it could get
as_ref!(self.cpu_ref).set_vic_irq(true);

(...)

While I understand the reasoning behind making a macro the way it is, I still find it a bit disappointing not being able to use the less safe C-style variant.
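For reference, here is a self-contained sketch of the same idea, next to a plain helper method that achieves a similar shortening without a macro. The macro is named `as_mut!` here and the `Cpu`/`Vic` definitions are reduced to the bare minimum; both are assumptions made for the example.

```rust
use std::cell::{RefCell, RefMut};
use std::rc::Rc;

struct Cpu {
    vic_irq: bool,
}

impl Cpu {
    fn set_vic_irq(&mut self, value: bool) {
        self.vic_irq = value;
    }
}

// The argument is parsed as a complete expression and spliced into the
// expansion as a unit; it is not pasted in as raw text.
macro_rules! as_mut {
    ($x:expr) => {
        $x.as_ref().expect("reference not set").borrow_mut()
    };
}

struct Vic {
    cpu_ref: Option<Rc<RefCell<Cpu>>>,
}

impl Vic {
    // Macro variant: `self.cpu_ref` has to be handed in by the caller.
    fn trigger_irq(&self) {
        as_mut!(self.cpu_ref).set_vic_irq(true);
    }

    // Method variant: an ordinary accessor that returns the borrow guard.
    fn cpu(&self) -> RefMut<'_, Cpu> {
        self.cpu_ref.as_ref().expect("reference not set").borrow_mut()
    }

    fn clear_irq(&self) {
        self.cpu().set_vic_irq(false);
    }
}
```

The method variant needs one accessor per referenced chip, but it is type checked like any other function and shows up in documentation and editor completion.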

6. Type wrapping is not implicit

This is a language trait I’m a bit on the fence about. In C, once you go beyond the range of an unsigned data type the value automatically wraps – a feature that’s been extensively used in 8-bit computers as well. At the time of writing this, Rust does not let you rely on implicit wrapping: an arithmetic overflow will cause a panic! in debug builds. Wrapping data is possible but requires additional boilerplate:

(...)

self.some_8bit_variable = self.some_8bit_variable.wrapping_add(1); // wrapping_add returns the new value: 255 -> 0 when the addition overflows

(...)

While it’s fine that Rust explicitly tells us where data wrapping is meant to happen, I’d still want to be able to manually turn that feature off for the sake of more compact code.
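The standard library offers a few explicit flavours of overflow handling, and `std::num::Wrapping` gets fairly close to the compact form by making wrap-around a property of the type. A short sketch, with function names invented for the example:

```rust
use std::num::Wrapping;

// Wraps around silently: 255 + 1 == 0.
fn inc_wrapping(value: u8) -> u8 {
    value.wrapping_add(1)
}

// Returns None instead of wrapping.
fn inc_checked(value: u8) -> Option<u8> {
    value.checked_add(1)
}

// Returns the wrapped result plus a flag, handy for a carry bit.
fn inc_with_carry(value: u8) -> (u8, bool) {
    value.overflowing_add(1)
}

// With Wrapping<u8> the ordinary operators wrap, so the arithmetic
// itself reads like the C version.
fn inc_wrapping_type(value: u8) -> u8 {
    let mut register = Wrapping(value);
    register += Wrapping(1);
    register.0
}
```

Keeping emulated registers as `Wrapping<u8>` or `Wrapping<u16>` fields is one possible way to avoid sprinkling `wrapping_add` calls throughout the code.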

7. Type explicitness everywhere

Depending on the point of view, one of the biggest flaws/merits of C and C++ is implicit type conversion when assigning variables to each other, so you can “safely” assign a char to an int and pretty much expect the code to work, as long as you know what you’re doing. Also, let’s disregard for a second that we’re pragmatic programmers who adhere to compiler warnings – my practice shows that when it comes to data precision they’re mostly ignored (or completely turned off!).

So the thing is, Rust disallows assigning different types of variables to each other unless you specifically cast one type to another. The syntax of such a cast, however, I found to be slightly cumbersome to use especially if I had to perform several casts during one operation (adding bytes, casting them to words to perform a shift, then going back to byte again etc.). This is something one has to get used to, but in my early code this was the major cause of bugs:

// EXAMPLE 1
// relative memory addressing mode in C64 code excerpt: fetching operand
// bugged code: wrong relative offset calculated
fn get_operand_rel() -> u8
{
    let offset = mem.next_byte() as i8;  // memory offset should be treated as a signed char
    let addr: i16 = cpu.prog_counter + offset as u16; // BUG!
    mem.read_byte(addr as u16) // address in mem is stored as u16, so have to cast it *again*
}

// correct code: (took quite a while to track the bug down!)
fn get_operand_rel() -> u8
{
    let offset = mem.next_byte() as i8;
    let addr: i16 = cpu.prog_counter as i16 + offset as i16; // Correct! Casting both to i16
    mem.read_byte(addr as u16)
}

// EXAMPLE 2
fn foo()
{
    let var = mem.next_byte() as u8;
    let var_word: u16 = (var as u16) << 8; // would probably look neater as (u16)var << 8
    (...)
    
    // this is legal Rust code!
    let var2 = 10 as usize as u16 as u8 as u32 as f64;
}

I admit – a lot of this is me being subjective with my own preferences, but the point is that if you’re used to extensive casting you may run into trouble with understanding your code. On the other hand, this may encourage programmers to break more complicated operations into steps for the sake of clarity. Seeing how mixed current Rust codebases are, I’m not so sure this will happen soon, though.
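As an aside on Example 1, the signed offset can also be applied without leaving unsigned arithmetic, which avoids a possible overflow panic in debug builds when the i16 addition crosses 0x7FFF. The sketch below uses free functions with explicit parameters in place of the `mem` and `cpu` objects from the excerpt; `branch_target` and `high_byte_word` are hypothetical names.

```rust
// Computes the target address of a relative jump.
fn branch_target(prog_counter: u16, raw_offset: u8) -> u16 {
    let offset = raw_offset as i8; // reinterpret the byte as signed
    // i8 -> i16 sign-extends and i16 -> u16 keeps the bit pattern, so a
    // wrapping add is equivalent to a signed add modulo 2^16.
    prog_counter.wrapping_add(offset as i16 as u16)
}

// For widening conversions, From/into is an alternative to `as`:
// it only exists where no information can be lost.
fn high_byte_word(value: u8) -> u16 {
    u16::from(value) << 8
}
```

Unlike `as`, `u16::from` refuses to compile for a narrowing conversion such as u16 to u8, which makes accidental truncation easier to spot.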

8. Forget OOP – go functional

Disregarding OOP as the silver bullet of programming is not uncommon today as people realize how many problems that model creates once you go deeper. Cache hits and misses, convoluted relationships between different classes and sometimes over-the-top patterns would be on top of the list. As I got more experienced with Rust it became clear that functional programming is its main focus. If you decide to write an application, you may have to forget quite a few things you know from C++. Use functions. Use modules. Use structs, but don’t rely heavily on OOP patterns to handle communication between objects. In the end it will only make you happy as the code becomes a lot more readable and easier to navigate – and this comes from a person who made his first Rust application entirely in Emacs!

Try Rust. You will enjoy it! 🙂


2 thoughts on “My experiences going Rust from C++”

  1. Thanks for posting this. Just a quick note on integer overflow, because I think what you’ve written is a little unfair to Rust…

    Signed integer overflow is undefined behavior in C/C++: http://stackoverflow.com/questions/18195715/why-is-unsigned-integer-overflow-defined-behavior-but-signed-integer-overflow-is

    It’s actually pretty well-defined in Rust: in a debug build, it panics, and in a release build, it wraps. https://github.com/rust-lang/rfcs/blob/master/text/0560-integer-overflow.md

    I can see how it’d be a pain when writing an emulator, though, that Rust enforces the same rules on unsigned types.

    • Yes, my main issue was with overflow of *any* type in Rust. On the other hand I think it’s still better than “selectively defined” behavior like in C++.
