Writing Ruby gems in Rust
ruby rust
Ruby supports native extensions by dynamically loading linked libraries (.so
on Linux, .bundle on macOS), so you can use a library written in a
compiled language as if it were written in Ruby. As long as your language can
link against C code and expose callable functions whose names aren't mangled,
you can write a native gem extension in that language.
You need to be able to link against C code so you can call the provided
functions for working with and defining values for the Ruby interpreter (usually
prefixed with rb_). Unmangled names are needed because your dynamically
linkable library must expose a function called Init_<name>, where name is the
name of the module you're creating. I was interested in doing this in Rust but
didn't find much in the way of how to actually do it so that the native code
was usable like Ruby; most of the things I found used the ffi or fiddle
standard gems, which I didn't want.
I'm going to run through building an extension in Rust, called hello, that
exposes the module function Hello.say_it. say_it can take an argument, either a
symbol or a string, or a block. The symbol or string should be the locale, and
the block should return the locale. It will then return a string which is hello
in the language identified by the locale. You can find the end product here:
https://git.sr.ht/~nds/rust_ruby_playground/tree.
Compiling
We need to compile our extension into something that can be dynamically loaded:
a shared object (.so) on Linux or a loadable bundle (.bundle) on macOS.
There is the standard gem mkmf that generates the Makefile for us and the gem
rake-compiler that makes it easy to invoke that Makefile and copy the output to
the right place. All these helpers for native extensions are intended for C and
C++, so we need to do a little bit of tweaking for them to handle Rust.
Thankfully, they are configurable enough that we can do it without resorting to
too many hacks.
mkmf
mkmf is a gem in the standard library that provides the MakeMakefile module.
It makes it (relatively) easy to generate a cross-platform Makefile to compile
a C or C++ native extension (there may also be Java support, as this seems
relatively common, but I haven't looked into that). If we were writing C or C++
that'd be great; we'd pretty much be done. However, this is a gem written in
Rust, so mkmf works against us somewhat and we have to do a bit of Makefile
munging in our extconf.rb.
The key things are:
Setting $srcs, because by default create_makefile only finds C, C++, and object files:

$srcs = Dir[File.join(RbConfig.expand("$(srcdir)"), "src", "**", "*.rs")].sort
Setting the values used by the linker tasks to not actually link, and instead just copy the files we built with Cargo around:

create_makefile('hello') do |mk|
  mk << "LDSHARED = true"
  mk << "POSTLINK = cp ./release/libhello.dylib $(TARGET_SO)"
end
LDSHARED is usually the compiler (clang or gcc) with the right flags to
create a dynamically linked library. Setting it to true (an executable or
shell built-in that should be available on most non-Windows systems) is a hack
I'm kind of proud of. It takes any arguments, ignores them, and returns with an
exit code of 0. It's an easy way of saying we don't actually care about this
step.
POSTLINK is a hook that the Makefile created by mkmf provides to run after
the shared library has been linked. As we didn't actually link anything in the
linking step, we need to copy the artifact built by cargo to the right spot
with the right name; that's all the post-link hook does.
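Putting those two overrides together, the effect on the generated Makefile looks roughly like this. This is a sketch of the shape of mkmf's output, not its literal contents; the macOS .dylib path matches the snippet above.

```make
# Sketch of the effect of the two overrides in the generated Makefile.
LDSHARED = true                                      # the "link" runs `true`: args ignored, exit 0
POSTLINK = cp ./release/libhello.dylib $(TARGET_SO)  # copy cargo's artifact into place

$(DLLIB): $(OBJS)
	$(LDSHARED) -o $@ $(OBJS)   # no-op "link"
	$(POSTLINK)                 # put the real library where Ruby expects it
```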
The last bit of munging is probably the most heinous, but it's pretty simple so
I reckon it's fine. We substitute the compilation task all for one that invokes
cargo:

contents = File.read("Makefile")

all = <<-MAKEFILE
all:
\tcargo build --release --manifest-path $(srcdir)/Cargo.toml
MAKEFILE

contents.gsub!(/^all: .*$/, all)

# Rewrite the makefile with our changes.
File.write('Makefile', contents)
We need to specify --manifest-path because the working directory when
extconf.rb is invoked is not the extension directory (because it's being
invoked by rake-compiler in this case). I suspect there's a better way to do
this, given how configurable the generated Makefile already is.
rake-compiler
rake-compiler is a commonly used gem for compiling native gems. It provides
tasks for building them. Both psych and semian, which I used as references for
creating a native gem, use it. For normal C and C++ gems it doesn't really
require any configuration. For Rust we need to match the source files using a
different glob from the default, which is as easy as:
GEMSPEC = Gem::Specification.load(File.join(__dir__, "hello.gemspec"))

Rake::ExtensionTask.new("hello", GEMSPEC) do |ext|
  ext.source_pattern = "**/*.{rs,toml}"
end
We can then compile our gem with rake compile and the dynamically linkable
library will end up in the right spot, lib/hello, ready for us to require.
Requiring
Compiling our gem isn't super useful if we can't then require it from Ruby. We
do this by requiring the dynamic library: require "hello.so". The Ruby
frontperson file that ties it all together could be as simple as:
# lib/hello.rb
require "hello.so"
Wait! You're using .so. Yep! Ruby has special handling for that, and even
though our file has the .bundle extension on macOS, it'll find the right file
and initialise our library.
For the required library to actually do anything, you'll need to have defined
and exported (unmangled) a function called Init_<name>, where name is the
gem's name; in this case we're exporting Init_hello.
As an aside, the way I often see native gems done is having the fundamental or
performance-sensitive stuff done in the native language, with the functionality
and a nice interface built up within the Ruby code itself. I think this makes
sense because the native code interfacing with Ruby can be quite verbose and is
very imperative: defining a module requires calling rb_define_module, then
calls to rb_define_module_function to define functions on that module, rather
than just writing the function within the module in Ruby.
Interacting with Ruby
We're creating a Ruby gem, so we need to be able to interact with Ruby. That
means understanding values from Ruby and defining functions, classes, modules,
etc. to be used in Ruby. In Requiring I briefly mentioned two functions from
Ruby's C API: rb_define_module and rb_define_module_function. These are just
two of many functions in this API which allow us to define an interface between
our native code (in this case Rust) and Ruby code. Much of it is documented,
and the C API is used throughout the Ruby source code itself, which provides
great examples of how to perform more advanced tasks.
Using this API from Rust means we need to generate bindings for the C code. The
interface is contained in ruby.h. We could do this ourselves using the
bindgen crate. However, setting up that build is unnecessary work;
rb-sys already provides it for us. rb-sys is built and maintained by the person
who implemented building native extensions in Rust for rubygems. There is also
the older ruby-sys, which seems to be quite popular.
In hello, we expose a function called Init_hello that is the entrypoint:

#[allow(non_snake_case)]
#[no_mangle]
pub extern "C" fn Init_hello() {
    let module_name = CString::new("Hello").unwrap();
    let fn_name = CString::new("say_it").unwrap();

    unsafe {
        let klass = rb_define_module(module_name.as_ptr());
        let cb = std::mem::transmute::<
            unsafe extern "C" fn(i32, *const RubyValue, RubyValue) -> RubyValue,
            unsafe extern "C" fn() -> RubyValue,
        >(ruby_say_it);
        rb_define_module_function(klass, fn_name.as_ptr(), Some(cb), -1);
    }
}
The no_mangle attribute is important because otherwise the function name would
be mangled to ensure its uniqueness. Ruby expects a function of this exact name
when loading our extension, so mangling has to be turned off.
Macros
Quite a few important "functions" provided by the C API are in actual fact
macros. For example, to check the Ruby type pointed to by a VALUE, we have the
rb_check_type function. However, this will raise a TypeError if the assertion
is false. What if we want a boolean response? Perhaps we do different things
based on the argument type (this is pretty common in Ruby interfaces). The C
API provides the RB_TYPE_P macro. Macros are processed by the C preprocessor,
so they aren't usable from Rust. What we need to do is define some wrappers,
in C, then generate bindings for those to use in Rust. For example:
// wrapper.h
#include <stdbool.h>
#include "ruby.h"

bool raw_is_type(VALUE v, int t);

// wrapper.c
#include "./wrapper.h"

bool raw_is_type(VALUE v, int t)
{
    return RB_TYPE_P(v, t);
}
Generating bindings against this, we can then use raw_is_type. I'd recommend
then giving it a Rust wrapper so you don't have to cast the ruby_value_type
enum every time you want to use it:

pub unsafe fn is_type(v: VALUE, typ: ruby_value_type) -> bool {
    raw_is_type(v, typ as i32)
}
You will need to compile this wrapper so that there is an object file to link against. The easiest way to do this is with the cc crate; it's pretty easy to use, check out the build.rs. This isn't something particular to Rust-Ruby interop; it's a Rust-C interop thing and is documented well in The Embedded Rust Book.
One example of a Ruby gem in Rust I found in my search was Steve Klabnik's RustExample. It hasn't been updated in a while but one thing at the bottom of a README in the repository stuck out to me:
Re-implementing all this in Rust seems error prone, so I think that a thin C layer which uses this stuff and then passes the "regular C stuff" to Rust makes more sense than reimplementing stuff like this just to get rid of the C.
I think that this problem is part of what Steve was referring to. For the purposes of playing around making this gem I wanted as much as possible to be written in Rust but I think for maintainability writing a shim layer in C would be much easier. It would be really annoying to write wrappers for all the macros the C API expects to be used.
Testing
I think a good idea when writing an extension in Rust, and probably any
language, is to try to contain usage of the Ruby C API to a thin layer around
the outside of the actual logic. That will, among other things, make it much
easier to test. In the hello crate, the say_it function is what does the actual
work; the rest is wrangling the Ruby API. We can write tests for it as we
usually would.
#[cfg(test)]
mod tests {
    use crate::say_it;

    #[test]
    fn en_locale() {
        let result = say_it("en").unwrap();
        assert_eq!(result, "Hello, world!");
    }
}
Sometimes it might be nice to write tests that involve using the Ruby API,
though they could often be written as tests in Ruby, allowing you to also test
the Ruby interface. To write tests that involve Ruby in Rust, you need to make
sure that the ruby_init function is invoked before you use any of the rb_-
prefixed functions.
#[cfg(test)]
mod test {
    use super::*;
    use std::ffi::CString;

    #[test]
    fn is_type_string() {
        unsafe { ruby_init() };

        let rs = "test";
        let len = rs.len();
        let s = CString::new(rs).unwrap();
        let v = unsafe {
            rb_utf8_str_new(
                s.as_ptr(),
                len as std::os::raw::c_long,
            )
        };

        let result = unsafe {
            is_type(
                v,
                rb_sys::ruby_value_type::RUBY_T_STRING,
            )
        };

        assert!(result)
    }
}
Debugging
Let's be real: it's not always going to be this simple. When we're developing a
real extension there are going to be bugs, and we're going to have to dig into
them one way or another. We should be able to use all the tools at our
disposal; in Rust that includes lldb (wrapped up in rust-lldb). For our hello
gem we can drop into a breakpoint in say_it with:

bundle exec rust-lldb --one-line 'b say_it' --one-line 'process launch' -- ruby -e 'require "hello"; puts Hello.say_it :en'
You can do the same with rust-gdb if that's what you prefer. I crafted this
command from this blog post, which uses gdb.
You can also run your specs in a similar way:
bundle exec rust-lldb --one-line 'b hello::say_it' --one-line 'process launch' -- ruby -e 'require "rspec"; RSpec::Core::Runner.run(Dir["**/*_spec.rb"])'
Note: The arguments to RSpec::Core::Runner.run are the same as how you'd invoke
rspec on the command line (they're just an array).
Inspiration
The way I figured much of this out was looking at semian and psych (gems that contain extensions written in C) and working out how that translates to Rust.
Last word
Implementing this extension I wanted as "raw" an experience as possible, using
rb-sys, which is just Ruby bindings with no sugar on top. There are crates,
like rutie, that provide a more polished experience, allowing you to define
modules, classes, etc. in a more declarative way. I think in general these
would make it easier if you're developing a larger extension, but I'm not sure
there would be the same incentive to separate your Ruby interface code from
your logic code, which would make testing more difficult. I'm interested to try
crates like this out, as I want to further explore writing extensions in Rust
for some software reliability projects I have in mind.