Writing Ruby gems in Rust
ruby rust
Ruby supports native extensions by dynamically loading linked libraries (.so
on Linux, .bundle on macOS), so you can use a library written in a
compiled language as if it were written in Ruby. As long as your language can
link against C code and expose callable functions whose names aren't mangled,
you can write a native gem extension in that language.
You need to be able to link against C code so you can call the provided
functions for working with and defining values for the Ruby interpreter (usually
prefixed with rb_). Unmangled names are needed because your dynamically
linkable library must expose a function called Init_<name>, where name is the
name of the module you're creating. I was interested in doing this in Rust but
didn't find much in the way of how to actually do it so that the native code
was usable like Ruby; most of the things I found used the ffi or fiddle
standard gems, which I didn't want.
I'm going to run through building an extension in Rust, called hello, that
exposes the module function Hello.say_it. say_it can take an argument, either a
symbol or a string, or a block. The symbol or string should be the locale, and
the block should return the locale. It will then return a string which is hello
in the language identified by the locale. You can find the end product here:
https://git.sr.ht/~nds/rust_ruby_playground/tree.
Compiling
We need to compile our extension into something that can be dynamically loaded:
a shared object (.so) on Linux or a loadable bundle (.bundle) on macOS.
There is the standard gem mkmf that generates the Makefile for us and the gem
rake-compiler that makes it easy to invoke that Makefile and copy the output to
the right place. All these helpers for native extensions are intended for C and
C++, so we need to do a little bit of tweaking for them to handle Rust.
Thankfully, they are configurable enough that we can do it without resorting to
too many hacks.
mkmf
mkmf is a gem in the standard library that provides the MakeMakefile module.
It makes it (relatively) easy to generate a cross-platform Makefile to compile
a C or C++ native extension (there may also be Java support, as this seems
relatively common, but I haven't looked into that). If we were writing C or C++
that'd be great; we'd pretty much be done. However, this is a gem written in
Rust, so mkmf works against us somewhat and we have to do a bit of Makefile
munging in our extconf.rb.
The key things are:
Setting $srcs, because by default create_makefile only finds C, C++, and object files:

$srcs = Dir[File.join(RbConfig.expand("$(srcdir)"), "src", "**", "*.rs")].sort
Setting the values used by the linker tasks to not actually link, and instead just copy the files we built with Cargo around:

create_makefile('hello') do |mk|
  mk << "LDSHARED = true"
  mk << "POSTLINK = cp ./release/libhello.dylib $(TARGET_SO)"
end
LDSHARED is usually the compiler (clang or gcc) with the right flags to
create a dynamically linked library. Setting it to true (an executable or
shell built-in that should be available on most non-Windows systems) is a hack
I'm kind of proud of. It takes any arguments, ignores them, and returns with an
exit code of 0. It's an easy way of saying we don't actually care about this
step.
POSTLINK is a hook that the Makefile created by mkmf provides to run after
the shared library has been linked. As we didn't actually link anything in the
linking step, we need to copy the artifact built by cargo to the right spot
with the right name; that's all the post-link hook does.
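Putting those two overrides together, the effect on the generated Makefile looks roughly like this. This is a sketch of the shape of mkmf's output, not its literal contents; the macOS .dylib path matches the snippet above.

```make
# Sketch of the effect of the two overrides in the generated Makefile.
LDSHARED = true                                      # the "link" runs `true`: args ignored, exit 0
POSTLINK = cp ./release/libhello.dylib $(TARGET_SO)  # copy cargo's artifact into place

$(DLLIB): $(OBJS)
	$(LDSHARED) -o $@ $(OBJS)   # no-op "link"
	$(POSTLINK)                 # put the real library where Ruby expects it
```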
The last bit of munging is probably the most heinous, but it's pretty simple so
I reckon it's fine. We substitute the compilation task all for one that invokes
cargo:

contents = File.read("Makefile")

all = <<-MAKEFILE
all:
\tcargo build --release --manifest-path $(srcdir)/Cargo.toml
MAKEFILE

contents.gsub!(/^all: .*$/, all)

# Rewrite the makefile with our changes.
File.write('Makefile', contents)
We need to specify --manifest-path because the working directory when
extconf.rb is invoked is not the extension directory (because it's being
invoked by rake-compiler in this case). I suspect there's a better way to do
this, given how configurable the generated Makefile already is.
rake-compiler
rake-compiler is a commonly used gem for compiling native gems. It provides
tasks for building them. Both psych and semian, which I used as references for
creating a native gem, use it. For normal C and C++ gems it doesn't really
require any configuration. For Rust we need to match the source files using a
different glob from the default, which is as easy as:
GEMSPEC = Gem::Specification.load(File.join(__dir__, "hello.gemspec"))

Rake::ExtensionTask.new("hello", GEMSPEC) do |ext|
  ext.source_pattern = "**/*.{rs,toml}"
end
We can then compile our gem with rake compile and the dynamically linkable
library will end up in the right spot, lib/hello, ready for us to require.
Requiring
Compiling our gem isn't super useful if we can't then require it from Ruby. We
do this by requiring the dynamic library: require "hello.so". The Ruby
frontperson file that ties it all together could be as simple as:
# lib/hello.rb
require "hello.so"
Wait! You're using .so. Yep! Ruby has special handling for that, and even
though our file has the .bundle extension on macOS, it'll find the right file
and initialise our library.
For the required library to actually do anything, you'll need to have defined
and exported (unmangled) a function called Init_<name>, where name is the
gem's name; in this case we're exporting Init_hello.
As an aside, the way I often see native gems done is having the fundamental or
performance-sensitive stuff done in the native language, with the functionality
and a nice interface built up within the Ruby code itself. I think this makes
sense because the native code interfacing with Ruby can be quite verbose and is
very imperative: defining a module requires calling rb_define_module, then
calls to rb_define_module_function to define functions on that module, rather
than just writing the function within the module in Ruby.
Interacting with Ruby
We're creating a Ruby gem, so we need to be able to interact with Ruby. That
means understanding values from Ruby and defining functions, classes, modules,
etc. to be used in Ruby. In Requiring I briefly mentioned two functions from
Ruby's C API: rb_define_module and rb_define_module_function. These are just
two of many functions in this API which allow us to define an interface between
our native code (in this case Rust) and Ruby code. Much of it is documented,
and the C API is used throughout the Ruby source code itself, which provides
great examples of how to perform more advanced tasks.
Using this API from Rust means we need to generate bindings for the C code. The
interface is contained in ruby.h. We could do this ourselves using the
bindgen crate. However, setting up that build is unnecessary work;
rb-sys already provides it for us. rb-sys is built and maintained by the person
who implemented building native extensions in Rust for rubygems. There is also
the older ruby-sys, which seems to be quite popular.
In hello, we expose a function called Init_hello that is the entrypoint:

#[allow(non_snake_case)]
#[no_mangle]
pub extern "C" fn Init_hello() {
    let module_name = CString::new("Hello").unwrap();
    let fn_name = CString::new("say_it").unwrap();

    unsafe {
        let klass = rb_define_module(module_name.as_ptr());
        let cb = std::mem::transmute::<
            unsafe extern "C" fn(i32, *const RubyValue, RubyValue) -> RubyValue,
            unsafe extern "C" fn() -> RubyValue,
        >(ruby_say_it);
        rb_define_module_function(klass, fn_name.as_ptr(), Some(cb), -1);
    }
}
The no_mangle attribute is important because otherwise the function name would
be mangled to ensure its uniqueness. Ruby expects a function of this exact name
when loading our extension, so mangling has to be turned off.
Macros
Quite a few important "functions" provided by the C API are in actual fact
macros. For example, to check the Ruby type pointed to by a VALUE, we have the
rb_check_type function. However, this will raise a TypeError if the assertion
is false. What if we want a boolean response? Perhaps we do different things
based on the argument type (this is pretty common in Ruby interfaces). The C
API provides the RB_TYPE_P macro. Macros are processed by the C preprocessor,
so they aren't usable from Rust. What we need to do is define some wrappers,
in C, then generate bindings for those to use in Rust. For example:
// wrapper.h
#include <stdbool.h>
#include "ruby.h"

bool raw_is_type(VALUE v, int t);

// wrapper.c
#include "./wrapper.h"

bool raw_is_type(VALUE v, int t)
{
    return RB_TYPE_P(v, t);
}
Generating bindings against this, we can then use raw_is_type. I'd recommend
then giving it a Rust wrapper so you don't have to cast the ruby_value_type
enum every time you want to use it:

pub unsafe fn is_type(v: VALUE, typ: ruby_value_type) -> bool {
    raw_is_type(v, typ as i32)
}
You will need to compile this wrapper so that there is an object file to link against. The easiest way to do this is with the cc crate; it's pretty easy to use, check out the build.rs. This isn't something particular to Rust-Ruby interop; it's a Rust-C interop thing and is documented well in The Embedded Rust Book.
One example of a Ruby gem in Rust I found in my search was Steve Klabnik's RustExample. It hasn't been updated in a while but one thing at the bottom of a README in the repository stuck out to me:
Re-implementing all this in Rust seems error prone, so I think that a thin C layer which uses this stuff and then passes the "regular C stuff" to Rust makes more sense than reimplementing stuff like this just to get rid of the C.
I think that this problem is part of what Steve was referring to. For the purposes of playing around making this gem I wanted as much as possible to be written in Rust but I think for maintainability writing a shim layer in C would be much easier. It would be really annoying to write wrappers for all the macros the C API expects to be used.
Testing
I think a good idea when writing an extension in Rust, and probably any
language, is to try to contain usage of the Ruby C API to a thin layer around
the outside of the actual logic. That will, among other things, make it much
easier to test. In the hello crate, the say_it function is what does the actual
work; the rest is wrangling the Ruby API. We can write tests for it as we
usually would.
#[cfg(test)]
mod tests {
    use crate::say_it;

    #[test]
    fn en_locale() {
        let result = say_it("en").unwrap();
        assert_eq!(result, "Hello, world!");
    }
}
Sometimes it might be nice to write tests that involve using the Ruby API,
though they could often be written as tests in Ruby, allowing you to also test
the Ruby interface. To write tests that involve Ruby in Rust, you need to make
sure that the ruby_init function is invoked before you use any of the rb_-
prefixed functions.
#[cfg(test)]
mod test {
    use super::*;
    use std::ffi::CString;

    #[test]
    fn is_type_string() {
        unsafe { ruby_init() };

        let rs = "test";
        let len = rs.len();
        let s = CString::new(rs).unwrap();
        let v = unsafe {
            rb_utf8_str_new(
                s.as_ptr(),
                len as std::os::raw::c_long,
            )
        };

        let result = unsafe {
            is_type(
                v,
                rb_sys::ruby_value_type::RUBY_T_STRING,
            )
        };

        assert!(result)
    }
}
Debugging
Let's be real: it's not always going to be this simple. When we're developing a
real extension there are going to be bugs, and we're going to have to dig into
them one way or another. We should be able to use all the tools at our
disposal; in Rust that includes lldb (wrapped up in rust-lldb). For our hello
gem we can drop into a breakpoint in say_it with:

bundle exec rust-lldb --one-line 'b say_it' --one-line 'process launch' -- ruby -e 'require "hello"; puts Hello.say_it :en'
You can do the same with rust-gdb if that's what you prefer. I crafted this
command from this blog post, which uses gdb.
You can also run your specs in a similar way:
bundle exec rust-lldb --one-line 'b hello::say_it' --one-line 'process launch' -- ruby -e 'require "rspec"; RSpec::Core::Runner.run(Dir["**/*_spec.rb"])'
Note: The arguments to RSpec::Core::Runner.run are the same as how you'd invoke
rspec on the command line (they're just an array).
Inspiration
The way I figured much of this out was looking at semian and psych (gems that contain extensions written in C) and working out how that translates to Rust.
Last word
Implementing this extension I wanted as "raw" an experience as possible, using
rb-sys, which is just Ruby bindings with no sugar on top. There are crates,
like rutie, that provide a more polished experience, allowing you to define
modules, classes, etc. in a more declarative way. I think in general these
would make it easier if you're developing a larger extension, but I'm not sure
there would be the same incentive to separate your Ruby interface code from
your logic code, which would make testing more difficult. I'm interested to try
crates like this out, as I want to further explore writing extensions in Rust
for some software reliability projects I have in mind.