The story starts with Rust 1.85.0 giving me a surprise SIGTRAP in some test code. For reasons not important here, I had to bump opt-level from 0 to 1 for tests. That made a bunch of tests fail with SIGTRAP, whose default action terminates the process.

The SIGTRAP happened in Rust code that calls C.

Passing a Rust closure through C code

The code that broke passes a Rust closure through a C function and invokes the closure in a callback.

The C API takes a user data argument in addition to the callback function, for passing through context:

// Calls (*proc)(data). VALUE is uintptr_t.
VALUE rb_protect(VALUE (* proc) (VALUE), VALUE data, int *pstate);

Each closure has a distinct anonymous type as they can vary in size depending on what’s captured. There is no way to separate out the captures and the function pointer to fit the C API. But, we can use a trait object to talk about a category of closure types, and the trait object has one uniform size:

fn closure_info(closure: impl FnMut()) {
    use std::mem::size_of_val;
    let trait_object: &dyn FnMut() = &closure;
    println!(
        "closure_size={} trait_object_size={}",
        size_of_val(&closure),
        size_of_val(&trait_object)
    );
}

fn main() {
    let mut int = 0;
    let one_capture = || int = 42;
    let no_capture = || {};
    closure_info(one_capture); // closure_size=8 trait_object_size=16
    closure_info(no_capture);  // closure_size=0 trait_object_size=16
    // Varying closure size, same trait_object_size.
}

The trait object is too big to fit the pointer-sized data argument, but we can solve that by taking a reference to it. That gives &mut &mut dyn FnMut(), a double reference to the trait object. The outer &mut points to a sized value, so it is pointer-sized, unlike &mut dyn, which also carries a vtable pointer. Great, we can now pass the closure through. Now we need to write the callback that invokes the closure.
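The round trip can be sketched without Ruby. In the sketch below, fake_protect is a hypothetical stand-in for a C API shaped like rb_protect, written in Rust so the example runs on its own. The names trampoline and call_through_c are also made up for illustration. The callback uses an as cast to recover the pointer, a choice explained later in this post.

```rust
// Plain usize stands in for the C VALUE (uintptr_t) type.
type Value = usize;

// Hypothetical stand-in for a C API shaped like rb_protect: calls func(data).
extern "C" fn fake_protect(func: extern "C" fn(Value) -> Value, data: Value) -> Value {
    func(data)
}

// The callback turns the integer back into a pointer to the trait object reference.
extern "C" fn trampoline(data: Value) -> Value {
    let closure = data as *mut &mut dyn FnMut();
    // SAFETY: `data` was made from a live `&mut &mut dyn FnMut()` in call_through_c,
    // and nothing else touches that reference while the callback runs.
    unsafe { (**closure)() };
    0
}

// Packs any closure into one pointer-sized integer and sends it through.
fn call_through_c(mut closure: impl FnMut()) {
    // One uniform, 16-byte trait object reference, whatever the closure captures.
    let mut trait_object: &mut dyn FnMut() = &mut closure;
    // A pointer to that reference is thin, so it fits in a Value.
    let data = &mut trait_object as *mut &mut dyn FnMut() as Value;
    fake_protect(trampoline, data);
}

fn main() {
    let mut count = 0;
    call_through_c(|| count += 1);
    println!("count={count}");
}
```

The closure and the trait object reference both live on the stack of call_through_c, so this only works because the callback runs before that function returns.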

The transmute

To call the closure, we need to first turn the VALUE back into the double reference to trait object. I used std::mem::transmute for this, and it blew up. Rust 1.85.0 compiles the following function to a single UD2 instruction which raises SIGTRAP.

#[repr(transparent)]
struct VALUE(usize);

extern "C" fn c_callback(obj: VALUE) {
    let closure: &mut &mut dyn FnMut() = unsafe { std::mem::transmute(obj) };
    closure();
}

Surprise SIGTRAPs like these from LLVM make me think of trap mode in UndefinedBehaviorSanitizer. What rules did I break?

Probably pointer provenance

Integer-to-pointer transmute is documented to have unspecified behavior, but our struct VALUE(usize) definition is not an integer. Like many parts of Unsafe Rust, it’s hard to say how the transmute should have behaved. Rust 1.78.0 includes a change that seems to draw a clear line. “Lower transmutes from int to pointer type as gep on null” (“transmute patch” from here on) changes how transmute picks the provenance of the target pointer. Before, the transmute acted like an integer-to-pointer as cast, picking a previously exposed provenance. Now, the target pointer is derived from the null pointer. The null pointer is invalid for access, and so is any pointer derived from it. The SIGTRAP is probably trying to say, in an obtuse way, that I’m calling an invalid function pointer.

I had a hunch this is related to pointer provenance changes. I also remember from discussions about provenance something about writing a signature on the Rust side that uses a pointer type to accept an integer argument from the C side. I’m not sure if it’s a good idea to misrepresent types like this, but it’s useful to do it as an experiment to see if the SIGTRAP goes away:

// Experiment: the C side calls this with an integer (VALUE), but we claim it's a pointer
extern "C" fn c_callback(obj: *mut ()) {
    let closure: &mut &mut dyn FnMut() = unsafe { std::mem::transmute(obj) };
    closure();
}

And this version doesn’t SIGTRAP! So this is probably related to pointer provenance: rules stipulating that, in addition to having a good address, a pointer is valid for dereference only when obtained a certain way. The pointers in the working version and the SIGTRAP version have the same address and in-memory representation, but only one is valid for dereference. The ptr module has documentation about provenance rules.
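The "same address, different validity" idea can be shown with named functions from the ptr module (addr and without_provenance_mut, stable since Rust 1.84). This is a sketch of the concept only, not a claim about what the compiler emits for transmute. The function name addresses_match is made up for illustration.

```rust
use std::ptr;

// Builds two pointers with the same address; only one may be dereferenced.
fn addresses_match(target: &mut u32) -> bool {
    // Derived from a live reference, so it carries provenance for `*target`.
    let real: *mut u32 = target;
    // Same address, but no provenance: not associated with any allocation.
    let bare: *mut u32 = ptr::without_provenance_mut(real.addr());

    // SAFETY: `real` points to `*target`, which is live and exclusively borrowed.
    unsafe { *real = 1 };
    // Writing through `bare` here would be Undefined Behavior, so we only compare.

    // Raw pointer equality compares addresses, not provenance.
    real == bare
}

fn main() {
    let mut n = 0;
    println!("equal={} n={}", addresses_match(&mut n), n);
}
```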

Searching in rust-lang/rust found me the transmute patch, and experiments on Compiler Explorer with various Rust versions showed 1.78.0 to be the first version that compiles the function to a single UD2 instruction. It’s the first release that includes the transmute patch.

But why does it SIGTRAP only on older Rusts? What changed?

Revert due to LLVM provenance bug

In the rust_has_provenance RFC, the lack of proper treatment of provenance in LLVM is stated as a drawback. Turns out, the transmute patch triggers such bugs and so was reverted in version 1.91.0. In some situations, the mere existence of a pointer with invalid provenance can confuse LLVM into wrongly deciding that an unrelated pointer is invalid for access. To be clear, the LLVM bugs are not the reason my code raises SIGTRAP; the transmute patch is designed to break such code. The unintended breakages from the transmute patch were due to the LLVM bugs giving bad output for code completely free of Unsafe Rust.

There are many reports of code breakages due to the transmute patch, but the Rust team did not consider it to be a breaking change. I think it deserved a place in the release notes as a compatibility issue. It also would have been nice if it panicked with a clear message rather than a nondescript SIGTRAP, but maybe the UD2 comes from a place too deep in LLVM’s pipeline to realistically replace.

With the revert, the SIGTRAP is gone. Absence of problematic runtime behavior does not imply absence of Undefined Behavior, though, and the revert did not change rules of the language. The current documentation for std::mem::transmute is clear about this subject:

Transmuting integers to pointers is a largely unspecified operation. It is likely not equivalent to an as cast. Doing non-zero-sized memory accesses with a pointer constructed this way is currently considered undefined behavior.

Let’s try to follow the rules.

The fix

Using an as cast instead of transmute avoids the SIGTRAP in all Rust versions:

extern "C" fn c_callback(obj: VALUE) {
    let closure = obj.0 as *const *mut dyn FnMut();
    unsafe { (**closure)() };
}

My understanding of why an as cast works better than transmute here is as follows: transmute works solely with the bits of the input value, but provenance is not represented in the input integer (integers have no provenance), so the output pointer has no provenance and is invalid for dereference. On the other hand, as casts can add things to the output that are not in the input. When casting from a smaller integer size to a larger one, the cast adds bits. Here, it adds provenance.
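Since Rust 1.84, the standard library also offers named functions for both halves of this round trip. The ptr module documentation describes expose_provenance as equivalent to a pointer-to-integer as cast and with_exposed_provenance_mut as equivalent to an integer-to-pointer as cast. Here is a minimal sketch using a plain u32 instead of a closure; round_trip is a made-up name.

```rust
use std::ptr;

// Sends a pointer through an integer and back, then writes through it.
fn round_trip(target: &mut u32) {
    let raw: *mut u32 = target;
    // Like `raw as usize`: returns the address and marks the provenance as exposed.
    let addr: usize = raw.expose_provenance();
    // Like `addr as *mut u32`: picks up a previously exposed provenance.
    let back: *mut u32 = ptr::with_exposed_provenance_mut(addr);
    // SAFETY: `back` points to `*target`, which is live and not otherwise used here.
    unsafe { *back = 42 };
}

fn main() {
    let mut n = 0;
    round_trip(&mut n);
    println!("n={n}");
}
```

The named functions make the provenance step visible at the call site, which an as cast hides.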

Ready for round two

I wrote code that triggers Undefined Behavior. We fix the code and move on, having learned a bit about the language’s pointer provenance rules. If and when the transmute patch is reintroduced after the LLVM bugs are fixed or mitigated, our code will work.