|
| 1 | += Unambiguous types |
| 2 | + |
| 3 | +Most of these mappings are obvious, but there are some nuances and gotchas with |
| 4 | +Rust FFI (Foreign Function Interface). |
| 5 | + |
| 6 | +This document defines clear, one-to-one mappings between primitive types in C |
| 7 | +and Rust (and possibly other languages in the future). Its purpose is to eliminate |
| 8 | +ambiguity in type widths, signedness, and binary representation across |
| 9 | +platforms and languages. |
| 10 | + |
| 11 | +For Git, the only header required to use these unambiguous types in C is |
| 12 | +`git-compat-util.h`. |
| 13 | + |
| 14 | +== Boolean types |
| 15 | +[cols="1,1", options="header"] |
| 16 | +|=== |
| 17 | +| C Type | Rust Type |
| 18 | +| bool^1^ | bool |
| 19 | +|=== |
| 20 | + |
| 21 | +== Integer types |
| 22 | + |
| 23 | +In C, `<stdint.h>` (or an equivalent) must be included. |
| 24 | + |
| 25 | +[cols="1,1", options="header"] |
| 26 | +|=== |
| 27 | +| C Type | Rust Type |
| 28 | +| uint8_t | u8 |
| 29 | +| uint16_t | u16 |
| 30 | +| uint32_t | u32 |
| 31 | +| uint64_t | u64 |
| 32 | + |
| 33 | +| int8_t | i8 |
| 34 | +| int16_t | i16 |
| 35 | +| int32_t | i32 |
| 36 | +| int64_t | i64 |
| 37 | +|=== |
| 38 | + |
| 39 | +== Floating-point types |
| 40 | + |
| 41 | +Rust requires IEEE-754 semantics. |
| 42 | +In C, that is typically true, but not guaranteed by the standard. |
| 43 | + |
| 44 | +[cols="1,1", options="header"] |
| 45 | +|=== |
| 46 | +| C Type | Rust Type |
| 47 | +| float^2^ | f32 |
| 48 | +| double^2^ | f64 |
| 49 | +|=== |
| 50 | + |
| 51 | +== Size types |
| 52 | + |
| 53 | +These types represent pointer-sized integers and are typically defined in |
| 54 | +`<stddef.h>` or an equivalent header. |
| 55 | + |
| 56 | +Size types should be used any time pointer arithmetic is performed, e.g. when |
| 57 | +indexing an array or describing the number of elements in memory. |
| 58 | + |
| 59 | +[cols="1,1", options="header"] |
| 60 | +|=== |
| 61 | +| C Type | Rust Type |
| 62 | +| size_t^3^ | usize |
| 63 | +| ptrdiff_t^3^ | isize |
| 64 | +|=== |
| 65 | + |
| 66 | +== Character types |
| 67 | + |
| 68 | +This is where C and Rust don't have a clean one-to-one mapping. |
| 69 | + |
| 70 | +A C `char` and a Rust `u8` share the same bit width, so any C struct containing |
| 71 | +a `char` will have the same size as the corresponding Rust struct using `u8`. |
| 72 | +In that sense, such structs are safe to pass over the FFI boundary, because |
| 73 | +their fields will be laid out identically. However, beyond bit width, C `char` |
| 74 | +has additional semantics and platform-dependent behavior that can cause |
| 75 | +problems, as discussed below. |
| 76 | + |
| 77 | +The C language leaves the signedness of `char` implementation-defined. Because |
| 78 | +our developer build enables `-Wsign-compare`, comparing a value of `char` |
| 79 | +type with either signed or unsigned integers may trigger warnings from the |
| 80 | +compiler. |
| 81 | + |
| 82 | +Note: Rust's `char` type is a 32-bit type that holds a Unicode scalar value. It |
| 83 | +is not an integer type and is unrelated to C's `char`. |
| 84 | + |
| 85 | +=== Notes |
| 86 | +^1^ This is only true if `<stdbool.h>` (or equivalent) is used. + |
| 87 | +^2^ C does not enforce IEEE-754 compatibility, but Rust expects it. If the |
| 88 | +platform/arch for C does not follow IEEE-754 then this equivalence does not |
| 89 | +hold. Also, it's assumed that `float` is 32 bits and `double` is 64, but |
| 90 | +there may be a strange platform/arch where even this isn't true. + |
| 91 | +^3^ C also defines uintptr_t and intptr_t, and POSIX defines ssize_t, but these |
| 92 | +types are discouraged for FFI purposes. For functions like `read()` and |
| 93 | +`write()`, ssize_t should be cast to a different, unambiguous type before being |
| 94 | +passed over the FFI boundary. + |
| 95 | + |
| 96 | +== Problems with std::ffi::c_* types in Rust |
| 97 | +TL;DR: In practice, Rust's `c_*` types aren't guaranteed to match C types for |
| 98 | +all possible C compilers, platforms, or architectures, because Rust only |
| 99 | +ensures correctness of C types on officially supported targets. These |
| 100 | +definitions have changed over time to match more targets which means that the |
| 101 | +c_* definitions will differ based on which Rust version Git chooses to use. |
| 102 | + |
| 103 | +Current list of safe Rust-side FFI types in Git: + |
| 104 | + |
| 105 | +* `c_void` |
| 106 | +* `CStr` |
| 107 | +* `CString` |
| 108 | + |
| 109 | +Even then, they should be used sparingly, and only where the semantics match |
| 110 | +exactly. |
| 111 | + |
| 112 | +The std::os::raw::c_* types directly inherit the problems of core::ffi, whose |
| 113 | +definitions change over time and seem to be a best guess at the correct |
| 114 | +definition for a given platform/target. This probably isn't a problem for the |
| 115 | +platforms that Rust currently supports, but can anyone say that Rust got it |
| 116 | +right for all C compilers on all platforms/targets? |
| 117 | + |
| 118 | +To give an example, the definition of c_long changed between Rust 1.63.0 |
| 119 | +footnote:[https://doc.rust-lang.org/1.63.0/src/core/ffi/mod.rs.html#175-189[c_long in 1.63.0]] |
| 120 | +and Rust 1.89.0 footnote:[https://doc.rust-lang.org/1.89.0/src/core/ffi/primitives.rs.html#135-151[c_long in 1.89.0]]: |
| 121 | + |
| 122 | +=== Rust version 1.63.0 |
| 123 | + |
| 124 | +``` |
| 125 | +mod c_long_definition { |
| 126 | + cfg_if! { |
| 127 | + if #[cfg(all(target_pointer_width = "64", not(windows)))] { |
| 128 | + pub type c_long = i64; |
| 129 | + pub type NonZero_c_long = crate::num::NonZeroI64; |
| 130 | + pub type c_ulong = u64; |
| 131 | + pub type NonZero_c_ulong = crate::num::NonZeroU64; |
| 132 | + } else { |
| 133 | + // The minimal size of `long` in the C standard is 32 bits |
| 134 | + pub type c_long = i32; |
| 135 | + pub type NonZero_c_long = crate::num::NonZeroI32; |
| 136 | + pub type c_ulong = u32; |
| 137 | + pub type NonZero_c_ulong = crate::num::NonZeroU32; |
| 138 | + } |
| 139 | + } |
| 140 | +} |
| 141 | +``` |
| 142 | + |
| 143 | +=== Rust version 1.89.0 |
| 144 | + |
| 145 | +``` |
| 146 | +mod c_long_definition { |
| 147 | + crate::cfg_select! { |
| 148 | + any( |
| 149 | + all(target_pointer_width = "64", not(windows)), |
| 150 | + // wasm32 Linux ABI uses 64-bit long |
| 151 | + all(target_arch = "wasm32", target_os = "linux") |
| 152 | + ) => { |
| 153 | + pub(super) type c_long = i64; |
| 154 | + pub(super) type c_ulong = u64; |
| 155 | + } |
| 156 | + _ => { |
| 157 | + // The minimal size of `long` in the C standard is 32 bits |
| 158 | + pub(super) type c_long = i32; |
| 159 | + pub(super) type c_ulong = u32; |
| 160 | + } |
| 161 | + } |
| 162 | +} |
| 163 | +``` |
| 164 | + |
| 165 | +Even for the cases where C types are correctly mapped to Rust types via |
| 166 | +std::ffi::c_*, there are still problems. Take c_char, for example: on some |
| 167 | +platforms it's u8, on others it's i8. |
| 168 | + |
| 169 | +=== Subtraction underflow in debug mode |
| 170 | + |
| 171 | +The following code will panic in debug on platforms that define c_char as u8, |
| 172 | +but won't if it's an i8. |
| 173 | + |
| 174 | +``` |
| 175 | +let mut x: std::ffi::c_char = 0; |
| 176 | +x -= 1; |
| 177 | +``` |
| 178 | + |
| 179 | +=== Inconsistent shift behavior |
| 180 | + |
| 181 | +`x` will be 0xC0 for platforms that use i8, but will be 0x40 where it's u8. |
| 182 | + |
| 183 | +``` |
| 184 | +let mut x = 0x80u8 as std::ffi::c_char; // cast: the literal 0x80 is out of range for i8 |
| 185 | +x >>= 1; |
| 186 | +``` |
| 187 | + |
| 188 | +=== Equality fails to compile on some platforms |
| 189 | + |
| 190 | +The following will not compile on platforms that define c_char as i8, but will |
| 191 | +if it's u8. You can cast x, e.g. `assert_eq!(x as u8, b'a');`, but then you get |
| 192 | +a warning on platforms that use u8 and a clean compilation where i8 is used. |
| 193 | + |
| 194 | +``` |
| 195 | +let x: std::ffi::c_char = 0x61; |
| 196 | +assert_eq!(x, b'a'); |
| 197 | +``` |
| 198 | + |
| 199 | +== Enum types |
| 200 | +Rust enum types should not be used as FFI types. Rust enum types are more like |
| 201 | +C union types than C enums. For something like: |
| 202 | + |
| 203 | +``` |
| 204 | +#[repr(u8)] // a fieldless enum takes a single primitive repr |
| 205 | +enum Fruit { |
| 206 | + Apple, |
| 207 | + Banana, |
| 208 | + Cherry, |
| 209 | +} |
| 210 | +``` |
| 211 | + |
| 212 | +It's easy enough to make sure the Rust enum matches what C would expect. That is |
| 213 | +not the case for a more complex type like: |
| 214 | + |
| 215 | +``` |
| 216 | +enum HashResult { |
| 217 | + SHA1([u8; 20]), |
| 218 | + SHA256([u8; 32]), |
| 219 | +} |
| 220 | +``` |
| 221 | + |
| 222 | +The Rust compiler has to add a discriminant to the enum to distinguish between |
| 223 | +the variants. The width, location, and values of that discriminant are up to |
| 224 | +the Rust compiler and are not ABI stable. |
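
One way to avoid depending on a compiler-chosen discriminant is to spell the tag out explicitly and convert at the boundary. The following is only a minimal sketch, not code taken from Git: the names `HashResultFfi`, `HASH_ALGO_SHA1`, `HASH_ALGO_SHA256`, and `to_ffi` are hypothetical, and it assumes the C side declares a matching struct made of a `uint8_t` tag followed by a 32-byte `uint8_t` array.

```rust
/// Rust-only representation; never passed over the FFI boundary.
enum HashResult {
    SHA1([u8; 20]),
    SHA256([u8; 32]),
}

// Hypothetical tag values; the C side would need identical constants.
const HASH_ALGO_SHA1: u8 = 1;
const HASH_ALGO_SHA256: u8 = 2;

/// FFI-facing representation built only from unambiguous types.
/// Assumed C counterpart: struct { uint8_t algo; uint8_t bytes[32]; }
#[repr(C)]
struct HashResultFfi {
    algo: u8,
    bytes: [u8; 32],
}

/// Convert the Rust enum into the explicit tag-plus-buffer layout.
fn to_ffi(h: &HashResult) -> HashResultFfi {
    let mut out = HashResultFfi { algo: 0, bytes: [0; 32] };
    match h {
        HashResult::SHA1(b) => {
            out.algo = HASH_ALGO_SHA1;
            // SHA-1 fills the first 20 bytes; the rest stay zeroed.
            out.bytes[..20].copy_from_slice(b);
        }
        HashResult::SHA256(b) => {
            out.algo = HASH_ALGO_SHA256;
            out.bytes = *b;
        }
    }
    out
}
```

With this shape, the width, position, and values of the tag are fixed by the declaration rather than by the compiler, at the cost of a conversion step on each side of the boundary.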