
Commit 6971934

ezekielnewren authored and gitster committed
doc: define unambiguous type mappings across C and Rust
Document other nuances when crossing the FFI boundary. Other language mappings may be added in the future.

Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
1 parent 143f58e commit 6971934

3 files changed

Lines changed: 226 additions & 0 deletions


Documentation/Makefile

Lines changed: 1 addition & 0 deletions
@@ -140,6 +140,7 @@ TECH_DOCS += technical/shallow
 TECH_DOCS += technical/sparse-checkout
 TECH_DOCS += technical/sparse-index
 TECH_DOCS += technical/trivial-merge
+TECH_DOCS += technical/unambiguous-types
 TECH_DOCS += technical/unit-tests
 SP_ARTICLES += $(TECH_DOCS)
 SP_ARTICLES += technical/api-index

Documentation/technical/meson.build

Lines changed: 1 addition & 0 deletions
@@ -31,6 +31,7 @@ articles = [
 'sparse-checkout.adoc',
 'sparse-index.adoc',
 'trivial-merge.adoc',
+'unambiguous-types.adoc',
 'unit-tests.adoc',
 ]

Documentation/technical/unambiguous-types.adoc

Lines changed: 224 additions & 0 deletions
@@ -0,0 +1,224 @@
= Unambiguous types

Most of these mappings are obvious, but there are some nuances and gotchas with
Rust FFI (Foreign Function Interface).

This document defines clear, one-to-one mappings between primitive types in C
and Rust (and possibly other languages in the future). Its purpose is to
eliminate ambiguity in type widths, signedness, and binary representation
across platforms and languages.

For Git, the only header required to use these unambiguous types in C is
`git-compat-util.h`.

== Boolean types
[cols="1,1", options="header"]
|===
| C Type | Rust Type
| bool^1^ | bool
|===
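
Footnote 1 matters in practice: Rust requires a `bool` to hold exactly 0 or 1, so a C flag stored in a plain `int` must not be received as `bool`. The sketch below assumes two hypothetical functions (`is_even` and `flag_is_set` are illustrations, not Git APIs) and shows both cases.

```rust
/// Hypothetical function exported to C with the prototype
///   bool is_even(uint32_t value);
/// where `bool` comes from <stdbool.h>, matching Rust's `bool`.
pub extern "C" fn is_even(value: u32) -> bool {
    value % 2 == 0
}

/// Hypothetical function for C code that keeps a flag in an integer
/// (no <stdbool.h>), declared as
///   bool flag_is_set(int32_t flag);
/// The flag is received as `i32` because a Rust `bool` holding anything
/// other than 0 or 1 is undefined behavior; it is converted explicitly.
pub extern "C" fn flag_is_set(flag: i32) -> bool {
    flag != 0
}

// Rust guarantees that `bool` is one byte wide.
const _: () = assert!(std::mem::size_of::<bool>() == 1);
```

The explicit `!= 0` conversion keeps C's "any non-zero value is true" rule on the C side of the mapping.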

== Integer types

In C, `<stdint.h>` (or an equivalent) must be included.

[cols="1,1", options="header"]
|===
| C Type | Rust Type
| uint8_t | u8
| uint16_t | u16
| uint32_t | u32
| uint64_t | u64

| int8_t | i8
| int16_t | i16
| int32_t | i32
| int64_t | i64
|===
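
To show the table applied to a struct and a function signature, here is a minimal sketch. The C declarations in the comments (`struct record`, `record_apply`) are hypothetical and only illustrate how each fixed-width C type is spelled on the Rust side.

```rust
/// Hypothetical Rust mirror of a C struct declared as
///   struct record { uint32_t id; int64_t delta; uint8_t kind; };
/// Every field uses the Rust type from the table, so widths and signedness
/// agree with C on every platform.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Record {
    pub id: u32,
    pub delta: i64,
    pub kind: u8,
}

/// Hypothetical function with the C prototype
///   int64_t record_apply(const struct record *r, int64_t base);
/// A Rust reference crosses the boundary as a non-null pointer.
pub extern "C" fn record_apply(r: &Record, base: i64) -> i64 {
    // Wrapping addition avoids a debug-build panic on overflow.
    base.wrapping_add(r.delta)
}

// The fixed-width types have exactly the widths their names promise.
const _: () = assert!(std::mem::size_of::<u8>() == 1 && std::mem::size_of::<i8>() == 1);
const _: () = assert!(std::mem::size_of::<u16>() == 2 && std::mem::size_of::<i16>() == 2);
const _: () = assert!(std::mem::size_of::<u32>() == 4 && std::mem::size_of::<i32>() == 4);
const _: () = assert!(std::mem::size_of::<u64>() == 8 && std::mem::size_of::<i64>() == 8);
```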

== Floating-point types

Rust requires IEEE-754 semantics for its floating-point types. In C, that is
typically true, but it is not guaranteed by the standard.

[cols="1,1", options="header"]
|===
| C Type | Rust Type
| float^2^ | f32
| double^2^ | f64
|===
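
A minimal sketch of the floating-point mapping, assuming the C side is IEEE-754 as described in note 2. The function `scale` is hypothetical; the bit patterns returned by `ieee754_bits_of_one` are fixed by IEEE-754 binary32 and binary64.

```rust
/// Hypothetical function with the C prototype
///   double scale(double value, float factor);
/// `double` maps to `f64` and `float` to `f32`, assuming the C platform
/// uses IEEE-754 binary64 and binary32.
pub extern "C" fn scale(value: f64, factor: f32) -> f64 {
    // Widening f32 -> f64 is exact.
    value * f64::from(factor)
}

/// Raw bit patterns of 1.0 in binary32 and binary64; these are fixed by
/// IEEE-754, which Rust's `f32` and `f64` always follow.
pub fn ieee754_bits_of_one() -> (u32, u64) {
    (1.0f32.to_bits(), 1.0f64.to_bits())
}

// Rust's floating-point types always have these widths.
const _: () = assert!(std::mem::size_of::<f32>() == 4);
const _: () = assert!(std::mem::size_of::<f64>() == 8);
```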

== Size types

These types represent pointer-sized integers and are typically defined in
`<stddef.h>` or an equivalent header.

Size types should be used whenever pointer arithmetic is involved, e.g.
indexing an array or describing the number of elements in memory.

[cols="1,1", options="header"]
|===
| C Type | Rust Type
| size_t^3^ | usize
| ptrdiff_t^3^ | isize
|===
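
A typical use of `size_t` at the boundary is a pointer plus an element count. The sketch below assumes a hypothetical C prototype `uint64_t sum_u32(const uint32_t *items, size_t count);` and a hypothetical helper `index_distance` for the `ptrdiff_t` case; neither is a Git API.

```rust
/// Hypothetical function with the C prototype
///   uint64_t sum_u32(const uint32_t *items, size_t count);
/// The element count crosses the boundary as `size_t` <-> `usize`.
///
/// # Safety
/// `items` must point to `count` initialized `uint32_t` values, or `count`
/// must be 0 (in which case `items` may be NULL).
pub unsafe extern "C" fn sum_u32(items: *const u32, count: usize) -> u64 {
    if count == 0 {
        // `slice::from_raw_parts` requires a non-null pointer even for an
        // empty slice, so handle this case before building one.
        return 0;
    }
    let slice = unsafe { std::slice::from_raw_parts(items, count) };
    slice.iter().map(|&v| u64::from(v)).sum()
}

/// Signed distance between two element indices, the kind of value C
/// stores in `ptrdiff_t`, which maps to `isize`.
pub fn index_distance(from: usize, to: usize) -> isize {
    to as isize - from as isize
}

// `usize` always has the same width as a pointer in Rust.
const _: () = assert!(std::mem::size_of::<usize>() == std::mem::size_of::<*const u8>());
```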

== Character types

This is where C and Rust don't have a clean one-to-one mapping.

A C `char` and a Rust `u8` share the same bit width, so any C struct containing
a `char` will have the same size as the corresponding Rust struct using `u8`.
In that sense, such structs are safe to pass over the FFI boundary, because
their fields will be laid out identically. However, beyond bit width, C `char`
has additional semantics and platform-dependent behavior that can cause
problems, as discussed below.

The C language leaves the signedness of `char` implementation-defined. Because
our developer build enables `-Wsign-compare`, comparing a `char` value with
either signed or unsigned integers may trigger compiler warnings.

Note: Rust's `char` type is 32 bits wide and holds a Unicode scalar value (any
code point except surrogates); it does not correspond to C `char`.
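
The `u8` mapping for C `char` can be sketched with a hypothetical struct (`struct tag` and its Rust mirror `Tag` are illustrations only). Because the fields are `u8`, comparisons with byte literals need no casts, and the values are 0..=255 regardless of how C treats `char`.

```rust
/// Hypothetical Rust mirror of a C struct declared as
///   struct tag { char kind; char name[8]; uint32_t len; };
/// Each C `char` is represented as `u8`, which has the same width, so the
/// field layout matches.
#[repr(C)]
pub struct Tag {
    pub kind: u8,
    pub name: [u8; 8],
    pub len: u32,
}

impl Tag {
    /// Compare `kind` with a byte literal; no signedness cast is needed.
    pub fn is_kind(&self, byte: u8) -> bool {
        self.kind == byte
    }

    /// The bytes of `name` before the first NUL (or all 8 if there is none).
    pub fn name_bytes(&self) -> &[u8] {
        let end = self.name.iter().position(|&b| b == 0).unwrap_or(self.name.len());
        &self.name[..end]
    }
}

// Rust's `char` is 4 bytes wide, so it is never a stand-in for C `char`.
const _: () = assert!(std::mem::size_of::<char>() == 4);
```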

=== Notes
^1^ This is only true if `stdbool.h` (or an equivalent) is used. +
^2^ C does not enforce IEEE-754 compatibility, but Rust expects it. If the C
platform/architecture does not follow IEEE-754, this equivalence does not hold.
It is also assumed that `float` is 32 bits and `double` is 64 bits, but there
may be an unusual platform/architecture where even this isn't true. +
^3^ C also defines `uintptr_t` and `intptr_t`, and POSIX defines `ssize_t`, but
these types are discouraged for FFI purposes. For functions like `read()` and
`write()`, the `ssize_t` result should be cast to a different, unambiguous type
before being passed over the FFI boundary.
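
Note 3 can be illustrated from the Rust side. This is a minimal sketch, assuming a hypothetical convention in which the C caller converts the `ssize_t` returned by `read()` to `int64_t` before crossing the boundary, so Rust only ever sees an `i64`; `byte_count_from_c` is a hypothetical helper.

```rust
/// Convert a `read()`/`write()` result that the C caller has already cast
/// from `ssize_t` to `int64_t` (an assumed convention for this sketch).
/// Negative values signal an error; anything else is a byte count.
pub fn byte_count_from_c(n: i64) -> Option<usize> {
    // `try_from` fails for negative values and for counts that do not fit
    // in `usize` on the current target.
    usize::try_from(n).ok()
}
```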

== Problems with `std::ffi::c_*` types in Rust
TL;DR: In practice, Rust's `c_*` types aren't guaranteed to match C types for
all possible C compilers, platforms, or architectures, because Rust only
ensures correctness of C types on officially supported targets. These
definitions have changed over time to match more targets, which means that the
`c_*` definitions can differ depending on which Rust version Git uses.

Current list of Rust-side FFI types that are safe to use in Git:

* `c_void`
* `CStr`
* `CString`

Even then, they should be used sparingly, and only where the semantics match
exactly.
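
A minimal sketch of `CStr` and `CString` used within those limits. `path_len` and `to_c_string` are hypothetical; the pointer is received as `*const u8` following the character-type guidance above, and `.cast()` produces the `*const c_char` that `CStr::from_ptr` requires without naming `c_char` in the signature.

```rust
use std::ffi::{CStr, CString};

/// Hypothetical function with the C prototype
///   size_t path_len(const char *path);
/// `CStr` performs the search for the NUL terminator.
///
/// # Safety
/// `path` must be non-null and point to a NUL-terminated string that stays
/// valid for the duration of the call.
pub unsafe extern "C" fn path_len(path: *const u8) -> usize {
    // `.cast()` yields the `*const c_char` that `CStr::from_ptr` expects,
    // whichever signedness `c_char` has on this target.
    let s = unsafe { CStr::from_ptr(path.cast()) };
    s.to_bytes().len()
}

/// Build a NUL-terminated copy of `s` to hand to C. `CString::new` rejects
/// strings containing an interior NUL instead of silently truncating them.
pub fn to_c_string(s: &str) -> Option<CString> {
    CString::new(s).ok()
}
```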

The `std::os::raw::c_*` types directly inherit the problems of `core::ffi`,
whose definitions change over time and seem to make a best guess at the
correct definition for a given platform/target. This is probably not a
problem on the platforms that Rust currently supports, but can anyone say that
Rust got it right for all C compilers on all platforms/targets?

For example, `c_long` is defined as follows in Rust 1.63.0
footnote:[https://doc.rust-lang.org/1.63.0/src/core/ffi/mod.rs.html#175-189[c_long in 1.63.0]]
and in Rust 1.89.0
footnote:[https://doc.rust-lang.org/1.89.0/src/core/ffi/primitives.rs.html#135-151[c_long in 1.89.0]]:

=== Rust version 1.63.0

```
mod c_long_definition {
    cfg_if! {
        if #[cfg(all(target_pointer_width = "64", not(windows)))] {
            pub type c_long = i64;
            pub type NonZero_c_long = crate::num::NonZeroI64;
            pub type c_ulong = u64;
            pub type NonZero_c_ulong = crate::num::NonZeroU64;
        } else {
            // The minimal size of `long` in the C standard is 32 bits
            pub type c_long = i32;
            pub type NonZero_c_long = crate::num::NonZeroI32;
            pub type c_ulong = u32;
            pub type NonZero_c_ulong = crate::num::NonZeroU32;
        }
    }
}
```

=== Rust version 1.89.0

```
mod c_long_definition {
    crate::cfg_select! {
        any(
            all(target_pointer_width = "64", not(windows)),
            // wasm32 Linux ABI uses 64-bit long
            all(target_arch = "wasm32", target_os = "linux")
        ) => {
            pub(super) type c_long = i64;
            pub(super) type c_ulong = u64;
        }
        _ => {
            // The minimal size of `long` in the C standard is 32 bits
            pub(super) type c_long = i32;
            pub(super) type c_ulong = u32;
        }
    }
}
```
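
One way to notice such drift is to record the width the toolchain actually picked, for example in a test that compares it with the `sizeof(long) * CHAR_BIT` value reported by the C compiler. A minimal sketch; `c_long_bits` and `c_long_matches` are hypothetical helpers, and how the C-side value reaches Rust is left open.

```rust
/// Width, in bits, that this Rust toolchain gives `c_long` on the current
/// target. As the definitions above show, the answer depends on the target
/// and, for some targets, on the Rust version.
pub fn c_long_bits() -> usize {
    std::mem::size_of::<std::os::raw::c_long>() * 8
}

/// Compare against the width reported by the C compiler for `long`
/// (passed in by the caller; the mechanism is an assumption).
pub fn c_long_matches(c_bits: usize) -> bool {
    c_long_bits() == c_bits
}
```

Both quoted definitions agree on 64-bit non-Windows targets and on Windows, so a mismatch would point at one of the less common targets.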

Even where C types are correctly mapped to Rust types via `std::ffi::c_*`,
there are still problems. Take `c_char` as an example: on some platforms it is
`u8`, on others it is `i8`.

=== Subtraction underflow in debug mode

The following code panics in debug builds on platforms that define `c_char` as
`u8`, because 0 - 1 underflows an unsigned type, but not on platforms where it
is `i8`, where the result is simply -1.

```
let mut x: std::ffi::c_char = 0;
x -= 1;
```
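
Using `u8` rather than `c_char` and choosing the overflow behavior explicitly removes the platform dependence. A minimal sketch with hypothetical helper names:

```rust
/// Decrement a byte that arrived from C as `u8` (the mapping recommended
/// for C `char`), returning `None` instead of panicking on underflow.
pub fn decrement_checked(x: u8) -> Option<u8> {
    x.checked_sub(1)
}

/// Decrement with explicit wraparound: 0 becomes 255 in every build
/// profile and on every platform.
pub fn decrement_wrapping(x: u8) -> u8 {
    x.wrapping_sub(1)
}
```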
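
=== Inconsistent shift behavior

`x` will be 0xC0 on platforms that use `i8`, but 0x40 where it is `u8`: `>>` on
a signed integer is an arithmetic shift that copies the sign bit, while on an
unsigned integer it shifts in zeros.

```
// Writing `0x80` directly would not compile where `c_char` is `i8`,
// because the literal is out of range for that type.
let mut x = 0x80_u8 as std::ffi::c_char;
x >>= 1;
```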
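
Picking the signedness explicitly makes the result identical on every platform. A minimal sketch with hypothetical helper names, one per shift flavor:

```rust
/// Logical shift: zeros are shifted in, so 0x80 >> 1 == 0x40.
pub fn shift_right_u8(x: u8) -> u8 {
    x >> 1
}

/// Arithmetic shift: the sign bit is copied, so 0x80 (-128) >> 1 == 0xC0 (-64).
pub fn shift_right_i8(x: i8) -> i8 {
    x >> 1
}
```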

=== Equality fails to compile on some platforms

The following will not compile on platforms that define `c_char` as `i8`,
because `b'a'` is a `u8` and Rust does not compare `i8` with `u8`, but it will
compile where `c_char` is `u8`. You can cast `x`, e.g.
`assert_eq!(x as u8, b'a');`, but then you get a warning on platforms that use
`u8` and a clean compilation where `i8` is used.

```
let x: std::ffi::c_char = 0x61;
assert_eq!(x, b'a');
```
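
When a `c_char` value cannot be avoided, one possible way to obtain a `u8` without an `as` cast is to go through its byte representation, since both `i8` and `u8` provide `to_ne_bytes` and `from_ne_bytes`. This is a sketch, not an established Git convention; the helper names are hypothetical.

```rust
use std::ffi::c_char;

/// Reinterpret a `c_char` as a `u8`. Because both `i8` and `u8` have
/// `to_ne_bytes`, this compiles the same way whether `c_char` is signed
/// or unsigned on the target, and no cast is involved.
pub fn c_char_to_u8(x: c_char) -> u8 {
    u8::from_ne_bytes(x.to_ne_bytes())
}

/// The reverse direction, for handing a byte back to C.
pub fn u8_to_c_char(b: u8) -> c_char {
    c_char::from_ne_bytes(b.to_ne_bytes())
}
```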

== Enum types
Rust enum types should not be used as FFI types. Rust enums are more like C
(tagged) union types than C enums. For something like:

```
// `repr(u8)` gives a one-byte discriminant; rustc reports conflicting
// representation hints if `C` and `u8` are combined on a fieldless enum.
#[repr(u8)]
enum Fruit {
    Apple,
    Banana,
    Cherry,
}
```
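
Following the rule above, a fieldless enum like `Fruit` can stay on the Rust side while only a fixed-width integer crosses the boundary. A minimal sketch, assuming the C side passes the value as `uint8_t` with 0, 1 and 2 for the three fruits (an assumption for illustration):

```rust
use std::convert::TryFrom;

/// Rust-side enum; it never crosses the FFI boundary itself.
/// The discriminant values are assumed to match a C enum
///   enum fruit { FRUIT_APPLE, FRUIT_BANANA, FRUIT_CHERRY };
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Fruit {
    Apple = 0,
    Banana = 1,
    Cherry = 2,
}

impl TryFrom<u8> for Fruit {
    type Error = u8;

    /// Reject unknown values: materializing an enum with an undefined
    /// discriminant (for example via `transmute`) would be undefined behavior.
    fn try_from(value: u8) -> Result<Self, Self::Error> {
        match value {
            0 => Ok(Fruit::Apple),
            1 => Ok(Fruit::Banana),
            2 => Ok(Fruit::Cherry),
            other => Err(other),
        }
    }
}

impl From<Fruit> for u8 {
    fn from(fruit: Fruit) -> u8 {
        // `repr(u8)` makes this cast yield the declared discriminant.
        fruit as u8
    }
}
```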

It's easy enough to make sure such a Rust enum matches what C would expect, but
consider a more complex type like:

```
enum HashResult {
    SHA1([u8; 20]),
    SHA256([u8; 32]),
}
```

The Rust compiler has to add a discriminant to the enum to distinguish between
the variants. The width, location, and values of that discriminant are up to
the Rust compiler and are not ABI-stable.
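
If such a value does need to cross the boundary, one option is to flatten it into a `#[repr(C)]` struct whose tag is an ordinary fixed-width field. A minimal sketch; the `FfiHashResult` layout, the tag values 1 and 2, and the method names are assumptions made for illustration, not part of Git.

```rust
/// Hypothetical C-compatible form of `HashResult`. The discriminant is an
/// explicit, fixed-width field whose values are chosen by us rather than
/// by the Rust compiler. The C side would declare:
///   struct hash_result { uint8_t algo; uint8_t len; uint8_t bytes[32]; };
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FfiHashResult {
    pub algo: u8,        // assumed encoding: 1 = SHA-1, 2 = SHA-256
    pub len: u8,         // number of meaningful bytes in `bytes`
    pub bytes: [u8; 32], // SHA-1 uses the first 20 bytes, the rest are zero
}

/// The Rust-only enum from the example above; it never crosses the boundary.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HashResult {
    SHA1([u8; 20]),
    SHA256([u8; 32]),
}

impl HashResult {
    /// Flatten into the C-compatible struct.
    pub fn to_ffi(&self) -> FfiHashResult {
        let mut out = FfiHashResult { algo: 0, len: 0, bytes: [0; 32] };
        match self {
            HashResult::SHA1(h) => {
                out.algo = 1;
                out.len = 20;
                out.bytes[..20].copy_from_slice(h);
            }
            HashResult::SHA256(h) => {
                out.algo = 2;
                out.len = 32;
                out.bytes = *h;
            }
        }
        out
    }

    /// Rebuild the enum, rejecting tag/length combinations we did not define.
    pub fn from_ffi(raw: &FfiHashResult) -> Option<HashResult> {
        match (raw.algo, raw.len) {
            (1, 20) => {
                let mut h = [0u8; 20];
                h.copy_from_slice(&raw.bytes[..20]);
                Some(HashResult::SHA1(h))
            }
            (2, 32) => Some(HashResult::SHA256(raw.bytes)),
            _ => None,
        }
    }
}
```

Because the tag is a plain `u8` field, its width, position, and values are part of the struct definition that both languages share, rather than a compiler decision.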
