Stabilize Rng and SystemRng#157168
Conversation
This comment has been minimized.
This comment has been minimized.
18dd02e to
eb9d7c8
Compare
|
If it’s only the first call that can fail, could we put |
|
@jdahlstrom That would force every caller to deal with it, albeit only once. If we (in the future) provide a fallible |
|
I'm un-marking this as a draft. Based on experiments with As for |
|
rustbot has assigned @Mark-Simulacrum. Use Why was this reviewer chosen?The reviewer was selected based on:
|
|
Personally, I do not support this stabilization. The most pressing needs can be alleviated by stabilizing a free-standing (potentially panicking)
IMO they should be named
I don't think that added
This does not apply to HW-based RNGs used in cryptography. Not only they are IO-based, but also commonly use internal security checks. The same somewhat applies to RNGs built-in into CPUs. For example, RDRAND may in theory fail at any moment and some buggy AMD CPUs are known to produce bad values (e.g. after hybernation) which are guarded against with runtime checks. In some niche cases it's also important to prove absence of panics and the suggested potentially panicking behavior will be an annoying hindrance. Checking for errors could also be useful in scenarios where we mix entropy from different sources where failure of one source does not stop the system. |
| /// A source of randomness. | ||
| #[unstable(feature = "random", issue = "130703")] | ||
| #[stable(feature = "random_source", since = "CURRENT_RUSTC_VERSION")] | ||
| pub trait RandomSource { |
There was a problem hiding this comment.
Regarding next_u32/next_u64, while I really want DefaultRandomSource.fill_bytes (in some form) stabilized ASAP, I have reservations about leaving it at "it's not clear we need them for performance and we can add them later". Personally I'd rather err on the side of adding these methods, unless we're quite sure we will never need them, or it's clear that we can't resolve the question in a reasonable time frame.
Adding the methods after stabilization has a cost (even besides opportunity cost). As @dhardy pointed out in the past, adding provided method later means existing implementers that want to offer reproducibility (as in stability of produced values) can't override the provided methods without breaking reproducibility for users who started using those methods. And for libraries that use RNGs to sample some distribution and want to promise reproducibility of that sampling, the same problem applies if they're first written against fill_bytes and later want to use next_uN.
Another (smaller) reason to err on the side of including these is to ease the ecosystem's transition from rand traits (which have always had next_u32/u64) to the std trait. If std doesn't have the methods at first and adds them later, that's two unnecessarily transitions (rand::Rng::next_uN -> fill_bytes + uN::from_*e_bytes -> RandomSource::next_uN). Stabilizing some subset of distributions would avoid this, but the distributions are far from ready for stabilization.
Finally, while the benchmarks in #157193 and on Zulip don't have a smoking gun that the methods are necessary for performance, it's also not clear that we won't want them. Even those benchmarks show a benefit for dyn RandomSource (the only argument is whether you consider that compatible with "cares about performance"), and @dhardy previously mentioned that rand has benchmarks justifying the methods in rand's context. At minimum we should look at those benchmarks as well and see if the fill_bytes semantics (which I think matches rand's) actually works for those benchmarks as well.
There was a problem hiding this comment.
IIRC, it's possible to use inlining and https://doc.rust-lang.org/std/intrinsics/fn.is_val_statically_known.html to perform these optimisations without needing the API surface.
There was a problem hiding this comment.
Inlining doesn't work for dyn RandomSource. And is_val_statically_known only helps when the two implementations that have the same behavior, but in this case, some potentially desirable optimizations change behavior. (Also, the intrinsic doesn't seem to have a clear path to being exposed on stable.)
|
I oppose this stabilization, as I've mentioned before I don't think we are at a point where we want to stabilize traits or anything that represents or implies a canonical "way to do random number generation". The current proposal with There is one real need from the standard library: a (no_std overridable) source of random bytes. This should simply be a function without further baggage or API precedent. Only once we have a clear view of what an opinionated |
To quote the
This is rather vague. Would a report of a defective implementation be considered a security issue? E.g. But my biggest concern is what happens on unsupported platforms, e.g. |
This was my understanding as well, but when I sat down and worked through it, I couldn’t come up with a benchmark that shows a difference (between Maybe this has changed over time as LLVM has improved? The way rand derives |
The specific problem with |
|
@hanna-kruppe I tried benchmarking Xoshiro256++, Sfc32 and Sfc64 using If there's desire to use only a single method, I would consider using |
|
Are there any code examples in the docs? If not it'd be great to add them before stabilisation. |
|
From zulip discussion, there seems to be some tension between goals
|
An alternative to panicking is to seed an infallible RNG from a fallible RNG. This at least defers the error condition to something that happens once up-front, and is avoidable thereafter. I'd probably only recommend that for bare metal embedded use cases though. Anywhere you have a proper kernel entropy pool (and potentially have to worry about forking) you're better off using that. |
Can you provide the benchmarking code? I'd love to see if we can optimize that in the style of hanna-kruppe/chacha8rand#1 . |
|
@joshtriplett here's a diff against rand_pcg code. |
This comment has been minimized.
This comment has been minimized.
eb9d7c8 to
44519e1
Compare
Plenty of sources may fail in practice. And it's not only IO-based generators, even RDRAND is technically fallible and it was encountered in practice (sure, it was a buggy CPU, but still). Just replace "random sources" with "allocators" in your comment to see the potential mess you intend to bake in. With distinct Finally, I believe it does not make sense to introduce just the |
|
The hardware may be fallible in those cases, but the OS normally retries accessing the hardware source later and either blocks or keep serving from the already seeded CSPRNG when you try to get random numbers and adds entropy from other lower quality but guaranteed to exist sources like interrupt jitter (if interrupts stop your system is completely broken and no userspace apps run anyway). |
|
Firstly, you can not assume what OS does. Most OSes do not make the infallibility guarantee (as an exception, modern Windows and Fuchsia do make such guarantee), for example, (IIRC) Hermit was simply forwarding RDRAND without any entropy mixing. On top of that in future we may have a way to override the system source (e.g. a cryptographic application may use an external IO-based certified RNG). Secondly, RNG traits defined in |
|
On Linux, the only documented error conditions for getrandom by glibc are either unconditional (not supported by the kernel, which would effectively be running the binary on a kernel not supported by libstd at all), if you ask for non-blocking (which libstd doesn't), should be immediately retried (EINTR) or you messed up the arguments to getrandom (and got yourself a memory safety bug). So for all practical purposes on Linux it will not fail.
I would expect the Rng trait in libstd to correspond to CryptoRng in rand_core. |
|
I am also partial to a standalone function here providing a global resource for reasons similar to what @hanna-kruppe brought up. It's also hard to tell exactly, but it seems like this is the path preferred by the rust-random folks? Do I have that right? I'd be curious to get @newpavlov's take here. (Apologies if you've given it already elsewhere.) @rfcbot concern global |
|
Yes, I wrote about it in my first comment:
I don't know about @dhardy.
Except when
|
|
I agree with @newpavlov here. The main concern I have is that without something like a Of course, if we can design an interface that works well enough for |
|
Can we just have a try_fill_bytes in addition to fill_bytes? |
|
I also like the idea of a freestanding function which is infallible and doesn't panic on mainstream OSes, along with a fallible one if people want to avoid panics on more esoteric platforms. Ideally embedded platforms could plug in their own infallible entropy pool that can seed itself from a fallible hardware RNG. |
Personally, I am fine with it, but the main problem is whether So the simplest solution IMO would be to stabilize only an "infallible" potentially (but extremely unlikely in practice) panicking
Did you mean "doesn't panic (in practice, but technically contains a panic path under the hood)"? Because I don't think we should ignore potential unexpected errors. |
|
I would like to see the stabilization report updated to include the exact APIs that are being proposed for stabilization, since given the long history of work in this space it's not immediately obvious that the To clarify, using a trait now would not prevent introducing a user-configurable EII in the future, yes? The source for SystemRng is already literally just this: impl Rng for SystemRng {
fn fill_bytes(&mut self, bytes: &mut [u8]) {
sys::fill_bytes(bytes)
}
}So it seems as though introducing an EII would just mean replacing the private Of course, the reverse is also true: we could add it as a free function now, and then add a trait later that calls the free function. I don't feel strongly either way. Regarding whether or not it should return an error, I'm of two minds: on the one hand, I think the fact that our prior art, Regarding the choice of error type, I'm loathe to suggest that we should block this issue on anything else, but I do want to note that Overall I'm extremely excited at the prospect of receiving a properly standardized way to read from the system's entropy source; IMO it's been the single-most egregious omission of libstd ever since Tokio was caught initializing their PRNG with bits scavenged from the thread-local hasher state. Please continue pursuing this with a fervor! |
In which case it will fail the very first time, producing a panic. And we've already talked about having a one-time "verify that we have a working RNG" call. If someone sets up a filter that says "allow the first And we are singularly unprepared to deal with an OS that passes on a raw hardware RNG together with a failing hardware RNG (e.g. RDRAND); we should not, for instance, try to do quality measurement on RNG output and reject things that look "stuck". |
I'd continue to advocate that we don't, precisely because nobody's going to be able to do anything useful with it short of propagating the error. But if you feel strongly that we must, then we could add a separate |
The only way I can see us being able to guarantee that only the first call can fail is if on fallible platforms we wrap the system randomness source with our own infallible CSPRNG initialized on the first call. That's certainly a possible strategy with trade-offs, but not one I'm sure we should commit to out of the box.
I agree, but I'd like to note that the |
This can be done by having a dedicated function to check if getting randomness is possible at all.
If those are using in the first place libstd, those should almost certainly be using catch_unwind liberally anyway. Libstd simply isn't designed to avoid all possible panics no matter how unlikely (eg Arc refcount overflow, bugs in libstd, the OS behaving weirdly) and even hard aborts (allocation failure, can unstably be a panic instead too) in error conditions that shouldn't be possible under normal circumstances. If you don't ever want to panic, restrict yourself to the subset of libcore that doesn't panic even in the face of bugs. libstd is designed for coarse grained (eg per-request or per cli invocation) fault recovery1 using catch_unwind and/or process restarts as makes sense for applications and services, not for fine grained fault recovery at every possible fault location as makes sense for safety-critical applications. To get rid of panics for getting random numbers entirely you will also have to for example replace the
On Linux at least after the first successful read, any future unsuccessful read would almost certainly be a kernel or glibc bug. We can't reasonably defend against every possible kernel and glibc bug. We fundamentally have to make assumptions about our execution environment and violations of those assumptions will break things one way or another.
Linux only uses rdrand as entropy source for the kernel CSPRNG, so rdrand errors are handled transparently. Footnotes
|
...so we are doing that then, right? Or is the proposed There is no documented Must unsafe {
// <invalid state>
let slice: &[u64] = ...;
let unique_ids = slice.iter().collect::<HashSet<_>>();
// <invalid state restored>
}EDIT: I just noticed that in today's code this already can fail. I consider that a defect we should fix. Or is the plan to use our own CSPRNG initialized with the system RNG after all, so that we can guarantee infallibility after the first call? Because "infallible except on the first call, with an For context,
|
|
That’s not specific to Note that your particular example is already unsound today, because there’s multiple ways it can panic: capacity overflow in |
I'm more or less registering this concern on behalf of the rust-random folks. It seems clear to me that they do strongly prefer a fallible mechanism here. In particular, my understanding of the comments from @dhardy and @newpavlov above is:
Do we need a second trait for it though? Can we have @rfcbot concern fallibility
Fair enough here. I re-read through the comments using this lens and I agree this seems like a not-great outcome. Particularly around not being able to rely on Which leads me to conclude that a free function is the right path only if we can say for certainty that we'll never need another RNG source in std. I don't have enough domain knowledge to know whether that's the case... Could someone chime in on this point? @joshtriplett what future scenario do you envision where a |
No one says that We already have the fallible allocator mess because arguments like "well, Linux overcommits by default, so we should not bother with potential allocation failures" won in the past, so I really hope that the history will not be repeated here.
IMO the existing As I argued many times, I believe that the libs team should test RNG traits in |
I think the more relevant comparison is the default hasher, which can technically panic. |
Can you comment on the downsides of this please? In particular, the latter half of #157168 (comment) I want to understand your position better. It helps to know how you view the downsides of this choice. |
|
As I see it, there are several minor concerns with a free-standing function:
I consider stabilization of a free-standing function as a good first step which alleviates most of the pressure from users who want to have a proper access to system randomness in As a maintainer of
I am not sure I understand this part. Right now, I imagine roughly the following design (it does not account for potential uninit buffer support or // core
trait TryRng {
type Error;
fn try_fill_bytes(&mut self, buf: &mut [u8]) -> Result<(), Self::Error>;
// potentially other methods
}
trait Rng: TryRng<Error = !> {
fn fill_bytes(&mut self, buf: &mut [u8]);
// ...
}
impl<T: TryRng<Error = !>> Rng for T { ... }
// Other RNG traits
// =========================
// core or a new sysroot crate
// alternatively, we could use `core::io::Error` instead
struct SystemRngError(RawOsError);
#[eii(random_fill_bytes)]
fn random_fill_bytes(buf: &mut [u8]) -> Result<(), SystemRngError> {
Err(SystemRngError::UNIMPLEMENTED)
}
#[derive(Copy, Clone, ...)]
struct TrySystemRng;
impl TryRng for TrySystemRng {
type Error = SystemRngError;
fn try_fill_bytes(&mut self, buf: &mut [u8]) -> Result<(), Self::Error> {
random_fill_bytes(buf)
}
}
#[derive(Copy, Clone, ...)]
struct SystemRng;
impl TryRng for SystemRng {
type Error = !;
fn try_fill_bytes(&mut self, buf: &mut [u8]) -> Result<(), !> {
random_fill_bytes(buf).map_err(|err| panic!("SysRng failure: {err}"))
}
}
// Potentially another "insecure" RNG to expose `GRND_INSECURE`
// =========================
// std
fn fill_bytes(buf: &mut [u8]) {
SystemRng.fill_bytes(buf)
}Regardless of how RNG source override will be done under the hood, it should not influence the free function. |
|
Regarding an infallible free function and panics: I think for all possible randomness sources it should be possible to have an infallible-after-initialization model which is also panic-free in practice after initialization, as an abstraction over various implementations of a hardware RNG-seeded CSRNG. In the My understanding is on Linux, once I'm not the best person to ask about Windows but my understanding is it offers similar properties: from what I can tell the error cases for BSDs and their derivatives including MacOS all seem to have a similar story: An MVP could hide everything behind the scenes and panic on any initialization failure, but ideally I think would abstract over this infallible-after-initialization model, and then down the road make initialization more pluggable so initialization errors can be explicitly handled rather than panicking (as well as e.g. allowing embedded targets to wire up their own hardware-seeded entropy pool, rather than relying on an OS-provided one, which might also be a good idea on operating systems that don't have an infallible OS-provided entropy pool). Edit: I would be curious to know what OSes don't provide an infallible entropy pool on recent versions. Every one I spot checked (e.g. I just checked Solaris) seems to have a |
|
I would strongly caution against using behavior of OSes as an argument for relying on the "infallible-after-initialization" model. Firstly, the default source may be overwritten with a custom potentially fallible source (I had an unfortunate "pleasure" of working in a project where it was necessary). Secondly, on embedded targets hardware sources may fail at any moment and it's not always desirable to pull a whole CSPRNG to provide a fallback.
|
I would generally caution against directly using the output of embedded hardware (T)RNGs. Many of them generate biased outputs even during normal operation, and that can be further exploited by things like physical attacks. "Whitening" the output of such RNGs by seeding a CSPRNG is standard practice. |
|
What about a compromise? We can add both a fallible overridable cryptographically secure entropy device, and the convenient infallible That is is, we add the following to /// When read from, returns uniformly random bytes.
///
/// # Safety
/// The random bytes read must be of cryptographic strength.
unsafe trait EntropySource : std::io::Read {}
/// Implements EntropySource, always the implementation from stdlib,
/// may unconditionally return ErrorKind::Unsupported.
pub struct SystemEntropy;
/// Implements EntropySource, defaults to SystemEntropy,
/// overridable with `#[global_entropy_source]`.
pub struct GlobalEntropy;
// Unchanged.
pub trait Rng {
fn fill_bytes(&mut self, bytes: &mut [u8]);
}
// Can pass entropy source where Rng is expected.
impl<T: EntropySource> Rng for T {
fn fill_bytes(&mut self, bytes: &mut [u8]) {
self.read_exact(bytes).unwrap();
}
} |
|
Is convenience (i.e. not returning |
|
@orlp that's somewhat similar to the
Agreed (though I'm unsure thread-locality is a sufficiently useful property in this case).
In the near future I'd like to see In the longer term I'd be happy to see An important question here is whether we need
Is it possible to require a cfg be set when using EII to provide an entropy source, as used by getrandom for custom backends? This makes it hard to overlook usage of a custom entropy source when buliding an application (some As @tarcieri pointed out, some non-OS entropy sources are biased and should only be used to seed a user-space CSPRNG; I believe this should be the responsibility of the external EII impl since in other cases a user-space CSPRNG is not wanted. At this point the design in my mind looks fairly similar to what is proposed for stabilisation here, with an additional method: // Free function, supporting EII
// Must be unbiased on success.
// Uses core::io::Error type?
pub fn try_fill_bytes(&mut bytes) -> Result<(), Error>;
// Trait primarily for PRNGs
pub trait Rng {
// Required method
fn fill_bytes(&mut self, bytes: &mut [u8]);
// Also next_u32(), next_u64() ?
}
// Infallible system RNG; try_fill_bytes may be used instead for a fallible API
pub struct SystemRng;
impl Rng for SystemRng { /* .. */ }
// Other PRNGs implementing Rng may be added in the futureThere are compromises here (e.g. no |
|
I would like to suggest that notions of cryptographic suitability should be a separate discussion for a separate API. IMO what we're asking for here is I/O access to the system's entropy source. This is why I've been suggesting an API namespaced under However, if there are concerns that making a general-purpose high-level API is blocked on the question of cryptographic use cases (e.g. the |
View all comments
Stabilization report
This partial stabilization provides enough of an interface for people to obtain random bytes, which is a common need in the ecosystem, currently fulfilled via the
getrandomcrate.There have been many requests for a
fill_bytesinterface in the standard library. Per previous libs-api discussions,SystemRng.fill_bytescan serve that function, rather than adding a separate free function.Alternatives and Future Work
Uninitialized buffers
We're likely to add a
fill_buffunction to fill aBorrowedCursor<'_, u8>. We can do so onceBorrowedBuf/BorrowedCursoris stable. Deferring this means we will need to support trait impls that providefill_bytesbut notfill_buf, which we might not need to if we waited until afterBorrowedBuf/BorrowedCursoris stable. However, that isn't any worse of a problem than we already have withio::Read, and we don't necessarily want to couple the stabilization ofBorrowedBuf/BorrowedCursorwithDistributions
The
Distributiontrait and therandomfunction remain unstable; those don't need to block stabilization ofRngandSystemRng.Optimized paths for
u32/u64Some RNGs can provide faster results for generating a whole
u32/u64rather than individual bytes.The definition and documentation of
fill_bytessays:We hope that this will allow RNGs that can generate whole words to do so efficiently as a fast path in
fill_bytes/fill_buf. If dedicatednext_u32/next_u64functions still end up being substantially faster, we can always add them as optional trait methods in the future.Some experimentation suggests that it's possible to match the performance.
Resultversus panickingThere's been extensive discussion about whether the function should return a
Resultrather than panicking, or providing an additional such function. The previous conclusion from libs-api was that while it's possible for the first such call to fail (e.g. because the OS or sandbox provides no access to randomness at all), subsequent calls should never fail, and user code will not be prepared to deal with such failure.Furthermore, an API returning
Resultwould propagate throughout higher-level calls, forcing operations as simple as "roll a d20" to either returnResultor callexpect/unwrap. And even providing atryvariant will lead to higher-level APIs having to consider which variant to call. We should, instead, make the guarantee that a well-behaved underlying OS won't panic after the first call.Note, in particular, that
HashMapalready fails via panic if it can't get data from itsRandomState.If there's a need to allow error recovery for the "no OS/sandbox support" case, we could provide a one-time call to check for an error. Or, such users could continue using
getrandomor the underlying OS APIs.If we did want to make every call fallible, we have the capability, using upcoming language features ("supertrait auto impl"), to add a
TryRngsupertrait without breaking backwards compatibility.