Stabilize Rng and SystemRng#157168

Open
joshtriplett wants to merge 1 commit into
rust-lang:mainfrom
joshtriplett:stabilize-random-source

@joshtriplett

@joshtriplett joshtriplett commented May 30, 2026

Member

Stabilization report

This partial stabilization provides enough of an interface for people to obtain random bytes, which is a common need in the ecosystem, currently fulfilled via the getrandom crate.

There have been many requests for a fill_bytes interface in the standard library. Per previous libs-api discussions, SystemRng.fill_bytes can serve that function, rather than adding a separate free function.

Alternatives and Future Work

Uninitialized buffers

We're likely to add a fill_buf function to fill a BorrowedCursor<'_, u8>. We can do so once BorrowedBuf/BorrowedCursor is stable. Deferring this means we will need to support trait impls that provide fill_bytes but not fill_buf, which we might not need to do if we waited until BorrowedBuf/BorrowedCursor is stable. However, that isn't any worse a problem than the one we already have with io::Read, and we don't necessarily want to couple the stabilization of BorrowedBuf/BorrowedCursor with the stabilization of Rng and SystemRng.

Distributions

The Distribution trait and the random function remain unstable; those don't need to block stabilization of Rng and SystemRng.

Optimized paths for u32/u64

Some RNGs can provide faster results for generating a whole u32/u64 rather than individual bytes.

The definition and documentation of fill_bytes says:

Note that calling fill_bytes multiple times is not equivalent to calling fill_bytes once
with a larger buffer. A RandomSource is allowed to return different bytes for those two
cases. For instance, this allows a RandomSource to generate a word at a time and throw
part of it away if not needed.

We hope that this will allow RNGs that can generate whole words to do so efficiently as a fast path in fill_bytes/fill_buf. If dedicated next_u32/next_u64 functions still end up being substantially faster, we can always add them as optional trait methods in the future.

Some experimentation suggests that it's possible to match the performance.
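
The latitude described in that documentation can be illustrated with a toy source. This is only a sketch: `Rng` here is a local stand-in trait with the single `fill_bytes` method, and `CounterWords` is a hypothetical generator (a plain counter, not random at all) that produces one 64-bit word at a time and discards the unused part.

```rust
/// Local stand-in for the trait under discussion (assumption: one method).
trait Rng {
    fn fill_bytes(&mut self, bytes: &mut [u8]);
}

/// Hypothetical toy generator: emits 1, 2, 3, ... as 64-bit words.
/// It is NOT random; it only makes the byte layout easy to follow.
struct CounterWords {
    next: u64,
}

impl Rng for CounterWords {
    fn fill_bytes(&mut self, bytes: &mut [u8]) {
        // Generate one whole word per (up to) 8 output bytes and throw
        // away whatever part of the last word is not needed.
        for chunk in bytes.chunks_mut(8) {
            let word = self.next.to_le_bytes();
            self.next = self.next.wrapping_add(1);
            chunk.copy_from_slice(&word[..chunk.len()]);
        }
    }
}

/// One call with an 8-byte buffer: consumes a single word.
fn one_call() -> [u8; 8] {
    let mut rng = CounterWords { next: 1 };
    let mut buf = [0u8; 8];
    rng.fill_bytes(&mut buf);
    buf
}

/// Two calls with 4-byte buffers: consumes two words, half of each discarded.
fn two_calls() -> [u8; 8] {
    let mut rng = CounterWords { next: 1 };
    let mut buf = [0u8; 8];
    let (lo, hi) = buf.split_at_mut(4);
    rng.fill_bytes(lo);
    rng.fill_bytes(hi);
    buf
}
```

With this toy source, `one_call()` and `two_calls()` return different byte sequences from the same starting state, which is exactly what the quoted documentation permits.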

Result versus panicking

There's been extensive discussion about whether the function should return a Result rather than panicking, or providing an additional such function. The previous conclusion from libs-api was that while it's possible for the first such call to fail (e.g. because the OS or sandbox provides no access to randomness at all), subsequent calls should never fail, and user code will not be prepared to deal with such failure.

Furthermore, an API returning Result would propagate throughout higher-level calls, forcing operations as simple as "roll a d20" to either return Result or call expect/unwrap. And even providing a try variant will lead to higher-level APIs having to consider which variant to call. We should, instead, make the guarantee that a well-behaved underlying OS won't panic after the first call.
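
As a rough illustration of the propagation argument, the sketch below writes "roll a d20" against both shapes. `Rng`, `TryRng`, and `RngError` are local stand-ins defined for this example (assumptions, not the std definitions), and `ToyLcg` is a hypothetical toy generator used only to drive it.

```rust
/// Local stand-ins for the two API shapes (assumptions, not std items).
trait Rng {
    fn fill_bytes(&mut self, bytes: &mut [u8]);
}

#[derive(Debug)]
struct RngError;

trait TryRng {
    fn try_fill_bytes(&mut self, bytes: &mut [u8]) -> Result<(), RngError>;
}

/// Infallible shape: the caller just gets a number in 1..=20.
fn roll_d20(rng: &mut impl Rng) -> u8 {
    loop {
        let mut b = [0u8; 1];
        rng.fill_bytes(&mut b);
        // Rejection sampling: 240 = 12 * 20, so accepted bytes map evenly.
        if b[0] < 240 {
            return b[0] % 20 + 1;
        }
    }
}

/// Fallible shape: the `Result` leaks into the signature of every caller.
fn try_roll_d20(rng: &mut impl TryRng) -> Result<u8, RngError> {
    loop {
        let mut b = [0u8; 1];
        rng.try_fill_bytes(&mut b)?;
        if b[0] < 240 {
            return Ok(b[0] % 20 + 1);
        }
    }
}

/// Hypothetical toy generator (a 64-bit LCG); not suitable for real use.
struct ToyLcg(u64);

impl ToyLcg {
    fn next_byte(&mut self) -> u8 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 56) as u8
    }
}

impl Rng for ToyLcg {
    fn fill_bytes(&mut self, bytes: &mut [u8]) {
        for b in bytes.iter_mut() {
            *b = self.next_byte();
        }
    }
}

impl TryRng for ToyLcg {
    fn try_fill_bytes(&mut self, bytes: &mut [u8]) -> Result<(), RngError> {
        self.fill_bytes(bytes);
        Ok(())
    }
}
```

Every function layered on top of `try_roll_d20` must in turn return `Result` or unwrap, which is the cost the report is weighing.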

Note, in particular, that HashMap already fails via panic if it can't get data from its RandomState.

If there's a need to allow error recovery for the "no OS/sandbox support" case, we could provide a one-time call to check for an error. Or, such users could continue using getrandom or the underlying OS APIs.

If we did want to make every call fallible, we have the capability, using upcoming language features ("supertrait auto impl"), to add a TryRng supertrait without breaking backwards compatibility.

@joshtriplett joshtriplett added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label May 30, 2026
@rustbot rustbot added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label May 30, 2026
@joshtriplett joshtriplett force-pushed the stabilize-random-source branch from 18dd02e to eb9d7c8 Compare May 30, 2026 19:04
@jdahlstrom

If it’s only the first call that can fail, could we put DefaultRandomSource behind a fallible constructor and guarantee that fill_bytes won’t panic?

@joshtriplett

Member Author

@jdahlstrom That would force every caller to deal with it, albeit only once. If we (in the future) provide a fallible have_random function or similar, then people who want to rule out failure can call that (and the standard library can make sure it gets evaluated only once), but most users won't have to care about it.
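
A one-time availability check of the kind mentioned here could be sketched roughly as follows. The name `have_random` comes from the comment above; everything else is an assumption for illustration, and `probe_system_rng` is a stub standing in for whatever platform call a real implementation would make.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how often the stubbed probe actually runs (for demonstration).
static PROBE_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Stub standing in for "ask the OS for randomness once".
/// Assumption: a real probe would make a platform call and report failure.
fn probe_system_rng() -> Result<(), &'static str> {
    PROBE_CALLS.fetch_add(1, Ordering::SeqCst);
    Ok(())
}

/// Hypothetical `have_random`: the probe is evaluated at most once,
/// and later calls return the cached answer.
fn have_random() -> bool {
    static AVAILABLE: OnceLock<bool> = OnceLock::new();
    *AVAILABLE.get_or_init(|| probe_system_rng().is_ok())
}
```

Callers who want to rule out failure would call `have_random()` up front; callers who don't care never see it.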

@joshtriplett

Member Author

I'm un-marking this as a draft.

Based on experiments with next_u32/next_u64, it's not clear we need them for performance. (Thanks to @hanna-kruppe for providing crates and benchmarking to help explore this! I got nerdsniped into doing some optimization on chacha8rand as a result, to verify this.)

As for fill_buf, it definitely has some value (based on benchmarks), but that doesn't mean we should block waiting on it.

@joshtriplett joshtriplett marked this pull request as ready for review May 31, 2026 17:28
@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 31, 2026
@rustbot rustbot removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label May 31, 2026
@rustbot

rustbot commented May 31, 2026

Collaborator

r? @Mark-Simulacrum

rustbot has assigned @Mark-Simulacrum.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

Why was this reviewer chosen?

The reviewer was selected based on:

  • Owners of files modified in this PR: @ChrisDenton, libs
  • @ChrisDenton, libs expanded to 8 candidates
  • Random selection from Mark-Simulacrum, jhpratt

@joshtriplett joshtriplett added I-libs-api-nominated Nominated for discussion during a libs-api team meeting. S-waiting-on-t-libs-api Status: Awaiting decision from T-libs-api and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 31, 2026
@hanna-kruppe

hanna-kruppe commented May 31, 2026

Contributor

cc @dhardy @newpavlov

@newpavlov

newpavlov commented May 31, 2026

Contributor

Personally, I do not support this stabilization.

The most pressing needs can be alleviated by stabilizing a free-standing (potentially panicking) fill_bytes function. The API ignores rand_core experience and misses the SeedableRng/CryptoRng traits important in practice. I also believe there should be a clear future path for overriding the "default" RNG and using it on no_std targets.

We could call this Rng/DefaultRng rather than RandomSource/DefaultRandomSource.

IMO they should be named Rng/SysRng. For my taste, RandomSource/DefaultRandomSource are simply abhorrent.

Optimized paths for u32/u64

I don't think that added next_u32/u64 methods should have blanket impls, see here.

The previous conclusion from libs-api was that while it's possible for the first such call to fail (e.g. because the OS or sandbox provides no access to randomness at all), subsequent calls should never fail, and user code will not be prepared to deal with such failure.

This does not apply to HW-based RNGs used in cryptography. Not only are they IO-based, but they also commonly use internal security checks. The same somewhat applies to RNGs built into CPUs. For example, RDRAND may in theory fail at any moment, and some buggy AMD CPUs are known to produce bad values (e.g. after hibernation), which are guarded against with runtime checks.

In some niche cases it's also important to prove absence of panics and the suggested potentially panicking behavior will be an annoying hindrance.

Checking for errors could also be useful in scenarios where we mix entropy from different sources where failure of one source does not stop the system.

Comment thread library/core/src/random.rs Outdated
/// A source of randomness.
#[unstable(feature = "random", issue = "130703")]
#[stable(feature = "random_source", since = "CURRENT_RUSTC_VERSION")]
pub trait RandomSource {

@hanna-kruppe hanna-kruppe May 31, 2026

Contributor

Regarding next_u32/next_u64, while I really want DefaultRandomSource.fill_bytes (in some form) stabilized ASAP, I have reservations about leaving it at "it's not clear we need them for performance and we can add them later". Personally I'd rather err on the side of adding these methods, unless we're quite sure we will never need them, or it's clear that we can't resolve the question in a reasonable time frame.

Adding the methods after stabilization has a cost (even besides opportunity cost). As @dhardy pointed out in the past, adding a provided method later means existing implementers that want to offer reproducibility (as in stability of produced values) can't override the provided methods without breaking reproducibility for users who started using those methods. And for libraries that use RNGs to sample some distribution and want to promise reproducibility of that sampling, the same problem applies if they're first written against fill_bytes and later want to use next_uN.
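
To make the reproducibility concern concrete, here is a minimal sketch. The `Rng` trait below is a local stand-in with a provided `next_u32` added "later"; `Words`, `V1`, and `V2` are hypothetical names, and the generator is a fixed-step toy rather than a real PRNG.

```rust
/// Local stand-in trait (assumption): `next_u32` is a provided method
/// defined in terms of `fill_bytes`, as it might be if added later.
trait Rng {
    fn fill_bytes(&mut self, bytes: &mut [u8]);

    fn next_u32(&mut self) -> u32 {
        let mut buf = [0u8; 4];
        self.fill_bytes(&mut buf);
        u32::from_le_bytes(buf)
    }
}

/// Hypothetical toy word generator (fixed step, not random).
struct Words(u64);

impl Words {
    fn next_word(&mut self) -> u64 {
        let w = self.0;
        self.0 = self.0.wrapping_add(0x0101_0101_0101_0101);
        w
    }

    /// One word per (up to) 8 bytes; the unused tail of a word is dropped.
    fn fill(&mut self, bytes: &mut [u8]) {
        for chunk in bytes.chunks_mut(8) {
            let word = self.next_word().to_le_bytes();
            chunk.copy_from_slice(&word[..chunk.len()]);
        }
    }
}

/// First release: relies on the provided `next_u32`.
struct V1(Words);

impl Rng for V1 {
    fn fill_bytes(&mut self, bytes: &mut [u8]) {
        self.0.fill(bytes)
    }
}

/// Later release: overrides `next_u32` to use the high half of a word,
/// a common choice for word generators. The output stream changes.
struct V2(Words);

impl Rng for V2 {
    fn fill_bytes(&mut self, bytes: &mut [u8]) {
        self.0.fill(bytes)
    }

    fn next_u32(&mut self) -> u32 {
        (self.0.next_word() >> 32) as u32
    }
}
```

With the same seed, `V1` and `V2` return different values from `next_u32`, so an implementer who promised stable output cannot ship the override without breaking users of the provided method.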

Another (smaller) reason to err on the side of including these is to ease the ecosystem's transition from rand traits (which have always had next_u32/u64) to the std trait. If std doesn't have the methods at first and adds them later, that's two unnecessary transitions (rand::Rng::next_uN -> fill_bytes + uN::from_*e_bytes -> RandomSource::next_uN). Stabilizing some subset of distributions would avoid this, but the distributions are far from ready for stabilization.

Finally, while the benchmarks in #157193 and on Zulip don't have a smoking gun that the methods are necessary for performance, it's also not clear that we won't want them. Even those benchmarks show a benefit for dyn RandomSource (the only argument is whether you consider that compatible with "cares about performance"), and @dhardy previously mentioned that rand has benchmarks justifying the methods in rand's context. At minimum we should look at those benchmarks as well and see if the fill_bytes semantics (which I think matches rand's) actually works for those benchmarks as well.

Contributor

IIRC, it's possible to use inlining and https://doc.rust-lang.org/std/intrinsics/fn.is_val_statically_known.html to perform these optimisations without needing the API surface.

@hanna-kruppe hanna-kruppe May 31, 2026

Contributor

Inlining doesn't work for dyn RandomSource. And is_val_statically_known only helps when the two implementations have the same behavior, but in this case, some potentially desirable optimizations change behavior. (Also, the intrinsic doesn't seem to have a clear path to being exposed on stable.)

@orlp

orlp commented May 31, 2026

Contributor

I oppose this stabilization, as I've mentioned before I don't think we are at a point where we want to stabilize traits or anything that represents or implies a canonical "way to do random number generation". The current proposal with RandomSource being a trait and potentially having multiple sources with the provided one only "being a default" is way too close to that.

There is one real need from the standard library: a (no_std overridable) source of random bytes. This should simply be a function without further baggage or API precedent.

Only once we have a clear view of what an opinionated std random API should look like should we stabilize any generic traits, distributions or generators. Not a piece-wise stabilization that will only end up shooting us in the foot later.

@dhardy

dhardy commented Jun 1, 2026

Contributor

Based on experiments with next_u32/next_u64, it's not clear we need them for performance. (Thanks to @hanna-kruppe for providing crates and benchmarking to help explore this! I got nerdsniped into doing some optimization on chacha8rand as a result, to verify this.)

next_u32/next_u64 aren't important for block-PRNGs like ChaCha. They're for word-PRNGs like Xoshiro, PCG, SFC. (It's still unclear to me whether the RandomSource trait is supposed to be a complete replacement for rand_core::Rng or only a trait over entropy sources, though in the latter case I don't see much justification in using a trait over a free function.)

To quote the DefaultRandomSource docs:

If security is a concern, consult the platform documentation below for the specific guarantees your target provides.

This is rather vague. Would a report of a defective implementation be considered a security issue? E.g. esp_fill_random advertises "true random values" but only given some additional criteria which might enable some form of side-channel attack (or worse, given that the documentation doesn't specify that the underlying PRNG is a CSPRNG).

But my biggest concern is what happens on unsupported platforms, e.g. wasm32-unknown-unknown? I think if the rand crate were to switch to DefaultRandomSource over the getrandom crate we'd have a minor rebellion (fork) due to the lost support for wasm32-unknown-unknown (this isn't the only target of concern, but by far the most widely used from the feedback we've had).

@hanna-kruppe

Contributor

next_u32/next_u64 aren't important for block-PRNGs like ChaCha. They're for word-PRNGs like Xoshiro, PCG, SFC.

This was my understanding as well, but when I sat down and worked through it, I couldn’t come up with a benchmark that shows a difference (between next_uN vs fill_bytes+uN::from_le_bytes). When the fill_bytes loop generates words and writes them to the buffer, I’d expect LLVM to simplify all of that away when fill_bytes can be inlined. There is a difference for the dyn Rng case since that prevents inlining, but that also hurts block based RNGs similarly. If rand has benchmarks that show something different, it would be great to know.
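
For reference, the two call shapes being compared look roughly like this. It is a sketch, not the benchmark code: `Rng` is a local stand-in trait, and `next_u64_via_fill`, `sum_static`, `sum_dyn`, and `Counter` are hypothetical names. It shows only the shape of the code and makes no performance claim.

```rust
/// Local stand-in trait (assumption: a single `fill_bytes` method).
trait Rng {
    fn fill_bytes(&mut self, bytes: &mut [u8]);
}

/// `next_u64` expressed as `fill_bytes` + `u64::from_le_bytes`.
fn next_u64_via_fill<R: Rng + ?Sized>(rng: &mut R) -> u64 {
    let mut buf = [0u8; 8];
    rng.fill_bytes(&mut buf);
    u64::from_le_bytes(buf)
}

/// Static dispatch: `fill_bytes` is a candidate for inlining here.
fn sum_static<R: Rng>(rng: &mut R, n: usize) -> u64 {
    let mut acc = 0u64;
    for _ in 0..n {
        acc = acc.wrapping_add(next_u64_via_fill(rng));
    }
    acc
}

/// Dynamic dispatch: every `fill_bytes` call goes through the vtable.
fn sum_dyn(rng: &mut dyn Rng, n: usize) -> u64 {
    let mut acc = 0u64;
    for _ in 0..n {
        acc = acc.wrapping_add(next_u64_via_fill(rng));
    }
    acc
}

/// Hypothetical toy word generator (a counter, not random).
struct Counter {
    next: u64,
}

impl Rng for Counter {
    fn fill_bytes(&mut self, bytes: &mut [u8]) {
        for chunk in bytes.chunks_mut(8) {
            let word = self.next.to_le_bytes();
            self.next = self.next.wrapping_add(1);
            chunk.copy_from_slice(&word[..chunk.len()]);
        }
    }
}
```

Both functions compute the same result; the open question in the thread is only how much the indirect call in the `dyn` version costs.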

Maybe this has changed over time as LLVM has improved? The way rand derives fill_bytes from next_uN generically involves a lot of small fixed-size memcpys, and LLVM has historically been pretty bad at optimizing those (it’s still not great but much better now).

@ChrisDenton

Member

But my biggest concern is what happens on unsupported platforms, e.g. wasm32-unknown-unknown? I think if the rand crate were to switch to DefaultRandomSource over the getrandom crate we'd have a minor rebellion (fork) due to the lost support for wasm32-unknown-unknown (this isn't the only target of concern, but by far the most widely used from the feedback we've had).

The specific problem with wasm32-unknown-unknown is its dual nature as both a -none target and a -web target. Which one it is depends on the user of the target. I don't think this is solvable by std without separating out those use cases. In the meantime, it seems unfortunate to block improvements to std on that.

@dhardy

dhardy commented Jun 1, 2026

Contributor

@hanna-kruppe I tried benchmarking Xoshiro256++, Sfc32 and Sfc64 using rand_core::utils::next_word_via_fill to implement next_uN, and only Sfc64 was significantly slower (~10%). I then tried running the rand_distr benches with a modified Pcg64Mcg without significant effect on performance (thirty benches with <1% deltas, four 2-3% faster, nine 3-6% slower, one 15% slower). So, yes, it looks like LLVM can optimise over most performance issues of next_uN -> fill_bytes -> next_uN conversions; at least, on my (Zen 3) machine using static dispatch with lots of function inlining.

If there's desire to use only a single method, I would consider using fill_words(&mut [u32; _]) instead. This guarantees alignment >= 4, and I can't recall ever seeing a use-case for less than one u32 word of output. The caveat is worse compatibility with every other RNG interface, but ultimately does that matter? Only a few trait impls (most in std and RNG libraries) are required and most users will want a higher-level interface like RngExt or choose anyway.

@ChrisDenton

Member

Are there any code examples in the docs? If not it'd be great to add them before stabilisation.

@the8472

the8472 commented Jun 2, 2026

Member

From the Zulip discussion, there seems to be some tension between goals:

  • we want to promise cryptographic quality
  • people argue for the infallible API, which requires panicking when entropy can't be supplied
  • ESP-IDF can't unconditionally provide crypto-quality (thus has to panic)
  • libs-api has some precedent that stubbing APIs via panics can be something that disqualifies a target from advancing beyond tier 3 (but this isn't hard policy)
  • Promote tier 3 riscv32 ESP-IDF targets to tier 2 compiler-team#864

@tarcieri

tarcieri commented Jun 2, 2026

Contributor

people argue for the infallible API, which requires panicking when entropy can't be supplied

An alternative to panicking is to seed an infallible RNG from a fallible RNG. This at least defers the error condition to something that happens once up-front, and is avoidable thereafter.

I'd probably only recommend that for bare metal embedded use cases though. Anywhere you have a proper kernel entropy pool (and potentially have to worry about forking) you're better off using that.
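
A minimal sketch of that pattern, under loud assumptions: `TryRng`, `Rng`, and `EntropyError` are local stand-ins, `SeededRng`, `Broken`, and `FixedBytes` are hypothetical names, and the generator is SplitMix64, which is NOT cryptographically secure and only stands in for a real CSPRNG.

```rust
#[derive(Debug, PartialEq)]
struct EntropyError;

/// Local stand-in for a fallible entropy source (e.g. a hardware RNG).
trait TryRng {
    fn try_fill_bytes(&mut self, bytes: &mut [u8]) -> Result<(), EntropyError>;
}

/// Local stand-in for the infallible interface.
trait Rng {
    fn fill_bytes(&mut self, bytes: &mut [u8]);
}

/// Placeholder generator: SplitMix64. NOT a CSPRNG; a real design
/// would use a vetted cryptographic generator here.
struct SeededRng {
    state: u64,
}

impl SeededRng {
    /// The only fallible step: seeding happens once, up front.
    fn from_fallible(src: &mut impl TryRng) -> Result<Self, EntropyError> {
        let mut seed = [0u8; 8];
        src.try_fill_bytes(&mut seed)?;
        Ok(SeededRng { state: u64::from_le_bytes(seed) })
    }

    fn next_word(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

impl Rng for SeededRng {
    /// After seeding, output cannot fail.
    fn fill_bytes(&mut self, bytes: &mut [u8]) {
        for chunk in bytes.chunks_mut(8) {
            let word = self.next_word().to_le_bytes();
            chunk.copy_from_slice(&word[..chunk.len()]);
        }
    }
}

/// Test double: a source that always fails.
struct Broken;

impl TryRng for Broken {
    fn try_fill_bytes(&mut self, _bytes: &mut [u8]) -> Result<(), EntropyError> {
        Err(EntropyError)
    }
}

/// Test double: a source that "succeeds" with a constant byte.
struct FixedBytes(u8);

impl TryRng for FixedBytes {
    fn try_fill_bytes(&mut self, bytes: &mut [u8]) -> Result<(), EntropyError> {
        bytes.fill(self.0);
        Ok(())
    }
}
```

The error surfaces once, at construction; forking and reseeding concerns, which are the reason a kernel entropy pool is preferable where one exists, are not addressed by this sketch.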

@joshtriplett

Member Author

I tried benchmarking Xoshiro256++, Sfc32 and Sfc64 using rand_core::utils::next_word_via_fill to implement next_uN, and only Sfc64 was significantly slower (~10%).

Can you provide the benchmarking code? I'd love to see if we can optimize that in the style of hanna-kruppe/chacha8rand#1 .

@dhardy

dhardy commented Jun 10, 2026

Contributor

@joshtriplett here's a diff against rand_pcg code.
pcg128.diff.txt

@joshtriplett joshtriplett force-pushed the stabilize-random-source branch from eb9d7c8 to 44519e1 Compare June 23, 2026 15:45
@newpavlov

newpavlov commented Jun 24, 2026

Contributor

@Amanieu

Fallible random sources don't really exist in practice, and it would make the API worse if users have to handle a condition that never occurs in practice.

Plenty of sources may fail in practice. And it's not only IO-based generators, even RDRAND is technically fallible and it was encountered in practice (sure, it was a buggy CPU, but still). Just replace "random sources" with "allocators" in your comment to see the potential mess you intend to bake in.

With distinct rand_core-like TryRng/Rng traits users of infallible RNGs (or RNGs which use panics to hide potential unlikely errors) do not have to deal with potential errors.

Finally, I believe it does not make sense to introduce just the Rng trait. If RNG traits are to be stabilized, I believe they should aim to completely replace rand_core. Otherwise we are going to end up with a weird situation where Rng is defined in core, but the widely used SeedableRng and CryptoRng reside in rand_core.

@bjorn3

bjorn3 commented Jun 24, 2026

Member

The hardware may be fallible in those cases, but the OS normally retries accessing the hardware source later, and when you ask for random numbers it either blocks or keeps serving from the already seeded CSPRNG. It also adds entropy from other lower-quality but guaranteed-to-exist sources like interrupt jitter (if interrupts stop, your system is completely broken and no userspace apps run anyway).

@newpavlov

newpavlov commented Jun 24, 2026

Contributor

Firstly, you cannot assume what the OS does. Most OSes do not make the infallibility guarantee (as an exception, modern Windows and Fuchsia do make such a guarantee). For example, (IIRC) Hermit was simply forwarding RDRAND without any entropy mixing. On top of that, in the future we may have a way to override the system source (e.g. a cryptographic application may use an external IO-based certified RNG).

Secondly, RNG traits defined in core will be used with other RNGs as well, not only with SysRng.

@bjorn3

bjorn3 commented Jun 24, 2026

Member

On Linux, the only error conditions glibc documents for getrandom are: unconditional failure (the kernel doesn't support it, which would effectively mean running the binary on a kernel not supported by libstd at all); failure when you ask for non-blocking behavior (which libstd doesn't); EINTR, which should be retried immediately; or invalid arguments to getrandom (in which case you have a memory safety bug). So for all practical purposes, on Linux it will not fail.

Secondly, RNG traits defined in core will be used with other RNGs as well, not only with SysRng

I would expect the Rng trait in libstd to correspond to CryptoRng in rand_core.

@BurntSushi

Member

I am also partial to a standalone function here providing a global resource for reasons similar to what @hanna-kruppe brought up. It's also hard to tell exactly, but it seems like this is the path preferred by the rust-random folks? Do I have that right? I'd be curious to get @newpavlov's take here. (Apologies if you've given it already elsewhere.)

@rfcbot concern global

@newpavlov

newpavlov commented Jun 26, 2026

Contributor

Yes, I wrote about it in my first comment:

The most pressing needs can be alleviated by stabilizing a free-standing (potentially panicking) fill_bytes function.

I don't know about @dhardy.

So for all practical purposes on Linux it will not fail.

Except when getrandom gets blocked by seccomp (see rust-random/getrandom#229), or when libc::getrandom call gets intercepted, or whatever.

getrandom potentially returns an error. Period. You can either ignore it (very dangerous), or panic (problematic in some scenarios even if it's virtually unreachable in practice). And we have a plethora of other RNG sources and OSes outside of Linux.

@dhardy

dhardy commented Jun 26, 2026

Contributor

I agree with @newpavlov here.

The main concern I have is that without something like a #![system_rng] override, switching from getrandom to std will be a downgrade for anyone using platforms not supported by std.

Of course, if we can design an interface that works well enough for getrandom to use internally, then getrandom can just use std on most platforms, dropping most of its own code. The pressure to simplify the interface as much as possible (especially, no error handling) makes this more difficult to achieve.

@BurntSushi

Member

Can we just have a try_fill_bytes in addition to fill_bytes?

@tarcieri

Contributor

I also like the idea of a freestanding function which is infallible and doesn't panic on mainstream OSes, along with a fallible one if people want to avoid panics on more esoteric platforms.

Ideally embedded platforms could plug in their own infallible entropy pool that can seed itself from a fallible hardware RNG.

@newpavlov

newpavlov commented Jun 26, 2026

Contributor

@BurntSushi

Can we just have a try_fill_bytes in addition to fill_bytes?

Personally, I am fine with it, but the main problem is whether io::Error is an appropriate error type for it. Relying on it would mean that we will not be able to expose try_fill_bytes in core (or some new sysroot crate).
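
One possible shape, sketched under assumptions: the trait below is a local stand-in, and `RngError` is a hypothetical error type that uses only `core`, so it would not depend on io::Error. The `code` field and the message text are invented for illustration.

```rust
use core::fmt;

/// Hypothetical error type built only from `core` items.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct RngError {
    code: u32,
}

impl fmt::Display for RngError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "random source error code {}", self.code)
    }
}

/// Local stand-in trait: implementers write the fallible method, and the
/// panicking `fill_bytes` is provided on top of it.
trait Rng {
    fn try_fill_bytes(&mut self, bytes: &mut [u8]) -> Result<(), RngError>;

    fn fill_bytes(&mut self, bytes: &mut [u8]) {
        if let Err(e) = self.try_fill_bytes(bytes) {
            panic!("random source failed: {e}");
        }
    }
}

/// Test double: always fails with a fixed code.
struct Failing;

impl Rng for Failing {
    fn try_fill_bytes(&mut self, _bytes: &mut [u8]) -> Result<(), RngError> {
        Err(RngError { code: 5 })
    }
}

/// Test double: always succeeds, writing a constant byte (not random).
struct Constant(u8);

impl Rng for Constant {
    fn try_fill_bytes(&mut self, bytes: &mut [u8]) -> Result<(), RngError> {
        bytes.fill(self.0);
        Ok(())
    }
}
```

Whether a dedicated error type like this is preferable to waiting for io::Error in core is exactly the open question raised above.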

So the simplest solution IMO would be to stabilize only an "infallible" potentially (but extremely unlikely in practice) panicking fill_bytes function first.

@tarcieri

doesn't panic on mainstream OSes

Did you mean "doesn't panic (in practice, but technically contains a panic path under the hood)"? Because I don't think we should ignore potential unexpected errors.

@bstrie

bstrie commented Jun 30, 2026

Contributor

I would like to see the stabilization report updated to include the exact APIs that are being proposed for stabilization, since given the long history of work in this space it's not immediately obvious that the Rng trait under consideration is only comprised of a single fill_bytes method.

To clarify, using a trait now would not prevent introducing a user-configurable EII in the future, yes? The source for SystemRng is already literally just this:

impl Rng for SystemRng {
    fn fill_bytes(&mut self, bytes: &mut [u8]) {
        sys::fill_bytes(bytes)
    }
}

So it seems as though introducing an EII would just mean replacing the private sys::fill_bytes above with the EII (or equivalently making this the EII and making it public).

Of course, the reverse is also true: we could add it as a free function now, and then add a trait later that calls the free function. I don't feel strongly either way.

Regarding whether or not it should return an error, I'm of two minds: on the one hand, I think the fact that our prior art, getrandom::fill, does return an Error suggests that this should be our default position, and any divergence from this design should be well-motivated. On the other hand, I acknowledge that I, personally, would never do anything but unwrap this result, although on balance I don't think that means that people who think otherwise should be denied the opportunity, if at all realistic. I would be content having both fill_bytes and try_fill_bytes if that's the compromise necessary to unblock this proposal.

Regarding the choice of error type, I'm loath to suggest that we should block this issue on anything else, but I do want to note that io::Error in core appears to be coming along nicely: #154046

Overall I'm extremely excited at the prospect of receiving a properly standardized way to read from the system's entropy source; IMO it's been the single-most egregious omission of libstd ever since Tokio was caught initializing their PRNG with bits scavenged from the thread-local hasher state. Please continue pursuing this with a fervor!

@joshtriplett

Member Author

Except when getrandom gets blocked by seccomp

In which case it will fail the very first time, producing a panic. And we've already talked about having a one-time "verify that we have a working RNG" call.

If someone sets up a filter that says "allow the first getrandom call but fail subsequent ones", well, we are not trying to guard against people aiming that carefully at their own foot.

And we are singularly unprepared to deal with an OS that passes on a raw hardware RNG together with a failing hardware RNG (e.g. RDRAND); we should not, for instance, try to do quality measurement on RNG output and reject things that look "stuck".

@joshtriplett

Member Author

Can we just have a try_fill_bytes in addition to fill_bytes?

I'd continue to advocate that we don't, precisely because nobody's going to be able to do anything useful with it short of propagating the error. But if you feel strongly that we must, then we could add a separate TryRng with try_fill_bytes. I'd still advocate that over having standalone methods that would precommit us to an EII override mechanism.

@orlp

orlp commented Jul 3, 2026

Contributor

nobody's going to be able to do anything useful with it short of propagating the error

  • A CLI tool might want to print a nice error that randomness is unavailable and gracefully exit rather than crash.
  • A hasher using randomness to prevent HashDoS wants to ignore the error and use a fixed seed if randomness is unavailable. Because crashing here would be self-denying service, the precise thing it's supposed to prevent.
  • Fuzzy logic in medical firmware, aviation software or a similar critical no-fail environment might want to switch to a deterministic fallback algorithm while setting off an alert. This is preferable to crashing and letting the patient die or plane crash.
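
The hasher case could look roughly like this with a fallible interface. This is a sketch only: `TryRng` and `RngError` are local stand-ins, and `hasher_seed`, `FALLBACK_SEED`, `Broken`, and `FixedBytes` are hypothetical names.

```rust
#[derive(Debug)]
struct RngError;

/// Local stand-in for a fallible source of random bytes.
trait TryRng {
    fn try_fill_bytes(&mut self, bytes: &mut [u8]) -> Result<(), RngError>;
}

/// Arbitrary constant used when no randomness is available.
const FALLBACK_SEED: u64 = 0x5EED_5EED_5EED_5EED;

/// Prefer a random seed, but degrade to a fixed one instead of crashing.
fn hasher_seed(src: &mut impl TryRng) -> u64 {
    let mut buf = [0u8; 8];
    match src.try_fill_bytes(&mut buf) {
        Ok(()) => u64::from_le_bytes(buf),
        Err(_) => FALLBACK_SEED,
    }
}

/// Test double: a source that always fails.
struct Broken;

impl TryRng for Broken {
    fn try_fill_bytes(&mut self, _bytes: &mut [u8]) -> Result<(), RngError> {
        Err(RngError)
    }
}

/// Test double: a source that fills the buffer with one constant byte.
struct FixedBytes(u8);

impl TryRng for FixedBytes {
    fn try_fill_bytes(&mut self, bytes: &mut [u8]) -> Result<(), RngError> {
        bytes.fill(self.0);
        Ok(())
    }
}
```

With a panicking-only API, the same fallback would need catch_unwind or a separate availability check.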

In which case it will fail the very first time, producing a panic.

The only way I can see us being able to guarantee that only the first call can fail is if on fallible platforms we wrap the system randomness source with our own infallible CSPRNG initialized on the first call.

That's certainly a possible strategy with trade-offs, but not one I'm sure we should commit to out of the box.

And we are singularly unprepared to deal with an OS that passes on a raw hardware RNG together with a failing hardware RNG (e.g. RDRAND); we should not, for instance, try to do quality measurement on RNG output and reject things that look "stuck".

I agree, but I'd like to note that the RDRAND instruction itself is fallible, it indicates its success using the carry flag.
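
To illustrate what handling that flag involves, here is a sketch of a bounded retry loop. The closure `step` is a hypothetical stand-in that models one RDRAND attempt, returning `Some(value)` when the carry flag would be set and `None` otherwise; no actual hardware instruction is issued, and the retry limit is the caller's choice.

```rust
/// Retry a fallible single-word source up to `max_attempts` times.
/// `step` models one hardware attempt: `Some(v)` on success, `None` on failure.
fn retry_step(mut step: impl FnMut() -> Option<u64>, max_attempts: u32) -> Option<u64> {
    for _ in 0..max_attempts {
        if let Some(value) = step() {
            return Some(value);
        }
    }
    // Still failing after all attempts: the caller has to decide what to do.
    None
}

/// Simulated source that fails `failures` times and then yields `value`.
fn flaky(failures: u32, value: u64) -> impl FnMut() -> Option<u64> {
    let mut remaining = failures;
    move || {
        if remaining > 0 {
            remaining -= 1;
            None
        } else {
            Some(value)
        }
    }
}
```

The `None` at the end is the case the thread disagrees about: it either becomes an `Err` or a panic.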

@bjorn3

bjorn3 commented Jul 3, 2026

Member

A CLI tool might want to print a nice error that randomness is unavailable and gracefully exit rather than crash.

This can be done by having a dedicated function to check if getting randomness is possible at all.

Fuzzy logic in medical firmware, aviation software or a similar critical no-fail environment might want to switch to a deterministic fallback algorithm while setting off an alert. This is preferable to crashing and letting the patient die or plane crash.

If those are using libstd in the first place, those should almost certainly be using catch_unwind liberally anyway. Libstd simply isn't designed to avoid all possible panics no matter how unlikely (e.g. Arc refcount overflow, bugs in libstd, the OS behaving weirdly) and even hard aborts (allocation failure, which can unstably be a panic instead too) in error conditions that shouldn't be possible under normal circumstances. If you don't ever want to panic, restrict yourself to the subset of libcore that doesn't panic even in the face of bugs. libstd is designed for coarse-grained (e.g. per-request or per-CLI-invocation) fault recovery [1] using catch_unwind and/or process restarts, as makes sense for applications and services, not for fine-grained fault recovery at every possible fault location, as makes sense for safety-critical applications.
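
A coarse-grained recovery boundary of that kind might be sketched as follows. `handle_request` and `serve` are hypothetical names, and the `rng_available` flag merely simulates a randomness failure that surfaces as a panic; `std::panic::catch_unwind` is the real std API.

```rust
use std::panic;

/// Hypothetical unit of work; the flag simulates an RNG failure
/// surfacing as a panic somewhere deep inside the request.
fn handle_request(rng_available: bool) -> u32 {
    if !rng_available {
        panic!("system RNG unavailable");
    }
    200
}

/// Per-request fault boundary: a panic becomes an error status
/// instead of taking down the whole service.
fn serve(rng_available: bool) -> u32 {
    match panic::catch_unwind(|| handle_request(rng_available)) {
        Ok(status) => status,
        Err(_) => 500,
    }
}
```

This only works when panics unwind; with `panic = "abort"` the process terminates instead.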

To get rid of panics for getting random numbers entirely you will also have to for example replace the read_exact call for /dev/urandom to not panic when the read syscall returns that it read more bytes than requested. You will also have to turn the panic when the poll syscall returns more ready fds than fds were passed in into an error. Both of these conditions are fundamental violations of the contract we have with the kernel and thus if anything I would argue we should abort instead of panic, but given that they currently panic you can use catch_unwind to catch them.

The only way I can see us being able to guarantee that only the first call can fail is if on fallible platforms we wrap the system randomness source with our own infallible CSPRNG initialized on the first call.

On Linux at least after the first successful read, any future unsuccessful read would almost certainly be a kernel or glibc bug. We can't reasonably defend against every possible kernel and glibc bug. We fundamentally have to make assumptions about our execution environment and violations of those assumptions will break things one way or another.

I agree, but I'd like to note that the RDRAND instruction itself is fallible, it indicates its success using the carry flag.

Linux only uses rdrand as entropy source for the kernel CSPRNG, so rdrand errors are handled transparently.

Footnotes

  1. By faults I mean software bugs (either a bug in your program or the kernel violating its end of the contract) or hardware failures, not expected errors. For expected errors returning an Err is the sensible thing to do.

@orlp

orlp commented Jul 3, 2026

Contributor

To get rid of panics for getting random numbers entirely you will also have to for example replace the read_exact call for /dev/urandom to not panic when the read syscall returns that it read more bytes than requested. You will also have to turn the panic when the poll syscall returns more ready fds than fds were passed in into an error.

...so we are doing that then, right? Or is the proposed SystemRng still allowed to panic even if you checked std::random::SystemRng::is_available_bikeshed()?

There is no documented # Panics section anywhere I can see in this stabilization.

Must unsafe code be prepared that any call to SystemRng can panic? This means the following innocent-looking code can suddenly become unsound if the hasher were to use SystemRng, even after checking SystemRng::is_available_bikeshed():

unsafe {
    // <invalid state>
    let slice: &[u64] = ...;
    let unique_ids = slice.iter().collect::<HashSet<_>>();
    // <invalid state restored>
}

EDIT: I just noticed that in today's code this already can fail. I consider that a defect we should fix. RandomState::new does not claim it can panic, and it's not obvious that it can.


Or is the plan to use our own CSPRNG initialized with the system RNG after all, so that we can guarantee infallibility after the first call?

Because "infallible except on the first call, with an is_available_bikeshed to avoid the panic, except it can still actually panic for any number of reasons anyway" doesn't seem that good of a design to me.


For context, getrandom offers the following claim:

We strive to eliminate all potential panics from our backend implementations. In other words, when compiled with optimizations enabled, the generated binary code for getrandom functions should not contain any panic branches. Even if the platform misbehaves and returns an unexpected result, our code should correctly handle it and return an error, e.g. Error::UNEXPECTED.

@hanna-kruppe

Contributor

That’s not specific to SystemRng. There is no implicit promise that standard library APIs won’t panic (or abort, for that matter). In particular, absence of a # Panics section does not guarantee anything. There are reasonable arguments that some “core” APIs must not panic to be fit for purpose and to enable any non-trivial panic-free Rust code to be written, even if it’s not explicitly stated, but generally we want to reserve the ability to introduce panics whenever there’s a good reason to. Unsafe code sections that are unsound in the face of panics should call into as little code as possible outside of the author’s immediate control.

Note that your particular example is already unsound today, because there are multiple ways it can panic: capacity overflow in HashSet allocation, allocation failure (panic in OOM isn’t stable but std explicitly reserves the right to do it), and (as you noticed) internal errors in the code that obtains randomness for the default hasher.
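One common way to keep such a section sound even if a callee panics is a drop guard that restores the invariant during unwinding. The sketch below uses a hypothetical `Guarded` type with a boolean standing in for the real invariant; it is an illustration of the pattern, not a fix proposed in this thread.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical structure whose invariant (`valid`) is temporarily broken.
struct Guarded {
    valid: bool,
}

/// Restores the invariant when dropped, including during unwinding.
struct RestoreOnDrop<'a>(&'a mut Guarded);

impl Drop for RestoreOnDrop<'_> {
    fn drop(&mut self) {
        self.0.valid = true;
    }
}

/// Runs `f` while the invariant is broken; the guard repairs it on every exit path.
fn with_invalid_state<R>(g: &mut Guarded, f: impl FnOnce() -> R) -> R {
    g.valid = false; // <invalid state>
    let _guard = RestoreOnDrop(g);
    f() // may panic; `_guard` is still dropped, restoring the state
}

/// Returns true if the invariant was restored after a simulated panic.
fn demo() -> bool {
    let mut g = Guarded { valid: true };
    let result = panic::catch_unwind(AssertUnwindSafe(|| {
        with_invalid_state(&mut g, || -> () { panic!("simulated panic in callee") })
    }));
    result.is_err() && g.valid
}
```

The guard only helps when the invariant can be restored without the callee's result, and it does nothing under panic=abort; keeping foreign calls out of the invalid-state window remains the simpler option.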

@BurntSushi

Member

Can we just have a try_fill_bytes in addition to fill_bytes?

I'd continue to advocate that we don't, precisely because nobody's going to be able to do anything useful with it short of propagating the error. But if you feel strongly that we must, then we could add a separate TryRng with try_fill_bytes.

I'm more or less registering this concern on behalf of the rust-random folks. It seems clear to me that they do strongly prefer a fallible mechanism here. In particular, my understanding of the comments from @dhardy and @newpavlov above is:

  1. They want to build getrandom on top of std.
  2. A lack of a fallible API makes this difficult to do.

Do we need a second trait for it though? Can we have Rng with one required routine, try_fill_bytes, and then fill_bytes is provided with a default implementation?
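A minimal sketch of what that single-trait shape could look like on stable Rust. The associated `Error` type, the `Debug` bound, and the two toy sources (`Counter`, `Unavailable`) are assumptions made for illustration; the question above does not specify them.

```rust
use std::convert::Infallible;
use std::fmt::Debug;

/// Hypothetical shape: one required fallible method, one provided infallible one.
trait Rng {
    type Error: Debug;

    /// Required: fill `buf`, reporting failure as an error.
    fn try_fill_bytes(&mut self, buf: &mut [u8]) -> Result<(), Self::Error>;

    /// Provided: panics if the fallible method reports an error.
    fn fill_bytes(&mut self, buf: &mut [u8]) {
        if let Err(err) = self.try_fill_bytes(buf) {
            panic!("random source failed: {err:?}");
        }
    }
}

/// Deterministic stand-in source (NOT random), used only to exercise the trait.
struct Counter(u8);

impl Rng for Counter {
    type Error = Infallible;

    fn try_fill_bytes(&mut self, buf: &mut [u8]) -> Result<(), Infallible> {
        for b in buf.iter_mut() {
            *b = self.0;
            self.0 = self.0.wrapping_add(1);
        }
        Ok(())
    }
}

/// Stand-in for a source that is never available.
struct Unavailable;

impl Rng for Unavailable {
    type Error = &'static str;

    fn try_fill_bytes(&mut self, _buf: &mut [u8]) -> Result<(), &'static str> {
        Err("no entropy source")
    }
}
```

With this shape, implementors write only `try_fill_bytes`, and callers who cannot handle errors call `fill_bytes` and accept the panic.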

@rfcbot concern fallibility

I'd still advocate that over having standalone methods that would precommit us to an EII override mechanism.

Fair enough here. I re-read through the comments using this lens and I agree this seems like a not-great outcome. Particularly around not being able to rely on fill_bytes being a secure source of bytes. I think that's worse than the situation that @hanna-kruppe brought up regarding dependency injection. Namely, if we start with a fill_bytes free function, then our future progression seems complicated. If we reject EII, then we're back to the trait design. And if we do that, then we have a somewhat confusing story around what folks should be using. If we accept EII, then we run the risk of folks not being able to rely on the global resource not always providing the right guarantees.

Which leads me to conclude that a free function is the right path only if we can say with certainty that we'll never need another RNG source in std. I don't have enough domain knowledge to know whether that's the case... Could someone chime in on this point? @joshtriplett what future scenario do you envision where a fill_bytes free function would warrant EII?

@newpavlov

newpavlov commented Jul 3, 2026

Contributor

And we are singularly unprepared to deal with an OS that passes on a raw hardware RNG together with a failing hardware RNG (e.g. RDRAND); we should not, for instance, try to do quality measurement on RNG output and reject things that look "stuck".

No one says that std should implement such checks. The checks are already built into some RNG sources and they reserve the right to report an error at any moment during operation, so if you want the RNG trait(s) to be universal enough, then they should account for such cases. Any side-channel checks like the aforementioned "one-time call" are a horrible way to deal with this reality.

We already have the fallible allocator mess because arguments like "well, Linux overcommits by default, so we should not bother with potential allocation failures" won in the past, so I really hope that the history will not be repeated here.

Do we need a second trait for it though? Can we have Rng with one required routine, try_fill_bytes, and then fill_bytes is provided with a default implementation?

IMO the existing rand_core traits are good enough (see rust-random/rand_core#72 for remaining issues). The Rng trait is just a convenience wrapper around TryRng<Error=!>. It may be somewhat annoying to specify type Error = !; in most RNG implementations, but on the user side it's completely painless. Some new magic functionality (like "supertrait auto impl" mentioned in OP) may help to improve the design, but it's a separate question.

As I have argued many times, I believe that the libs team should test RNG traits in rand_core first and then move them into core without any changes. But many users want system entropy sources to be exposed in std now, and a free-standing potentially-panicking function is a perfectly fine, simple, and least controversial solution for it.

@ChrisDenton

Member

We already have the fallible allocator mess because arguments like "well, Linux overcommits by default, so we should not bother with potential allocation failures" won in the past, so I really hope that the history will not be repeated here.

I think the more relevant comparison is the default hasher, which can technically panic.

@BurntSushi

Member

@newpavlov

a free-standing potentially-panicking function is a perfectly fine, simple, and least controversial solution for it.

Can you comment on the downsides of this please? In particular, the latter half of #157168 (comment)

I want to understand your position better. It helps to know how you view the downsides of this choice.

@newpavlov

newpavlov commented Jul 3, 2026

Contributor

As I see it, there are several minor concerns with a free-standing function:

  1. Potential panics. I agree that it's not a concern for most users, same as with potential fallibility of allocations. Use of this function in getrandom is likely to stay opt-in in the near future, so it's not a big conflict with the crate promises around panics.
  2. Lack of uninitialized buffer support. In the case of SystemRng we call external functions, so the compiler will not be able to optimize unnecessary buffer zeroization. But in practice zeroization cost is likely to be dwarfed by syscall cost.
  3. In the future we may end up with several ways of doing the same thing (i.e. std::random::fill_bytes and std::random::SystemRng.fill_bytes), but we already have a similar situation with alloc, so it's probably fine.

I consider stabilization of a free-standing function a good first step which alleviates most of the pressure from users who want proper access to system randomness in std. It places fewer restrictions on std compared to stabilization of RNG trait(s) and buys time for proper development of RNG traits (mostly blocked on missing language features) and an overridable SystemRng.

As a maintainer of rand_core and getrandom I hope that std/core eventually will supplant them completely for all use-cases. Until then, I believe they could play the role of a good testing ground for developing RNG interfaces instead of rushing their stabilization in std. It also would make migration of the ecosystem from rand_core/getrandom to std/core easier.

Namely, if we start with a fill_bytes free function, then our future progression seems complicated

I am not sure I understand this part.

Right now, I imagine roughly the following design (it does not account for potential uninit buffer support or next_u32/u64):

// core
trait TryRng {
    type Error;
    fn try_fill_bytes(&mut self, buf: &mut [u8]) -> Result<(), Self::Error>;
    // potentially other methods
}

trait Rng: TryRng<Error = !> {
    fn fill_bytes(&mut self, buf: &mut [u8]);
    // ...
}

impl<T: TryRng<Error = !>> Rng for T { ... }

// Other RNG traits

// =========================
// core or a new sysroot crate

// alternatively, we could use `core::io::Error` instead
struct SystemRngError(RawOsError);

#[eii(random_fill_bytes)]
fn random_fill_bytes(buf: &mut [u8]) -> Result<(), SystemRngError> {
    Err(SystemRngError::UNIMPLEMENTED)
}

#[derive(Copy, Clone, ...)]
struct TrySystemRng;

impl TryRng for TrySystemRng {
    type Error = SystemRngError;
    fn try_fill_bytes(&mut self, buf: &mut [u8]) -> Result<(), Self::Error> {
        random_fill_bytes(buf)
    }
}

#[derive(Copy, Clone, ...)]
struct SystemRng;

impl TryRng for SystemRng {
    type Error = !;
    fn try_fill_bytes(&mut self, buf: &mut [u8]) -> Result<(), !> {
        random_fill_bytes(buf).map_err(|err| panic!("SysRng failure: {err}"))
    }
}

// Potentially another "insecure" RNG to expose `GRND_INSECURE`

// =========================
// std

fn fill_bytes(buf: &mut [u8]) {
    SystemRng.fill_bytes(buf)
}

Regardless of how RNG source override will be done under the hood, it should not influence the free function.
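The design above relies on unstable features (`!` as an associated type, `#[eii]`). A rough stable-Rust approximation is sketched below, with `Infallible` standing in for `!` and a plain stub function standing in for the EII hook; both substitutions are assumptions made so the example compiles today, and the stub always fails like the default in the sketch above.

```rust
use std::convert::Infallible;
use std::fmt;

trait TryRng {
    type Error;
    fn try_fill_bytes(&mut self, buf: &mut [u8]) -> Result<(), Self::Error>;
}

/// Convenience wrapper over `TryRng<Error = Infallible>` (stand-in for `!`).
trait Rng: TryRng<Error = Infallible> {
    fn fill_bytes(&mut self, buf: &mut [u8]);
}

impl<T: TryRng<Error = Infallible>> Rng for T {
    fn fill_bytes(&mut self, buf: &mut [u8]) {
        match self.try_fill_bytes(buf) {
            Ok(()) => {}
            // `Infallible` has no values, so this arm can never run.
            Err(never) => match never {},
        }
    }
}

/// Hypothetical error type; the sketch above wraps a `RawOsError` instead.
#[derive(Debug, PartialEq)]
struct SystemRngError;

impl fmt::Display for SystemRngError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("system RNG unimplemented")
    }
}

/// Stand-in for the overridable hook; this default always fails.
fn random_fill_bytes(_buf: &mut [u8]) -> Result<(), SystemRngError> {
    Err(SystemRngError)
}

#[derive(Copy, Clone)]
struct TrySystemRng;

impl TryRng for TrySystemRng {
    type Error = SystemRngError;
    fn try_fill_bytes(&mut self, buf: &mut [u8]) -> Result<(), SystemRngError> {
        random_fill_bytes(buf)
    }
}

#[derive(Copy, Clone)]
struct SystemRng;

impl TryRng for SystemRng {
    type Error = Infallible;
    fn try_fill_bytes(&mut self, buf: &mut [u8]) -> Result<(), Infallible> {
        match random_fill_bytes(buf) {
            Ok(()) => Ok(()),
            // The infallible wrapper converts a source failure into a panic.
            Err(err) => panic!("SystemRng failure: {err}"),
        }
    }
}
```

Only `SystemRng` picks up the blanket `Rng` impl; `TrySystemRng` exposes the error to callers that want to handle it.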

@tarcieri

tarcieri commented Jul 3, 2026

Contributor

Regarding an infallible free function and panics: I think for all possible randomness sources it should be possible to have an infallible-after-initialization model which is also panic-free in practice after initialization, as an abstraction over various implementations of a hardware RNG-seeded CSRNG.

In the rust-random world this looks like SysRng (which is fallible and impls TryRng) seeding a ThreadRng (which is infallible and impls Rng), though ideally I think the free function would use an OS-provided entropy pool if it can be read from infallibly, so as to avoid issues around e.g. forking.

My understanding is that on Linux, once getrandom(flags=0) succeeds it should continue succeeding, and while panic conditions may continue to live in the code, semantically they shouldn't happen in practice. The main failure cases I'm aware of are in early boot (which I'm definitely not an expert in), or if the system call itself is blocked (by seccomp or other sandboxing mechanisms). Internally Linux is doing something quite similar to rust-random: both are using a ChaCha20 CSRNG seeded from hardware, and once seeded successfully the error conditions should go away.

I'm not the best person to ask about Windows but my understanding is it offers similar properties: from what I can tell the error cases for BCryptGenRandom are for invalid arguments and it should not fail in practice. The new ProcessPrng API seems to be completely infallible.

BSDs and their derivatives including macOS all seem to have a similar story: arc4random* is completely infallible, and getentropy seems to be a similar case of returning errors for invalid arguments.

An MVP could hide everything behind the scenes and panic on any initialization failure, but ideally I think would abstract over this infallible-after-initialization model, and then down the road make initialization more pluggable so initialization errors can be explicitly handled rather than panicking (as well as e.g. allowing embedded targets to wire up their own hardware-seeded entropy pool, rather than relying on an OS-provided one, which might also be a good idea on operating systems that don't have an infallible OS-provided entropy pool).

Edit: I would be curious to know what OSes don't provide an infallible entropy pool on recent versions. Every one I spot checked (e.g. I just checked Solaris) seems to have a getrandom/getentropy-like API with these semantics in recent versions.

@newpavlov

newpavlov commented Jul 3, 2026

Contributor

I would strongly caution against using behavior of OSes as an argument for relying on the "infallible-after-initialization" model. Firstly, the default source may be overwritten with a custom potentially fallible source (I had an unfortunate "pleasure" of working in a project where it was necessary). Secondly, on embedded targets hardware sources may fail at any moment and it's not always desirable to pull a whole CSPRNG to provide a fallback.

std could easily provide a ThreadRng-like source, with potentially fallible initialization, but with infallible implementation of the RNG trait, but IMO it should be separate from SystemRng.
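A minimal sketch of that split: construction can fail, use afterwards cannot. Everything here is hypothetical (`SeededRng`, `SeedError`, the two seed functions), and SplitMix64 is used only to keep the example short. It is NOT cryptographically secure; a real source of this kind would use a CSPRNG such as ChaCha20.

```rust
/// Error reported by a hypothetical seeding source.
#[derive(Debug, PartialEq)]
struct SeedError;

/// Fallible to construct, infallible to use afterwards.
/// WARNING: SplitMix64 is NOT a CSPRNG; it is a placeholder for one.
struct SeededRng {
    state: u64,
}

impl SeededRng {
    /// The only fallible step: obtaining the seed.
    fn try_new(
        seed_source: impl FnOnce(&mut [u8]) -> Result<(), SeedError>,
    ) -> Result<Self, SeedError> {
        let mut seed = [0u8; 8];
        seed_source(&mut seed)?;
        Ok(SeededRng { state: u64::from_le_bytes(seed) })
    }

    /// One SplitMix64 step.
    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Infallible: generates a word at a time and discards unused bytes.
    fn fill_bytes(&mut self, buf: &mut [u8]) {
        for chunk in buf.chunks_mut(8) {
            let word = self.next_u64().to_le_bytes();
            let n = chunk.len();
            chunk.copy_from_slice(&word[..n]);
        }
    }
}

/// Stand-in seed source that is unavailable.
fn failing_seed(_seed: &mut [u8]) -> Result<(), SeedError> {
    Err(SeedError)
}

/// Stand-in seed source with a fixed value (for demonstration only).
fn fixed_seed(seed: &mut [u8]) -> Result<(), SeedError> {
    seed.copy_from_slice(&[1, 2, 3, 4, 5, 6, 7, 8]);
    Ok(())
}
```

Keeping this type separate from the system source, as suggested above, means callers who need every failure reported can still use the fallible source directly.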

@tarcieri

tarcieri commented Jul 3, 2026

Contributor

on embedded targets hardware sources may fail at any moment and it's not always desirable to pull a whole CSPRNG.

I would generally caution against directly using the output of embedded hardware (T)RNGs. Many of them generate biased outputs even during normal operation, and that can be further exploited by things like physical attacks. "Whitening" the output of such RNGs by seeding a CSPRNG is standard practice.

@orlp

orlp commented Jul 3, 2026

Contributor

What about a compromise? We can add both a fallible overridable cryptographically secure entropy device, and the convenient infallible Rng trait, with a blanket impl of the latter for the former.

That is, we add the following to std::random:

/// When read from, returns uniformly random bytes.
///
/// # Safety
/// The random bytes read must be of cryptographic strength.
unsafe trait EntropySource : std::io::Read {}

/// Implements EntropySource, always the implementation from stdlib,
/// may unconditionally return ErrorKind::Unsupported.
pub struct SystemEntropy;

/// Implements EntropySource, defaults to SystemEntropy,
/// overridable with `#[global_entropy_source]`.
pub struct GlobalEntropy;

// Unchanged.
pub trait Rng {
    fn fill_bytes(&mut self, bytes: &mut [u8]);
}

// Can pass entropy source where Rng is expected.
impl<T: EntropySource> Rng for T {
    fn fill_bytes(&mut self, bytes: &mut [u8]) {
        self.read_exact(bytes).unwrap();
    }
}

@Kobzol

Kobzol commented Jul 4, 2026

Member

Is convenience (i.e. not returning Result) an important factor for this specific API? I don't expect that most users would be using this low-level interface manually in their programs; I think that people usually just use a higher-level interface to generate randomness. rand has 10x more dependent crates than getrandom.

@dhardy

dhardy commented Jul 4, 2026

Contributor

@orlp that's somewhat similar to the rand_core traits in design, though with a fixed error type (std::io::Error; #154046 tracks moving this type to core) and using unsafe (inappropriate and not particularly useful IMO).
Also, this is supposedly a stabilisation PR, not a development one.

std could easily provide a ThreadRng-like source, with potentially fallible initialization, but with infallible implementation of the RNG trait, but IMO it should be separate from SystemRng.

Agreed (though I'm unsure thread-locality is a sufficiently useful property in this case).

As a maintainer of rand_core and getrandom I hope that std/core eventually will supplant them completely for all use-cases. Until then, I believe they could play the role of a good testing ground for developing RNG interfaces instead of rushing their stabilization in std. It also would make migration of the ecosystem from rand_core/getrandom to std/core easier.

In the near future I'd like to see std provide a random source, leaving getrandom to fill in where std is insufficient (no-std, maybe some use-cases wanting to fill an un-init buffer or use getrandom::u64).

In the longer term I'd be happy to see rand_core traits move into std (or core), but as mentioned there are some changes we might want to make to these traits depending on unfinished Rust features.

An important question here is whether we need rand_core::TryRng. It is used pretty much exclusively by getrandom::SysRng (ignoring impls with Error = Infallible, i.e. Rng). This isn't to say we don't need error handling, but that uses which do need error handling may not need to go through this trait.

[From @BurntSushi] Fair enough here. I re-read through the comments using this lens and I agree this seems like a not-great outcome. Particularly around not being able to rely on fill_bytes being a secure source of bytes. I think that's worse than the situation that @hanna-kruppe brought up regarding dependency injection. Namely, if we start with a fill_bytes free function, then our future progression seems complicated. If we reject EII, then we're back to the trait design. And if we do that, then we have a somewhat confusing story around what folks should be using. If we accept EII, then we run the risk of folks not being able to rely on the global resource not always providing the right guarantees.

Is it possible to require a cfg be set when using EII to provide an entropy source, as used by getrandom for custom backends? This makes it hard to overlook usage of a custom entropy source when building an application (some wasm-unknown users have complained about this being too tedious, but otherwise it appears to work for getrandom and rand).

As @tarcieri pointed out, some non-OS entropy sources are biased and should only be used to seed a user-space CSPRNG; I believe this should be the responsibility of the external EII impl since in other cases a user-space CSPRNG is not wanted.

At this point the design in my mind looks fairly similar to what is proposed for stabilisation here, with an additional method:

// Free function, supporting EII
// Must be unbiased on success.
// Uses core::io::Error type?
pub fn try_fill_bytes(bytes: &mut [u8]) -> Result<(), Error>;

// Trait primarily for PRNGs
pub trait Rng {
    // Required method
    fn fill_bytes(&mut self, bytes: &mut [u8]);

    // Also next_u32(), next_u64() ?
}

// Infallible system RNG; try_fill_bytes may be used instead for a fallible API
pub struct SystemRng;
impl Rng for SystemRng { /* .. */ }

// Other PRNGs implementing Rng may be added in the future

There are compromises here (e.g. no getrandom::u64, no support for fill-uninit-buffer).

@bstrie

bstrie commented Jul 4, 2026

Contributor

I would like to suggest that notions of cryptographic suitability should be a separate discussion for a separate API. IMO what we're asking for here is I/O access to the system's entropy source. This is why I've been suggesting an API namespaced under std::io rather than std::random; for an MVP I'd even be content with a collection of platform-specific APIs under std::os, even if that would be a step backward in usability from what the getrandom crate provides. The reason why I classify this omission as egregious is that the stdlib already includes all the code necessary to make use of these APIs (for HashMap), and yet it doesn't expose this functionality to user code.

However, if there are concerns that making a general-purpose high-level API is blocked on the question of cryptographic use cases (e.g. the Rng trait), then I'd be happy to completely punt on that high-level API for now. I agree with Kobzol that we should expect this to be a low-level I/O API that most users will not interact with directly, and the users that do interact with it will probably use it only once to seed a userspace PRNG (be it a CSPRNG or otherwise).


Labels

disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. I-libs-api-nominated Nominated for discussion during a libs-api team meeting. proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. S-waiting-on-t-libs-api Status: Awaiting decision from T-libs-api T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
