Bugfix/Fix potential use after free in every multipassd request handler #4818
theartful wants to merge 1 commit into canonical:main
Conversation
Interesting catch @theartful ! I will attempt to reproduce, although the cause may be something else. What you point out would require that the entire gRPC stack unwinds and gets back to the client before the daemon stack unwinds a single level, at least as far as I understand the code. The gRPC channel is held in include/multipass/cli/command.h:87, but it does not look entirely impossible. I will get back to you with my findings.
@tobe2098 yeah it's crazy, but the coredump indicates this. Thread 2 is the Qt thread, and you can see it's trying to remove the logger, but can't because it's already in use.
Btw, I couldn't reproduce this using GCC, so try building with Clang and leave it running for a while.
@tobe2098 Actually, this can also happen normally, because gRPC doesn't use a single thread (it has a thread pool, AFAIK). So it can also happen when we're processing multiple requests simultaneously.
Interesting. You are correct, there is a race condition. The only time the logger is called from DaemonRpc is in handle_socket_restrictions and its constructor. Outside of the constructor, handle_socket_restrictions is only called in accept_cert, on incoming requests from the gRPC client. That means the race condition is triggered by an incoming request arriving while the logger/channel is in an invalid state. According to the stack trace you posted on the issue, it seems the logging call came at the exact moment the stack was unwinding from a previous request. That would cause a call to the ClientLogger in the stack which is being de-allocated. What do you think? The race condition is a great catch! It is actually quite rare, but could reasonably happen under certain conditions. I am not confident we can reproduce it with a test, though. I have been trying, and nothing so far.
I think a simple solution is not to use the logger at all in DaemonRpc, although it will still leave a bad taste that the logger in Daemon will be holding an invalid reference to the server. We also have to make sure that we don't use the logger in Daemon after setting the value of the promise. |
Pull request overview
This PR addresses a potential use-after-free in multipassd gRPC request handlers by ensuring the per-request ClientLogger is unregistered before the handler signals completion via the std::promise<grpc::Status> (which can unblock the gRPC thread and lead to stream destruction while the logger is still registered).
Changes:
- Introduces PromiseLoggerGuard to couple per-request ClientLogger lifetime to promise completion (reset logger, then set promise).
- Replaces many direct status_promise->set_value(...) calls with promise_logger_guard.set_value(...) in synchronous request handlers.
- Updates a few call sites to use the guard for explicit logger.log(...) calls (e.g. wait_ready) and refactors create/launch to rely on create_vm for the guarded logger.
The server is held in a unique_ptr on the client side. For the daemon to hold an invalid reference, it would require that after the daemon sets the promise status, the thread is unable to unwind until the gRPC thread pool and client have completed the entire request. I understand the concern, and the fix you proposed would indeed address this. Although it is unrelated, we can include your fix in the PR, since the added safety of making sure the logger is de-registered is worthwhile when applicable. On DaemonRpc not calling the logger at all, that is not a good solution either; those logs are informative. Ideally logging would be somewhat thread-local to avoid these issues.
Yes, although your solution would cover that. As far as I am aware, setting the promise is only done at the end of the functions, but certainty is best.
@tobe2098 Is multipass supposed to work with multiple clients?
(force-pushed 73f026d to c8907b0)
This is a behavior we are aware of, and we do not consider it to be an issue. The logs are informative for both clients (both know another request is going on in the background), and it is otherwise unobtrusive. On the other hand, although using the GUI and the CLI, or two CLIs, is possible, Multipass was never engineered to serve multiple clients simultaneously; it is something we are actively working on improving.
@tobe2098 I managed to reproduce on gcc (debug build 1a052d5) as well, within 2 minutes of running this command, except this time we didn't reach the destructor (it happened while in set_value). Edit: managed to reproduce in a gcc release build as well.
I think a cleaner solution to this issue would be to create the logger in DaemonRpc before emitting the signal, and remove client logger creation from the Qt side.
Great work @theartful , that shows the problem may be more difficult to diagnose. We are planning to do more analysis on this issue before we proceed. Moving the logger is a good solution in the abstract, but it would require a serious rewiring of the current architecture, which is also in our sights.
Description
This was such a fun issue to debug! This might not be a proper fix, but it just serves as a proof of concept for a possible fix of the issue.
So essentially, every gRPC request handler has the potential for a use-after-free.
In DaemonRpc we have:

which emits a signal based on the request, which the Daemon class handles. The arguments of these signals contain a pointer to the request and a pointer to the output stream.

Once the signal is emitted, the DaemonRpc object waits for the response by passing a promise to that signal, which gets handled by a Daemon object.

The issue here is that Daemon and DaemonRpc live in different threads (Daemon lives in a Qt event loop thread, while DaemonRpc lives in a gRPC event loop thread). And once Daemon sets the value for the promise, the thread can yield, move to DaemonRpc, and finish the request, deleting all related objects, specifically the server stream.

Now, the server stream is used by the loggers created by Daemon, and in the time between setting the promise and calling the destructor of the logger, another request can be processed, making a log from DaemonRpc and using the logger of the finished request from Daemon, which references the deleted server stream.

Related Issue(s)
Closes #4806
Testing
I was able to reproduce the issue by running:
for a couple of minutes on a clang build of multipass. With this change applied, I can no longer reproduce it after leaving the same command running for an hour.
Additional Notes
I managed to get a coredump of the issue using a clang build with ASan. The coredump shows two threads.
Interesting part of the stack trace of thread 1:
Interesting part of the stack trace of thread 2:
Checklist