blk-flush: fix possible deadlock when process nvme_timeout()#944

Open
blktests-ci[bot] wants to merge 1 commit into linus-master_base from series/1107792=>linus-master

blktests-ci Bot commented Jun 8, 2026

Pull request for series with
subject: blk-flush: fix possible deadlock when process nvme_timeout()
version: 1
url: https://patchwork.kernel.org/project/linux-block/list/?series=1107792

blktests-ci Bot commented Jun 8, 2026

Upstream branch: 979c294
series: https://patchwork.kernel.org/project/linux-block/list/?series=1107792
version: 1

blktests-ci Bot commented Jun 10, 2026

Upstream branch: acb7500
series: https://patchwork.kernel.org/project/linux-block/list/?series=1107792
version: 1

blktests-ci Bot force-pushed the series/1107792=>linus-master branch from 1837d92 to 344bf73 on June 10, 2026 13:49
blktests-ci Bot force-pushed the linus-master_base branch from 5e41a3b to c3a084b on June 10, 2026 20:26

blktests-ci Bot commented Jun 11, 2026

Upstream branch: 9716c08
series: https://patchwork.kernel.org/project/linux-block/list/?series=1107792
version: 1

blktests-ci Bot force-pushed the series/1107792=>linus-master branch from 344bf73 to 6a345ca on June 11, 2026 09:22
blktests-ci Bot force-pushed the linus-master_base branch from c3a084b to 5f78e5d on June 12, 2026 22:27

blktests-ci Bot commented Jun 12, 2026

Upstream branch: 2a2974b
series: https://patchwork.kernel.org/project/linux-block/list/?series=1107792
version: 1

blktests-ci Bot force-pushed the series/1107792=>linus-master branch from 6a345ca to d2059a6 on June 12, 2026 22:44
blktests-ci Bot force-pushed the linus-master_base branch from 5f78e5d to e48f9db on June 13, 2026 01:19

blktests-ci Bot commented Jun 13, 2026

Upstream branch: 062871f
series: https://patchwork.kernel.org/project/linux-block/list/?series=1107792
version: 1

blktests-ci Bot force-pushed the series/1107792=>linus-master branch from d2059a6 to 065e66f on June 13, 2026 01:41
blktests-ci Bot force-pushed the linus-master_base branch 2 times, most recently from 199644a to e6d9eb8 on June 17, 2026 12:02

blktests-ci Bot commented Jun 17, 2026

Upstream branch: 66affa3
series: https://patchwork.kernel.org/project/linux-block/list/?series=1107792
version: 1

blktests-ci Bot force-pushed the series/1107792=>linus-master branch from 065e66f to d487453 on June 17, 2026 13:41
blktests-ci Bot force-pushed the linus-master_base branch from e6d9eb8 to 7d8604f on June 24, 2026 01:11

blktests-ci Bot commented Jun 24, 2026

Upstream branch: bade58e
series: https://patchwork.kernel.org/project/linux-block/list/?series=1107792
version: 1

blktests-ci Bot force-pushed the series/1107792=>linus-master branch from d487453 to 45c4cfe on June 24, 2026 01:39
blktests-ci Bot force-pushed the linus-master_base branch from 7d8604f to 4cc45a3 on June 26, 2026 08:14

blktests-ci Bot commented Jun 26, 2026

Upstream branch: 4edcdef
series: https://patchwork.kernel.org/project/linux-block/list/?series=1107792
version: 1

blktests-ci Bot force-pushed the series/1107792=>linus-master branch from 45c4cfe to b663d19 on June 26, 2026 08:59
blktests-ci Bot force-pushed the linus-master_base branch from 4cc45a3 to 90ffd56 on June 29, 2026 17:14

blktests-ci Bot commented Jun 29, 2026

Upstream branch: dc59e4f
series: https://patchwork.kernel.org/project/linux-block/list/?series=1107792
version: 1

 The following hang occurs while processing nvme_timeout():
 [  206.734601][ T8184] nvme nvme0: I/O tag 512 (1200) opcode 0x0 (I/O Cmd) QID 3 timeout, aborting req_op:FLUSH(2) size:0
 [  206.736112][    C0] nvme nvme0: Abort status: 0x0
 [  208.094637][ T8184] nvme nvme0: I/O tag 512 (1200) opcode 0x0 (I/O Cmd) QID 3 timeout, reset controller

 [root@localhost ~]# cat /proc/8184/stack
 [<0>] msleep+0x37/0x50
 [<0>] blk_mq_tagset_wait_completed_request+0x6f/0xe0
 [<0>] nvme_cancel_tagset+0x79/0xa0
 [<0>] nvme_dev_disable+0x55c/0x7e0
 [<0>] nvme_timeout+0x25b/0x1530
 [<0>] blk_mq_handle_expired+0x210/0x2c0
 [<0>] bt_iter+0x2bb/0x360
 [<0>] blk_mq_queue_tag_busy_iter+0x9f8/0x1f30
 [<0>] blk_mq_timeout_work+0x5dc/0x7d0
 [<0>] process_one_work+0xa08/0x1d00
 [<0>] worker_thread+0x698/0xeb0
 [<0>] kthread+0x408/0x540
 [<0>] ret_from_fork+0xa4d/0xdd0
 [<0>] ret_from_fork_asm+0x1a/0x30

 The above issue may happen as follows:
 nvme_timeout  // first timeout of the flush request with tag 512
   iod->flags |= IOD_ABORTED;
   abort_req = nvme_alloc_request(dev->ctrl.admin_q, &cmd,
          BLK_MQ_REQ_NOWAIT, NVME_QID_ANY);  // Abort tag 512 flush request
   blk_execute_rq_nowait(abort_req->q, NULL, abort_req, 0, abort_endio);
      // Returns without waiting for the abort request to complete
         ....
  **** 'abort_req' has not completed ****
         ....
 nvme_timeout  // second timeout of the flush request with tag 512
  if (!nvmeq->qid || (iod->flags & IOD_ABORTED))
    nvme_req(req)->flags |= NVME_REQ_CANCELLED;
    goto disable;
      ...
    **** the flush request with tag 512 completes ****
         nvme_try_complete_req
          blk_mq_complete_request_remote(req);
           WRITE_ONCE(rq->state, MQ_RQ_COMPLETE);
            ...
             nvme_end_req(req);
              blk_mq_end_request(req, status);
               __blk_mq_end_request(rq, error);
                if (rq->end_io)
                 rq->end_io(rq, error);
                  flush_end_io(rq, error);
                  // The timeout handler still holds a reference,
                  // so the request stays in the MQ_RQ_COMPLETE state
                   if (!refcount_dec_and_test(&flush_rq->ref))
                    fq->rq_status = error;
                    return;
    **** the flush request with tag 512 is in the MQ_RQ_COMPLETE state ****
 disable:
   nvme_dev_disable(dev, false);
     nvme_cancel_tagset(&dev->ctrl);
       blk_mq_tagset_busy_iter(&dev->tagset, nvme_cancel_request,
                               &dev->ctrl);
         nvme_cancel_request
           if (blk_mq_request_completed(req))
             return true;
      blk_mq_tagset_wait_completed_request(&dev->tagset);
        while (true)
          blk_mq_tagset_busy_iter(tagset,
                           blk_mq_tagset_count_completed_rqs, &count);
             blk_mq_tagset_count_completed_rqs();
             // the request is in the MQ_RQ_COMPLETE state
                if (blk_mq_request_completed(rq))   // return true
                  (*count)++;
          if (!count) // 'count' is never 0, so the loop never ends
              break;
          msleep(5);
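
The wait loop at the end of the flow above can be modeled in user space. This is a minimal sketch, not kernel code: `model_rq`, `count_completed()`, `wait_completed()` and the `max_polls` bound are hypothetical names introduced here, and the bound exists only so that the model terminates, whereas the kernel loop is unbounded and sleeps between polls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the blk-mq request states (assumption). */
enum rq_state { RQ_IDLE, RQ_IN_FLIGHT, RQ_COMPLETE };

struct model_rq {
    enum rq_state state;
};

/* Counts requests in the COMPLETE state, in the spirit of
 * blk_mq_tagset_count_completed_rqs() in the flow above. */
static unsigned int count_completed(const struct model_rq *rqs, size_t n)
{
    unsigned int count = 0;

    for (size_t i = 0; i < n; i++)
        if (rqs[i].state == RQ_COMPLETE)
            count++;
    return count;
}

/* Polls until no request is COMPLETE. Gives up after max_polls
 * (hypothetical bound) instead of looping forever.
 * Returns true if the wait finished, false if it would keep spinning. */
static bool wait_completed(const struct model_rq *rqs, size_t n,
                           unsigned int max_polls)
{
    assert(max_polls > 0);
    for (unsigned int i = 0; i < max_polls; i++)
        if (count_completed(rqs, n) == 0)
            return true;
    return false;
}
```

With one request left in `RQ_COMPLETE` and nothing else changing its state, `wait_completed()` returns false for any bound, which corresponds to the endless `msleep(5)` loop seen in the stack trace.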
The problem occurs because the timeout handler holds a reference to the
request, and a flush request whose reference count has not dropped to
zero stays in the MQ_RQ_COMPLETE state. As a result,
blk_mq_tagset_wait_completed_request(), called from nvme_dev_disable(),
loops forever.

To fix this, when a flush request times out and the timeout handler holds
the only remaining reference, change the request state to MQ_RQ_IDLE in
advance. It is then safe to call blk_mq_tagset_wait_completed_request()
during timeout processing.
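
The idea of the fix can be sketched with a small user-space model of the early-return path in flush_end_io(). This is an illustration under stated assumptions, not the patch itself: `model_flush_rq`, `model_flush_end_io()` and the `apply_fix` switch are hypothetical, and treating "one reference left" as "only the timeout handler holds a reference" is an assumption about how the condition might be detected.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the blk-mq request states (assumption). */
enum flush_state { FRQ_IDLE, FRQ_IN_FLIGHT, FRQ_COMPLETE };

struct model_flush_rq {
    enum flush_state state;
    int ref;        /* stand-in for flush_rq->ref */
    int rq_status;  /* stand-in for fq->rq_status */
};

/* Models the end_io path for a flush request that has already been
 * marked COMPLETE. Returns true when the last reference was dropped,
 * false when another holder still owns a reference. */
static bool model_flush_end_io(struct model_flush_rq *rq, int error,
                               bool apply_fix)
{
    assert(rq->ref > 0);
    if (--rq->ref != 0) {
        rq->rq_status = error;
        /* Assumption: a single remaining reference belongs to the
         * timeout handler, so the state is moved to IDLE early. */
        if (apply_fix && rq->ref == 1)
            rq->state = FRQ_IDLE;
        return false;
    }
    rq->state = FRQ_IDLE;
    return true;
}
```

Without the fix the modeled request stays in `FRQ_COMPLETE` after the early return; with it the request is already `FRQ_IDLE`, so a loop that counts completed requests would see zero and stop.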

Fixes: e1569a1 ("nvme: do not restart the request timeout if we're resetting the controller")
Signed-off-by: Ye Bin <yebin10@huawei.com>
blktests-ci Bot force-pushed the series/1107792=>linus-master branch from b663d19 to 7bbba1e on June 29, 2026 18:02