block: avoid potential deadlock on zone revalidation failure #1000
Open
blktests-ci[bot] wants to merge 1 commit into
Author
|
Upstream branch: bade58e |
7d8604f to
4cc45a3
Compare
Author
|
Upstream branch: 4edcdef |
a484937 to
bcff6e6
Compare
4cc45a3 to
90ffd56
Compare
If revalidating the zones of a zoned block device with
blk_revalidate_disk_zones() fails during a SCSI disk rescan, the following
lockdep splat is thrown:
[ 347.251859] [ T11230] sda: failed to revalidate zones
[ 347.261380] [ T11230] ======================================================
[ 347.263882] [ T11230] WARNING: possible circular locking dependency detected
[ 347.266353] [ T11230] 7.1.0+ #1194 Not tainted
[ 347.268052] [ T11230] ------------------------------------------------------
[ 347.270537] [ T11230] tcsh/11230 is trying to acquire lock:
[ 347.272555] [ T11230] ffffffff8f91d400 (wq_pool_mutex){+.+.}-{4:4}, at: destroy_workqueue+0x15d/0x8d0
[ 347.275914] [ T11230]
but task is already holding lock:
[ 347.278646] [ T11230] ffff88812fa1bcc0 (&q->q_usage_counter(io)#5){++++}-{0:0}, at: blk_mq_freeze_queue_nomemsave+0x16/0x30
[ 347.282503] [ T11230]
which lock already depends on the new lock.
[ 347.286239] [ T11230]
the existing dependency chain (in reverse order) is:
[ 347.289408] [ T11230]
-> #2 (&q->q_usage_counter(io)#5){++++}-{0:0}:
[ 347.292437] [ T11230] blk_alloc_queue+0x5ca/0x750
[ 347.294379] [ T11230] blk_mq_alloc_queue+0x14c/0x240
[ 347.296375] [ T11230] scsi_alloc_sdev+0x871/0xd10 [scsi_mod]
[ 347.298619] [ T11230] scsi_probe_and_add_lun+0x600/0xc50 [scsi_mod]
[ 347.301056] [ T11230] __scsi_scan_target+0x187/0x3b0 [scsi_mod]
[ 347.303385] [ T11230] scsi_scan_channel+0xf2/0x180 [scsi_mod]
[ 347.305651] [ T11230] scsi_scan_host_selected+0x20b/0x2d0 [scsi_mod]
[ 347.308119] [ T11230] do_scan_async+0x42/0x420 [scsi_mod]
[ 347.310276] [ T11230] async_run_entry_fn+0x94/0x5a0
[ 347.312284] [ T11230] process_one_work+0x8da/0x1690
[ 347.314287] [ T11230] worker_thread+0x5fe/0x1010
[ 347.316216] [ T11230] kthread+0x358/0x450
[ 347.317675] [ T11230] ret_from_fork+0x5b9/0x8e0
[ 347.319181] [ T11230] ret_from_fork_asm+0x11/0x20
[ 347.320778] [ T11230]
-> #1 (fs_reclaim){+.+.}-{0:0}:
[ 347.322890] [ T11230] fs_reclaim_acquire+0xd5/0x120
[ 347.324464] [ T11230] __kmalloc_cache_node_noprof+0x39/0x620
[ 347.326223] [ T11230] init_rescuer+0x19b/0x560
[ 347.327697] [ T11230] workqueue_init+0x33b/0x6a0
[ 347.329224] [ T11230] kernel_init_freeable+0x2eb/0x600
[ 347.330881] [ T11230] kernel_init+0x1c/0x140
[ 347.332334] [ T11230] ret_from_fork+0x5b9/0x8e0
[ 347.333847] [ T11230] ret_from_fork_asm+0x11/0x20
[ 347.335360] [ T11230]
-> #0 (wq_pool_mutex){+.+.}-{4:4}:
[ 347.337510] [ T11230] __lock_acquire+0xdea/0x2260
[ 347.339030] [ T11230] lock_acquire+0x187/0x2f0
[ 347.340495] [ T11230] __mutex_lock+0x1ab/0x2600
[ 347.341464] [ T11230] destroy_workqueue+0x15d/0x8d0
[ 347.342485] [ T11230] disk_free_zone_resources+0xd5/0x560
[ 347.343577] [ T11230] blk_revalidate_disk_zones+0x620/0xac7
[ 347.344723] [ T11230] sd_zbc_revalidate_zones+0x1dd/0x790 [sd_mod]
[ 347.345938] [ T11230] sd_revalidate_disk+0xc66/0x8e60 [sd_mod]
[ 347.347112] [ T11230] scsi_rescan_device+0x1f9/0x310 [scsi_mod]
[ 347.348318] [ T11230] store_rescan_field+0x19/0x20 [scsi_mod]
[ 347.349507] [ T11230] kernfs_fop_write_iter+0x3d2/0x5e0
[ 347.350565] [ T11230] vfs_write+0x469/0x1000
[ 347.351484] [ T11230] ksys_write+0x116/0x250
[ 347.352403] [ T11230] do_syscall_64+0xf0/0x6e0
[ 347.353361] [ T11230] entry_SYSCALL_64_after_hwframe+0x4b/0x53
[ 347.354533] [ T11230]
other info that might help us debug this:
[ 347.356432] [ T11230] Chain exists of:
wq_pool_mutex --> fs_reclaim --> &q->q_usage_counter(io)#5
[ 347.358919] [ T11230] Possible unsafe locking scenario:
[ 347.360307] [ T11230] CPU0 CPU1
[ 347.361327] [ T11230] ---- ----
[ 347.362340] [ T11230] lock(&q->q_usage_counter(io)#5);
[ 347.363344] [ T11230] lock(fs_reclaim);
[ 347.364526] [ T11230] lock(&q->q_usage_counter(io)#5);
[ 347.365968] [ T11230] lock(wq_pool_mutex);
[ 347.366811] [ T11230]
*** DEADLOCK ***
This happens because SCSI disk rescan is executed from a work context
and a failure of blk_revalidate_disk_zones() causes a call to
disk_free_zone_resources() which will free the disk zone write plug
workqueue.
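The chain reported in the splat can be replayed with a toy dependency graph. This is only a sketch of the reasoning, not lockdep's implementation: the enum values, `dep` matrix, and helper names below are hypothetical, and the three lock classes are reduced to plain graph nodes. An edge "held -> wanted" is rejected when the held class is already reachable from the wanted one, which is exactly what happens when destroy_workqueue() takes wq_pool_mutex while the queue is frozen.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock classes named after the three classes in the splat. */
enum lock_class { WQ_POOL_MUTEX, FS_RECLAIM, Q_USAGE_COUNTER, NR_CLASSES };

/* dep[a][b] is true when class b was taken while class a was held. */
static bool dep[NR_CLASSES][NR_CLASSES];

/* Depth-first search: is 'to' reachable from 'from' along recorded edges? */
static bool reachable(enum lock_class from, enum lock_class to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_CLASSES; next++)
		if (dep[from][next] && !seen[next] &&
		    reachable((enum lock_class)next, to, seen))
			return true;
	return false;
}

/*
 * Record "held -> wanted". Returns false, and records nothing, if the new
 * edge would close a cycle (that is, 'held' is reachable from 'wanted').
 */
static bool add_dependency(enum lock_class held, enum lock_class wanted)
{
	bool seen[NR_CLASSES] = { false };

	assert(held < NR_CLASSES && wanted < NR_CLASSES);
	if (reachable(wanted, held, seen))
		return false;
	dep[held][wanted] = true;
	return true;
}

/* Replay the dependencies from the splat, oldest first. */
static bool splat_closes_cycle(void)
{
	bool ok1 = add_dependency(WQ_POOL_MUTEX, FS_RECLAIM);      /* chain #1 */
	bool ok2 = add_dependency(FS_RECLAIM, Q_USAGE_COUNTER);    /* chain #2 */
	bool ok3 = add_dependency(Q_USAGE_COUNTER, WQ_POOL_MUTEX); /* chain #0 */

	return ok1 && ok2 && !ok3;
}
```

In this model the first two edges are accepted and the third is refused, mirroring the "wq_pool_mutex --> fs_reclaim --> &q->q_usage_counter(io)" chain that lockdep prints.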
Avoid this by delaying the destruction of the disk zone write plug
workqueue to disk_release(). Do this by introducing the function
disk_release_zone_resources() and calling it from disk_release(). This
new function calls disk_free_zone_resources() and then destroys the zone
write plug workqueue, which allows removing the call to
destroy_workqueue() from disk_free_zone_resources().
disk_alloc_zone_resources() is modified to not create the disk zone
write plug workqueue if it already exists.
Fixes: a8f59e5 ("block: use a per disk workqueue for zone write plugging")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
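The lifecycle change described in the commit message can be sketched in userspace as follows. This is a hedged model, not the patch itself: `struct mock_disk`, `struct mock_workqueue`, the field names, and the `mock_*` functions are hypothetical stand-ins for the gendisk, its zone write plug workqueue, and the three kernel functions named above, with `calloc()`/`free()` replacing the real allocation and destruction calls.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; these are not the kernel's types. */
struct mock_workqueue { int unused; };

struct mock_disk {
	struct mock_workqueue *zone_wplugs_wq; /* kept across revalidation failures */
	unsigned int *zone_table;              /* stand-in for per-zone state */
	unsigned int nr_zones;
};

/* Allocate zone resources; create the workqueue only if it does not exist. */
static int mock_alloc_zone_resources(struct mock_disk *disk, unsigned int nr_zones)
{
	assert(!disk->zone_table); /* caller must free before re-allocating */

	disk->zone_table = calloc(nr_zones, sizeof(*disk->zone_table));
	if (!disk->zone_table)
		return -1;
	disk->nr_zones = nr_zones;

	if (!disk->zone_wplugs_wq) {
		disk->zone_wplugs_wq = calloc(1, sizeof(*disk->zone_wplugs_wq));
		if (!disk->zone_wplugs_wq) {
			free(disk->zone_table);
			disk->zone_table = NULL;
			disk->nr_zones = 0;
			return -1;
		}
	}
	return 0;
}

/* Error path: may run with the queue frozen, so it leaves the workqueue alone. */
static void mock_free_zone_resources(struct mock_disk *disk)
{
	free(disk->zone_table);
	disk->zone_table = NULL;
	disk->nr_zones = 0;
}

/* Final release: the only place where the workqueue is destroyed. */
static void mock_release_zone_resources(struct mock_disk *disk)
{
	mock_free_zone_resources(disk);
	free(disk->zone_wplugs_wq);
	disk->zone_wplugs_wq = NULL;
}
```

With this split, a failed revalidation followed by a successful one reuses the same workqueue object, and the destruction happens once, outside any context that holds the queue frozen.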
Author
|
Upstream branch: dc59e4f |
bcff6e6 to
eac4c67
Compare
Pull request for series with
subject: block: avoid potential deadlock on zone revalidation failure
version: 1
url: https://patchwork.kernel.org/project/linux-block/list/?series=1116289