| CVE |
Vendors |
Products |
Updated |
CVSS v3.1 |
| In the Linux kernel, the following vulnerability has been resolved:
PM / devfreq: Synchronize devfreq_monitor_[start/stop]
There is a chance if a frequent switch of the governor
done in a loop result in timer list corruption where
timer cancel being done from two place one from
cancel_delayed_work_sync() and followed by expire_timers()
can be seen from the traces[1].
while true
do
echo "simple_ondemand" > /sys/class/devfreq/1d84000.ufshc/governor
echo "performance" > /sys/class/devfreq/1d84000.ufshc/governor
done
It looks to be issue with devfreq driver where
device_monitor_[start/stop] need to synchronized so that
delayed work should get corrupted while it is either
being queued or running or being cancelled.
Let's use polling flag and devfreq lock to synchronize the
queueing the timer instance twice and work data being
corrupted.
[1]
...
..
<idle>-0 [003] 9436.209662: timer_cancel timer=0xffffff80444f0428
<idle>-0 [003] 9436.209664: timer_expire_entry timer=0xffffff80444f0428 now=0x10022da1c function=__typeid__ZTSFvP10timer_listE_global_addr baseclk=0x10022da1c
<idle>-0 [003] 9436.209718: timer_expire_exit timer=0xffffff80444f0428
kworker/u16:6-14217 [003] 9436.209863: timer_start timer=0xffffff80444f0428 function=__typeid__ZTSFvP10timer_listE_global_addr expires=0x10022da2b now=0x10022da1c flags=182452227
vendor.xxxyyy.ha-1593 [004] 9436.209888: timer_cancel timer=0xffffff80444f0428
vendor.xxxyyy.ha-1593 [004] 9436.216390: timer_init timer=0xffffff80444f0428
vendor.xxxyyy.ha-1593 [004] 9436.216392: timer_start timer=0xffffff80444f0428 function=__typeid__ZTSFvP10timer_listE_global_addr expires=0x10022da2c now=0x10022da1d flags=186646532
vendor.xxxyyy.ha-1593 [005] 9436.220992: timer_cancel timer=0xffffff80444f0428
xxxyyyTraceManag-7795 [004] 9436.261641: timer_cancel timer=0xffffff80444f0428
[2]
9436.261653][ C4] Unable to handle kernel paging request at virtual address dead00000000012a
[ 9436.261664][ C4] Mem abort info:
[ 9436.261666][ C4] ESR = 0x96000044
[ 9436.261669][ C4] EC = 0x25: DABT (current EL), IL = 32 bits
[ 9436.261671][ C4] SET = 0, FnV = 0
[ 9436.261673][ C4] EA = 0, S1PTW = 0
[ 9436.261675][ C4] Data abort info:
[ 9436.261677][ C4] ISV = 0, ISS = 0x00000044
[ 9436.261680][ C4] CM = 0, WnR = 1
[ 9436.261682][ C4] [dead00000000012a] address between user and kernel address ranges
[ 9436.261685][ C4] Internal error: Oops: 96000044 [#1] PREEMPT SMP
[ 9436.261701][ C4] Skip md ftrace buffer dump for: 0x3a982d0
...
[ 9436.262138][ C4] CPU: 4 PID: 7795 Comm: TraceManag Tainted: G S W O 5.10.149-android12-9-o-g17f915d29d0c #1
[ 9436.262141][ C4] Hardware name: Qualcomm Technologies, Inc. (DT)
[ 9436.262144][ C4] pstate: 22400085 (nzCv daIf +PAN -UAO +TCO BTYPE=--)
[ 9436.262161][ C4] pc : expire_timers+0x9c/0x438
[ 9436.262164][ C4] lr : expire_timers+0x2a4/0x438
[ 9436.262168][ C4] sp : ffffffc010023dd0
[ 9436.262171][ C4] x29: ffffffc010023df0 x28: ffffffd0636fdc18
[ 9436.262178][ C4] x27: ffffffd063569dd0 x26: ffffffd063536008
[ 9436.262182][ C4] x25: 0000000000000001 x24: ffffff88f7c69280
[ 9436.262185][ C4] x23: 00000000000000e0 x22: dead000000000122
[ 9436.262188][ C4] x21: 000000010022da29 x20: ffffff8af72b4e80
[ 9436.262191][ C4] x19: ffffffc010023e50 x18: ffffffc010025038
[ 9436.262195][ C4] x17: 0000000000000240 x16: 0000000000000201
[ 9436.262199][ C4] x15: ffffffffffffffff x14: ffffff889f3c3100
[ 9436.262203][ C4] x13: ffffff889f3c3100 x12: 00000000049f56b8
[ 9436.262207][ C4] x11: 00000000049f56b8 x10: 00000000ffffffff
[ 9436.262212][ C4] x9 : ffffffc010023e50 x8 : dead000000000122
[ 9436.262216][ C4] x7 : ffffffffffffffff x6 : ffffffc0100239d8
[ 9436.262220][ C4] x5 : 0000000000000000 x4 : 0000000000000101
[ 9436.262223][ C4] x3 : 0000000000000080 x2 : ffffff8
---truncated--- |
| In the Linux kernel, the following vulnerability has been resolved:
um: time-travel: fix time corruption
In 'basic' time-travel mode (without =inf-cpu or =ext), we
still get timer interrupts. These can happen at arbitrary
points in time, i.e. while in timer_read(), which pushes
time forward just a little bit. Then, if we happen to get
the interrupt after calculating the new time to push to,
but before actually finishing that, the interrupt will set
the time to a value that's incompatible with the forward,
and we'll crash because time goes backwards when we do the
forwarding.
Fix this by reading the time_travel_time, calculating the
adjustment, and doing the adjustment all with interrupts
disabled. |
| In the Linux kernel, the following vulnerability has been resolved:
ext4: avoid online resizing failures due to oversized flex bg
When we online resize an ext4 filesystem with a oversized flexbg_size,
mkfs.ext4 -F -G 67108864 $dev -b 4096 100M
mount $dev $dir
resize2fs $dev 16G
the following WARN_ON is triggered:
==================================================================
WARNING: CPU: 0 PID: 427 at mm/page_alloc.c:4402 __alloc_pages+0x411/0x550
Modules linked in: sg(E)
CPU: 0 PID: 427 Comm: resize2fs Tainted: G E 6.6.0-rc5+ #314
RIP: 0010:__alloc_pages+0x411/0x550
Call Trace:
<TASK>
__kmalloc_large_node+0xa2/0x200
__kmalloc+0x16e/0x290
ext4_resize_fs+0x481/0xd80
__ext4_ioctl+0x1616/0x1d90
ext4_ioctl+0x12/0x20
__x64_sys_ioctl+0xf0/0x150
do_syscall_64+0x3b/0x90
==================================================================
This is because flexbg_size is too large and the size of the new_group_data
array to be allocated exceeds MAX_ORDER. Currently, the minimum value of
MAX_ORDER is 8, the minimum value of PAGE_SIZE is 4096, the corresponding
maximum number of groups that can be allocated is:
(PAGE_SIZE << MAX_ORDER) / sizeof(struct ext4_new_group_data) ≈ 21845
And the value that is down-aligned to the power of 2 is 16384. Therefore,
this value is defined as MAX_RESIZE_BG, and the number of groups added
each time does not exceed this value during resizing, and is added multiple
times to complete the online resizing. The difference is that the metadata
in a flex_bg may be more dispersed. |
| In the Linux kernel, the following vulnerability has been resolved:
bpf: Check rcu_read_lock_trace_held() before calling bpf map helpers
These three bpf_map_{lookup,update,delete}_elem() helpers are also
available for sleepable bpf program, so add the corresponding lock
assertion for sleepable bpf program, otherwise the following warning
will be reported when a sleepable bpf program manipulates bpf map under
interpreter mode (aka bpf_jit_enable=0):
WARNING: CPU: 3 PID: 4985 at kernel/bpf/helpers.c:40 ......
CPU: 3 PID: 4985 Comm: test_progs Not tainted 6.6.0+ #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ......
RIP: 0010:bpf_map_lookup_elem+0x54/0x60
......
Call Trace:
<TASK>
? __warn+0xa5/0x240
? bpf_map_lookup_elem+0x54/0x60
? report_bug+0x1ba/0x1f0
? handle_bug+0x40/0x80
? exc_invalid_op+0x18/0x50
? asm_exc_invalid_op+0x1b/0x20
? __pfx_bpf_map_lookup_elem+0x10/0x10
? rcu_lockdep_current_cpu_online+0x65/0xb0
? rcu_is_watching+0x23/0x50
? bpf_map_lookup_elem+0x54/0x60
? __pfx_bpf_map_lookup_elem+0x10/0x10
___bpf_prog_run+0x513/0x3b70
__bpf_prog_run32+0x9d/0xd0
? __bpf_prog_enter_sleepable_recur+0xad/0x120
? __bpf_prog_enter_sleepable_recur+0x3e/0x120
bpf_trampoline_6442580665+0x4d/0x1000
__x64_sys_getpgid+0x5/0x30
? do_syscall_64+0x36/0xb0
entry_SYSCALL_64_after_hwframe+0x6e/0x76
</TASK> |
| In the Linux kernel, the following vulnerability has been resolved:
netfilter: nf_tables: disallow timeout for anonymous sets
Never used from userspace, disallow these parameters. |
| In the Linux kernel, the following vulnerability has been resolved:
pstore/ram: Fix crash when setting number of cpus to an odd number
When the number of cpu cores is adjusted to 7 or other odd numbers,
the zone size will become an odd number.
The address of the zone will become:
addr of zone0 = BASE
addr of zone1 = BASE + zone_size
addr of zone2 = BASE + zone_size*2
...
The address of zone1/3/5/7 will be mapped to non-alignment va.
Eventually crashes will occur when accessing these va.
So, use ALIGN_DOWN() to make sure the zone size is even
to avoid this bug. |
| In the Linux kernel, the following vulnerability has been resolved:
PCI: switchtec: Fix stdev_release() crash after surprise hot remove
A PCI device hot removal may occur while stdev->cdev is held open. The call
to stdev_release() then happens during close or exit, at a point way past
switchtec_pci_remove(). Otherwise the last ref would vanish with the
trailing put_device(), just before return.
At that later point in time, the devm cleanup has already removed the
stdev->mmio_mrpc mapping. Also, the stdev->pdev reference was not a counted
one. Therefore, in DMA mode, the iowrite32() in stdev_release() will cause
a fatal page fault, and the subsequent dma_free_coherent(), if reached,
would pass a stale &stdev->pdev->dev pointer.
Fix by moving MRPC DMA shutdown into switchtec_pci_remove(), after
stdev_kill(). Counting the stdev->pdev ref is now optional, but may prevent
future accidents.
Reproducible via the script at
https://lore.kernel.org/r/20231113212150.96410-1-dns@arista.com |
| In the Linux kernel, the following vulnerability has been resolved:
powerpc/lib: Validate size for vector operations
Some of the fp/vmx code in sstep.c assume a certain maximum size for the
instructions being emulated. The size of those operations however is
determined separately in analyse_instr().
Add a check to validate the assumption on the maximum size of the
operations, so as to prevent any unintended kernel stack corruption. |
| In the Linux kernel, the following vulnerability has been resolved:
FS:JFS:UBSAN:array-index-out-of-bounds in dbAdjTree
Syzkaller reported the following issue:
UBSAN: array-index-out-of-bounds in fs/jfs/jfs_dmap.c:2867:6
index 196694 is out of range for type 's8[1365]' (aka 'signed char[1365]')
CPU: 1 PID: 109 Comm: jfsCommit Not tainted 6.6.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
ubsan_epilogue lib/ubsan.c:217 [inline]
__ubsan_handle_out_of_bounds+0x11c/0x150 lib/ubsan.c:348
dbAdjTree+0x474/0x4f0 fs/jfs/jfs_dmap.c:2867
dbJoin+0x210/0x2d0 fs/jfs/jfs_dmap.c:2834
dbFreeBits+0x4eb/0xda0 fs/jfs/jfs_dmap.c:2331
dbFreeDmap fs/jfs/jfs_dmap.c:2080 [inline]
dbFree+0x343/0x650 fs/jfs/jfs_dmap.c:402
txFreeMap+0x798/0xd50 fs/jfs/jfs_txnmgr.c:2534
txUpdateMap+0x342/0x9e0
txLazyCommit fs/jfs/jfs_txnmgr.c:2664 [inline]
jfs_lazycommit+0x47a/0xb70 fs/jfs/jfs_txnmgr.c:2732
kthread+0x2d3/0x370 kernel/kthread.c:388
ret_from_fork+0x48/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304
</TASK>
================================================================================
Kernel panic - not syncing: UBSAN: panic_on_warn set ...
CPU: 1 PID: 109 Comm: jfsCommit Not tainted 6.6.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
panic+0x30f/0x770 kernel/panic.c:340
check_panic_on_warn+0x82/0xa0 kernel/panic.c:236
ubsan_epilogue lib/ubsan.c:223 [inline]
__ubsan_handle_out_of_bounds+0x13c/0x150 lib/ubsan.c:348
dbAdjTree+0x474/0x4f0 fs/jfs/jfs_dmap.c:2867
dbJoin+0x210/0x2d0 fs/jfs/jfs_dmap.c:2834
dbFreeBits+0x4eb/0xda0 fs/jfs/jfs_dmap.c:2331
dbFreeDmap fs/jfs/jfs_dmap.c:2080 [inline]
dbFree+0x343/0x650 fs/jfs/jfs_dmap.c:402
txFreeMap+0x798/0xd50 fs/jfs/jfs_txnmgr.c:2534
txUpdateMap+0x342/0x9e0
txLazyCommit fs/jfs/jfs_txnmgr.c:2664 [inline]
jfs_lazycommit+0x47a/0xb70 fs/jfs/jfs_txnmgr.c:2732
kthread+0x2d3/0x370 kernel/kthread.c:388
ret_from_fork+0x48/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304
</TASK>
Kernel Offset: disabled
Rebooting in 86400 seconds..
The issue is caused when the value of lp becomes greater than
CTLTREESIZE which is the max size of stree. Adding a simple check
solves this issue.
Dave:
As the function returns a void, good error handling
would require a more intrusive code reorganization, so I modified
Osama's patch at use WARN_ON_ONCE for lack of a cleaner option.
The patch is tested via syzbot. |
| In the Linux kernel, the following vulnerability has been resolved:
UBSAN: array-index-out-of-bounds in dtSplitRoot
Syzkaller reported the following issue:
oop0: detected capacity change from 0 to 32768
UBSAN: array-index-out-of-bounds in fs/jfs/jfs_dtree.c:1971:9
index -2 is out of range for type 'struct dtslot [128]'
CPU: 0 PID: 3613 Comm: syz-executor270 Not tainted 6.0.0-syzkaller-09423-g493ffd6605b2 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1b1/0x28e lib/dump_stack.c:106
ubsan_epilogue lib/ubsan.c:151 [inline]
__ubsan_handle_out_of_bounds+0xdb/0x130 lib/ubsan.c:283
dtSplitRoot+0x8d8/0x1900 fs/jfs/jfs_dtree.c:1971
dtSplitUp fs/jfs/jfs_dtree.c:985 [inline]
dtInsert+0x1189/0x6b80 fs/jfs/jfs_dtree.c:863
jfs_mkdir+0x757/0xb00 fs/jfs/namei.c:270
vfs_mkdir+0x3b3/0x590 fs/namei.c:4013
do_mkdirat+0x279/0x550 fs/namei.c:4038
__do_sys_mkdirat fs/namei.c:4053 [inline]
__se_sys_mkdirat fs/namei.c:4051 [inline]
__x64_sys_mkdirat+0x85/0x90 fs/namei.c:4051
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fcdc0113fd9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffeb8bc67d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000102
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fcdc0113fd9
RDX: 0000000000000000 RSI: 0000000020000340 RDI: 0000000000000003
RBP: 00007fcdc00d37a0 R08: 0000000000000000 R09: 00007fcdc00d37a0
R10: 00005555559a72c0 R11: 0000000000000246 R12: 00000000f8008000
R13: 0000000000000000 R14: 00083878000000f8 R15: 0000000000000000
</TASK>
The issue is caused when the value of fsi becomes less than -1.
The check to break the loop when fsi value becomes -1 is present
but syzbot was able to produce value less than -1 which cause the error.
This patch simply add the change for the values less than 0.
The patch is tested via syzbot. |
| In the Linux kernel, the following vulnerability has been resolved:
jfs: fix slab-out-of-bounds Read in dtSearch
Currently while searching for current page in the sorted entry table
of the page there is a out of bound access. Added a bound check to fix
the error.
Dave:
Set return code to -EIO |
| In the Linux kernel, the following vulnerability has been resolved:
jfs: fix array-index-out-of-bounds in dbAdjTree
Currently there is a bound check missing in the dbAdjTree while
accessing the dmt_stree. To add the required check added the bool is_ctl
which is required to determine the size as suggest in the following
commit.
https://lore.kernel.org/linux-kernel-mentees/f9475918-2186-49b8-b801-6f0f9e75f4fa@oracle.com/ |
| In the Linux kernel, the following vulnerability has been resolved:
jfs: fix uaf in jfs_evict_inode
When the execution of diMount(ipimap) fails, the object ipimap that has been
released may be accessed in diFreeSpecial(). Asynchronous ipimap release occurs
when rcu_core() calls jfs_free_node().
Therefore, when diMount(ipimap) fails, sbi->ipimap should not be initialized as
ipimap. |
| In the Linux kernel, the following vulnerability has been resolved:
jfs: fix array-index-out-of-bounds in diNewExt
[Syz report]
UBSAN: array-index-out-of-bounds in fs/jfs/jfs_imap.c:2360:2
index -878706688 is out of range for type 'struct iagctl[128]'
CPU: 1 PID: 5065 Comm: syz-executor282 Not tainted 6.7.0-rc4-syzkaller-00009-gbee0e7762ad2 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
ubsan_epilogue lib/ubsan.c:217 [inline]
__ubsan_handle_out_of_bounds+0x11c/0x150 lib/ubsan.c:348
diNewExt+0x3cf3/0x4000 fs/jfs/jfs_imap.c:2360
diAllocExt fs/jfs/jfs_imap.c:1949 [inline]
diAllocAG+0xbe8/0x1e50 fs/jfs/jfs_imap.c:1666
diAlloc+0x1d3/0x1760 fs/jfs/jfs_imap.c:1587
ialloc+0x8f/0x900 fs/jfs/jfs_inode.c:56
jfs_mkdir+0x1c5/0xb90 fs/jfs/namei.c:225
vfs_mkdir+0x2f1/0x4b0 fs/namei.c:4106
do_mkdirat+0x264/0x3a0 fs/namei.c:4129
__do_sys_mkdir fs/namei.c:4149 [inline]
__se_sys_mkdir fs/namei.c:4147 [inline]
__x64_sys_mkdir+0x6e/0x80 fs/namei.c:4147
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x45/0x110 arch/x86/entry/common.c:82
entry_SYSCALL_64_after_hwframe+0x63/0x6b
RIP: 0033:0x7fcb7e6a0b57
Code: ff ff 77 07 31 c0 c3 0f 1f 40 00 48 c7 c2 b8 ff ff ff f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 b8 53 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffd83023038 EFLAGS: 00000286 ORIG_RAX: 0000000000000053
RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007fcb7e6a0b57
RDX: 00000000000a1020 RSI: 00000000000001ff RDI: 0000000020000140
RBP: 0000000020000140 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000286 R12: 00007ffd830230d0
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[Analysis]
When the agstart is too large, it can cause agno overflow.
[Fix]
After obtaining agno, if the value is invalid, exit the subsequent process.
Modified the test from agno > MAXAG to agno >= MAXAG based on linux-next
report by kernel test robot (Dan Carpenter). |
| In the Linux kernel, the following vulnerability has been resolved:
s390/ptrace: handle setting of fpc register correctly
If the content of the floating point control (fpc) register of a traced
process is modified with the ptrace interface the new value is tested for
validity by temporarily loading it into the fpc register.
This may lead to corruption of the fpc register of the tracing process:
if an interrupt happens while the value is temporarily loaded into the
fpc register, and within interrupt context floating point or vector
registers are used, the current fp/vx registers are saved with
save_fpu_regs() assuming they belong to user space and will be loaded into
fp/vx registers when returning to user space.
test_fp_ctl() restores the original user space fpc register value, however
it will be discarded, when returning to user space.
In result the tracer will incorrectly continue to run with the value that
was supposed to be used for the traced process.
Fix this by saving fpu register contents with save_fpu_regs() before using
test_fp_ctl(). |
| In the Linux kernel, the following vulnerability has been resolved:
sysctl: Fix out of bounds access for empty sysctl registers
When registering tables to the sysctl subsystem there is a check to see
if header is a permanently empty directory (used for mounts). This check
evaluates the first element of the ctl_table. This results in an out of
bounds evaluation when registering empty directories.
The function register_sysctl_mount_point now passes a ctl_table of size
1 instead of size 0. It now relies solely on the type to identify
a permanently empty register.
Make sure that the ctl_table has at least one element before testing for
permanent emptiness. |
| In the Linux kernel, the following vulnerability has been resolved:
wifi: rt2x00: restart beacon queue when hardware reset
When a hardware reset is triggered, all registers are reset, so all
queues are forced to stop in hardware interface. However, mac80211
will not automatically stop the queue. If we don't manually stop the
beacon queue, the queue will be deadlocked and unable to start again.
This patch fixes the issue where Apple devices cannot connect to the
AP after calling ieee80211_restart_hw(). |
| In the Linux kernel, the following vulnerability has been resolved:
reiserfs: Avoid touching renamed directory if parent does not change
The VFS will not be locking moved directory if its parent does not
change. Change reiserfs rename code to avoid touching renamed directory
if its parent does not change as without locking that can corrupt the
filesystem. |
| In the Linux kernel, the following vulnerability has been resolved:
ocfs2: Avoid touching renamed directory if parent does not change
The VFS will not be locking moved directory if its parent does not
change. Change ocfs2 rename code to avoid touching renamed directory if
its parent does not change as without locking that can corrupt the
filesystem. |
| In the Linux kernel, the following vulnerability has been resolved:
IB/ipoib: Fix mcast list locking
Releasing the `priv->lock` while iterating the `priv->multicast_list` in
`ipoib_mcast_join_task()` opens a window for `ipoib_mcast_dev_flush()` to
remove the items while in the middle of iteration. If the mcast is removed
while the lock was dropped, the for loop spins forever resulting in a hard
lockup (as was reported on RHEL 4.18.0-372.75.1.el8_6 kernel):
Task A (kworker/u72:2 below) | Task B (kworker/u72:0 below)
-----------------------------------+-----------------------------------
ipoib_mcast_join_task(work) | ipoib_ib_dev_flush_light(work)
spin_lock_irq(&priv->lock) | __ipoib_ib_dev_flush(priv, ...)
list_for_each_entry(mcast, | ipoib_mcast_dev_flush(dev = priv->dev)
&priv->multicast_list, list) |
ipoib_mcast_join(dev, mcast) |
spin_unlock_irq(&priv->lock) |
| spin_lock_irqsave(&priv->lock, flags)
| list_for_each_entry_safe(mcast, tmcast,
| &priv->multicast_list, list)
| list_del(&mcast->list);
| list_add_tail(&mcast->list, &remove_list)
| spin_unlock_irqrestore(&priv->lock, flags)
spin_lock_irq(&priv->lock) |
| ipoib_mcast_remove_list(&remove_list)
(Here, `mcast` is no longer on the | list_for_each_entry_safe(mcast, tmcast,
`priv->multicast_list` and we keep | remove_list, list)
spinning on the `remove_list` of | >>> wait_for_completion(&mcast->done)
the other thread which is blocked |
and the list is still valid on |
it's stack.)
Fix this by keeping the lock held and changing to GFP_ATOMIC to prevent
eventual sleeps.
Unfortunately we could not reproduce the lockup and confirm this fix but
based on the code review I think this fix should address such lockups.
crash> bc 31
PID: 747 TASK: ff1c6a1a007e8000 CPU: 31 COMMAND: "kworker/u72:2"
--
[exception RIP: ipoib_mcast_join_task+0x1b1]
RIP: ffffffffc0944ac1 RSP: ff646f199a8c7e00 RFLAGS: 00000002
RAX: 0000000000000000 RBX: ff1c6a1a04dc82f8 RCX: 0000000000000000
work (&priv->mcast_task{,.work})
RDX: ff1c6a192d60ac68 RSI: 0000000000000286 RDI: ff1c6a1a04dc8000
&mcast->list
RBP: ff646f199a8c7e90 R8: ff1c699980019420 R9: ff1c6a1920c9a000
R10: ff646f199a8c7e00 R11: ff1c6a191a7d9800 R12: ff1c6a192d60ac00
mcast
R13: ff1c6a1d82200000 R14: ff1c6a1a04dc8000 R15: ff1c6a1a04dc82d8
dev priv (&priv->lock) &priv->multicast_list (aka head)
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#5 [ff646f199a8c7e00] ipoib_mcast_join_task+0x1b1 at ffffffffc0944ac1 [ib_ipoib]
#6 [ff646f199a8c7e98] process_one_work+0x1a7 at ffffffff9bf10967
crash> rx ff646f199a8c7e68
ff646f199a8c7e68: ff1c6a1a04dc82f8 <<< work = &priv->mcast_task.work
crash> list -hO ipoib_dev_priv.multicast_list ff1c6a1a04dc8000
(empty)
crash> ipoib_dev_priv.mcast_task.work.func,mcast_mutex.owner.counter ff1c6a1a04dc8000
mcast_task.work.func = 0xffffffffc0944910 <ipoib_mcast_join_task>,
mcast_mutex.owner.counter = 0xff1c69998efec000
crash> b 8
PID: 8 TASK: ff1c69998efec000 CPU: 33 COMMAND: "kworker/u72:0"
--
#3 [ff646f1980153d50] wait_for_completion+0x96 at ffffffff9c7d7646
#4 [ff646f1980153d90] ipoib_mcast_remove_list+0x56 at ffffffffc0944dc6 [ib_ipoib]
#5 [ff646f1980153de8] ipoib_mcast_dev_flush+0x1a7 at ffffffffc09455a7 [ib_ipoib]
#6 [ff646f1980153e58] __ipoib_ib_dev_flush+0x1a4 at ffffffffc09431a4 [ib_ipoib]
#7 [ff
---truncated--- |