| CVE |
Vendors |
Products |
Updated |
CVSS v3.1 |
| In the Linux kernel, the following vulnerability has been resolved:
nfs: return EISDIR on nfs3_proc_create if d_alias is a dir
If we found an alias through nfs3_do_create/nfs_add_or_obtain
/d_splice_alias which happens to be a dir dentry, we don't return
any error, and simply forget about this alias, but the original
dentry we were adding and passed as parameter remains negative.
This later causes an oops on nfs_atomic_open_v23/finish_open since we
supply a negative dentry to do_dentry_open.
This has been observed running lustre-racer, where dirs and files are
created/removed concurrently with the same name and O_EXCL is not
used to open files (frequent file redirection).
While d_splice_alias typically returns a directory alias or NULL, we
explicitly check d_is_dir() to ensure that we don't attempt to perform
file operations (like finish_open) on a directory inode, which triggers
the observed oops. |
| In the Linux kernel, the following vulnerability has been resolved:
xprtrdma: Decrement re_receiving on the early exit paths
In the event that rpcrdma_post_recvs() fails to create a work request
(due to memory allocation failure, say) or otherwise exits early, we
should decrement ep->re_receiving before returning. Otherwise we will
hang in rpcrdma_xprt_drain() as re_receiving will never reach zero and
the completion will never be triggered.
On a system with high memory pressure, this can appear as the following
hung task:
INFO: task kworker/u385:17:8393 blocked for more than 122 seconds.
Tainted: G S E 6.19.0 #3
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u385:17 state:D stack:0 pid:8393 tgid:8393 ppid:2 task_flags:0x4248060 flags:0x00080000
Workqueue: xprtiod xprt_autoclose [sunrpc]
Call Trace:
<TASK>
__schedule+0x48b/0x18b0
? ib_post_send_mad+0x247/0xae0 [ib_core]
schedule+0x27/0xf0
schedule_timeout+0x104/0x110
__wait_for_common+0x98/0x180
? __pfx_schedule_timeout+0x10/0x10
wait_for_completion+0x24/0x40
rpcrdma_xprt_disconnect+0x444/0x460 [rpcrdma]
xprt_rdma_close+0x12/0x40 [rpcrdma]
xprt_autoclose+0x5f/0x120 [sunrpc]
process_one_work+0x191/0x3e0
worker_thread+0x2e3/0x420
? __pfx_worker_thread+0x10/0x10
kthread+0x10d/0x230
? __pfx_kthread+0x10/0x10
ret_from_fork+0x273/0x2b0
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30 |
| In the Linux kernel, the following vulnerability has been resolved:
net/mlx5: Fix deadlock between devlink lock and esw->wq
esw->work_queue executes esw_functions_changed_event_handler ->
esw_vfs_changed_event_handler and acquires the devlink lock.
.eswitch_mode_set (acquires devlink lock in devlink_nl_pre_doit) ->
mlx5_devlink_eswitch_mode_set -> mlx5_eswitch_disable_locked ->
mlx5_eswitch_event_handler_unregister -> flush_workqueue deadlocks
when esw_vfs_changed_event_handler executes.
Fix that by no longer flushing the work to avoid the deadlock, and using
a generation counter to keep track of work relevance. This avoids an old
handler manipulating an esw that has undergone one or more mode changes:
- the counter is incremented in mlx5_eswitch_event_handler_unregister.
- the counter is read and passed to the ephemeral mlx5_host_work struct.
- the work handler takes the devlink lock and bails out if the current
generation is different than the one it was scheduled to operate on.
- mlx5_eswitch_cleanup does the final draining before destroying the wq.
No longer flushing the workqueue has the side effect of maybe no longer
cancelling pending vport_change_handler work items, but that's ok since
those are disabled elsewhere:
- mlx5_eswitch_disable_locked disables the vport eq notifier.
- mlx5_esw_vport_disable disarms the HW EQ notification and marks
vport->enabled under state_lock to false to prevent pending vport
handler from doing anything.
- mlx5_eswitch_cleanup destroys the workqueue and makes sure all events
are disabled/finished. |
| In the Linux kernel, the following vulnerability has been resolved:
net/mlx5: Fix crash when moving to switchdev mode
When moving to switchdev mode when the device doesn't support IPsec,
we try to clean up the IPsec resources anyway which causes the crash
below, fix that by correctly checking for IPsec support before trying
to clean up its resources.
[27642.515799] WARNING: arch/x86/mm/fault.c:1276 at
do_user_addr_fault+0x18a/0x680, CPU#4: devlink/6490
[27642.517159] Modules linked in: xt_conntrack xt_MASQUERADE
ip6table_nat ip6table_filter ip6_tables iptable_nat nf_nat xt_addrtype
rpcsec_gss_krb5 auth_rpcgss oid_registry overlay mlx5_fwctl nfnetlink
zram zsmalloc mlx5_ib fuse rpcrdma rdma_ucm ib_uverbs ib_iser libiscsi
scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_core
ib_core
[27642.521358] CPU: 4 UID: 0 PID: 6490 Comm: devlink Not tainted
6.19.0-rc5_for_upstream_min_debug_2026_01_14_16_47 #1 NONE
[27642.522923] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[27642.524528] RIP: 0010:do_user_addr_fault+0x18a/0x680
[27642.525362] Code: ff 0f 84 75 03 00 00 48 89 ee 4c 89 e7 e8 5e b9 22
00 49 89 c0 48 85 c0 0f 84 a8 02 00 00 f7 c3 60 80 00 00 74 22 31 c9 eb
ae <0f> 0b 48 83 c4 10 48 89 ea 48 89 de 4c 89 f7 5b 5d 41 5c 41 5d
41
[27642.528166] RSP: 0018:ffff88810770f6b8 EFLAGS: 00010046
[27642.529038] RAX: 0000000000000000 RBX: 0000000000000002 RCX:
ffff88810b980f00
[27642.530158] RDX: 00000000000000a0 RSI: 0000000000000002 RDI:
ffff88810770f728
[27642.531270] RBP: 00000000000000a0 R08: 0000000000000000 R09:
0000000000000000
[27642.532383] R10: 0000000000000000 R11: 0000000000000000 R12:
ffff888103f3c4c0
[27642.533499] R13: 0000000000000000 R14: ffff88810770f728 R15:
0000000000000000
[27642.534614] FS: 00007f197c741740(0000) GS:ffff88856a94c000(0000)
knlGS:0000000000000000
[27642.535915] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[27642.536858] CR2: 00000000000000a0 CR3: 000000011334c003 CR4:
0000000000172eb0
[27642.537982] Call Trace:
[27642.538466] <TASK>
[27642.538907] exc_page_fault+0x76/0x140
[27642.539583] asm_exc_page_fault+0x22/0x30
[27642.540282] RIP: 0010:_raw_spin_lock_irqsave+0x10/0x30
[27642.541134] Code: 07 85 c0 75 11 ba ff 00 00 00 f0 0f b1 17 75 06 b8
01 00 00 00 c3 31 c0 c3 90 0f 1f 44 00 00 53 9c 5b fa 31 c0 ba 01 00 00
00 <f0> 0f b1 17 75 05 48 89 d8 5b c3 89 c6 e8 7e 02 00 00 48 89 d8
5b
[27642.543936] RSP: 0018:ffff88810770f7d8 EFLAGS: 00010046
[27642.544803] RAX: 0000000000000000 RBX: 0000000000000202 RCX:
ffff888113ad96d8
[27642.545916] RDX: 0000000000000001 RSI: ffff88810770f818 RDI:
00000000000000a0
[27642.547027] RBP: 0000000000000098 R08: 0000000000000400 R09:
ffff88810b980f00
[27642.548140] R10: 0000000000000001 R11: ffff888101845a80 R12:
00000000000000a8
[27642.549263] R13: ffffffffa02a9060 R14: 00000000000000a0 R15:
ffff8881130d8a40
[27642.550379] complete_all+0x20/0x90
[27642.551010] mlx5e_ipsec_disable_events+0xb6/0xf0 [mlx5_core]
[27642.552022] mlx5e_nic_disable+0x12d/0x220 [mlx5_core]
[27642.552929] mlx5e_detach_netdev+0x66/0xf0 [mlx5_core]
[27642.553822] mlx5e_netdev_change_profile+0x5b/0x120 [mlx5_core]
[27642.554821] mlx5e_vport_rep_load+0x419/0x590 [mlx5_core]
[27642.555757] ? xa_load+0x53/0x90
[27642.556361] __esw_offloads_load_rep+0x54/0x70 [mlx5_core]
[27642.557328] mlx5_esw_offloads_rep_load+0x45/0xd0 [mlx5_core]
[27642.558320] esw_offloads_enable+0xb4b/0xc90 [mlx5_core]
[27642.559247] mlx5_eswitch_enable_locked+0x34e/0x4f0 [mlx5_core]
[27642.560257] ? mlx5_rescan_drivers_locked+0x222/0x2d0 [mlx5_core]
[27642.561284] mlx5_devlink_eswitch_mode_set+0x5ac/0x9c0 [mlx5_core]
[27642.562334] ? devlink_rate_set_ops_supported+0x21/0x3a0
[27642.563220] devlink_nl_eswitch_set_doit+0x67/0xe0
[27642.564026] genl_family_rcv_msg_doit+0xe0/0x130
[27642.564816] genl_rcv_msg+0x183/0x290
[27642.565466] ? __devlink_nl_pre_doit.isra.0+0x160/0x160
[27642.566329] ? d
---truncated--- |
| In the Linux kernel, the following vulnerability has been resolved:
net/mlx5e: Fix DMA FIFO desync on error CQE SQ recovery
In case of a TX error CQE, a recovery flow is triggered,
mlx5e_reset_txqsq_cc_pc() resets dma_fifo_cc to 0 but not dma_fifo_pc,
desyncing the DMA FIFO producer and consumer.
After recovery, the producer pushes new DMA entries at the old
dma_fifo_pc, while the consumer reads from position 0.
This causes us to unmap stale DMA addresses from before the recovery.
The DMA FIFO is a purely software construct with no HW counterpart.
At the point of reset, all WQEs have been flushed so dma_fifo_cc is
already equal to dma_fifo_pc. There is no need to reset either counter,
similar to how skb_fifo pc/cc are untouched.
Remove the 'dma_fifo_cc = 0' reset.
This fixes the following WARNING:
WARNING: CPU: 0 PID: 0 at drivers/iommu/dma-iommu.c:1240 iommu_dma_unmap_page+0x79/0x90
Modules linked in: mlx5_vdpa vringh vdpa bonding mlx5_ib mlx5_vfio_pci ipip mlx5_fwctl tunnel4 mlx5_core ib_ipoib geneve ip6_gre ip_gre gre nf_tables ip6_tunnel rdma_ucm ib_uverbs ib_umad vfio_pci vfio_pci_core act_mirred act_skbedit act_vlan vhost_net vhost tap ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_mangle cls_matchall nfnetlink_cttimeout act_gact cls_flower sch_ingress vhost_iotlb iptable_raw tunnel6 vfio_iommu_type1 vfio openvswitch nsh rpcsec_gss_krb5 auth_rpcgss oid_registry xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat nf_nat xt_addrtype br_netfilter overlay zram zsmalloc rpcrdma ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm ib_core fuse [last unloaded: nf_tables]
CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.13.0-rc5_for_upstream_min_debug_2024_12_30_21_33 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:iommu_dma_unmap_page+0x79/0x90
Code: 2b 4d 3b 21 72 26 4d 3b 61 08 73 20 49 89 d8 44 89 f9 5b 4c 89 f2 4c 89 e6 48 89 ef 5d 41 5c 41 5d 41 5e 41 5f e9 c7 ae 9e ff <0f> 0b 5b 5d 41 5c 41 5d 41 5e 41 5f c3 66 2e 0f 1f 84 00 00 00 00
Call Trace:
<IRQ>
? __warn+0x7d/0x110
? iommu_dma_unmap_page+0x79/0x90
? report_bug+0x16d/0x180
? handle_bug+0x4f/0x90
? exc_invalid_op+0x14/0x70
? asm_exc_invalid_op+0x16/0x20
? iommu_dma_unmap_page+0x79/0x90
? iommu_dma_unmap_page+0x2e/0x90
dma_unmap_page_attrs+0x10d/0x1b0
mlx5e_tx_wi_dma_unmap+0xbe/0x120 [mlx5_core]
mlx5e_poll_tx_cq+0x16d/0x690 [mlx5_core]
mlx5e_napi_poll+0x8b/0xac0 [mlx5_core]
__napi_poll+0x24/0x190
net_rx_action+0x32a/0x3b0
? mlx5_eq_comp_int+0x7e/0x270 [mlx5_core]
? notifier_call_chain+0x35/0xa0
handle_softirqs+0xc9/0x270
irq_exit_rcu+0x71/0xd0
common_interrupt+0x7f/0xa0
</IRQ>
<TASK>
asm_common_interrupt+0x22/0x40 |
| In the Linux kernel, the following vulnerability has been resolved:
net/mlx5e: RX, Fix XDP multi-buf frag counting for striding RQ
XDP multi-buf programs can modify the layout of the XDP buffer when the
program calls bpf_xdp_pull_data() or bpf_xdp_adjust_tail(). The
referenced commit in the fixes tag corrected the assumption in the mlx5
driver that the XDP buffer layout doesn't change during a program
execution. However, this fix introduced another issue: the dropped
fragments still need to be counted on the driver side to avoid page
fragment reference counting issues.
The issue was discovered by the drivers/net/xdp.py selftest,
more specifically the test_xdp_native_tx_mb:
- The mlx5 driver allocates a page_pool page and initializes it with
a frag counter of 64 (pp_ref_count=64) and the internal frag counter
to 0.
- The test sends one packet with no payload.
- On RX (mlx5e_skb_from_cqe_mpwrq_nonlinear()), mlx5 configures the XDP
buffer with the packet data starting in the first fragment which is the
page mentioned above.
- The XDP program runs and calls bpf_xdp_pull_data() which moves the
header into the linear part of the XDP buffer. As the packet doesn't
contain more data, the program drops the tail fragment since it no
longer contains any payload (pp_ref_count=63).
- mlx5 device skips counting this fragment. Internal frag counter
remains 0.
- mlx5 releases all 64 fragments of the page but page pp_ref_count is
63 => negative reference counting error.
Resulting splat during the test:
WARNING: CPU: 0 PID: 188225 at ./include/net/page_pool/helpers.h:297 mlx5e_page_release_fragmented.isra.0+0xbd/0xe0 [mlx5_core]
Modules linked in: [...]
CPU: 0 UID: 0 PID: 188225 Comm: ip Not tainted 6.18.0-rc7_for_upstream_min_debug_2025_12_08_11_44 #1 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:mlx5e_page_release_fragmented.isra.0+0xbd/0xe0 [mlx5_core]
[...]
Call Trace:
<TASK>
mlx5e_free_rx_mpwqe+0x20a/0x250 [mlx5_core]
mlx5e_dealloc_rx_mpwqe+0x37/0xb0 [mlx5_core]
mlx5e_free_rx_descs+0x11a/0x170 [mlx5_core]
mlx5e_close_rq+0x78/0xa0 [mlx5_core]
mlx5e_close_queues+0x46/0x2a0 [mlx5_core]
mlx5e_close_channel+0x24/0x90 [mlx5_core]
mlx5e_close_channels+0x5d/0xf0 [mlx5_core]
mlx5e_safe_switch_params+0x2ec/0x380 [mlx5_core]
mlx5e_change_mtu+0x11d/0x490 [mlx5_core]
mlx5e_change_nic_mtu+0x19/0x30 [mlx5_core]
netif_set_mtu_ext+0xfc/0x240
do_setlink.isra.0+0x226/0x1100
rtnl_newlink+0x7a9/0xba0
rtnetlink_rcv_msg+0x220/0x3c0
netlink_rcv_skb+0x4b/0xf0
netlink_unicast+0x255/0x380
netlink_sendmsg+0x1f3/0x420
__sock_sendmsg+0x38/0x60
____sys_sendmsg+0x1e8/0x240
___sys_sendmsg+0x7c/0xb0
[...]
__sys_sendmsg+0x5f/0xb0
do_syscall_64+0x55/0xc70
The problem applies for XDP_PASS as well which is handled in a different
code path in the driver.
This patch fixes the issue by doing page frag counting on all the
original XDP buffer fragments for all relevant XDP actions (XDP_TX ,
XDP_REDIRECT and XDP_PASS). This is basically reverting to the original
counting before the commit in the fixes tag.
As frag_page is still pointing to the original tail, the nr_frags
parameter to xdp_update_skb_frags_info() needs to be calculated
in a different way to reflect the new nr_frags. |
| In the Linux kernel, the following vulnerability has been resolved:
net/mlx5e: RX, Fix XDP multi-buf frag counting for legacy RQ
XDP multi-buf programs can modify the layout of the XDP buffer when the
program calls bpf_xdp_pull_data() or bpf_xdp_adjust_tail(). The
referenced commit in the fixes tag corrected the assumption in the mlx5
driver that the XDP buffer layout doesn't change during a program
execution. However, this fix introduced another issue: the dropped
fragments still need to be counted on the driver side to avoid page
fragment reference counting issues.
Such issue can be observed with the
test_xdp_native_adjst_tail_shrnk_data selftest when using a payload of
3600 and shrinking by 256 bytes (an upcoming selftest patch): the last
fragment gets released by the XDP code but doesn't get tracked by the
driver. This results in a negative pp_ref_count during page release and
the following splat:
WARNING: include/net/page_pool/helpers.h:297 at mlx5e_page_release_fragmented.isra.0+0x4a/0x50 [mlx5_core], CPU#12: ip/3137
Modules linked in: [...]
CPU: 12 UID: 0 PID: 3137 Comm: ip Not tainted 6.19.0-rc3+ #12 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
RIP: 0010:mlx5e_page_release_fragmented.isra.0+0x4a/0x50 [mlx5_core]
[...]
Call Trace:
<TASK>
mlx5e_dealloc_rx_wqe+0xcb/0x1a0 [mlx5_core]
mlx5e_free_rx_descs+0x7f/0x110 [mlx5_core]
mlx5e_close_rq+0x50/0x60 [mlx5_core]
mlx5e_close_queues+0x36/0x2c0 [mlx5_core]
mlx5e_close_channel+0x1c/0x50 [mlx5_core]
mlx5e_close_channels+0x45/0x80 [mlx5_core]
mlx5e_safe_switch_params+0x1a5/0x230 [mlx5_core]
mlx5e_change_mtu+0xf3/0x2f0 [mlx5_core]
netif_set_mtu_ext+0xf1/0x230
do_setlink.isra.0+0x219/0x1180
rtnl_newlink+0x79f/0xb60
rtnetlink_rcv_msg+0x213/0x3a0
netlink_rcv_skb+0x48/0xf0
netlink_unicast+0x24a/0x350
netlink_sendmsg+0x1ee/0x410
__sock_sendmsg+0x38/0x60
____sys_sendmsg+0x232/0x280
___sys_sendmsg+0x78/0xb0
__sys_sendmsg+0x5f/0xb0
[...]
do_syscall_64+0x57/0xc50
This patch fixes the issue by doing page frag counting on all the
original XDP buffer fragments for all relevant XDP actions (XDP_TX ,
XDP_REDIRECT and XDP_PASS). This is basically reverting to the original
counting before the commit in the fixes tag.
As frag_page is still pointing to the original tail, the nr_frags
parameter to xdp_update_skb_frags_info() needs to be calculated
in a different way to reflect the new nr_frags. |
| In the Linux kernel, the following vulnerability has been resolved:
rxrpc, afs: Fix missing error pointer check after rxrpc_kernel_lookup_peer()
rxrpc_kernel_lookup_peer() can also return error pointers in addition to
NULL, so just checking for NULL is not sufficient.
Fix this by:
(1) Changing rxrpc_kernel_lookup_peer() to return -ENOMEM rather than NULL
on allocation failure.
(2) Making the callers in afs use IS_ERR() and PTR_ERR() to pass on the
error code returned. |
| In the Linux kernel, the following vulnerability has been resolved:
net: spacemit: Fix error handling in emac_tx_mem_map()
The DMA mappings were leaked on mapping error. Free them with the
existing emac_free_tx_buf() function. |
| In the Linux kernel, the following vulnerability has been resolved:
spi: amlogic: spifc-a4: Fix DMA mapping error handling
Fix three bugs in aml_sfc_dma_buffer_setup() error paths:
1. Unnecessary goto: When the first DMA mapping (sfc->daddr) fails,
nothing needs cleanup. Use direct return instead of goto.
2. Double-unmap bug: When info DMA mapping failed, the code would
unmap sfc->daddr inline, then fall through to out_map_data which
would unmap it again, causing a double-unmap.
3. Wrong unmap size: The out_map_info label used datalen instead of
infolen when unmapping sfc->iaddr, which could lead to incorrect
DMA sync behavior. |
| In the Linux kernel, the following vulnerability has been resolved:
spi: rockchip-sfc: Fix double-free in remove() callback
The driver uses devm_spi_register_controller() for registration, which
automatically unregisters the controller via devm cleanup when the
device is removed. The manual call to spi_unregister_controller() in
the remove() callback can lead to a double-free.
And to make sure controller is unregistered before DMA buffer is
unmapped, switch to use spi_register_controller() in probe(). |
| In the Linux kernel, the following vulnerability has been resolved:
ASoC: soc-core: flush delayed work before removing DAIs and widgets
When a sound card is unbound while a PCM stream is open, a
use-after-free can occur in snd_soc_dapm_stream_event(), called from
the close_delayed_work workqueue handler.
During unbind, snd_soc_unbind_card() flushes delayed work and then
calls soc_cleanup_card_resources(). Inside cleanup,
snd_card_disconnect_sync() releases all PCM file descriptors, and
the resulting PCM close path can call snd_soc_dapm_stream_stop()
which schedules new delayed work with a pmdown_time timer delay.
Since this happens after the flush in snd_soc_unbind_card(), the
new work is not caught. soc_remove_link_components() then frees
DAPM widgets before this work fires, leading to the use-after-free.
The existing flush in soc_free_pcm_runtime() also cannot help as it
runs after soc_remove_link_components() has already freed the widgets.
Add a flush in soc_cleanup_card_resources() after
snd_card_disconnect_sync() (after which no new PCM closes can
schedule further delayed work) and before soc_remove_link_dais()
and soc_remove_link_components() (which tear down the structures the
delayed work accesses). |
| In the Linux kernel, the following vulnerability has been resolved:
serial: caif: hold tty->link reference in ldisc_open and ser_release
A reproducer triggers a KASAN slab-use-after-free in pty_write_room()
when caif_serial's TX path calls tty_write_room(). The faulting access
is on tty->link->port.
Hold an extra kref on tty->link for the lifetime of the caif_serial line
discipline: get it in ldisc_open() and drop it in ser_release(), and
also drop it on the ldisc_open() error path.
With this change applied, the reproducer no longer triggers the UAF in
my testing. |
| In the Linux kernel, the following vulnerability has been resolved:
mctp: i2c: fix skb memory leak in receive path
When 'midev->allow_rx' is false, the newly allocated skb isn't consumed
by netif_rx(), it needs to free the skb directly. |
| In the Linux kernel, the following vulnerability has been resolved:
bonding: fix type confusion in bond_setup_by_slave()
kernel BUG at net/core/skbuff.c:2306!
Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
RIP: 0010:pskb_expand_head+0xa08/0xfe0 net/core/skbuff.c:2306
RSP: 0018:ffffc90004aff760 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff88807e3c8780 RCX: ffffffff89593e0e
RDX: ffff88807b7c4900 RSI: ffffffff89594747 RDI: ffff88807b7c4900
RBP: 0000000000000820 R08: 0000000000000005 R09: 0000000000000000
R10: 00000000961a63e0 R11: 0000000000000000 R12: ffff88807e3c8780
R13: 00000000961a6560 R14: dffffc0000000000 R15: 00000000961a63e0
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe1a0ed8df0 CR3: 000000002d816000 CR4: 00000000003526f0
Call Trace:
<TASK>
ipgre_header+0xdd/0x540 net/ipv4/ip_gre.c:900
dev_hard_header include/linux/netdevice.h:3439 [inline]
packet_snd net/packet/af_packet.c:3028 [inline]
packet_sendmsg+0x3ae5/0x53c0 net/packet/af_packet.c:3108
sock_sendmsg_nosec net/socket.c:727 [inline]
__sock_sendmsg net/socket.c:742 [inline]
____sys_sendmsg+0xa54/0xc30 net/socket.c:2592
___sys_sendmsg+0x190/0x1e0 net/socket.c:2646
__sys_sendmsg+0x170/0x220 net/socket.c:2678
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x106/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fe1a0e6c1a9
When a non-Ethernet device (e.g. GRE tunnel) is enslaved to a bond,
bond_setup_by_slave() directly copies the slave's header_ops to the
bond device:
bond_dev->header_ops = slave_dev->header_ops;
This causes a type confusion when dev_hard_header() is later called
on the bond device. Functions like ipgre_header(), ip6gre_header(),all use
netdev_priv(dev) to access their device-specific private data. When
called with the bond device, netdev_priv() returns the bond's private
data (struct bonding) instead of the expected type (e.g. struct
ip_tunnel), leading to garbage values being read and kernel crashes.
Fix this by introducing bond_header_ops with wrapper functions that
delegate to the active slave's header_ops using the slave's own
device. This ensures netdev_priv() in the slave's header functions
always receives the correct device.
The fix is placed in the bonding driver rather than individual device
drivers, as the root cause is bond blindly inheriting header_ops from
the slave without considering that these callbacks expect a specific
netdev_priv() layout.
The type confusion can be observed by adding a printk in
ipgre_header() and running the following commands:
ip link add dummy0 type dummy
ip addr add 10.0.0.1/24 dev dummy0
ip link set dummy0 up
ip link add gre1 type gre local 10.0.0.1
ip link add bond1 type bond mode active-backup
ip link set gre1 master bond1
ip link set gre1 up
ip link set bond1 up
ip addr add fe80::1/64 dev bond1 |
| In the Linux kernel, the following vulnerability has been resolved:
mctp: route: hold key->lock in mctp_flow_prepare_output()
mctp_flow_prepare_output() checks key->dev and may call
mctp_dev_set_key(), but it does not hold key->lock while doing so.
mctp_dev_set_key() and mctp_dev_release_key() are annotated with
__must_hold(&key->lock), so key->dev access is intended to be
serialized by key->lock. The mctp_sendmsg() transmit path reaches
mctp_flow_prepare_output() via mctp_local_output() -> mctp_dst_output()
without holding key->lock, so the check-and-set sequence is racy.
Example interleaving:
CPU0 CPU1
---- ----
mctp_flow_prepare_output(key, devA)
if (!key->dev) // sees NULL
mctp_flow_prepare_output(
key, devB)
if (!key->dev) // still NULL
mctp_dev_set_key(devB, key)
mctp_dev_hold(devB)
key->dev = devB
mctp_dev_set_key(devA, key)
mctp_dev_hold(devA)
key->dev = devA // overwrites devB
Now both devA and devB references were acquired, but only the final
key->dev value is tracked for release. One reference can be lost,
causing a resource leak as mctp_dev_release_key() would only decrease
the reference on one dev.
Fix by taking key->lock around the key->dev check and
mctp_dev_set_key() call. |
| In the Linux kernel, the following vulnerability has been resolved:
netfilter: nf_tables: Fix for duplicate device in netdev hooks
When handling NETDEV_REGISTER notification, duplicate device
registration must be avoided since the device may have been added by
nft_netdev_hook_alloc() already when creating the hook. |
| In the Linux kernel, the following vulnerability has been resolved:
netfilter: nft_set_pipapo: fix stack out-of-bounds read in pipapo_drop()
pipapo_drop() passes rulemap[i + 1].n to pipapo_unmap() as the
to_offset argument on every iteration, including the last one where
i == m->field_count - 1. This reads one element past the end of the
stack-allocated rulemap array (declared as rulemap[NFT_PIPAPO_MAX_FIELDS]
with NFT_PIPAPO_MAX_FIELDS == 16).
Although pipapo_unmap() returns early when is_last is true without
using the to_offset value, the argument is evaluated at the call site
before the function body executes, making this a genuine out-of-bounds
stack read confirmed by KASAN:
BUG: KASAN: stack-out-of-bounds in pipapo_drop+0x50c/0x57c [nf_tables]
Read of size 4 at addr ffff8000810e71a4
This frame has 1 object:
[32, 160) 'rulemap'
The buggy address is at offset 164 -- exactly 4 bytes past the end
of the rulemap array.
Pass 0 instead of rulemap[i + 1].n on the last iteration to avoid
the out-of-bounds read. |
| In the Linux kernel, the following vulnerability has been resolved:
netfilter: x_tables: guard option walkers against 1-byte tail reads
When the last byte of options is a non-single-byte option kind, walkers
that advance with i += op[i + 1] ? : 1 can read op[i + 1] past the end
of the option area.
Add an explicit i == optlen - 1 check before dereferencing op[i + 1]
in xt_tcpudp and xt_dccp option walkers. |
| In the Linux kernel, the following vulnerability has been resolved:
netfilter: nfnetlink_queue: fix entry leak in bridge verdict error path
nfqnl_recv_verdict() calls find_dequeue_entry() to remove the queue
entry from the queue data structures, taking ownership of the entry.
For PF_BRIDGE packets, it then calls nfqa_parse_bridge() to parse VLAN
attributes. If nfqa_parse_bridge() returns an error (e.g. NFQA_VLAN
present but NFQA_VLAN_TCI missing), the function returns immediately
without freeing the dequeued entry or its sk_buff.
This leaks the nf_queue_entry, its associated sk_buff, and all held
references (net_device refcounts, struct net refcount). Repeated
triggering exhausts kernel memory.
Fix this by dropping the entry via nfqnl_reinject() with NF_DROP verdict
on the error path, consistent with other error handling in this file. |