linux-stable-mirror

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2026-04-03 12:05:13 +02:00

Author	SHA1	Message	Date
Hyunwoo Kim	f7ad8b1d0e	espintcp: Fix race condition in espintcp_close() [ Upstream commit `e1512c1db9` ] This issue was discovered during a code audit. After cancel_work_sync() is called from espintcp_close(), espintcp_tx_work() can still be scheduled from paths such as the Delayed ACK handler or ksoftirqd. As a result, the espintcp_tx_work() worker may dereference a freed espintcp ctx or sk. The following is a simple race scenario: cpu0 cpu1 espintcp_close() cancel_work_sync(&ctx->work); espintcp_write_space() schedule_work(&ctx->work); To prevent this race condition, cancel_work_sync() is replaced with disable_work_sync(). Fixes: `e27cca96cd` ("xfrm: add espintcp (RFC 8229)") Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/aZSie7rEdh9Nu0eM@v4bel Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-04 07:21:31 -05:00
Tetsuo Handa	a3c8fede03	xfrm: always flush state and policy upon NETDEV_UNREGISTER event [ Upstream commit `4efa91a285` ] syzbot is reporting that "struct xfrm_state" refcount is leaking. unregister_netdevice: waiting for netdevsim0 to become free. Usage count = 2 ref_tracker: netdev@ffff888052f24618 has 1/1 users at __netdev_tracker_alloc include/linux/netdevice.h:4400 [inline] netdev_tracker_alloc include/linux/netdevice.h:4412 [inline] xfrm_dev_state_add+0x3a5/0x1080 net/xfrm/xfrm_device.c:316 xfrm_state_construct net/xfrm/xfrm_user.c:986 [inline] xfrm_add_sa+0x34ff/0x5fa0 net/xfrm/xfrm_user.c:1022 xfrm_user_rcv_msg+0x58e/0xc00 net/xfrm/xfrm_user.c:3507 netlink_rcv_skb+0x158/0x420 net/netlink/af_netlink.c:2550 xfrm_netlink_rcv+0x71/0x90 net/xfrm/xfrm_user.c:3529 netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline] netlink_unicast+0x5aa/0x870 net/netlink/af_netlink.c:1344 netlink_sendmsg+0x8c8/0xdd0 net/netlink/af_netlink.c:1894 sock_sendmsg_nosec net/socket.c:727 [inline] __sock_sendmsg net/socket.c:742 [inline] ____sys_sendmsg+0xa5d/0xc30 net/socket.c:2592 ___sys_sendmsg+0x134/0x1d0 net/socket.c:2646 __sys_sendmsg+0x16d/0x220 net/socket.c:2678 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xcd/0xf80 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f This is because commit `d77e38e612` ("xfrm: Add an IPsec hardware offloading API") implemented xfrm_dev_unregister() as no-op despite xfrm_dev_state_add() from xfrm_state_construct() acquires a reference to "struct net_device". I guess that that commit expected that NETDEV_DOWN event is fired before NETDEV_UNREGISTER event fires, and also assumed that xfrm_dev_state_add() is called only if (dev->features & NETIF_F_HW_ESP) != 0. Sabrina Dubroca identified steps to reproduce the same symptoms as below. echo 0 > /sys/bus/netdevsim/new_device dev=$(ls -1 /sys/bus/netdevsim/devices/netdevsim0/net/) ip xfrm state add src 192.168.13.1 dst 192.168.13.2 proto esp \ spi 0x1000 mode tunnel aead 'rfc4106(gcm(aes))' $key 128 \ offload crypto dev $dev dir out ethtool -K $dev esp-hw-offload off echo 0 > /sys/bus/netdevsim/del_device Like these steps indicate, the NETIF_F_HW_ESP bit can be cleared after xfrm_dev_state_add() acquired a reference to "struct net_device". Also, xfrm_dev_state_add() does not check for the NETIF_F_HW_ESP bit when acquiring a reference to "struct net_device". Commit `03891f820c` ("xfrm: handle NETDEV_UNREGISTER for xfrm device") re-introduced the NETDEV_UNREGISTER event to xfrm_dev_event(), but that commit for unknown reason chose to share xfrm_dev_down() between the NETDEV_DOWN event and the NETDEV_UNREGISTER event. I guess that that commit missed the behavior in the previous paragraph. Therefore, we need to re-introduce xfrm_dev_unregister() in order to release the reference to "struct net_device" by unconditionally flushing state and policy. Reported-by: syzbot+881d65229ca4f9ae8c84@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=881d65229ca4f9ae8c84 Fixes: `d77e38e612` ("xfrm: Add an IPsec hardware offloading API") Cc: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-04 07:21:31 -05:00
Leon Romanovsky	f1713988e9	xfrm: skip templates check for packet offload tunnel mode [ Upstream commit `0a4524bc69` ] In packet offload, hardware is responsible to check templates. The result of its operation is forwarded through secpath by relevant drivers. That secpath is actually removed in __xfrm_policy_check2(). In case packet is forwarded, this secpath is reset in RX, but pushed again to TX where policy is rechecked again against dummy secpath in xfrm_policy_ok(). Such situation causes to unexpected XfrmInTmplMismatch increase. As a solution, simply skip template mismatch check. Fixes: `600258d555` ("xfrm: delete intermediate secpath entry in packet offload mode") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-04 07:21:31 -05:00
Antony Antony	a2a3c7bf2c	xfrm: set ipv4 no_pmtu_disc flag only on output sa when direction is set [ Upstream commit `c196def07b` ] The XFRM_STATE_NOPMTUDISC flag is only meaningful for output SAs, but it was being applied regardless of the SA direction when the sysctl ip_no_pmtu_disc is enabled. This can unintentionally affect input SAs. Limit setting XFRM_STATE_NOPMTUDISC to output SAs when the SA direction is configured. Closes: https://github.com/strongswan/strongswan/issues/2946 Fixes: `a4a87fa4e9` ("xfrm: Add Direction to the SA in or out") Signed-off-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-23 11:18:35 +01:00
Sabrina Dubroca	f68ffd58d4	xfrm: flush all states in xfrm_state_fini [ Upstream commit `42e42562c9` ] While reverting commit `f75a2804da` ("xfrm: destroy xfrm_state synchronously on net exit path"), I incorrectly changed xfrm_state_flush's "proto" argument back to IPSEC_PROTO_ANY. This reverts some of the changes in commit `dbb2483b2a` ("xfrm: clean up xfrm protocol checks"), and leads to some states not being removed when we exit the netns. Pass 0 instead of IPSEC_PROTO_ANY from both xfrm_state_fini xfrm6_tunnel_net_exit, so that xfrm_state_flush deletes all states. Fixes: `2a198bbec6` ("Revert "xfrm: destroy xfrm_state synchronously on net exit path"") Reported-by: syzbot+6641a61fe0e2e89ae8c5@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=6641a61fe0e2e89ae8c5 Tested-by: syzbot+6641a61fe0e2e89ae8c5@syzkaller.appspotmail.com Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-12-12 18:37:17 +01:00
Sabrina Dubroca	f7d879c19d	xfrm: also call xfrm_state_delete_tunnel at destroy time for states that were never added [ Upstream commit `10deb69864` ] In commit `b441cf3f8c` ("xfrm: delete x->tunnel as we delete x"), I missed the case where state creation fails between full initialization (->init_state has been called) and being inserted on the lists. In this situation, ->init_state has been called, so for IPcomp tunnels, the fallback tunnel has been created and added onto the lists, but the user state never gets added, because we fail before that. The user state doesn't go through __xfrm_state_delete, so we don't call xfrm_state_delete_tunnel for those states, and we end up leaking the FB tunnel. There are several codepaths affected by this: the add/update paths, in both net/key and xfrm, and the migrate code (xfrm_migrate, xfrm_state_migrate). A "proper" rollback of the init_state work would probably be doable in the add/update code, but for migrate it gets more complicated as multiple states may be involved. At some point, the new (not-inserted) state will be destroyed, so call xfrm_state_delete_tunnel during xfrm_state_gc_destroy. Most states will have their fallback tunnel cleaned up during __xfrm_state_delete, which solves the issue that `b441cf3f8c` (and other patches before it) aimed at. All states (including FB tunnels) will be removed from the lists once xfrm_state_fini has called flush_work(&xfrm_state_gc_work). Reported-by: syzbot+999eb23467f83f9bf9bf@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=999eb23467f83f9bf9bf Fixes: `b441cf3f8c` ("xfrm: delete x->tunnel as we delete x") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-12-12 18:37:17 +01:00
Sabrina Dubroca	13b32a7e54	Revert "xfrm: destroy xfrm_state synchronously on net exit path" [ Upstream commit `2a198bbec6` ] This reverts commit `f75a2804da`. With all states (whether user or kern) removed from the hashtables during deletion, there's no need for synchronous destruction of states. xfrm6_tunnel states still need to have been destroyed (which will be the case when its last user is deleted (not destroyed)) so that xfrm6_tunnel_free_spi removes it from the per-netns hashtable before the netns is destroyed. This has the benefit of skipping one synchronize_rcu per state (in __xfrm_state_destroy(sync=true)) when we exit a netns. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-12-12 18:37:17 +01:00
Sabrina Dubroca	dc3636912d	xfrm: delete x->tunnel as we delete x [ Upstream commit `b441cf3f8c` ] The ipcomp fallback tunnels currently get deleted (from the various lists and hashtables) as the last user state that needed that fallback is destroyed (not deleted). If a reference to that user state still exists, the fallback state will remain on the hashtables/lists, triggering the WARN in xfrm_state_fini. Because of those remaining references, the fix in commit `f75a2804da` ("xfrm: destroy xfrm_state synchronously on net exit path") is not complete. We recently fixed one such situation in TCP due to defered freeing of skbs (commit `9b6412e697` ("tcp: drop secpath at the same time as we currently drop dst")). This can also happen due to IP reassembly: skbs with a secpath remain on the reassembly queue until netns destruction. If we can't guarantee that the queues are flushed by the time xfrm_state_fini runs, there may still be references to a (user) xfrm_state, preventing the timely deletion of the corresponding fallback state. Instead of chasing each instance of skbs holding a secpath one by one, this patch fixes the issue directly within xfrm, by deleting the fallback state as soon as the last user state depending on it has been deleted. Destruction will still happen when the final reference is dropped. A separate lockdep class for the fallback state is required since we're going to lock x->tunnel while x is locked. Fixes: `9d4139c769` ("netns xfrm: per-netns xfrm_state_all list") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-12-12 18:37:17 +01:00
Jianbo Liu	f15a3d7910	xfrm: Prevent locally generated packets from direct output in tunnel mode [ Upstream commit `59630e2ccd` ] Add a check to ensure locally generated packets (skb->sk != NULL) do not use direct output in tunnel mode, as these packets require proper L2 header setup that is handled by the normal XFRM processing path. Fixes: `5eddd76ec2` ("xfrm: fix tunnel mode TX datapath in packet offload mode") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-12-01 11:43:29 +01:00
Sabrina Dubroca	e7b4e6e189	xfrm: set err and extack on failure to create pcpu SA [ Upstream commit `1dcf617bec` ] xfrm_state_construct can fail without setting an error if the requested pcpu_num value is too big. Set err and add an extack message to avoid confusing userspace. Fixes: `1ddf9916ac` ("xfrm: Add support for per cpu xfrm state handling.") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-12-01 11:43:29 +01:00
Sabrina Dubroca	ba670eba99	xfrm: drop SA reference in xfrm_state_update if dir doesn't match [ Upstream commit `8d2a2a49c3` ] We're not updating x1, but we still need to put() it. Fixes: `a4a87fa4e9` ("xfrm: Add Direction to the SA in or out") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-12-01 11:43:29 +01:00
Sabrina Dubroca	9fcedabaae	xfrm: xfrm_alloc_spi shouldn't use 0 as SPI [ Upstream commit `cd8ae32e4e` ] x->id.spi == 0 means "no SPI assigned", but since commit `94f39804d8` ("xfrm: Duplicate SPI Handling"), we now create states and add them to the byspi list with this value. __xfrm_state_delete doesn't remove those states from the byspi list, since they shouldn't be there, and this shows up as a UAF the next time we go through the byspi list. Reported-by: syzbot+a25ee9d20d31e483ba7b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a25ee9d20d31e483ba7b Fixes: `94f39804d8` ("xfrm: Duplicate SPI Handling") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-10-02 13:44:09 +02:00
Aakash Kumar S	2fc5b54368	xfrm: Duplicate SPI Handling [ Upstream commit `94f39804d8` ] The issue originates when Strongswan initiates an XFRM_MSG_ALLOCSPI Netlink message, which triggers the kernel function xfrm_alloc_spi(). This function is expected to ensure uniqueness of the Security Parameter Index (SPI) for inbound Security Associations (SAs). However, it can return success even when the requested SPI is already in use, leading to duplicate SPIs assigned to multiple inbound SAs, differentiated only by their destination addresses. This behavior causes inconsistencies during SPI lookups for inbound packets. Since the lookup may return an arbitrary SA among those with the same SPI, packet processing can fail, resulting in packet drops. According to RFC 4301 section 4.4.2 , for inbound processing a unicast SA is uniquely identified by the SPI and optionally protocol. Reproducing the Issue Reliably: To consistently reproduce the problem, restrict the available SPI range in charon.conf : spi_min = 0x10000000 spi_max = 0x10000002 This limits the system to only 2 usable SPI values. Next, create more than 2 Child SA. each using unique pair of src/dst address. As soon as the 3rd Child SA is initiated, it will be assigned a duplicate SPI, since the SPI pool is already exhausted. With a narrow SPI range, the issue is consistently reproducible. With a broader/default range, it becomes rare and unpredictable. Current implementation: xfrm_spi_hash() lookup function computes hash using daddr, proto, and family. So if two SAs have the same SPI but different destination addresses, then they will: a. Hash into different buckets b. Be stored in different linked lists (byspi + h) c. Not be seen in the same hlist_for_each_entry_rcu() iteration. As a result, the lookup will result in NULL and kernel allows that Duplicate SPI Proposed Change: xfrm_state_lookup_spi_proto() does a truly global search - across all states, regardless of hash bucket and matches SPI and proto. Signed-off-by: Aakash Kumar S <saakashkumar@marvell.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-08-20 18:30:34 +02:00
Eyal Birger	5918c3f480	xfrm: interface: fix use-after-free after changing collect_md xfrm interface [ Upstream commit `a90b2a1aaa` ] collect_md property on xfrm interfaces can only be set on device creation, thus xfrmi_changelink() should fail when called on such interfaces. The check to enforce this was done only in the case where the xi was returned from xfrmi_locate() which doesn't look for the collect_md interface, and thus the validation was never reached. Calling changelink would thus errornously place the special interface xi in the xfrmi_net->xfrmi hash, but since it also exists in the xfrmi_net->collect_md_xfrmi pointer it would lead to a double free when the net namespace was taken down [1]. Change the check to use the xi from netdev_priv which is available earlier in the function to prevent changes in xfrm collect_md interfaces. [1] resulting oops: [ 8.516540] kernel BUG at net/core/dev.c:12029! [ 8.516552] Oops: invalid opcode: 0000 [#1] SMP NOPTI [ 8.516559] CPU: 0 UID: 0 PID: 12 Comm: kworker/u80:0 Not tainted 6.15.0-virtme #5 PREEMPT(voluntary) [ 8.516565] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 8.516569] Workqueue: netns cleanup_net [ 8.516579] RIP: 0010:unregister_netdevice_many_notify+0x101/0xab0 [ 8.516590] Code: 90 0f 0b 90 48 8b b0 78 01 00 00 48 8b 90 80 01 00 00 48 89 56 08 48 89 32 4c 89 80 78 01 00 00 48 89 b8 80 01 00 00 eb ac 90 <0f> 0b 48 8b 45 00 4c 8d a0 88 fe ff ff 48 39 c5 74 5c 41 80 bc 24 [ 8.516593] RSP: 0018:ffffa93b8006bd30 EFLAGS: 00010206 [ 8.516598] RAX: ffff98fe4226e000 RBX: ffffa93b8006bd58 RCX: ffffa93b8006bc60 [ 8.516601] RDX: 0000000000000004 RSI: 0000000000000000 RDI: dead000000000122 [ 8.516603] RBP: ffffa93b8006bdd8 R08: dead000000000100 R09: ffff98fe4133c100 [ 8.516605] R10: 0000000000000000 R11: 00000000000003d2 R12: ffffa93b8006be00 [ 8.516608] R13: ffffffff96c1a510 R14: ffffffff96c1a510 R15: ffffa93b8006be00 [ 8.516615] FS: 0000000000000000(0000) GS:ffff98fee73b7000(0000) knlGS:0000000000000000 [ 8.516619] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 8.516622] CR2: 00007fcd2abd0700 CR3: 000000003aa40000 CR4: 0000000000752ef0 [ 8.516625] PKRU: 55555554 [ 8.516627] Call Trace: [ 8.516632] <TASK> [ 8.516635] ? rtnl_is_locked+0x15/0x20 [ 8.516641] ? unregister_netdevice_queue+0x29/0xf0 [ 8.516650] ops_undo_list+0x1f2/0x220 [ 8.516659] cleanup_net+0x1ad/0x2e0 [ 8.516664] process_one_work+0x160/0x380 [ 8.516673] worker_thread+0x2aa/0x3c0 [ 8.516679] ? __pfx_worker_thread+0x10/0x10 [ 8.516686] kthread+0xfb/0x200 [ 8.516690] ? __pfx_kthread+0x10/0x10 [ 8.516693] ? __pfx_kthread+0x10/0x10 [ 8.516697] ret_from_fork+0x82/0xf0 [ 8.516705] ? __pfx_kthread+0x10/0x10 [ 8.516709] ret_from_fork_asm+0x1a/0x30 [ 8.516718] </TASK> Fixes: `abc340b38b` ("xfrm: interface: support collect metadata mode") Reported-by: Lonial Con <kongln9170@gmail.com> Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-08-01 09:48:41 +01:00
Sabrina Dubroca	085f24f0be	xfrm: state: use a consistent pcpu_id in xfrm_state_find [ Upstream commit `7eb11c0ab7` ] If we get preempted during xfrm_state_find, we could run xfrm_state_look_at using a different pcpu_id than the one xfrm_state_find saw. This could lead to ignoring states that should have matched, and triggering acquires on a CPU that already has a pcpu state. xfrm_state_find starts on CPU1 pcpu_id = 1 lookup starts <preemption, we're now on CPU2> xfrm_state_look_at pcpu_id = 2 finds a state found: best->pcpu_num != pcpu_id (2 != 1) if (!x && !error && !acquire_in_progress) { ... xfrm_state_alloc xfrm_init_tempstate ... This can be avoided by passing the original pcpu_id down to all xfrm_state_look_at() calls. Also switch to raw_smp_processor_id, disabling preempting just to re-enable it immediately doesn't really make sense. Fixes: `1ddf9916ac` ("xfrm: Add support for per cpu xfrm state handling.") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-08-01 09:48:40 +01:00
Sabrina Dubroca	6bf2daafc5	xfrm: state: initialize state_ptrs earlier in xfrm_state_find [ Upstream commit `94d077c331` ] In case of preemption, xfrm_state_look_at will find a different pcpu_id and look up states for that other CPU. If we matched a state for CPU2 in the state_cache while the lookup started on CPU1, we will jump to "found", but the "best" state that we got will be ignored and we will enter the "acquire" block. This block uses state_ptrs, which isn't initialized at this point. Let's initialize state_ptrs just after taking rcu_read_lock. This will also prevent a possible misuse in the future, if someone adjusts this function. Reported-by: syzbot+7ed9d47e15e88581dc5b@syzkaller.appspotmail.com Fixes: `e952837f3d` ("xfrm: state: fix out-of-bounds read during lookup") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-08-01 09:48:40 +01:00
Leon Romanovsky	8b0741b167	xfrm: validate assignment of maximal possible SEQ number [ Upstream commit `e86212b6b1` ] Users can set any seq/seq_hi/oseq/oseq_hi values. The XFRM core code doesn't prevent from them to set even 0xFFFFFFFF, however this value will cause for traffic drop. Is is happening because SEQ numbers here mean that packet with such number was processed and next number should be sent on the wire. In this case, the next number will be 0, and it means overflow which causes to (expected) packet drops. While it can be considered as misconfiguration and handled by XFRM datapath in the same manner as any other SEQ number, let's add validation to easy for packet offloads implementations which need to configure HW with next SEQ to send and not with current SEQ like it is done in core code. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-06-27 11:11:31 +01:00
Cosmin Ratiu	5063fe8789	xfrm: Use xdo.dev instead of xdo.real_dev [ Upstream commit `25ac138f58` ] The policy offload struct was reused from the state offload and real_dev was copied from dev, but it was never set to anything else. Simplify the code by always using xdo.dev for policies. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-06-19 15:31:43 +02:00
Paul Chaignon	252f78a931	xfrm: Sanitize marks before insert [ Upstream commit `0b91fda3a1` ] Prior to this patch, the mark is sanitized (applying the state's mask to the state's value) only on inserts when checking if a conflicting XFRM state or policy exists. We discovered in Cilium that this same sanitization does not occur in the hot-path __xfrm_state_lookup. In the hot-path, the sk_buff's mark is simply compared to the state's value: if ((mark & x->mark.m) != x->mark.v) continue; Therefore, users can define unsanitized marks (ex. 0xf42/0xf00) which will never match any packet. This commit updates __xfrm_state_insert and xfrm_policy_insert to store the sanitized marks, thus removing this footgun. This has the side effect of changing the ip output, as the returned mark will have the mask applied to it when printed. Fixes: `3d6acfa764` ("xfrm: SA lookups with mark") Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Louis DeLosSantos <louis.delos.devel@gmail.com> Co-developed-by: Louis DeLosSantos <louis.delos.devel@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-29 11:03:15 +02:00
Sabrina Dubroca	9cbca30102	espintcp: remove encap socket caching to avoid reference leak [ Upstream commit `028363685b` ] The current scheme for caching the encap socket can lead to reference leaks when we try to delete the netns. The reference chain is: xfrm_state -> enacp_sk -> netns Since the encap socket is a userspace socket, it holds a reference on the netns. If we delete the espintcp state (through flush or individual delete) before removing the netns, the reference on the socket is dropped and the netns is correctly deleted. Otherwise, the netns may not be reachable anymore (if all processes within the ns have terminated), so we cannot delete the xfrm state to drop its reference on the socket. This patch results in a small (~2% in my tests) performance regression. A GC-type mechanism could be added for the socket cache, to clear references if the state hasn't been used "recently", but it's a lot more complex than just not caching the socket. Fixes: `e27cca96cd` ("xfrm: add espintcp (RFC 8229)") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-29 11:03:14 +02:00
Sabrina Dubroca	28756f22de	espintcp: fix skb leaks [ Upstream commit `63c1f19a3b` ] A few error paths are missing a kfree_skb. Fixes: `e27cca96cd` ("xfrm: add espintcp (RFC 8229)") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-29 11:03:13 +02:00
Leon Romanovsky	06daedb443	xfrm: prevent high SEQ input in non-ESN mode [ Upstream commit `e3aa43a50a` ] In non-ESN mode, the SEQ numbers are limited to 32 bits and seq_hi/oseq_hi are not used. So make sure that user gets proper error message, in case such assignment occurred. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-29 11:02:55 +02:00
Cosmin Ratiu	25e4700489	xfrm_output: Force software GSO only in tunnel mode [ Upstream commit `0aae2867aa` ] The cited commit fixed a software GSO bug with VXLAN + IPSec in tunnel mode. Unfortunately, it is slightly broader than necessary, as it also severely affects performance for Geneve + IPSec transport mode over a device capable of both HW GSO and IPSec crypto offload. In this case, xfrm_output unnecessarily triggers software GSO instead of letting the HW do it. In simple iperf3 tests over Geneve + IPSec transport mode over a back-2-back pair of NICs with MTU 1500, the performance was observed to be up to 6x worse when doing software GSO compared to leaving it to the hardware. This commit makes xfrm_output only trigger software GSO in crypto offload cases for already encapsulated packets in tunnel mode, as not doing so would then cause the inner tunnel skb->inner_networking_header to be overwritten and break software GSO for that packet later if the device turns out to not be capable of HW GSO. Taking a closer look at the conditions for the original bug, to better understand the reasons for this change: - vxlan_build_skb -> iptunnel_handle_offloads sets inner_protocol and inner network header. - then, udp_tunnel_xmit_skb -> ip_tunnel_xmit adds outer transport and network headers. - later in the xmit path, xfrm_output -> xfrm_outer_mode_output -> xfrm4_prepare_output -> xfrm4_tunnel_encap_add overwrites the inner network header with the one set in ip_tunnel_xmit before adding the second outer header. - __dev_queue_xmit -> validate_xmit_skb checks whether GSO segmentation needs to happen based on dev features. In the original bug, the hw couldn't segment the packets, so skb_gso_segment was invoked. - deep in the .gso_segment callback machinery, __skb_udp_tunnel_segment tries to use the wrong inner network header, expecting the one set in iptunnel_handle_offloads but getting the one set by xfrm instead. - a bit later, ipv6_gso_segment accesses the wrong memory based on that wrong inner network header. With the new change, the original bug (or similar ones) cannot happen again, as xfrm will now trigger software GSO before applying a tunnel. This concern doesn't exist in packet offload mode, when the HW adds encapsulation headers. For the non-offloaded packets (crypto in SW), software GSO is still done unconditionally in the else branch. Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Yael Chemla <ychemla@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Fixes: `a204aef9fd` ("xfrm: call xfrm_output_gso when inner_protocol is set in xfrm_output") Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-03-28 22:03:25 +01:00
Alexandre Cassen	a0395e9683	xfrm: fix tunnel mode TX datapath in packet offload mode [ Upstream commit `5eddd76ec2` ] Packets that match the output xfrm policy are delivered to the netstack. In IPsec packet mode for tunnel mode, the HW is responsible for building the hard header and outer IP header. In such a situation, the inner header may refer to a network that is not directly reachable by the host, resulting in a failed neighbor resolution. The packet is then dropped. xfrm policy defines the netdevice to use for xmit so we can send packets directly to it. Makes direct xmit exclusive to tunnel mode, since some rules may apply in transport mode. Fixes: `f8a70afafc` ("xfrm: add TX datapath support for IPsec packet offload mode") Signed-off-by: Alexandre Cassen <acassen@corp.free.fr> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-03-28 22:03:25 +01:00
Steffen Klassert	364d7e7f30	xfrm: Fix acquire state insertion. commit `a35672819f` upstream. A recent commit jumped over the dst hash computation and left the symbol uninitialized. Fix this by explicitly computing the dst hash before it is used. Fixes: `0045e3d806` ("xfrm: Cache used outbound xfrm states at the policy.") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-02-08 09:58:18 +01:00
Everest K.C	78b366aa0e	xfrm: Add error handling when nla_put_u32() returns an error commit `9d287e70c5` upstream. Error handling is missing when call to nla_put_u32() fails. Handle the error when the call to nla_put_u32() returns an error. The error was reported by Coverity Scan. Report: CID 1601525: (#1 of 1): Unused value (UNUSED_VALUE) returned_value: Assigning value from nla_put_u32(skb, XFRMA_SA_PCPU, x->pcpu_num) to err here, but that stored value is overwritten before it can be used Fixes: `1ddf9916ac` ("xfrm: Add support for per cpu xfrm state handling.") Signed-off-by: Everest K.C. <everestkc@everestkc.com.np> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-02-08 09:58:18 +01:00
Sebastian Sewior	e468aace40	xfrm: Don't disable preemption while looking up cache state. [ Upstream commit `6c9b7db96d` ] For the state cache lookup xfrm_input_state_lookup() first disables preemption, to remain on the CPU and then retrieves a per-CPU pointer. Within the preempt-disable section it also acquires netns_xfrm::xfrm_state_lock, a spinlock_t. This lock must not be acquired with explicit disabled preemption (such as by get_cpu()) because this lock becomes a sleeping lock on PREEMPT_RT. To remain on the same CPU is just an optimisation for the CPU local lookup. The actual modification of the per-CPU variable happens with netns_xfrm::xfrm_state_lock acquired. Remove get_cpu() and use the state_cache_input on the current CPU. Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Closes: https://lore.kernel.org/all/CAADnVQKkCLaj=roayH=Mjiiqz_svdf1tsC3OE4EC0E=mAD+L1A@mail.gmail.com/ Fixes: `81a331a0e7` ("xfrm: Add an inbound percpu state cache.") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-02-08 09:58:03 +01:00
Florian Westphal	a16871c783	xfrm: state: fix out-of-bounds read during lookup [ Upstream commit `e952837f3d` ] lookup and resize can run in parallel. The xfrm_state_hash_generation seqlock ensures a retry, but the hash functions can observe a hmask value that is too large for the new hlist array. rehash does: rcu_assign_pointer(net->xfrm.state_bydst, ndst) [..] net->xfrm.state_hmask = nhashmask; While state lookup does: h = xfrm_dst_hash(net, daddr, saddr, tmpl->reqid, encap_family); hlist_for_each_entry_rcu(x, net->xfrm.state_bydst + h, bydst) { This is only safe in case the update to state_bydst is larger than net->xfrm.xfrm_state_hmask (or if the lookup function gets serialized via state spinlock again). Fix this by prefetching state_hmask and the associated pointers. The xfrm_state_hash_generation seqlock retry will ensure that the pointer and the hmask will be consistent. The existing helpers, like xfrm_dst_hash(), are now unsafe for RCU side, add lockdep assertions to document that they are only safe for insert side. xfrm_state_lookup_byaddr() uses the spinlock rather than RCU. AFAICS this is an oversight from back when state lookup was converted to RCU, this lock should be replaced with RCU in a future patch. Reported-by: syzbot+5f9f31cb7d985f584d8e@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/CACT4Y+azwfrE3uz6A5ZErov5YN2LYBN5KrsymBerT36VU8qzBA@mail.gmail.com/ Diagnosed-by: Dmitry Vyukov <dvyukov@google.com> Fixes: `c2f672fc94` ("xfrm: state lookup can be lockless") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-02-08 09:58:00 +01:00
Steffen Klassert	5e4334dc39	xfrm: Add an inbound percpu state cache. [ Upstream commit `81a331a0e7` ] Now that we can have percpu xfrm states, the number of active states might increase. To get a better lookup performance, we add a percpu cache to cache the used inbound xfrm states. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Tested-by: Antony Antony <antony.antony@secunet.com> Tested-by: Tobias Brunner <tobias@strongswan.org> Stable-dep-of: `e952837f3d` ("xfrm: state: fix out-of-bounds read during lookup") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-02-08 09:58:00 +01:00
Steffen Klassert	aa48a18fdb	xfrm: Cache used outbound xfrm states at the policy. [ Upstream commit `0045e3d806` ] Now that we can have percpu xfrm states, the number of active states might increase. To get a better lookup performance, we cache the used xfrm states at the policy for outbound IPsec traffic. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Tested-by: Antony Antony <antony.antony@secunet.com> Tested-by: Tobias Brunner <tobias@strongswan.org> Stable-dep-of: `e952837f3d` ("xfrm: state: fix out-of-bounds read during lookup") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-02-08 09:58:00 +01:00
Steffen Klassert	2a64cff2aa	xfrm: Add support for per cpu xfrm state handling. [ Upstream commit `1ddf9916ac` ] Currently all flows for a certain SA must be processed by the same cpu to avoid packet reordering and lock contention of the xfrm state lock. To get rid of this limitation, the IETF standardized per cpu SAs in RFC 9611. This patch implements the xfrm part of it. We add the cpu as a lookup key for xfrm states and a config option to generate acquire messages for each cpu. With that, we can have on each cpu a SA with identical traffic selector so that flows can be processed in parallel on all cpus. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Tested-by: Antony Antony <antony.antony@secunet.com> Tested-by: Tobias Brunner <tobias@strongswan.org> Stable-dep-of: `e952837f3d` ("xfrm: state: fix out-of-bounds read during lookup") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-02-08 09:58:00 +01:00
Jianbo Liu	87fdd4c447	xfrm: replay: Fix the update of replay_esn->oseq_hi for GSO [ Upstream commit `c05c5e5aa1` ] When skb needs GSO and wrap around happens, if xo->seq.low (seqno of the first skb segment) is before the last seq number but oseq (seqno of the last segment) is after it, xo->seq.low is still bigger than replay_esn->oseq while oseq is smaller than it, so the update of replay_esn->oseq_hi is missed for this case wrap around because of the change in the cited commit. For example, if sending a packet with gso_segs=3 while old replay_esn->oseq=0xfffffffe, we calculate: xo->seq.low = 0xfffffffe + 1 = 0x0xffffffff oseq = 0xfffffffe + 3 = 0x1 (oseq < replay_esn->oseq) is true, but (xo->seq.low < replay_esn->oseq) is false, so replay_esn->oseq_hi is not incremented. To fix this issue, change the outer checking back for the update of replay_esn->oseq_hi. And add new checking inside for the update of packet's oseq_hi. Fixes: `4b549ccce9` ("xfrm: replay: Fix ESN wrap around for GSO") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-02-08 09:58:00 +01:00
Paolo Abeni	1e424d08d3	Merge tag 'ipsec-2024-10-22' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2024-10-22 1) Fix routing behavior that relies on L4 information for xfrm encapsulated packets. From Eyal Birger. 2) Remove leftovers of pernet policy_inexact lists. From Florian Westphal. 3) Validate new SA's prefixlen when the selector family is not set from userspace. From Sabrina Dubroca. 4) Fix a kernel-infoleak when dumping an auth algorithm. From Petr Vaganov. Please pull or let me know if there are problems. ipsec-2024-10-22 * tag 'ipsec-2024-10-22' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: fix one more kernel-infoleak in algo dumping xfrm: validate new SA's prefixlen using SA family when sel.family is unset xfrm: policy: remove last remnants of pernet inexact list xfrm: respect ip protocols rules criteria when performing dst lookups xfrm: extract dst lookup parameters into a struct ==================== Link: https://patch.msgid.link/20241022092226.654370-1-steffen.klassert@secunet.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-10-24 11:11:33 +02:00
Petr Vaganov	6889cd2a93	xfrm: fix one more kernel-infoleak in algo dumping During fuzz testing, the following issue was discovered: BUG: KMSAN: kernel-infoleak in _copy_to_iter+0x598/0x2a30 _copy_to_iter+0x598/0x2a30 __skb_datagram_iter+0x168/0x1060 skb_copy_datagram_iter+0x5b/0x220 netlink_recvmsg+0x362/0x1700 sock_recvmsg+0x2dc/0x390 __sys_recvfrom+0x381/0x6d0 __x64_sys_recvfrom+0x130/0x200 x64_sys_call+0x32c8/0x3cc0 do_syscall_64+0xd8/0x1c0 entry_SYSCALL_64_after_hwframe+0x79/0x81 Uninit was stored to memory at: copy_to_user_state_extra+0xcc1/0x1e00 dump_one_state+0x28c/0x5f0 xfrm_state_walk+0x548/0x11e0 xfrm_dump_sa+0x1e0/0x840 netlink_dump+0x943/0x1c40 __netlink_dump_start+0x746/0xdb0 xfrm_user_rcv_msg+0x429/0xc00 netlink_rcv_skb+0x613/0x780 xfrm_netlink_rcv+0x77/0xc0 netlink_unicast+0xe90/0x1280 netlink_sendmsg+0x126d/0x1490 __sock_sendmsg+0x332/0x3d0 ____sys_sendmsg+0x863/0xc30 ___sys_sendmsg+0x285/0x3e0 __x64_sys_sendmsg+0x2d6/0x560 x64_sys_call+0x1316/0x3cc0 do_syscall_64+0xd8/0x1c0 entry_SYSCALL_64_after_hwframe+0x79/0x81 Uninit was created at: __kmalloc+0x571/0xd30 attach_auth+0x106/0x3e0 xfrm_add_sa+0x2aa0/0x4230 xfrm_user_rcv_msg+0x832/0xc00 netlink_rcv_skb+0x613/0x780 xfrm_netlink_rcv+0x77/0xc0 netlink_unicast+0xe90/0x1280 netlink_sendmsg+0x126d/0x1490 __sock_sendmsg+0x332/0x3d0 ____sys_sendmsg+0x863/0xc30 ___sys_sendmsg+0x285/0x3e0 __x64_sys_sendmsg+0x2d6/0x560 x64_sys_call+0x1316/0x3cc0 do_syscall_64+0xd8/0x1c0 entry_SYSCALL_64_after_hwframe+0x79/0x81 Bytes 328-379 of 732 are uninitialized Memory access of size 732 starts at ffff88800e18e000 Data copied to user address 00007ff30f48aff0 CPU: 2 PID: 18167 Comm: syz-executor.0 Not tainted 6.8.11 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Fixes copying of xfrm algorithms where some random data of the structure fields can end up in userspace. Padding in structures may be filled with random (possibly sensitve) data and should never be given directly to user-space. A similar issue was resolved in the commit `8222d5910d` ("xfrm: Zero padding when dumping algos and encap") Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Fixes: `c7a5899eb2` ("xfrm: redact SA secret with lockdown confidentiality") Cc: stable@vger.kernel.org Co-developed-by: Boris Tonofa <b.tonofa@ideco.ru> Signed-off-by: Boris Tonofa <b.tonofa@ideco.ru> Signed-off-by: Petr Vaganov <p.vaganov@ideco.ru> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2024-10-11 09:00:03 +02:00
Sabrina Dubroca	3f0ab59e65	xfrm: validate new SA's prefixlen using SA family when sel.family is unset This expands the validation introduced in commit `07bf790895` ("xfrm: Validate address prefix lengths in the xfrm selector.") syzbot created an SA with usersa.sel.family = AF_UNSPEC usersa.sel.prefixlen_s = 128 usersa.family = AF_INET Because of the AF_UNSPEC selector, verify_newsa_info doesn't put limits on prefixlen_{s,d}. But then copy_from_user_state sets x->sel.family to usersa.family (AF_INET). Do the same conversion in verify_newsa_info before validating prefixlen_{s,d}, since that's how prefixlen is going to be used later on. Reported-by: syzbot+cc39f136925517aed571@syzkaller.appspotmail.com Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2024-10-07 09:09:08 +02:00
Al Viro	5f60d5f6bb	move asm/unaligned.h to linux/unaligned.h asm/unaligned.h is always an include of asm-generic/unaligned.h; might as well move that thing to linux/unaligned.h and include that - there's nothing arch-specific in that header. auto-generated by the following: for i in `git grep -l -w asm/unaligned.h`; do sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i done for i in `git grep -l -w asm-generic/unaligned.h`; do sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i done git mv include/asm-generic/unaligned.h include/linux/unaligned.h git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h	2024-10-02 17:23:23 -04:00
Florian Westphal	645546a05b	xfrm: policy: remove last remnants of pernet inexact list xfrm_net still contained the no-longer-used inexact policy list heads, remove them. Fixes: `a54ad727f7` ("xfrm: policy: remove remaining use of inexact list") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2024-09-24 09:58:16 +02:00
Eyal Birger	b846972103	xfrm: respect ip protocols rules criteria when performing dst lookups The series in the "fixes" tag added the ability to consider L4 attributes in routing rules. The dst lookup on the outer packet of encapsulated traffic in the xfrm code was not adapted to this change, thus routing behavior that relies on L4 information is not respected. Pass the ip protocol information when performing dst lookups. Fixes: `a25724b05a` ("Merge branch 'fib_rules-support-sport-dport-and-proto-match'") Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Tested-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2024-09-23 07:02:07 +02:00
Eyal Birger	e509996b16	xfrm: extract dst lookup parameters into a struct Preparation for adding more fields to dst lookup functions without changing their signatures. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2024-09-23 07:02:07 +02:00
Jakub Kicinski	ea403549da	Merge tag 'ipsec-next-2024-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2024-09-10 1) Remove an unneeded WARN_ON on packet offload. From Patrisious Haddad. 2) Add a copy from skb_seq_state to buffer function. This is needed for the upcomming IPTFS patchset. From Christian Hopps. 3) Spelling fix in xfrm.h. From Simon Horman. 4) Speed up xfrm policy insertions. From Florian Westphal. 5) Add and revert a patch to support xfrm interfaces for packet offload. This patch was just half cooked. 6) Extend usage of the new xfrm_policy_is_dead_or_sk helper. From Florian Westphal. 7) Update comments on sdb and xfrm_policy. From Florian Westphal. 8) Fix a null pointer dereference in the new policy insertion code From Florian Westphal. 9) Fix an uninitialized variable in the new policy insertion code. From Nathan Chancellor. * tag 'ipsec-next-2024-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: xfrm: policy: Restore dir assignments in xfrm_hash_rebuild() xfrm: policy: fix null dereference Revert "xfrm: add SA information to the offloaded packet" xfrm: minor update to sdb and xfrm_policy comments xfrm: policy: use recently added helper in more places xfrm: add SA information to the offloaded packet xfrm: policy: remove remaining use of inexact list xfrm: switch migrate to xfrm_policy_lookup_bytype xfrm: policy: don't iterate inexact policies twice at insert time selftests: add xfrm policy insertion speed test script xfrm: Correct spelling in xfrm.h net: add copy from skb_seq_state to buffer function xfrm: Remove documentation WARN_ON to limit return values for offloaded SA ==================== Link: https://patch.msgid.link/20240910065507.2436394-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-09-10 19:00:47 -07:00
Nathan Chancellor	e62d39332d	xfrm: policy: Restore dir assignments in xfrm_hash_rebuild() Clang warns (or errors with CONFIG_WERROR): net/xfrm/xfrm_policy.c:1286:8: error: variable 'dir' is uninitialized when used here [-Werror,-Wuninitialized] 1286 \| if ((dir & XFRM_POLICY_MASK) == XFRM_POLICY_OUT) { \| ^~~ net/xfrm/xfrm_policy.c:1257:9: note: initialize the variable 'dir' to silence this warning 1257 \| int dir; \| ^ \| = 0 1 error generated. A recent refactoring removed some assignments to dir because xfrm_policy_is_dead_or_sk() has a dir assignment in it. However, dir is used elsewhere in xfrm_hash_rebuild(), including within loops where it needs to be reloaded for each policy. Restore the assignments before the first use of dir to fix the warning and ensure dir is properly initialized throughout the function. Fixes: `08c2182cf0` ("xfrm: policy: use recently added helper in more places") Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2024-09-09 15:30:44 +02:00
Florian Westphal	6a13f5afd3	xfrm: policy: fix null dereference Julian Wiedmann says: > + if (!xfrm_pol_hold_rcu(ret)) Coverity spotted that ^^^ needs a s/ret/pol fix-up: > CID 1599386: Null pointer dereferences (FORWARD_NULL) > Passing null pointer "ret" to "xfrm_pol_hold_rcu", which dereferences it. Ditch the bogus 'ret' variable. Fixes: `563d5ca93e` ("xfrm: switch migrate to xfrm_policy_lookup_bytype") Reported-by: Julian Wiedmann <jwiedmann.dev@gmail.com> Closes: https://lore.kernel.org/netdev/06dc2499-c095-4bd4-aee3-a1d0e3ec87c4@gmail.com/ Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2024-09-09 15:30:05 +02:00
Steffen Klassert	69716a3bab	Revert "xfrm: add SA information to the offloaded packet" This reverts commit `e7cd191f83`. While supporting xfrm interfaces in the packet offload API is needed, this patch does not do the right thing. There are more things to do to really support xfrm interfaces, so revert it for now. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2024-09-09 11:43:39 +02:00
Alexander Lobakin	00d066a4d4	netdev_features: convert NETIF_F_LLTX to dev->lltx NETIF_F_LLTX can't be changed via Ethtool and is not a feature, rather an attribute, very similar to IFF_NO_QUEUE (and hot). Free one netdev_features_t bit and make it a "hot" private flag. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-09-03 11:36:43 +02:00
Ido Schimmel	b261b2c6c1	xfrm: Unmask upper DSCP bits in xfrm_get_tos() The function returns a value that is used to initialize 'flowi4_tos' before being passed to the FIB lookup API in the following call chain: xfrm_bundle_create() tos = xfrm_get_tos(fl, family) xfrm_dst_lookup(..., tos, ...) __xfrm_dst_lookup(..., tos, ...) xfrm4_dst_lookup(..., tos, ...) __xfrm4_dst_lookup(..., tos, ...) fl4->flowi4_tos = tos __ip_route_output_key(net, fl4) Unmask the upper DSCP bits so that in the future the output route lookup could be performed according to the full DSCP value. Remove IPTOS_RT_MASK since it is no longer used. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-08-31 17:44:51 +01:00
Florian Westphal	17163f2367	xfrm: minor update to sdb and xfrm_policy comments The spd is no longer maintained as a linear list. We also haven't been caching bundles in the xfrm_policy struct since 2010. While at it, add kdoc style comments for the xfrm_policy structure and extend the description of the current rbtree based search to mention why it needs to search the candidate set. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2024-08-28 07:37:13 +02:00
Florian Westphal	08c2182cf0	xfrm: policy: use recently added helper in more places No logical change intended. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2024-08-28 07:37:13 +02:00
wangfe	e7cd191f83	xfrm: add SA information to the offloaded packet In packet offload mode, append Security Association (SA) information to each packet, replicating the crypto offload implementation. The XFRM_XMIT flag is set to enable packet to be returned immediately from the validate_xmit_xfrm function, thus aligning with the existing code path for packet offload mode. This SA info helps HW offload match packets to their correct security policies. The XFRM interface ID is included, which is crucial in setups with multiple XFRM interfaces where source/destination addresses alone can't pinpoint the right policy. Signed-off-by: wangfe <wangfe@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2024-08-27 11:28:58 +02:00
Florian Westphal	a54ad727f7	xfrm: policy: remove remaining use of inexact list No consumers anymore, remove it. After this, insertion of policies no longer require list walk of all inexact policies but only those that are reachable via the candidate sets. This gives almost linear insertion speeds provided the inserted policies are for non-overlapping networks. Before: Inserted 1000 policies in 70 ms Inserted 10000 policies in 1155 ms Inserted 100000 policies in 216848 ms After: Inserted 1000 policies in 56 ms Inserted 10000 policies in 478 ms Inserted 100000 policies in 4580 ms Insertion of 1m entries takes about ~40s after this change on my test vm. Cc: Noel Kuntze <noel@familie-kuntze.de> Cc: Tobias Brunner <tobias@strongswan.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2024-08-24 09:57:55 +02:00
Florian Westphal	563d5ca93e	xfrm: switch migrate to xfrm_policy_lookup_bytype XFRM_MIGRATE still uses the old lookup method: first check the bydst hash table, then search the list of all the other policies. Switch MIGRATE to use the same lookup function as the packetpath. This is done to remove the last remaining users of the pernet xfrm.policy_inexact lists with the intent of removing this list. After this patch, policies are still added to the list on insertion and they are rehashed as-needed but no single API makes use of these anymore. This change is compile tested only. Cc: Tobias Brunner <tobias@strongswan.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2024-08-24 09:56:09 +02:00

1 2 3 4 5 ...

1699 Commits