Commit Graph

1003 Commits

Author SHA1 Message Date
Kuniyuki Iwashima
2684610a9c ipv6: Fix use-after-free in inet6_addr_del().
[ Upstream commit ddf96c393a ]

syzbot reported use-after-free of inet6_ifaddr in
inet6_addr_del(). [0]

The cited commit accidentally moved ipv6_del_addr() for
mngtmpaddr before reading its ifp->flags for temporary
addresses in inet6_addr_del().

Let's move ipv6_del_addr() down to fix the UAF.

[0]:
BUG: KASAN: slab-use-after-free in inet6_addr_del.constprop.0+0x67a/0x6b0 net/ipv6/addrconf.c:3117
Read of size 4 at addr ffff88807b89c86c by task syz.3.1618/9593

CPU: 0 UID: 0 PID: 9593 Comm: syz.3.1618 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:378 [inline]
 print_report+0xcd/0x630 mm/kasan/report.c:482
 kasan_report+0xe0/0x110 mm/kasan/report.c:595
 inet6_addr_del.constprop.0+0x67a/0x6b0 net/ipv6/addrconf.c:3117
 addrconf_del_ifaddr+0x11e/0x190 net/ipv6/addrconf.c:3181
 inet6_ioctl+0x1e5/0x2b0 net/ipv6/af_inet6.c:582
 sock_do_ioctl+0x118/0x280 net/socket.c:1254
 sock_ioctl+0x227/0x6b0 net/socket.c:1375
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:597 [inline]
 __se_sys_ioctl fs/ioctl.c:583 [inline]
 __x64_sys_ioctl+0x18e/0x210 fs/ioctl.c:583
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xcd/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f164cf8f749
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f164de64038 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007f164d1e5fa0 RCX: 00007f164cf8f749
RDX: 0000200000000000 RSI: 0000000000008936 RDI: 0000000000000003
RBP: 00007f164d013f91 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f164d1e6038 R14: 00007f164d1e5fa0 R15: 00007ffde15c8288
 </TASK>

Allocated by task 9593:
 kasan_save_stack+0x33/0x60 mm/kasan/common.c:56
 kasan_save_track+0x14/0x30 mm/kasan/common.c:77
 poison_kmalloc_redzone mm/kasan/common.c:397 [inline]
 __kasan_kmalloc+0xaa/0xb0 mm/kasan/common.c:414
 kmalloc_noprof include/linux/slab.h:957 [inline]
 kzalloc_noprof include/linux/slab.h:1094 [inline]
 ipv6_add_addr+0x4e3/0x2010 net/ipv6/addrconf.c:1120
 inet6_addr_add+0x256/0x9b0 net/ipv6/addrconf.c:3050
 addrconf_add_ifaddr+0x1fc/0x450 net/ipv6/addrconf.c:3160
 inet6_ioctl+0x103/0x2b0 net/ipv6/af_inet6.c:580
 sock_do_ioctl+0x118/0x280 net/socket.c:1254
 sock_ioctl+0x227/0x6b0 net/socket.c:1375
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:597 [inline]
 __se_sys_ioctl fs/ioctl.c:583 [inline]
 __x64_sys_ioctl+0x18e/0x210 fs/ioctl.c:583
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xcd/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Freed by task 6099:
 kasan_save_stack+0x33/0x60 mm/kasan/common.c:56
 kasan_save_track+0x14/0x30 mm/kasan/common.c:77
 kasan_save_free_info+0x3b/0x60 mm/kasan/generic.c:584
 poison_slab_object mm/kasan/common.c:252 [inline]
 __kasan_slab_free+0x5f/0x80 mm/kasan/common.c:284
 kasan_slab_free include/linux/kasan.h:234 [inline]
 slab_free_hook mm/slub.c:2540 [inline]
 slab_free_freelist_hook mm/slub.c:2569 [inline]
 slab_free_bulk mm/slub.c:6696 [inline]
 kmem_cache_free_bulk mm/slub.c:7383 [inline]
 kmem_cache_free_bulk+0x2bf/0x680 mm/slub.c:7362
 kfree_bulk include/linux/slab.h:830 [inline]
 kvfree_rcu_bulk+0x1b7/0x1e0 mm/slab_common.c:1523
 kvfree_rcu_drain_ready mm/slab_common.c:1728 [inline]
 kfree_rcu_monitor+0x1d0/0x2f0 mm/slab_common.c:1801
 process_one_work+0x9ba/0x1b20 kernel/workqueue.c:3257
 process_scheduled_works kernel/workqueue.c:3340 [inline]
 worker_thread+0x6c8/0xf10 kernel/workqueue.c:3421
 kthread+0x3c5/0x780 kernel/kthread.c:463
 ret_from_fork+0x983/0xb10 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246

Fixes: 00b5b7aab9 ("net/ipv6: delete temporary address if mngtmpaddr is removed or unmanaged")
Reported-by: syzbot+72e610f4f1a930ca9d8a@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/696598e9.050a0220.3be5c5.0009.GAE@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260113010538.2019411-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-23 11:18:39 +01:00
Yue Haibing
483303d94c ipv6: Add sanity checks on ipv6_devconf.rpl_seg_enabled
[ Upstream commit 3d95261eeb ]

In ipv6_rpl_srh_rcv() we use min(net->ipv6.devconf_all->rpl_seg_enabled,
idev->cnf.rpl_seg_enabled) is intended to return 0 when either value is
zero, but if one of the values is negative it will in fact return non-zero.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Link: https://patch.msgid.link/20250901123726.1972881-3-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-11-13 15:34:18 -05:00
Kuniyuki Iwashima
347b94ee25 ipv6: mcast: Check inet6_dev->dead under idev->mc_lock in __ipv6_dev_mc_inc().
[ Upstream commit dbd40f318c ]

Since commit 63ed8de4be ("mld: add mc_lock for protecting
per-interface mld data"), every multicast resource is protected
by inet6_dev->mc_lock.

RTNL is unnecessary in terms of protection but still needed for
synchronisation between addrconf_ifdown() and __ipv6_dev_mc_inc().

Once we removed RTNL, there would be a race below, where we could
add a multicast address to a dead inet6_dev.

  CPU1                            CPU2
  ====                            ====
  addrconf_ifdown()               __ipv6_dev_mc_inc()
                                    if (idev->dead) <-- false
    dead = true                       return -ENODEV;
    ipv6_mc_destroy_dev() / ipv6_mc_down()
      mutex_lock(&idev->mc_lock)
      ...
      mutex_unlock(&idev->mc_lock)
                                    mutex_lock(&idev->mc_lock)
                                    ...
                                    mutex_unlock(&idev->mc_lock)

The race window can be easily closed by checking inet6_dev->dead
under inet6_dev->mc_lock in __ipv6_dev_mc_inc() as addrconf_ifdown()
will acquire it after marking inet6_dev dead.

Let's check inet6_dev->dead under mc_lock in __ipv6_dev_mc_inc().

Note that now __ipv6_dev_mc_inc() no longer depends on RTNL and
we can remove ASSERT_RTNL() there and the RTNL comment above
addrconf_join_solict().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250702230210.3115355-4-kuni1840@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20 18:30:33 +02:00
Guillaume Nault
8f65277317 gre: Fix IPv6 multicast route creation.
commit 4e914ef063 upstream.

Use addrconf_add_dev() instead of ipv6_find_idev() in
addrconf_gre_config() so that we don't just get the inet6_dev, but also
install the default ff00::/8 multicast route.

Before commit 3e6a0243ff ("gre: Fix again IPv6 link-local address
generation."), the multicast route was created at the end of the
function by addrconf_add_mroute(). But this code path is now only taken
in one particular case (gre devices not bound to a local IP address and
in EUI64 mode). For all other cases, the function exits early and
addrconf_add_mroute() is not called anymore.

Using addrconf_add_dev() instead of ipv6_find_idev() in
addrconf_gre_config(), fixes the problem as it will create the default
multicast route for all gre devices. This also brings
addrconf_gre_config() a bit closer to the normal netdevice IPv6
configuration code (addrconf_dev_config()).

Cc: stable@vger.kernel.org
Fixes: 3e6a0243ff ("gre: Fix again IPv6 link-local address generation.")
Reported-by: Aiden Yang <ling@moedove.com>
Closes: https://lore.kernel.org/netdev/CANR=AhRM7YHHXVxJ4DmrTNMeuEOY87K2mLmo9KMed1JMr20p6g@mail.gmail.com/
Reviewed-by: Gary Guo <gary@garyguo.net>
Tested-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/027a923dcb550ad115e6d93ee8bb7d310378bd01.1752070620.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-17 18:37:09 +02:00
Guillaume Nault
35be4c0cdf gre: Fix again IPv6 link-local address generation.
[ Upstream commit 3e6a0243ff ]

Use addrconf_addr_gen() to generate IPv6 link-local addresses on GRE
devices in most cases and fall back to using add_v4_addrs() only in
case the GRE configuration is incompatible with addrconf_addr_gen().

GRE used to use addrconf_addr_gen() until commit e5dd729460 ("ip/ip6_gre:
use the same logic as SIT interfaces when computing v6LL address")
restricted this use to gretap and ip6gretap devices, and created
add_v4_addrs() (borrowed from SIT) for non-Ethernet GRE ones.

The original problem came when commit 9af28511be ("addrconf: refuse
isatap eui64 for INADDR_ANY") made __ipv6_isatap_ifid() fail when its
addr parameter was 0. The commit says that this would create an invalid
address, however, I couldn't find any RFC saying that the generated
interface identifier would be wrong. Anyway, since gre over IPv4
devices pass their local tunnel address to __ipv6_isatap_ifid(), that
commit broke their IPv6 link-local address generation when the local
address was unspecified.

Then commit e5dd729460 ("ip/ip6_gre: use the same logic as SIT
interfaces when computing v6LL address") tried to fix that case by
defining add_v4_addrs() and calling it to generate the IPv6 link-local
address instead of using addrconf_addr_gen() (apart for gretap and
ip6gretap devices, which would still use the regular
addrconf_addr_gen(), since they have a MAC address).

That broke several use cases because add_v4_addrs() isn't properly
integrated into the rest of IPv6 Neighbor Discovery code. Several of
these shortcomings have been fixed over time, but add_v4_addrs()
remains broken on several aspects. In particular, it doesn't send any
Router Sollicitations, so the SLAAC process doesn't start until the
interface receives a Router Advertisement. Also, add_v4_addrs() mostly
ignores the address generation mode of the interface
(/proc/sys/net/ipv6/conf/*/addr_gen_mode), thus breaking the
IN6_ADDR_GEN_MODE_RANDOM and IN6_ADDR_GEN_MODE_STABLE_PRIVACY cases.

Fix the situation by using add_v4_addrs() only in the specific scenario
where the normal method would fail. That is, for interfaces that have
all of the following characteristics:

  * run over IPv4,
  * transport IP packets directly, not Ethernet (that is, not gretap
    interfaces),
  * tunnel endpoint is INADDR_ANY (that is, 0),
  * device address generation mode is EUI64.

In all other cases, revert back to the regular addrconf_addr_gen().

Also, remove the special case for ip6gre interfaces in add_v4_addrs(),
since ip6gre devices now always use addrconf_addr_gen() instead.

Note:
  This patch was originally applied as commit 183185a18f ("gre: Fix
  IPv6 link-local address generation."). However, it was then reverted
  by commit fc486c2d06 ("Revert "gre: Fix IPv6 link-local address
  generation."") because it uncovered another bug that ended up
  breaking net/forwarding/ip6gre_custom_multipath_hash.sh. That other
  bug has now been fixed by commit 4d0ab3a688 ("ipv6: Start path
  selection from the first nexthop"). Therefore we can now revive this
  GRE patch (no changes since original commit 183185a18f ("gre: Fix
  IPv6 link-local address generation.").

Fixes: e5dd729460 ("ip/ip6_gre: use the same logic as SIT interfaces when computing v6LL address")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/a88cc5c4811af36007645d610c95102dccb360a6.1746225214.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-18 08:24:48 +02:00
Fernando Fernandez Mancera
de579015d1 ipv6: fix omitted netlink attributes when using RTEXT_FILTER_SKIP_STATS
[ Upstream commit 7ac6ea4a3e ]

Using RTEXT_FILTER_SKIP_STATS is incorrectly skipping non-stats IPv6
netlink attributes on link dump. This causes issues on userspace tools,
e.g iproute2 is not rendering address generation mode as it should due
to missing netlink attribute.

Move the filling of IFLA_INET6_STATS and IFLA_INET6_ICMP6STATS to a
helper function guarded by a flag check to avoid hitting the same
situation in the future.

Fixes: d5566fd72e ("rtnetlink: RTEXT_FILTER_SKIP_STATS support to avoid dumping inet/inet6 stats")
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250402121751.3108-1-ffmancera@riseup.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10 14:39:34 +02:00
Guillaume Nault
a3ff812d68 Revert "gre: Fix IPv6 link-local address generation."
[ Upstream commit fc486c2d06 ]

This reverts commit 183185a18f.

This patch broke net/forwarding/ip6gre_custom_multipath_hash.sh in some
circumstances (https://lore.kernel.org/netdev/Z9RIyKZDNoka53EO@mini-arch/).
Let's revert it while the problem is being investigated.

Fixes: 183185a18f ("gre: Fix IPv6 link-local address generation.")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/8b1ce738eb15dd841aab9ef888640cab4f6ccfea.1742418408.git.gnault@redhat.com
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-28 22:03:28 +01:00
Guillaume Nault
a4d42b5901 gre: Fix IPv6 link-local address generation.
[ Upstream commit 183185a18f ]

Use addrconf_addr_gen() to generate IPv6 link-local addresses on GRE
devices in most cases and fall back to using add_v4_addrs() only in
case the GRE configuration is incompatible with addrconf_addr_gen().

GRE used to use addrconf_addr_gen() until commit e5dd729460
("ip/ip6_gre: use the same logic as SIT interfaces when computing v6LL
address") restricted this use to gretap and ip6gretap devices, and
created add_v4_addrs() (borrowed from SIT) for non-Ethernet GRE ones.

The original problem came when commit 9af28511be ("addrconf: refuse
isatap eui64 for INADDR_ANY") made __ipv6_isatap_ifid() fail when its
addr parameter was 0. The commit says that this would create an invalid
address, however, I couldn't find any RFC saying that the generated
interface identifier would be wrong. Anyway, since gre over IPv4
devices pass their local tunnel address to __ipv6_isatap_ifid(), that
commit broke their IPv6 link-local address generation when the local
address was unspecified.

Then commit e5dd729460 ("ip/ip6_gre: use the same logic as SIT
interfaces when computing v6LL address") tried to fix that case by
defining add_v4_addrs() and calling it to generate the IPv6 link-local
address instead of using addrconf_addr_gen() (apart for gretap and
ip6gretap devices, which would still use the regular
addrconf_addr_gen(), since they have a MAC address).

That broke several use cases because add_v4_addrs() isn't properly
integrated into the rest of IPv6 Neighbor Discovery code. Several of
these shortcomings have been fixed over time, but add_v4_addrs()
remains broken on several aspects. In particular, it doesn't send any
Router Sollicitations, so the SLAAC process doesn't start until the
interface receives a Router Advertisement. Also, add_v4_addrs() mostly
ignores the address generation mode of the interface
(/proc/sys/net/ipv6/conf/*/addr_gen_mode), thus breaking the
IN6_ADDR_GEN_MODE_RANDOM and IN6_ADDR_GEN_MODE_STABLE_PRIVACY cases.

Fix the situation by using add_v4_addrs() only in the specific scenario
where the normal method would fail. That is, for interfaces that have
all of the following characteristics:

  * run over IPv4,
  * transport IP packets directly, not Ethernet (that is, not gretap
    interfaces),
  * tunnel endpoint is INADDR_ANY (that is, 0),
  * device address generation mode is EUI64.

In all other cases, revert back to the regular addrconf_addr_gen().

Also, remove the special case for ip6gre interfaces in add_v4_addrs(),
since ip6gre devices now always use addrconf_addr_gen() instead.

Fixes: e5dd729460 ("ip/ip6_gre: use the same logic as SIT interfaces when computing v6LL address")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/559c32ce5c9976b269e6337ac9abb6a96abe5096.1741375285.git.gnault@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-22 12:54:13 -07:00
Eric Dumazet
01f95357e4 ipv6: avoid possible NULL deref in modify_prefix_route()
[ Upstream commit a747e02430 ]

syzbot found a NULL deref [1] in modify_prefix_route(), caused by one
fib6_info without a fib6_table pointer set.

This can happen for net->ipv6.fib6_null_entry

[1]
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000006: 0000 [#1] PREEMPT SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037]
CPU: 1 UID: 0 PID: 5837 Comm: syz-executor888 Not tainted 6.12.0-syzkaller-09567-g7eef7e306d3c #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
 RIP: 0010:__lock_acquire+0xe4/0x3c40 kernel/locking/lockdep.c:5089
Code: 08 84 d2 0f 85 15 14 00 00 44 8b 0d ca 98 f5 0e 45 85 c9 0f 84 b4 0e 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 e2 48 c1 ea 03 <80> 3c 02 00 0f 85 96 2c 00 00 49 8b 04 24 48 3d a0 07 7f 93 0f 84
RSP: 0018:ffffc900035d7268 EFLAGS: 00010006
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000006 RSI: 1ffff920006bae5f RDI: 0000000000000030
RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000001
R10: ffffffff90608e17 R11: 0000000000000001 R12: 0000000000000030
R13: ffff888036334880 R14: 0000000000000000 R15: 0000000000000000
FS:  0000555579e90380(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffc59cc4278 CR3: 0000000072b54000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
  lock_acquire.part.0+0x11b/0x380 kernel/locking/lockdep.c:5849
  __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
  _raw_spin_lock_bh+0x33/0x40 kernel/locking/spinlock.c:178
  spin_lock_bh include/linux/spinlock.h:356 [inline]
  modify_prefix_route+0x30b/0x8b0 net/ipv6/addrconf.c:4831
  inet6_addr_modify net/ipv6/addrconf.c:4923 [inline]
  inet6_rtm_newaddr+0x12c7/0x1ab0 net/ipv6/addrconf.c:5055
  rtnetlink_rcv_msg+0x3c7/0xea0 net/core/rtnetlink.c:6920
  netlink_rcv_skb+0x16b/0x440 net/netlink/af_netlink.c:2541
  netlink_unicast_kernel net/netlink/af_netlink.c:1321 [inline]
  netlink_unicast+0x53c/0x7f0 net/netlink/af_netlink.c:1347
  netlink_sendmsg+0x8b8/0xd70 net/netlink/af_netlink.c:1891
  sock_sendmsg_nosec net/socket.c:711 [inline]
  __sock_sendmsg net/socket.c:726 [inline]
  ____sys_sendmsg+0xaaf/0xc90 net/socket.c:2583
  ___sys_sendmsg+0x135/0x1e0 net/socket.c:2637
  __sys_sendmsg+0x16e/0x220 net/socket.c:2669
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fd1dcef8b79
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 c1 17 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffc59cc4378 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd1dcef8b79
RDX: 0000000000040040 RSI: 0000000020000140 RDI: 0000000000000004
RBP: 00000000000113fd R08: 0000000000000006 R09: 0000000000000006
R10: 0000000000000006 R11: 0000000000000246 R12: 00007ffc59cc438c
R13: 431bde82d7b634db R14: 0000000000000001 R15: 0000000000000001
 </TASK>

Fixes: 5eb902b8e7 ("net/ipv6: Remove expired routes with a separated list of routes.")
Reported-by: syzbot+1de74b0794c40c8eb300@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/67461f7f.050a0220.1286eb.0021.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
CC: Kui-Feng Lee <thinker.li@gmail.com>
Cc: David Ahern <dsahern@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14 20:03:04 +01:00
Hangbin Liu
cb74207ef9 net/ipv6: delete temporary address if mngtmpaddr is removed or unmanaged
[ Upstream commit 00b5b7aab9 ]

RFC8981 section 3.4 says that existing temporary addresses must have their
lifetimes adjusted so that no temporary addresses should ever remain "valid"
or "preferred" longer than the incoming SLAAC Prefix Information. This would
strongly imply in Linux's case that if the "mngtmpaddr" address is deleted or
un-flagged as such, its corresponding temporary addresses must be cleared out
right away.

But now the temporary address is renewed even after ‘mngtmpaddr’ is removed
or becomes unmanaged as manage_tempaddrs() set temporary addresses
prefered/valid time to 0, and later in addrconf_verify_rtnl() all checkings
failed to remove the addresses. Fix this by deleting the temporary address
directly for these situations.

Fixes: 778964f2fd ("ipv6/addrconf: fix timing bug in tempaddr regen")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05 14:02:32 +01:00
Al Viro
5f60d5f6bb move asm/unaligned.h to linux/unaligned.h
asm/unaligned.h is always an include of asm-generic/unaligned.h;
might as well move that thing to linux/unaligned.h and include
that - there's nothing arch-specific in that header.

auto-generated by the following:

for i in `git grep -l -w asm/unaligned.h`; do
	sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i
done
for i in `git grep -l -w asm-generic/unaligned.h`; do
	sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i
done
git mv include/asm-generic/unaligned.h include/linux/unaligned.h
git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h
sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild
sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
2024-10-02 17:23:23 -04:00
Xi Huang
d35a3a8f1b ipv6: remove redundant check
err varibale will be set everytime,like -ENOBUFS and in if (err < 0),
 when code gets into this path. This check will just slowdown
the execution and that's all.

Signed-off-by: Xi Huang <xuiagnh@gmail.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20240820115442.49366-1-xuiagnh@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-21 17:21:09 -07:00
Kuniyuki Iwashima
de67763cbd ip: Move INFINITY_LIFE_TIME to addrconf.h.
INFINITY_LIFE_TIME is the common value used in IPv4 and IPv6 but defined
in both .c files.

Also, 0xffffffff used in addrconf_timeout_fixup() is INFINITY_LIFE_TIME.

Let's move INFINITY_LIFE_TIME's definition to addrconf.h

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20240809235406.50187-6-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-15 18:56:14 -07:00
Patrick Rohr
990c304930 Add support for PIO p flag
draft-ietf-6man-pio-pflag is adding a new flag to the Prefix Information
Option to signal that the network can allocate a unique IPv6 prefix per
client via DHCPv6-PD (see draft-ietf-v6ops-dhcp-pd-per-device).

When ra_honor_pio_pflag is enabled, the presence of a P-flag causes
SLAAC autoconfiguration to be disabled for that particular PIO.

An automated test has been added in Android (r.android.com/3195335) to
go along with this change.

Cc: Maciej Żenczykowski <maze@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: David Lamparter <equinox@opensourcerouting.org>
Cc: Simon Horman <horms@kernel.org>
Signed-off-by: Patrick Rohr <prohr@google.com>
Reviewed-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-31 13:49:48 +01:00
Joel Granados
78eb4ea25c sysctl: treewide: constify the ctl_table argument of proc_handlers
const qualify the struct ctl_table argument in the proc_handler function
signatures. This is a prerequisite to moving the static ctl_table
structs into .rodata data which will ensure that proc_handler function
pointers cannot be modified.

This patch has been generated by the following coccinelle script:

```
  virtual patch

  @r1@
  identifier ctl, write, buffer, lenp, ppos;
  identifier func !~ "appldata_(timer|interval)_handler|sched_(rt|rr)_handler|rds_tcp_skbuf_handler|proc_sctp_do_(hmac_alg|rto_min|rto_max|udp_port|alpha_beta|auth|probe_interval)";
  @@

  int func(
  - struct ctl_table *ctl
  + const struct ctl_table *ctl
    ,int write, void *buffer, size_t *lenp, loff_t *ppos);

  @r2@
  identifier func, ctl, write, buffer, lenp, ppos;
  @@

  int func(
  - struct ctl_table *ctl
  + const struct ctl_table *ctl
    ,int write, void *buffer, size_t *lenp, loff_t *ppos)
  { ... }

  @r3@
  identifier func;
  @@

  int func(
  - struct ctl_table *
  + const struct ctl_table *
    ,int , void *, size_t *, loff_t *);

  @r4@
  identifier func, ctl;
  @@

  int func(
  - struct ctl_table *ctl
  + const struct ctl_table *ctl
    ,int , void *, size_t *, loff_t *);

  @r5@
  identifier func, write, buffer, lenp, ppos;
  @@

  int func(
  - struct ctl_table *
  + const struct ctl_table *
    ,int write, void *buffer, size_t *lenp, loff_t *ppos);

```

* Code formatting was adjusted in xfs_sysctl.c to comply with code
  conventions. The xfs_stats_clear_proc_handler,
  xfs_panic_mask_proc_handler and xfs_deprecated_dointvec_minmax where
  adjusted.

* The ctl_table argument in proc_watchdog_common was const qualified.
  This is called from a proc_handler itself and is calling back into
  another proc_handler, making it necessary to change it as part of the
  proc_handler migration.

Co-developed-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Co-developed-by: Joel Granados <j.granados@samsung.com>
Signed-off-by: Joel Granados <j.granados@samsung.com>
2024-07-24 20:59:29 +02:00
Jakub Kicinski
51b35d4f9d Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Merge in late fixes to prepare for the 6.11 net-next PR.

Conflicts:
  93c3a96c30 ("net: pse-pd: Do not return EOPNOSUPP if config is null")
  4cddb0f15e ("net: ethtool: pse-pd: Fix possible null-deref")
  30d7b67277 ("net: ethtool: Add new power limit get and set features")
https://lore.kernel.org/20240715123204.623520bb@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-15 13:19:17 -07:00
Nicolas Dichtel
abb9a68d2c ipv6: take care of scope when choosing the src addr
When the source address is selected, the scope must be checked. For
example, if a loopback address is assigned to the vrf device, it must not
be chosen for packets sent outside.

CC: stable@vger.kernel.org
Fixes: afbac6010a ("net: ipv6: Address selection needs to consider L3 domains")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20240710081521.3809742-4-nicolas.dichtel@6wind.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-14 07:34:16 -07:00
Thomas Weißschuh
c55eb03765 net/ipv6/addrconf: constify ctl_table arguments of utility functions
The sysctl core is preparing to only expose instances of
struct ctl_table as "const".
This will also affect the ctl_table argument of sysctl handlers.

As the function prototype of all sysctl handlers throughout the tree
needs to stay consistent that change will be done in one commit.

To reduce the size of that final commit, switch utility functions which
are not bound by "typedef proc_handler" to "const struct ctl_table".

No functional change.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/r/20240527-sysctl-const-handler-net-v1-3-16523767d0b2@weissschuh.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-28 19:49:47 -07:00
Joel Granados
1c106eb01c net: ipv{6,4}: Remove the now superfluous sentinel elements from ctl_table array
This commit comes at the tail end of a greater effort to remove the
empty elements at the end of the ctl_table arrays (sentinels) which
will reduce the overall build time size of the kernel and run time
memory bloat by ~64 bytes per sentinel (further information Link :
https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/)

* Remove sentinel element from ctl_table structs.
* Remove the zeroing out of an array element (to make it look like a
  sentinel) in sysctl_route_net_init And ipv6_route_sysctl_init.
  This is not longer needed and is safe after commit c899710fe7
  ("networking: Update to register_net_sysctl_sz") added the array size
  to the ctl_table registration.
* Remove extra sentinel element in the declaration of devinet_vars.
* Removed the "-1" in __devinet_sysctl_register, sysctl_route_net_init,
  ipv6_sysctl_net_init and ipv4_sysctl_init_net that adjusted for having
  an extra empty element when looping over ctl_table arrays
* Replace the for loop stop condition in __addrconf_sysctl_register that
  tests for procname == NULL with one that depends on array size
* Removing the unprivileged user check in ipv6_route_sysctl_init is
  safe as it is replaced by calling ipv6_route_sysctl_table_size;
  introduced in commit c899710fe7 ("networking: Update to
  register_net_sysctl_sz")
* Use a table_size variable to keep the value of ARRAY_SIZE

Signed-off-by: Joel Granados <j.granados@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-03 13:29:42 +01:00
Thomas Weißschuh
bfa858f220 sysctl: treewide: constify ctl_table_header::ctl_table_arg
To be able to constify instances of struct ctl_tables it is necessary to
remove ways through which non-const versions are exposed from the
sysctl core.
One of these is the ctl_table_arg member of struct ctl_table_header.

Constify this reference as a prerequisite for the full constification of
struct ctl_table instances.
No functional change.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-22 08:56:31 +01:00
Jakub Kicinski
94426ed213 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

net/unix/garbage.c
  47d8ac011f ("af_unix: Fix garbage collector racing against connect()")
  4090fa373f ("af_unix: Replace garbage collection algorithm.")

Adjacent changes:

drivers/net/ethernet/broadcom/bnxt/bnxt.c
  faa12ca245 ("bnxt_en: Reset PTP tx_avail after possible firmware reset")
  b3d0083caf ("bnxt_en: Support RSS contexts in ethtool .{get|set}_rxfh()")

drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
  7ac10c7d72 ("bnxt_en: Fix possible memory leak in bnxt_rdma_aux_device_init()")
  194fad5b27 ("bnxt_en: Refactor bnxt_rdma_aux_device_init/uninit functions")

drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
  958f56e483 ("net/mlx5e: Un-expose functions in en.h")
  49e6c93870 ("net/mlx5e: RSS, Block XOR hash with over 128 channels")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-11 14:23:47 -07:00
Jiri Benc
7633c4da91 ipv6: fix race condition between ipv6_get_ifaddr and ipv6_del_addr
Although ipv6_get_ifaddr walks inet6_addr_lst under the RCU lock, it
still means hlist_for_each_entry_rcu can return an item that got removed
from the list. The memory itself of such item is not freed thanks to RCU
but nothing guarantees the actual content of the memory is sane.

In particular, the reference count can be zero. This can happen if
ipv6_del_addr is called in parallel. ipv6_del_addr removes the entry
from inet6_addr_lst (hlist_del_init_rcu(&ifp->addr_lst)) and drops all
references (__in6_ifa_put(ifp) + in6_ifa_put(ifp)). With bad enough
timing, this can happen:

1. In ipv6_get_ifaddr, hlist_for_each_entry_rcu returns an entry.

2. Then, the whole ipv6_del_addr is executed for the given entry. The
   reference count drops to zero and kfree_rcu is scheduled.

3. ipv6_get_ifaddr continues and tries to increments the reference count
   (in6_ifa_hold).

4. The rcu is unlocked and the entry is freed.

5. The freed entry is returned.

Prevent increasing of the reference count in such case. The name
in6_ifa_hold_safe is chosen to mimic the existing fib6_info_hold_safe.

[   41.506330] refcount_t: addition on 0; use-after-free.
[   41.506760] WARNING: CPU: 0 PID: 595 at lib/refcount.c:25 refcount_warn_saturate+0xa5/0x130
[   41.507413] Modules linked in: veth bridge stp llc
[   41.507821] CPU: 0 PID: 595 Comm: python3 Not tainted 6.9.0-rc2.main-00208-g49563be82afa #14
[   41.508479] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
[   41.509163] RIP: 0010:refcount_warn_saturate+0xa5/0x130
[   41.509586] Code: ad ff 90 0f 0b 90 90 c3 cc cc cc cc 80 3d c0 30 ad 01 00 75 a0 c6 05 b7 30 ad 01 01 90 48 c7 c7 38 cc 7a 8c e8 cc 18 ad ff 90 <0f> 0b 90 90 c3 cc cc cc cc 80 3d 98 30 ad 01 00 0f 85 75 ff ff ff
[   41.510956] RSP: 0018:ffffbda3c026baf0 EFLAGS: 00010282
[   41.511368] RAX: 0000000000000000 RBX: ffff9e9c46914800 RCX: 0000000000000000
[   41.511910] RDX: ffff9e9c7ec29c00 RSI: ffff9e9c7ec1c900 RDI: ffff9e9c7ec1c900
[   41.512445] RBP: ffff9e9c43660c9c R08: 0000000000009ffb R09: 00000000ffffdfff
[   41.512998] R10: 00000000ffffdfff R11: ffffffff8ca58a40 R12: ffff9e9c4339a000
[   41.513534] R13: 0000000000000001 R14: ffff9e9c438a0000 R15: ffffbda3c026bb48
[   41.514086] FS:  00007fbc4cda1740(0000) GS:ffff9e9c7ec00000(0000) knlGS:0000000000000000
[   41.514726] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   41.515176] CR2: 000056233b337d88 CR3: 000000000376e006 CR4: 0000000000370ef0
[   41.515713] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   41.516252] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   41.516799] Call Trace:
[   41.517037]  <TASK>
[   41.517249]  ? __warn+0x7b/0x120
[   41.517535]  ? refcount_warn_saturate+0xa5/0x130
[   41.517923]  ? report_bug+0x164/0x190
[   41.518240]  ? handle_bug+0x3d/0x70
[   41.518541]  ? exc_invalid_op+0x17/0x70
[   41.520972]  ? asm_exc_invalid_op+0x1a/0x20
[   41.521325]  ? refcount_warn_saturate+0xa5/0x130
[   41.521708]  ipv6_get_ifaddr+0xda/0xe0
[   41.522035]  inet6_rtm_getaddr+0x342/0x3f0
[   41.522376]  ? __pfx_inet6_rtm_getaddr+0x10/0x10
[   41.522758]  rtnetlink_rcv_msg+0x334/0x3d0
[   41.523102]  ? netlink_unicast+0x30f/0x390
[   41.523445]  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
[   41.523832]  netlink_rcv_skb+0x53/0x100
[   41.524157]  netlink_unicast+0x23b/0x390
[   41.524484]  netlink_sendmsg+0x1f2/0x440
[   41.524826]  __sys_sendto+0x1d8/0x1f0
[   41.525145]  __x64_sys_sendto+0x1f/0x30
[   41.525467]  do_syscall_64+0xa5/0x1b0
[   41.525794]  entry_SYSCALL_64_after_hwframe+0x72/0x7a
[   41.526213] RIP: 0033:0x7fbc4cfcea9a
[   41.526528] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[   41.527942] RSP: 002b:00007ffcf54012a8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[   41.528593] RAX: ffffffffffffffda RBX: 00007ffcf5401368 RCX: 00007fbc4cfcea9a
[   41.529173] RDX: 000000000000002c RSI: 00007fbc4b9d9bd0 RDI: 0000000000000005
[   41.529786] RBP: 00007fbc4bafb040 R08: 00007ffcf54013e0 R09: 000000000000000c
[   41.530375] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[   41.530977] R13: ffffffffc4653600 R14: 0000000000000001 R15: 00007fbc4ca85d1b
[   41.531573]  </TASK>

Fixes: 5c578aedcb ("IPv6: convert addrconf hash list to RCU")
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Link: https://lore.kernel.org/r/8ab821e36073a4a406c50ec83c9e8dc586c539e4.1712585809.git.jbenc@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-09 17:09:05 -07:00
Alexander Lobakin
117aef12a7 ip_tunnel: use a separate struct to store tunnel params in the kernel
Unlike IPv6 tunnels which use purely-kernel __ip6_tnl_parm structure
to store params inside the kernel, IPv4 tunnel code uses the same
ip_tunnel_parm which is being used to talk with the userspace.
This makes it difficult to alter or add any fields or use a
different format for whatever data.
Define struct ip_tunnel_parm_kern, a 1:1 copy of ip_tunnel_parm for
now, and use it throughout the code. Define the pieces, where the copy
user <-> kernel happens, as standalone functions, and copy the data
there field-by-field, so that the kernel-side structure could be easily
modified later on and the users wouldn't have to care about this.

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-01 10:49:28 +01:00
Ido Schimmel
c04f7dfe6e ipv6: Fix address dump when IPv6 is disabled on an interface
Cited commit started returning an error when user space requests to dump
the interface's IPv6 addresses and IPv6 is disabled on the interface.
Restore the previous behavior and do not return an error.

Before cited commit:

 # ip address show dev dummy1
 2: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
     link/ether 1a:52:02:5a:c2:6e brd ff:ff:ff:ff:ff:ff
     inet6 fe80::1852:2ff:fe5a:c26e/64 scope link proto kernel_ll
        valid_lft forever preferred_lft forever
 # ip link set dev dummy1 mtu 1000
 # ip address show dev dummy1
 2: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1000 qdisc noqueue state UNKNOWN group default qlen 1000
     link/ether 1a:52:02:5a:c2:6e brd ff:ff:ff:ff:ff:ff

After cited commit:

 # ip address show dev dummy1
 2: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
     link/ether 1e:9b:94:00:ac:e8 brd ff:ff:ff:ff:ff:ff
     inet6 fe80::1c9b:94ff:fe00:ace8/64 scope link proto kernel_ll
        valid_lft forever preferred_lft forever
 # ip link set dev dummy1 mtu 1000
 # ip address show dev dummy1
 RTNETLINK answers: No such device
 Dump terminated

With this patch:

 # ip address show dev dummy1
 2: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
     link/ether 42:35:fc:53:66:cf brd ff:ff:ff:ff:ff:ff
     inet6 fe80::4035:fcff:fe53:66cf/64 scope link proto kernel_ll
        valid_lft forever preferred_lft forever
 # ip link set dev dummy1 mtu 1000
 # ip address show dev dummy1
 2: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1000 qdisc noqueue state UNKNOWN group default qlen 1000
     link/ether 42:35:fc:53:66:cf brd ff:ff:ff:ff:ff:ff

Fixes: 9cc4cc329d ("ipv6: use xa_array iterator to implement inet6_dump_addr()")
Reported-by: Gal Pressman <gal@nvidia.com>
Closes: https://lore.kernel.org/netdev/7e261328-42eb-411d-b1b4-ad884eeaae4d@linux.dev/
Tested-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240321173042.2151756-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-22 18:03:40 -07:00
Eric Dumazet
155549a668 ipv6: remove RTNL protection from inet6_dump_addr()
We can now remove RTNL acquisition while running
inet6_dump_addr(), inet6_dump_ifmcaddr()
and inet6_dump_ifacaddr().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 11:15:36 +00:00
Eric Dumazet
9cc4cc329d ipv6: use xa_array iterator to implement inet6_dump_addr()
inet6_dump_addr() can use the new xa_array iterator
for better scalability.

Make it ready for RCU-only protection.
RTNL use is removed in the following patch.

Also properly return 0 at the end of a dump to avoid
and extra recvmsg() to get NLMSG_DONE.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 11:15:36 +00:00
Eric Dumazet
46f5182dd7 ipv6: make in6_dump_addrs() lockless
in6_dump_addrs() is called with RCU protection.

There is no need holding idev->lock to iterate through unicast addresses.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 11:15:35 +00:00
Eric Dumazet
f0a7da7020 ipv6: make inet6_fill_ifaddr() lockless
Make inet6_fill_ifaddr() lockless, and add approriate annotations
on ifa->tstamp, ifa->valid_lft, ifa->preferred_lft, ifa->ifa_proto
and ifa->rt_priority.

Also constify 2nd argument of inet6_fill_ifaddr(), inet6_fill_ifmcaddr()
and inet6_fill_ifacaddr().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08 11:15:35 +00:00
Eric Dumazet
02e24903e5 netlink: let core handle error cases in dump operations
After commit b5a899154a ("netlink: handle EMSGSIZE errors
in the core"), we can remove some code that was not 100 % correct
anyway.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240306102426.245689-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 20:48:22 -08:00
Eric Dumazet
2a02f8379b ipv6: use xa_array iterator to implement inet6_netconf_dump_devconf()
1) inet6_netconf_dump_devconf() can run under RCU protection
   instead of RTNL.

2) properly return 0 at the end of a dump, avoiding an
   an extra recvmsg() system call.

3) Do not use inet6_base_seq() anymore, for_each_netdev_dump()
   has nice properties. Restarting a GETDEVCONF dump if a device has
   been added/removed or if net->ipv6.dev_addr_genid has changed is moot.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 08:42:33 +00:00
Eric Dumazet
2f0ff05a44 ipv6/addrconf: annotate data-races around devconf fields (II)
Final (?) round of this series.

Annotate lockless reads on following devconf fields,
because they be changed concurrently from /proc/net/ipv6/conf.

- accept_dad
- optimistic_dad
- use_optimistic
- use_oif_addrs_only
- ra_honor_pio_life
- keep_addr_on_down
- ndisc_notify
- ndisc_evict_nocarrier
- suppress_frag_ndisc
- addr_gen_mode
- seg6_enabled
- ioam6_enabled
- ioam6_id
- ioam6_id_wide
- drop_unicast_in_l2_multicast
- mldv[12]_unsolicited_report_interval
- force_mld_version
- force_tllao
- accept_untracked_na
- drop_unsolicited_na
- accept_source_route

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 08:42:33 +00:00
Eric Dumazet
2aba913f99 ipv6/addrconf: annotate data-races around devconf fields (I)
Annotate lockless reads and writes on following devconf fields:

- regen_min_advance
- regen_max_retry
- dad_transmits
- use_tempaddr
- max_addresses
- max_desync_factor
- temp_valid_lft
- rtr_solicits
- rtr_solicit_max_interval
- rtr_solicit_interval
- rtr_solicit_delay
- enhanced_dad
- accept_redirects

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 08:42:32 +00:00
Eric Dumazet
45b90ec9a2 ipv6: addrconf_disable_policy() optimization
Writing over /proc/sys/net/ipv6/conf/default/disable_policy
does not need to hold RTNL.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 08:42:32 +00:00
Eric Dumazet
624d5aec48 ipv6: annotate data-races around devconf->disable_policy
idev->cnf.disable_policy and net->ipv6.devconf_all->disable_policy
can be read locklessly. Add appropriate annotations on reads
and writes.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 08:42:32 +00:00
Eric Dumazet
a8fbd4d907 ipv6: annotate data-races around devconf->proxy_ndp
devconf->proxy_ndp can be read and written locklessly,
add appropriate annotations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 08:42:32 +00:00
Eric Dumazet
fca34cc075 ipv6: annotate data-races around idev->cnf.ignore_routes_with_linkdown
idev->cnf.ignore_routes_with_linkdown can be used without any locks,
add appropriate annotations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 08:42:32 +00:00
Eric Dumazet
32f754176e ipv6: annotate data-races around cnf.forwarding
idev->cnf.forwarding and net->ipv6.devconf_all->forwarding
might be read locklessly, add appropriate READ_ONCE()
and WRITE_ONCE() annotations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 08:42:31 +00:00
Eric Dumazet
e7135f4849 ipv6: annotate data-races around cnf.mtu6
idev->cnf.mtu6 might be read locklessly, add appropriate READ_ONCE()
and WRITE_ONCE() annotations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 08:42:31 +00:00
Eric Dumazet
553ced03b2 ipv6: addrconf_disable_ipv6() optimization
Writing over /proc/sys/net/ipv6/conf/default/disable_ipv6
does not need to hold RTNL.

v3: remove a wrong change (Jakub Kicinski feedback)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 08:42:31 +00:00
Eric Dumazet
d289ab65b8 ipv6: annotate data-races around cnf.disable_ipv6
disable_ipv6 is read locklessly, add appropriate READ_ONCE()
and WRITE_ONCE() annotations.

v2: do not preload net before rtnl_trylock() in
    addrconf_disable_ipv6() (Jiri)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01 08:42:30 +00:00
Jakub Kicinski
65f5dd4f02 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

net/mptcp/protocol.c
  adf1bb78da ("mptcp: fix snd_wnd initialization for passive socket")
  9426ce476a ("mptcp: annotate lockless access for RX path fields")
https://lore.kernel.org/all/20240228103048.19255709@canb.auug.org.au/

Adjacent changes:

drivers/dpll/dpll_core.c
  0d60d8df6f ("dpll: rely on rcu for netdev_dpll_pin()")
  e7f8df0e81 ("dpll: move xa_erase() call in to match dpll_pin_alloc() error path order")

drivers/net/veth.c
  1ce7d306ea ("veth: try harder when allocating queue memory")
  0bef512012 ("net: add netdev_lockdep_set_classes() to virtual drivers")

drivers/net/wireless/intel/iwlwifi/mvm/d3.c
  8c9bef26e9 ("wifi: iwlwifi: mvm: d3: implement suspend with MLO")
  78f65fbf42 ("wifi: iwlwifi: mvm: ensure offloading TID queue exists")

net/wireless/nl80211.c
  f78c137533 ("wifi: nl80211: reject iftype change with mesh ID change")
  414532d8aa ("wifi: cfg80211: use IEEE80211_MAX_MESH_ID_LEN appropriately")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-29 14:24:56 -08:00
Eric Dumazet
67ea41d19d inet6: expand rcu_read_lock() scope in inet6_dump_addr()
I missed that inet6_dump_addr() is calling in6_dump_addrs()
from two points.

First one under RTNL protection, and second one under rcu_read_lock().

Since we want to remove RTNL use from inet6_dump_addr() very soon,
no longer assume in6_dump_addrs() is protected by RTNL (even
if this is still the case).

Use rcu_read_lock() earlier to fix this lockdep splat:

WARNING: suspicious RCU usage
6.8.0-rc5-syzkaller-01618-gf8cbf6bde4c8 #0 Not tainted

net/ipv6/addrconf.c:5317 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
3 locks held by syz-executor.2/8834:
  #0: ffff88802f554678 (nlk_cb_mutex-ROUTE){+.+.}-{3:3}, at: __netlink_dump_start+0x119/0x780 net/netlink/af_netlink.c:2338
  #1: ffffffff8f377a88 (rtnl_mutex){+.+.}-{3:3}, at: netlink_dump+0x676/0xda0 net/netlink/af_netlink.c:2265
  #2: ffff88807e5f0580 (&ndev->lock){++--}-{2:2}, at: in6_dump_addrs+0xb8/0x1de0 net/ipv6/addrconf.c:5279

stack backtrace:
CPU: 1 PID: 8834 Comm: syz-executor.2 Not tainted 6.8.0-rc5-syzkaller-01618-gf8cbf6bde4c8 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024
Call Trace:
 <TASK>
  __dump_stack lib/dump_stack.c:88 [inline]
  dump_stack_lvl+0x1e7/0x2e0 lib/dump_stack.c:106
  lockdep_rcu_suspicious+0x220/0x340 kernel/locking/lockdep.c:6712
  in6_dump_addrs+0x1b47/0x1de0 net/ipv6/addrconf.c:5317
  inet6_dump_addr+0x1597/0x1690 net/ipv6/addrconf.c:5428
  netlink_dump+0x6a6/0xda0 net/netlink/af_netlink.c:2266
  __netlink_dump_start+0x59d/0x780 net/netlink/af_netlink.c:2374
  netlink_dump_start include/linux/netlink.h:340 [inline]
  rtnetlink_rcv_msg+0xcf7/0x10d0 net/core/rtnetlink.c:6555
  netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2547
  netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
  netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
  netlink_sendmsg+0x8e0/0xcb0 net/netlink/af_netlink.c:1902
  sock_sendmsg_nosec net/socket.c:730 [inline]
  __sock_sendmsg+0x221/0x270 net/socket.c:745
  ____sys_sendmsg+0x525/0x7d0 net/socket.c:2584
  ___sys_sendmsg net/socket.c:2638 [inline]
  __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2667

Fixes: c3718936ec ("ipv6: anycast: complete RCU handling of struct ifacaddr6")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240227222259.4081489-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-28 20:22:28 -08:00
Eric Dumazet
c3718936ec ipv6: anycast: complete RCU handling of struct ifacaddr6
struct ifacaddr6 are already freed after RCU grace period.

Add __rcu qualifier to aca_next pointer, and idev->ac_list

Add relevant rcu_assign_pointer() and dereference accessors.

ipv6_chk_acast_dev() no longer needs to acquire idev->lock.

/proc/net/anycast6 is now purely RCU protected, it no
longer acquires idev->lock.

Similarly in6_dump_addrs() can use RCU protection to iterate
through anycast addresses. It was relying on a mixture of RCU
and RTNL but next patches will get rid of RTNL there.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240223201054.220534-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26 18:40:34 -08:00
Eric Dumazet
10bfd453da ipv6: fix potential "struct net" leak in inet6_rtm_getaddr()
It seems that if userspace provides a correct IFA_TARGET_NETNSID value
but no IFA_ADDRESS and IFA_LOCAL attributes, inet6_rtm_getaddr()
returns -EINVAL with an elevated "struct net" refcount.

Fixes: 6ecf4c37eb ("ipv6: enable IFA_TARGET_NETNSID for RTM_GETADDR")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Ahern <dsahern@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26 11:56:23 +00:00
Eric Dumazet
69fdb7e411 ipv6: switch inet6_dump_ifinfo() to RCU protection
No longer hold RTNL while calling inet6_dump_ifinfo()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26 11:46:12 +00:00
Eric Dumazet
ac14ad9755 ipv6: use xarray iterator to implement inet6_dump_ifinfo()
Prepare inet6_dump_ifinfo() to run with RCU protection
instead of RTNL and use for_each_netdev_dump() interface.

Also properly return 0 at the end of a dump, avoiding
an extra recvmsg() system call and RTNL acquisition.

Note that RTNL-less dumps need core changes, coming later
in the series.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26 11:46:12 +00:00
Eric Dumazet
8afc7a78d5 ipv6: prepare inet6_fill_ifinfo() for RCU protection
We want to use RCU protection instead of RTNL
for inet6_fill_ifinfo().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26 11:46:12 +00:00
Eric Dumazet
4ad2681364 ipv6: prepare inet6_fill_ifla6_attrs() for RCU
We want to no longer hold RTNL while calling inet6_fill_ifla6_attrs()
in the future. Add needed READ_ONCE()/WRITE_ONCE() annotations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26 11:46:12 +00:00
Jakub Kicinski
fecc51559a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

net/ipv4/udp.c
  f796feabb9 ("udp: add local "peek offset enabled" flag")
  56667da739 ("net: implement lockless setsockopt(SO_PEEK_OFF)")

Adjacent changes:

net/unix/garbage.c
  aa82ac51d6 ("af_unix: Drop oob_skb ref before purging queue in GC.")
  11498715f2 ("af_unix: Remove io_uring code for GC.")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-22 15:29:26 -08:00
Eric Dumazet
e898e4cd1a ipv6: properly combine dev_base_seq and ipv6.dev_addr_genid
net->dev_base_seq and ipv6.dev_addr_genid are monotonically increasing.

If we XOR their values, we could miss to detect if both values
were changed with the same amount.

Fixes: 63998ac24f ("ipv6: provide addr and netconf dump consistency info")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-18 10:22:27 +00:00