linux-stable-mirror

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2026-04-29 12:28:27 +02:00

Author	SHA1	Message	Date
Heiner Kallweit	e211c463b7	net: phy: stop exporting phy_driver_unregister After `42e2a9e11a` ("net: phy: dp83640: improve phydev and driver removal handling") we can stop exporting also phy_driver_unregister(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/2bab950e-4b70-4030-b997-03f48379586f@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-09-30 13:17:16 +02:00
Dragos Tatulea	a1b501a8c6	page_pool: Clamp pool size to max 16K pages page_pool_init() returns E2BIG when the page_pool size goes above 32K pages. As some drivers are configuring the page_pool size according to the MTU and ring size, there are cases where this limit is exceeded and the queue creation fails. The page_pool size doesn't have to cover a full queue, especially for larger ring sizes. So clamp the size instead of returning an error. Do this in the core to avoid having each driver do the clamping. The current limit was deemed too high [1] so it was reduced to 16K to avoid page waste. [1] https://lore.kernel.org/all/1758532715-820422-3-git-send-email-tariqt@nvidia.com/ Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250926131605.2276734-2-dtatulea@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-09-30 12:16:23 +02:00
Dmitry Antipov	2ade91705b	tipc: adjust tipc_nodeid2string() to return string length Since the value returned by 'tipc_nodeid2string()' is not used, the function may be adjusted to return the length of the result, which is helpful to drop a few calls to 'strlen()' in 'tipc_link_create()' and 'tipc_link_bc_create()'. Compile tested only. Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250926074113.914399-1-dmantipov@yandex.ru Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-09-30 11:22:39 +02:00
Qingfang Deng	38b04ed707	6pack: drop redundant locking and refcounting The TTY layer already serializes line discipline operations with tty->ldisc_sem, so the extra disc_data_lock and refcnt in 6pack are unnecessary. Removing them simplifies the code and also resolves a lockdep warning reported by syzbot. The warning did not indicate a real deadlock, since the write-side lock was only taken in process context with hardirqs disabled. Reported-by: syzbot+5fd749c74105b0e1b302@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/68c858b0.050a0220.3c6139.0d1c.GAE@google.com/ Signed-off-by: Qingfang Deng <dqfext@gmail.com> Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://patch.msgid.link/20250925051059.26876-1-dqfext@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-09-30 10:10:59 +02:00
Oleksij Rempel	7bd80ed89d	Documentation: net: add flow control guide and document ethtool API Introduce a new document, flow_control.rst, to provide a comprehensive guide on Ethernet Flow Control in Linux. The guide explains how flow control works, how autonegotiation resolves pause capabilities, and how to configure it using ethtool and Netlink. In parallel, document the pause and pause-stat attributes in the ethtool.yaml netlink spec. This enables the ynl tool to generate kernel-doc comments for the corresponding enums in the UAPI header, making the C interface self-documenting. Finally, replace the legacy flow control section in phy.rst with a reference to the new document and add pointers in the relevant C source files. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20250924120241.724850-1-o.rempel@pengutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-09-30 09:48:31 +02:00
Jakub Kicinski	c5cb31c992	Merge branch 'dpll-add-phase-offset-averaging-factor' Ivan Vecera says: ==================== dpll: add phase offset averaging factor For some hardware, the phase shift may result from averaging previous values and the newly measured value. In this case, the averaging is controlled by a configurable averaging factor. Add new device level attribute phase-offset-avg-factor, appropriate callbacks and implement them in zl3073x driver. ==================== Link: https://patch.msgid.link/20250927084912.2343597-1-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:57:43 -07:00
Ivan Vecera	9363b48376	dpll: zl3073x: Allow to configure phase offset averaging factor The DPLL phase measurement block uses an exponential moving average with a configurable averaging factor. Measurements are taken at approximately 40 Hz or at the reference frequency, whichever is lower. Currently, factor=2 is used to prioritize fast response for dynamic phase changes. For applications needing a stable, precise average phase offset where rapid changes are unlikely, a higher factor is recommended. Implement the .phase_offset_avg_factor_get/set callbacks to allow a user to adjust this factor. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20250927084912.2343597-4-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:57:41 -07:00
Ivan Vecera	e28d5a68b6	dpll: add phase_offset_avg_factor_get/set callback ops Add new callback operations for a dpll device: - phase_offset_avg_factor_get(...) - to obtain current phase offset averaging factor from dpll device, - phase_offset_avg_factor_set(...) - to set phase offset averaging factor. Obtain the factor value using the get callback and provide it to the user if the device driver implements this callback. Execute the set callback upon user requests, if the driver implements it. Signed-off-by: Ivan Vecera <ivecera@redhat.com> v2: * do not require 'set' callback to retrieve current value * always call 'set' callback regardless of current value Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20250927084912.2343597-3-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:57:41 -07:00
Ivan Vecera	a680581f6a	dpll: add phase-offset-avg-factor device attribute to netlink spec Add dpll device level attribute DPLL_A_PHASE_OFFSET_AVG_FACTOR to allow control over a calculation of reported phase offset value. Attribute is present, if the driver provides such capability, otherwise attribute shall not be present. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20250927084912.2343597-2-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:57:41 -07:00
Jakub Kicinski	377ea33128	Merge tag 'mlx5-next-lag' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Tariq Toukan says: ==================== mlx5-next updates 2025-09-28 * tag 'mlx5-next-lag' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: net/mlx5: IFC add balance ID and LAG per MP group bits net/mlx5: Add IFC bit for TIR/SQ order capability ==================== Link: https://patch.msgid.link/1759093989-841873-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:49:59 -07:00
Russell King (Oracle)	6d3728d424	net: stmmac: remove stmmac_hw_setup() excess documentation parameter The kernel build bot reports: Warning: drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:3438 Excess function parameter 'ptp_register' description in 'stmmac_hw_setup' Fix it. Reported-by: kernel test robot <lkp@intel.com> Fixes: `98d8ea566b` ("net: stmmac: move timestamping/ptp init to stmmac_hw_setup() caller") Closes: https://lore.kernel.org/oe-kbuild-all/202509290927.svDd6xuw-lkp@intel.com/ Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1v38Y7-00000008UCQ-3w27@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:47:15 -07:00
Jakub Kicinski	4363d18219	Merge branch 'selftest-packetdrill-import-tfo-server-tests' Kuniyuki Iwashima says: ==================== selftest: packetdrill: Import TFO server tests. The series imports 15 TFO server tests from google/packetdrill and adds 2 more tests. The repository has two versions of tests for most scenarios; one uses the non-experimental option (34), and the other uses the experimental option (255) with 0xF989. Basically, we only import the non-experimental version of tests, and for the experimental option, tcp_fastopen_server_experimental_option.pkt is added. The following tests are not (yet) imported: * icmp-baseline.pkt * simple1.pkt / simple2.pkt / simple3.pkt The former is completely covered by icmp-before-accept.pkt. The latter's delta is the src/dst IP pair to generate a different cookie, but supporting dualstack requires churn in ksft_runner.sh, so this is deferred to a future series. Also, sockopt-fastopen-key.pkt covers the same function. The following tests have the experimental version only, so converted to the non-experimental option: * client-ack-dropped-then-recovery-ms-timestamps.pkt * sockopt-fastopen-key.pkt For the imported tests, these common changes are applied. * Add SPDX header * Adjust path to default.sh * Adjust sysctl w/ set_sysctls.py * Use TFO_COOKIE instead of a raw hex value * Use SOCK_NONBLOCK for socket() not to block accept() * Add assertions for TCP state if commented * Remove unnecessary delay (e.g. +0.1 setsockopt(SO_REUSEADDR), etc) With this series, except for simple{1,2,3}.pkt, we can remove TFO server tests in google/packetdrill. ==================== Link: https://patch.msgid.link/20250927213022.1850048-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:41:40 -07:00
Kuniyuki Iwashima	9b62d53cc8	selftest: packetdrill: Import client-ack-dropped-then-recovery-ms-timestamps.pkt This also does not have the non-experimental version, so converted to FO. The comment in .pkt explains the detailed scenario. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-14-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:41:39 -07:00
Kuniyuki Iwashima	05b9f505fb	selftest: packetdrill: Import sockopt-fastopen-key.pkt sockopt-fastopen-key.pkt does not have the non-experimental version, so the Experimental version is converted, FOEXP -> FO. The test sets net.ipv4.tcp_fastopen_key=0-0-0-0 and instead sets another key via setsockopt(TCP_FASTOPEN_KEY). The first listener generates a valid cookie in response to TFO option without cookie, and the second listener creates a TFO socket using the valid cookie. TCP_FASTOPEN_KEY is adjusted to use the common key in default.sh so that we can use TFO_COOKIE and support dualstack. Similarly, TFO_COOKIE_ZERO for the 0-0-0-0 key is defined. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-13-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:41:39 -07:00
Kuniyuki Iwashima	be90c7b3d5	selftest: packetdrill: Refine tcp_fastopen_server_reset-after-disconnect.pkt. These changes are applied to follow the imported packetdrill tests. * Call setsockopt(TCP_FASTOPEN) * Remove unnecessary accept() delay * Add assertion for TCP states * Rename to tcp_fastopen_server_trigger-rst-reconnect.pkt. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-12-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:41:39 -07:00
Kuniyuki Iwashima	21f7fb31ae	selftest: packetdrill: Import opt34/-trigger-rst.pkt. This imports the non-experimental version of opt34/-trigger-rst.pkt. \| accept() \| SYN data \| -----------------------------------+----------+----------+ listener-closed-trigger-rst.pkt \| no \| unread \| unread-data-closed-trigger-rst.pkt \| yes \| unread \| Both files test that close()ing a SYN_RECV socket with unread SYN data triggers RST. The files are renamed to have the common prefix, trigger-rst. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-11-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:41:39 -07:00
Kuniyuki Iwashima	5920f154e1	selftest: packetdrill: Import opt34/reset-* tests. This imports the non-experimental version of opt34/reset-*.pkt. \| Child \| RST \| sk_err \| ---------------------------------+---------+-------------------------------+---------+ reset-after-accept.pkt \| TFO \| after accept(), SYN_RECV \| read() \| reset-close-with-unread-data.pkt \| TFO \| after accept(), SYN_RECV \| write() \| reset-before-accept.pkt \| TFO \| before accept(), SYN_RECV \| read() \| reset-non-tfo-socket.pkt \| non-TFO \| before accept(), ESTABLISHED \| write() \| The first 3 files test scenarios where a SYN_RECV socket receives RST before/after accept() and data in SYN must be read() without error, but the following read() or first write() will return ECONNRESET. The last test is similar but with non-TFO socket. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-10-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:41:39 -07:00
Kuniyuki Iwashima	a8b1750e68	selftest: packetdrill: Import opt34/icmp-before-accept.pkt. This imports the non-experimental version of icmp-before-accept.pkt. This file tests the scenario where an ICMP unreachable packet for a not-yet-accept()ed socket changes its state to TCP_CLOSE, but the SYN data must be read without error, and the following read() returns EHOSTUNREACH. Note that this test supports only IPv4 as ICMP is used. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-9-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:41:39 -07:00
Kuniyuki Iwashima	5ed080f85a	selftest: packetdrill: Import opt34/fin-close-socket.pkt. This imports the non-experimental version of fin-close-socket.pkt. This file tests the scenario where a TFO child socket's state transitions from SYN_RECV to CLOSE_WAIT before accept()ed. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-8-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:41:39 -07:00
Kuniyuki Iwashima	e57b3933ab	selftest: packetdrill: Add test for experimental option. The only difference between non-experimental vs experimental TFO option handling is SYN+ACK generation. When tcp_parse_fastopen_option() parses a TFO option, it sets tcp_fastopen_cookie.exp to false if the option number is 34, and true if 255. The value is carried to tcp_options_write() to generate a TFO option with the same option number. Other than that, all the TFO handling is the same and the kernel must generate the same cookie regardless of the option number. Let's add a test for the handling so that we can consolidate fastopen/server/ tests and fastopen/server/opt34 tests. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-7-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:41:39 -07:00
Kuniyuki Iwashima	399e0a7ed9	selftest: packetdrill: Add test for TFO_SERVER_WO_SOCKOPT1. TFO_SERVER_WO_SOCKOPT1 is no longer enabled by default, and each server test requires setsockopt(TCP_FASTOPEN). Let's add a basic test for TFO_SERVER_WO_SOCKOPT1. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-6-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:41:38 -07:00
Kuniyuki Iwashima	0b8f164eb2	selftest: packetdrill: Import TFO server basic tests. This imports basic TFO server tests from google/packetdrill. The repository has two versions of tests for most scenarios; one uses the non-experimental option (34), and the other uses the experimental option (255) with 0xF989. This only imports the following tests of the non-experimental version placed in [0]. I will add a specific test for the experimental option handling later. \| TFO \| Cookie \| Payload \| ---------------------------+-----+--------+---------+ basic-rw.pkt \| yes \| yes \| yes \| basic-zero-payload.pkt \| yes \| yes \| no \| basic-cookie-not-reqd.pkt \| yes \| no \| yes \| basic-non-tfo-listener.pkt \| no \| yes \| yes \| pure-syn-data.pkt \| yes \| no \| yes \| The original pure-syn-data.pkt missed setsockopt(TCP_FASTOPEN) and did not test TFO server in some scenarios unintentionally, so setsockopt() is added where needed. In addition, non-TFO scenario is stripped as it is covered by basic-non-tfo-listener.pkt. Also, I added basic- prefix. Link: https://github.com/google/packetdrill/tree/bfc96251310f/gtests/net/tcp/fastopen/server/opt34 #[0] Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-5-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:41:38 -07:00
Kuniyuki Iwashima	97b3b8306f	selftest: packetdrill: Define common TCP Fast Open cookie. TCP Fast Open cookie is generated in __tcp_fastopen_cookie_gen_cipher(). The cookie value is generated from src/dst IPs and a key configured by setsockopt(TCP_FASTOPEN_KEY) or net.ipv4.tcp_fastopen_key. The default.sh sets net.ipv4.tcp_fastopen_key, and the original packetdrill defines the corresponding cookie as TFO_COOKIE in run_all.py. [0] Then, each test does not need to care about the value, and we can easily update TFO_COOKIE in case __tcp_fastopen_cookie_gen_cipher() changes the algorithm. However, some tests use the bare hex value for specific IPv4 addresses and do not support IPv6. Let's define the same TFO_COOKIE in ksft_runner.sh. We will replace such bare hex values with TFO_COOKIE except for a single test for setsockopt(TCP_FASTOPEN_KEY). Link: https://github.com/google/packetdrill/blob/7230b3990f94/gtests/net/packetdrill/run_all.py#L65 #[0] Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-4-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:41:19 -07:00
Kuniyuki Iwashima	261cb8b123	selftest: packetdrill: Require explicit setsockopt(TCP_FASTOPEN). To enable TCP Fast Open on a server, net.ipv4.tcp_fastopen must have 0x2 (TFO_SERVER_ENABLE), and we need to do either 1. Call setsockopt(TCP_FASTOPEN) for the socket 2. Set 0x400 (TFO_SERVER_WO_SOCKOPT1) additionally to net.ipv4.tcp_fastopen The default.sh sets 0x70403 so that each test does not need setsockopt(). (0x1 is TFO_CLIENT_ENABLE, and 0x70000 is ...???) However, some tests overwrite net.ipv4.tcp_fastopen without TFO_SERVER_WO_SOCKOPT1 and forgot setsockopt(TCP_FASTOPEN). For example, pure-syn-data.pkt [0] tests non-TFO servers unintentionally, except in the first scenario. To prevent such an accident, let's require explicit setsockopt(). TFO_CLIENT_ENABLE is necessary for tcp_syscall_bad_arg_fastopen-invalid-buf-ptr.pkt. Link: https://github.com/google/packetdrill/blob/bfc96251310f/gtests/net/tcp/fastopen/server/opt34/pure-syn-data.pkt #[0] Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-3-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:41:07 -07:00
Kuniyuki Iwashima	70dd4775db	selftest: packetdrill: Set ktap_set_plan properly for single protocol test. The cited commit forgot to update the ktap_set_plan call. ktap_set_plan sets the number of tests (KSFT_NUM_TESTS), which must match the number of executed tests (KTAP_CNT_PASS + KTAP_CNT_SKIP + KTAP_CNT_XFAIL) in ktap_finished. Otherwise, the selftest exit()s with 1. Let's adjust KSFT_NUM_TESTS based on supported protocols. While at it, misalignment is fixed up. Fixes: `a5c10aa3d1` ("selftests/net: packetdrill: Support single protocol test.") Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-2-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:41:06 -07:00
Alok Tiwari	4ed9db2dc5	net: rtnetlink: fix typo in rtnl_unregister_all() comment Corrected "rtnl_unregster()" -> "rtnl_unregister()" in the documentation comment of "rtnl_unregister_all()" Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250929085418.49200-1-alok.a.tiwari@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:31:08 -07:00
Eric Dumazet	7d452516b6	Revert "net: group sk_backlog and sk_receive_queue" This reverts commit `4effb335b5`. This was a benefit for UDP flood case, which was later greatly improved with commits `6471658dc6` ("udp: use skb_attempt_defer_free()") and `b650bf0977` ("udp: remove busylock and add per NUMA queues"). Apparently the blamed commit added a regression for RAW sockets, possibly because they do not use the dual RX queue strategy that UDP has. sock_queue_rcv_skb_reason() and RAW recvmsg() compete for sk_receive_buf and sk_rmem_alloc changes, and them being in the same cache line reduces performance. Fixes: `4effb335b5` ("net: group sk_backlog and sk_receive_queue") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202509281326.f605b4eb-lkp@intel.com Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: David Ahern <dsahern@kernel.org> Cc: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250929182112.824154-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:30:32 -07:00
Furong Xu	9dd4e022bf	net: stmmac: Convert open-coded register polling to helper macro Drop the open-coded register polling routines. Use readl_poll_timeout_atomic() in atomic state. Also adjust the delay time to 10us which seems more reasonable. Tested on NXP i.MX8MP and ROCKCHIP RK3588 boards, the break condition was met right after the first polling, no delay involved at all. So the 10us delay should be long enough for most cases. Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Furong Xu <0x1207@gmail.com> Link: https://patch.msgid.link/20250927081036.10611-1-0x1207@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:27:45 -07:00
Jakub Kicinski	74f7c5233e	Merge branch 'mptcp-receive-path-improvement' Matthieu Baerts says: ==================== mptcp: receive path improvement This series includes several changes to the MPTCP RX path. The main goals are improving RX performance and increasing long-term maintainability. Some changes reflect recent(ish) improvements introduced in the TCP stack: patch 1, 2 and 3 are the MPTCP counterpart of SKB deferral free and auto-tuning improvements. Note that patch 3 could possibly fix additional issues, and overall such patch should protect against similar issues arising in the future. Patches 4-7 are aimed at introducing the socket backlog usage which will be done in a later series to process the packets received by the different subflows while the msk socket is owned. Patch 8 is not related to the RX path, but it contains additional tests for new features recently introduced in net-next. ==================== Link: https://patch.msgid.link/20250927-net-next-mptcp-rcv-path-imp-v1-0-5da266aa9c1a@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:23:37 -07:00
Matthieu Baerts (NGI0)	c912f935a5	selftests: mptcp: join: validate new laminar endp Here are a few sub-tests for mptcp_join.sh, validating the new 'laminar' endpoint type. In a setup where subflows created using the routing rules would be rejected by the listener, and where the latter announces one IP address, some cases are verified: - Without any 'laminar' endpoints: no new subflows are created. - With one 'laminar' endpoint: a second subflow is created. - With multiple 'laminar' endpoints: 2 IPv4 subflows are created. - With one 'laminar' endpoint, but the server announcing a second IP address, only one subflow is created. - With one 'laminar' + 'subflow' endpoint, the same endpoint is only used once. Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250927-net-next-mptcp-rcv-path-imp-v1-8-5da266aa9c1a@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:23:36 -07:00
Paolo Abeni	59701b1870	mptcp: minor move_skbs_to_msk() cleanup Such function is called only by __mptcp_data_ready(), which in turn is always invoked when msk is not owned by the user: we can drop the redundant, related check. Additionally mptcp needs to propagate the socket error only for current subflow. Reviewed-by: Geliang Tang <geliang@kernel.org> Tested-by: Geliang Tang <geliang@kernel.org> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250927-net-next-mptcp-rcv-path-imp-v1-7-5da266aa9c1a@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:23:36 -07:00
Paolo Abeni	68c7af988b	mptcp: factor out a basic skb coalesce helper The upcoming patch will introduce backlog processing for MPTCP socket, and we want to leverage coalescing in such data path. Factor out the relevant bits not touching memory accounting to deal with such use-case. Co-developed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Geliang Tang <geliang@kernel.org> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250927-net-next-mptcp-rcv-path-imp-v1-6-5da266aa9c1a@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:23:35 -07:00
Paolo Abeni	c4ebc4ee4e	mptcp: remove unneeded mptcp_move_skb() Since commit `b7535cfed2` ("mptcp: drop legacy code around RX EOF"), sk_shutdown can't change during the main recvmsg loop, we can drop the related race breaker. Reviewed-by: Geliang Tang <geliang@kernel.org> Tested-by: Geliang Tang <geliang@kernel.org> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250927-net-next-mptcp-rcv-path-imp-v1-5-5da266aa9c1a@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:23:35 -07:00
Paolo Abeni	9a0afe0db4	mptcp: introduce the mptcp_init_skb helper Factor out all the skb initialization steps into a new helper and use it. Note that this change moves the MPTCP CB initialization earlier: we can do such step as soon as the skb leaves the subflow socket receive queues. Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Geliang Tang <geliang@kernel.org> Tested-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250927-net-next-mptcp-rcv-path-imp-v1-4-5da266aa9c1a@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:23:35 -07:00
Paolo Abeni	e118cdc34d	mptcp: rcvbuf auto-tuning improvement Apply to the MPTCP auto-tuning the same improvements introduced for the TCP protocol by the merge commit `2da35e4b4d` ("Merge branch 'tcp-receive-side-improvements'"). The main difference is that TCP subflow and the main MPTCP socket need to account separately for OoO: MPTCP does not care for TCP-level OoO and vice versa, as a consequence do not reflect MPTCP-level rcvbuf increase due to OoO packets at the subflow level. This refactor additionally allows dropping the msk receive buffer update at receive time, as the latter was only intended to cope with subflow receive buffer increase due to OoO packets. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/487 Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/559 Reviewed-by: Geliang Tang <geliang@kernel.org> Tested-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250927-net-next-mptcp-rcv-path-imp-v1-3-5da266aa9c1a@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:23:35 -07:00
Paolo Abeni	a755677974	tcp: make tcp_rcvbuf_grow() accessible to mptcp code To leverage the auto-tuning improvements brought by commit `2da35e4b4d` ("Merge branch 'tcp-receive-side-improvements'"), the MPTCP stack needs to access the mentioned helper. Acked-by: Geliang Tang <geliang@kernel.org> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250927-net-next-mptcp-rcv-path-imp-v1-2-5da266aa9c1a@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:23:35 -07:00
Paolo Abeni	9aa59323f2	mptcp: leverage skb deferral free Usage of the skb deferral API is straightforward; with multiple active subflows this allows moving part of the received application load onto multiple CPUs. Also fix a typo in the related comment. Reviewed-by: Geliang Tang <geliang@kernel.org> Tested-by: Geliang Tang <geliang@kernel.org> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250927-net-next-mptcp-rcv-path-imp-v1-1-5da266aa9c1a@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:23:34 -07:00
Eric Dumazet	f017c1f768	tcp: use skb->len instead of skb->truesize in tcp_can_ingest() Some applications are stuck to the 20th century and still use small SO_RCVBUF values. After the blamed commit, we can drop packets especially when using LRO/hw-gro enabled NIC and small MSS (1500) values. LRO/hw-gro NIC pack multiple segments into pages, allowing tp->scaling_ratio to be set to a high value. Whenever the receive queue gets full, we can receive a small packet filling RWIN, but with a high skb->truesize, because most NIC use 4K page plus sk_buff metadata even when receiving less than 1500 bytes of payload. Even if we refine how tp->scaling_ratio is estimated, we could have an issue at the start of the flow, because the first round of packets (IW10) will be sent based on the initial tp->scaling_ratio (1/2) Relax tcp_can_ingest() to use skb->len instead of skb->truesize, allowing the peer to use final RWIN, assuming a 'perfect' scaling_ratio of 1. Fixes: `1d2fbaad7c` ("tcp: stronger sk_rcvbuf checks") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250927092827.2707901-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:20:35 -07:00
Jakub Kicinski	d210ee58da	Merge tag 'for-net-next-2025-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Luiz Augusto von Dentz says: ==================== bluetooth-next pull request for net-next: core: - MAINTAINERS: add a sub-entry for the Qualcomm bluetooth driver - Avoid a couple dozen -Wflex-array-member-not-at-end warnings - bcsp: receive data only if registered - HCI: Fix using LE/ACL buffers for ISO packets - hci_core: Detect if an ISO link has stalled - ISO: Don't initiate CIS connections if there are no buffers - ISO: Use sk_sndtimeo as conn_timeout drivers: - btusb: Check for unexpected bytes when defragmenting HCI frames - btusb: Add new VID/PID 13d3/3627 for MT7925 - btusb: Add new VID/PID 13d3/3633 for MT7922 - btusb: Add USB ID 2001:332a for D-Link AX9U rev. A1 - btintel: Add support for BlazarIW core - btintel_pcie: Add support for _suspend() / _resume() - btintel_pcie: Define hdev->wakeup() callback - btintel_pcie: Add Bluetooth core/platform as comments - btintel_pcie: Add id of Scorpious, Panther Lake-H484 - btintel_pcie: Refactor Device Coredump * tag 'for-net-next-2025-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (30 commits) Bluetooth: Avoid a couple dozen -Wflex-array-member-not-at-end warnings Bluetooth: hci_sync: Fix using random address for BIG/PA advertisements Bluetooth: ISO: don't leak skb in ISO_CONT RX Bluetooth: ISO: free rx_skb if not consumed Bluetooth: ISO: Fix possible UAF on iso_conn_free Bluetooth: SCO: Fix UAF on sco_conn_free Bluetooth: bcsp: receive data only if registered Bluetooth: btusb: Add new VID/PID 13d3/3633 for MT7922 Bluetooth: btusb: Add new VID/PID 13d3/3627 for MT7925 Bluetooth: remove duplicate h4_recv_buf() in header Bluetooth: btusb: Check for unexpected bytes when defragmenting HCI frames Bluetooth: hci_core: Print information of hcon on hci_low_sent Bluetooth: hci_core: Print number of packets in conn->data_q Bluetooth: Add function and line information to bt_dbg Bluetooth: MGMT: Fix not exposing debug UUID on MGMT_OP_READ_EXP_FEATURES_INFO Bluetooth: hci_core: Detect if an ISO link has stalled Bluetooth: ISO: Use sk_sndtimeo as conn_timeout Bluetooth: HCI: Fix using LE/ACL buffers for ISO packets Bluetooth: ISO: Don't initiate CIS connections if there are no buffers MAINTAINERS: add a sub-entry for the Qualcomm bluetooth driver ... ==================== Link: https://patch.msgid.link/20250927154616.1032839-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:13:51 -07:00
Michael S. Tsirkin	c39d6d4d93	ptr_ring: __ptr_ring_zero_tail micro optimization __ptr_ring_zero_tail currently does the - 1 operation twice: - during initialization of head - at each loop iteration Let's just do it in one place, all we need to do is adjust the loop condition. this is better: - a slightly clearer logic with less duplication - uses prefix -- we don't need to save the old value - one less - 1 operation - for example, when ring is empty we now don't do - 1 at all, existing code does it once Text size shrinks from 15081 to 15050 bytes. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/bcd630c7edc628e20d4f8e037341f26c90ab4365.1758976026.git.mst@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:13:10 -07:00
Jakub Kicinski	e8c4840d0c	Merge branch 'net-wangxun-support-to-configure-rss' Jiawen Wu says: ==================== net: wangxun: support to configure RSS Implement ethtool ops for RSS configuration, and support multiple RSS for multiple pools. ==================== Link: https://patch.msgid.link/20250926023843.34340-1-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:11:18 -07:00
Jiawen Wu	2a251b85ce	net: libwx: restrict change user-set RSS configuration Enable/disable SR-IOV will change the number of rings, thereby changing the RSS configuration that the user has set. So reject these attempts if netif_is_rxfh_configured() returns true. And remind the user to reset the RSS configuration. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Link: https://patch.msgid.link/20250926023843.34340-5-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:11:16 -07:00
Jiawen Wu	2556f80a6a	net: wangxun: add RSS reta and rxfh fields support Add ethtool ops for Rx flow hashing, query and set RSS indirection table and hash key. Disable UDP RSS by default, and support to configure L4 header fields with TCP/UDP/SCTP for flow hashing. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Link: https://patch.msgid.link/20250926023843.34340-4-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:11:16 -07:00
Jiawen Wu	58f244b256	net: libwx: move rss_field to struct wx For global RSS and multiple RSS scheme, the RSS type fields are defined identically in the registers. So they can be defined as the macros WX_RSS_FIELD_* to cleanup the codes. And to prepare for the RXFH support in the next patch, move the rss_field to struct wx. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Link: https://patch.msgid.link/20250926023843.34340-3-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:11:16 -07:00
Jiawen Wu	1be6db0497	net: libwx: support separate RSS configuration for every pool For those devices which support 64 pools, they also support PF and VF (i.e. different pools) to configure different RSS key and hash table. Enable multiple RSS, use up to 64 RSS configurations and each pool has a specific configuration. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Link: https://patch.msgid.link/20250926023843.34340-2-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:11:16 -07:00
Eric Dumazet	1fb0e47161	net: remove one stac/clac pair from move_addr_to_user() Convert the get_user() and __put_user() code to the fast masked_user_access_begin()/unsafe_{get|put}_user() variant. This patch increases the performance of an UDP recvfrom() receiver (netserver) on 120 bytes messages by 7 % on an AMD EPYC 7B12 64-Core Processor platform. Presence of audit_sockaddr() makes difficult to avoid the stac/clac pair in the copy_to_user() call, this is left for a future patch. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250925230929.3727873-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:03:42 -07:00
Eric Dumazet	2b235765e9	scm: use masked_user_access_begin() in put_cmsg() Use the greatest and latest uaccess construct to get an optimal code. Before : lea (%r9,%rcx,1),%r10 movabs $<USER_PTR_MAX>,%r11 mov $0xfffffff2,%eax cmp %rcx,%r10 jb ffffffff81cdc312 <put_cmsg+0x152> cmp %r11,%r10 ja ffffffff81cdc312 <put_cmsg+0x152> stac lfence mov %r9,(%rcx) After: movabs $<USER_PTR_MAX>,%r9 cmp %r9,%rax cmova %r9,%rax stac mov %rcx,(%rax) Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250925224914.3590290-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 18:03:42 -07:00
Jakub Kicinski	3806446f60	Merge branch 'net-stmmac-drop-frames-causing-hlbs-error' Rohan G Thomas says: ==================== net: stmmac: Drop frames causing HLBS error This patchset consists of following patchset to avoid netdev watchdog reset due to Head-of-Line Blocking due to EST scheduling error. 1. Drop those frames causing HLBS error 2. Add HLBS frame drops to taprio stats v2: https://lore.kernel.org/r/20250915-hlbs_2-v2-1-27266b2afdd9@altera.com ==================== Link: https://patch.msgid.link/20250925-hlbs_2-v3-0-3b39472776c2@altera.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 17:49:36 -07:00
Rohan G Thomas	de17376cad	net: stmmac: tc: Add HLBS drop count to taprio stats Add the count of the frames dropped by Head-Of-Line Blocking due to Scheduling(HLBS) error to taprio window drop count stats. Signed-off-by: Rohan G Thomas <rohan.g.thomas@altera.com> Reviewed-by: Matthew Gerlach <matthew.gerlach@altera.com> Reviewed-by: Furong Xu <0x1207@gmail.com> Link: https://patch.msgid.link/20250925-hlbs_2-v3-2-3b39472776c2@altera.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 17:49:34 -07:00
Rohan G Thomas	7ce48d4974	net: stmmac: est: Drop frames causing HLBS error Drop those frames causing Head-of-Line Blocking due to Scheduling (HLBS) error to avoid HLBS interrupt flooding and netdev watchdog timeouts due to blocked packets. Tx queues can be configured to drop those blocked packets by setting Drop Frames causing Scheduling Error (DFBS) bit of EST_CONTROL register. Also, add per queue HLBS drop count. Signed-off-by: Rohan G Thomas <rohan.g.thomas@altera.com> Reviewed-by: Matthew Gerlach <matthew.gerlach@altera.com> Reviewed-by: Furong Xu <0x1207@gmail.com> Link: https://patch.msgid.link/20250925-hlbs_2-v3-1-3b39472776c2@altera.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-29 17:49:34 -07:00
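
The loop change described in commit c39d6d4d93 (ptr_ring: __ptr_ring_zero_tail micro optimization) can be illustrated with a small userspace model. This is only a sketch reconstructed from the commit message, not the kernel source: `struct ring_model`, `zero_tail_old()` and `zero_tail_new()` are hypothetical names, and the real function operates on `struct ptr_ring`.

```c
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct ptr_ring. */
struct ring_model {
	void *queue[8];
	int consumer_tail;
};

/*
 * Assumed "before" shape: "- 1" is applied once when head is
 * initialized and again (as a post-decrement) on every iteration.
 */
static void zero_tail_old(struct ring_model *r, int consumer_head)
{
	int head = consumer_head - 1;

	while (head >= r->consumer_tail)
		r->queue[head--] = NULL;
	r->consumer_tail = consumer_head;
}

/*
 * Assumed "after" shape: the loop condition is adjusted so that a
 * single prefix decrement is enough. With an empty range
 * (consumer_head == consumer_tail) no decrement happens at all.
 */
static void zero_tail_new(struct ring_model *r, int consumer_head)
{
	int head = consumer_head;

	while (head > r->consumer_tail)
		r->queue[--head] = NULL;
	r->consumer_tail = consumer_head;
}
```

Both variants clear the slots in the range [consumer_tail, consumer_head) from the top down and then advance the tail, so under these assumptions they are behaviorally equivalent; the second form simply avoids the extra subtraction.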


1385352 Commits