mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
synced 2026-03-03 18:28:01 +01:00
Pull networking updates from Jakub Kicinski:
"Core & protocols:
 - Replace busylock at the Tx queuing layer with a lockless list,
   resulting in a 300% (4x) improvement on heavy Tx workloads: twice
   the number of packets per second for half the CPU cycles.
 - Allow constantly busy flows to migrate to a more suitable CPU/NIC
   queue.
   Normally we perform queue re-selection when a flow comes out of idle,
   but under extreme circumstances the flows may be constantly busy.
   Add a sysctl to allow periodic rehashing even if it risks packet
   reordering.
 - Optimize the NAPI skb cache, make it larger, and use it in more paths.
 - Attempt returning Tx skbs to the originating CPU (like we already
   did for Rx skbs).
 - Various data structure layout and prefetch optimizations from Eric.
 - Remove ktime_get() from the recvmsg() fast path; ktime_get() is
   sadly quite expensive on recent AMD machines.
 - Extend threaded NAPI polling to allow the kthread to busy poll for
   packets.
 - Make MPTCP use Rx backlog processing. This lowers the lock
   pressure, improving the Rx performance.
 - Support memcg accounting of MPTCP socket memory.
 - Allow admin to opt sockets out of global protocol memory accounting
   (using a sysctl or BPF-based policy). The global limits are a poor
   fit for modern container workloads, where limits are imposed using
   cgroups.
 - Improve heuristics for when to kick off AF_UNIX garbage collection.
 - Allow users to control TCP SACK compression, and default to 33% of
   RTT.
 - Add tcp_rcvbuf_low_rtt sysctl to let datacenter users avoid
   unnecessarily aggressive rcvbuf growth and overshoot when the
   connection RTT is low.
 - Preserve skb metadata space across skb_push / skb_pull operations.
 - Support for IPIP encapsulation in the nftables flowtable offload.
 - Support appending IP interface information to ICMP messages (RFC
   5837).
 - Support setting max record size in TLS (RFC 8449).
 - Remove taking rtnl_lock from RTM_GETNEIGHTBL and RTM_SETNEIGHTBL.
 - Use a dedicated lock (and RCU) in MPLS, instead of rtnl_lock.
 - Let users configure the number of write buffers in SMC.
 - Add new struct sockaddr_unsized for sockaddr of unknown length,
   from Kees.
 - Some conversions away from the crypto_ahash API, from Eric Biggers.
 - Some preparations for slimming down struct page.
 - YAML Netlink protocol spec for WireGuard.
 - Add a tool on top of YAML Netlink specs/lib for reporting commonly
   computed derived statistics and summarized system state.
Driver API:
 - Add CAN XL support to the CAN Netlink interface.
 - Add uAPI for reporting PHY Mean Square Error (MSE) diagnostics, as
   defined by the OPEN Alliance's "Advanced diagnostic features for
   100BASE-T1 automotive Ethernet PHYs" specification.
 - Add DPLL phase-adjust-gran pin attribute (and implement it in
   zl3073x).
 - Refactor xfrm_input lock to reduce contention when NIC offloads
   IPsec and performs RSS.
 - Add info to devlink params whether the current setting is the
   default or a user override. Allow resetting back to default.
 - Add standard device stats for PSP crypto offload.
 - Leverage DSA frame broadcast to implement simple HSR frame
   duplication for a lot of switches without dedicated HSR offload.
 - Add uAPI defines for 1.6Tbps link modes.
Device drivers:
 - Add Motorcomm YT921x gigabit Ethernet switch support.
 - Add MUCSE driver for N500/N210 1GbE NIC series.
 - Convert drivers to support dedicated ops for timestamping control,
   and away from the direct IOCTL handling. While at it, support GET
   operations for PHY timestamping.
 - Add (and convert most drivers to) a dedicated ethtool callback for
   reading the Rx ring count.
 - Significant refactoring efforts in the STMMAC driver, which
   supports Synopsys turn-key MAC IP integrated into a ton of SoCs.
 - Ethernet high-speed NICs:
    - Broadcom (bnxt):
       - support PPS in/out on all pins
    - Intel (100G, ice, idpf):
       - ice: implement standard ethtool and timestamping stats
       - i40e: support setting the max number of MAC addresses per VF
       - iavf: support RSS of GTP tunnels for 5G and LTE deployments
    - nVidia/Mellanox (mlx5):
       - reduce downtime on interface reconfiguration
       - disable being an XDP redirect target by default (same as
         other drivers) to avoid wasting resources if feature is
         unused
    - Meta (fbnic):
       - add support for Linux-managed PCS on 25G, 50G, and 100G links
    - Wangxun:
       - support Rx descriptor merge, and Tx head writeback
       - support Rx coalescing offload
       - support 25G SFP and 40G QSFP modules
 - Ethernet virtual:
    - Google (gve):
       - allow ethtool to configure rx_buf_len
       - implement XDP HW RX Timestamping support for DQ descriptor
         format
    - Microsoft vNIC (mana):
       - support HW link state events
       - handle hardware recovery events when probing the device
 - Ethernet NICs consumer, and embedded:
    - usbnet: add support for Byte Queue Limits (BQL)
    - AMD (amd-xgbe):
       - add device selftests
    - NXP (enetc):
       - add i.MX94 support
    - Broadcom integrated MACs (bcmgenet, bcmasp):
       - bcmasp: add support for PHY-based Wake-on-LAN
    - Broadcom switches (b53):
       - support port isolation
       - support BCM5389/97/98 and BCM63XX ARL formats
    - Lantiq/MaxLinear switches:
       - support bridge FDB entries on the CPU port
       - use regmap for register access
       - allow user to enable/disable learning
       - support Energy Efficient Ethernet
       - support configuring RMII clock delays
       - add tagging driver for MaxLinear GSW1xx switches
    - Synopsys (stmmac):
       - support using the HW clock in free running mode
       - add Eswin EIC7700 support
       - add Rockchip RK3506 support
       - add Altera Agilex5 support
    - Cadence (macb):
       - cleanup and consolidate descriptor and DMA address handling
       - add EyeQ5 support
    - TI:
       - icssg-prueth: support AF_XDP
    - Airoha access points:
       - add missing Ethernet stats and link state callback
       - add AN7583 support
       - support out-of-order Tx completion processing
 - Power over Ethernet:
    - pd692x0: preserve PSE configuration across reboots
    - add support for TPS23881B devices
 - Ethernet PHYs:
    - Open Alliance OATC14 10BASE-T1S PHY cable diagnostic support
    - Support 50G SerDes and 100G interfaces in Linux-managed PHYs
    - micrel:
       - support for non PTP SKUs of lan8814
       - enable in-band auto-negotiation on lan8814
    - realtek:
       - cable testing support on RTL8224
       - interrupt support on RTL8221B
    - motorcomm: support for PHY LEDs on YT853
    - microchip: support for LAN867X Rev.D0 PHYs w/ SQI and cable diag
    - mscc: support for PHY LED control
 - CAN drivers:
    - m_can: add support for optional reset and system wake up
    - remove can_change_mtu() obsoleted by core handling
    - mcp251xfd: support GPIO controller functionality
 - Bluetooth:
    - add initial support for PASTa
 - WiFi:
    - split ieee80211.h file, it's way too big
    - improvements in VHT radiotap reporting, S1G, Channel Switch
      Announcement handling, rate tracking in mesh networks
    - improve multi-radio monitor mode support, and add a cfg80211
      debugfs interface for it
    - HT action frame handling on 6 GHz
    - initial chanctx work towards NAN
    - MU-MIMO sniffer improvements
 - WiFi drivers:
    - RealTek (rtw89):
       - support USB devices RTL8852AU and RTL8852CU
       - initial work for RTL8922DE
       - improved injection support
    - Intel:
       - iwlwifi: new sniffer API support
    - MediaTek (mt76):
       - WED support for >32-bit DMA
       - airoha NPU support
       - regdomain improvements
       - continued WiFi7/MLO work
    - Qualcomm/Atheros:
       - ath10k: factory test support
       - ath11k: TX power insertion support
       - ath12k: BSS color change support
       - ath12k: statistics improvements
    - brcmfmac: Acer A1 840 tablet quirk
    - rtl8xxxu: 40 MHz connection fixes/support"
* tag 'net-next-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1381 commits)
net: page_pool: sanitise allocation order
net: page pool: xa init with destroy on pp init
net/mlx5e: Support XDP target xmit with dummy program
net/mlx5e: Update XDP features in switch channels
selftests/tc-testing: Test CAKE scheduler when enqueue drops packets
net/sched: sch_cake: Fix incorrect qlen reduction in cake_drop
wireguard: netlink: generate netlink code
wireguard: uapi: generate header with ynl-gen
wireguard: uapi: move flag enums
wireguard: uapi: move enum wg_cmd
wireguard: netlink: add YNL specification
selftests: drv-net: Fix tolerance calculation in devlink_rate_tc_bw.py
selftests: drv-net: Fix and clarify TC bandwidth split in devlink_rate_tc_bw.py
selftests: drv-net: Set shell=True for sysfs writes in devlink_rate_tc_bw.py
selftests: drv-net: Use Iperf3Runner in devlink_rate_tc_bw.py
selftests: drv-net: introduce Iperf3Runner for measurement use cases
selftests: drv-net: Add devlink_rate_tc_bw.py to TEST_PROGS
net: ps3_gelic_net: Use napi_alloc_skb() and napi_gro_receive()
Documentation: net: dsa: mention simple HSR offload helpers
Documentation: net: dsa: mention availability of RedBox
...
249 lines
8.7 KiB
C
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _LINUX_RCULIST_NULLS_H
#define _LINUX_RCULIST_NULLS_H

#ifdef __KERNEL__

/*
 * RCU-protected list version
 */
#include <linux/list_nulls.h>
#include <linux/rcupdate.h>

/**
 * hlist_nulls_del_init_rcu - deletes entry from hash list with re-initialization
 * @n: the element to delete from the hash list.
 *
 * Note: hlist_nulls_unhashed() on the node returns true after this. It is
 * useful for RCU-based lock-free traversal if the writer side
 * must know if the list entry is still hashed or already unhashed.
 *
 * In particular, it means that we can not poison the forward pointers
 * that may still be used for walking the hash list and we can only
 * zero the pprev pointer so hlist_nulls_unhashed() will return true after
 * this.
 *
 * The caller must take whatever precautions are necessary (such as
 * holding appropriate locks) to avoid racing with another
 * list-mutation primitive, such as hlist_nulls_add_head_rcu() or
 * hlist_nulls_del_rcu(), running on this same list. However, it is
 * perfectly legal to run concurrently with the _rcu list-traversal
 * primitives, such as hlist_nulls_for_each_entry_rcu().
 */
static inline void hlist_nulls_del_init_rcu(struct hlist_nulls_node *n)
{
	if (!hlist_nulls_unhashed(n)) {
		__hlist_nulls_del(n);
		WRITE_ONCE(n->pprev, NULL);
	}
}

/**
 * hlist_nulls_first_rcu - returns the first element of the hash list.
 * @head: the head of the list.
 */
#define hlist_nulls_first_rcu(head) \
	(*((struct hlist_nulls_node __rcu __force **)&(head)->first))

/**
 * hlist_nulls_next_rcu - returns the element of the list after @node.
 * @node: element of the list.
 */
#define hlist_nulls_next_rcu(node) \
	(*((struct hlist_nulls_node __rcu __force **)&(node)->next))

/**
 * hlist_nulls_pprev_rcu - returns the dereferenced pprev of @node.
 * @node: element of the list.
 */
#define hlist_nulls_pprev_rcu(node) \
	(*((struct hlist_nulls_node __rcu __force **)(node)->pprev))

/**
 * hlist_nulls_del_rcu - deletes entry from hash list without re-initialization
 * @n: the element to delete from the hash list.
 *
 * Note: hlist_nulls_unhashed() on the entry does not return true after this;
 * the entry is in an undefined state. It is useful for RCU-based
 * lock-free traversal.
 *
 * In particular, it means that we can not poison the forward
 * pointers that may still be used for walking the hash list.
 *
 * The caller must take whatever precautions are necessary
 * (such as holding appropriate locks) to avoid racing
 * with another list-mutation primitive, such as hlist_nulls_add_head_rcu()
 * or hlist_nulls_del_rcu(), running on this same list.
 * However, it is perfectly legal to run concurrently with
 * the _rcu list-traversal primitives, such as
 * hlist_nulls_for_each_entry_rcu().
 */
static inline void hlist_nulls_del_rcu(struct hlist_nulls_node *n)
{
	__hlist_nulls_del(n);
	WRITE_ONCE(n->pprev, LIST_POISON2);
}

/**
 * hlist_nulls_add_head_rcu - adds an element to the head of the hash list
 * @n: the element to add to the hash list.
 * @h: the list to add to.
 *
 * Description:
 * Adds the specified element to the specified hlist_nulls,
 * while permitting racing traversals.
 *
 * The caller must take whatever precautions are necessary
 * (such as holding appropriate locks) to avoid racing
 * with another list-mutation primitive, such as hlist_nulls_add_head_rcu()
 * or hlist_nulls_del_rcu(), running on this same list.
 * However, it is perfectly legal to run concurrently with
 * the _rcu list-traversal primitives, such as
 * hlist_nulls_for_each_entry_rcu(), which are used to prevent
 * memory-consistency problems on Alpha CPUs. Regardless of the type of
 * CPU, the list-traversal primitive must be guarded by rcu_read_lock().
 */
static inline void hlist_nulls_add_head_rcu(struct hlist_nulls_node *n,
					    struct hlist_nulls_head *h)
{
	struct hlist_nulls_node *first = h->first;

	WRITE_ONCE(n->next, first);
	WRITE_ONCE(n->pprev, &h->first);
	rcu_assign_pointer(hlist_nulls_first_rcu(h), n);
	if (!is_a_nulls(first))
		WRITE_ONCE(first->pprev, &n->next);
}

/**
 * hlist_nulls_add_tail_rcu - adds an element to the tail of the hash list
 * @n: the element to add to the hash list.
 * @h: the list to add to.
 *
 * Description:
 * Adds the specified element to the end of the specified hlist_nulls,
 * while permitting racing traversals.
 *
 * The caller must take whatever precautions are necessary
 * (such as holding appropriate locks) to avoid racing
 * with another list-mutation primitive, such as hlist_nulls_add_head_rcu()
 * or hlist_nulls_del_rcu(), running on this same list.
 * However, it is perfectly legal to run concurrently with
 * the _rcu list-traversal primitives, such as
 * hlist_nulls_for_each_entry_rcu(), which are used to prevent
 * memory-consistency problems on Alpha CPUs. Regardless of the type of
 * CPU, the list-traversal primitive must be guarded by rcu_read_lock().
 */
static inline void hlist_nulls_add_tail_rcu(struct hlist_nulls_node *n,
					    struct hlist_nulls_head *h)
{
	struct hlist_nulls_node *i, *last = NULL;

	/* Note: write side code, so rcu accessors are not needed. */
	for (i = h->first; !is_a_nulls(i); i = i->next)
		last = i;

	if (last) {
		WRITE_ONCE(n->next, last->next);
		WRITE_ONCE(n->pprev, &last->next);
		rcu_assign_pointer(hlist_nulls_next_rcu(last), n);
	} else {
		hlist_nulls_add_head_rcu(n, h);
	}
}

/* After this, hlist_nulls_del() will work on @n. */
static inline void hlist_nulls_add_fake(struct hlist_nulls_node *n)
{
	WRITE_ONCE(n->pprev, &n->next);
	WRITE_ONCE(n->next, (struct hlist_nulls_node *)NULLS_MARKER(NULL));
}

/**
 * hlist_nulls_replace_rcu - replace an old entry with a new one
 * @old: the element to be replaced
 * @new: the new element to insert
 *
 * Description:
 * Replace the old entry with the new one in an RCU-protected hlist_nulls, while
 * permitting racing traversals.
 *
 * The caller must take whatever precautions are necessary (such as holding
 * appropriate locks) to avoid racing with another list-mutation primitive, such
 * as hlist_nulls_add_head_rcu() or hlist_nulls_del_rcu(), running on this same
 * list. However, it is perfectly legal to run concurrently with the _rcu
 * list-traversal primitives, such as hlist_nulls_for_each_entry_rcu().
 */
static inline void hlist_nulls_replace_rcu(struct hlist_nulls_node *old,
					   struct hlist_nulls_node *new)
{
	struct hlist_nulls_node *next = old->next;

	WRITE_ONCE(new->next, next);
	WRITE_ONCE(new->pprev, old->pprev);
	rcu_assign_pointer(hlist_nulls_pprev_rcu(new), new);
	if (!is_a_nulls(next))
		WRITE_ONCE(next->pprev, &new->next);
}

/**
 * hlist_nulls_replace_init_rcu - replace an old entry with a new one and
 * initialize the old
 * @old: the element to be replaced
 * @new: the new element to insert
 *
 * Description:
 * Replace the old entry with the new one in an RCU-protected hlist_nulls, while
 * permitting racing traversals, and reinitialize the old entry.
 *
 * Note: @old must be hashed.
 *
 * The caller must take whatever precautions are necessary (such as holding
 * appropriate locks) to avoid racing with another list-mutation primitive, such
 * as hlist_nulls_add_head_rcu() or hlist_nulls_del_rcu(), running on this same
 * list. However, it is perfectly legal to run concurrently with the _rcu
 * list-traversal primitives, such as hlist_nulls_for_each_entry_rcu().
 */
static inline void hlist_nulls_replace_init_rcu(struct hlist_nulls_node *old,
						struct hlist_nulls_node *new)
{
	hlist_nulls_replace_rcu(old, new);
	WRITE_ONCE(old->pprev, NULL);
}

/**
 * hlist_nulls_for_each_entry_rcu - iterate over rcu list of given type
 * @tpos: the type * to use as a loop cursor.
 * @pos: the &struct hlist_nulls_node to use as a loop cursor.
 * @head: the head of the list.
 * @member: the name of the hlist_nulls_node within the struct.
 *
 * The barrier() is needed to make sure the compiler doesn't cache the first
 * element [1], as this loop can be restarted [2].
 * [1] Documentation/memory-barriers.txt around line 1533
 * [2] Documentation/RCU/rculist_nulls.rst around line 146
 */
#define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member) \
	for (({barrier();}), \
	     pos = rcu_dereference_raw(hlist_nulls_first_rcu(head)); \
	     (!is_a_nulls(pos)) && \
	     ({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \
	     pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)))

/**
 * hlist_nulls_for_each_entry_safe -
 *   iterate over list of given type safe against removal of list entry
 * @tpos: the type * to use as a loop cursor.
 * @pos: the &struct hlist_nulls_node to use as a loop cursor.
 * @head: the head of the list.
 * @member: the name of the hlist_nulls_node within the struct.
 */
#define hlist_nulls_for_each_entry_safe(tpos, pos, head, member) \
	for (({barrier();}), \
	     pos = rcu_dereference_raw(hlist_nulls_first_rcu(head)); \
	     (!is_a_nulls(pos)) && \
	     ({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); \
	        pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)); 1; });)
#endif
#endif
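The reason these lists end in a "nulls" marker rather than NULL is the reader-side restart check described in Documentation/RCU/rculist_nulls.rst: when objects can be freed and reused while readers still hold pointers to them, a reader can be carried from one hash chain onto another, and only the nulls value at the end of the walk reveals it. The sketch below models that in single-threaded userspace C. Every `demo_` name is a hypothetical stand-in, not kernel API; WRITE_ONCE(), rcu_assign_pointer() and all barriers are omitted, and the racing writer is only simulated by calling `demo_recycle()` from inside the lookup loop. A real kernel reader would also run under rcu_read_lock() and, per that document, take a reference and re-check the key after a hit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for the <linux/list_nulls.h> types. */
struct demo_nulls_node { struct demo_nulls_node *next, **pprev; };
struct demo_nulls_head { struct demo_nulls_node *first; };

/* End-of-chain marker: bit 0 set, bucket index in the remaining bits. */
#define DEMO_NULLS_MARKER(v) \
	((struct demo_nulls_node *)(((uintptr_t)(v) << 1) | 1u))

static int demo_is_a_nulls(const struct demo_nulls_node *p) { return (uintptr_t)p & 1; }
static uintptr_t demo_get_nulls_value(const struct demo_nulls_node *p) { return (uintptr_t)p >> 1; }

/* Same pointer updates as hlist_nulls_add_head_rcu(), without barriers. */
static void demo_add_head(struct demo_nulls_node *n, struct demo_nulls_head *h)
{
	struct demo_nulls_node *first = h->first;

	n->next = first;
	n->pprev = &h->first;
	h->first = n;
	if (!demo_is_a_nulls(first))
		first->pprev = &n->next;
}

/* Unlink @n but keep n->next, as hlist_nulls_del_init_rcu() does. */
static void demo_del(struct demo_nulls_node *n)
{
	*n->pprev = n->next;
	if (!demo_is_a_nulls(n->next))
		n->next->pprev = n->pprev;
	n->pprev = NULL;
}

struct demo_obj {
	int key;
	struct demo_nulls_node node;
};

#define DEMO_SLOTS 2
#define demo_entry(p) \
	((struct demo_obj *)((char *)(p) - offsetof(struct demo_obj, node)))
static struct demo_nulls_head demo_table[DEMO_SLOTS];
static struct demo_obj *demo_victim;	/* object the simulated writer reuses */

/* Simulated writer: the first time the reader stands on demo_victim, the
 * object is "freed and reused" as key 5, which hashes to bucket 1. */
static void demo_recycle(struct demo_obj *visiting)
{
	if (visiting != demo_victim)
		return;
	demo_victim = NULL;
	demo_del(&visiting->node);
	visiting->key = 5;
	demo_add_head(&visiting->node, &demo_table[5 % DEMO_SLOTS]);
}

/* Reader: walk bucket @slot; ending on another bucket's nulls marker means
 * the walk was diverted onto a different chain, so start over. */
static struct demo_obj *demo_lookup(int key, unsigned int slot, int *restarts)
{
	struct demo_nulls_node *pos;

	*restarts = 0;
begin:
	for (pos = demo_table[slot].first; !demo_is_a_nulls(pos); pos = pos->next) {
		struct demo_obj *obj = demo_entry(pos);

		demo_recycle(obj);	/* the "concurrent" writer gets a turn */
		if (obj->key == key)
			return obj;
	}
	if (demo_get_nulls_value(pos) != slot) {
		(*restarts)++;
		goto begin;
	}
	return NULL;
}

/* Bucket 0 holds {4, 2}, bucket 1 holds {3}; look up key 2. With @race set,
 * object 4 is reused while the reader is on it. Returns the number of
 * restarts, or -1 if key 2 was not found. */
static int demo_run(int race)
{
	struct demo_obj o2 = { .key = 2 }, o3 = { .key = 3 }, o4 = { .key = 4 };
	struct demo_obj *hit;
	unsigned int i;
	int restarts;

	for (i = 0; i < DEMO_SLOTS; i++)
		demo_table[i].first = DEMO_NULLS_MARKER(i);	/* per-bucket nulls */
	demo_add_head(&o2.node, &demo_table[0]);
	demo_add_head(&o4.node, &demo_table[0]);
	demo_add_head(&o3.node, &demo_table[1]);

	demo_victim = race ? &o4 : NULL;
	hit = demo_lookup(2, 0, &restarts);
	demo_victim = NULL;
	return hit == &o2 ? restarts : -1;
}
```

Called from a small test driver, `demo_run(0)` returns 0 and `demo_run(1)` returns 1: the reader follows the reused object into bucket 1, ends on nulls value 1 instead of 0, and restarts once before finding key 2. With the final `demo_get_nulls_value()` check removed, the raced lookup would return -1, missing a key that never left bucket 0.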
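The deletion and replacement helpers in this header differ mainly in what they leave in the removed node's `pprev` and `next` fields, which is what hlist_nulls_unhashed() and a reader still parked on the node observe. The following single-threaded userspace sketch mirrors those pointer updates so the states can be checked. The `demo_` names are hypothetical stand-ins, `DEMO_POISON2` merely imitates LIST_POISON2, and all memory-ordering primitives are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for the <linux/list_nulls.h> types. */
struct demo_nulls_node { struct demo_nulls_node *next, **pprev; };
struct demo_nulls_head { struct demo_nulls_node *first; };

#define DEMO_NULLS_MARKER(v) \
	((struct demo_nulls_node *)(((uintptr_t)(v) << 1) | 1u))
#define DEMO_POISON2 ((struct demo_nulls_node **)(uintptr_t)0x122) /* never dereferenced */

static int demo_is_a_nulls(const struct demo_nulls_node *p) { return (uintptr_t)p & 1; }
static int demo_unhashed(const struct demo_nulls_node *n) { return !n->pprev; } /* as hlist_nulls_unhashed() */

static int demo_len(const struct demo_nulls_head *h)
{
	const struct demo_nulls_node *p;
	int n = 0;
	for (p = h->first; !demo_is_a_nulls(p); p = p->next)
		n++;
	return n;
}

static void demo_add_head(struct demo_nulls_node *n, struct demo_nulls_head *h)
{
	struct demo_nulls_node *first = h->first;
	n->next = first;
	n->pprev = &h->first;
	h->first = n;
	if (!demo_is_a_nulls(first))
		first->pprev = &n->next;
}

/* Like __hlist_nulls_del(): unlink @n, leave n->next pointing into the chain. */
static void demo_unlink(struct demo_nulls_node *n)
{
	*n->pprev = n->next;
	if (!demo_is_a_nulls(n->next))
		n->next->pprev = n->pprev;
}

static void demo_del_rcu(struct demo_nulls_node *n)
{
	demo_unlink(n);
	n->pprev = DEMO_POISON2;	/* non-NULL: not reported as unhashed */
}

static void demo_del_init_rcu(struct demo_nulls_node *n)
{
	if (!demo_unhashed(n)) {
		demo_unlink(n);
		n->pprev = NULL;	/* now reported as unhashed */
	}
}

static void demo_replace_init_rcu(struct demo_nulls_node *old,
				  struct demo_nulls_node *new_node)
{
	struct demo_nulls_node *next = old->next;
	new_node->next = next;
	new_node->pprev = old->pprev;
	*new_node->pprev = new_node;	/* publish: rcu_assign_pointer() in the kernel */
	if (!demo_is_a_nulls(next))
		next->pprev = &new_node->next;
	old->pprev = NULL;
}

static void demo_add_fake(struct demo_nulls_node *n)
{
	n->pprev = &n->next;
	n->next = DEMO_NULLS_MARKER(0);
}

/* Returns 0 if every check holds, else the number of the failing step. */
static int demo_lifecycle(void)
{
	struct demo_nulls_head h = { DEMO_NULLS_MARKER(7) };
	struct demo_nulls_node a, b, c, d, e;

	demo_add_head(&a, &h);
	demo_add_head(&b, &h);
	demo_add_head(&c, &h);			/* chain: c, b, a */
	if (demo_len(&h) != 3)
		return 1;
	demo_del_rcu(&b);			/* chain: c, a */
	if (demo_len(&h) != 2 || b.next != &a || demo_unhashed(&b))
		return 2;
	demo_del_init_rcu(&c);			/* chain: a */
	demo_del_init_rcu(&c);			/* already unhashed: no-op */
	if (demo_len(&h) != 1 || !demo_unhashed(&c))
		return 3;
	demo_replace_init_rcu(&a, &d);		/* chain: d */
	if (h.first != &d || !demo_is_a_nulls(d.next) || !demo_unhashed(&a))
		return 4;
	demo_add_fake(&e);			/* on no list ... */
	demo_del_rcu(&e);			/* ... yet deleting it is safe */
	return demo_is_a_nulls(e.next) ? 0 : 5;
}
```

`demo_lifecycle()` returns 0 when every step behaves as the header's comments describe. After `demo_del_rcu()` the node is still not reported as unhashed, while its `next` pointer still leads back into the chain; that is why the forward pointer cannot be poisoned and why callers that need an unhashed test use the `_init` variants.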