Commit Graph

1398251 Commits

Author SHA1 Message Date
Linus Torvalds 01814e11e5 Merge tag 'hwmon-for-v6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:

 - gpd-fan: Fix compilation error for non-ACPI builds, and initialize EC
   when loading the driver

* tag 'hwmon-for-v6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (gpd-fan) initialize EC on driver load for Win 4
  hwmon: (gpd-fan) Fix compilation error in non-ACPI builds
2025-11-13 16:54:36 -08:00
Linus Torvalds aecba2e013 Merge tag 'pm-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
 "These fix issues related to the handling of compressed hibernation
  images and a recent intel_pstate driver regression:

   - Fix issues related to using inadequate data types and incorrect use
     of atomic variables in the compressed hibernation images handling
     code that were introduced during the 6.9 development cycle (Mario
     Limonciello)

   - Move a X86_FEATURE_IDA check from turbo_is_disabled() to the places
     where a new value for MSR_IA32_PERF_CTL is computed in intel_pstate
     to address a regression preventing users from enabling turbo
     frequencies post-boot (Srinivas Pandruvada)"

* tag 'pm-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: intel_pstate: Check IDA only before MSR_IA32_PERF_CTL writes
  PM: hibernate: Fix style issues in save_compressed_image()
  PM: hibernate: Use atomic64_t for compressed_size variable
  PM: hibernate: Emit an error when image writing fails
2025-11-13 16:31:07 -08:00
Linus Torvalds 6a3cc1b749 Merge tag 'acpi-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
 "These fix issues in the ACPI CPPC library and in the recently added
  parser for the ACPI MRRM table:

   - Limit some checks in the ACPI CPPC library to online CPUs to avoid
     accessing uninitialized per-CPU variables when some CPUs are
     offline to start with, like during boot with 'nosmt=force' (Gautham
     Shenoy)

   - Rework add_boot_memory_ranges() in the ACPI MRRM table parser to
     fix memory leaks and improve error handling (Kaushlendra Kumar)"

* tag 'acpi-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: MRRM: Fix memory leaks and improve error handling
  ACPI: CPPC: Limit perf ctrs in PCC check only to online CPUs
  ACPI: CPPC: Perform fast check switch only for online CPUs
  ACPI: CPPC: Check _CPC validity for only the online CPUs
  ACPI: CPPC: Detect preferred core availability on online CPUs
2025-11-13 16:22:36 -08:00
Martin KaFai Lau 91a78ce994 Merge branch 'mptcp-fix-conflicts-between-mptcp-and-sockmap'
Jiayuan Chen says:

====================
mptcp: Fix conflicts between MPTCP and sockmap

Overall, we encountered a warning [1] that can be triggered by running the
selftest I provided.

sockmap works by replacing sk_data_ready, recvmsg, sendmsg operations and
implementing fast socket-level forwarding logic:
1. Users can obtain file descriptors through userspace socket()/accept()
   interfaces, then call BPF syscall to perform these replacements.
2. Users can also use the bpf_sock_hash_update helper (in sockops programs)
   to replace handlers when TCP connections enter ESTABLISHED state
  (BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB/BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB)

However, when combined with MPTCP, an issue arises: MPTCP creates subflow
sk's and performs TCP handshakes, so the BPF program obtains subflow sk's
and may incorrectly replace their sk_prot. We need to reject such
operations. In patch 1, we set psock_update_sk_prot to NULL in the
subflow's custom sk_prot.

Additionally, if the server's listening socket has MPTCP enabled and the
client's TCP also uses MPTCP, we should allow the combination of subflow
and sockmap. This is because the latest Golang programs have enabled MPTCP
for listening sockets by default [2]. For programs already using sockmap,
upgrading Golang should not cause sockmap functionality to fail.

Patch 2 prevents the WARNING from occurring.

Despite these patches fixing stream corruption, users of sockmap must set
GODEBUG=multipathtcp=0 to disable MPTCP until sockmap fully supports it.

[1] truncated warning:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 388 at net/mptcp/protocol.c:68 mptcp_stream_accept+0x34c/0x380
Modules linked in:
RIP: 0010:mptcp_stream_accept+0x34c/0x380
RSP: 0018:ffffc90000cf3cf8 EFLAGS: 00010202
PKRU: 55555554
Call Trace:
 <TASK>
 do_accept+0xeb/0x190
 ? __x64_sys_pselect6+0x61/0x80
 ? _raw_spin_unlock+0x12/0x30
 ? alloc_fd+0x11e/0x190
 __sys_accept4+0x8c/0x100
 __x64_sys_accept+0x1f/0x30
 x64_sys_call+0x202f/0x20f0
 do_syscall_64+0x72/0x9a0
 ? switch_fpu_return+0x60/0xf0
 ? irqentry_exit_to_user_mode+0xdb/0x1e0
 ? irqentry_exit+0x3f/0x50
 ? clear_bhb_loop+0x50/0xa0
 ? clear_bhb_loop+0x50/0xa0
 ? clear_bhb_loop+0x50/0xa0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
 </TASK>
---[ end trace 0000000000000000 ]---

[2]: https://go-review.googlesource.com/c/go/+/607715
====================

Link: https://patch.msgid.link/20251111060307.194196-1-jiayuan.chen@linux.dev
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2025-11-13 13:18:27 -08:00
Jiayuan Chen cb730e4ac1 selftests/bpf: Add mptcp test with sockmap
Add test cases to verify that when MPTCP falls back to plain TCP sockets,
they can properly work with sockmap.

Additionally, add test cases to ensure that sockmap correctly rejects
MPTCP sockets as expected.

Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251111060307.194196-4-jiayuan.chen@linux.dev
2025-11-13 13:18:25 -08:00
Jiayuan Chen c77b3b79a9 mptcp: Fix proto fallback detection with BPF
The sockmap feature allows bpf syscall from userspace, or based
on bpf sockops, replacing the sk_prot of sockets during protocol stack
processing with sockmap's custom read/write interfaces.
'''
tcp_rcv_state_process()
  syn_recv_sock()/subflow_syn_recv_sock()
    tcp_init_transfer(BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB)
      bpf_skops_established       <== sockops
        bpf_sock_map_update(sk)   <== call bpf helper
          tcp_bpf_update_proto()  <== update sk_prot
'''

When the server has MPTCP enabled but the client sends a TCP SYN
without MPTCP, subflow_syn_recv_sock() performs a fallback on the
subflow, replacing the subflow sk's sk_prot with the native sk_prot.
'''
subflow_syn_recv_sock()
  subflow_ulp_fallback()
    subflow_drop_ctx()
      mptcp_subflow_ops_undo_override()
'''

Then, this subflow can be normally used by sockmap, which replaces the
native sk_prot with sockmap's custom sk_prot. The issue occurs when the
user executes accept::mptcp_stream_accept::mptcp_fallback_tcp_ops().
Here, it uses sk->sk_prot to compare with the native sk_prot, but this
is incorrect when sockmap is used, as we may incorrectly set
sk->sk_socket->ops.

This fix uses the more generic sk_family for the comparison instead.

Additionally, this also prevents a WARNING from occurring:

result from ./scripts/decode_stacktrace.sh:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 337 at net/mptcp/protocol.c:68 mptcp_stream_accept \
(net/mptcp/protocol.c:4005)
Modules linked in:
...

PKRU: 55555554
Call Trace:
<TASK>
do_accept (net/socket.c:1989)
__sys_accept4 (net/socket.c:2028 net/socket.c:2057)
__x64_sys_accept (net/socket.c:2067)
x64_sys_call (arch/x86/entry/syscall_64.c:41)
do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
RIP: 0033:0x7f87ac92b83d

---[ end trace 0000000000000000 ]---

Fixes: 0b4f33def7 ("mptcp: fix tcp fallback crash")
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://patch.msgid.link/20251111060307.194196-3-jiayuan.chen@linux.dev
2025-11-13 13:11:15 -08:00
Ian Rogers b72b8132d8 perf libbfd: Ensure libbfd is initialized prior to use
Multiple threads may be creating and destroying BFD objects in
situations like `perf top`.

Without appropriate initialization crashes may occur during libbfd's
cache management.

BFD's locks require recursive mutexes, add support for these.

Committer testing:

This happens only when building with 'make BUILD_NONDISTRO=1' and having
the binutils-devel package (or equivalent) installed, i.e. linking with
binutils devel files, an opt-in perf build.

Before:

  root@x1:~# perf top
  perf: Segmentation fault
  -------- backtrace --------
  <SNIP multiple failed attempts at printing a backtrace>
  root@x1:~#

After this patch it works as before.

Closes: https://lore.kernel.org/lkml/aQt66zhfxSA80xwt@gentoo.org/
Fixes: 95931d9a59 ("perf libbfd: Move libbfd functionality to its own file")
Reported-by: Guilherme Amadio <amadio@gentoo.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-11-13 17:55:19 -03:00
Ravi Bangoria 3c723f4497 perf test: Fix lock contention test
Couple of independent fixes:

1. Wire in SIGSEGV handler that terminates the test with a failure code.

2. Use "--lock-cgroup" instead of "-g"; "-g" was proposed but never
   merged. See commit 4d1792d0a2 ("perf lock contention: Add
   --lock-cgroup option")

3. Call cleanup() on every normal exit so trap_cleanup() doesn't mistake
   it for an unexpected signal and emit a false-negative "Unexpected
   signal in main" message.

Before patch:

  # ./perf test -vv "lock contention"
   85: kernel lock contention analysis test:
  --- start ---
  test child forked, pid 610711
  Testing perf lock record and perf lock contention
  Testing perf lock contention --use-bpf
  Testing perf lock record and perf lock contention at the same time
  Testing perf lock contention --threads
  Testing perf lock contention --lock-addr
  Testing perf lock contention --lock-cgroup
  Unexpected signal in test_aggr_cgroup
  ---- end(0) ----
   85: kernel lock contention analysis test                            : Ok

After patch:

  # ./perf test -vv "lock contention"
   85: kernel lock contention analysis test:
  --- start ---
  test child forked, pid 602637
  Testing perf lock record and perf lock contention
  Testing perf lock contention --use-bpf
  Testing perf lock record and perf lock contention at the same time
  Testing perf lock contention --threads
  Testing perf lock contention --lock-addr
  Testing perf lock contention --lock-cgroup
  Testing perf lock contention --type-filter (w/ spinlock)
  Testing perf lock contention --lock-filter (w/ tasklist_lock)
  Testing perf lock contention --callstack-filter (w/ unix_stream)
  [Skip] Could not find 'unix_stream'
  Testing perf lock contention --callstack-filter with task aggregation
  [Skip] Could not find 'unix_stream'
  Testing perf lock contention --cgroup-filter
  Testing perf lock contention CSV output
  ---- end(0) ----
   85: kernel lock contention analysis test                            : Ok

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Santosh Shukla <santosh.shukla@amd.com>
Cc: Tycho Andersen <tycho@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-11-13 17:29:00 -03:00
Ravi Bangoria d0206db94b perf lock: Fix segfault due to missing kernel map
Kernel maps are encoded in PERF_RECORD_MMAP2 samples but "perf lock
report" and "perf lock contention" do not process MMAP2 samples.

Because of that, machine->vmlinux_map stays NULL and any later access
triggers a segmentation fault.

Fix it by adding ->mmap2() callbacks.

Fixes: 53b00ff358 ("perf record: Make --buildid-mmap the default")
Reported-by: Tycho Andersen (AMD) <tycho@kernel.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Tested-by: Tycho Andersen (AMD) <tycho@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Santosh Shukla <santosh.shukla@amd.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-11-13 17:17:41 -03:00
Arnaldo Carvalho de Melo 84003ab3d0 tools headers UAPI: Sync KVM's vmx.h with the kernel to pick SEAMCALL exit reason
To pick the changes in:

  9d7dfb95da ("KVM: VMX: Inject #UD if guest tries to execute SEAMCALL or TDCALL")

The 'perf kvm-stat' tool uses the exit reasons that are included in the
VMX_EXIT_REASONS define, this new SEAMCALL isn't included there (TDCALL
is), so shouldn't be causing any change in behaviour, this patch ends up
being just addressess the following perf build warning:

  Warning: Kernel ABI header differences:
    diff -u tools/arch/x86/include/uapi/asm/vmx.h arch/x86/include/uapi/asm/vmx.h

Please see tools/include/uapi/README for further details.

Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-11-13 17:16:34 -03:00
Arnaldo Carvalho de Melo a09e5967ad perf build: Don't fail fast path feature detection when binutils-devel is not available
This is one more remnant of the BUILD_NONDISTRO series to make building
with binutils-devel opt-in due to license incompatibility.

In this case just the references at link time were still in place, which
make building the test-all.bin file fail, which wasn't detected before
probably because the last test was done with binutils-devel available,
doh.

Now:

  $ rpm -q binutils-devel
  package binutils-devel is not installed
  $ file /tmp/build/perf-tools/feature/test-all.bin
  /tmp/build/perf-tools/feature/test-all.bin: ELF 64-bit LSB executable, x86-64, version 1 (SYSV),
  dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2,
  BuildID[sha1]=4b5388a346b51f1b993f0b0dbd49f4570769b03c, for GNU/Linux 3.2.0, not stripped
  $

Fixes: 970ae86307 ("perf build: The bfd features are opt-in, stop testing for them by default")
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-11-13 17:16:34 -03:00
Thomas Falcon 85c894a80a perf header: Write bpf_prog (infos|btfs)_cnt to data file
With commit f0d0f978f3 ("perf header: Don't write empty BPF/BTF
info"), the write_bpf_( prog_info() | btf() ) functions exit without
writing anything if env->bpf_prog.(infos| btfs)_cnt is zero.

process_bpf_( prog_info() | btf() ), however, still expect a "count"
value to exist in the data file. If btf information is empty, for
example, process_bpf_btf will read garbage or some other data as the
number of btf nodes in the data file. As a result, the data file will
not be processed correctly.

Instead, write the count to the data file and exit if it is zero.

Fixes: f0d0f978f3 ("perf header: Don't write empty BPF/BTF info")
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Thomas Falcon <thomas.falcon@intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-11-13 17:16:33 -03:00
Rafael J. Wysocki 161284b26f Merge branch 'pm-sleep'
Merge fixes for issues related to the handling of compressed hibernation
images that were introduced during the 6.9 development cycle.

* pm-sleep:
  PM: hibernate: Fix style issues in save_compressed_image()
  PM: hibernate: Use atomic64_t for compressed_size variable
  PM: hibernate: Emit an error when image writing fails
2025-11-13 21:05:46 +01:00
Linus Torvalds 9b9e43704d Merge tag 'slab-for-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab fix from Vlastimil Babka:

 - Fix memory leak of objects from remote NUMA node when bulk freeing to
   a cache with sheaves (Harry Yoo)

* tag 'slab-for-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  mm/slub: fix memory leak in free_to_pcs_bulk()
2025-11-13 11:42:44 -08:00
Rafael J. Wysocki 7564f3543c Merge branches 'acpi-cppc' and 'acpi-tables'
Merge ACPI CPPC library fixes and an ACPI MRRM table parser fix for
6.18-rc6.

* acpi-cppc:
  ACPI: CPPC: Limit perf ctrs in PCC check only to online CPUs
  ACPI: CPPC: Perform fast check switch only for online CPUs
  ACPI: CPPC: Check _CPC validity for only the online CPUs
  ACPI: CPPC: Detect preferred core availability on online CPUs

* acpi-tables:
  ACPI: MRRM: Fix memory leaks and improve error handling
2025-11-13 20:40:51 +01:00
Linus Torvalds 8b4a014e28 Merge tag 'linux_kselftest-fixes-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fix from Shuah Khan:
 "Fixes event-filter-function.tc tracing test failure caused when a
  first run to sample events triggers kmem_cache_free which interferes
  with the rest of the test.

  Fix this by calling sample_events twice to eliminate the
  kmem_cache_free related noise from the sampling"

* tag 'linux_kselftest-fixes-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests/tracing: Run sample events to clear page cache events
2025-11-13 11:37:40 -08:00
Linus Torvalds d0309c0543 Merge tag 'net-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
 "Including fixes from Bluetooth and Wireless. No known outstanding
  regressions.

  Current release - regressions:

   - eth:
      - bonding: fix mii_status when slave is down
      - mlx5e: fix missing error assignment in mlx5e_xfrm_add_state()

  Previous releases - regressions:

   - sched: limit try_bulk_dequeue_skb() batches

   - ipv4: route: prevent rt_bind_exception() from rebinding stale fnhe

   - af_unix: initialise scc_index in unix_add_edge()

   - netpoll: fix incorrect refcount handling causing incorrect cleanup

   - bluetooth: don't hold spin lock over sleeping functions

   - hsr: Fix supervision frame sending on HSRv0

   - sctp: prevent possible shift out-of-bounds

   - tipc: fix use-after-free in tipc_mon_reinit_self().

   - dsa: tag_brcm: do not mark link local traffic as offloaded

   - eth: virtio-net: fix incorrect flags recording in big mode

  Previous releases - always broken:

   - sched: initialize struct tc_ife to fix kernel-infoleak

   - wifi:
      - mac80211: reject address change while connecting
      - iwlwifi: avoid toggling links due to wrong element use

   - bluetooth: cancel mesh send timer when hdev removed

   - strparser: fix signed/unsigned mismatch bug

   - handshake: fix memory leak in tls_handshake_accept()

  Misc:

   - selftests: mptcp: fix some flaky tests"

* tag 'net-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (60 commits)
  hsr: Follow standard for HSRv0 supervision frames
  hsr: Fix supervision frame sending on HSRv0
  virtio-net: fix incorrect flags recording in big mode
  ipv4: route: Prevent rt_bind_exception() from rebinding stale fnhe
  wifi: iwlwifi: mld: always take beacon ies in link grading
  wifi: iwlwifi: mvm: fix beacon template/fixed rate
  wifi: iwlwifi: fix aux ROC time event iterator usage
  net_sched: limit try_bulk_dequeue_skb() batches
  selftests: mptcp: join: properly kill background tasks
  selftests: mptcp: connect: trunc: read all recv data
  selftests: mptcp: join: userspace: longer transfer
  selftests: mptcp: join: endpoints: longer transfer
  selftests: mptcp: join: rm: set backup flag
  selftests: mptcp: connect: fix fallback note due to OoO
  ethtool: fix incorrect kernel-doc style comment in ethtool.h
  mlx5: Fix default values in create CQ
  Bluetooth: btrtl: Avoid loading the config file on security chips
  net/mlx5e: Fix potentially misleading debug message
  net/mlx5e: Fix wraparound in rate limiting for values above 255 Gbps
  net/mlx5e: Fix maxrate wraparound in threshold between units
  ...
2025-11-13 11:20:25 -08:00
Harry Yoo cbcff934fa mm/slub: fix memory leak in free_to_pcs_bulk()
The commit 989b09b739 ("slab: skip percpu sheaves for remote object
freeing") introduced the remote_objects array in free_to_pcs_bulk() to
skip sheaves when objects from a remote node are freed.

However, the array is flushed only when:
  1) the array becomes full (++remote_nr >= PCS_BATCH_MAX), or
  2) slab_free_hook() returns false and size becomes zero.

When neither of the conditions is met, objects in the array are leaked.
This resulted in a memory leak [1], where 82 GiB of memory was allocated
for the maple_node cache.

Flush the array after successfully freeing objects to sheaves
in the do_free: path.

In the meantime, move the snippet if (!size) goto flush_remote; outside
the while loop for readability. Let's say all objects in the array are
from a remote node: then we acquire s->cpu_sheaves->lock and try to free
an object even when size is zero. This doesn't appear to be harmful,
but isn't really readable.

Reported-by: Tytus Rogalewski <admin@simplepod.ai>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220765 [1]
Closes: https://lore.kernel.org/linux-mm/20251107094809.12e9d705b7bf4815783eb184@linux-foundation.org
Closes: https://lore.kernel.org/all/aRGDTwbt2EIz2CYn@hyeyoo
Fixes: 989b09b739 ("slab: skip percpu sheaves for remote object freeing")
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Link: https://patch.msgid.link/20251111125331.12246-1-harry.yoo@oracle.com
Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Tested-by: Darrick J. Wong <djwong@kernel.org>
Tested-by: Tytus Rogalewski <admin@simplepod.ai>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-13 19:56:46 +01:00
Jiayuan Chen fbade4bd08 mptcp: Disallow MPTCP subflows from sockmap
The sockmap feature allows bpf syscall from userspace, or based on bpf
sockops, replacing the sk_prot of sockets during protocol stack processing
with sockmap's custom read/write interfaces.
'''
tcp_rcv_state_process()
  subflow_syn_recv_sock()
    tcp_init_transfer(BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB)
      bpf_skops_established       <== sockops
        bpf_sock_map_update(sk)   <== call bpf helper
          tcp_bpf_update_proto()  <== update sk_prot
'''
Consider two scenarios:

1. When the server has MPTCP enabled and the client also requests MPTCP,
   the sk passed to the BPF program is a subflow sk. Since subflows only
   handle partial data, replacing their sk_prot is meaningless and will
   cause traffic disruption.

2. When the server has MPTCP enabled but the client sends a TCP SYN
   without MPTCP, subflow_syn_recv_sock() performs a fallback on the
   subflow, replacing the subflow sk's sk_prot with the native sk_prot.
   '''
   subflow_ulp_fallback()
    subflow_drop_ctx()
      mptcp_subflow_ops_undo_override()
   '''
   Subsequently, accept::mptcp_stream_accept::mptcp_fallback_tcp_ops()
   converts the subflow to plain TCP.

For the first case, we should prevent it from being combined with sockmap
by setting sk_prot->psock_update_sk_prot to NULL, which will be blocked by
sockmap's own flow.

For the second case, since subflow_syn_recv_sock() has already restored
sk_prot to native tcp_prot/tcpv6_prot, no further action is needed.

Fixes: cec37a6e41 ("mptcp: Handle MP_CAPABLE options for outgoing connections")
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://patch.msgid.link/20251111060307.194196-2-jiayuan.chen@linux.dev
2025-11-13 09:15:41 -08:00
Kiryl Shutsemau 0a8fb03fe7 MAINTAINERS: Update name spelling
Use transliteration from the Belarusian language instead of Russian.

Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20251113121006.651992-1-kas%40kernel.org
2025-11-13 07:58:47 -08:00
Andrew Donnellan ebd4469e7a entry: Fix ifndef around arch_xfer_to_guest_mode_handle_work() stub
The stub implementation of arch_xfer_to_guest_mode_handle_work() is
guarded by an #ifndef that incorrectly checks for the name
arch_xfer_to_guest_mode_work instead. It seems the function was renamed
to add "_handle" as a late change to the original patch, and the #ifndef
wasn't updated to go with it.

Change the #ifndef to match the name of the function. No users right now,
so no need to update any architecture code.

Fixes: 935ace2fb5 ("entry: Provide infrastructure for work before transitioning to guest mode")
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251105-entry-fix-ifndef-v1-1-d8d28045b627@linux.ibm.com
2025-11-13 16:27:56 +01:00
Paolo Abeni 94909c53e4 Merge branch 'hsr-send-correct-hsrv0-supervision-frames'
Felix Maurer says:

====================
hsr: Send correct HSRv0 supervision frames

Hangbin recently reported that the hsr selftests were failing and noted
that the entries in the node table were not merged, i.e., had
00:00:00:00:00:00 as MacAddressB forever [1].

This failure only occured with HSRv0 because it was not sending
supervision frames anymore. While debugging this I found that we were
not really following the HSRv0 standard for the supervision frames we
sent, so I additionally made a few changes to get closer to the standard
and restore a more correct behavior we had a while ago.

The selftests can still fail because they take a while and run into the
timeout. I did not include a change of the timeout because I have more
improvements to the selftests mostly ready that change the test duration
but are net-next material.

[1]: https://lore.kernel.org/netdev/aMONxDXkzBZZRfE5@fedora/
====================

Link: https://patch.msgid.link/cover.1762876095.git.fmaurer@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-13 15:55:06 +01:00
Felix Maurer b2c26c82f7 hsr: Follow standard for HSRv0 supervision frames
For HSRv0, the path_id has the following meaning:
- 0000: PRP supervision frame
- 0001-1001: HSR ring identifier
- 1010-1011: Frames from PRP network (A/B, with RedBoxes)
- 1111: HSR supervision frame

Follow the IEC 62439-3:2010 standard more closely by setting the right
path_id for HSRv0 supervision frames (actually, it is correctly set when
the frame is constructed, but hsr_set_path_id() overwrites it) and set a
fixed HSR ring identifier of 1. The ring identifier seems to be generally
unused and we ignore it anyways on reception, but some fixed identifier is
definitely better than using one identifier in one direction and a wrong
identifier in the other.

This was also the behavior before commit f266a683a4 ("net/hsr: Better
frame dispatch") which introduced the alternating path_id. This was later
moved to hsr_set_path_id() in commit 451d8123f8 ("net: prp: add packet
handling support").

The IEC 62439-3:2010 also contains 6 unused bytes after the MacAddressA in
the HSRv0 supervision frames. Adjust a TODO comment accordingly.

Fixes: f266a683a4 ("net/hsr: Better frame dispatch")
Fixes: 451d8123f8 ("net: prp: add packet handling support")
Signed-off-by: Felix Maurer <fmaurer@redhat.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/ea0d5133cd593856b2fa673d6e2067bf1d4d1794.1762876095.git.fmaurer@redhat.com
Tested-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-13 15:55:04 +01:00
Felix Maurer 96a3a03abf hsr: Fix supervision frame sending on HSRv0
On HSRv0, no supervision frames were sent. The supervison frames were
generated successfully, but failed the check for a sufficiently long mac
header, i.e., at least sizeof(struct hsr_ethhdr), in hsr_fill_frame_info()
because the mac header only contained the ethernet header.

Fix this by including the HSR header in the mac header when generating HSR
supervision frames. Note that the mac header now also includes the TLV
fields. This matches how we set the headers on rx and also the size of
struct hsrv0_ethhdr_sp.

Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Closes: https://lore.kernel.org/netdev/aMONxDXkzBZZRfE5@fedora/
Fixes: 9cfb5e7f0d ("net: hsr: fix hsr_init_sk() vs network/transport headers.")
Signed-off-by: Felix Maurer <fmaurer@redhat.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/4354114fea9a642fe71f49aeeb6c6159d1d61840.1762876095.git.fmaurer@redhat.com
Tested-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-13 15:55:04 +01:00
Randy Dunlap 0a4a18e888 drm/client: fix MODULE_PARM_DESC string for "active"
The MODULE_PARM_DESC string for the "active" parameter is missing a
space and has an extraneous trailing ']' character. Correct these.

Before patch:
$ modinfo -p ./drm_client_lib.ko
active:Choose which drm client to start, default isfbdev] (string)

After patch:
$ modinfo -p ./drm_client_lib.ko
active:Choose which drm client to start, default is fbdev (string)

Fixes: f7b42442c4 ("drm/log: Introduce a new boot logger to draw the kmsg on the screen")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patch.msgid.link/20251112010920.2355712-1-rdunlap@infradead.org
2025-11-13 14:15:24 +01:00
Linus Torvalds 2ccec59446 Merge tag 'erofs-for-6.18-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:

 - Add Chunhai Guo as a EROFS reviewer to get more eyes from interested
   industry vendors

 - Fix infinite loop caused by incomplete crafted zstd-compressed data
   (thanks to Robert again!)

* tag 'erofs-for-6.18-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: avoid infinite loop due to incomplete zstd-compressed data
  MAINTAINERS: erofs: add myself as reviewer
2025-11-13 05:02:59 -08:00
Linus Torvalds 967a72fa7f Merge tag 'v6.18-rc5-smb-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:

 - Fix smbdirect (RDMA) disconnect hang bug

 - Fix potential Denial of Service when connection limit exceeded

 - Fix smbdirect (RDMA) connection (potentially accessing freed memory)
   bug

* tag 'v6.18-rc5-smb-server-fixes' of git://git.samba.org/ksmbd:
  smb: server: let smb_direct_disconnect_rdma_connection() turn CREATED into DISCONNECTED
  ksmbd: close accepted socket when per-IP limit rejects connection
  smb: server: rdma: avoid unmapping posted recv on accept failure
2025-11-13 04:57:38 -08:00
Shawn Lin 921b3f59b7 PCI/ASPM: Avoid L0s and L1 on Hi1105 [19e5:1105] Wi-Fi
This Wi-Fi advertises the L0s and L1 capabilities but actually it doesn't
support them. This is confirmed by HiSilicon team in actual productization.

Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/1762916319-139532-1-git-send-email-shawn.lin@rock-chips.com
2025-11-13 06:17:23 -06:00
Xuan Zhuo 0eff2eaa53 virtio-net: fix incorrect flags recording in big mode
The purpose of commit 703eec1b24 ("virtio_net: fixing XDP for fully
checksummed packets handling") is to record the flags in advance, as
their value may be overwritten in the XDP case. However, the flags
recorded under big mode are incorrect, because in big mode, the passed
buf does not point to the rx buffer, but rather to the page of the
submitted buffer. This commit fixes this issue.

For the small mode, the commit c11a49d58a ("virtio_net: Fix mismatched
buf address when unmapping for small packets") fixed it.

Tested-by: Alyssa Ross <hi@alyssa.is>
Fixes: 703eec1b24 ("virtio_net: fixing XDP for fully checksummed packets handling")
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://patch.msgid.link/20251111090828.23186-1-xuanzhuo@linux.alibaba.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-13 13:16:30 +01:00
Linus Torvalds 6fa9041b71 Merge tag 'nfsd-6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
 "Address recently reported issues or issues found at the recent NFS
  bake-a-thon held in Raleigh, NC.

  Issues reported with v6.18-rc:
   - Address a kernel build issue
   - Reorder SEQUENCE processing to avoid spurious NFS4ERR_SEQ_MISORDERED

  Issues that need expedient stable backports:
   - Close a refcount leak exposure
   - Report support for NFSv4.2 CLONE correctly
   - Fix oops during COPY_NOTIFY processing
   - Prevent rare crash after XDR encoding failure
   - Prevent crash due to confused or malicious NFSv4.1 client"

* tag 'nfsd-6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  Revert "SUNRPC: Make RPCSEC_GSS_KRB5 select CRYPTO instead of depending on it"
  nfsd: ensure SEQUENCE replay sends a valid reply.
  NFSD: Never cache a COMPOUND when the SEQUENCE operation fails
  NFSD: Skip close replay processing if XDR encoding fails
  NFSD: free copynotify stateid in nfs4_free_ol_stateid()
  nfsd: add missing FATTR4_WORD2_CLONE_BLKSIZE from supported attributes
  nfsd: fix refcount leak in nfsd_set_fh_dentry()
2025-11-12 18:41:01 -08:00
Linus Torvalds 92385a075a Merge tag 'dma-mapping-6.18-2025-11-12' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux
Pull dma-mapping fixes from Marek Szyprowski:

 - two minor fixes for DMA API infrastructure: restoring proper
   structure padding used in benchmark tests (Qinxin Xia) and global
   DMA_BIT_MASK macro rework to make it a bit more clang friendly (James
   Clark)

* tag 'dma-mapping-6.18-2025-11-12' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux:
  dma-mapping: Allow use of DMA_BIT_MASK(64) in global scope
  dma-mapping: benchmark: Restore padding to ensure uABI remained consistent
2025-11-12 18:31:22 -08:00
Linus Torvalds e927c520e1 Merge tag 'loongarch-fixes-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:

 - Fix a Rust build error

 - Fix exception/interrupt, memory management, perf event, hardware
   breakpoint, kexec and KVM bugs

* tag 'loongarch-fixes-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch: KVM: Fix max supported vCPUs set with EIOINTC
  LoongArch: KVM: Skip PMU checking on vCPU context switch
  LoongArch: KVM: Restore guest PMU if it is enabled
  LoongArch: KVM: Add delay until timer interrupt injected
  LoongArch: KVM: Set page with write attribute if dirty track disabled
  LoongArch: kexec: Print out debugging message if required
  LoongArch: kexec: Initialize the kexec_buf structure
  LoongArch: Use correct accessor to read FWPC/MWPC
  LoongArch: Refine the init_hw_perf_events() function
  LoongArch: Remove __GFP_HIGHMEM masking in pud_alloc_one()
  LoongArch: Let {pte,pmd}_modify() record the status of _PAGE_DIRTY
  LoongArch: Consolidate max_pfn & max_low_pfn calculation
  LoongArch: Consolidate early_ioremap()/ioremap_prot()
  LoongArch: Use physical addresses for CSR_MERRENTRY/CSR_TLBRENTRY
  LoongArch: Clarify 3 MSG interrupt features
  rust: Add -fno-isolate-erroneous-paths-dereference to bindgen_skip_c_flags
2025-11-12 18:21:30 -08:00
Linus Torvalds 89ee862a4d Merge tag 'alpha-fixes-v6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha
Pull alpha fix from Matt Turner:
 "Add Magnus as a maintainer of the alpha port"

* tag 'alpha-fixes-v6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha:
  MAINTAINERS: Add Magnus Lindholm as maintainer for alpha port
2025-11-12 18:18:12 -08:00
Bjorn Helgaas 823576c894 PCI/ASPM: Avoid L0s and L1 on PA Semi [1959:a002] Root Ports
Christian reported that f3ac2ff148 ("PCI/ASPM: Enable all ClockPM and
ASPM states for devicetree platforms") broke booting on the A-EON AmigaOne
X1000.

Override the L0s and L1 Support advertised in Link Capabilities by the
X1000 Root Ports ([1959:a002]) so we don't try to enable those states.

Fixes: f3ac2ff148 ("PCI/ASPM: Enable all ClockPM and ASPM states for devicetree platforms")
Fixes: df5192d9bb ("PCI/ASPM: Enable only L0s and L1 for devicetree platforms")
Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Link: https://lore.kernel.org/r/a41d2ca1-fcd9-c416-b111-a958e92e94bf@xenosoft.de
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-11-12 18:51:39 -06:00
Bjorn Helgaas 5b40a5080c PCI/ASPM: Avoid L0s and L1 on Freescale [1957:0451] Root Ports
Christian reported that f3ac2ff148 ("PCI/ASPM: Enable all ClockPM and
ASPM states for devicetree platforms") broke booting on the A-EON X5000.

Override the L0s and L1 Support advertised in Link Capabilities by the
X5000 Root Ports ([1957:0451]) so we don't try to enable those states.

Fixes: f3ac2ff148 ("PCI/ASPM: Enable all ClockPM and ASPM states for devicetree platforms")
Fixes: df5192d9bb ("PCI/ASPM: Enable only L0s and L1 for devicetree platforms")
Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Link: https://lore.kernel.org/r/db5c95a1-cf3e-46f9-8045-a1b04908051a@xenosoft.de
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Shawn Lin <shawn.lin@rock-chips.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Link: https://patch.msgid.link/20251110222929.2140564-5-helgaas@kernel.org
2025-11-12 18:51:39 -06:00
Bjorn Helgaas 30579eebba PCI/ASPM: Convert quirks to override advertised link states
Existing quirks to disable ASPM L0s and L1 use pci_disable_link_state(),
which disables ASPM states and prevents their use in the future.  But since
they are FINAL quirks, they happen after ASPM has already been enabled.
Here's a typical call path:

  pci_host_probe
    pci_scan_root_bus_bridge
      pci_scan_child_bus
        pci_scan_slot
          pci_scan_single_device
            pci_device_add
              pci_fixup_device(pci_fixup_header)  # HEADER quirks
          pcie_aspm_init_link_state
            pcie_config_aspm_path
              pcie_config_aspm_link
                pcie_config_aspm_dev              # ASPM may be enabled
    pci_bus_add_devices
      pci_bus_add_devices
        pci_fixup_device(pci_fixup_final)         # FINAL quirks
          quirk_disable_aspm_l0s
            pci_disable_link_state(dev, PCIE_LINK_STATE_L0S)

Sometimes enabling ASPM can make the link non-functional, so if we know
ASPM is broken on a device, we shouldn't enable it at all, even
temporarily.

Convert the existing quirks to use pcie_aspm_remove_cap() instead, which
overrides the ASPM Support advertised in PCIe Link Capabilities, and make
them HEADER quirks so they run before pcie_aspm_init_link_state() has a
chance to enable ASPM.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Shawn Lin <shawn.lin@rock-chips.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Link: https://patch.msgid.link/20251110222929.2140564-4-helgaas@kernel.org
2025-11-12 18:51:39 -06:00
Bjorn Helgaas 575b98e39d PCI/ASPM: Add pcie_aspm_remove_cap() to override advertised link states
Add pcie_aspm_remove_cap().  A quirk can use this to prevent use of ASPM
L0s or L1 link states, even if the device advertised support for them.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Shawn Lin <shawn.lin@rock-chips.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Link: https://patch.msgid.link/20251110222929.2140564-3-helgaas@kernel.org
2025-11-12 18:51:27 -06:00
Bjorn Helgaas 4495bffd86 PCI/ASPM: Cache L0s/L1 Supported so advertised link states can be overridden
Defective devices sometimes advertise support for ASPM L0s or L1 states
even if they don't work correctly.

Cache the L0s Supported and L1 Supported bits early in enumeration so
HEADER quirks can override the ASPM states advertised in Link Capabilities
before pcie_aspm_cap_init() enables ASPM.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Shawn Lin <shawn.lin@rock-chips.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Link: https://patch.msgid.link/20251110222929.2140564-2-helgaas@kernel.org
2025-11-12 18:47:16 -06:00
Haotian Zhang 360b3730f8 ASoC: rsnd: fix OF node reference leak in rsnd_ssiu_probe()
rsnd_ssiu_probe() leaks an OF node reference obtained by
rsnd_ssiu_of_node(). The node reference is acquired but
never released across all return paths.

Fix it by declaring the device node with the __free(device_node)
cleanup construct to ensure automatic release when the variable goes
out of scope.

Fixes: 4e7788fb80 ("ASoC: rsnd: add SSIU BUSIF support")
Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn>
Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Link: https://patch.msgid.link/20251112065709.1522-1-vulab@iscas.ac.cn
Signed-off-by: Mark Brown <broonie@kernel.org>
2025-11-13 00:36:01 +00:00
Dave Jiang 214291cbaa acpi/hmat: Fix lockdep warning for hmem_register_resource()
The following lockdep splat was observed while kernel auto-online a CXL
memory region:

======================================================
WARNING: possible circular locking dependency detected
6.17.0djtest+ #53 Tainted: G        W
------------------------------------------------------
systemd-udevd/3334 is trying to acquire lock:
ffffffff90346188 (hmem_resource_lock){+.+.}-{4:4}, at: hmem_register_resource+0x31/0x50

but task is already holding lock:
ffffffff90338890 ((node_chain).rwsem){++++}-{4:4}, at: blocking_notifier_call_chain+0x2e/0x70

which lock already depends on the new lock.
[..]
Chain exists of:
  hmem_resource_lock --> mem_hotplug_lock --> (node_chain).rwsem

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  rlock((node_chain).rwsem);
                               lock(mem_hotplug_lock);
                               lock((node_chain).rwsem);
  lock(hmem_resource_lock);

The lock ordering can cause potential deadlock. There are instances
where hmem_resource_lock is taken after (node_chain).rwsem, and vice
versa.

Split out the target update section of hmat_register_target() so that
hmat_callback() only envokes that section instead of attempt to register
hmem devices that it does not need to.

[ dj: Fix up comment to be closer to 80cols. (Jonathan) ]

Fixes: cf8741ac57 ("ACPI: NUMA: HMAT: Register "soft reserved" memory as an "hmem" device")
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://patch.msgid.link/20251105235115.85062-3-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-11-12 14:47:55 -07:00
Cryolitia PukNgae c55a8e24cd hwmon: (gpd-fan) initialize EC on driver load for Win 4
The original implement will re-init the EC when it reports a zero
value, and it's a workaround for the black box buggy firmware.

Now a contributer test and report that, the bug is that, the firmware
won't initialize the EC on boot, so the EC ramains in unusable status.
And it won't need to re-init it during runtime. The original implement
is not perfect, any write command will be ignored until we first read
it. Just re-init it unconditionally when the driver load could work.

Fixes: 0ab88e2394 ("hwmon: add GPD devices sensor driver")
Co-developed-by: kylon <3252255+kylon@users.noreply.github.com>
Signed-off-by: kylon <3252255+kylon@users.noreply.github.com>
Link: https://github.com/Cryolitia/gpd-fan-driver/pull/20
Signed-off-by: Cryolitia PukNgae <cryolitia@uniontech.com>
Link: https://lore.kernel.org/r/20251030-win4-v1-1-c374dcb86985@uniontech.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2025-11-12 11:54:37 -08:00
Gopi Krishna Menon 9efb297c52 hwmon: (gpd-fan) Fix compilation error in non-ACPI builds
Building gpd-fan driver without CONFIG_ACPI results in the following
build errors:

drivers/hwmon/gpd-fan.c: In function ‘gpd_ecram_read’:
drivers/hwmon/gpd-fan.c:228:9: error: implicit declaration of function ‘outb’ [-Werror=implicit-function-declaration]
  228 |         outb(0x2E, addr_port);
      |         ^~~~
drivers/hwmon/gpd-fan.c:241:16: error: implicit declaration of function ‘inb’ [-Werror=implicit-function-declaration]
  241 |         *val = inb(data_port);

The definitions for inb() and outb() come from <linux/io.h>
(specifically through <asm/io.h>), which is implicitly included via
<acpi_io.h>. When CONFIG_ACPI is not set, <acpi_io.h> is not included
resulting in <linux/io.h> to be omitted as well.

Since the driver does not depend on ACPI, remove <linux/acpi.h> and add
<linux/io.h> directly to fix the compilation errors.

Signed-off-by: Gopi Krishna Menon <krishnagopi487@gmail.com>
Link: https://lore.kernel.org/r/20251024202042.752160-1-krishnagopi487@gmail.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2025-11-12 11:54:37 -08:00
Jakub Kicinski fe82c4f8a2 Merge tag 'wireless-2025-11-12' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Johannes Berg says:

====================
Couple more fixes:
 - mwl8k: work around FW expecting a DSSS element in beacons
 - ath11k: report correct TX status
 - iwlwifi: avoid toggling links due to wrong element use
 - iwlwifi: fix beacon template rate on older devices
 - iwlwifi: fix loop iterator being used after loop
 - mac80211: disallow address changes while using the address
 - mac80211: avoid bad rate warning in monitor/sniffer mode
 - hwsim: fix potential NULL deref (on monitor injection)

* tag 'wireless-2025-11-12' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  wifi: iwlwifi: mld: always take beacon ies in link grading
  wifi: iwlwifi: mvm: fix beacon template/fixed rate
  wifi: iwlwifi: fix aux ROC time event iterator usage
  wifi: mwl8k: inject DSSS Parameter Set element into beacons if missing
  wifi: mac80211_hwsim: Fix possible NULL dereference
  wifi: mac80211: skip rate verification for not captured PSDUs
  wifi: mac80211: reject address change while connecting
  wifi: ath11k: zero init info->status in wmi_process_mgmt_tx_comp()
====================

Link: https://patch.msgid.link/20251112114621.15716-5-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-12 09:33:09 -08:00
Srinivas Pandruvada 4b747cc628 cpufreq: intel_pstate: Check IDA only before MSR_IA32_PERF_CTL writes
Commit ac4e04d9e3 ("cpufreq: intel_pstate: Unchecked MSR aceess in
legacy mode") introduced a check for feature X86_FEATURE_IDA to verify
turbo mode support. Although this is the correct way to check for turbo
mode support, it causes issues on some platforms that disable turbo
during OS boot, but enable it later [1]. Before adding this feature
check, users were able to get turbo mode frequencies by writing 0 to
/sys/devices/system/cpu/intel_pstate/no_turbo post-boot.

To restore the old behavior on the affected systems while still
addressing the unchecked MSR issue on some Skylake-X systems, check
X86_FEATURE_IDA only immediately before updates of MSR_IA32_PERF_CTL
that may involve setting the Turbo Engage Bit (bit 32).

Fixes: ac4e04d9e3 ("cpufreq: intel_pstate: Unchecked MSR aceess in legacy mode")
Reported-by: Aaron Rainbolt <arainbolt@kfocus.org>
Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2122531 [1]
Tested-by: Aaron Rainbolt <arainbolt@kfocus.org>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Subject adjustment, changelog edits ]
Link: https://patch.msgid.link/20251111010840.141490-1-srinivas.pandruvada@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-12 17:59:37 +01:00
Caleb Sander Mateos 2d0e88f3fd io_uring/rsrc: don't use blk_rq_nr_phys_segments() as number of bvecs
io_buffer_register_bvec() currently uses blk_rq_nr_phys_segments() as
the number of bvecs in the request. However, bvecs may be split into
multiple segments depending on the queue limits. Thus, the number of
segments may overestimate the number of bvecs. For ublk devices, the
only current users of io_buffer_register_bvec(), virt_boundary_mask,
seg_boundary_mask, max_segments, and max_segment_size can all be set
arbitrarily by the ublk server process.
Set imu->nr_bvecs based on the number of bvecs the rq_for_each_bvec()
loop actually yields. However, continue using blk_rq_nr_phys_segments()
as an upper bound on the number of bvecs when allocating imu to avoid
needing to iterate the bvecs a second time.

Link: https://lore.kernel.org/io-uring/20251111191530.1268875-1-csander@purestorage.com/
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Fixes: 27cb27b6d5 ("io_uring: add support for kernel registered bvecs")
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-11-12 08:25:33 -07:00
Alex Mastro d323ad7396 vfio: selftests: replace iova=vaddr with allocated iovas
vfio_dma_mapping_test and vfio_pci_driver_test currently use iova=vaddr
as part of DMA mapping operations. However, not all IOMMUs support the
same virtual address width as the processor. For instance, older Intel
consumer platforms only support 39-bits of IOMMU address space. On such
platforms, using the virtual address as the IOVA fails.

Make the tests more robust by using iova_allocator to vend IOVAs, which
queries legally accessible IOVAs from the underlying IOMMUFD or VFIO
container.

Reviewed-by: David Matlack <dmatlack@google.com>
Tested-by: David Matlack <dmatlack@google.com>
Signed-off-by: Alex Mastro <amastro@fb.com>
Link: https://lore.kernel.org/r/20251111-iova-ranges-v3-4-7960244642c5@fb.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-12 08:04:42 -07:00
Alex Mastro ce0e3c403e vfio: selftests: add iova allocator
Add struct iova_allocator, which gives tests a convenient way to generate
legally-accessible IOVAs to map. This allocator traverses the sorted
available IOVA ranges linearly, requires power-of-two size allocations,
and does not support freeing iova allocations. The assumption is that
tests are not IOVA space-bounded, and will not need to recycle IOVAs.

This is based on Alex Williamson's patch series for adding an IOVA
allocator [1].

[1] https://lore.kernel.org/all/20251108212954.26477-1-alex@shazbot.org/

Reviewed-by: David Matlack <dmatlack@google.com>
Tested-by: David Matlack <dmatlack@google.com>
Signed-off-by: Alex Mastro <amastro@fb.com>
Link: https://lore.kernel.org/r/20251111-iova-ranges-v3-3-7960244642c5@fb.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-12 08:04:42 -07:00
Alex Mastro a77fa0b922 vfio: selftests: fix map limit tests to use last available iova
Use the newly available vfio_pci_iova_ranges() to determine the last
legal IOVA, and use this as the basis for vfio_dma_map_limit_test tests.

Fixes: de8d1f2fd5 ("vfio: selftests: add end of address space DMA map/unmap tests")
Reviewed-by: David Matlack <dmatlack@google.com>
Tested-by: David Matlack <dmatlack@google.com>
Signed-off-by: Alex Mastro <amastro@fb.com>
Link: https://lore.kernel.org/r/20251111-iova-ranges-v3-2-7960244642c5@fb.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-12 08:04:42 -07:00
Alex Mastro 7c44656ab3 vfio: selftests: add iova range query helpers
VFIO selftests need to map IOVAs from legally accessible ranges, which
could vary between hardware. Tests in vfio_dma_mapping_test.c are making
excessively strong assumptions about which IOVAs can be mapped.

Add vfio_iommu_iova_ranges(), which queries IOVA ranges from the
IOMMUFD or VFIO container associated with the device. The queried ranges
are normalized to IOMMUFD's iommu_iova_range representation so that
handling of IOVA ranges up the stack can be implementation-agnostic.
iommu_iova_range and vfio_iova_range are equivalent, so bias to using the
new interface's struct.

Query IOMMUFD's ranges with IOMMU_IOAS_IOVA_RANGES.
Query VFIO container's ranges with VFIO_IOMMU_GET_INFO and
VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE.

The underlying vfio_iommu_type1_info buffer-related functionality has
been kept generic so the same helpers can be used to query other
capability chain information, if needed.

Reviewed-by: David Matlack <dmatlack@google.com>
Tested-by: David Matlack <dmatlack@google.com>
Signed-off-by: Alex Mastro <amastro@fb.com>
Link: https://lore.kernel.org/r/20251111-iova-ranges-v3-1-7960244642c5@fb.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-12 08:04:42 -07:00
Chuang Wang ac1499fcd4 ipv4: route: Prevent rt_bind_exception() from rebinding stale fnhe
The sit driver's packet transmission path calls: sit_tunnel_xmit() ->
update_or_create_fnhe(), which lead to fnhe_remove_oldest() being called
to delete entries exceeding FNHE_RECLAIM_DEPTH+random.

The race window is between fnhe_remove_oldest() selecting fnheX for
deletion and the subsequent kfree_rcu(). During this time, the
concurrent path's __mkroute_output() -> find_exception() can fetch the
soon-to-be-deleted fnheX, and rt_bind_exception() then binds it with a
new dst using a dst_hold(). When the original fnheX is freed via RCU,
the dst reference remains permanently leaked.

CPU 0                             CPU 1
__mkroute_output()
  find_exception() [fnheX]
                                  update_or_create_fnhe()
                                    fnhe_remove_oldest() [fnheX]
  rt_bind_exception() [bind dst]
                                  RCU callback [fnheX freed, dst leak]

This issue manifests as a device reference count leak and a warning in
dmesg when unregistering the net device:

  unregister_netdevice: waiting for sitX to become free. Usage count = N

Ido Schimmel provided the simple test validation method [1].

The fix clears 'oldest->fnhe_daddr' before calling fnhe_flush_routes().
Since rt_bind_exception() checks this field, setting it to zero prevents
the stale fnhe from being reused and bound to a new dst just before it
is freed.

[1]
ip netns add ns1
ip -n ns1 link set dev lo up
ip -n ns1 address add 192.0.2.1/32 dev lo
ip -n ns1 link add name dummy1 up type dummy
ip -n ns1 route add 192.0.2.2/32 dev dummy1
ip -n ns1 link add name gretap1 up arp off type gretap \
    local 192.0.2.1 remote 192.0.2.2
ip -n ns1 route add 198.51.0.0/16 dev gretap1
taskset -c 0 ip netns exec ns1 mausezahn gretap1 \
    -A 198.51.100.1 -B 198.51.0.0/16 -t udp -p 1000 -c 0 -q &
taskset -c 2 ip netns exec ns1 mausezahn gretap1 \
    -A 198.51.100.1 -B 198.51.0.0/16 -t udp -p 1000 -c 0 -q &
sleep 10
ip netns pids ns1 | xargs kill
ip netns del ns1

Cc: stable@vger.kernel.org
Fixes: 67d6d681e1 ("ipv4: make exception cache less predictible")
Signed-off-by: Chuang Wang <nashuiliang@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251111064328.24440-1-nashuiliang@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-12 06:46:36 -08:00