Commit Graph

3844 Commits

Author SHA1 Message Date
Heiko Carstens
3120d12adf s390/idle: Fix cpu idle exit cpu time accounting
[ Upstream commit 0d785e2c32 ]

With the conversion to generic entry [1] cpu idle exit cpu time accounting
was converted from assembly to C. This introduced an reversed order of cpu
time accounting.

On cpu idle exit the current accounting happens with the following call
chain:

-> do_io_irq()/do_ext_irq()
 -> irq_enter_rcu()
  -> account_hardirq_enter()
   -> vtime_account_irq()
    -> vtime_account_kernel()

vtime_account_kernel() accounts the passed cpu time since last_update_timer
as system time, and updates last_update_timer to the current cpu timer
value.

However the subsequent call of

 -> account_idle_time_irq()

will incorrectly subtract passed cpu time from timer_idle_enter to the
updated last_update_timer value from system_timer. Then last_update_timer
is updated to a sys_enter_timer, which means that last_update_timer goes
back in time.

Subsequently account_hardirq_exit() will account too much cpu time as
hardirq time. The sum of all accounted cpu times is still correct, however
some cpu time which was previously accounted as system time is now
accounted as hardirq time, plus there is the oddity that last_update_timer
goes back in time.

Restore previous behavior by extracting cpu time accounting code from
account_idle_time_irq() into a new update_timer_idle() function and call it
before irq_enter_rcu().

Fixes: 56e62a7370 ("s390: convert to generic entry") [1]
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-13 17:20:18 +01:00
Masami Hiramatsu (Google)
3eb79d8d02 tracing: Add ftrace_fill_perf_regs() for perf event
[ Upstream commit d5d01b7199 ]

Add ftrace_fill_perf_regs() which should be compatible with the
perf_fetch_caller_regs(). In other words, the pt_regs returned from the
ftrace_fill_perf_regs() must satisfy 'user_mode(regs) == false' and can be
used for stack tracing.

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Naveen N Rao <naveen@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lore.kernel.org/173518997908.391279.15910334347345106424.stgit@devnote2
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Stable-dep-of: aea2517999 ("x86/fgraph,bpf: Switch kprobe_multi program stack unwind to hw_regs path")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:19:37 -05:00
Masami Hiramatsu (Google)
33d4e904e2 fgraph: Replace fgraph_ret_regs with ftrace_regs
[ Upstream commit a3ed4157b7 ]

Use ftrace_regs instead of fgraph_ret_regs for tracing return value
on function_graph tracer because of simplifying the callback interface.

The CONFIG_HAVE_FUNCTION_GRAPH_RETVAL is also replaced by
CONFIG_HAVE_FUNCTION_GRAPH_FREGS.

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf <bpf@vger.kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/173518991508.391279.16635322774382197642.stgit@devnote2
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Stable-dep-of: aea2517999 ("x86/fgraph,bpf: Switch kprobe_multi program stack unwind to hw_regs path")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:19:36 -05:00
Steven Rostedt
a701884d13 ftrace: Consolidate ftrace_regs accessor functions for archs using pt_regs
[ Upstream commit e4cf33ca48 ]

Most architectures use pt_regs within ftrace_regs making a lot of the
accessor functions just calls to the pt_regs internally. Instead of
duplication this effort, use a HAVE_ARCH_FTRACE_REGS for architectures
that have their own ftrace_regs that is not based on pt_regs and will
define all the accessor functions, and for the architectures that just use
pt_regs, it will leave it undefined, and the default accessor functions
will be used.

Note, this will also make it easier to add new accessor functions to
ftrace_regs as it will mean having to touch less architectures.

Cc: <linux-arch@vger.kernel.org>
Cc: "x86@kernel.org" <x86@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Naveen N Rao <naveen@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/20241010202114.2289f6fd@gandalf.local.home
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> # powerpc
Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Stable-dep-of: aea2517999 ("x86/fgraph,bpf: Switch kprobe_multi program stack unwind to hw_regs path")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:19:36 -05:00
Steven Rostedt
550a16d87d ftrace: Make ftrace_regs abstract from direct use
[ Upstream commit 7888af4166 ]

ftrace_regs was created to hold registers that store information to save
function parameters, return value and stack. Since it is a subset of
pt_regs, it should only be used by its accessor functions. But because
pt_regs can easily be taken from ftrace_regs (on most archs), it is
tempting to use it directly. But when running on other architectures, it
may fail to build or worse, build but crash the kernel!

Instead, make struct ftrace_regs an empty structure and have the
architectures define __arch_ftrace_regs and all the accessor functions
will typecast to it to get to the actual fields. This will help avoid
usage of ftrace_regs directly.

Link: https://lore.kernel.org/all/20241007171027.629bdafd@gandalf.local.home/

Cc: "linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>
Cc: "x86@kernel.org" <x86@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Naveen N Rao <naveen@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Paul  Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas  Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav  Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/20241008230628.958778821@goodmis.org
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Stable-dep-of: aea2517999 ("x86/fgraph,bpf: Switch kprobe_multi program stack unwind to hw_regs path")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04 07:19:36 -05:00
Sven Schnelle
b56975f463 s390/ipl: Clear SBP flag when bootprog is set
commit b1aa01d312 upstream.

With z16 a new flag 'search boot program' was introduced for
list-directed IPL (SCSI, NVMe, ECKD DASD). If this flag is set,
e.g. via selecting the "Automatic" value for the "Boot program
selector" control on an HMC load panel, it is copied to the reipl
structure from the initial ipl structure. When a user now sets a
boot prog via sysfs, the flag is not cleared and the bootloader
will again automatically select the boot program, ignoring user
configuration.

To avoid that, clear the SBP flag when a bootprog sysfs file is
written.

Cc: stable@vger.kernel.org
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-08 10:14:25 +01:00
Heiko Carstens
b514ad872a s390/mm: Fix __ptep_rdp() inline assembly
[ Upstream commit 31475b8811 ]

When a zero ASCE is passed to the __ptep_rdp() inline assembly, the
generated instruction should have the R3 field of the instruction set to
zero. However the inline assembly is written incorrectly: for such cases a
zero is loaded into a register allocated by the compiler and this register
is then used by the instruction.

This means that selected TLB entries may not be flushed since the specified
ASCE does not match the one which was used when the selected TLB entries
were created.

Fix this by removing the asce and opt parameters of __ptep_rdp(), since
all callers always pass zero, and use a hard-coded register zero for
the R3 field.

Fixes: 0807b85652 ("s390/mm: add support for RDP (Reset DAT-Protection)")
Cc: stable@vger.kernel.org
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-12-01 11:43:37 +01:00
Farhan Ali
d13fe1d330 s390/pci: Restore IRQ unconditionally for the zPCI device
commit b45873c3f0 upstream.

Commit c1e18c17bd ("s390/pci: add zpci_set_irq()/zpci_clear_irq()"),
introduced the zpci_set_irq() and zpci_clear_irq(), to be used while
resetting a zPCI device.

Commit da995d538d ("s390/pci: implement reset_slot for hotplug
slot"), mentions zpci_clear_irq() being called in the path for
zpci_hot_reset_device().  But that is not the case anymore and these
functions are not called outside of this file. Instead
zpci_hot_reset_device() relies on zpci_disable_device() also clearing
the IRQs, but misses to reset the zdev->irqs_registered flag.

However after a CLP disable/enable reset, the device's IRQ are
unregistered, but the flag zdev->irq_registered does not get cleared. It
creates an inconsistent state and so arch_restore_msi_irqs() doesn't
correctly restore the device's IRQ. This becomes a problem when a PCI
driver tries to restore the state of the device through
pci_restore_state(). Restore IRQ unconditionally for the device and remove
the irq_registered flag as its redundant.

Fixes: c1e18c17bd ("s390/pci: add zpci_set_irq()/zpci_clear_irq()")
Cc: stable@vger.kernnel.org
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-11-13 15:33:56 -05:00
Sven Schnelle
decbacd6a9 s390/time: Use monotonic clock in get_cycles()
[ Upstream commit 09e7e29d2b ]

Otherwise the code might not work correctly when the clock
is changed.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20 18:30:29 +02:00
Harald Freudenberger
7175bf8a2a s390/ap: Unmask SLCF bit in card and queue ap functions sysfs
[ Upstream commit 123b7c7c2b ]

The SLCF bit ("stateless command filtering") introduced with
CEX8 cards was because of the function mask's default value
suppressed when user space read the ap function for an AP
card or queue. Unmask this bit so that user space applications
like lszcrypt can evaluate and list this feature.

Fixes: d4c53ae8e4 ("s390/ap: store TAPQ hwinfo in struct ap_card")
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-15 12:14:05 +02:00
Heiko Carstens
f3ea633a11 s390/tlb: Use mm_has_pgste() instead of mm_alloc_pgste()
[ Upstream commit 9291ea091b ]

An mm has pgstes only after s390_enable_sie() has been called, while
mm_alloc_pgste() may be always true (e.g. via sysctl setting).

Limit the calls to gmap_unlink() in pte_free_tlb() to those cases
where there might be something to unlink.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-29 11:02:13 +02:00
Niklas Schnelle
088a200ebf s390: Remove ioremap_wt() and pgprot_writethrough()
[ Upstream commit c94bff63e4 ]

It turns out that while s390 architecture calls its memory-I/O mapping
variants write-through and write-back the implementation of ioremap_wt()
and pgprot_writethrough() does not match Linux notion of ioremap_wt().

In particular Linux expects ioremap_wt() to be weaker still than
ioremap_wc(), allowing not just gathering and re-ordering but also reads
to be served from cache. Instead s390's implementation is equivalent to
normal ioremap() while its ioremap_wc() allows re-ordering.

Note that there are no known users of ioremap_wt() on s390 and the
resulting behavior is in line with asm-generic defining ioremap_wt() as
ioremap(), if undefined, so no breakage is expected.

As s390 does not have a mapping type matching the Linux notion of
ioremap_wt() and pgprot_writethrough(), simply drop them and rely on the
asm-generic fallbacks instead.

Fixes: b02002cc4c ("s390/pci: Implement ioremap_wc/prot() with MIO")
Fixes: b43b3fff04 ("s390: mm: convert to GENERIC_IOREMAP")
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10 14:39:18 +02:00
Ryan Roberts
a684bad77e mm: hugetlb: Add huge page size param to huge_ptep_get_and_clear()
commit 02410ac72a upstream.

In order to fix a bug, arm64 needs to be told the size of the huge page
for which the huge_pte is being cleared in huge_ptep_get_and_clear().
Provide for this by adding an `unsigned long sz` parameter to the
function. This follows the same pattern as huge_pte_clear() and
set_huge_pte_at().

This commit makes the required interface modifications to the core mm as
well as all arches that implement this function (arm64, loongarch, mips,
parisc, powerpc, riscv, s390, sparc). The actual arm64 bug will be fixed
in a separate commit.

Cc: stable@vger.kernel.org
Fixes: 66b3923a1a ("arm64: hugetlb: add support for PTE contiguous bit")
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> # riscv
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> # s390
Link: https://lore.kernel.org/r/20250226120656.2400136-2-ryan.roberts@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-13 13:02:17 +01:00
Heiko Carstens
4801e961be s390/fpu: Add fpc exception handler / remove fixup section again
commit ae02615b7f upstream.

The fixup section was added again by mistake when test_fp_ctl() was
removed. The reason for the removal of the fixup section is described in
commit 484a8ed8b7 ("s390/extable: add dedicated uaccess handler").
Remove it again for the same reason.

Add an exception handler which handles exceptions when the floating point
control register is attempted to be set to invalid values. The exception
handler sets the floating point control register to zero and continues
execution at the specified address.

The new sfpc inline assembly is open-coded to make back porting a bit
easier.

Fixes: 702644249d ("s390/fpu: get rid of test_fp_ctl()")
Cc: stable@vger.kernel.org
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-17 10:05:44 +01:00
Heiko Carstens
6e83f167bb s390/futex: Fix FUTEX_OP_ANDN implementation
commit 26701574ce upstream.

The futex operation FUTEX_OP_ANDN is supposed to implement

*(int *)UADDR2 &= ~OPARG;

The s390 implementation just implements an AND instead of ANDN.
Add the missing bitwise not operation to oparg to fix this.

This is broken since nearly 19 years, so it looks like user space is
not making use of this operation.

Fixes: 3363fbdd6f ("[PATCH] s390: futex atomic operations")
Cc: stable@vger.kernel.org
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-17 10:05:04 +01:00
Sven Schnelle
46e9c4a376 s390/stackleak: Use exrl instead of ex in __stackleak_poison()
[ Upstream commit a88c26bb8e ]

exrl is present in all machines currently supported, therefore prefer
it over ex. This saves one instruction and doesn't need an additional
register to hold the address of the target instruction.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-17 10:04:43 +01:00
Heiko Carstens
59f60af34a s390/sclp: Initialize sclp subsystem via arch_cpu_finalize_init()
[ Upstream commit 3bcc8a1af5 ]

With the switch to GENERIC_CPU_DEVICES an early call to the sclp subsystem
was added to smp_prepare_cpus(). This will usually succeed since the sclp
subsystem is implicitly initialized early enough if an sclp based console
is present.

If no such console is present the initialization happens with an
arch_initcall(); in such cases calls to the sclp subsystem will fail.
For CPU detection this means that the fallback sigp loop will be used
permanently to detect CPUs instead of the preferred READ_CPU_INFO sclp
request.

Fix this by adding an explicit early sclp_init() call via
arch_cpu_finalize_init().

Reported-by: Sheshu Ramanandan <sheshu.ramanandan@ibm.com>
Fixes: 4a39f12e75 ("s390/smp: Switch to GENERIC_CPU_DEVICES")
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-08 09:58:07 +01:00
Niklas Schnelle
fbb370c01e s390/pci: Use topology ID for multi-function devices
[ Upstream commit 126034faaa ]

The newly introduced topology ID (TID) field in the CLP Query PCI
Function explicitly identifies groups of PCI functions whose RIDs belong
to the same (sub-)topology. When available use the TID instead of the
PCHID to match zPCI busses/domains for multi-function devices. Note that
currently only a single PCI bus per TID is supported. This change is
required because in future machines the PCHID will not identify a PCI
card but a specific port in the case of some multi-port NICs while from
a PCI point of view the entire card is a subtopology.

Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14 20:03:35 +01:00
Niklas Schnelle
1f3b309108 s390/pci: Sort PCI functions prior to creating virtual busses
[ Upstream commit 0467cdde8c ]

Instead of relying on the observed but not architected firmware behavior
that PCI functions from the same card are listed in ascending RID order
in clp_list_pci() ensure this by sorting. To allow for sorting separate
the initial clp_list_pci() and creation of the virtual PCI busses.

Note that fundamentally in our per-PCI function hotplug design non RID
order of discovery is still possible. For example when the two PFs of
a two port NIC are hotplugged after initial boot and in descending RID
order. In this case the virtual PCI bus would be created by the second
PF using that PF's UID as domain number instead of that of the first PF.
Thus the domain number would then change from the UID of the second PF
to that of the first PF on reboot but there is really nothing we can do
about that since changing domain numbers at runtime seems even worse.
This only impacts the domain number as the RIDs are consistent and thus
even with just the second PF visible it will show up in the correct
position on the virtual bus.

Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14 20:03:34 +01:00
Matthew Rosato
bd89d94f3e iommu/s390: Implement blocking domain
[ Upstream commit ecda483339 ]

This fixes a crash when surprise hot-unplugging a PCI device. This crash
happens because during hot-unplug __iommu_group_set_domain_nofail()
attaching the default domain fails when the platform no longer
recognizes the device as it has already been removed and we end up with
a NULL domain pointer and UAF. This is exactly the case referred to in
the second comment in __iommu_device_set_domain() and just as stated
there if we can instead attach the blocking domain the UAF is prevented
as this can handle the already removed device. Implement the blocking
domain to use this handling.  With this change, the crash is fixed but
we still hit a warning attempting to change DMA ownership on a blocked
device.

Fixes: c76c067e48 ("s390/pci: Use dma-iommu layer")
Co-developed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20240910211516.137933-1-mjrosato@linux.ibm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05 14:02:03 +01:00
Heiko Carstens
a9681f745f s390/pageattr: Implement missing kernel_page_present()
[ Upstream commit 2835f8bf55 ]

kernel_page_present() was intentionally not implemented when adding
ARCH_HAS_SET_DIRECT_MAP support, since it was only used for suspend/resume
which is not supported anymore on s390.

A new bpf use case led to a compile error specific to s390. Even though
this specific use case went away implement kernel_page_present(), so that
the API is complete and potential future users won't run into this problem.

Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Closes: https://lore.kernel.org/all/045de961-ac69-40cc-b141-ab70ec9377ec@iogearbox.net
Fixes: 0490d6d7ba ("s390/mm: enable ARCH_HAS_SET_DIRECT_MAP")
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05 14:01:11 +01:00
Gerd Bayer
50b81b50a9 s390/facilities: Fix warning about shadow of global variable
[ Upstream commit 61997c1e94 ]

Compiling the kernel with clang W=2 produces a warning that the
parameter declarations in some routines would shadow the definition of
the global variable stfle_fac_list. Address this warning by renaming the
parameters to fac_list.

Fixes: 17e89e1340 ("s390/facilities: move stfl information from lowcore to global data")
Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05 14:01:10 +01:00
Linus Torvalds
c91c14618f Merge tag 's390-6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Heiko Carstens:

 - Fix PCI error recovery by handling error events correctly

 - Fix CCA crypto card behavior within protected execution environment

 - Two KVM commits which fix virtual vs physical address handling bugs
   in KVM pfault handling

 - Fix return code handling in pckmo_key2protkey()

 - Deactivate sclp console as late as possible so that outstanding
   messages appear on the console instead of being dropped on reboot

 - Convert newlines to CRLF instead of LFCR for the sclp vt220 driver,
   as required by the vt220 specification

 - Initialize also psw mask in perf_arch_fetch_caller_regs() to make
   sure that user_mode(regs) will return false

 - Update defconfigs

* tag 's390-6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390: Update defconfigs
  s390: Initialize psw mask in perf_arch_fetch_caller_regs()
  s390/sclp_vt220: Convert newlines to CRLF instead of LFCR
  s390/sclp: Deactivate sclp after all its users
  s390/pkey_pckmo: Return with success for valid protected key types
  KVM: s390: Change virtual to physical address access in diag 0x258 handler
  KVM: s390: gaccess: Check if guest address is in memslot
  s390/ap: Fix CCA crypto card behavior within protected execution environment
  s390/pci: Handle PCI error codes other than 0x3a
2024-10-18 07:01:59 -07:00
Heiko Carstens
223e7fb979 s390: Initialize psw mask in perf_arch_fetch_caller_regs()
Also initialize regs->psw.mask in perf_arch_fetch_caller_regs().
This way user_mode(regs) will return false, like it should.

It looks like all current users initialize regs to zero, so that this
doesn't fix a bug currently. However it is better to not rely on callers
to do this.

Fixes: 914d52e464 ("s390: implement perf_arch_fetch_caller_regs")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-10-16 11:32:32 +02:00
Alexander Gordeev
3d5854d75e fs/proc/kcore.c: allow translation of physical memory addresses
When /proc/kcore is read an attempt to read the first two pages results in
HW-specific page swap on s390 and another (so called prefix) pages are
accessed instead.  That leads to a wrong read.

Allow architecture-specific translation of memory addresses using
kc_xlate_dev_mem_ptr() and kc_unxlate_dev_mem_ptr() callbacks similarily
to /dev/mem xlate_dev_mem_ptr() and unxlate_dev_mem_ptr() callbacks.  That
way an architecture can deal with specific physical memory ranges.

Re-use the existing /dev/mem callback implementation on s390, which
handles the described prefix pages swapping correctly.

For other architectures the default callback is basically NOP.  It is
expected the condition (vaddr == __va(__pa(vaddr))) always holds true for
KCORE_RAM memory type.

Link: https://lkml.kernel.org/r/20240930122119.1651546-1-agordeev@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Suggested-by: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-09 12:47:19 -07:00
Linus Torvalds
3a37872316 Merge tag 'pci-v6.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci updates from Bjorn Helgaas:
 "Enumeration:

   - Wait for device readiness after reset by polling Vendor ID and
     looking for Configuration RRS instead of polling the Command
     register and looking for non-error completions, to avoid hardware
     retries done for RRS on non-Vendor ID reads (Bjorn Helgaas)

   - Rename CRS Completion Status to RRS ('Request Retry Status') to
     match PCIe r6.0 spec usage (Bjorn Helgaas)

   - Clear LBMS bit after a manual link retrain so we don't try to
     retrain a link when there's no downstream device anymore (Maciej W.
     Rozycki)

   - Revert to the original link speed after retraining fails instead of
     leaving it restricted to 2.5GT/s, so a future device has a chance
     to use higher speeds (Maciej W. Rozycki)

   - Wait for each level of downstream bus, not just the first, to
     become accessible before restoring devices on that bus (Ilpo
     Järvinen)

   - Add ARCH_PCI_DEV_GROUPS so s390 can add its own attribute_groups
     without having to stomp on the core's pdev->dev.groups (Lukas
     Wunner)

  Driver binding:

   - Export pcim_request_region(), a managed counterpart of
     pci_request_region(), for use by drivers (Philipp Stanner)

   - Export pcim_iomap_region() and deprecate pcim_iomap_regions()
     (Philipp Stanner)

   - Request the PCI BAR used by xboxvideo (Philipp Stanner)

   - Request and map drm/ast BARs with pcim_iomap_region() (Philipp
     Stanner)

  MSI:

   - Add MSI_FLAG_NO_AFFINITY flag for devices that mux MSIs onto a
     single IRQ line and cannot set the affinity of each MSI to a
     specific CPU core (Marek Vasut)

   - Use MSI_FLAG_NO_AFFINITY and remove unnecessary .irq_set_affinity()
     implementations in aardvark, altera, brcmstb, dwc, mediatek-gen3,
     mediatek, mobiveil, plda, rcar, tegra, vmd, xilinx-nwl,
     xilinx-xdma, and xilinx drivers to avoid 'IRQ: set affinity failed'
     warnings (Marek Vasut)

  Power management:

   - Add pwrctl support for ATH11K inside the WCN6855 package (Konrad
     Dybcio)

  PCI device hotplug:

   - Remove unnecessary hpc_ops struct from shpchp (ngn)

   - Check for PCI_POSSIBLE_ERROR(), not 0xffffffff, in cpqphp
     (weiyufeng)

  Virtualization:

   - Mark Creative Labs EMU20k2 INTx masking as broken (Alex Williamson)

   - Add an ACS quirk for Qualcomm SA8775P, which doesn't advertise ACS
     but does provide ACS-like features (Subramanian Ananthanarayanan)

  IOMMU:

   - Add function 0 DMA alias quirk for Glenfly Arise audio function,
     which uses the function 0 Requester ID (WangYuli)

  NPEM:

   - Add Native PCIe Enclosure Management (NPEM) support for sysfs
     control of NVMe RAID storage indicators (ok/fail/locate/
     rebuild/etc) (Mariusz Tkaczyk)

   - Add support for the ACPI _DSM PCIe SSD status LED management, which
     is functionally similar to NPEM but mediated by platform firmware
     (Mariusz Tkaczyk)

  Device trees:

   - Drop minItems and maxItems from ranges in PCI generic host binding
     since host bridges may have several MMIO and I/O port apertures
     (Frank Li)

   - Add kirin, rcar-gen2, uniphier DT binding top-level constraints for
     clocks (Krzysztof Kozlowski)

  Altera PCIe controller driver:

   - Convert altera DT bindings from text to YAML (Matthew Gerlach)

   - Replace TLP_REQ_ID() with macro PCI_DEVID(), which does the same
     thing and is what other drivers use (Jinjie Ruan)

  Broadcom STB PCIe controller driver:

   - Add DT binding maxItems for reset controllers (Jim Quinlan)

   - Use the 'bridge' reset method if described in the DT (Jim Quinlan)

   - Use the 'swinit' reset method if described in the DT (Jim Quinlan)

   - Add 'has_phy' so the existence of a 'rescal' reset controller
     doesn't imply software control of it (Jim Quinlan)

   - Add support for many inbound DMA windows (Jim Quinlan)

   - Rename SoC 'type' to 'soc_base' express the fact that SoCs come in
     families of multiple similar devices (Jim Quinlan)

   - Add Broadcom 7712 DT description and driver support (Jim Quinlan)

   - Sort enums, pcie_offsets[], pcie_cfg_data, .compatible strings for
     maintainability (Bjorn Helgaas)

  Freescale i.MX6 PCIe controller driver:

   - Add imx6q-pcie 'dbi2' and 'atu' reg-names for i.MX8M Endpoints
     (Richard Zhu)

   - Fix a code restructuring error that caused i.MX8MM and i.MX8MP
     Endpoints to fail to establish link (Richard Zhu)

   - Fix i.MX8MP Endpoint occasional failure to trigger MSI by enforcing
     outbound alignment requirement (Richard Zhu)

   - Call phy_power_off() in the .probe() error path (Frank Li)

   - Rename internal names from imx6_* to imx_* since i.MX7/8/9 are also
     supported (Frank Li)

   - Manage Refclk by using SoC-specific callbacks instead of switch
     statements (Frank Li)

   - Manage core reset by using SoC-specific callbacks instead of switch
     statements (Frank Li)

   - Expand comments for erratum ERR010728 workaround (Frank Li)

   - Use generic PHY APIs to configure mode, speed, and submode, which
     is harmless for devices that implement their own internal PHY
     management and don't set the generic imx_pcie->phy (Frank Li)

   - Add i.MX8Q (i.MX8QM, i.MX8QXP, and i.MX8DXL) DT binding and driver
     Root Complex support (Richard Zhu)

  Freescale Layerscape PCIe controller driver:

   - Replace layerscape-pcie DT binding compatible fsl,lx2160a-pcie with
     fsl,lx2160ar2-pcie (Frank Li)

   - Add layerscape-pcie DT binding deprecated 'num-viewport' property
     to address a DT checker warning (Frank Li)

   - Change layerscape-pcie DT binding 'fsl,pcie-scfg' to phandle-array
     (Frank Li)

  Loongson PCIe controller driver:

   - Increase max PCI hosts to 8 for Loongson-3C6000 and newer chipsets
     (Huacai Chen)

  Marvell Aardvark PCIe controller driver:

   - Fix issue with emulating Configuration RRS for two-byte reads of
     Vendor ID; previously it only worked for four-byte reads (Bjorn
     Helgaas)

  MediaTek PCIe Gen3 controller driver:

   - Add per-SoC struct mtk_gen3_pcie_pdata to support multiple SoC
     types (Lorenzo Bianconi)

   - Use reset_bulk APIs to manage PHY reset lines (Lorenzo Bianconi)

   - Add DT and driver support for Airoha EN7581 PCIe controller
     (Lorenzo Bianconi)

  Qualcomm PCIe controller driver:

   - Update qcom,pcie-sc7280 DT binding with eight interrupts (Rayyan
     Ansari)

   - Add back DT 'vddpe-3v3-supply', which was incorrectly removed
     earlier (Johan Hovold)

   - Drop endpoint redundant masking of global IRQ events (Manivannan
     Sadhasivam)

   - Clarify unknown global IRQ message and only log it once to avoid a
     flood (Manivannan Sadhasivam)

   - Add 'linux,pci-domain' property to endpoint DT binding (Manivannan
     Sadhasivam)

   - Assign PCI domain number for endpoint controllers (Manivannan
     Sadhasivam)

   - Add 'qcom_pcie_ep' and the PCI domain number to IRQ names for
     endpoint controller (Manivannan Sadhasivam)

   - Add global SPI interrupt for PCIe link events to DT binding
     (Manivannan Sadhasivam)

   - Add global RC interrupt handler to handle 'Link up' events and
     automatically enumerate hot-added devices (Manivannan Sadhasivam)

   - Avoid mirroring of DBI and iATU register space so it doesn't
     overlap BAR MMIO space (Prudhvi Yarlagadda)

   - Enable controller resources like PHY only after PERST# is
     deasserted to partially avoid the problem that the endpoint SoC
     crashes when accessing things when Refclk is absent (Manivannan
     Sadhasivam)

   - Add 16.0 GT/s equalization and RX lane margining settings (Shashank
     Babu Chinta Venkata)

   - Pass domain number to pci_bus_release_domain_nr() explicitly to
     avoid a NULL pointer dereference (Manivannan Sadhasivam)

  Renesas R-Car PCIe controller driver:

   - Make the read-only const array 'check_addr' static (Colin Ian King)

   - Add R-Car V4M (R8A779H0) PCIe host and endpoint to DT binding
     (Yoshihiro Shimoda)

  TI DRA7xx PCIe controller driver:

   - Request IRQF_ONESHOT for 'dra7xx-pcie-main' IRQ since the primary
     handler is NULL (Siddharth Vadapalli)

   - Handle IRQ request errors during root port and endpoint probe
     (Siddharth Vadapalli)

  TI J721E PCIe driver:

   - Add DT 'ti,syscon-acspcie-proxy-ctrl' and driver support to enable
     the ACSPCIE module to drive Refclk for the Endpoint (Siddharth
     Vadapalli)

   - Extract the cadence link setup from cdns_pcie_host_setup() so link
     setup can be done separately during resume (Thomas Richard)

   - Add T_PERST_CLK_US definition for the mandatory delay between
     Refclk becoming stable and PERST# being deasserted (Thomas Richard)

   - Add j721e suspend and resume support (Théo Lebrun)

  TI Keystone PCIe controller driver:

   - Fix NULL pointer checking when applying MRRS limitation quirk for
     AM65x SR 1.0 Errata #i2037 (Dan Carpenter)

  Xilinx NWL PCIe controller driver:

   - Fix off-by-one error in INTx IRQ handler that caused INTx
     interrupts to be lost or delivered as the wrong interrupt (Sean
     Anderson)

   - Rate-limit misc interrupt messages (Sean Anderson)

   - Turn off the clock on probe failure and device removal (Sean
     Anderson)

   - Add DT binding and driver support for enabling/disabling PHYs (Sean
     Anderson)

   - Add PCIe phy bindings for the ZCU102 (Sean Anderson)

  Xilinx XDMA PCIe controller driver:

   - Add support for Xilinx QDMA Soft IP PCIe Root Port Bridge to DT
     binding and xilinx-dma-pl driver (Thippeswamy Havalige)

  Miscellaneous:

   - Fix buffer overflow in kirin_pcie_parse_port() (Alexandra Diupina)

   - Fix minor kerneldoc issues and typos (Bjorn Helgaas)

   - Use PCI_DEVID() macro in aer_inject() instead of open-coding it
     (Jinjie Ruan)

   - Check pcie_find_root_port() return in x86 fixups to avoid NULL
     pointer dereferences (Samasth Norway Ananda)

   - Make pci_bus_type constant (Kunwu Chan)

   - Remove unused declarations of __pci_pme_wakeup() and
     pci_vpd_release() (Yue Haibing)

   - Remove any leftover .*.cmd files with make clean (zhang jiao)

   - Remove unused BILLION macro (zhang jiao)"

* tag 'pci-v6.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (132 commits)
  PCI: Fix typos
  dt-bindings: PCI: qcom: Allow 'vddpe-3v3-supply' again
  tools: PCI: Remove unused BILLION macro
  tools: PCI: Remove .*.cmd files with make clean
  PCI: Pass domain number to pci_bus_release_domain_nr() explicitly
  PCI: dra7xx: Fix error handling when IRQ request fails in probe
  PCI: dra7xx: Fix threaded IRQ request for "dra7xx-pcie-main" IRQ
  PCI: qcom: Add RX lane margining settings for 16.0 GT/s
  PCI: qcom: Add equalization settings for 16.0 GT/s
  PCI: dwc: Always cache the maximum link speed value in dw_pcie::max_link_speed
  PCI: dwc: Rename 'dw_pcie::link_gen' to 'dw_pcie::max_link_speed'
  PCI: qcom-ep: Enable controller resources like PHY only after refclk is available
  PCI: Mark Creative Labs EMU20k2 INTx masking as broken
  dt-bindings: PCI: imx6q-pcie: Add reg-name "dbi2" and "atu" for i.MX8M PCIe Endpoint
  dt-bindings: PCI: altera: msi: Convert to YAML
  PCI: imx6: Add i.MX8Q PCIe Root Complex (RC) support
  PCI: Rename CRS Completion Status to RRS
  PCI: aardvark: Correct Configuration RRS checking
  PCI: Wait for device readiness with Configuration RRS
  PCI: brcmstb: Sort enums, pcie_offsets[], pcie_cfg_data, .compatible strings
  ...
2024-09-23 12:47:06 -07:00
Linus Torvalds
1ec6d09789 Merge tag 's390-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Vasily Gorbik:

 - Optimize ftrace and kprobes code patching and avoid stop machine for
   kprobes if sequential instruction fetching facility is available

 - Add hiperdispatch feature to dynamically adjust CPU capacity in
   vertical polarization to improve scheduling efficiency and overall
   performance. Also add infrastructure for handling warning track
   interrupts (WTI), allowing for graceful CPU preemption

 - Rework crypto code pkey module and split it into separate,
   independent modules for sysfs, PCKMO, CCA, and EP11, allowing modules
   to load only when the relevant hardware is available

 - Add hardware acceleration for HMAC modes and the full AES-XTS cipher,
   utilizing message-security assist extensions (MSA) 10 and 11. It
   introduces new shash implementations for HMAC-SHA224/256/384/512 and
   registers the hardware-accelerated AES-XTS cipher as the preferred
   option. Also add clear key token support

 - Add MSA 10 and 11 processor activity instrumentation counters to perf
   and update PAI Extension 1 NNPA counters

 - Cleanup cpu sampling facility code and rework debug/WARN_ON_ONCE
   statements

 - Add support for SHA3 performance enhancements introduced with MSA 12

 - Add support for the query authentication information feature of MSA
   13 and introduce the KDSA CPACF instruction. Provide query and query
   authentication information in sysfs, enabling tools like cpacfinfo to
   present this data in a human-readable form

 - Update kernel disassembler instructions

 - Always enable EXPOLINE_EXTERN if supported by the compiler to ensure
   kpatch compatibility

 - Add missing warning handling and relocated lowcore support to the
   early program check handler

 - Optimize ftrace_return_address() and avoid calling unwinder

 - Make modules use kernel ftrace trampolines

 - Strip relocs from the final vmlinux ELF file to make it roughly 2
   times smaller

 - Dump register contents and call trace for early crashes to the
   console

 - Generate ptdump address marker array dynamically

 - Fix rcu_sched stalls that might occur when adding or removing large
   amounts of pages at once to or from the CMM balloon

 - Fix deadlock caused by recursive lock of the AP bus scan mutex

 - Unify sync and async register save areas in entry code

 - Cleanup debug prints in crypto code

 - Various cleanup and sanitizing patches for the decompressor

 - Various small ftrace cleanups

* tag 's390-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (84 commits)
  s390/crypto: Display Query and Query Authentication Information in sysfs
  s390/crypto: Add Support for Query Authentication Information
  s390/crypto: Rework RRE and RRF CPACF inline functions
  s390/crypto: Add KDSA CPACF Instruction
  s390/disassembler: Remove duplicate instruction format RSY_RDRU
  s390/boot: Move boot_printk() code to own file
  s390/boot: Use boot_printk() instead of sclp_early_printk()
  s390/boot: Rename decompressor_printk() to boot_printk()
  s390/boot: Compile all files with the same march flag
  s390: Use MARCH_HAS_*_FEATURES defines
  s390: Provide MARCH_HAS_*_FEATURES defines
  s390/facility: Disable compile time optimization for decompressor code
  s390/boot: Increase minimum architecture to z10
  s390/als: Remove obsolete comment
  s390/sha3: Fix SHA3 selftests failures
  s390/pkey: Add AES xts and HMAC clear key token support
  s390/cpacf: Add MSA 10 and 11 new PCKMO functions
  s390/mm: Add cond_resched() to cmm_alloc/free_pages()
  s390/pai_ext: Update PAI extension 1 counters
  s390/pai_crypto: Add support for MSA 10 and 11 pai counters
  ...
2024-09-21 09:02:54 -07:00
Linus Torvalds
617a814f14 Merge tag 'mm-stable-2024-09-20-02-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
 "Along with the usual shower of singleton patches, notable patch series
  in this pull request are:

   - "Align kvrealloc() with krealloc()" from Danilo Krummrich. Adds
     consistency to the APIs and behaviour of these two core allocation
     functions. This also simplifies/enables Rustification.

   - "Some cleanups for shmem" from Baolin Wang. No functional changes -
     mode code reuse, better function naming, logic simplifications.

   - "mm: some small page fault cleanups" from Josef Bacik. No
     functional changes - code cleanups only.

   - "Various memory tiering fixes" from Zi Yan. A small fix and a
     little cleanup.

   - "mm/swap: remove boilerplate" from Yu Zhao. Code cleanups and
     simplifications and .text shrinkage.

   - "Kernel stack usage histogram" from Pasha Tatashin and Shakeel
     Butt. This is a feature, it adds new feilds to /proc/vmstat such as

       $ grep kstack /proc/vmstat
       kstack_1k 3
       kstack_2k 188
       kstack_4k 11391
       kstack_8k 243
       kstack_16k 0

     which tells us that 11391 processes used 4k of stack while none at
     all used 16k. Useful for some system tuning things, but
     partivularly useful for "the dynamic kernel stack project".

   - "kmemleak: support for percpu memory leak detect" from Pavel
     Tikhomirov. Teaches kmemleak to detect leaksage of percpu memory.

   - "mm: memcg: page counters optimizations" from Roman Gushchin. "3
     independent small optimizations of page counters".

   - "mm: split PTE/PMD PT table Kconfig cleanups+clarifications" from
     David Hildenbrand. Improves PTE/PMD splitlock detection, makes
     powerpc/8xx work correctly by design rather than by accident.

   - "mm: remove arch_make_page_accessible()" from David Hildenbrand.
     Some folio conversions which make arch_make_page_accessible()
     unneeded.

   - "mm, memcg: cg2 memory{.swap,}.peak write handlers" fro David
     Finkel. Cleans up and fixes our handling of the resetting of the
     cgroup/process peak-memory-use detector.

   - "Make core VMA operations internal and testable" from Lorenzo
     Stoakes. Rationalizaion and encapsulation of the VMA manipulation
     APIs. With a view to better enable testing of the VMA functions,
     even from a userspace-only harness.

   - "mm: zswap: fixes for global shrinker" from Takero Funaki. Fix
     issues in the zswap global shrinker, resulting in improved
     performance.

   - "mm: print the promo watermark in zoneinfo" from Kaiyang Zhao. Fill
     in some missing info in /proc/zoneinfo.

   - "mm: replace follow_page() by folio_walk" from David Hildenbrand.
     Code cleanups and rationalizations (conversion to folio_walk())
     resulting in the removal of follow_page().

   - "improving dynamic zswap shrinker protection scheme" from Nhat
     Pham. Some tuning to improve zswap's dynamic shrinker. Significant
     reductions in swapin and improvements in performance are shown.

   - "mm: Fix several issues with unaccepted memory" from Kirill
     Shutemov. Improvements to the new unaccepted memory feature,

   - "mm/mprotect: Fix dax puds" from Peter Xu. Implements mprotect on
     DAX PUDs. This was missing, although nobody seems to have notied
     yet.

   - "Introduce a store type enum for the Maple tree" from Sidhartha
     Kumar. Cleanups and modest performance improvements for the maple
     tree library code.

   - "memcg: further decouple v1 code from v2" from Shakeel Butt. Move
     more cgroup v1 remnants away from the v2 memcg code.

   - "memcg: initiate deprecation of v1 features" from Shakeel Butt.
     Adds various warnings telling users that memcg v1 features are
     deprecated.

   - "mm: swap: mTHP swap allocator base on swap cluster order" from
     Chris Li. Greatly improves the success rate of the mTHP swap
     allocation.

   - "mm: introduce numa_memblks" from Mike Rapoport. Moves various
     disparate per-arch implementations of numa_memblk code into generic
     code.

   - "mm: batch free swaps for zap_pte_range()" from Barry Song. Greatly
     improves the performance of munmap() of swap-filled ptes.

   - "support large folio swap-out and swap-in for shmem" from Baolin
     Wang. With this series we no longer split shmem large folios into
     simgle-page folios when swapping out shmem.

   - "mm/hugetlb: alloc/free gigantic folios" from Yu Zhao. Nice
     performance improvements and code reductions for gigantic folios.

   - "support shmem mTHP collapse" from Baolin Wang. Adds support for
     khugepaged's collapsing of shmem mTHP folios.

   - "mm: Optimize mseal checks" from Pedro Falcato. Fixes an mprotect()
     performance regression due to the addition of mseal().

   - "Increase the number of bits available in page_type" from Matthew
     Wilcox. Increases the number of bits available in page_type!

   - "Simplify the page flags a little" from Matthew Wilcox. Many legacy
     page flags are now folio flags, so the page-based flags and their
     accessors/mutators can be removed.

   - "mm: store zero pages to be swapped out in a bitmap" from Usama
     Arif. An optimization which permits us to avoid writing/reading
     zero-filled zswap pages to backing store.

   - "Avoid MAP_FIXED gap exposure" from Liam Howlett. Fixes a race
     window which occurs when a MAP_FIXED operqtion is occurring during
     an unrelated vma tree walk.

   - "mm: remove vma_merge()" from Lorenzo Stoakes. Major rotorooting of
     the vma_merge() functionality, making ot cleaner, more testable and
     better tested.

   - "misc fixups for DAMON {self,kunit} tests" from SeongJae Park.
     Minor fixups of DAMON selftests and kunit tests.

   - "mm: memory_hotplug: improve do_migrate_range()" from Kefeng Wang.
     Code cleanups and folio conversions.

   - "Shmem mTHP controls and stats improvements" from Ryan Roberts.
     Cleanups for shmem controls and stats.

   - "mm: count the number of anonymous THPs per size" from Barry Song.
     Expose additional anon THP stats to userspace for improved tuning.

   - "mm: finish isolate/putback_lru_page()" from Kefeng Wang: more
     folio conversions and removal of now-unused page-based APIs.

   - "replace per-quota region priorities histogram buffer with
     per-context one" from SeongJae Park. DAMON histogram
     rationalization.

   - "Docs/damon: update GitHub repo URLs and maintainer-profile" from
     SeongJae Park. DAMON documentation updates.

   - "mm/vdpa: correct misuse of non-direct-reclaim __GFP_NOFAIL and
     improve related doc and warn" from Jason Wang: fixes usage of page
     allocator __GFP_NOFAIL and GFP_ATOMIC flags.

   - "mm: split underused THPs" from Yu Zhao. Improve THP=always policy.
     This was overprovisioning THPs in sparsely accessed memory areas.

   - "zram: introduce custom comp backends API" frm Sergey Senozhatsky.
     Add support for zram run-time compression algorithm tuning.

   - "mm: Care about shadow stack guard gap when getting an unmapped
     area" from Mark Brown. Fix up the various arch_get_unmapped_area()
     implementations to better respect guard areas.

   - "Improve mem_cgroup_iter()" from Kinsey Ho. Improve the reliability
     of mem_cgroup_iter() and various code cleanups.

   - "mm: Support huge pfnmaps" from Peter Xu. Extends the usage of huge
     pfnmap support.

   - "resource: Fix region_intersects() vs add_memory_driver_managed()"
     from Huang Ying. Fix a bug in region_intersects() for systems with
     CXL memory.

   - "mm: hwpoison: two more poison recovery" from Kefeng Wang. Teaches
     a couple more code paths to correctly recover from the encountering
     of poisoned memry.

   - "mm: enable large folios swap-in support" from Barry Song. Support
     the swapin of mTHP memory into appropriately-sized folios, rather
     than into single-page folios"

* tag 'mm-stable-2024-09-20-02-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (416 commits)
  zram: free secondary algorithms names
  uprobes: turn xol_area->pages[2] into xol_area->page
  uprobes: introduce the global struct vm_special_mapping xol_mapping
  Revert "uprobes: use vm_special_mapping close() functionality"
  mm: support large folios swap-in for sync io devices
  mm: add nr argument in mem_cgroup_swapin_uncharge_swap() helper to support large folios
  mm: fix swap_read_folio_zeromap() for large folios with partial zeromap
  mm/debug_vm_pgtable: Use pxdp_get() for accessing page table entries
  set_memory: add __must_check to generic stubs
  mm/vma: return the exact errno in vms_gather_munmap_vmas()
  memcg: cleanup with !CONFIG_MEMCG_V1
  mm/show_mem.c: report alloc tags in human readable units
  mm: support poison recovery from copy_present_page()
  mm: support poison recovery from do_cow_fault()
  resource, kunit: add test case for region_intersects()
  resource: make alloc_free_mem_region() works for iomem_resource
  mm: z3fold: deprecate CONFIG_Z3FOLD
  vfio/pci: implement huge_fault support
  mm/arm64: support large pfn mappings
  mm/x86: support large pfn mappings
  ...
2024-09-21 07:29:05 -07:00
Peter Xu
0515e022e1 mm: always define pxx_pgprot()
There're:

  - 8 archs (arc, arm64, include, mips, powerpc, s390, sh, x86) that
  support pte_pgprot().

  - 2 archs (x86, sparc) that support pmd_pgprot().

  - 1 arch (x86) that support pud_pgprot().

Always define them to be used in generic code, and then we don't need to
fiddle with "#ifdef"s when doing so.

Link: https://lkml.kernel.org/r/20240826204353.2228736-9-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Gavin Shan <gshan@redhat.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Niklas Schnelle <schnelle@linux.ibm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-17 01:06:59 -07:00
Heiko Carstens
b920aa77be s390/vdso: Wire up getrandom() vdso implementation
Provide the s390 specific vdso getrandom() architecture backend.

_vdso_rng_data required data is placed within the _vdso_data vvar page,
by using a hardcoded offset larger than vdso_data.

As required the chacha20 implementation does not write to the stack.

The implementation follows more or less the arm64 implementations and
makes use of vector instructions. It has a fallback to the getrandom()
system call for machines where the vector facility is not installed.

The check if the vector facility is installed, as well as an
optimization for machines with the vector-enhancements facility 2, is
implemented with alternatives, avoiding runtime checks.

Note that __kernel_getrandom() is implemented without the vdso user
wrapper which would setup a stack frame for odd cases (aka very old
glibc variants) where the caller has not done that. All callers of
__kernel_getrandom() are required to setup a stack frame, like the C ABI
requires it.

The vdso testcases vdso_test_getrandom and vdso_test_chacha pass.

Benchmark on a z16:

    $ ./vdso_test_getrandom bench-single
       vdso: 25000000 times in 0.493703559 seconds
    syscall: 25000000 times in 6.584025337 seconds

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-09-13 20:57:31 +02:00
Heiko Carstens
c1ae1b4ef5 s390/vdso: Move vdso symbol handling to separate header file
The vdso.h header file, which is included at many places, includes
generated header files. This can easily lead to recursive header file
inclusions if the vdso code is changed.

Therefore move the vdso symbol code, which requires the generated
header files, to a separate header file, and include it at the two
locations which require it.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-09-13 17:28:36 +02:00
Heiko Carstens
a919390e91 s390/module: Provide find_section() helper
Provide find_section() helper function which can be used to find a
section by name, similar to other architectures.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-09-13 17:28:36 +02:00
Heiko Carstens
94c7755b1e s390/facility: Let test_facility() generate static branch if possible
Let test_facility() generate a branch instruction if the tested facility is
a constant, and where the result cannot be evaluated during compile
time. The branch instruction defaults to "false" and is patched to nop
(branch not taken) if the tested facility is available.

This avoids runtime checks and is similar to x86's static_cpu_has() and
arm64's alternative_has_cap_likely().

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-09-13 17:28:36 +02:00
Heiko Carstens
013e984397 s390/alternatives: Remove ALT_FACILITY_EARLY
Patch all alternatives which depend on facilities from the decompressor.
There is no technical reason which enforces to split patching of such
alternatives to the decompressor and the kernel.

This simplifies alternative handling a bit, since one alternative type is
removed.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-09-13 17:28:36 +02:00
Heiko Carstens
26d4959681 s390/facility: Disable compile time optimization for decompressor code
Disable compile time optimizations of test_facility() for the
decompressor. The decompressor should not contain any optimized code
depending on the architecture level set the kernel image is compiled
for to avoid unexpected operation exceptions.

Add a __DECOMPRESSOR check to test_facility() to enforce that
facilities are always checked during runtime for the decompressor.

Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-09-13 17:28:36 +02:00
Finn Callies
9bbd1bfb86 s390/crypto: Add Support for Query Authentication Information
Introduce functions __cpacf_qai() and wrapper cpacf_qai() to the
respective existing functions __cpacf_query() and cpacf_query() are
introduced to support the Query Authentication Information feature of
MSA 13.

Suggested-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Finn Callies <fcallies@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-09-12 14:13:27 +02:00
Finn Callies
27aad7f7a4 s390/crypto: Rework RRE and RRF CPACF inline functions
Rework of the __cpacf_query_rre() and __cpacf_query_rrf() functions
to support additional function codes. A function code is passed as a
new parameter to specify which subfunction of the supplied Instruction
is to be called.

Suggested-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Finn Callies <fcallies@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-09-12 14:13:27 +02:00
Finn Callies
d2dec49d76 s390/crypto: Add KDSA CPACF Instruction
Add the function code definitions for using the KDSA function to the
CPACF header file.

Suggested-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Finn Callies <fcallies@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-09-12 14:13:27 +02:00
Heiko Carstens
ebcc369f18 s390: Use MARCH_HAS_*_FEATURES defines
Replace CONFIG_HAVE_MARCH_*_FEATURES with MARCH_HAS_*_FEATURES
everywhere so code gets compiled correctly depending on if the
target is the kernel or the decompressor.

Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-09-07 17:12:42 +02:00
Heiko Carstens
697b37371f s390: Provide MARCH_HAS_*_FEATURES defines
Provide MARCH_HAS_*_FEATURES defines which are supposed to be used
everywhere instead of the CONFIG_HAVE_MARCH_*_FEATURES defines.

Various header files contain code which depend on the
CONFIG_HAVE_MARCH_*_FEATURES defines, allowing for compile time
optimizations. If such code is used within the decompressor wrong code may
be generated (the compiler may generate instructions which are not
available for the minimum architecture level of the decompressor).

Therefore provide a new header file with MARCH_HAS_*_FEATURES defines,
which are only available if __DECOMPRESSOR is not defined. This way code
generation for the kernel image is still optimized depending on
CONFIG_HAVE_MARCH_*_FEATURES, while code generated for the decompressor is
compiled for the minimum architecture level.

Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-09-07 17:12:42 +02:00
Heiko Carstens
0147addc4f s390/facility: Disable compile time optimization for decompressor code
Disable compile time optimizations of test_facility() for the
decompressor. The decompressor should not contain any optimized code
depending on the architecture level set the kernel image is compiled
for to avoid unexpected operation exceptions.

Add a __DECOMPRESSOR check to test_facility() to enforce that
facilities are always checked during runtime for the decompressor.

Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-09-07 17:12:42 +02:00
Harald Freudenberger
fd197556ee s390/pkey: Add AES xts and HMAC clear key token support
Add support for deriving protected keys from clear key token for
AES xts and HMAC keys via PCKMO instruction. Add support for
protected key generation and unwrap of protected key tokens for
these key types. Furthermore 4 new sysfs attributes are introduced:

- /sys/devices/virtual/misc/pkey/protkey/protkey_aes_xts_128
- /sys/devices/virtual/misc/pkey/protkey/protkey_aes_xts_256
- /sys/devices/virtual/misc/pkey/protkey/protkey_hmac_512
- /sys/devices/virtual/misc/pkey/protkey/protkey_hmac_1024

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-09-05 15:17:23 +02:00
Harald Freudenberger
8fe32188f9 s390/cpacf: Add MSA 10 and 11 new PCKMO functions
Add the defines for the new PCKMO functions covering
MSA 10 (AES XTS "double" keys) and MSA 11 (HMAC keys)
support.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-09-05 15:17:23 +02:00
Mike Rapoport (Microsoft)
46bcce5031 arch, mm: move definition of node_data to generic code
Every architecture that supports NUMA defines node_data in the same way:

	struct pglist_data *node_data[MAX_NUMNODES];

No reason to keep multiple copies of this definition and its forward
declarations, especially when such forward declaration is the only thing
in include/asm/mmzone.h for many architectures.

Add definition and declaration of node_data to generic code and drop
architecture-specific versions.

Link: https://lkml.kernel.org/r/20240807064110.1003856-8-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Rob Herring (Arm) <robh@kernel.org>
Cc: Samuel Holland <samuel.holland@sifive.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03 21:15:28 -07:00
David Hildenbrand
3290ef3c7f s390/uv: drop arch_make_page_accessible()
All code was converted to using arch_make_folio_accessible(), let's drop
arch_make_page_accessible().

Link: https://lkml.kernel.org/r/20240729183844.388481-4-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:25:53 -07:00
Mete Durlu
1e5aa12d47 s390/hiperdispatch: Add trace events
Add trace events to debug hiperdispatch behavior and track domain
rebuilding. Two events provide information about the decision making of
hiperdispatch and the adjustments made.

Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Co-developed-by: Tobias Huschle <huschle@linux.ibm.com>
Signed-off-by: Tobias Huschle <huschle@linux.ibm.com>
Signed-off-by: Mete Durlu <meted@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-29 22:56:35 +02:00
Mete Durlu
6843d6d97c s390/hiperdispatch: Introduce hiperdispatch
When LPAR is in vertical polarization, CPUs get different polarization
values, namely vertical high, vertical medium and vertical low. These
values represent the likelyhood of the CPU getting physical runtime.
Vertical high CPUs will always get runtime and others get varying
runtime depending on the load the CEC is under.

Vertical high and vertical medium CPUs are considered the CPUs which the
current LPAR has the entitlement to run on. The vertical lows are on the
other hand are borrowed CPUs which would only be given to the LPAR by
hipervisor when the other LPARs are not utilizing them.

Using the CPU capacities, hint linux scheduler when it should prioritise
vertical high and vertical medium CPUs over vertical low CPUs.
By tracking various system statistics hiperdispatch determines when to
adjust cpu capacities.
After each adjustment, rebuilding of scheduler domains is necessary to
notify the scheduler about capacity changes but since this operation is
costly it should be done as sparsely as possible.

Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Co-developed-by: Tobias Huschle <huschle@linux.ibm.com>
Signed-off-by: Tobias Huschle <huschle@linux.ibm.com>
Signed-off-by: Mete Durlu <meted@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-29 22:56:35 +02:00
Mete Durlu
26ceef523d s390/smp: Add cpu capacities
Linux scheduler allows architectures to assign capacity values to
individual CPUs. This hints scheduler the performance difference between
CPUs and allows more efficient task distribution them. Implement
helper methods to set and get CPU capacities for s390. This is
particularly helpful in vertical polarization configurations of LPARs.
On vertical polarization an LPARs CPUs can get different polarization
values depending on the CEC configuration. CPUs with different
polarization values can perform different from each other, using CPU
capacities this can be reflected to linux scheduler.

Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Mete Durlu <meted@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-29 22:56:35 +02:00
Tobias Huschle
2c6c9ccc76 s390/wti: Introduce infrastructure for warning track interrupt
The warning-track interrupt (wti) provides a notification that the
receiving CPU will be pre-empted from its physical CPU within a short
time frame. This time frame is called grace period and depends on the
machine type. Giving up the CPU on time may prevent a task to get stuck
while holding a resource.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Mete Durlu <meted@linux.ibm.com>
Signed-off-by: Tobias Huschle <huschle@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-29 22:56:34 +02:00
Vasily Gorbik
bb91ed0ee3 s390/setup: Recognize sequential instruction fetching facility
When sequential instruction fetching facility is present,
certain guarantees are provided for code patching. In particular,
atomic overwrites within 8 aligned bytes is safe from an
instruction-fetching point of view.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-08-29 22:56:34 +02:00