Commit Graph

2449 Commits

Author SHA1 Message Date
Sebastian Andrzej Siewior 1b645cd729 ARM: 9461/1: Disable HIGHPTE on PREEMPT_RT kernels
[ Upstream commit fedadc4137 ]

gup_pgd_range() is invoked with disabled interrupts and invokes
__kmap_local_page_prot() via pte_offset_map(), gup_p4d_range().
With HIGHPTE enabled, __kmap_local_page_prot() invokes kmap_high_get()
which uses a spinlock_t via lock_kmap_any(). This leads to an
sleeping-while-atomic error on PREEMPT_RT because spinlock_t becomes a
sleeping lock and must not be acquired in atomic context.

The loop in map_new_virtual() uses wait_queue_head_t for wake up which
also is using a spinlock_t.

Since HIGHPTE is rarely needed at all, turn it off for PREEMPT_RT
to allow the use of get_user_pages_fast().

[arnd: rework patch to turn off HIGHPTE instead of HAVE_PAST_GUP]

Co-developed-by: Arnd Bergmann <arnd@arndb.de>

Acked-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-17 16:31:18 +01:00
Nathan Chancellor 6ed79cf118 ARM: 9450/1: Fix allowing linker DCE with binutils < 2.36
commit 53e7e1fb81 upstream.

Commit e7607f7d6d ("ARM: 9443/1: Require linker to support KEEP within
OVERLAY for DCE") accidentally broke the binutils version restriction
that was added in commit 0d437918fb ("ARM: 9414/1: Fix build issue
with LD_DEAD_CODE_DATA_ELIMINATION"), reintroducing the segmentation
fault addressed by that workaround.

Restore the binutils version dependency by using
CONFIG_LD_CAN_USE_KEEP_IN_OVERLAY as an additional condition to ensure
that CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION is only enabled with
binutils >= 2.36 and ld.lld >= 21.0.0.

Closes: https://lore.kernel.org/6739da7d-e555-407a-b5cb-e5681da71056@landley.net/
Closes: https://lore.kernel.org/CAFERDQ0zPoya5ZQfpbeuKVZEo_fKsonLf6tJbp32QnSGAtbi+Q@mail.gmail.com/

Cc: stable@vger.kernel.org
Fixes: e7607f7d6d ("ARM: 9443/1: Require linker to support KEEP within OVERLAY for DCE")
Reported-by: Rob Landley <rob@landley.net>
Tested-by: Rob Landley <rob@landley.net>
Reported-by: Martin Wetterwald <martin@wetterwald.eu>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-01 09:48:42 +01:00
Nathan Chancellor 59fc423183 ARM: 9443/1: Require linker to support KEEP within OVERLAY for DCE
commit e7607f7d6d upstream.

ld.lld prior to 21.0.0 does not support using the KEEP keyword within an
overlay description, which may be needed to avoid discarding necessary
sections within an overlay with '--gc-sections', which can be enabled
for the kernel via CONFIG_LD_DEAD_CODE_DATA_ELIMINATION.

Disallow CONFIG_LD_DEAD_CODE_DATA_ELIMINATION without support for KEEP
within OVERLAY and introduce a macro, OVERLAY_KEEP, that can be used to
conditionally add KEEP when it is properly supported to avoid breaking
old versions of ld.lld.

Cc: stable@vger.kernel.org
Link: https://github.com/llvm/llvm-project/commit/381599f1fe973afad3094e55ec99b1620dba7d8c
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
[nathan: Fix conflict in init/Kconfig due to lack of RUSTC symbols]
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-10 14:39:40 +02:00
Dave Vasilevsky 31daa34315 crash, powerpc: default to CRASH_DUMP=n on PPC_BOOK3S_32
Fixes boot failures on 6.9 on PPC_BOOK3S_32 machines using Open Firmware. 
On these machines, the kernel refuses to boot from non-zero
PHYSICAL_START, which occurs when CRASH_DUMP is on.

Since most PPC_BOOK3S_32 machines boot via Open Firmware, it should
default to off for them.  Users booting via some other mechanism can still
turn it on explicitly.

Does not change the default on any other architectures for the
time being.

Link: https://lkml.kernel.org/r/20240917163720.1644584-1-dave@vasilevsky.ca
Fixes: 75bc255a74 ("crash: clean up kdump related config items")
Signed-off-by: Dave Vasilevsky <dave@vasilevsky.ca>
Reported-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Closes: https://lists.debian.org/debian-powerpc/2024/07/msg00001.html
Acked-by: Michael Ellerman <mpe@ellerman.id.au>	[powerpc]
Acked-by: Baoquan He <bhe@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-14 22:43:48 -08:00
Linus Torvalds 726e2d0cf2 Merge tag 'dma-mapping-6.12-2024-09-19' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping updates from Christoph Hellwig:

 - support DMA zones for arm64 systems where memory starts at > 4GB
   (Baruch Siach, Catalin Marinas)

 - support direct calls into dma-iommu and thus obsolete dma_map_ops for
   many common configurations (Leon Romanovsky)

 - add DMA-API tracing (Sean Anderson)

 - remove the not very useful return value from various dma_set_* APIs
   (Christoph Hellwig)

 - misc cleanups and minor optimizations (Chen Y, Yosry Ahmed, Christoph
   Hellwig)

* tag 'dma-mapping-6.12-2024-09-19' of git://git.infradead.org/users/hch/dma-mapping:
  dma-mapping: reflow dma_supported
  dma-mapping: reliably inform about DMA support for IOMMU
  dma-mapping: add tracing for dma-mapping API calls
  dma-mapping: use IOMMU DMA calls for common alloc/free page calls
  dma-direct: optimize page freeing when it is not addressable
  dma-mapping: clearly mark DMA ops as an architecture feature
  vdpa_sim: don't select DMA_OPS
  arm64: mm: keep low RAM dma zone
  dma-mapping: don't return errors from dma_set_max_seg_size
  dma-mapping: don't return errors from dma_set_seg_boundary
  dma-mapping: don't return errors from dma_set_min_align_mask
  scsi: check that busses support the DMA API before setting dma parameters
  arm64: mm: fix DMA zone when dma-ranges is missing
  dma-mapping: direct calls for dma-iommu
  dma-mapping: call ->unmap_page and ->unmap_sg unconditionally
  arm64: support DMA zone above 4GB
  dma-mapping: replace zone_dma_bits by zone_dma_limit
  dma-mapping: use bit masking to check VM_DMA_COHERENT
2024-09-19 11:12:49 +02:00
Linus Torvalds 1636f57c78 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux
Pull ARM updates from Russell King:

 - clean up TTBCR magic numbers and use u32 for this register

 - fix clang issue in VFP code leading to kernel oops, caused by
   compiler instruction scheduling.

 - switch 32-bit Arm to use GENERIC_CPU_DEVICES and use the
   arch_cpu_is_hotpluggable() hook.

 - pass struct device to arm_iommu_create_mapping() and move over to use
   iommu_paging_domain_alloc() rather than iommu_domain_alloc()

 - make amba_bustype constant

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux:
  ARM: 9418/1: dma-mapping: Use iommu_paging_domain_alloc()
  ARM: 9417/1: dma-mapping: Pass device to arm_iommu_create_mapping()
  ARM: 9416/1: amba: make amba_bustype constant
  ARM: 9412/1: Convert to arch_cpu_is_hotpluggable()
  ARM: 9411/1: Switch over to GENERIC_CPU_DEVICES using arch_register_cpu()
  ARM: 9410/1: vfp: Use asm volatile in fmrx/fmxr macros
  ARM: 9409/1: mmu: Do not use magic number for TTBCR settings
2024-09-16 06:32:08 +02:00
Yuntao Liu 0d437918fb ARM: 9414/1: Fix build issue with LD_DEAD_CODE_DATA_ELIMINATION
There is a build issue with LD segmentation fault, while
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION is not enabled, as bellow.

scripts/link-vmlinux.sh: line 49:  3796 Segmentation fault
 (core dumped) ${ld} ${ldflags} -o ${output} ${wl}--whole-archive
 ${objs} ${wl}--no-whole-archive ${wl}--start-group
 ${libs} ${wl}--end-group ${kallsymso} ${btf_vmlinux_bin_o} ${ldlibs}

The error occurs in older versions of the GNU ld with version earlier
than 2.36. It makes most sense to have a minimum LD version as
a dependency for HAVE_LD_DEAD_CODE_DATA_ELIMINATION and eliminate
the impact of ".reloc  .text, R_ARM_NONE, ." when
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION is not enabled.

Fixes: ed0f941022 ("ARM: 9404/1: arm32: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION")
Reported-by: Harith George <mail2hgg@gmail.com>
Tested-by: Harith George <mail2hgg@gmail.com>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Yuntao Liu <liuyuntao12@huawei.com>
Link: https://lore.kernel.org/all/14e9aefb-88d1-4eee-8288-ef15d4a9b059@gmail.com/
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2024-09-04 14:47:42 +01:00
Christoph Hellwig de6c85bf91 dma-mapping: clearly mark DMA ops as an architecture feature
DMA ops are a helper for architectures and not for drivers to override
the DMA implementation.

Unfortunately driver authors keep ignoring this.  Make the fact more
clear by renaming the symbol to ARCH_HAS_DMA_OPS and having the two drivers
overriding their dma_ops depend on that.  These drivers should probably be
marked broken, but we can give them a bit of a grace period for that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> # for IPU6
Acked-by: Robin Murphy <robin.murphy@arm.com>
2024-09-04 07:08:51 +03:00
Jinjie Ruan 609face018 ARM: 9411/1: Switch over to GENERIC_CPU_DEVICES using arch_register_cpu()
Currently, almost all architectures have switched to GENERIC_CPU_DEVICES,
except for arm32. Also switch over to GENERIC_CPU_DEVICES, and provide an
arch_register_cpu() that populates the hotpluggable flag for arm32.

The struct cpu in struct cpuinfo_arm is never used directly, remove
it to use the one GENERIC_CPU_DEVICES provides.

This also has the effect of moving the registration of CPUs from subsys to
driver core initialisation, prior to any initcalls running.

Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2024-08-20 11:18:49 +01:00
Linus Torvalds 3894840a7a Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux
Pull ARM updates from Russell King:

 - ftrace: don't assume stack frames are contiguous in memory

 - remove unused mod_inwind_map structure

 - spelling fixes

 - allow use of LD dead code/data elimination

 - fix callchain_trace() return value

 - add support for stackleak gcc plugin

 - correct some reset asm function prototypes for CFI

[ Missed the merge window because Russell forgot to push out ]

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux:
  ARM: 9408/1: mm: CFI: Fix some erroneous reset prototypes
  ARM: 9407/1: Add support for STACKLEAK gcc plugin
  ARM: 9406/1: Fix callchain_trace() return value
  ARM: 9404/1: arm32: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION
  ARM: 9403/1: Alpine: Spelling s/initialiing/initializing/
  ARM: 9402/1: Kconfig: Spelling s/Cortex A-/Cortex-A/
  ARM: 9400/1: Remove unused struct 'mod_unwind_map'
2024-07-29 10:33:51 -07:00
Linus Torvalds ca83c61cb3 Merge tag 'kbuild-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:

 - Remove tristate choice support from Kconfig

 - Stop using the PROVIDE() directive in the linker script

 - Reduce the number of links for the combination of CONFIG_KALLSYMS and
   CONFIG_DEBUG_INFO_BTF

 - Enable the warning for symbol reference to .exit.* sections by
   default

 - Fix warnings in RPM package builds

 - Improve scripts/make_fit.py to generate a FIT image with separate
   base DTB and overlays

 - Improve choice value calculation in Kconfig

 - Fix conditional prompt behavior in choice in Kconfig

 - Remove support for the uncommon EMAIL environment variable in Debian
   package builds

 - Remove support for the uncommon "name <email>" form for the DEBEMAIL
   environment variable

 - Raise the minimum supported GNU Make version to 4.0

 - Remove stale code for the absolute kallsyms

 - Move header files commonly used for host programs to scripts/include/

 - Introduce the pacman-pkg target to generate a pacman package used in
   Arch Linux

 - Clean up Kconfig

* tag 'kbuild-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (65 commits)
  kbuild: doc: gcc to CC change
  kallsyms: change sym_entry::percpu_absolute to bool type
  kallsyms: unify seq and start_pos fields of struct sym_entry
  kallsyms: add more original symbol type/name in comment lines
  kallsyms: use \t instead of a tab in printf()
  kallsyms: avoid repeated calculation of array size for markers
  kbuild: add script and target to generate pacman package
  modpost: use generic macros for hash table implementation
  kbuild: move some helper headers from scripts/kconfig/ to scripts/include/
  Makefile: add comment to discourage tools/* addition for kernel builds
  kbuild: clean up scripts/remove-stale-files
  kconfig: recursive checks drop file/lineno
  kbuild: rpm-pkg: introduce a simple changelog section for kernel.spec
  kallsyms: get rid of code for absolute kallsyms
  kbuild: Create INSTALL_PATH directory if it does not exist
  kbuild: Abort make on install failures
  kconfig: remove 'e1' and 'e2' macros from expression deduplication
  kconfig: remove SYMBOL_CHOICEVAL flag
  kconfig: add const qualifiers to several function arguments
  kconfig: call expr_eliminate_yn() at least once in expr_eliminate_dups()
  ...
2024-07-23 14:32:21 -07:00
Masahiro Yamada b9d73218d7 treewide: change conditional prompt for choices to 'depends on'
While Documentation/kbuild/kconfig-language.rst provides a brief
explanation, there are recurring confusions regarding the usage of a
prompt followed by 'if <expr>'. This conditional controls _only_ the
prompt.

A typical usage is as follows:

    menuconfig BLOCK
            bool "Enable the block layer" if EXPERT
            default y

When EXPERT=n, the prompt is hidden, but this config entry is still
active, and BLOCK is set to its default value 'y'. This is reasonable
because you are likely want to enable the block device support. When
EXPERT=y, the prompt is shown, allowing you to toggle BLOCK.

Please note that it is different from 'depends on EXPERT', which would
enable and disable the entire config entry.

However, this conditional prompt has never worked in a choice block.

The following two work in the same way: when EXPERT is disabled, the
choice block is entirely disabled.

[Test Code 1]

    choice
            prompt "choose" if EXPERT

    config A
            bool "A"

    config B
            bool "B"

    endchoice

[Test Code 2]

    choice
            prompt "choose"
            depends on EXPERT

    config A
            bool "A"

    config B
            bool "B"

    endchoice

I believe the first case should hide only the prompt, producing the
default:

   CONFIG_A=y
   # CONFIG_B is not set

The next commit will change (fix) the behavior of the conditional prompt
in choice blocks.

I see several choice blocks wrongly using a conditional prompt, where
'depends on' makes more sense.

To preserve the current behavior, this commit converts such misuses.

I did not touch the following entry in arch/x86/Kconfig:

    choice
            prompt "Memory split" if EXPERT
            default VMSPLIT_3G

This is truly the correct use of the conditional prompt; when EXPERT=n,
this choice block should silently select the reasonable VMSPLIT_3G,
although the resulting PAGE_OFFSET will not be affected anyway.

Presumably, the one in fs/jffs2/Kconfig is also correct, but I converted
it to 'depends on' to avoid any potential behavioral change.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-07-16 01:08:37 +09:00
Paul E. McKenney f1b5644862 ARM: Emulate one-byte cmpxchg
Use the new cmpxchg_emu_u8() to emulate one-byte cmpxchg() on ARM systems
with ARCH == ARMv6K.

[ paulmck: Apply Arnd Bergmann and Nathan Chancellor feedback. ]
[ paulmck: Apply Linus Walleij feedback. ]

Reported-by: Mark Brown <broonie@kernel.org>
Closes: https://lore.kernel.org/all/54798f68-48f7-4c65-9cba-47c0bf175143@sirena.org.uk/
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Closes: https://lore.kernel.org/all/CA+G9fYuZ+pf6p8AXMZWtdFtX-gbG8HMaBKp=XbxcdzA_QeLkxQ@mail.gmail.com/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Cc: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Andrew Davis <afd@ti.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Eric DeVolder <eric.devolder@oracle.com>
Cc: Rob Herring <robh@kernel.org>
Cc: <linux-arm-kernel@lists.infradead.org>
2024-07-04 13:32:41 -07:00
Jinjie Ruan 2335c9cb83 ARM: 9407/1: Add support for STACKLEAK gcc plugin
Add the STACKLEAK gcc plugin to arm32 by adding the helper used by
stackleak common code: on_thread_stack(). It initialize the stack with the
poison value before returning from system calls which improves the kernel
security. Additionally, this disables the plugin in EFI stub code and
decompress code, which are out of scope for the protection.

Before the test on Qemu versatilepb board:
	# echo STACKLEAK_ERASING  > /sys/kernel/debug/provoke-crash/DIRECT
	lkdtm: Performing direct entry STACKLEAK_ERASING
	lkdtm: XFAIL: stackleak is not supported on this arch (HAVE_ARCH_STACKLEAK=n)

After:
	# echo STACKLEAK_ERASING  > /sys/kernel/debug/provoke-crash/DIRECT
	lkdtm: Performing direct entry STACKLEAK_ERASING
	lkdtm: stackleak stack usage:
	  high offset: 80 bytes
	  current:     280 bytes
	  lowest:      696 bytes
	  tracked:     696 bytes
	  untracked:   192 bytes
	  poisoned:    7220 bytes
	  low offset:  4 bytes
	lkdtm: OK: the rest of the thread stack is properly erased

Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2024-07-02 09:18:43 +01:00
Yuntao Liu ed0f941022 ARM: 9404/1: arm32: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION
The current arm32 architecture does not yet support the
HAVE_LD_DEAD_CODE_DATA_ELIMINATION feature. arm32 is widely used in
embedded scenarios, and enabling this feature would be beneficial for
reducing the size of the kernel image.

In order to make this work, we keep the necessary tables by annotating
them with KEEP, also it requires further changes to linker script to KEEP
some tables and wildcard compiler generated sections into the right place.
When using ld.lld for linking, KEEP is not recognized within the OVERLAY
command, and Ard proposed a concise method to solve this problem.

It boots normally with defconfig, vexpress_defconfig and tinyconfig.

The size comparison of zImage is as follows:
defconfig       vexpress_defconfig      tinyconfig
5137712         5138024                 424192          no dce
5032560         4997824                 298384          dce
2.0%            2.7%                    29.7%           shrink

When using smaller config file, there is a significant reduction in the
size of the zImage.

We also tested this patch on a commercially available single-board
computer, and the comparison is as follows:
a15eb_config
2161384         no dce
2092240         dce
3.2%            shrink

The zImage size has been reduced by approximately 3.2%, which is 70KB on
2.1M.

Signed-off-by: Yuntao Liu <liuyuntao12@huawei.com>
Tested-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2024-06-10 12:01:33 +01:00
Geert Uytterhoeven 8ede71e120 ARM: 9402/1: Kconfig: Spelling s/Cortex A-/Cortex-A/
Fix a misspelling of "Cortex-A9", to make it easier to find which errata
are applicable to Cortex-A9 CPU cores.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2024-06-10 12:01:31 +01:00
Linus Torvalds 61307b7be4 Merge tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull mm updates from Andrew Morton:
 "The usual shower of singleton fixes and minor series all over MM,
  documented (hopefully adequately) in the respective changelogs.
  Notable series include:

   - Lucas Stach has provided some page-mapping cleanup/consolidation/
     maintainability work in the series "mm/treewide: Remove pXd_huge()
     API".

   - In the series "Allow migrate on protnone reference with
     MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's
     MPOL_PREFERRED_MANY mode, yielding almost doubled performance in
     one test.

   - In their series "Memory allocation profiling" Kent Overstreet and
     Suren Baghdasaryan have contributed a means of determining (via
     /proc/allocinfo) whereabouts in the kernel memory is being
     allocated: number of calls and amount of memory.

   - Matthew Wilcox has provided the series "Various significant MM
     patches" which does a number of rather unrelated things, but in
     largely similar code sites.

   - In his series "mm: page_alloc: freelist migratetype hygiene"
     Johannes Weiner has fixed the page allocator's handling of
     migratetype requests, with resulting improvements in compaction
     efficiency.

   - In the series "make the hugetlb migration strategy consistent"
     Baolin Wang has fixed a hugetlb migration issue, which should
     improve hugetlb allocation reliability.

   - Liu Shixin has hit an I/O meltdown caused by readahead in a
     memory-tight memcg. Addressed in the series "Fix I/O high when
     memory almost met memcg limit".

   - In the series "mm/filemap: optimize folio adding and splitting"
     Kairui Song has optimized pagecache insertion, yielding ~10%
     performance improvement in one test.

   - Baoquan He has cleaned up and consolidated the early zone
     initialization code in the series "mm/mm_init.c: refactor
     free_area_init_core()".

   - Baoquan has also redone some MM initializatio code in the series
     "mm/init: minor clean up and improvement".

   - MM helper cleanups from Christoph Hellwig in his series "remove
     follow_pfn".

   - More cleanups from Matthew Wilcox in the series "Various
     page->flags cleanups".

   - Vlastimil Babka has contributed maintainability improvements in the
     series "memcg_kmem hooks refactoring".

   - More folio conversions and cleanups in Matthew Wilcox's series:
	"Convert huge_zero_page to huge_zero_folio"
	"khugepaged folio conversions"
	"Remove page_idle and page_young wrappers"
	"Use folio APIs in procfs"
	"Clean up __folio_put()"
	"Some cleanups for memory-failure"
	"Remove page_mapping()"
	"More folio compat code removal"

   - David Hildenbrand chipped in with "fs/proc/task_mmu: convert
     hugetlb functions to work on folis".

   - Code consolidation and cleanup work related to GUP's handling of
     hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".

   - Rick Edgecombe has developed some fixes to stack guard gaps in the
     series "Cover a guard gap corner case".

   - Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the
     series "mm/ksm: fix ksm exec support for prctl".

   - Baolin Wang has implemented NUMA balancing for multi-size THPs.
     This is a simple first-cut implementation for now. The series is
     "support multi-size THP numa balancing".

   - Cleanups to vma handling helper functions from Matthew Wilcox in
     the series "Unify vma_address and vma_pgoff_address".

   - Some selftests maintenance work from Dev Jain in the series
     "selftests/mm: mremap_test: Optimizations and style fixes".

   - Improvements to the swapping of multi-size THPs from Ryan Roberts
     in the series "Swap-out mTHP without splitting".

   - Kefeng Wang has significantly optimized the handling of arm64's
     permission page faults in the series
	"arch/mm/fault: accelerate pagefault when badaccess"
	"mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"

   - GUP cleanups from David Hildenbrand in "mm/gup: consistently call
     it GUP-fast".

   - hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault
     path to use struct vm_fault".

   - selftests build fixes from John Hubbard in the series "Fix
     selftests/mm build without requiring "make headers"".

   - Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the
     series "Improved Memory Tier Creation for CPUless NUMA Nodes".
     Fixes the initialization code so that migration between different
     memory types works as intended.

   - David Hildenbrand has improved follow_pte() and fixed an errant
     driver in the series "mm: follow_pte() improvements and acrn
     follow_pte() fixes".

   - David also did some cleanup work on large folio mapcounts in his
     series "mm: mapcount for large folios + page_mapcount() cleanups".

   - Folio conversions in KSM in Alex Shi's series "transfer page to
     folio in KSM".

   - Barry Song has added some sysfs stats for monitoring multi-size
     THP's in the series "mm: add per-order mTHP alloc and swpout
     counters".

   - Some zswap cleanups from Yosry Ahmed in the series "zswap
     same-filled and limit checking cleanups".

   - Matthew Wilcox has been looking at buffer_head code and found the
     documentation to be lacking. The series is "Improve buffer head
     documentation".

   - Multi-size THPs get more work, this time from Lance Yang. His
     series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free"
     optimizes the freeing of these things.

   - Kemeng Shi has added more userspace-visible writeback
     instrumentation in the series "Improve visibility of writeback".

   - Kemeng Shi then sent some maintenance work on top in the series
     "Fix and cleanups to page-writeback".

   - Matthew Wilcox reduces mmap_lock traffic in the anon vma code in
     the series "Improve anon_vma scalability for anon VMAs". Intel's
     test bot reported an improbable 3x improvement in one test.

   - SeongJae Park adds some DAMON feature work in the series
	"mm/damon: add a DAMOS filter type for page granularity access recheck"
	"selftests/damon: add DAMOS quota goal test"

   - Also some maintenance work in the series
	"mm/damon/paddr: simplify page level access re-check for pageout"
	"mm/damon: misc fixes and improvements"

   - David Hildenbrand has disabled some known-to-fail selftests ni the
     series "selftests: mm: cow: flag vmsplice() hugetlb tests as
     XFAIL".

   - memcg metadata storage optimizations from Shakeel Butt in "memcg:
     reduce memory consumption by memcg stats".

   - DAX fixes and maintenance work from Vishal Verma in the series
     "dax/bus.c: Fixups for dax-bus locking""

* tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (426 commits)
  memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order
  selftests/mm: hugetlb_madv_vs_map: avoid test skipping by querying hugepage size at runtime
  mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp
  mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_fault
  selftests: cgroup: add tests to verify the zswap writeback path
  mm: memcg: make alloc_mem_cgroup_per_node_info() return bool
  mm/damon/core: fix return value from damos_wmark_metric_value
  mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED
  selftests: cgroup: remove redundant enabling of memory controller
  Docs/mm/damon/maintainer-profile: allow posting patches based on damon/next tree
  Docs/mm/damon/maintainer-profile: change the maintainer's timezone from PST to PT
  Docs/mm/damon/design: use a list for supported filters
  Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command
  Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file
  selftests/damon: classify tests for functionalities and regressions
  selftests/damon/_damon_sysfs: use 'is' instead of '==' for 'None'
  selftests/damon/_damon_sysfs: find sysfs mount point from /proc/mounts
  selftests/damon/_damon_sysfs: check errors from nr_schemes file reads
  mm/damon/core: initialize ->esz_bp from damos_quota_init_priv()
  selftests/damon: add a test for DAMOS quota goal
  ...
2024-05-19 09:21:03 -07:00
Russell King (Oracle) f698d314ee Merge branches 'amba', 'cfi', 'clkdev' and 'misc' into for-linus 2024-05-16 12:35:01 +01:00
Linus Walleij 1a4fec49ef ARM: 9392/2: Support CLANG CFI
Support Control Flow Integrity (CFI) when compiling with
CLANG.

In the as-of-writing LLVM CLANG implementation (v17)
the 32-bit ARM platform is supported by the generic CFI
implementation, which isn't tailored specifically for ARM32
but works well enough to enable the feature.

Tested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2024-04-29 14:14:23 +01:00
David Hildenbrand 25176ad09c mm/treewide: rename CONFIG_HAVE_FAST_GUP to CONFIG_HAVE_GUP_FAST
Nowadays, we call it "GUP-fast", the external interface includes functions
like "get_user_pages_fast()", and we renamed all internal functions to
reflect that as well.

Let's make the config option reflect that.

Link: https://lkml.kernel.org/r/20240402125516.223131-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25 20:56:41 -07:00
Linus Walleij 7af5b901e8 ARM: 9358/2: Implement PAN for LPAE by TTBR0 page table walks disablement
With LPAE enabled, privileged no-access cannot be enforced using CPU
domains as such feature is not available. This patch implements PAN
by disabling TTBR0 page table walks while in kernel mode.

The ARM architecture allows page table walks to be split between TTBR0
and TTBR1. With LPAE enabled, the split is defined by a combination of
TTBCR T0SZ and T1SZ bits. Currently, an LPAE-enabled kernel uses TTBR0
for user addresses and TTBR1 for kernel addresses with the VMSPLIT_2G
and VMSPLIT_3G configurations. The main advantage for the 3:1 split is
that TTBR1 is reduced to 2 levels, so potentially faster TLB refill
(though usually the first level entries are already cached in the TLB).

The PAN support on LPAE-enabled kernels uses TTBR0 when running in user
space or in kernel space during user access routines (TTBCR T0SZ and
T1SZ are both 0). When running user accesses are disabled in kernel
mode, TTBR0 page table walks are disabled by setting TTBCR.EPD0. TTBR1
is used for kernel accesses (including loadable modules; anything
covered by swapper_pg_dir) by reducing the TTBCR.T0SZ to the minimum
(2^(32-7) = 32MB). To avoid user accesses potentially hitting stale TLB
entries, the ASID is switched to 0 (reserved) by setting TTBCR.A1 and
using the ASID value in TTBR1. The difference from a non-PAN kernel is
that with the 3:1 memory split, TTBR1 always uses 3 levels of page
tables.

As part of the change we are using preprocessor elif definied() clauses
so balance these clauses by converting relevant precedingt ifdef
clauses to if defined() clauses.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2024-04-18 12:10:46 +01:00
Linus Torvalds 02fb638bed Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM updates from Russell King:

 - remove a misuse of kernel-doc comment

 - use "Call trace:" for backtraces like other architectures

 - implement copy_from_kernel_nofault_allowed() to fix a LKDTM test

 - add a "cut here" line for prefetch aborts

 - remove unnecessary Kconfing entry for FRAME_POINTER

 - remove iwmmxy support for PJ4/PJ4B cores

 - use bitfield helpers in ptrace to improve readabililty

 - check if folio is reserved before flushing

* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: 9359/1: flush: check if the folio is reserved for no-mapping addresses
  ARM: 9354/1: ptrace: Use bitfield helpers
  ARM: 9352/1: iwmmxt: Remove support for PJ4/PJ4B cores
  ARM: 9353/1: remove unneeded entry for CONFIG_FRAME_POINTER
  ARM: 9351/1: fault: Add "cut here" line for prefetch aborts
  ARM: 9350/1: fault: Implement copy_from_kernel_nofault_allowed()
  ARM: 9349/1: unwind: Add missing "Call trace:" line
  ARM: 9334/1: mm: init: remove misuse of kernel-doc comment
2024-03-23 09:17:03 -07:00
Linus Torvalds 902861e34c Merge tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:

 - Sumanth Korikkar has taught s390 to allocate hotplug-time page frames
   from hotplugged memory rather than only from main memory. Series
   "implement "memmap on memory" feature on s390".

 - More folio conversions from Matthew Wilcox in the series

	"Convert memcontrol charge moving to use folios"
	"mm: convert mm counter to take a folio"

 - Chengming Zhou has optimized zswap's rbtree locking, providing
   significant reductions in system time and modest but measurable
   reductions in overall runtimes. The series is "mm/zswap: optimize the
   scalability of zswap rb-tree".

 - Chengming Zhou has also provided the series "mm/zswap: optimize zswap
   lru list" which provides measurable runtime benefits in some
   swap-intensive situations.

 - And Chengming Zhou further optimizes zswap in the series "mm/zswap:
   optimize for dynamic zswap_pools". Measured improvements are modest.

 - zswap cleanups and simplifications from Yosry Ahmed in the series
   "mm: zswap: simplify zswap_swapoff()".

 - In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has
   contributed several DAX cleanups as well as adding a sysfs tunable to
   control the memmap_on_memory setting when the dax device is
   hotplugged as system memory.

 - Johannes Weiner has added the large series "mm: zswap: cleanups",
   which does that.

 - More DAMON work from SeongJae Park in the series

	"mm/damon: make DAMON debugfs interface deprecation unignorable"
	"selftests/damon: add more tests for core functionalities and corner cases"
	"Docs/mm/damon: misc readability improvements"
	"mm/damon: let DAMOS feeds and tame/auto-tune itself"

 - In the series "mm/mempolicy: weighted interleave mempolicy and sysfs
   extension" Rakie Kim has developed a new mempolicy interleaving
   policy wherein we allocate memory across nodes in a weighted fashion
   rather than uniformly. This is beneficial in heterogeneous memory
   environments appearing with CXL.

 - Christophe Leroy has contributed some cleanup and consolidation work
   against the ARM pagetable dumping code in the series "mm: ptdump:
   Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute".

 - Luis Chamberlain has added some additional xarray selftesting in the
   series "test_xarray: advanced API multi-index tests".

 - Muhammad Usama Anjum has reworked the selftest code to make its
   human-readable output conform to the TAP ("Test Anything Protocol")
   format. Amongst other things, this opens up the use of third-party
   tools to parse and process out selftesting results.

 - Ryan Roberts has added fork()-time PTE batching of THP ptes in the
   series "mm/memory: optimize fork() with PTE-mapped THP". Mainly
   targeted at arm64, this significantly speeds up fork() when the
   process has a large number of pte-mapped folios.

 - David Hildenbrand also gets in on the THP pte batching game in his
   series "mm/memory: optimize unmap/zap with PTE-mapped THP". It
   implements batching during munmap() and other pte teardown
   situations. The microbenchmark improvements are nice.

 - And in the series "Transparent Contiguous PTEs for User Mappings"
   Ryan Roberts further utilizes arm's pte's contiguous bit ("contpte
   mappings"). Kernel build times on arm64 improved nicely. Ryan's
   series "Address some contpte nits" provides some followup work.

 - In the series "mm/hugetlb: Restore the reservation" Breno Leitao has
   fixed an obscure hugetlb race which was causing unnecessary page
   faults. He has also added a reproducer under the selftest code.

 - In the series "selftests/mm: Output cleanups for the compaction
   test", Mark Brown did what the title claims.

 - Kinsey Ho has added the series "mm/mglru: code cleanup and
   refactoring".

 - Even more zswap material from Nhat Pham. The series "fix and extend
   zswap kselftests" does as claimed.

 - In the series "Introduce cpu_dcache_is_aliasing() to fix DAX
   regression" Mathieu Desnoyers has cleaned up and fixed rather a mess
   in our handling of DAX on archiecctures which have virtually aliasing
   data caches. The arm architecture is the main beneficiary.

 - Lokesh Gidra's series "per-vma locks in userfaultfd" provides
   dramatic improvements in worst-case mmap_lock hold times during
   certain userfaultfd operations.

 - Some page_owner enhancements and maintenance work from Oscar Salvador
   in his series

	"page_owner: print stacks and their outstanding allocations"
	"page_owner: Fixup and cleanup"

 - Uladzislau Rezki has contributed some vmalloc scalability
   improvements in his series "Mitigate a vmap lock contention". It
   realizes a 12x improvement for a certain microbenchmark.

 - Some kexec/crash cleanup work from Baoquan He in the series "Split
   crash out from kexec and clean up related config items".

 - Some zsmalloc maintenance work from Chengming Zhou in the series

	"mm/zsmalloc: fix and optimize objects/page migration"
	"mm/zsmalloc: some cleanup for get/set_zspage_mapping()"

 - Zi Yan has taught the MM to perform compaction on folios larger than
   order=0. This a step along the path to implementaton of the merging
   of large anonymous folios. The series is named "Enable >0 order folio
   memory compaction".

 - Christoph Hellwig has done quite a lot of cleanup work in the
   pagecache writeback code in his series "convert write_cache_pages()
   to an iterator".

 - Some modest hugetlb cleanups and speedups in Vishal Moola's series
   "Handle hugetlb faults under the VMA lock".

 - Zi Yan has changed the page splitting code so we can split huge pages
   into sizes other than order-0 to better utilize large folios. The
   series is named "Split a folio to any lower order folios".

 - David Hildenbrand has contributed the series "mm: remove
   total_mapcount()", a cleanup.

 - Matthew Wilcox has sought to improve the performance of bulk memory
   freeing in his series "Rearrange batched folio freeing".

 - Gang Li's series "hugetlb: parallelize hugetlb page init on boot"
   provides large improvements in bootup times on large machines which
   are configured to use large numbers of hugetlb pages.

 - Matthew Wilcox's series "PageFlags cleanups" does that.

 - Qi Zheng's series "minor fixes and supplement for ptdesc" does that
   also. S390 is affected.

 - Cleanups to our pagemap utility functions from Peter Xu in his series
   "mm/treewide: Replace pXd_large() with pXd_leaf()".

 - Nico Pache has fixed a few things with our hugepage selftests in his
   series "selftests/mm: Improve Hugepage Test Handling in MM
   Selftests".

 - Also, of course, many singleton patches to many things. Please see
   the individual changelogs for details.

* tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (435 commits)
  mm/zswap: remove the memcpy if acomp is not sleepable
  crypto: introduce: acomp_is_async to expose if comp drivers might sleep
  memtest: use {READ,WRITE}_ONCE in memory scanning
  mm: prohibit the last subpage from reusing the entire large folio
  mm: recover pud_leaf() definitions in nopmd case
  selftests/mm: skip the hugetlb-madvise tests on unmet hugepage requirements
  selftests/mm: skip uffd hugetlb tests with insufficient hugepages
  selftests/mm: dont fail testsuite due to a lack of hugepages
  mm/huge_memory: skip invalid debugfs new_order input for folio split
  mm/huge_memory: check new folio order when split a folio
  mm, vmscan: retry kswapd's priority loop with cache_trim_mode off on failure
  mm: add an explicit smp_wmb() to UFFDIO_CONTINUE
  mm: fix list corruption in put_pages_list
  mm: remove folio from deferred split list before uncharging it
  filemap: avoid unnecessary major faults in filemap_fault()
  mm,page_owner: drop unnecessary check
  mm,page_owner: check for null stack_record before bumping its refcount
  mm: swap: fix race between free_swap_and_cache() and swapoff()
  mm/treewide: align up pXd_leaf() retval across archs
  mm/treewide: drop pXd_large()
  ...
2024-03-14 17:43:30 -07:00
Linus Torvalds 216532e147 Merge tag 'hardening-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
 "As is pretty normal for this tree, there are changes all over the
  place, especially for small fixes, selftest improvements, and improved
  macro usability.

  Some header changes ended up landing via this tree as they depended on
  the string header cleanups. Also, a notable set of changes is the work
  for the reintroduction of the UBSAN signed integer overflow sanitizer
  so that we can continue to make improvements on the compiler side to
  make this sanitizer a more viable future security hardening option.

  Summary:

   - string.h and related header cleanups (Tanzir Hasan, Andy
     Shevchenko)

   - VMCI memcpy() usage and struct_size() cleanups (Vasiliy Kovalev,
     Harshit Mogalapalli)

   - selftests/powerpc: Fix load_unaligned_zeropad build failure
     (Michael Ellerman)

   - hardened Kconfig fragment updates (Marco Elver, Lukas Bulwahn)

   - Handle tail call optimization better in LKDTM (Douglas Anderson)

   - Use long form types in overflow.h (Andy Shevchenko)

   - Add flags param to string_get_size() (Andy Shevchenko)

   - Add Coccinelle script for potential struct_size() use (Jacob
     Keller)

   - Fix objtool corner case under KCFI (Josh Poimboeuf)

   - Drop 13 year old backward compat CAP_SYS_ADMIN check (Jingzi Meng)

   - Add str_plural() helper (Michal Wajdeczko, Kees Cook)

   - Ignore relocations in .notes section

   - Add comments to explain how __is_constexpr() works

   - Fix m68k stack alignment expectations in stackinit Kunit test

   - Convert string selftests to KUnit

   - Add KUnit tests for fortified string functions

   - Improve reporting during fortified string warnings

   - Allow non-type arg to type_max() and type_min()

   - Allow strscpy() to be called with only 2 arguments

   - Add binary mode to leaking_addresses scanner

   - Various small cleanups to leaking_addresses scanner

   - Adding wrapping_*() arithmetic helper

   - Annotate initial signed integer wrap-around in refcount_t

   - Add explicit UBSAN section to MAINTAINERS

   - Fix UBSAN self-test warnings

   - Simplify UBSAN build via removal of CONFIG_UBSAN_SANITIZE_ALL

   - Reintroduce UBSAN's signed overflow sanitizer"

* tag 'hardening-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (51 commits)
  selftests/powerpc: Fix load_unaligned_zeropad build failure
  string: Convert helpers selftest to KUnit
  string: Convert selftest to KUnit
  sh: Fix build with CONFIG_UBSAN=y
  compiler.h: Explain how __is_constexpr() works
  overflow: Allow non-type arg to type_max() and type_min()
  VMCI: Fix possible memcpy() run-time warning in vmci_datagram_invoke_guest_handler()
  lib/string_helpers: Add flags param to string_get_size()
  x86, relocs: Ignore relocations in .notes section
  objtool: Fix UNWIND_HINT_{SAVE,RESTORE} across basic blocks
  overflow: Use POD in check_shl_overflow()
  lib: stackinit: Adjust target string to 8 bytes for m68k
  sparc: vdso: Disable UBSAN instrumentation
  kernel.h: Move lib/cmdline.c prototypes to string.h
  leaking_addresses: Provide mechanism to scan binary files
  leaking_addresses: Ignore input device status lines
  leaking_addresses: Use File::Temp for /tmp files
  MAINTAINERS: Update LEAKING_ADDRESSES details
  fortify: Improve buffer overflow reporting
  fortify: Add KUnit tests for runtime overflows
  ...
2024-03-12 14:49:30 -07:00
Arnd Bergmann 5394f1e9b6 arch: define CONFIG_PAGE_SIZE_*KB on all architectures
Most architectures only support a single hardcoded page size. In order
to ensure that each one of these sets the corresponding Kconfig symbols,
change over the PAGE_SHIFT definition to the common one and allow
only the hardware page size to be selected.

Acked-by: Guo Ren <guoren@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Stafford Horne <shorne@gmail.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-03-06 19:29:09 +01:00
Ard Biesheuvel b9920fdd5a ARM: 9352/1: iwmmxt: Remove support for PJ4/PJ4B cores
PJ4 is a v7 core that incorporates a iWMMXt coprocessor. However, GCC
does not support this combination (its iWMMXt configuration always
implies v5te), and so there is no v6/v7 user space that actually makes
use of this, beyond generic support for things like setjmp() that
preserve/restore the iWMMXt register file using generic LDC/STC
instructions emitted in assembler.  As [0] appears to imply, this logic
is triggered for the init process at boot, and so most user threads will
have a iWMMXt register context associated with it, even though it is
never used.

At this point, it is highly unlikely that such GCC support will ever
materialize (and Clang does not implement support for iWMMXt to begin
with).

This means that advertising iWMMXt support on these cores results in
context switch overhead without any associated benefit, and so it is
better to simply ignore the iWMMXt unit on these systems. So rip out the
support. Doing so also fixes the issue reported in [0] related to UNDEF
handling of co-processor #0/#1 instructions issued from user space
running in Thumb2 mode.

The PJ4 cores are used in four platforms: Armada 370/xp, Dove (Cubox,
d2plug), MMP2 (xo-1.75) and Berlin (Google TV). Out of these, only the
first is still widely used, but that one actually doesn't have iWMMXt
but instead has only VFPV3-D16, and so it is not impacted by this
change.

Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218427 [0]

Fixes: 8bcba70cb5 ("ARM: entry: Disregard Thumb undef exception ...")
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Reviewed-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2024-02-26 10:16:31 +00:00
Mathieu Desnoyers 8690bbcf3b Introduce cpu_dcache_is_aliasing() across all architectures
Introduce a generic way to query whether the data cache is virtually
aliased on all architectures. Its purpose is to ensure that subsystems
which are incompatible with virtually aliased data caches (e.g. FS_DAX)
can reliably query this.

For data cache aliasing, there are three scenarios dependending on the
architecture. Here is a breakdown based on my understanding:

A) The data cache is always aliasing:

* arc
* csky
* m68k (note: shared memory mappings are incoherent ? SHMLBA is missing there.)
* sh
* parisc

B) The data cache aliasing is statically known or depends on querying CPU
   state at runtime:

* arm (cache_is_vivt() || cache_is_vipt_aliasing())
* mips (cpu_has_dc_aliases)
* nios2 (NIOS2_DCACHE_SIZE > PAGE_SIZE)
* sparc32 (vac_cache_size > PAGE_SIZE)
* sparc64 (L1DCACHE_SIZE > PAGE_SIZE)
* xtensa (DCACHE_WAY_SIZE > PAGE_SIZE)

C) The data cache is never aliasing:

* alpha
* arm64 (aarch64)
* hexagon
* loongarch (but with incoherent write buffers, which are disabled since
             commit d23b7795 ("LoongArch: Change SHMLBA from SZ_64K to PAGE_SIZE"))
* microblaze
* openrisc
* powerpc
* riscv
* s390
* um
* x86

Require architectures in A) and B) to select ARCH_HAS_CPU_CACHE_ALIASING and
implement "cpu_dcache_is_aliasing()".

Architectures in C) don't select ARCH_HAS_CPU_CACHE_ALIASING, and thus
cpu_dcache_is_aliasing() simply evaluates to "false".

Note that this leaves "cpu_icache_is_aliasing()" to be implemented as future
work. This would be useful to gate features like XIP on architectures
which have aliasing CPU dcache-icache but not CPU dcache-dcache.

Use "cpu_dcache" and "cpu_cache" rather than just "dcache" and "cache"
to clarify that we really mean "CPU data cache" and "CPU cache" to
eliminate any possible confusion with VFS "dentry cache" and "page
cache".

Link: https://lore.kernel.org/lkml/20030910210416.GA24258@mail.jlokier.co.uk/
Link: https://lkml.kernel.org/r/20240215144633.96437-9-mathieu.desnoyers@efficios.com
Fixes: d92576f116 ("dax: does not work correctly with virtual aliasing caches")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Michael Sclafani <dm-devel@lists.linux.dev>
Cc: Mike Snitzer <snitzer@kernel.org>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22 15:27:19 -08:00
Kees Cook 918327e9b7 ubsan: Remove CONFIG_UBSAN_SANITIZE_ALL
For simplicity in splitting out UBSan options into separate rules,
remove CONFIG_UBSAN_SANITIZE_ALL, effectively defaulting to "y", which
is how it is generally used anyway. (There are no ":= y" cases beyond
where a specific file is enabled when a top-level ":= n" is in effect.)

Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Marco Elver <elver@google.com>
Cc: linux-doc@vger.kernel.org
Cc: linux-kbuild@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
2024-02-06 02:21:38 -08:00
Linus Torvalds c4c6044d35 Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM updates from Russell King:

 - add missing neon instructions for the neon support hook

 - arrange for davinci to select PINCTRL

 - try VMA lock-base page fault handling first

 - use memblock_alloc_try_nid_raw() for kasan shadow page

 - dma: use kvzalloc() rather than kzalloc()/vzalloc()

* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: 9331/1: ARM/dma-mapping: replace kzalloc() and vzalloc() with kvzalloc()
  ARM: 9329/1: kasan: Use memblock_alloc_try_nid_raw for shadow page
  ARM: 9328/1: mm: try VMA lock-based page fault handling first
  ARM: 9330/1: davinci: also select PINCTRL
  ARM: 9327/1: vfp: Add missing VFP instructions to neon_support_hook
2024-01-17 11:34:45 -08:00
Linus Torvalds fb249b275c Merge tag 'soc-arm-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC code updates from Arnd Bergmann:
 "There are two notable changes this time:

   - add a arch/arm/Kconfig.platforms file to simplify the platforms
     that have no code except their Kconfig file (Andrew Davis)

   - remove support for the ARM11MPCore CPU in the versatile/realview
     platform. Since this is the last remaining one after removing
     ox820, some core code can go as well (Linus Walleij)

  The other changes are minor cleanups and bugfixes"

* tag 'soc-arm-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  ARM: davinci: always select CONFIG_CPU_ARM926T
  soc: pxa: ssp: fix casts
  ARM: debug: fix DEBUG_UNCOMPRESS help for !MULTIPLATFORM
  ARM: MAINTAINERS: drop empty entries for removed boards
  ARM: Delete ARM11MPCore perf leftovers
  ARM: mach-nspire: Rework support and directory structure
  ARM: mach-sunplus: Rework support and directory structure
  ARM: mach-airoha: Rework support and directory structure
  ARM: mach-moxart: Move MOXA ART support into Kconfig.platforms
  ARM: mach-uniphier: Move Socionext UniPhier support into Kconfig.platforms
  ARM: mach-rda: Move RDA Micro support into Kconfig.platforms
  ARM: mach-asm9260: Move ASM9260 support into Kconfig.platforms
  ARM: Kconfig: move platform selection into its own Kconfig file
  ARM: Delete ARM11MPCore (ARM11 ARMv6K SMP) support
  MAINTAINERS: add Marvell MBus driver to Marvell EBU SoCs support
  ARM: mxs: Do not search for "fsl,clkctrl"
  ARM: imx: Use device_get_match_data()
  MAINTAINERS: add omap bus drivers to OMAP2+ SUPPORT
  ARM: at91: pm: set soc_pm.data.mode in at91_pm_secure_init()
2024-01-11 11:42:53 -08:00
Kirill A. Shutemov 5e0a760b44 mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER
commit 23baf831a3 ("mm, treewide: redefine MAX_ORDER sanely") has
changed the definition of MAX_ORDER to be inclusive.  This has caused
issues with code that was not yet upstream and depended on the previous
definition.

To draw attention to the altered meaning of the define, rename MAX_ORDER
to MAX_PAGE_ORDER.

Link: https://lkml.kernel.org/r/20231228144704.14033-2-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-08 15:27:15 -08:00
Andrew Davis 671c08ec00 ARM: mach-nspire: Rework support and directory structure
Having a platform need a mach-* directory should be seen as a negative,
it means the platform needs special non-standard handling. ARM64 support
does not allow mach-* directories at all. While we may not get to that
given all the non-standard architectures we support, we should still try
to get as close as we can and reduce the number of mach directories.

The mach-nspire/ directory and files, provides just one "feature":
having the kernel print the machine name if the DTB does not also contain
a "model" string (which they always do). To reduce the number of mach-*
directories let's do without that feature and remove this directory.

NOTE: The default l2c_aux_mask is now ~0 but these devices never have
this type of cache controller so this is safe.

Signed-off-by: Andrew Davis <afd@ti.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-12-22 14:23:30 +00:00
Andrew Davis ae73dadb12 ARM: mach-sunplus: Rework support and directory structure
Having a platform need a mach-* directory should be seen as a negative,
it means the platform needs special non-standard handling. ARM64 support
does not allow mach-* directories at all. While we may not get to that
given all the non-standard architectures we support, we should still try
to get as close as we can and reduce the number of mach directories.

The mach-sunplus/ directory and files, provides just one "feature":
having the kernel print the machine name if the DTB does not also contain
a "model" string (which they always do). To reduce the number of mach-*
directories let's do without that feature and remove this directory.

NOTE: The default l2c_aux_mask is now ~0 but these devices never have
this type of cache controller so this is safe.

Signed-off-by: Andrew Davis <afd@ti.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-12-22 14:12:28 +00:00
Andrew Davis 00e58c36d2 ARM: mach-airoha: Rework support and directory structure
Having a platform need a mach-* directory should be seen as a negative,
it means the platform needs special non-standard handling. ARM64 support
does not allow mach-* directories at all. While we may not get to that
given all the non-standard architectures we support, we should still try
to get as close as we can and reduce the number of mach directories.

The mach-airoha/ directory, and files within, provide just one "feature":
having the kernel print the machine name if the DTB does not also contain
a "model" string (which they always do). To reduce the number of mach-*
directories let's do without that feature and remove this directory.

It also seems there was a copy/paste error and the "MEDIATEK_DT"
name was re-used in the DT_MACHINE_START macro. This may have caused
conflicts if this was built in a multi-arch configuration.

NOTE: The default l2c_aux_mask is now ~0 but these devices never have
this type of cache controller so this is safe.

Signed-off-by: Andrew Davis <afd@ti.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-12-22 14:12:28 +00:00
Andrew Davis dcfbe025c2 ARM: mach-moxart: Move MOXA ART support into Kconfig.platforms
This removes the need for a dedicated Kconfig and empty mach directory.

Signed-off-by: Andrew Davis <afd@ti.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-12-22 14:12:28 +00:00
Andrew Davis 9391174157 ARM: mach-uniphier: Move Socionext UniPhier support into Kconfig.platforms
This removes the need for a dedicated Kconfig and empty mach directory.

Signed-off-by: Andrew Davis <afd@ti.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-12-22 14:12:27 +00:00
Andrew Davis 8b7776fe93 ARM: mach-rda: Move RDA Micro support into Kconfig.platforms
This removes the need for a dedicated Kconfig and empty mach directory.

Signed-off-by: Andrew Davis <afd@ti.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-12-22 14:12:27 +00:00
Andrew Davis b6ed480013 ARM: mach-asm9260: Move ASM9260 support into Kconfig.platforms
This removes the need for a dedicated Kconfig and mach directory.

Signed-off-by: Andrew Davis <afd@ti.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-12-22 14:12:27 +00:00
Andrew Davis 20e3ab9ecb ARM: Kconfig: move platform selection into its own Kconfig file
Mostly just for better organization for now. This matches what is done on
some other platforms including ARM64. This also lets us start to reduce
the number of mach- directories that only exist to store the platform
selection.

Start with "Platform selection" and ARCH_VIRT.

Signed-off-by: Andrew Davis <afd@ti.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-12-22 14:12:27 +00:00
Wang Kefeng c16af12124 ARM: 9328/1: mm: try VMA lock-based page fault handling first
Attempt VMA lock-based page fault handling first, and fall back to the
existing mmap_lock-based handling if that fails, the ebizzy benchmark
shows 25% improvement on qemu with 2 cpus.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2023-12-05 11:42:13 +00:00
Christoph Hellwig 2c8ed1b960 dma-direct: add a CONFIG_ARCH_HAS_DMA_ALLOC symbol
Instead of using arch_dma_alloc if none of the generic coherent
allocators are used, require the architectures to explicitly opt into
providing it.  This will used to deal with the case of m68knommu and
coldfire where we can't do any coherent allocations whatsoever, and
also makes it clear that arch_dma_alloc is a last resort.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Greg Ungerer <gerg@linux-m68k.org>
Tested-by: Greg Ungerer <gerg@linux-m68k.org>
2023-10-22 16:38:54 +02:00
Randy Dunlap ef815d2cba treewide: drop CONFIG_EMBEDDED
There is only one Kconfig user of CONFIG_EMBEDDED and it can be switched
to EXPERT or "if !ARCH_MULTIPLATFORM" (suggested by Arnd).

Link: https://lkml.kernel.org/r/20230816055010.31534-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>	[RISC-V]
Acked-by: Greg Ungerer <gerg@linux-m68k.org>
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>	[powerpc]
Cc: Russell King <linux@armlinux.org.uk>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Brian Cain <bcain@quicinc.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-21 13:46:25 -07:00
Eric DeVolder 4183635e90 arm/kexec: refactor for kernel/Kconfig.kexec
The kexec and crash kernel options are provided in the common
kernel/Kconfig.kexec. Utilize the common options and provide
the ARCH_SUPPORTS_ and ARCH_SELECTS_ entries to recreate the
equivalent set of KEXEC and CRASH options.

Link: https://lkml.kernel.org/r/20230712161545.87870-4-eric.devolder@oracle.com
Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-18 10:18:52 -07:00
Linus Torvalds 6c1561fb90 Merge tag 'soc-dt-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC devicetree updates from Arnd Bergmann:
 "The biggest change this time is for the 32-bit devicetree files, which
  are all moved to a new location, using separate subdirectories for
  each SoC vendor, following the same scheme that is used on arm64, mips
  and riscv. This has been discussed for many years, but so far we never
  did this as there was a plan to move the files out of the kernel
  entirely, which has never happened.

  The impact of this will be that all external patches no longer apply,
  and anything depending on the location of the dtb files in the build
  directory will have to change. The installed files after 'make
  dtbs_install' keep the current location.

  There are six added SoCs here that are largely variants of previously
  added chips. Two other chips are added in a separate branch along with
  their device drivers.

   - The Samsung Exynos 4212 makes its return after the Samsung Galaxy
     Express phone is addded at last. The SoC support was originally
     added in 2012 but removed again in 2017 as it was unused at the
     time.

   - Amlogic C3 is a Cortex-A35 based smart IP camera chip

   - Qualcomm MSM8939 (Snapdragon 615) is a more featureful variant of
     the still common MSM8916 (Snapdragon 410) phone chip that has been
     supported for a long time.

   - Qualcomm SC8180x (Snapdragon 8cx) is one of their earlier high-end
     laptop chips, used in the Lenovo Flex 5G, which is added along with
     the reference board.

   - Qualcomm SDX75 is the latest generation modem chip that is used as
     a peripherial in phones but can also run a standalone Linux. Unlike
     the prior 32-bit SDX65 and SDX55, this now has a 64-bit Cortex-A55.

   - Alibaba T-Head TH1520 is a quad-core RISC-V chip based on the
     Xuantie C910 core, a step up from all previously added rv64 chips.

  All of the above come with reference board implementations, those
  included there are 39 new board files, but only five more 32-bit this
  time, probably a new low:

   - Marantec Maveo board based on dhcor imx6ull module

   - Endian 4i Edge 200, based on the armv5 Marvell Kirkwood chip

   - Epson Moverio BT-200 AR glasses based on TI OMAP4

   - PHYTEC STM32MP1-3 Dev board based on STM32MP15 PHYTEC SOM

   - ICnova ADB4006 board based on Allwinner A20

  On the 64-bit side, there are also fewer addded machines than we had
  in the recent releases:

   - Three boards based on NXP i.MX8: Emtop SoM & Baseboard, NXP i.MX8MM
     EVKB board and i.MX8MP based Gateworks Venice gw7905-2x device.

   - NVIDIA IGX Orin and Jetson Orin Nano boards, both based on tegra234

   - Qualcomm gains support for 6 reference boards on various members of
     their IPQ networking SoC series, as well as the Sony Xperia M4 Aqua
     phone, the Acer Aspire 1 laptop, and the Fxtec Pro1X board on top
     of the various reference platforms for their new chips.

   - Rockchips support for several newer boards: Indiedroid Nova
     (rk3588), Edgeble Neural Compute Module 6B (rk3588), FriendlyARM
     NanoPi R2C Plus (rk3328), Anbernic RG353PS (rk3566), Lunzn
     Fastrhino R66S/R68S (rk3568)

   - TI K3/AM625 based PHYTEC phyBOARD-Lyra-AM625 board and Toradex
     Verdin family with AM62 COM, carrier and dev boards

  Other changes to existing boards contain the usual minor improvements
  along with

   - continued updates to clean up dts files based on dtc warnings and
     binding checks, in particular cache properties and node names

   - support for devicetree overlays on at91, bcm283x

   - significant additions to existing SoC support on mediatek,
     qualcomm, ti k3 family, starfive jh71xx, NXP i.MX6 and i.MX8, ST
     STM32MP1

  As usual, a lot more detail is available in the individual merge
  commits"

* tag 'soc-dt-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (926 commits)
  ARM: mvebu: fix unit address on armada-390-db flash
  ARM: dts: Move .dts files to vendor sub-directories
  kbuild: Support flat DTBs install
  ARM: dts: Add .dts files missing from the build
  ARM: dts: allwinner: Use quoted #include
  ARM: dts: lan966x: kontron-d10: add PHY interrupts
  ARM: dts: lan966x: kontron-d10: fix SPI CS
  ARM: dts: lan966x: kontron-d10: fix board reset
  ARM: dts: at91: Enable device-tree overlay support for AT91 boards
  arm: dts: Enable device-tree overlay support for AT91 boards
  arm64: dts: exynos: Remove clock from Exynos850 pmu_system_controller
  ARM: dts: at91: use generic name for shutdown controller
  ARM: dts: BCM5301X: Add cells sizes to PCIe nodes
  dt-bindings: firmware: brcm,kona-smc: convert to YAML
  riscv: dts: sort makefile entries by directory
  riscv: defconfig: enable T-HEAD SoC
  MAINTAINERS: add entry for T-HEAD RISC-V SoC
  riscv: dts: thead: add sipeed Lichee Pi 4A board device tree
  riscv: dts: add initial T-HEAD TH1520 SoC device tree
  riscv: Add the T-HEAD SoC family Kconfig option
  ...
2023-06-29 15:07:06 -07:00
Linus Torvalds 9471f1f2f5 Merge branch 'expand-stack'
This modifies our user mode stack expansion code to always take the
mmap_lock for writing before modifying the VM layout.

It's actually something we always technically should have done, but
because we didn't strictly need it, we were being lazy ("opportunistic"
sounds so much better, doesn't it?) about things, and had this hack in
place where we would extend the stack vma in-place without doing the
proper locking.

And it worked fine.  We just needed to change vm_start (or, in the case
of grow-up stacks, vm_end) and together with some special ad-hoc locking
using the anon_vma lock and the mm->page_table_lock, it all was fairly
straightforward.

That is, it was all fine until Ruihan Li pointed out that now that the
vma layout uses the maple tree code, we *really* don't just change
vm_start and vm_end any more, and the locking really is broken.  Oops.

It's not actually all _that_ horrible to fix this once and for all, and
do proper locking, but it's a bit painful.  We have basically three
different cases of stack expansion, and they all work just a bit
differently:

 - the common and obvious case is the page fault handling. It's actually
   fairly simple and straightforward, except for the fact that we have
   something like 24 different versions of it, and you end up in a maze
   of twisty little passages, all alike.

 - the simplest case is the execve() code that creates a new stack.
   There are no real locking concerns because it's all in a private new
   VM that hasn't been exposed to anybody, but lockdep still can end up
   unhappy if you get it wrong.

 - and finally, we have GUP and page pinning, which shouldn't really be
   expanding the stack in the first place, but in addition to execve()
   we also use it for ptrace(). And debuggers do want to possibly access
   memory under the stack pointer and thus need to be able to expand the
   stack as a special case.

None of these cases are exactly complicated, but the page fault case in
particular is just repeated slightly differently many many times.  And
ia64 in particular has a fairly complicated situation where you can have
both a regular grow-down stack _and_ a special grow-up stack for the
register backing store.

So to make this slightly more manageable, the bulk of this series is to
first create a helper function for the most common page fault case, and
convert all the straightforward architectures to it.

Thus the new 'lock_mm_and_find_vma()' helper function, which ends up
being used by x86, arm, powerpc, mips, riscv, alpha, arc, csky, hexagon,
loongarch, nios2, sh, sparc32, and xtensa.  So we not only convert more
than half the architectures, we now have more shared code and avoid some
of those twisty little passages.

And largely due to this common helper function, the full diffstat of
this series ends up deleting more lines than it adds.

That still leaves eight architectures (ia64, m68k, microblaze, openrisc,
parisc, s390, sparc64 and um) that end up doing 'expand_stack()'
manually because they are doing something slightly different from the
normal pattern.  Along with the couple of special cases in execve() and
GUP.

So there's a couple of patches that first create 'locked' helper
versions of the stack expansion functions, so that there's a obvious
path forward in the conversion.  The execve() case is then actually
pretty simple, and is a nice cleanup from our old "grow-up stackls are
special, because at execve time even they grow down".

The #ifdef CONFIG_STACK_GROWSUP in that code just goes away, because
it's just more straightforward to write out the stack expansion there
manually, instead od having get_user_pages_remote() do it for us in some
situations but not others and have to worry about locking rules for GUP.

And the final step is then to just convert the remaining odd cases to a
new world order where 'expand_stack()' is called with the mmap_lock held
for reading, but where it might drop it and upgrade it to a write, only
to return with it held for reading (in the success case) or with it
completely dropped (in the failure case).

In the process, we remove all the stack expansion from GUP (where
dropping the lock wouldn't be ok without special rules anyway), and add
it in manually to __access_remote_vm() for ptrace().

Thanks to Adrian Glaubitz and Frank Scheiner who tested the ia64 cases.
Everything else here felt pretty straightforward, but the ia64 rules for
stack expansion are really quite odd and very different from everything
else.  Also thanks to Vegard Nossum who caught me getting one of those
odd conditions entirely the wrong way around.

Anyway, I think I want to actually move all the stack expansion code to
a whole new file of its own, rather than have it split up between
mm/mmap.c and mm/memory.c, but since this will have to be backported to
the initial maple tree vma introduction anyway, I tried to keep the
patches _fairly_ minimal.

Also, while I don't think it's valid to expand the stack from GUP, the
final patch in here is a "warn if some crazy GUP user wants to try to
expand the stack" patch.  That one will be reverted before the final
release, but it's left to catch any odd cases during the merge window
and release candidates.

Reported-by: Ruihan Li <lrh2000@pku.edu.cn>

* branch 'expand-stack':
  gup: add warning if some caller would seem to want stack expansion
  mm: always expand the stack with the mmap write lock held
  execve: expand new process stack manually ahead of time
  mm: make find_extend_vma() fail if write lock not held
  powerpc/mm: convert coprocessor fault to lock_mm_and_find_vma()
  mm/fault: convert remaining simple cases to lock_mm_and_find_vma()
  arm/mm: Convert to using lock_mm_and_find_vma()
  riscv/mm: Convert to using lock_mm_and_find_vma()
  mips/mm: Convert to using lock_mm_and_find_vma()
  powerpc/mm: Convert to using lock_mm_and_find_vma()
  arm64/mm: Convert to using lock_mm_and_find_vma()
  mm: make the page fault mmap locking killable
  mm: introduce new 'lock_mm_and_find_vma()' page fault helper
2023-06-28 20:35:21 -07:00
Linus Torvalds 04fc8904d5 Merge tag 'docs-arm-move' of git://git.lwn.net/linux
Pull arm documentation move from Jonathan Corbet:
 "Move the Arm architecture documentation under Documentation/arch/.

  This brings some order to the documentation directory, declutters the
  top-level directory, and makes the documentation organization more
  closely match that of the source"

* tag 'docs-arm-move' of git://git.lwn.net/linux:
  dt-bindings: Update Documentation/arm references
  docs: update some straggling Documentation/arm references
  crypto: update some Arm documentation references
  mips: update a reference to a moved Arm Document
  arm64: Update Documentation/arm references
  arm: update in-source documentation references
  arm: docs: Move Arm documentation to Documentation/arch/
2023-06-27 11:58:16 -07:00
Linus Torvalds 9244724fbf Merge tag 'smp-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull SMP updates from Thomas Gleixner:
 "A large update for SMP management:

   - Parallel CPU bringup

     The reason why people are interested in parallel bringup is to
     shorten the (kexec) reboot time of cloud servers to reduce the
     downtime of the VM tenants.

     The current fully serialized bringup does the following per AP:

       1) Prepare callbacks (allocate, intialize, create threads)
       2) Kick the AP alive (e.g. INIT/SIPI on x86)
       3) Wait for the AP to report alive state
       4) Let the AP continue through the atomic bringup
       5) Let the AP run the threaded bringup to full online state

     There are two significant delays:

       #3 The time for an AP to report alive state in start_secondary()
          on x86 has been measured in the range between 350us and 3.5ms
          depending on vendor and CPU type, BIOS microcode size etc.

       #4 The atomic bringup does the microcode update. This has been
          measured to take up to ~8ms on the primary threads depending
          on the microcode patch size to apply.

     On a two socket SKL server with 56 cores (112 threads) the boot CPU
     spends on current mainline about 800ms busy waiting for the APs to
     come up and apply microcode. That's more than 80% of the actual
     onlining procedure.

     This can be reduced significantly by splitting the bringup
     mechanism into two parts:

       1) Run the prepare callbacks and kick the AP alive for each AP
          which needs to be brought up.

          The APs wake up, do their firmware initialization and run the
          low level kernel startup code including microcode loading in
          parallel up to the first synchronization point. (#1 and #2
          above)

       2) Run the rest of the bringup code strictly serialized per CPU
          (#3 - #5 above) as it's done today.

          Parallelizing that stage of the CPU bringup might be possible
          in theory, but it's questionable whether required surgery
          would be justified for a pretty small gain.

     If the system is large enough the first AP is already waiting at
     the first synchronization point when the boot CPU finished the
     wake-up of the last AP. That reduces the AP bringup time on that
     SKL from ~800ms to ~80ms, i.e. by a factor ~10x.

     The actual gain varies wildly depending on the system, CPU,
     microcode patch size and other factors. There are some
     opportunities to reduce the overhead further, but that needs some
     deep surgery in the x86 CPU bringup code.

     For now this is only enabled on x86, but the core functionality
     obviously works for all SMP capable architectures.

   - Enhancements for SMP function call tracing so it is possible to
     locate the scheduling and the actual execution points. That allows
     to measure IPI delivery time precisely"

* tag 'smp-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
  trace,smp: Add tracepoints for scheduling remotelly called functions
  trace,smp: Add tracepoints around remotelly called functions
  MAINTAINERS: Add CPU HOTPLUG entry
  x86/smpboot: Fix the parallel bringup decision
  x86/realmode: Make stack lock work in trampoline_compat()
  x86/smp: Initialize cpu_primary_thread_mask late
  cpu/hotplug: Fix off by one in cpuhp_bringup_mask()
  x86/apic: Fix use of X{,2}APIC_ENABLE in asm with older binutils
  x86/smpboot/64: Implement arch_cpuhp_init_parallel_bringup() and enable it
  x86/smpboot: Support parallel startup of secondary CPUs
  x86/smpboot: Implement a bit spinlock to protect the realmode stack
  x86/apic: Save the APIC virtual base address
  cpu/hotplug: Allow "parallel" bringup up to CPUHP_BP_KICK_AP_STATE
  x86/apic: Provide cpu_primary_thread mask
  x86/smpboot: Enable split CPU startup
  cpu/hotplug: Provide a split up CPUHP_BRINGUP mechanism
  cpu/hotplug: Reset task stack state in _cpu_up()
  cpu/hotplug: Remove unused state functions
  riscv: Switch to hotplug core state synchronization
  parisc: Switch to hotplug core state synchronization
  ...
2023-06-26 13:59:56 -07:00
Ben Hutchings 8b35ca3e45 arm/mm: Convert to using lock_mm_and_find_vma()
arm has an additional check for address < FIRST_USER_ADDRESS before
expanding the stack.  Since FIRST_USER_ADDRESS is defined everywhere
(generally as 0), move that check to the generic expand_downwards().

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-06-24 14:12:58 -07:00
Rob Herring 6a1d798feb kbuild: Support flat DTBs install
In preparation to move Arm .dts files into sub-directories grouped
by vendor/family, the current flat tree of DTBs generated by
dtbs_install needs to be maintained. Moving the installed DTBs to
sub-directories would break various consumers using 'make dtbs_install'.

This is a NOP until sub-directories are introduced.

Signed-off-by: Rob Herring <robh@kernel.org>
2023-06-21 07:51:08 -06:00
Thomas Gleixner ee31bb0524 ARM: cpu: Switch to arch_cpu_finalize_init()
check_bugs() is about to be phased out. Switch over to the new
arch_cpu_finalize_init() implementation.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230613224545.078124882@linutronix.de
2023-06-16 10:15:59 +02:00