linux-stable-mirror

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2026-04-24 10:49:54 +02:00

Author	SHA1	Message	Date
Gao Xiang	5244536e65	erofs: fix rare pcluster memory leak after unmounting commit `b10a1e5643` upstream. There may still exist some pcluster with valid reference counts during unmounting. Instead of introducing another synchronization primitive, just try again as unmounting is relatively rare. This approach is similar to z_erofs_cache_invalidate_folio(). It was also reported by syzbot as a UAF due to commit `f5ad9f9a60` ("erofs: free pclusters if no cached folio is attached"): BUG: KASAN: slab-use-after-free in do_raw_spin_trylock+0x72/0x1f0 kernel/locking/spinlock_debug.c:123 .. queued_spin_trylock include/asm-generic/qspinlock.h:92 [inline] do_raw_spin_trylock+0x72/0x1f0 kernel/locking/spinlock_debug.c:123 __raw_spin_trylock include/linux/spinlock_api_smp.h:89 [inline] _raw_spin_trylock+0x20/0x80 kernel/locking/spinlock.c:138 spin_trylock include/linux/spinlock.h:361 [inline] z_erofs_put_pcluster fs/erofs/zdata.c:959 [inline] z_erofs_decompress_pcluster fs/erofs/zdata.c:1403 [inline] z_erofs_decompress_queue+0x3798/0x3ef0 fs/erofs/zdata.c:1425 z_erofs_decompressqueue_work+0x99/0xe0 fs/erofs/zdata.c:1437 process_one_work kernel/workqueue.c:3229 [inline] process_scheduled_works+0xa68/0x1840 kernel/workqueue.c:3310 worker_thread+0x870/0xd30 kernel/workqueue.c:3391 kthread+0x2f2/0x390 kernel/kthread.c:389 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 </TASK> However, it seems a long outstanding memory leak. Fix it now. Fixes: `f5ad9f9a60` ("erofs: free pclusters if no cached folio is attached") Reported-by: syzbot+7ff87b095e7ca0c5ac39@syzkaller.appspotmail.com Closes: https://lore.kernel.org/r/674c1235.050a0220.ad585.0032.GAE@google.com Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20241203072821.1885740-1-hsiangkao@linux.alibaba.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-07-17 18:37:23 +02:00
Chao Yu	fd67f52eea	erofs: fix to add missing tracepoint in erofs_readahead() [ Upstream commit `d53238b614` ] Commit `771c994ea5` ("erofs: convert all uncompressed cases to iomap") converts to use iomap interface, it removed trace_erofs_readahead() tracepoint in the meantime, let's add it back. Fixes: `771c994ea5` ("erofs: convert all uncompressed cases to iomap") Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20250707084832.2725677-1-chao@kernel.org Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-07-17 18:37:18 +02:00
Gao Xiang	ea2350dfa3	erofs: refine readahead tracepoint [ Upstream commit `4eb56b0761` ] - trace_erofs_readpages => trace_erofs_readahead; - Rename a redundant statement `nrpages = readahead_count(rac);`; - Move the tracepoint to the beginning of z_erofs_readahead(). Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Hongbo Li <lihongbo22@huawei.com> Link: https://lore.kernel.org/r/20250514120820.2739288-1-hsiangkao@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Stable-dep-of: `d53238b614` ("erofs: fix to add missing tracepoint in erofs_readahead()") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-07-17 18:37:18 +02:00
Gao Xiang	9493b5f9ad	erofs: tidy up zdata.c [ Upstream commit `6f435e94a1` ] All small code style adjustments, no logic changes: - z_erofs_decompress_frontend => z_erofs_frontend; - z_erofs_decompress_backend => z_erofs_backend; - Use Z_EROFS_DEFINE_FRONTEND() to replace DECOMPRESS_FRONTEND_INIT(); - `nr_folios` should be `nrpages` in z_erofs_readahead(); - Refine in-line comments. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20250114034429.431408-3-hsiangkao@linux.alibaba.com Stable-dep-of: `d53238b614` ("erofs: fix to add missing tracepoint in erofs_readahead()") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-07-17 18:37:18 +02:00
Gao Xiang	71e4f033a9	erofs: get rid of `z_erofs_next_pcluster_t` [ Upstream commit `5514d8478b` ] It was originally intended for tagged pointer reservation. Now all encoded data can be represented uniformally with `struct z_erofs_pcluster` as described in commit `bf1aa03980` ("erofs: sunset `struct erofs_workgroup`"), let's drop it too. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20250114034429.431408-2-hsiangkao@linux.alibaba.com Stable-dep-of: `d53238b614` ("erofs: fix to add missing tracepoint in erofs_readahead()") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-07-17 18:37:18 +02:00
Chunhai Guo	16396885c2	erofs: free pclusters if no cached folio is attached [ Upstream commit `f5ad9f9a60` ] Once a pcluster is fully decompressed and there are no attached cached folios, its corresponding `struct z_erofs_pcluster` will be freed. This will significantly reduce the frequency of calls to erofs_shrink_scan() and the memory allocated for `struct z_erofs_pcluster`. The tables below show approximately a 96% reduction in the calls to erofs_shrink_scan() and in the memory allocated for `struct z_erofs_pcluster` after applying this patch. The results were obtained by performing a test to copy a 4.1GB partition on ARM64 Android devices running the 6.6 kernel with an 8-core CPU and 12GB of memory. 1. The reduction in calls to erofs_shrink_scan(): +-----------------+-----------+----------+---------+ \| \| w/o patch \| w/ patch \| diff \| +-----------------+-----------+----------+---------+ \| Average (times) \| 11390 \| 390 \| -96.57% \| +-----------------+-----------+----------+---------+ 2. The reduction in memory released by erofs_shrink_scan(): +-----------------+-----------+----------+---------+ \| \| w/o patch \| w/ patch \| diff \| +-----------------+-----------+----------+---------+ \| Average (Byte) \| 133612656 \| 4434552 \| -96.68% \| +-----------------+-----------+----------+---------+ Signed-off-by: Chunhai Guo <guochunhai@vivo.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20241112043235.546164-1-guochunhai@vivo.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Stable-dep-of: `d53238b614` ("erofs: fix to add missing tracepoint in erofs_readahead()") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-07-17 18:37:17 +02:00
Gao Xiang	19ff875dc5	erofs: address D-cache aliasing commit `27917e8194` upstream. Flush the D-cache before unlocking folios for compressed inodes, as they are dirtied during decompression. Avoid calling flush_dcache_folio() on every CPU write, since it's more like playing whack-a-mole without real benefit. It has no impact on x86 and arm64/risc-v: on x86, flush_dcache_folio() is a no-op, and on arm64/risc-v, PG_dcache_clean (PG_arch_1) is clear for new page cache folios. However, certain ARM boards are affected, as reported. Fixes: `3883a79abd` ("staging: erofs: introduce VLE decompression support") Closes: https://lore.kernel.org/r/c1e51e16-6cc6-49d0-a63e-4e9ff6c4dd53@pengutronix.de Closes: https://lore.kernel.org/r/38d43fae-1182-4155-9c5b-ffc7382d9917@siemens.com Tested-by: Jan Kiszka <jan.kiszka@siemens.com> Tested-by: Stefan Kerkmann <s.kerkmann@pengutronix.de> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20250709034614.2780117-2-hsiangkao@linux.alibaba.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-07-17 18:37:15 +02:00
Chao Yu	4745bfd34a	erofs: fix to add missing tracepoint in erofs_read_folio() commit `99f7619a77` upstream. Commit `771c994ea5` ("erofs: convert all uncompressed cases to iomap") converts to use iomap interface, it removed trace_erofs_readpage() tracepoint in the meantime, let's add it back. Fixes: `771c994ea5` ("erofs: convert all uncompressed cases to iomap") Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20250708111942.3120926-1-chao@kernel.org Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-07-17 18:37:15 +02:00
Al Viro	e036efbe58	add a string-to-qstr constructor [ Upstream commit `c1feab95e0` ] Quite a few places want to build a struct qstr by given string; it would be convenient to have a primitive doing that, rather than open-coding it via QSTR_INIT(). The closest approximation was in bcachefs, but that expands to initializer list - {.len = strlen(string), .name = string}. It would be more useful to have it as compound literal - (struct qstr){.len = strlen(string), .name = string}. Unlike initializer list it's a valid expression. What's more, it's a valid lvalue - it's an equivalent of anonymous local variable with such initializer, so the things like path->dentry = d_alloc_pseudo(mnt->mnt_sb, &QSTR(name)); are valid. It can also be used as initializer, with identical effect - struct qstr x = (struct qstr){.name = s, .len = strlen(s)}; is equivalent to struct qstr anon_variable = {.name = s, .len = strlen(s)}; struct qstr x = anon_variable; // anon_variable is never used after that point and any even remotely sane compiler will manage to collapse that into struct qstr x = {.name = s, .len = strlen(s)}; What compound literals can't be used for is initialization of global variables, but those are covered by QSTR_INIT(). This commit lifts definition(s) of QSTR() into linux/dcache.h, converts it to compound literal (all bcachefs users are fine with that) and converts assorted open-coded instances to using that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Stable-dep-of: `cbe4134ea4` ("fs: export anon_inode_make_secure_inode() and fix secretmem LSM bypass") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-07-10 16:05:08 +02:00
Sheng Yong	65115472f7	erofs: avoid using multiple devices with different type [ Upstream commit `9748f2f54f` ] For multiple devices, both primary and extra devices should be the same type. `erofs_init_device` has already guaranteed that if the primary is a file-backed device, extra devices should also be regular files. However, if the primary is a block device while the extra device is a file-backed device, `erofs_init_device` will get an ENOTBLK, which is not treated as an error in `erofs_fc_get_tree`, and that leads to an UAF: erofs_fc_get_tree get_tree_bdev_flags(erofs_fc_fill_super) erofs_read_superblock erofs_init_device // sbi->dif0 is not inited yet, // return -ENOTBLK deactivate_locked_super free(sbi) if (err is -ENOTBLK) sbi->dif0.file = filp_open() // sbi UAF So if -ENOTBLK is hitted in `erofs_init_device`, it means the primary device must be a block device, and the extra device is not a block device. The error can be converted to -EINVAL. Fixes: `fb17675026` ("erofs: add file-backed mount support") Signed-off-by: Sheng Yong <shengyong1@xiaomi.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Hongbo Li <lihongbo22@huawei.com> Link: https://lore.kernel.org/r/20250515014837.3315886-1-shengyong1@xiaomi.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-06-19 15:31:29 +02:00
Hongbo Li	9cfca45aec	erofs: fix file handle encoding for 64-bit NIDs [ Upstream commit `510de8363f` ] EROFS uses NID to indicate the on-disk inode offset, which can exceed 32 bits. However, the default encode_fh uses the ino32, thus it doesn't work if the image is larger than 128GiB. Let's introduce our own helpers to encode file handles. It's easy to reproduce: 1. prepare an erofs image with nid bigger than U32_MAX 2. mount -t erofs foo.img /mnt/erofs 3. set exportfs with configuration: /mnt/erofs *(rw,sync, no_root_squash) 4. mount -t nfs $IP:/mnt/erofs /mnt/nfs 5. md5sum /mnt/nfs/foo # foo is the file which nid bigger than U32_MAX. # you will get ESTALE error. In the case of overlayfs, the underlying filesystem's file handle is encoded in ovl_fb.fid, which is similar to NFS's case. If the NID of file is larger than U32_MAX, the overlay will get -ESTALE error when calls exportfs_decode_fh. Fixes: `3e917cc305` ("erofs: make filesystem exportable") Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20250507094015.14007-1-lihongbo22@huawei.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-06-19 15:31:29 +02:00
Gao Xiang	b35ccfdc85	erofs: initialize decompression early [ Upstream commit `fe1e57d44d` ] - Rename erofs_init_managed_cache() to z_erofs_init_super(); - Move the initialization of managed_pslots into z_erofs_init_super() too; - Move z_erofs_init_super() and packed inode preparation upwards, before the root inode initialization. Therefore, the root directory can also be compressible. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Acked-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20250317054840.3483000-1-hsiangkao@linux.alibaba.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-29 11:02:16 +02:00
Gao Xiang	64385c0d02	erofs: ensure the extra temporary copy is valid for shortened bvecs [ Upstream commit `35076d2223` ] When compressed data deduplication is enabled, multiple logical extents may reference the same compressed physical cluster. The previous commit `94c43de735` ("erofs: fix wrong primary bvec selection on deduplicated extents") already avoids using shortened bvecs. However, in such cases, the extra temporary buffers also need to be preserved for later use in z_erofs_fill_other_copies() to to prevent data corruption. IOWs, extra temporary buffers have to be retained not only due to varying start relative offsets (`pageofs_out`, as indicated by `pcl->multibases`) but also because of shortened bvecs. android.hardware.graphics.composer@2.1.so : 270696 bytes 0: 0.. 204185 \| 204185 : 628019200.. 628084736 \| 65536 -> 1: 204185.. 225536 \| 21351 : 544063488.. 544129024 \| 65536 2: 225536.. 270696 \| 45160 : 0.. 0 \| 0 com.android.vndk.v28.apex : 93814897 bytes ... 364: 53869896..54095257 \| 225361 : 543997952.. 544063488 \| 65536 -> 365: 54095257..54309344 \| 214087 : 544063488.. 544129024 \| 65536 366: 54309344..54514557 \| 205213 : 544129024.. 544194560 \| 65536 ... Both 204185 and 54095257 have the same start relative offset of 3481, but the logical page 55 of `android.hardware.graphics.composer@2.1.so` ranges from 225280 to 229632, forming a shortened bvec [225280, 225536) that cannot be used for decompressing the range from 54095257 to 54309344 of `com.android.vndk.v28.apex`. Since `pcl->multibases` is already meaningless, just mark `be->keepxcpy` on demand for simplicity. Again, this issue can only lead to data corruption if `-Ededupe` is on. Fixes: `94c43de735` ("erofs: fix wrong primary bvec selection on deduplicated extents") Reviewed-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20250506101850.191506-1-hsiangkao@linux.alibaba.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-18 08:24:48 +02:00
Max Kellermann	61e0fc3312	fs/erofs/fileio: call erofs_onlinefolio_split() after bio_add_folio() commit `bbfe756dc3` upstream. If bio_add_folio() fails (because it is full), erofs_fileio_scan_folio() needs to submit the I/O request via erofs_fileio_rq_submit() and allocate a new I/O request with an empty `struct bio`. Then it retries the bio_add_folio() call. However, at this point, erofs_onlinefolio_split() has already been called which increments `folio->private`; the retry will call erofs_onlinefolio_split() again, but there will never be a matching erofs_onlinefolio_end() call. This leaves the folio locked forever and all waiters will be stuck in folio_wait_bit_common(). This bug has been added by commit `ce63cb62d7` ("erofs: support unencoded inodes for fileio"), but was practically unreachable because there was room for 256 folios in the `struct bio` - until commit `9f74ae8c9a` ("erofs: shorten bvecs[] for file-backed mounts") which reduced the array capacity to 16 folios. It was now trivial to trigger the bug by manually invoking readahead from userspace, e.g.: posix_fadvise(fd, 0, st.st_size, POSIX_FADV_WILLNEED); This should be fixed by invoking erofs_onlinefolio_split() only after bio_add_folio() has succeeded. This is safe: asynchronous completions invoking erofs_onlinefolio_end() will not unlock the folio because erofs_fileio_scan_folio() is still holding a reference to be released by erofs_onlinefolio_end() at the end. Fixes: `ce63cb62d7` ("erofs: support unencoded inodes for fileio") Fixes: `9f74ae8c9a` ("erofs: shorten bvecs[] for file-backed mounts") Cc: stable@vger.kernel.org Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Reviewed-by: Gao Xiang <xiang@kernel.org> Tested-by: Hongbo Li <lihongbo22@huawei.com> Link: https://lore.kernel.org/r/20250428230933.3422273-1-max.kellermann@ionos.com Signed-off-by: Gao Xiang <xiang@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-05-18 08:24:46 +02:00
Sheng Yong	7b9bdd7059	erofs: set error to bio if file-backed IO fails [ Upstream commit `1595f15391` ] If a file-backed IO fails before submitting the bio to the lower filesystem, an error is returned, but the bio->bi_status is not marked as an error. However, the error information should be passed to the end_io handler. Otherwise, the IO request will be treated as successful. Fixes: `283213718f` ("erofs: support compressed inodes for fileio") Signed-off-by: Sheng Yong <shengyong1@xiaomi.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20250408122351.2104507-1-shengyong1@xiaomi.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-04-20 10:15:28 +02:00
Gao Xiang	4896402780	erofs: fix potential return value overflow of z_erofs_shrink_scan() [ Upstream commit `db902986de` ] z_erofs_shrink_scan() could return small numbers due to the mistyped `freed`. Although I don't think it has any visible impact. Fixes: `3883a79abd` ("staging: erofs: introduce VLE decompression support") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20250114040058.459981-1-hsiangkao@linux.alibaba.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-02-08 09:57:57 +01:00
Gao Xiang	9621a0a5e3	erofs: sunset `struct erofs_workgroup` [ Upstream commit `bf1aa03980` ] `struct erofs_workgroup` was introduced to provide a unique header for all physically indexed objects. However, after big pclusters and shared pclusters are implemented upstream, it seems that all EROFS encoded data (which requires transformation) can be represented with `struct z_erofs_pcluster` directly. Move all members into `struct z_erofs_pcluster` for simplicity. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20241021035323.3280682-3-hsiangkao@linux.alibaba.com Stable-dep-of: `db902986de` ("erofs: fix potential return value overflow of z_erofs_shrink_scan()") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-02-08 09:57:57 +01:00
Gao Xiang	f66ba30be7	erofs: move erofs_workgroup operations into zdata.c [ Upstream commit `9c91f95962` ] Move related helpers into zdata.c as an intermediate step of getting rid of `struct erofs_workgroup`, and rename: erofs_workgroup_put => z_erofs_put_pcluster erofs_workgroup_get => z_erofs_get_pcluster erofs_try_to_release_workgroup => erofs_try_to_release_pcluster erofs_shrink_workstation => z_erofs_shrink_scan Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20241021035323.3280682-2-hsiangkao@linux.alibaba.com Stable-dep-of: `db902986de` ("erofs: fix potential return value overflow of z_erofs_shrink_scan()") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-02-08 09:57:57 +01:00
Gao Xiang	e6d1529c79	erofs: get rid of erofs_{find,insert}_workgroup [ Upstream commit `b091e8ed24` ] Just fold them into the only two callers since they are simple enough. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20241021035323.3280682-1-hsiangkao@linux.alibaba.com Stable-dep-of: `db902986de` ("erofs: fix potential return value overflow of z_erofs_shrink_scan()") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-02-08 09:57:57 +01:00
Gao Xiang	3042448109	erofs: use buffered I/O for file-backed mounts by default [ Upstream commit `6422cde1b0` ] For many use cases (e.g. container images are just fetched from remote), performance will be impacted if underlay page cache is up-to-date but direct i/o flushes dirty pages first. Instead, let's use buffered I/O by default to keep in sync with loop devices and add a (re)mount option to explicitly give a try to use direct I/O if supported by the underlying files. The container startup time is improved as below: [workload] docker.io/library/workpress:latest unpack 1st run non-1st runs EROFS snapshotter buffered I/O file 4.586404265s 0.308s 0.198s EROFS snapshotter direct I/O file 4.581742849s 2.238s 0.222s EROFS snapshotter loop 4.596023152s 0.346s 0.201s Overlayfs snapshotter 5.382851037s 0.206s 0.214s Fixes: `fb17675026` ("erofs: add file-backed mount support") Cc: Derek McGowan <derek@mcg.dev> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20241212134336.2059899-1-hsiangkao@linux.alibaba.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-27 14:02:00 +01:00
Gao Xiang	f067d3f69d	erofs: reference `struct erofs_device_info` for erofs_map_dev [ Upstream commit `f8d920a402` ] Record `m_sb` and `m_dif` to replace `m_fscache`, `m_daxdev`, `m_fp` and `m_dax_part_off` in order to simplify the codebase. Note that `m_bdev` is still left since it can be assigned from `sb->s_bdev` directly. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20241212235401.2857246-1-hsiangkao@linux.alibaba.com Stable-dep-of: `6422cde1b0` ("erofs: use buffered I/O for file-backed mounts by default") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-27 14:02:00 +01:00
Gao Xiang	3e0d81efcb	erofs: use `struct erofs_device_info` for the primary device [ Upstream commit `7b00af2c54` ] Instead of just listing each one directly in `struct erofs_sb_info` except that we still use `sb->s_bdev` for the primary block device. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20241216125310.930933-2-hsiangkao@linux.alibaba.com Stable-dep-of: `6422cde1b0` ("erofs: use buffered I/O for file-backed mounts by default") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-27 14:02:00 +01:00
Gao Xiang	910798ecd3	erofs: add erofs_sb_free() helper [ Upstream commit `e2de3c1bf6` ] Unify the common parts of erofs_fc_free() and erofs_kill_sb() as erofs_sb_free(). Thus, fput() in erofs_fc_get_tree() is no longer needed, too. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20241212133504.2047178-1-hsiangkao@linux.alibaba.com Stable-dep-of: `6422cde1b0` ("erofs: use buffered I/O for file-backed mounts by default") Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-27 14:02:00 +01:00
Gao Xiang	0653fa6ee0	erofs: fix PSI memstall accounting [ Upstream commit `1a2180f685` ] Max Kellermann recently reported psi_group_cpu.tasks[NR_MEMSTALL] is incorrect in the 6.11.9 kernel. The root cause appears to be that, since the problematic commit, bio can be NULL, causing psi_memstall_leave() to be skipped in z_erofs_submit_queue(). Reported-by: Max Kellermann <max.kellermann@ionos.com> Closes: https://lore.kernel.org/r/CAKPOu+8tvSowiJADW2RuKyofL_CSkm_SuyZA7ME5vMLWmL6pqw@mail.gmail.com Fixes: `9e2f9d34dd` ("erofs: handle overlapped pclusters out of crafted images properly") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20241127085236.3538334-1-hsiangkao@linux.alibaba.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-27 14:01:59 +01:00
Gao Xiang	daaf68fef4	erofs: handle NONHEAD !delta[1] lclusters gracefully [ Upstream commit `0bc8061ffc` ] syzbot reported a WARNING in iomap_iter_done: iomap_fiemap+0x73b/0x9b0 fs/iomap/fiemap.c:80 ioctl_fiemap fs/ioctl.c:220 [inline] Generally, NONHEAD lclusters won't have delta[1]==0, except for crafted images and filesystems created by pre-1.0 mkfs versions. Previously, it would immediately bail out if delta[1]==0, which led to inadequate decompressed lengths (thus FIEMAP is impacted). Treat it as delta[1]=1 to work around these legacy mkfs versions. `lclusterbits > 14` is illegal for compact indexes, error out too. Reported-by: syzbot+6c0b301317aa0156f9eb@syzkaller.appspotmail.com Closes: https://lore.kernel.org/r/67373c0c.050a0220.2a2fcc.0079.GAE@google.com Tested-by: syzbot+6c0b301317aa0156f9eb@syzkaller.appspotmail.com Fixes: `d95ae5e253` ("erofs: add support for the full decompressed length") Fixes: `001b8ccd06` ("erofs: fix compact 4B support for 16k block size") Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20241115173651.3339514-1-hsiangkao@linux.alibaba.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-05 14:02:00 +01:00
Hongzhen Luo	679d8537e5	erofs: fix blksize < PAGE_SIZE for file-backed mounts [ Upstream commit `bae0854160` ] Adjust sb->s_blocksize{,_bits} directly for file-backed mounts when the fs block size is smaller than PAGE_SIZE. Previously, EROFS used sb_set_blocksize(), which caused a panic if bdev-backed mounts is not used. Fixes: `fb17675026` ("erofs: add file-backed mount support") Signed-off-by: Hongzhen Luo <hongzhen@linux.alibaba.com> Link: https://lore.kernel.org/r/20241015103836.3757438-1-hongzhen@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-05 14:02:00 +01:00
Gao Xiang	5036f2f024	erofs: fix file-backed mounts over FUSE [ Upstream commit `3a23787ca8` ] syzbot reported a null-ptr-deref in fuse_read_args_fill: fuse_read_folio+0xb0/0x100 fs/fuse/file.c:905 filemap_read_folio+0xc6/0x2a0 mm/filemap.c:2367 do_read_cache_folio+0x263/0x5c0 mm/filemap.c:3825 read_mapping_folio include/linux/pagemap.h:1011 [inline] erofs_bread+0x34d/0x7e0 fs/erofs/data.c:41 erofs_read_superblock fs/erofs/super.c:281 [inline] erofs_fc_fill_super+0x2b9/0x2500 fs/erofs/super.c:625 Unlike most filesystems, some network filesystems and FUSE need unavoidable valid `file` pointers for their read I/Os [1]. Anyway, those use cases need to be supported too. [1] https://docs.kernel.org/filesystems/vfs.html Reported-by: syzbot+0b1279812c46e48bb0c1@syzkaller.appspotmail.com Closes: https://lore.kernel.org/r/6727bbdf.050a0220.3c8d68.0a7e.GAE@google.com Fixes: `fb17675026` ("erofs: add file-backed mount support") Tested-by: syzbot+0b1279812c46e48bb0c1@syzkaller.appspotmail.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20241114234905.1873723-1-hsiangkao@linux.alibaba.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-05 14:02:00 +01:00
Gao Xiang	14c2d97265	erofs: use get_tree_bdev_flags() to avoid misleading messages Users can pass in an arbitrary source path for the proper type of a mount then without "Can't lookup blockdev" error message. Reported-by: Allison Karlitskaya <allison.karlitskaya@redhat.com> Closes: https://lore.kernel.org/r/CAOYeF9VQ8jKVmpy5Zy9DNhO6xmWSKMB-DO8yvBB0XvBE7=3Ugg@mail.gmail.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20241009033151.2334888-2-hsiangkao@linux.alibaba.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-10-21 14:30:27 +02:00
Linus Torvalds	63fa605041	Merge tag 'erofs-for-6.12-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fixes from Gao Xiang: "The main one fixes a syzbot issue due to the invalid inode type out of file-backed mounts. The others are minor cleanups without actual logic changes. Summary: - Make sure only regular inodes can be used for file-backed mounts - Two minor codebase cleanups" * tag 'erofs-for-6.12-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: get rid of kaddr in `struct z_erofs_maprecorder` erofs: get rid of z_erofs_try_to_claim_pcluster() erofs: ensure regular inodes for file-backed mounts	2024-10-14 11:12:09 -07:00
Gao Xiang	ae54567eaa	erofs: get rid of kaddr in `struct z_erofs_maprecorder` `kaddr` becomes useless after switching to metabuf. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20241010235830.1535616-1-hsiangkao@linux.alibaba.com	2024-10-11 13:36:58 +08:00
Gao Xiang	2402082e53	erofs: get rid of z_erofs_try_to_claim_pcluster() Just fold it into the caller for simplicity. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20241010090420.405871-1-hsiangkao@linux.alibaba.com	2024-10-11 13:36:58 +08:00
Gao Xiang	416a8b2c02	erofs: ensure regular inodes for file-backed mounts Only regular inodes are allowed for file-backed mounts, not directories (as seen in the original syzbot case) or special inodes. Also ensure that .read_folio() is implemented on the underlying fs for the primary device. Fixes: `fb17675026` ("erofs: add file-backed mount support") Reported-by: syzbot+001306cd9c92ce0df23f@syzkaller.appspotmail.com Closes: https://lore.kernel.org/r/00000000000011bdde0622498ee3@google.com Tested-by: syzbot+001306cd9c92ce0df23f@syzkaller.appspotmail.com Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240917130803.32418-1-hsiangkao@linux.alibaba.com	2024-10-11 13:36:41 +08:00
Al Viro	5f60d5f6bb	move asm/unaligned.h to linux/unaligned.h asm/unaligned.h is always an include of asm-generic/unaligned.h; might as well move that thing to linux/unaligned.h and include that - there's nothing arch-specific in that header. auto-generated by the following: for i in `git grep -l -w asm/unaligned.h`; do sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i done for i in `git grep -l -w asm-generic/unaligned.h`; do sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i done git mv include/asm-generic/unaligned.h include/linux/unaligned.h git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h	2024-10-02 17:23:23 -04:00
Gao Xiang	025497e1d1	erofs: reject inodes with negative i_size Negative i_size is never supported, although crafted images with inodes having negative i_size will NOT lead to security issues in our current codebase: The following image can verify this (gzip+base64 encoded): H4sICCmk4mYAA3Rlc3QuaW1nAGNgGAWjYBSMVPDo4dcH3jP2aTED2TwMKgxMUHHNJY/SQDQX LxcDIw3tZwXit44MDNpQ/n8gQJZ/vxjijosPuSyZ0DUDgQqcZoKzVYFsDShbHeh6PT29ktTi Eqz2g/y2pBFiLxDMh4lhs5+W4TAKRsEoGAWjYBSMglEwCkYBPQAAS2DbowAQAAA= Mark as bad inodes for such corrupted inodes explicitly. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240912083538.3011860-1-hsiangkao@linux.alibaba.com	2024-09-12 23:00:09 +08:00
Gao Xiang	7c3ca1838a	erofs: restrict pcluster size limitations Error out if {en,de}encoded size of a pcluster is unsupported: Maximum supported encoded size (of a pcluster): 1 MiB Maximum supported decoded size (of a pcluster): 12 MiB Users can still choose to use supported large configurations (e.g., for archival purposes), but there may be performance penalties in low-memory scenarios compared to smaller pclusters. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240912074156.2925394-1-hsiangkao@linux.alibaba.com	2024-09-12 23:00:09 +08:00
Chunhai Guo	79f504a2cd	erofs: allocate more short-lived pages from reserved pool first This patch aims to allocate bvpages and short-lived compressed pages from the reserved pool first. After applying this patch, there are three benefits. 1. It reduces the page allocation time. The bvpages and short-lived compressed pages account for about 4% of the pages allocated from the system in the multi-app launch benchmarks [1]. It reduces the page allocation time accordingly and lowers the likelihood of blockage by page allocation in low memory scenarios. 2. The pages in the reserved pool will be allocated on demand. Currently, bvpages and short-lived compressed pages are short-lived pages allocated from the system, and the pages in the reserved pool all originate from short-lived pages. Consequently, the number of reserved pool pages will increase to z_erofs_rsv_nrpages over time. With this patch, all short-lived pages are allocated from the reserved pool first, so the number of reserved pool pages will only increase when there are not enough pages. Thus, even if z_erofs_rsv_nrpages is set to a large number for specific reasons, the actual number of reserved pool pages may remain low as per demand. In the multi-app launch benchmarks [1], z_erofs_rsv_nrpages is set at 256, while the number of reserved pool pages remains below 64. 3. When erofs cache decompression is disabled (EROFS_ZIP_CACHE_DISABLED), all pages will only be allocated from the reserved pool for erofs. This will significantly reduce the memory pressure from erofs. [1] For additional details on the multi-app launch benchmarks, please refer to commit `0f6273ab46` ("erofs: add a reserved buffer pool for lz4 decompression"). Signed-off-by: Chunhai Guo <guochunhai@vivo.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20240906121110.3701889-1-guochunhai@vivo.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2024-09-12 22:59:49 +08:00
Gao Xiang	2349d2fa02	erofs: sunset unneeded NOFAILs With iterative development, our codebase can now deal with compressed buffer misses properly if both in-place I/O and compressed buffer allocation fail. Note that if readahead fails (with non-uptodate folios), the original request will then fall back to synchronous read, and `.read_folio()` should return appropriate errnos; otherwise -EIO will be passed to user space, which is unexpected. To simplify rarely encountered failure paths, a mimic decompression will be just used. Before that, failure reasons are recorded in compressed_bvecs[] and they also act as placeholders to avoid in-place pages. They will be parsed just before decompression and then pass back to `.read_folio()`. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240905084732.2684515-1-hsiangkao@linux.alibaba.com	2024-09-12 20:26:43 +08:00
Hongzhen Luo	8bdb6a8393	erofs: simplify erofs_map_blocks_flatmode() Get rid of redundant variables (nblocks, offset) and a dead branch (!tailendpacking). Signed-off-by: Hongzhen Luo <hongzhen@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20240905030339.1474396-1-hongzhen@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2024-09-10 15:27:14 +08:00
Yiyang Wu	53d514b970	erofs: refactor read_inode calling convention Refactor out the iop binding behavior out of the erofs_fill_symlink and move erofs_buf into the erofs_read_inode, so that erofs_fill_inode can only deal with inode operation bindings and can be decoupled from metabuf operations. This results in better calling conventions. Note that after this patch, we do not need erofs_buf and ofs as parameters any more when calling erofs_read_inode as all the data operations are now included in itself. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Link: https://lore.kernel.org/all/20240425222847.GN2118490@ZenIV/ Signed-off-by: Yiyang Wu <toolmanp@tlmp.cc> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20240902093412.509083-1-toolmanp@tlmp.cc Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2024-09-10 15:27:11 +08:00
Yiyang Wu	b1bbb9a637	erofs: use kmemdup_nul in erofs_fill_symlink Remove open coding in erofs_fill_symlink. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Link: https://lore.kernel.org/all/20240425222847.GN2118490@ZenIV Signed-off-by: Yiyang Wu <toolmanp@tlmp.cc> Link: https://lore.kernel.org/r/20240902083147.450558-2-toolmanp@tlmp.cc Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2024-09-10 15:27:11 +08:00
Gao Xiang	0d442ce0b3	erofs: mark experimental fscache backend deprecated Although fscache is still described as "General Filesystem Caching" for network filesystems and other things such as ISO9660 filesystems, it has actually become a part of netfslib recently, which was unexpected at the time when "EROFS over fscache" proposed (2021) since EROFS is entirely a disk filesystem and the dependency is redundant. Mark it deprecated and it will be removed after "fanotify pre-content hooks" lands, which will provide the same functionality for EROFS. Reviewed-by: Sandeep Dhavale <dhavale@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240830032840.3783206-4-hsiangkao@linux.alibaba.com	2024-09-10 15:27:11 +08:00
Gao Xiang	283213718f	erofs: support compressed inodes for fileio Use pseudo bios just like the previous fscache approach since merged bio_vecs can be filled properly with unique interfaces. Reviewed-by: Sandeep Dhavale <dhavale@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240830032840.3783206-3-hsiangkao@linux.alibaba.com	2024-09-10 15:27:09 +08:00
Gao Xiang	ce63cb62d7	erofs: support unencoded inodes for fileio Since EROFS only needs to handle read requests in simple contexts, Just directly use vfs_iocb_iter_read() for data I/Os. Reviewed-by: Sandeep Dhavale <dhavale@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240905093031.2745929-1-hsiangkao@linux.alibaba.com	2024-09-10 15:26:36 +08:00
Gao Xiang	fb17675026	erofs: add file-backed mount support It actually has been around for years: For containers and other sandbox use cases, there will be thousands (and even more) of authenticated (sub)images running on the same host, unlike OS images. Of course, all scenarios can use the same EROFS on-disk format, but bdev-backed mounts just work well for OS images since golden data is dumped into real block devices. However, it's somewhat hard for container runtimes to manage and isolate so many unnecessary virtual block devices safely and efficiently [1]: they just look like a burden to orchestrators and file-backed mounts are preferred indeed. There were already enough attempts such as Incremental FS, the original ComposeFS and PuzzleFS acting in the same way for immutable fses. As for current EROFS users, ComposeFS, containerd and Android APEXs will be directly benefited from it. On the other hand, previous experimental feature "erofs over fscache" was once also intended to provide a similar solution (inspired by Incremental FS discussion [2]), but the following facts show file-backed mounts will be a better approach: - Fscache infrastructure has recently been moved into new Netfslib which is an unexpected dependency to EROFS really, although it originally claims "it could be used for caching other things such as ISO9660 filesystems too." [3] - It takes an unexpectedly long time to upstream Fscache/Cachefiles enhancements. For example, the failover feature took more than one year, and the deamonless feature is still far behind now; - Ongoing HSM "fanotify pre-content hooks" [4] together with this will perfectly supersede "erofs over fscache" in a simpler way since developers (mainly containerd folks) could leverage their existing caching mechanism entirely in userspace instead of strictly following the predefined in-kernel caching tree hierarchy. After "fanotify pre-content hooks" lands upstream to provide the same functionality, "erofs over fscache" will be removed then (as an EROFS internal improvement and EROFS will not have to bother with on-demand fetching and/or caching improvements anymore.) [1] https://github.com/containers/storage/pull/2039 [2] https://lore.kernel.org/r/CAOQ4uxjbVxnubaPjVaGYiSwoGDTdpWbB=w_AeM6YM=zVixsUfQ@mail.gmail.com [3] https://docs.kernel.org/filesystems/caching/fscache.html [4] https://lore.kernel.org/r/cover.1723670362.git.josef@toxicpanda.com Closes: https://github.com/containers/composefs/issues/144 Reviewed-by: Sandeep Dhavale <dhavale@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240830032840.3783206-1-hsiangkao@linux.alibaba.com	2024-09-10 15:26:35 +08:00
Gao Xiang	9e2f9d34dd	erofs: handle overlapped pclusters out of crafted images properly syzbot reported a task hang issue due to a deadlock case where it is waiting for the folio lock of a cached folio that will be used for cache I/Os. After looking into the crafted fuzzed image, I found it's formed with several overlapped big pclusters as below: Ext: logical offset \| length : physical offset \| length 0: 0.. 16384 \| 16384 : 151552.. 167936 \| 16384 1: 16384.. 32768 \| 16384 : 155648.. 172032 \| 16384 2: 32768.. 49152 \| 16384 : 537223168.. 537239552 \| 16384 ... Here, extent 0/1 are physically overlapped although it's entirely _impossible_ for normal filesystem images generated by mkfs. First, managed folios containing compressed data will be marked as up-to-date and then unlocked immediately (unlike in-place folios) when compressed I/Os are complete. If physical blocks are not submitted in the incremental order, there should be separate BIOs to avoid dependency issues. However, the current code mis-arranges z_erofs_fill_bio_vec() and BIO submission which causes unexpected BIO waits. Second, managed folios will be connected to their own pclusters for efficient inter-queries. However, this is somewhat hard to implement easily if overlapped big pclusters exist. Again, these only appear in fuzzed images so let's simply fall back to temporary short-lived pages for correctness. Additionally, it justifies that referenced managed folios cannot be truncated for now and reverts part of commit `2080ca1ed3` ("erofs: tidy up `struct z_erofs_bvec`") for simplicity although it shouldn't be any difference. Reported-by: syzbot+4fc98ed414ae63d1ada2@syzkaller.appspotmail.com Reported-by: syzbot+de04e06b28cfecf2281c@syzkaller.appspotmail.com Reported-by: syzbot+c8c8238b394be4a1087d@syzkaller.appspotmail.com Tested-by: syzbot+4fc98ed414ae63d1ada2@syzkaller.appspotmail.com Closes: https://lore.kernel.org/r/0000000000002fda01061e334873@google.com Fixes: `8e6c8fa9f2` ("erofs: enable big pcluster feature") Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240910070847.3356592-1-hsiangkao@linux.alibaba.com	2024-09-10 15:26:15 +08:00
Sandeep Dhavale	3fc3e45fcd	erofs: fix error handling in z_erofs_init_decompressor If we get a failure at the first decompressor init (i = 0), the clean up while loop could enter infinite loop due to wrong while check. Check the value of i now to see if we need any clean up at all. Fixes: `5a7cce827e` ("erofs: refine z_erofs_{init,exit}_subsystem()") Reported-by: liujinbao1 <liujinbao1@xiaomi.com> Signed-off-by: Sandeep Dhavale <dhavale@google.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20240905060027.2388893-1-dhavale@google.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2024-09-10 00:46:34 +08:00
Gao Xiang	59aadaa7eb	erofs: clean up erofs_register_sysfs() After commit `684b290abc` ("erofs: add support for FS_IOC_GETFSSYSFSPATH"), `sb->s_sysfs_name` is now valid. Just use it to get rid of duplicated logic. Reviewed-by: Sandeep Dhavale <dhavale@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240828095232.571946-1-hsiangkao@linux.alibaba.com	2024-09-10 00:46:34 +08:00
Gao Xiang	9ed50b8231	erofs: fix incorrect symlink detection in fast symlink Fast symlink can be used if the on-disk symlink data is stored in the same block as the on-disk inode, so we don’t need to trigger another I/O for symlink data. However, currently fs correction could be reported _incorrectly_ if inode xattrs are too large. In fact, these should be valid images although they cannot be handled as fast symlinks. Many thanks to Colin for reporting this! Reported-by: Colin Walters <walters@verbum.org> Reported-by: https://honggfuzz.dev/ Link: https://lore.kernel.org/r/bb2dd430-7de0-47da-ae5b-82ab2dd4d945@app.fastmail.com Fixes: `431339ba90` ("staging: erofs: add inode operations") [ Note that it's a runtime misbehavior instead of a security issue. ] Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240909031911.1174718-1-hsiangkao@linux.alibaba.com	2024-09-10 00:45:13 +08:00
Gao Xiang	0005e01e1e	erofs: fix out-of-bound access when z_erofs_gbuf_growsize() partially fails If z_erofs_gbuf_growsize() partially fails on a global buffer due to memory allocation failure or fault injection (as reported by syzbot [1]), new pages need to be freed by comparing to the existing pages to avoid memory leaks. However, the old gbuf->pages[] array may not be large enough, which can lead to null-ptr-deref or out-of-bound access. Fix this by checking against gbuf->nrpages in advance. [1] https://lore.kernel.org/r/000000000000f7b96e062018c6e3@google.com Reported-by: syzbot+242ee56aaa9585553766@syzkaller.appspotmail.com Fixes: `d6db47e571` ("erofs: do not use pagepool in z_erofs_gbuf_growsize()") Cc: <stable@vger.kernel.org> # 6.10+ Reviewed-by: Chunhai Guo <guochunhai@vivo.com> Reviewed-by: Sandeep Dhavale <dhavale@google.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240820085619.1375963-1-hsiangkao@linux.alibaba.com	2024-08-21 08:12:05 +08:00
Gao Xiang	e080a26725	erofs: allow large folios for compressed files As commit `2e6506e1c4` ("mm/migrate: fix deadlock in migrate_pages_batch() on large folios") has landed upstream, large folios can be safely enabled for compressed inodes since all prerequisites have already landed in 6.11-rc1. Stress tests has been running on my fleet for over 20 days without any regression. Additionally, users [1] have requested it for months. Let's allow large folios for EROFS full cases upstream now for wider testing. [1] https://lore.kernel.org/r/CAGsJ_4wtE8OcpinuqVwG4jtdx6Qh5f+TON6wz+4HMCq=A2qFcA@mail.gmail.com Cc: Barry Song <21cnbao@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> [ Gao Xiang: minor commit typo fixes. ] Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240819025207.3808649-1-hsiangkao@linux.alibaba.com	2024-08-19 16:10:04 +08:00

1 2 3 4 5 ...

560 Commits