linux-stable-mirror

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2026-04-03 12:05:13 +02:00

Author	SHA1	Message	Date
Christoph Hellwig	87283ce621	iomap: fix submission side handling of completion side errors [ Upstream commit `4ad357e39b` ] The "if (dio->error)" in iomap_dio_bio_iter exists to stop submitting more bios when a completion already return an error. Commit `cfe057f7db` ("iomap_dio_actor(): fix iov_iter bugs") made it revert the iov by "copied", which is very wrong given that we've already consumed that range and submitted a bio for it. Fixes: `cfe057f7db` ("iomap_dio_actor(): fix iov_iter bugs") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-03-04 07:19:29 -05:00
Christoph Hellwig	51297686e0	iomap: allocate s_dio_done_wq for async reads as well commit `7fd8720dff` upstream. Since commit 222f2c7c6d14 ("iomap: always run error completions in user context"), read error completions are deferred to s_dio_done_wq. This means the workqueue also needs to be allocated for async reads. Fixes: 222f2c7c6d14 ("iomap: always run error completions in user context") Reported-by: syzbot+a2b9a4ed0d61b1efb3f5@syzkaller.appspotmail.com Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20251124140013.902853-1-hch@lst.de Tested-by: syzbot+a2b9a4ed0d61b1efb3f5@syzkaller.appspotmail.com Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2026-01-08 10:15:06 +01:00
Joanne Koong	7d107be58b	iomap: account for unaligned end offsets when truncating read range [ Upstream commit `9d875e0eef` ] The end position to start truncating from may be at an offset into a block, which under the current logic would result in overtruncation. Adjust the calculation to account for unaligned end offsets. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://patch.msgid.link/20251111193658.3495942-3-joannelkoong@gmail.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-08 10:13:55 +01:00
Joanne Koong	12053695c8	iomap: adjust read range correctly for non-block-aligned positions [ Upstream commit `7aa6bc3e87` ] iomap_adjust_read_range() assumes that the position and length passed in are block-aligned. This is not always the case however, as shown in the syzbot generated case for erofs. This causes too many bytes to be skipped for uptodate blocks, which results in returning the incorrect position and length to read in. If all the blocks are uptodate, this underflows length and returns a position beyond the folio. Fix the calculation to also take into account the block offset when calculating how many bytes can be skipped for uptodate blocks. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Tested-by: syzbot@syzkaller.appspotmail.com Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2026-01-08 10:13:54 +01:00
Christoph Hellwig	3b5f35085f	iomap: always run error completions in user context [ Upstream commit `ddb4873286` ] At least zonefs expects error completions to be able to sleep. Because error completions aren't performance critical, just defer them to workqueue context unconditionally. Fixes: `8dcc1a9d90` ("fs: New zonefs file system") Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20251113170633.1453259-3-hch@lst.de Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-12-18 13:55:07 +01:00
Christoph Hellwig	133900d61d	iomap: factor out a iomap_dio_done helper [ Upstream commit `ae2f33a519` ] Split out the struct iomap-dio level final completion from iomap_dio_bio_end_io into a helper to clean up the code and make it reusable. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250206064035.2323428-7-hch@lst.de Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> Stable-dep-of: `ddb4873286` ("iomap: always run error completions in user context") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-12-18 13:55:06 +01:00
Gou Hao	d53b2d49a8	iomap: skip unnecessary ifs_block_is_uptodate check [ Upstream commit `8e3c15ee0d` ] In iomap_adjust_read_range, i is either the first !uptodate block, or it is past last for the second loop looking for trailing uptodate blocks. Assuming there's no overflow (there's no combination of huge folios and tiny blksize) then yeah, there is no point in retesting that the same block pointed to by i is uptodate since we hold the folio lock so nobody else could have set it uptodate. Signed-off-by: Gou Hao <gouhao@uniontech.com> Link: https://lore.kernel.org/20250410071236.16017-1-gouhao@uniontech.com Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-02 07:59:27 +02:00
Marco Nelissen	402ce16421	iomap: avoid avoid truncating 64-bit offset to 32 bits [ Upstream commit `c13094b894` ] on 32-bit kernels, iomap_write_delalloc_scan() was inadvertently using a 32-bit position due to folio_next_index() returning an unsigned long. This could lead to an infinite loop when writing to an xfs filesystem. Signed-off-by: Marco Nelissen <marco.nelissen@gmail.com> Link: https://lore.kernel.org/r/20250109041253.2494374-1-marco.nelissen@gmail.com Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-01-23 17:22:56 +01:00
Long Li	82c59a86a2	iomap: fix zero padding data issue in concurrent append writes [ Upstream commit `51d20d1dac` ] During concurrent append writes to XFS filesystem, zero padding data may appear in the file after power failure. This happens due to imprecise disk size updates when handling write completion. Consider this scenario with concurrent append writes same file: Thread 1: Thread 2: ------------ ----------- write [A, A+B] update inode size to A+B submit I/O [A, A+BS] write [A+B, A+B+C] update inode size to A+B+C <I/O completes, updates disk size to min(A+B+C, A+BS)> <power failure> After reboot: 1) with A+B+C < A+BS, the file has zero padding in range [A+B, A+B+C] \|< Block Size (BS) >\| \|DDDDDDDDDDDDDDDD0000000000000000\| ^ ^ ^ A A+B A+B+C (EOF) 2) with A+B+C > A+BS, the file has zero padding in range [A+B, A+BS] \|< Block Size (BS) >\|< Block Size (BS) >\| \|DDDDDDDDDDDDDDDD0000000000000000\|00000000000000000000000000000000\| ^ ^ ^ ^ A A+B A+BS A+B+C (EOF) D = Valid Data 0 = Zero Padding The issue stems from disk size being set to min(io_offset + io_size, inode->i_size) at I/O completion. Since io_offset+io_size is block size granularity, it may exceed the actual valid file data size. In the case of concurrent append writes, inode->i_size may be larger than the actual range of valid file data written to disk, leading to inaccurate disk size updates. This patch modifies the meaning of io_size to represent the size of valid data within EOF in an ioend. If the ioend spans beyond i_size, io_size will be trimmed to provide the file with more accurate size information. This is particularly useful for on-disk size updates at completion time. After this change, ioends that span i_size will not grow or merge with other ioends in concurrent scenarios. However, these cases that need growth/merging rarely occur and it seems no noticeable performance impact. Although rounding up io_size could enable ioend growth/merging in these scenarios, we decided to keep the code simple after discussion [1]. Another benefit is that it makes the xfs_ioend_is_append() check more accurate, which can reduce unnecessary end bio callbacks of xfs_end_bio() in certain scenarios, such as repeated writes at the file tail without extending the file size. Link [1]: https://patchwork.kernel.org/project/xfs/patch/20241113091907.56937-1-leo.lilong@huawei.com Fixes: `ae259a9c85` ("fs: introduce iomap infrastructure") # goes further back than this Signed-off-by: Long Li <leo.lilong@huawei.com> Link: https://lore.kernel.org/r/20241209114241.3725722-3-leo.lilong@huawei.com Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-01-17 13:40:34 +01:00
Long Li	7adf7df4bb	iomap: pass byte granular end position to iomap_add_to_ioend [ Upstream commit `b44679c63e` ] This is a preparatory patch for fixing zero padding issues in concurrent append write scenarios. In the following patches, we need to obtain byte-granular writeback end position for io_size trimming after EOF handling. Due to concurrent writeback and truncate operations, inode size may shrink. Resampling inode size would force writeback code to handle the newly appeared post-EOF blocks, which is undesirable. As Dave explained in [1]: "Really, the issue is that writeback mappings have to be able to handle the range being mapped suddenly appear to be beyond EOF. This behaviour is a longstanding writeback constraint, and is what iomap_writepage_handle_eof() is attempting to handle. We handle this by only sampling i_size_read() whilst we have the folio locked and can determine the action we should take with that folio (i.e. nothing, partial zeroing, or skip altogether). Once we've made the decision that the folio is within EOF and taken action on it (i.e. moved the folio to writeback state), we cannot then resample the inode size because a truncate may have started and changed the inode size." To avoid resampling inode size after EOF handling, we convert end_pos to byte-granular writeback position and return it from EOF handling function. Since iomap_set_range_dirty() can handle unaligned lengths, this conversion has no impact on it. However, iomap_find_dirty_range() requires aligned start and end range to find dirty blocks within the given range, so the end position needs to be rounded up when passed to it. LINK [1]: https://lore.kernel.org/linux-xfs/Z1Gg0pAa54MoeYME@localhost.localdomain/ Signed-off-by: Long Li <leo.lilong@huawei.com> Link: https://lore.kernel.org/r/20241209114241.3725722-2-leo.lilong@huawei.com Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Stable-dep-of: `51d20d1dac` ("iomap: fix zero padding data issue in concurrent append writes") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-01-17 13:40:34 +01:00
Pankaj Raghav	f40881bde8	fs/writeback: convert wbc_account_cgroup_owner to take a folio [ Upstream commit `30dac24e14` ] Most of the callers of wbc_account_cgroup_owner() are converting a folio to page before calling the function. wbc_account_cgroup_owner() is converting the page back to a folio to call mem_cgroup_css_from_folio(). Convert wbc_account_cgroup_owner() to take a folio instead of a page, and convert all callers to pass a folio directly except f2fs. Convert the page to folio for all the callers from f2fs as they were the only callers calling wbc_account_cgroup_owner() with a page. As f2fs is already in the process of converting to folios, these call sites might also soon be calling wbc_account_cgroup_owner() with a folio directly in the future. No functional changes. Only compile tested. Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Link: https://lore.kernel.org/r/20240926140121.203821-1-kernel@pankajraghav.com Acked-by: David Sterba <dsterba@suse.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> Stable-dep-of: `51d20d1dac` ("iomap: fix zero padding data issue in concurrent append writes") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-01-17 13:40:33 +01:00
Linus Torvalds	17fa6a5f93	Merge tag 'vfs-6.12-rc6.iomap' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull iomap fixes from Christian Brauner: "Fixes for iomap to prevent data corruption bugs in the fallocate unshare range implementation of fsdax and a small cleanup to turn iomap_want_unshare_iter() into an inline function" * tag 'vfs-6.12-rc6.iomap' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: iomap: turn iomap_want_unshare_iter into an inline function fsdax: dax_unshare_iter needs to copy entire blocks fsdax: remove zeroing code from dax_unshare_iter iomap: share iomap_unshare_iter predicate code with fsdax xfs: don't allocate COW extents when unsharing a hole	2024-11-01 07:45:00 -10:00
Christoph Hellwig	6db388585e	iomap: turn iomap_want_unshare_iter into an inline function iomap_want_unshare_iter currently sits in fs/iomap/buffered-io.c, which depends on CONFIG_BLOCK. It is also in used in fs/dax.c whіch has no such dependency. Given that it is a trivial check turn it into an inline in include/linux/iomap.h to fix the DAX && !BLOCK build. Fixes: `6ef6a0e821` ("iomap: share iomap_unshare_iter predicate code with fsdax") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241015041350.118403-1-hch@lst.de Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-10-21 17:01:01 +02:00
Christoph Hellwig	b784951662	iomap: move locking out of iomap_write_delalloc_release XFS (which currently is the only user of iomap_write_delalloc_release) already holds invalidate_lock for most zeroing operations. To be able to avoid a deadlock it needs to stop taking the lock, but doing so in iomap would leak XFS locking details into iomap. To avoid this require the caller to hold invalidate_lock when calling iomap_write_delalloc_release instead of taking it there. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2024-10-15 11:37:42 +02:00
Christoph Hellwig	caf0ea451d	iomap: remove iomap_file_buffered_write_punch_delalloc Currently iomap_file_buffered_write_punch_delalloc can be called from XFS either with the invalidate lock held or not. To fix this while keeping the locking in the file system and not the iomap library code we'll need to life the locking up into the file system. To prepare for that, open code iomap_file_buffered_write_punch_delalloc in the only caller, and instead export iomap_write_delalloc_release. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2024-10-15 11:37:42 +02:00
Christoph Hellwig	c0adf8c3a9	iomap: factor out a iomap_last_written_block helper Split out a pice of logic from iomap_file_buffered_write_punch_delalloc that is useful for all iomap_end implementations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>	2024-10-15 11:37:41 +02:00
Darrick J. Wong	6ef6a0e821	iomap: share iomap_unshare_iter predicate code with fsdax The predicate code that iomap_unshare_iter uses to decide if it's really needs to unshare a file range mapping should be shared with the fsdax version, because right now they're opencoded and inconsistent. Note that we simplify the predicate logic a bit -- we no longer allow unsharing of inline data mappings, but there aren't any filesystems that allow shared inline data currently. This is a fix in the sense that it should have been ported to fsdax. Fixes: `b53fdb215d` ("iomap: improve shared block detection in iomap_unshare_iter") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/172796813294.1131942.15762084021076932620.stgit@frogsfrogsfrogs Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-10-07 13:51:47 +02:00
Darrick J. Wong	a311a08a42	iomap: constrain the file range passed to iomap_file_unshare File contents can only be shared (i.e. reflinked) below EOF, so it makes no sense to try to unshare ranges beyond EOF. Constrain the file range parameters here so that we don't have to do that in the callers. Fixes: `5f4e5752a8` ("fs: add iomap_file_dirty") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20241002150213.GC21853@frogsfrogsfrogs Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-10-03 10:22:28 +02:00
Darrick J. Wong	f7a4874d97	iomap: don't bother unsharing delalloc extents If unshare encounters a delalloc reservation in the srcmap, that means that the file range isn't shared because delalloc reservations cannot be reflinked. Therefore, don't try to unshare them. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20241002150040.GB21853@frogsfrogsfrogs Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-10-03 10:22:25 +02:00
Linus Torvalds	171754c380	Merge tag 'vfs-6.12.blocksize' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull vfs blocksize updates from Christian Brauner: "This contains the vfs infrastructure as well as the xfs bits to enable support for block sizes (bs) larger than page sizes (ps) plus a few fixes to related infrastructure. There has been efforts over the last 16 years to enable enable Large Block Sizes (LBS), that is block sizes in filesystems where bs > page size. Through these efforts we have learned that one of the main blockers to supporting bs > ps in filesystems has been a way to allocate pages that are at least the filesystem block size on the page cache where bs > ps. Thanks to various previous efforts it is possible to support bs > ps in XFS with only a few changes in XFS itself. Most changes are to the page cache to support minimum order folio support for the target block size on the filesystem. A motivation for Large Block Sizes today is to support high-capacity (large amount of Terabytes) QLC SSDs where the internal Indirection Unit (IU) are typically greater than 4k to help reduce DRAM and so in turn cost and space. In practice this then allows different architectures to use a base page size of 4k while still enabling support for block sizes aligned to the larger IUs by relying on high order folios on the page cache when needed. It also allows to take advantage of the drive's support for atomics larger than 4k with buffered IO support in Linux. As described this year at LSFMM, supporting large atomics greater than 4k enables databases to remove the need to rely on their own journaling, so they can disable double buffered writes, which is a feature different cloud providers are already enabling through custom storage solutions" * tag 'vfs-6.12.blocksize' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (22 commits) Documentation: iomap: fix a typo iomap: remove the iomap_file_buffered_write_punch_delalloc return value iomap: pass the iomap to the punch callback iomap: pass flags to iomap_file_buffered_write_punch_delalloc iomap: improve shared block detection in iomap_unshare_iter iomap: handle a post-direct I/O invalidate race in iomap_write_delalloc_release docs:filesystems: fix spelling and grammar mistakes in iomap design page filemap: fix htmldoc warning for mapping_align_index() iomap: make zero range flush conditional on unwritten mappings iomap: fix handling of dirty folios over unwritten extents iomap: add a private argument for iomap_file_buffered_write iomap: remove set_memor_ro() on zero page xfs: enable block size larger than page size support xfs: make the calculation generic in xfs_sb_validate_fsb_count() xfs: expose block size in stat xfs: use kvmalloc for xattr buffers iomap: fix iomap_dio_zero() for fs bs > system page size filemap: cap PTE range to be created to allowed zero fill in folio_map_range() mm: split a folio in minimum folio order chunks readahead: allocate folios with mapping_min_order in readahead ...	2024-09-20 17:53:17 -07:00
Christoph Hellwig	4bceb9ba05	iomap: remove the iomap_file_buffered_write_punch_delalloc return value iomap_file_buffered_write_punch_delalloc can only return errors if either the ->punch callback returned an error, or if someone changed the API of mapping_seek_hole_data to return a negative error code that is not -ENXIO. As the only instance of ->punch never returns an error, an such an error would be fatal anyway remove the entire error propagation and don't return an error code from iomap_file_buffered_write_punch_delalloc. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240910043949.3481298-6-hch@lst.de Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-09-10 11:14:15 +02:00
Christoph Hellwig	492f53758f	iomap: pass the iomap to the punch callback XFS will need to look at the flags in the iomap structure, so pass it down all the way to the callback. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240910043949.3481298-5-hch@lst.de Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-09-10 11:14:15 +02:00
Christoph Hellwig	11596dc3df	iomap: pass flags to iomap_file_buffered_write_punch_delalloc To fix short write error handling, We'll need to figure out what operation iomap_file_buffered_write_punch_delalloc is called for. Pass the flags argument on to it, and reorder the argument list to match that of ->iomap_end so that the compiler only has to add the new punch argument to the end of it instead of reshuffling the registers. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240910043949.3481298-4-hch@lst.de Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-09-10 11:14:14 +02:00
Christoph Hellwig	b53fdb215d	iomap: improve shared block detection in iomap_unshare_iter Currently iomap_unshare_iter relies on the IOMAP_F_SHARED flag to detect blocks to unshare. This is reasonable, but IOMAP_F_SHARED is also useful for the file system to do internal book keeping for out of place writes. XFS used to that, until it got removed in commit `72a048c105` ("xfs: only set IOMAP_F_SHARED when providing a srcmap to a write") because unshare for incorrectly unshare such blocks. Add an extra safeguard by checking the explicitly provided srcmap instead of the fallback to the iomap for valid data, as that catches the case where we'd just copy from the same place we'd write to easily, allowing to reinstate setting IOMAP_F_SHARED for all XFS writes that go to the COW fork. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240910043949.3481298-3-hch@lst.de Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-09-10 11:13:43 +02:00
Christoph Hellwig	7a9d43eace	iomap: handle a post-direct I/O invalidate race in iomap_write_delalloc_release When direct I/O completions invalidates the page cache it holds neither the i_rwsem nor the invalidate_lock so it can be racing with iomap_write_delalloc_release. If the search for the end of the region that contains data returns the start offset we hit such a race and just need to look for the end of the newly created hole instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240910043949.3481298-2-hch@lst.de Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-09-10 11:13:43 +02:00
Brian Foster	7d9b474ee4	iomap: make zero range flush conditional on unwritten mappings iomap_zero_range() flushes pagecache to mitigate consistency problems with dirty pagecache and unwritten mappings. The flush is unconditional over the entire range because checking pagecache state after mapping lookup is racy with writeback and reclaim. There are ways around this using iomap's mapping revalidation mechanism, but this is not supported by all iomap based filesystems and so is not a generic solution. There is another way around this limitation that is good enough to filter the flush for most cases in practice. If we check for dirty pagecache over the target range (instead of unconditionally flush), we can keep track of whether the range was dirty before lookup and defer the flush until/unless we see a combination of dirty cache backed by an unwritten mapping. We don't necessarily know whether the dirty cache was backed by the unwritten maping or some other (written) part of the range, but the impliciation of a false positive here is a spurious flush and thus relatively harmless. Note that we also flush for hole mappings because iomap_zero_range() is used for partial folio zeroing in some cases. For example, if a folio straddles EOF on a sub-page FSB size fs, the post-eof portion is hole-backed and dirtied/written via mapped write, and then i_size increases before writeback can occur (which otherwise zeroes the post-eof portion of the EOF folio), then the folio becomes inconsistent with disk until reclaimed. A flush in this case executes partial zeroing from writeback, and iomap knows that there is otherwise no I/O to submit for hole backed mappings. Signed-off-by: Brian Foster <bfoster@redhat.com> Link: https://lore.kernel.org/r/20240830145634.138439-3-bfoster@redhat.com Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-09-03 15:01:24 +02:00
Brian Foster	c5c810b94c	iomap: fix handling of dirty folios over unwritten extents The iomap zero range implementation doesn't properly handle dirty pagecache over unwritten mappings. It skips such mappings as if they were pre-zeroed. If some part of an unwritten mapping is dirty in pagecache from a previous write, the data in cache should be zeroed as well. Instead, the data is left in cache and creates a stale data exposure problem if writeback occurs sometime after the zero range. Most callers are unaffected by this because the higher level filesystem contexts that call zero range typically perform a filemap flush of the target range for other reasons. A couple contexts that don't otherwise need to flush are write file size extension and truncate in XFS. The former path is currently susceptible to the stale data exposure problem and the latter performs a flush specifically to work around it. This is clearly inconsistent and incomplete. As a first step toward correcting behavior, lift the XFS workaround to iomap_zero_range() and unconditionally flush the range before the zero range operation proceeds. While this appears to be a bit of a big hammer, most all users already do this from calling context save for the couple of exceptions noted above. Future patches will optimize or elide this flush while maintaining functional correctness. Fixes: `ae259a9c85` ("fs: introduce iomap infrastructure") Signed-off-by: Brian Foster <bfoster@redhat.com> Link: https://lore.kernel.org/r/20240830145634.138439-2-bfoster@redhat.com Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-09-03 15:01:24 +02:00
Josef Bacik	31754ea6cb	iomap: add a private argument for iomap_file_buffered_write In order to switch fuse over to using iomap for buffered writes we need to be able to have the struct file for the original write, in case we have to read in the page to make it uptodate. Handle this by using the existing private field in the iomap_iter, and add the argument to iomap_file_buffered_write. This will allow us to pass the file in through the iomap buffered write path, and is flexible for any other file systems needs. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/7f55c7c32275004ba00cddf862d970e6e633f750.1724755651.git.josef@toxicpanda.com Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-09-03 15:01:23 +02:00
Luis Chamberlain	d1dd75dcda	iomap: remove set_memor_ro() on zero page Stephen reported a boot failure on ppc power8 system where set_memor_ro() on the new zero page failed [0]. Christophe Leroy further clarifies we can't use this on on linear memory on ppc, and so instead of special casing this just for PowerPC [2] remove the call as suggested by Darrick. [0] https://lore.kernel.org/all/20240826175931.1989f99e@canb.auug.org.au/T/#u [1] https://lore.kernel.org/all/b0fe75b4-c1bb-47f7-a7c3-2534b31c1780@csgroup.eu/ [2] https://lore.kernel.org/all/ZszrJkFOpiy5rCma@bombadil.infradead.org/ Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Suggested-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20240826212632.2098685-1-mcgrof@kernel.org Tested-by: Stephen Rothwell <sfr@canb.auug.org.au> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-09-03 15:01:23 +02:00
Pankaj Raghav	10553a9165	iomap: fix iomap_dio_zero() for fs bs > system page size iomap_dio_zero() will pad a fs block with zeroes if the direct IO size < fs block size. iomap_dio_zero() has an implicit assumption that fs block size < page_size. This is true for most filesystems at the moment. If the block size > page size, this will send the contents of the page next to zero page(as len > PAGE_SIZE) to the underlying block device, causing FS corruption. iomap is a generic infrastructure and it should not make any assumptions about the fs block size and the page size of the system. Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Link: https://lore.kernel.org/r/20240822135018.1931258-7-kernel@pankajraghav.com Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Daniel Gomez <da.gomez@samsung.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-09-02 16:19:43 +02:00
Matthew Wilcox (Oracle)	97edbc02b2	buffer: Convert block_write_end() to take a folio All callers now have a folio, so pass it in instead of converting from a folio to a page and back to a folio again. Saves a call to compound_head(). Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-08-07 11:31:59 +02:00
Linus Torvalds	4f5e249ec0	Merge tag 'vfs-6.11.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull iomap updates from Christian Brauner: "This contains some minor work for the iomap subsystem: - Add documentation on the design of iomap and how to port to it - Optimize iomap_read_folio() - Bring back the change to iomap_write_end() to no increase i_size. This is accompanied by a change to xfs to reserve blocks for truncating large realtime inodes to avoid exposing stale data when iomap_write_end() stops increasing i_size" * tag 'vfs-6.11.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: iomap: don't increase i_size in iomap_write_end() xfs: reserve blocks for truncating large realtime inode Documentation: the design of iomap and how to port iomap: Optimize iomap_read_folio	2024-07-15 13:28:14 -07:00
Linus Torvalds	aff31330e0	Merge tag 'vfs-6.11.pg_error' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull PG_error removal updates from Christian Brauner: "This contains work to remove almost all remaining users of PG_error from filesystems and filesystem helper libraries. An additional patch will be coming in via the jfs tree which tests the PG_error bit. Afterwards nothing will be testing it anymore and it's safe to remove all places which set or clear the PG_error bit. The goal is to fully remove PG_error by the next merge window" * tag 'vfs-6.11.pg_error' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: buffer: Remove calls to set and clear the folio error flag iomap: Remove calls to set and clear folio error flag vboxsf: Convert vboxsf_read_folio() to use a folio ufs: Remove call to set the folio error flag romfs: Convert romfs_read_folio() to use a folio reiserfs: Remove call to folio_set_error() orangefs: Remove calls to set/clear the error flag nfs: Remove calls to folio_set_error jffs2: Remove calls to set/clear the folio error flag hostfs: Convert hostfs_read_folio() to use a folio isofs: Convert rock_ridge_symlink_read_folio to use a folio hpfs: Convert hpfs_symlink_read_folio to use a folio efs: Convert efs_symlink_read_folio to use a folio cramfs: Convert cramfs_read_folio to use a folio coda: Convert coda_symlink_filler() to use folio_end_read() befs: Convert befs_symlink_read_folio() to use folio_end_read()	2024-07-15 11:08:14 -07:00
Zhang Yi	602f09f402	iomap: don't increase i_size in iomap_write_end() This reverts commit '0841ea4a3b41 ("iomap: keep on increasing i_size in iomap_write_end()")'. After xfs could zero out the tail blocks aligned to the allocation unitsize and convert the tail blocks to unwritten for realtime inode on truncate down, it couldn't expose any stale data when unaligned truncate down realtime inodes, so we could keep on keeping i_size for IOMAP_UNSHARE and IOMAP_ZERO in iomap_write_end(). Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20240618142112.1315279-3-yi.zhang@huaweicloud.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-06-19 15:58:28 +02:00
Ritesh Harjani (IBM)	af4eb6f46f	iomap: Optimize iomap_read_folio iomap_readpage_iter() handles "uptodate blocks" and "not uptodate blocks" within a folio separately. This makes iomap_read_folio() to call into ->iomap_begin() to request for extent mapping even though it might already have an extent which is not fully processed. This happens when we either have a large folio or with bs < ps. In these cases we can have sub blocks which can be uptodate (say for e.g. due to previous writes). With iomap_read_folio_iter(), this is handled more efficiently by not calling ->iomap_begin() call until all the sub blocks with the current folio are processed. iomap_read_folio_iter() handles multiple sub blocks within a given folio but it's implementation logic is similar to how iomap_readahead_iter() handles multiple folios within a single mapped extent. Both of them iterate over a given range of folio/mapped extent and call iomap_readpage_iter() for reading. Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/92ae9f3333c9a7e66214568d08f45664261c899c.1715067055.git.ritesh.list@gmail.com Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> cc: Ojaswin Mujoo <ojaswin@linux.ibm.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-06-19 15:58:19 +02:00
Ritesh Harjani (IBM)	f5ceb1bbc9	iomap: Fix iomap_adjust_read_range for plen calculation If the extent spans the block that contains i_size, we need to handle both halves separately so that we properly zero data in the page cache for blocks that are entirely outside of i_size. But this is needed only when i_size is within the current folio under processing. "orig_pos + length > isize" can be true for all folios if the mapped extent length is greater than the folio size. That is making plen to break for every folio instead of only the last folio. So use orig_plen for checking if "orig_pos + orig_plen > isize". Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/a32e5f9a4fcfdb99077300c4020ed7ae61d6e0f9.1715067055.git.ritesh.list@gmail.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> cc: Ojaswin Mujoo <ojaswin@linux.ibm.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-06-05 17:27:03 +02:00
Zhang Yi	0841ea4a3b	iomap: keep on increasing i_size in iomap_write_end() Commit '943bc0882ceb ("iomap: don't increase i_size if it's not a write operation")' breaks xfs with realtime device on generic/561, the problem is when unaligned truncate down a xfs realtime inode with rtextsize > 1 fs block, xfs only zero out the EOF block but doesn't zero out the tail blocks that aligned to rtextsize, so if we don't increase i_size in iomap_write_end(), it could expose stale data after we do an append write beyond the aligned EOF block. xfs should zero out the tail blocks when truncate down, but before we finish that, let's fix the issue by just revert the changes in iomap_write_end(). Fixes: `943bc0882c` ("iomap: don't increase i_size if it's not a write operation") Reported-by: Chandan Babu R <chandanbabu@kernel.org> Link: https://lore.kernel.org/linux-xfs/0b92a215-9d9b-3788-4504-a520778953c2@huaweicloud.com Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20240603112222.2109341-1-yi.zhang@huaweicloud.com Tested-by: Chandan Babu R <chandanbabu@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-06-05 17:23:39 +02:00
Matthew Wilcox (Oracle)	1f56eedf7f	iomap: Remove calls to set and clear folio error flag The folio error flag is not checked anywhere, so we can remove the calls to set and clear it. Cc: Christian Brauner <brauner@kernel.org> Cc: linux-xfs@vger.kernel.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lore.kernel.org/r/20240530202110.2653630-16-willy@infradead.org Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-05-31 12:31:42 +02:00
Xu Yang	4e527d5841	iomap: fault in smaller chunks for non-large folio mappings Since commit (`5d8edfb900` "iomap: Copy larger chunks from userspace"), iomap will try to copy in larger chunks than PAGE_SIZE. However, if the mapping doesn't support large folio, only one page of maximum 4KB will be created and 4KB data will be writen to pagecache each time. Then, next 4KB will be handled in next iteration. This will cause potential write performance problem. If chunk is 2MB, total 512 pages need to be handled finally. During this period, fault_in_iov_iter_readable() is called to check iov_iter readable validity. Since only 4KB will be handled each time, below address space will be checked over and over again: start end - buf, buf+2MB buf+4KB, buf+2MB buf+8KB, buf+2MB ... buf+2044KB buf+2MB Obviously the checking size is wrong since only 4KB will be handled each time. So this will get a correct chunk to let iomap work well in non-large folio case. With this change, the write speed will be stable. Tested on ARM64 device. Before: - dd if=/dev/zero of=/dev/sda bs=400K count=10485 (334 MB/s) - dd if=/dev/zero of=/dev/sda bs=800K count=5242 (278 MB/s) - dd if=/dev/zero of=/dev/sda bs=1600K count=2621 (204 MB/s) - dd if=/dev/zero of=/dev/sda bs=2200K count=1906 (170 MB/s) - dd if=/dev/zero of=/dev/sda bs=3000K count=1398 (150 MB/s) - dd if=/dev/zero of=/dev/sda bs=4500K count=932 (139 MB/s) After: - dd if=/dev/zero of=/dev/sda bs=400K count=10485 (339 MB/s) - dd if=/dev/zero of=/dev/sda bs=800K count=5242 (330 MB/s) - dd if=/dev/zero of=/dev/sda bs=1600K count=2621 (332 MB/s) - dd if=/dev/zero of=/dev/sda bs=2200K count=1906 (333 MB/s) - dd if=/dev/zero of=/dev/sda bs=3000K count=1398 (333 MB/s) - dd if=/dev/zero of=/dev/sda bs=4500K count=932 (333 MB/s) Fixes: `5d8edfb900` ("iomap: Copy larger chunks from userspace") Cc: stable@vger.kernel.org Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Xu Yang <xu.yang_2@nxp.com> Link: https://lore.kernel.org/r/20240521114939.2541461-2-xu.yang_2@nxp.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-05-24 13:34:07 +02:00
Linus Torvalds	ff9a79307f	Merge tag 'kbuild-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Avoid 'constexpr', which is a keyword in C23 - Allow 'dtbs_check' and 'dt_compatible_check' run independently of 'dt_binding_check' - Fix weak references to avoid GOT entries in position-independent code generation - Convert the last use of 'optional' property in arch/sh/Kconfig - Remove support for the 'optional' property in Kconfig - Remove support for Clang's ThinLTO caching, which does not work with the .incbin directive - Change the semantics of $(src) so it always points to the source directory, which fixes Makefile inconsistencies between upstream and downstream - Fix 'make tar-pkg' for RISC-V to produce a consistent package - Provide reasonable default coverage for objtool, sanitizers, and profilers - Remove redundant OBJECT_FILES_NON_STANDARD, KASAN_SANITIZE, etc. - Remove the last use of tristate choice in drivers/rapidio/Kconfig - Various cleanups and fixes in Kconfig * tag 'kbuild-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (46 commits) kconfig: use sym_get_choice_menu() in sym_check_prop() rapidio: remove choice for enumeration kconfig: lxdialog: remove initialization with A_NORMAL kconfig: m/nconf: merge two item_add_str() calls kconfig: m/nconf: remove dead code to display value of bool choice kconfig: m/nconf: remove dead code to display children of choice members kconfig: gconf: show checkbox for choice correctly kbuild: use GCOV_PROFILE and KCSAN_SANITIZE in scripts/Makefile.modfinal Makefile: remove redundant tool coverage variables kbuild: provide reasonable defaults for tool coverage modules: Drop the .export_symbol section from the final modules kconfig: use menu_list_for_each_sym() in sym_check_choice_deps() kconfig: use sym_get_choice_menu() in conf_write_defconfig() kconfig: add sym_get_choice_menu() helper kconfig: turn defaults and additional prompt for choice members into error kconfig: turn missing prompt for choice members into error kconfig: turn conf_choice() into void function kconfig: use linked list in sym_set_changed() kconfig: gconf: use MENU_CHANGED instead of SYMBOL_CHANGED kconfig: gconf: remove debug code ...	2024-05-18 12:39:20 -07:00
Masahiro Yamada	b1992c3772	kbuild: use $(src) instead of $(srctree)/$(src) for source directory Kbuild conventionally uses $(obj)/ for generated files, and $(src)/ for checked-in source files. It is merely a convention without any functional difference. In fact, $(obj) and $(src) are exactly the same, as defined in scripts/Makefile.build: src := $(obj) When the kernel is built in a separate output directory, $(src) does not accurately reflect the source directory location. While Kbuild resolves this discrepancy by specifying VPATH=$(srctree) to search for source files, it does not cover all cases. For example, when adding a header search path for local headers, -I$(srctree)/$(src) is typically passed to the compiler. This introduces inconsistency between upstream and downstream Makefiles because $(src) is used instead of $(srctree)/$(src) for the latter. To address this inconsistency, this commit changes the semantics of $(src) so that it always points to the directory in the source tree. Going forward, the variables used in Makefiles will have the following meanings: $(obj) - directory in the object tree $(src) - directory in the source tree (changed by this commit) $(objtree) - the top of the kernel object tree $(srctree) - the top of the kernel source tree Consequently, $(srctree)/$(src) in upstream Makefiles need to be replaced with $(src). Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>	2024-05-10 04:34:52 +09:00
Zhang Yi	e1f453d433	iomap: do some small logical cleanup in buffered write Since iomap_write_end() can never return a partial write length, the comparison between written, copied and bytes becomes useless, just merge them with the unwritten branch. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20240320110548.2200662-10-yi.zhang@huaweicloud.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-04-25 14:23:54 +02:00
Zhang Yi	815f4b633b	iomap: make iomap_write_end() return a boolean For now, we can make sure iomap_write_end() always return 0 or copied bytes, so instead of return written bytes, convert to return a boolean to indicate the copied bytes have been written to the pagecache. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20240320110548.2200662-9-yi.zhang@huaweicloud.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-04-25 14:23:54 +02:00
Zhang Yi	1a61d74932	iomap: use a new variable to handle the written bytes in iomap_write_iter() In iomap_write_iter(), the status variable used to receive the return value from iomap_write_end() is confusing, replace it with a new written variable to represent the written bytes in each cycle, no logic changes. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20240320110548.2200662-8-yi.zhang@huaweicloud.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-04-25 14:23:54 +02:00
Zhang Yi	943bc0882c	iomap: don't increase i_size if it's not a write operation Increase i_size in iomap_zero_range() and iomap_unshare_iter() is not needed, the caller should handle it. Especially, when truncate partial block, we should not increase i_size beyond the new EOF here. It doesn't affect xfs and gfs2 now because they set the new file size after zero out, it doesn't matter that a transient increase in i_size, but it will affect ext4 because it set file size before truncate. So move the i_size updating logic to iomap_write_iter(). Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20240320110548.2200662-7-yi.zhang@huaweicloud.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-04-25 14:23:54 +02:00
Zhang Yi	89c6c1d91a	iomap: drop the write failure handles when unsharing and zeroing Unsharing and zeroing can only happen within EOF, so there is never a need to perform posteof pagecache truncation if write begin fails, also partial write could never theoretically happened from iomap_write_end(), so remove both of them. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20240320110548.2200662-6-yi.zhang@huaweicloud.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-04-25 14:23:53 +02:00
Christoph Hellwig	0fac04e4e0	iomap: convert iomap_writepages to writeack_iter This removes one indirect function call per folio, and adds type safety by not casting through a void pointer. Based on a patch by Matthew Wilcox. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240412061614.1511629-1-hch@lst.de Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-04-15 14:25:55 +02:00
Christian Brauner	86835c39e0	Merge tag 'vfs-6.9.rw_hint' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull write hint fix from Christian Brauner: UFS devices are widely used in mobile applications, e.g. in smartphones. UFS vendors need data lifetime information to achieve good performance. Providing data lifetime information to UFS devices can result in up to 40% lower write amplification. Hence this patch series that restores the bi_write_hint member in struct bio. After this patch series has been merged, patches that implement data lifetime support in the SCSI disk (sd) driver will be sent to the Linux kernel SCSI maintainer. The following changes are included in this patch series: - Improvements for the F_GET_RW_HINT and F_SET_RW_HINT fcntls. - Move enum rw_hint into a new header file. - Support F_SET_RW_HINT for block devices to make it easy to test data lifetime support. - Restore the bio.bi_write_hint member and restore support in the VFS layer and also in the block layer for data lifetime information. The shell script that has been used to test the patch series combined with the SCSI patches is available at the end of this cover letter. * tag 'vfs-6.9.rw_hint' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: block, fs: Restore the per-bio/request data lifetime fields fs: Propagate write hints to the struct block_device inode fs: Move enum rw_hint into a new header file fs: Split fcntl_rw_hint() fs: Verify write lifetime constants at compile time fs: Fix rw_hint validation Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-03-04 18:35:21 +01:00
Kassey Li	dcd04ea587	iomap: Add processed for iomap_iter processed: The number of bytes processed by the body in the most recent iteration, or a negative errno. 0 causes the iteration to stop. The processed is useful to check when the loop breaks. Signed-off-by: Kassey Li <quic_yingangl@quicinc.com> Link: https://lore.kernel.org/r/20240219021138.3481763-1-quic_yingangl@quicinc.com Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-02-21 08:37:50 +01:00
Zhang Yi	54943abce0	iomap: add pos and dirty_len into trace_iomap_writepage_map Since commit fd07e0aa23c4 ("iomap: map multiple blocks at a time"), we could map multi-blocks once a time, and the dirty_len indicates the expected map length, map_len won't large than it. The pos and dirty_len means the dirty range that should be mapped to write, add them into trace_iomap_writepage_map() could be more useful for debug. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20240220115759.3445025-1-yi.zhang@huaweicloud.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-02-21 08:35:20 +01:00

1 2 3 4 5 ...

372 Commits