
Commit 7a3dade

Merge tag 'f2fs-for-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
 "In this round, we've added new features such as zone capacity for ZNS
  and a new GC policy, ATGC, along with in-memory segment management. In
  addition, we could improve the decompression speed significantly by
  changing virtual mapping method. Even though we've fixed lots of small
  bugs in compression support, I feel that it becomes more stable so
  that I could give it a try in production.

  Enhancements:
   - support zone capacity in NVMe Zoned Namespace devices
   - introduce in-memory current segment management
   - add standard casefolding support
   - support age threshold based garbage collection
   - improve decompression speed by changing virtual mapping method

  Bug fixes:
   - fix condition checks in some ioctl() such as compression, move_range, etc
   - fix 32/64bits support in data structures
   - fix memory allocation in zstd decompress
   - add some boundary checks to avoid kernel panic on corrupted image
   - fix disallowing compression for non-empty file
   - fix slab leakage of compressed block writes

  In addition, it includes code refactoring for better readability and
  minor bug fixes for compression and zoned device support"

* tag 'f2fs-for-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (51 commits)
  f2fs: code cleanup by removing unnecessary check
  f2fs: wait for sysfs kobject removal before freeing f2fs_sb_info
  f2fs: fix writecount false positive in releasing compress blocks
  f2fs: introduce check_swap_activate_fast()
  f2fs: don't issue flush in f2fs_flush_device_cache() for nobarrier case
  f2fs: handle errors of f2fs_get_meta_page_nofail
  f2fs: fix to set SBI_NEED_FSCK flag for inconsistent inode
  f2fs: reject CASEFOLD inode flag without casefold feature
  f2fs: fix memory alignment to support 32bit
  f2fs: fix slab leak of rpages pointer
  f2fs: compress: fix to disallow enabling compress on non-empty file
  f2fs: compress: introduce cic/dic slab cache
  f2fs: compress: introduce page array slab cache
  f2fs: fix to do sanity check on segment/section count
  f2fs: fix to check segment boundary during SIT page readahead
  f2fs: fix uninit-value in f2fs_lookup
  f2fs: remove unneeded parameter in find_in_block()
  f2fs: fix wrong total_sections check and fsmeta check
  f2fs: remove duplicated code in sanity_check_area_boundary
  f2fs: remove unused check on version_bitmap
  ...
2 parents 54a4c78 + 788e96d commit 7a3dade

28 files changed

Lines changed: 1795 additions & 493 deletions

File tree

Documentation/ABI/testing/sysfs-fs-f2fs

Lines changed: 2 additions & 1 deletion
@@ -22,7 +22,8 @@ Contact: "Namjae Jeon" <namjae.jeon@samsung.com>
 Description: Controls the victim selection policy for garbage collection.
 Setting gc_idle = 0(default) will disable this option. Setting
 gc_idle = 1 will select the Cost Benefit approach & setting
-gc_idle = 2 will select the greedy approach.
+gc_idle = 2 will select the greedy approach & setting
+gc_idle = 3 will select the age-threshold based approach.
 
 What: /sys/fs/f2fs/<disk>/reclaim_segments
 Date: October 2013
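
Reviewer note: as a reading aid, the four gc_idle values described in this hunk can be tabulated in a few lines of C. This is only a sketch. `gc_idle_policy_name` is a hypothetical helper, not a kernel function, and the returned strings merely paraphrase the ABI text above.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of f2fs): map a gc_idle sysfs value to
 * the victim selection policy described in the ABI document.
 * Returns NULL for values the document does not describe.
 */
const char *gc_idle_policy_name(unsigned int gc_idle)
{
	switch (gc_idle) {
	case 0:
		return "disabled";	/* default */
	case 1:
		return "cost-benefit";
	case 2:
		return "greedy";
	case 3:
		return "age-threshold";	/* value added by this change */
	default:
		return NULL;
	}
}
```

For example, `gc_idle_policy_name(3)` yields "age-threshold", the policy this change documents.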

Documentation/filesystems/f2fs.rst

Lines changed: 67 additions & 15 deletions
@@ -127,14 +127,14 @@ active_logs=%u Support configuring the number of active logs. In the
 current design, f2fs supports only 2, 4, and 6 logs.
 Default number is 6.
 disable_ext_identify Disable the extension list configured by mkfs, so f2fs
-does not aware of cold files such as media files.
+is not aware of cold files such as media files.
 inline_xattr Enable the inline xattrs feature.
 noinline_xattr Disable the inline xattrs feature.
 inline_xattr_size=%u Support configuring inline xattr size, it depends on
 flexible inline xattr feature.
-inline_data Enable the inline data feature: New created small(<~3.4k)
+inline_data Enable the inline data feature: Newly created small (<~3.4k)
 files can be written into inode block.
-inline_dentry Enable the inline dir feature: data in new created
+inline_dentry Enable the inline dir feature: data in newly created
 directory entries can be written into inode block. The
 space of inode block which is used to store inline
 dentries is limited to ~3.4k.
@@ -203,9 +203,9 @@ usrjquota=<file> Appoint specified file and type during mount, so that quota
 grpjquota=<file> information can be properly updated during recovery flow,
 prjjquota=<file> <quota file>: must be in root directory;
 jqfmt=<quota type> <quota type>: [vfsold,vfsv0,vfsv1].
-offusrjquota Turn off user journelled quota.
-offgrpjquota Turn off group journelled quota.
-offprjjquota Turn off project journelled quota.
+offusrjquota Turn off user journalled quota.
+offgrpjquota Turn off group journalled quota.
+offprjjquota Turn off project journalled quota.
 quota Enable plain user disk quota accounting.
 noquota Disable all plain disk quota option.
 whint_mode=%s Control which write hints are passed down to block
@@ -266,6 +266,8 @@ inlinecrypt When possible, encrypt/decrypt the contents of encrypted
 inline encryption hardware. The on-disk format is
 unaffected. For more details, see
 Documentation/block/inline-encryption.rst.
+atgc Enable age-threshold garbage collection, it provides high
+effectiveness and efficiency on background GC.
 ======================== ============================================================
 
 Debugfs Entries
@@ -301,7 +303,7 @@ Usage
 
 # insmod f2fs.ko
 
-3. Create a directory trying to mount::
+3. Create a directory to use when mounting::
 
 # mkdir /mnt/f2fs
 
@@ -315,7 +317,7 @@ mkfs.f2fs
 The mkfs.f2fs is for the use of formatting a partition as the f2fs filesystem,
 which builds a basic on-disk layout.
 
-The options consist of:
+The quick options consist of:
 
 =============== ===========================================================
 ``-l [label]`` Give a volume label, up to 512 unicode name.
@@ -337,17 +339,21 @@ The options consist of:
 1 is set by default, which conducts discard.
 =============== ===========================================================
 
+Note: please refer to the manpage of mkfs.f2fs(8) to get full option list.
+
 fsck.f2fs
 ---------
 The fsck.f2fs is a tool to check the consistency of an f2fs-formatted
 partition, which examines whether the filesystem metadata and user-made data
 are cross-referenced correctly or not.
 Note that, initial version of the tool does not fix any inconsistency.
 
-The options consist of::
+The quick options consist of::
 
 -d debug level [default:0]
 
+Note: please refer to the manpage of fsck.f2fs(8) to get full option list.
+
 dump.f2fs
 ---------
 The dump.f2fs shows the information of specific inode and dumps SSA and SIT to
@@ -371,6 +377,37 @@ Examples::
 # dump.f2fs -s 0~-1 /dev/sdx (SIT dump)
 # dump.f2fs -a 0~-1 /dev/sdx (SSA dump)
 
+Note: please refer to the manpage of dump.f2fs(8) to get full option list.
+
+sload.f2fs
+----------
+The sload.f2fs gives a way to insert files and directories in the exisiting disk
+image. This tool is useful when building f2fs images given compiled files.
+
+Note: please refer to the manpage of sload.f2fs(8) to get full option list.
+
+resize.f2fs
+-----------
+The resize.f2fs lets a user resize the f2fs-formatted disk image, while preserving
+all the files and directories stored in the image.
+
+Note: please refer to the manpage of resize.f2fs(8) to get full option list.
+
+defrag.f2fs
+-----------
+The defrag.f2fs can be used to defragment scattered written data as well as
+filesystem metadata across the disk. This can improve the write speed by giving
+more free consecutive space.
+
+Note: please refer to the manpage of defrag.f2fs(8) to get full option list.
+
+f2fs_io
+-------
+The f2fs_io is a simple tool to issue various filesystem APIs as well as
+f2fs-specific ones, which is very useful for QA tests.
+
+Note: please refer to the manpage of f2fs_io(8) to get full option list.
+
 Design
 ======
 
@@ -383,7 +420,7 @@ consists of a set of sections. By default, section and zone sizes are set to one
 segment size identically, but users can easily modify the sizes by mkfs.
 
 F2FS splits the entire volume into six areas, and all the areas except superblock
-consists of multiple segments as described below::
+consist of multiple segments as described below::
 
 align with the zone size <-|
 |-> align with the segment size
@@ -486,7 +523,7 @@ one inode block (i.e., a file) covers::
 `- direct node (1018)
 `- data (1018)
 
-Note that, all the node blocks are mapped by NAT which means the location of
+Note that all the node blocks are mapped by NAT which means the location of
 each node is translated by the NAT table. In the consideration of the wandering
 tree problem, F2FS is able to cut off the propagation of node updates caused by
 leaf data writes.
@@ -566,7 +603,7 @@ When F2FS finds a file name in a directory, at first a hash value of the file
 name is calculated. Then, F2FS scans the hash table in level #0 to find the
 dentry consisting of the file name and its inode number. If not found, F2FS
 scans the next hash table in level #1. In this way, F2FS scans hash tables in
-each levels incrementally from 1 to N. In each levels F2FS needs to scan only
+each levels incrementally from 1 to N. In each level F2FS needs to scan only
 one bucket determined by the following equation, which shows O(log(# of files))
 complexity::
 
@@ -707,7 +744,7 @@ WRITE_LIFE_LONG " WRITE_LIFE_LONG
 Fallocate(2) Policy
 -------------------
 
-The default policy follows the below posix rule.
+The default policy follows the below POSIX rule.
 
 Allocating disk space
 The default operation (i.e., mode is zero) of fallocate() allocates
@@ -720,7 +757,7 @@ Allocating disk space
 as a method of optimally implementing that function.
 
 However, once F2FS receives ioctl(fd, F2FS_IOC_SET_PIN_FILE) in prior to
-fallocate(fd, DEFAULT_MODE), it allocates on-disk blocks addressess having
+fallocate(fd, DEFAULT_MODE), it allocates on-disk block addressess having
 zero or random data, which is useful to the below scenario where:
 
 1. create(fd)
@@ -739,7 +776,7 @@ Compression implementation
 cluster can be compressed or not.
 
 - In cluster metadata layout, one special block address is used to indicate
-cluster is compressed one or normal one, for compressed cluster, following
+a cluster is a compressed one or normal one; for compressed cluster, following
 metadata maps cluster to [1, 4 << n - 1] physical blocks, in where f2fs
 stores data including compress header and compressed data.
 
@@ -772,3 +809,18 @@ Compress metadata layout::
 +-------------+-------------+----------+----------------------------+
 | data length | data chksum | reserved | compressed data |
 +-------------+-------------+----------+----------------------------+
+
+NVMe Zoned Namespace devices
+----------------------------
+
+- ZNS defines a per-zone capacity which can be equal or less than the
+zone-size. Zone-capacity is the number of usable blocks in the zone.
+F2FS checks if zone-capacity is less than zone-size, if it is, then any
+segment which starts after the zone-capacity is marked as not-free in
+the free segment bitmap at initial mount time. These segments are marked
+as permanently used so they are not allocated for writes and
+consequently are not needed to be garbage collected. In case the
+zone-capacity is not aligned to default segment size(2MB), then a segment
+can start before the zone-capacity and span across zone-capacity boundary.
+Such spanning segments are also considered as usable segments. All blocks
+past the zone-capacity are considered unusable in these segments.
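
Reviewer note: the zone-capacity rule in the new ZNS section can be sketched as a small userspace C function. This is an illustration under stated assumptions, not the kernel implementation. `usable_blocks_in_seg` and `BLOCKS_PER_SEG` are hypothetical names, and the value 512 assumes 4 KiB blocks in the default 2MB segment.

```c
#include <stdint.h>

/* Assumption for illustration: 4 KiB blocks, so a 2MB segment holds 512 blocks. */
#define BLOCKS_PER_SEG 512u

/*
 * Hypothetical helper: number of usable blocks in the segment with
 * 0-based index seg_in_zone inside its zone, for a zone whose capacity
 * is zone_capacity_blocks.
 */
uint32_t usable_blocks_in_seg(uint32_t seg_in_zone, uint32_t zone_capacity_blocks)
{
	uint64_t seg_start = (uint64_t)seg_in_zone * BLOCKS_PER_SEG;

	/* Segment starts at or past the capacity: nothing usable, treated as not-free. */
	if (seg_start >= zone_capacity_blocks)
		return 0;

	/* Segment spans the capacity boundary: only the blocks before it are usable. */
	if (seg_start + BLOCKS_PER_SEG > zone_capacity_blocks)
		return (uint32_t)(zone_capacity_blocks - seg_start);

	/* Segment lies entirely below the capacity. */
	return BLOCKS_PER_SEG;
}
```

With a zone capacity of 1000 blocks, segment 0 is fully usable, segment 1 spans the boundary and keeps 488 usable blocks, and segment 2 has none.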

fs/f2fs/acl.c

Lines changed: 3 additions & 3 deletions
@@ -160,7 +160,7 @@ static void *f2fs_acl_to_disk(struct f2fs_sb_info *sbi,
 	return (void *)f2fs_acl;
 
 fail:
-	kvfree(f2fs_acl);
+	kfree(f2fs_acl);
 	return ERR_PTR(-EINVAL);
 }
 
@@ -190,7 +190,7 @@ static struct posix_acl *__f2fs_get_acl(struct inode *inode, int type,
 		acl = NULL;
 	else
 		acl = ERR_PTR(retval);
-	kvfree(value);
+	kfree(value);
 
 	return acl;
 }
@@ -240,7 +240,7 @@ static int __f2fs_set_acl(struct inode *inode, int type,
 
 	error = f2fs_setxattr(inode, name_index, "", value, size, ipage, 0);
 
-	kvfree(value);
+	kfree(value);
 	if (!error)
 		set_cached_acl(inode, type, acl);
 

fs/f2fs/checkpoint.c

Lines changed: 14 additions & 3 deletions
@@ -107,7 +107,7 @@ struct page *f2fs_get_meta_page(struct f2fs_sb_info *sbi, pgoff_t index)
 	return __get_meta_page(sbi, index, true);
 }
 
-struct page *f2fs_get_meta_page_nofail(struct f2fs_sb_info *sbi, pgoff_t index)
+struct page *f2fs_get_meta_page_retry(struct f2fs_sb_info *sbi, pgoff_t index)
 {
 	struct page *page;
 	int count = 0;
@@ -243,6 +243,8 @@ int f2fs_ra_meta_pages(struct f2fs_sb_info *sbi, block_t start, int nrpages,
 					blkno * NAT_ENTRY_PER_BLOCK);
 			break;
 		case META_SIT:
+			if (unlikely(blkno >= TOTAL_SEGS(sbi)))
+				goto out;
 			/* get sit block addr */
 			fio.new_blkaddr = current_sit_addr(sbi,
 					blkno * SIT_ENTRY_PER_BLOCK);
@@ -1047,8 +1049,12 @@ int f2fs_sync_dirty_inodes(struct f2fs_sb_info *sbi, enum inode_type type)
 			get_pages(sbi, is_dir ?
 			F2FS_DIRTY_DENTS : F2FS_DIRTY_DATA));
 retry:
-	if (unlikely(f2fs_cp_error(sbi)))
+	if (unlikely(f2fs_cp_error(sbi))) {
+		trace_f2fs_sync_dirty_inodes_exit(sbi->sb, is_dir,
+				get_pages(sbi, is_dir ?
+				F2FS_DIRTY_DENTS : F2FS_DIRTY_DATA));
 		return -EIO;
+	}
 
 	spin_lock(&sbi->inode_lock[type]);
 
@@ -1619,11 +1625,16 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
 
 	f2fs_flush_sit_entries(sbi, cpc);
 
+	/* save inmem log status */
+	f2fs_save_inmem_curseg(sbi);
+
 	err = do_checkpoint(sbi, cpc);
 	if (err)
 		f2fs_release_discard_addrs(sbi);
 	else
 		f2fs_clear_prefree_segments(sbi, cpc);
+
+	f2fs_restore_inmem_curseg(sbi);
 stop:
 	unblock_operations(sbi);
 	stat_inc_cp_count(sbi->stat_info);
@@ -1654,7 +1665,7 @@ void f2fs_init_ino_entry_info(struct f2fs_sb_info *sbi)
 	}
 
 	sbi->max_orphans = (sbi->blocks_per_seg - F2FS_CP_PACKS -
-				NR_CURSEG_TYPE - __cp_payload(sbi)) *
+				NR_CURSEG_PERSIST_TYPE - __cp_payload(sbi)) *
 				F2FS_ORPHANS_PER_BLOCK;
 }
 
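
Reviewer note: the max_orphans expression in the last hunk can be checked with a small userspace sketch. The constant values below are assumptions made for illustration (2 checkpoint packs, 6 persistent current segments, 1020 orphan entries per block); the authoritative values live in the f2fs headers and the superblock. The function `max_orphans` is a hypothetical stand-in for the assignment in f2fs_init_ino_entry_info(). The switch to NR_CURSEG_PERSIST_TYPE presumably keeps the result unchanged now that in-memory current segments are counted separately.

```c
#include <stdint.h>

/* Assumed values for illustration only; see the f2fs headers for the real ones. */
#define F2FS_CP_PACKS		2u
#define NR_CURSEG_PERSIST_TYPE	6u
#define F2FS_ORPHANS_PER_BLOCK	1020u

/*
 * Hypothetical stand-in for the computation in the hunk above:
 * blocks left in the checkpoint segment after the checkpoint packs,
 * the persistent current-segment summaries and the checkpoint payload,
 * multiplied by the orphan entries that fit in one block.
 */
uint32_t max_orphans(uint32_t blocks_per_seg, uint32_t cp_payload)
{
	return (blocks_per_seg - F2FS_CP_PACKS -
		NR_CURSEG_PERSIST_TYPE - cp_payload) *
		F2FS_ORPHANS_PER_BLOCK;
}
```

Under these assumptions, a 512-block segment with no checkpoint payload gives (512 - 2 - 6 - 0) * 1020 = 514080 orphan entries.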
