Commit 4cfc9ec
mm/page_alloc: Optimize free_contig_range()

Decompose the range of order-0 pages to be freed into the largest
possible power-of-2 sized, naturally aligned chunks and free each chunk
to the pcp or buddy allocator. This improves on the previous approach,
which freed each order-0 page individually in a loop. Testing shows
performance improved by more than 10x in some cases.

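The decomposition described above can be sketched in userspace C. This is only an illustration of the idea, not the kernel code: the names `chunk_order()`, `decompose()`, `struct chunk` and the cap `SKETCH_MAX_ORDER` are hypothetical, and the real function may pick orders differently.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_ORDER 10 /* assumed cap on chunk order, for illustration */

struct chunk {
	unsigned long pfn;  /* first page frame number of the chunk */
	unsigned int order; /* chunk covers 1 << order pages */
};

/* Largest order such that pfn is aligned to it and the chunk fits in nr. */
static unsigned int chunk_order(unsigned long pfn, unsigned long nr)
{
	unsigned int order = 0;

	while (order < SKETCH_MAX_ORDER &&
	       (pfn & ((1UL << (order + 1)) - 1)) == 0 &&
	       (1UL << (order + 1)) <= nr)
		order++;

	return order;
}

/* Split [pfn, pfn + nr) into aligned power-of-2 chunks; returns the count. */
static size_t decompose(unsigned long pfn, unsigned long nr,
			struct chunk *out, size_t cap)
{
	size_t n = 0;

	while (nr) {
		unsigned int order = chunk_order(pfn, nr);

		assert(n < cap); /* caller must provide enough slots */
		out[n].pfn = pfn;
		out[n].order = order;
		n++;

		pfn += 1UL << order;
		nr -= 1UL << order;
	}

	return n;
}
```

For example, a 14-page range starting at pfn 5 splits into five chunks: (pfn 5, order 0), (6, 1), (8, 3), (16, 1) and (18, 0). That is five free operations instead of fourteen.
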
Since each page is order-0, we must decrement each page's reference
count individually and only consider the page for freeing as part of a
high-order chunk if the reference count goes to zero. Additionally,
free_pages_prepare() must be called for each individual order-0 page,
so that the struct page state and global accounting state can be
appropriately managed. Once this is done, the resulting high-order
chunks can be freed as a unit to the pcp or buddy.

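The per-page reference handling can be modelled with a toy sketch. Everything here is an assumption made for illustration: `struct toy_page`, `toy_put_testzero()` and `collect_free_runs()` are hypothetical stand-ins, and the per-page preparation step that the commit message attributes to free_pages_prepare() appears only as a comment.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct page: only a reference count. */
struct toy_page {
	int refcount;
};

/* A maximal run of consecutive pages whose refcount dropped to zero. */
struct run {
	size_t start;
	size_t len;
};

/* Drop one reference; returns 1 if the count reached zero. */
static int toy_put_testzero(struct toy_page *p)
{
	return --p->refcount == 0;
}

/* Drop a ref on every page and collect the runs that can be freed. */
static size_t collect_free_runs(struct toy_page *pages, size_t nr,
				struct run *out, size_t cap)
{
	size_t nruns = 0, start = 0, len = 0, i;

	for (i = 0; i < nr; i++) {
		if (toy_put_testzero(&pages[i])) {
			/* A real implementation would prepare the page here. */
			if (!len)
				start = i;
			len++;
			continue;
		}

		/* Page is still referenced elsewhere: close the current run. */
		if (len) {
			assert(nruns < cap);
			out[nruns].start = start;
			out[nruns].len = len;
			nruns++;
			len = 0;
		}
	}

	if (len) {
		assert(nruns < cap);
		out[nruns].start = start;
		out[nruns].len = len;
		nruns++;
	}

	return nruns;
}
```

With reference counts {1, 1, 1, 2, 1, 1}, the fourth page keeps one reference and splits the range into two freeable runs: pages 0-2 and pages 4-5. Each run could then be broken into aligned power-of-2 chunks and freed as units.
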
This significantly speeds up the free operation and has the side
benefit that high-order blocks are added to the pcp instead of each page
ending up on the pcp order-0 list, so memory remains more readily
available in high orders.

vmalloc will shortly become a user of this new optimized
free_contig_range(), since it aggressively allocates high-order
non-compound pages but then calls split_page() to end up with
contiguous order-0 pages. These can now be freed much more efficiently.

The execution time of the following function was measured on a
server-class arm64 machine:

static int page_alloc_high_order_test(void)
{
	unsigned int order = HPAGE_PMD_ORDER;
	struct page *page;
	int i;

	for (i = 0; i < 100000; i++) {
		page = alloc_pages(GFP_KERNEL, order);
		if (!page)
			return -1;
		split_page(page, order);
		free_contig_range(page_to_pfn(page), 1UL << order);
	}

	return 0;
}

Execution time before: 4097358 usec
Execution time after:   729831 usec

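As a quick check on these two figures, the ratio and the average cost per loop iteration can be computed with a small helper. The function names `speedup()` and `per_iter_usec()` are made up for this sketch. The inputs are the totals quoted above and the 100000 iterations from the test function.

```c
#include <assert.h>

/* Ratio of the two measured run times. */
static double speedup(double before_usec, double after_usec)
{
	assert(after_usec > 0.0);
	return before_usec / after_usec;
}

/* Average time per loop iteration for a given total and iteration count. */
static double per_iter_usec(double total_usec, unsigned long iters)
{
	assert(iters > 0);
	return total_usec / (double)iters;
}
```

`speedup(4097358, 729831)` comes to roughly 5.6, and the per-iteration averages are about 41.0 usec before and 7.3 usec after. Each iteration also includes alloc_pages() and split_page(), so these figures cover the whole loop rather than the free path alone.
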
Perf trace before:

    99.63%     0.00%  kthreadd  [kernel.kallsyms]  [.] kthread
            |
            ---kthread
               0xffffb33c12a26af8
               |
               |--98.13%--0xffffb33c12a26060
               |          |
               |          |--97.37%--free_contig_range
               |          |          |
               |          |          |--94.93%--___free_pages
               |          |          |          |
               |          |          |          |--55.42%--__free_frozen_pages
               |          |          |          |          |
               |          |          |          |           --43.20%--free_frozen_page_commit
               |          |          |          |                      |
               |          |          |          |                       --35.37%--_raw_spin_unlock_irqrestore
               |          |          |          |
               |          |          |          |--11.53%--_raw_spin_trylock
               |          |          |          |
               |          |          |          |--8.19%--__preempt_count_dec_and_test
               |          |          |          |
               |          |          |          |--5.64%--_raw_spin_unlock
               |          |          |          |
               |          |          |          |--2.37%--__get_pfnblock_flags_mask.isra.0
               |          |          |          |
               |          |          |           --1.07%--free_frozen_page_commit
               |          |          |
               |          |           --1.54%--__free_frozen_pages
               |          |
               |           --0.77%--___free_pages
               |
                --0.98%--0xffffb33c12a26078
                          alloc_pages_noprof

Perf trace after:

     8.42%     2.90%  kthreadd  [kernel.kallsyms]  [k] __free_contig_range
            |
            |--5.52%--__free_contig_range
            |          |
            |          |--5.00%--free_prepared_contig_range
            |          |          |
            |          |          |--1.43%--__free_frozen_pages
            |          |          |          |
            |          |          |           --0.51%--free_frozen_page_commit
            |          |          |
            |          |          |--1.08%--_raw_spin_trylock
            |          |          |
            |          |           --0.89%--_raw_spin_unlock
            |          |
            |           --0.52%--free_pages_prepare
            |
             --2.90%--ret_from_fork
                       kthread
                       0xffffae1c12abeaf8
                       0xffffae1c12abe7a0
                       |
                        --2.69%--vfree
                                  __free_contig_range

Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Co-developed-by: Muhammad Usama Anjum <usama.anjum@arm.com>
Signed-off-by: Muhammad Usama Anjum <usama.anjum@arm.com>

Parent commit: d1344ad
2 files changed, 110 additions and 4 deletions