|
| 1 | +.. SPDX-License-Identifier: GPL-2.0-or-later |
| 2 | +
|
| 3 | +Ground Rules |
| 4 | +============ |
| 5 | + |
| 6 | +Do not work around kernel bugs |
| 7 | +------------------------------ |
| 8 | + |
| 9 | +We have decided that we will not work around bugs in upstream LTP sources. If a |
| 10 | +test fails on your system for a good reason, e.g. patch wasn't backported and |
| 11 | +the bug is present, work around for this will not be accepted upstream. The |
| 12 | +main reason for this decision is that this masks the failure for everyone else. |
| 13 | + |
| 14 | + |
| 15 | +Do not synchronize by sleep |
| 16 | +--------------------------- |
| 17 | + |
| 18 | +Why is sleep in tests bad then? |
| 19 | +``````````````````````````````` |
| 20 | + |
| 21 | +The first problem is that it will likely introduce very rare test failures, |
| 22 | +that means somebody has to spend time looking into these, which is a wasted |
| 23 | +effort. Nobody likes tests that will fail rarely for no good reason. Even more |
| 24 | +so you cannot run such tests with a background load to ensure that everything |
| 25 | +works correctly on a busy system, because that will increase the likehood of a |
| 26 | +failure. |
| 27 | + |
| 28 | +The second problem is that this wastes resources and slows down a test run. If |
| 29 | +you think that adding a sleep to a test is not a big deal, lets have a look at |
| 30 | +the bigger perspective. There are about 1600 syscall tests in Linux Test |
| 31 | +Project, if 7.5% of them would sleep just for one second, we would end up with |
| 32 | +two minutes of wasted time per testrun. In practice most of the tests, that |
| 33 | +historically misused sleep for synchronization, waited for much longer just to |
| 34 | +be sure that things will works even on slower hardware. With sleeps between 2 |
| 35 | +and 5 seconds that puts us somewhere between 4 and 10 minutes which is between |
| 36 | +13% and 33% of the syscall runtime on an outdated thinkpad, where the run |
| 37 | +finishes in a bit less than half an hour. It's even worse on newer hardware, |
| 38 | +because this slowdown will not change no matter how fast your machine is, which |
| 39 | +is maybe the reason why this was acceptable twenty years ago but it's not now. |
| 40 | + |
| 41 | + |
| 42 | +What to do instead? |
| 43 | +``````````````````` |
| 44 | + |
| 45 | +Use proper synchronization. |
| 46 | + |
| 47 | +There are different problems and different solutions. Most often test needs to |
| 48 | +synchronize between child and parent process. |
| 49 | + |
| 50 | +The easiest case is when parent needs to wait for a child to finish, that can |
| 51 | +be fixed just be adding a :manpage:`waitpid(2)` in the parent which ensures that child |
| 52 | +has finished before parent runs. |
| 53 | + |
| 54 | +Commonly child has to execute certain piece of code before parent can continue. |
| 55 | +For that LTP library implements checkpoints with simple |
| 56 | +:c:func:`TST_CHECKPOINT_WAIT()` and :c:func:`TST_CHECKPOINT_WAKE()` functions based |
| 57 | +on futexes on a piece of shared memory set up by the test library. |
| 58 | + |
| 59 | +Another common case is when a child must sleep in a syscall before parent can |
| 60 | +continue, for which we have a :c:func:`TST_PROCESS_STATE_WAIT()` helper that |
| 61 | +polls `/proc/$PID/stat`. |
| 62 | + |
| 63 | +Less often test needs to wait for an action that is done asynchronously, or for |
| 64 | +a kernel resource deallocation that is deferred to a later time. In such cases |
| 65 | +the best we can do is to poll. In LTP we ended up with a macro that polls by |
| 66 | +calling a piece of code in a loop with exponentially increasing sleeps between |
| 67 | +retries until it succeeds. Which means that instead of sleeping for a maximal |
| 68 | +time event can possibly take the sleep is capped by twice of the optimal |
| 69 | +sleeping time while we avoid polling too aggressively. |
| 70 | + |
| 71 | + |
| 72 | +Use runtime checks for kernel features |
| 73 | +-------------------------------------- |
| 74 | + |
| 75 | +What is and what isn't supported by kernel is determined by the version and |
| 76 | +configuration of the kernel the system is currently running on. That |
| 77 | +especially means that any checks done during the compilation cannot be used to |
| 78 | +assume features supported by the kernel the tests end up running on. The |
| 79 | +compile time checks, done by :master:`configure.ac` script, are only useful for |
| 80 | +enabling fallback kernel API definitions when missing, as we do in |
| 81 | +:master:`include/lapi/` directory. |
| 82 | + |
| 83 | + |
| 84 | +Don't require root unless it's essential |
| 85 | +---------------------------------------- |
| 86 | + |
| 87 | +If root/caps are needed, say why in the test doc comment. Drop privileges for |
| 88 | +the part that doesn't need them and avoid running the whole test as root |
| 89 | +“because it's easier”. |
| 90 | + |
| 91 | + |
| 92 | +Always clean up, even on failure |
| 93 | +-------------------------------- |
| 94 | + |
| 95 | +Every test should leave the system as it found it: unmount, restore sysctls, |
| 96 | +delete temp files/dirs, kill spawned processes, remove cgroups/namespaces, |
| 97 | +detach loop devices, restore ulimits, etc. Cleanup must run on early-exit |
| 98 | +paths too. |
| 99 | + |
| 100 | +The test library can simplify cleanup greatly as there are various helpers such as: |
| 101 | + |
| 102 | +- :c:type:`.needs_tmpdir = 1 <tst_test>` that creates and deletes a temporary directory for the test |
| 103 | +- :c:type:`.save_restore = 1 <tst_test>` that saves and restores /sys/ and /proc/ files |
| 104 | +- :c:type:`.needs_device = 1 <tst_test>` sets up and tears down a block device for the test |
| 105 | +- :c:type:`.restore_wallclock = 1 <tst_test>` that restores wall clock after the test |
| 106 | +- :c:type:`.needs_cgroup_ctrls = 1 <tst_test>` sets up and cleans up cgroups for the test |
| 107 | +- And many more. |
| 108 | + |
| 109 | + |
| 110 | +Write portable code |
| 111 | +------------------- |
| 112 | + |
| 113 | +Avoid nonstandard libc APIs when a portable equivalent exists; don't assume |
| 114 | +64-bit, page size, endianness, or particular tool versions. |
| 115 | + |
| 116 | +If the test is specific to a certain architecture, make sure that it at least |
| 117 | +compiles at the rest of architectures and set the |
| 118 | +:c:type:`.supported_archs = const char *const []){"s390x", ..., NULL} <tst_test>`. |
| 119 | + |
| 120 | +This also applies to shell code where it's easy to use bash features that are |
| 121 | +not available on other shell implementations, e.g. dash or busybox. Make sure |
| 122 | +to stick to portable POSIX shell whenever possible. |
| 123 | + |
| 124 | +You can check for common mistakes, not only in portability, with our |
| 125 | +``make check`` tooling. |
| 126 | + |
| 127 | + |
| 128 | +Split changed into well defined chunks |
| 129 | +-------------------------------------- |
| 130 | + |
| 131 | +When submitting patches make sure to split the work into small well-defined |
| 132 | +chunks. Patches that touch many files or mix unrelated changes and features are |
| 133 | +difficult to review and are likely to be delayed or even ignored. |
| 134 | + |
| 135 | +Aim for a single logical change per patch. Split more complex works into a |
| 136 | +patch series where each patch: |
| 137 | + |
| 138 | + - builds/compiles successfully |
| 139 | + - keeps tests and tooling functional |
| 140 | + - does not introduce intermediate breakage |
| 141 | + - has a clear commit message to explain the change |
| 142 | + - significant changes need to be detailed in the cover letter |
| 143 | + |
| 144 | + |
| 145 | +Be careful when using AI tools |
| 146 | +------------------------------ |
| 147 | + |
| 148 | +AI tools can be useful for executing, summarizing, or suggesting approaches, |
| 149 | +but they can also be confidently wrong and give an illusion of correctness. |
| 150 | +Treat AI output as untrusted: verify claims against the code, documentation, |
| 151 | +and actual behavior on a reproducer. |
| 152 | + |
| 153 | +Do not send AI-generated changes as raw patches. AI-generated diffs often |
| 154 | +contain irrelevant churn, incorrect assumptions, inconsistent style, or subtle |
| 155 | +bugs, which creates additional burden for maintainers to review and fix. |
| 156 | + |
| 157 | +Best practice is to write your own patches and have them reviewed by AI before |
| 158 | +submitting them, which helps add beneficial improvements to your work. |
| 159 | + |
| 160 | + |
| 161 | +Kernel features and RCs |
| 162 | +----------------------- |
| 163 | + |
| 164 | +LTP tests or fixes for kernel changes that have not yet been released may be |
| 165 | +posted to the LTP list for a review but they will not be be accepted until |
| 166 | +respective kernel changes are released. Review of such changes is also |
| 167 | +considered to be lower priority than rest of the changes. This is because |
| 168 | +kernel changes especially in the early RC phase are volatile and could be |
| 169 | +changed or reverted. |
| 170 | + |
| 171 | +These patches should also add a [STAGING] keyword into the patch subject, e.g. |
| 172 | +"Subject: [PATCH v1][STAGING] fanotify: add test for <feature> (requires v6.19-rc3)" |
| 173 | + |
| 174 | +In a case that a test for unreleased kernel is really needed to be merged we do |
| 175 | +not add it to the list of test executed by default and keep it in |
| 176 | +:master:`runtest/staging` file until the kernel code is finalized. |
0 commit comments