Skip to content

Commit 36b915c

Browse files
committed
doc: Add ground rules page
This is a continued effort to write down the unwritten rules we have in the project. Feel free to suggest more topics for the page. Reviewed-by: Petr Vorel <pvorel@suse.cz> Reviewed-by: Li Wang <liwang@redhat.com> Signed-off-by: Cyril Hrubis <chrubis@suse.cz>
1 parent 1990fe5 commit 36b915c

2 files changed

Lines changed: 177 additions & 0 deletions

File tree

doc/developers/ground_rules.rst

Lines changed: 176 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,176 @@
1+
.. SPDX-License-Identifier: GPL-2.0-or-later
2+
3+
Ground Rules
4+
============
5+
6+
Do not work around kernel bugs
7+
------------------------------
8+
9+
We have decided that we will not work around bugs in upstream LTP sources. If a
10+
test fails on your system for a good reason, e.g. patch wasn't backported and
11+
the bug is present, work around for this will not be accepted upstream. The
12+
main reason for this decision is that this masks the failure for everyone else.
13+
14+
15+
Do not synchronize by sleep
16+
---------------------------
17+
18+
Why is sleep in tests bad then?
19+
```````````````````````````````
20+
21+
The first problem is that it will likely introduce very rare test failures,
22+
that means somebody has to spend time looking into these, which is a wasted
23+
effort. Nobody likes tests that will fail rarely for no good reason. Even more
24+
so you cannot run such tests with a background load to ensure that everything
25+
works correctly on a busy system, because that will increase the likehood of a
26+
failure.
27+
28+
The second problem is that this wastes resources and slows down a test run. If
29+
you think that adding a sleep to a test is not a big deal, lets have a look at
30+
the bigger perspective. There are about 1600 syscall tests in Linux Test
31+
Project, if 7.5% of them would sleep just for one second, we would end up with
32+
two minutes of wasted time per testrun. In practice most of the tests, that
33+
historically misused sleep for synchronization, waited for much longer just to
34+
be sure that things will works even on slower hardware. With sleeps between 2
35+
and 5 seconds that puts us somewhere between 4 and 10 minutes which is between
36+
13% and 33% of the syscall runtime on an outdated thinkpad, where the run
37+
finishes in a bit less than half an hour. It's even worse on newer hardware,
38+
because this slowdown will not change no matter how fast your machine is, which
39+
is maybe the reason why this was acceptable twenty years ago but it's not now.
40+
41+
42+
What to do instead?
43+
```````````````````
44+
45+
Use proper synchronization.
46+
47+
There are different problems and different solutions. Most often test needs to
48+
synchronize between child and parent process.
49+
50+
The easiest case is when parent needs to wait for a child to finish, that can
51+
be fixed just be adding a :manpage:`waitpid(2)` in the parent which ensures that child
52+
has finished before parent runs.
53+
54+
Commonly child has to execute certain piece of code before parent can continue.
55+
For that LTP library implements checkpoints with simple
56+
:c:func:`TST_CHECKPOINT_WAIT()` and :c:func:`TST_CHECKPOINT_WAKE()` functions based
57+
on futexes on a piece of shared memory set up by the test library.
58+
59+
Another common case is when a child must sleep in a syscall before parent can
60+
continue, for which we have a :c:func:`TST_PROCESS_STATE_WAIT()` helper that
61+
polls `/proc/$PID/stat`.
62+
63+
Less often test needs to wait for an action that is done asynchronously, or for
64+
a kernel resource deallocation that is deferred to a later time. In such cases
65+
the best we can do is to poll. In LTP we ended up with a macro that polls by
66+
calling a piece of code in a loop with exponentially increasing sleeps between
67+
retries until it succeeds. Which means that instead of sleeping for a maximal
68+
time event can possibly take the sleep is capped by twice of the optimal
69+
sleeping time while we avoid polling too aggressively.
70+
71+
72+
Use runtime checks for kernel features
73+
--------------------------------------
74+
75+
What is and what isn't supported by kernel is determined by the version and
76+
configuration of the kernel the system is currently running on. That
77+
especially means that any checks done during the compilation cannot be used to
78+
assume features supported by the kernel the tests end up running on. The
79+
compile time checks, done by :master:`configure.ac` script, are only useful for
80+
enabling fallback kernel API definitions when missing, as we do in
81+
:master:`include/lapi/` directory.
82+
83+
84+
Don't require root unless it's essential
85+
----------------------------------------
86+
87+
If root/caps are needed, say why in the test doc comment. Drop privileges for
88+
the part that doesn't need them and avoid running the whole test as root
89+
“because it's easier”.
90+
91+
92+
Always clean up, even on failure
93+
--------------------------------
94+
95+
Every test should leave the system as it found it: unmount, restore sysctls,
96+
delete temp files/dirs, kill spawned processes, remove cgroups/namespaces,
97+
detach loop devices, restore ulimits, etc. Cleanup must run on early-exit
98+
paths too.
99+
100+
The test library can simplify cleanup greatly as there are various helpers such as:
101+
102+
- :c:type:`.needs_tmpdir = 1 <tst_test>` that creates and deletes a temporary directory for the test
103+
- :c:type:`.save_restore = 1 <tst_test>` that saves and restores /sys/ and /proc/ files
104+
- :c:type:`.needs_device = 1 <tst_test>` sets up and tears down a block device for the test
105+
- :c:type:`.restore_wallclock = 1 <tst_test>` that restores wall clock after the test
106+
- :c:type:`.needs_cgroup_ctrls = 1 <tst_test>` sets up and cleans up cgroups for the test
107+
- And many more.
108+
109+
110+
Write portable code
111+
-------------------
112+
113+
Avoid nonstandard libc APIs when a portable equivalent exists; don't assume
114+
64-bit, page size, endianness, or particular tool versions.
115+
116+
If the test is specific to a certain architecture, make sure that it at least
117+
compiles at the rest of architectures and set the
118+
:c:type:`.supported_archs = const char *const []){"s390x", ..., NULL} <tst_test>`.
119+
120+
This also applies to shell code where it's easy to use bash features that are
121+
not available on other shell implementations, e.g. dash or busybox. Make sure
122+
to stick to portable POSIX shell whenever possible.
123+
124+
You can check for common mistakes, not only in portability, with our
125+
``make check`` tooling.
126+
127+
128+
Split changed into well defined chunks
129+
--------------------------------------
130+
131+
When submitting patches make sure to split the work into small well-defined
132+
chunks. Patches that touch many files or mix unrelated changes and features are
133+
difficult to review and are likely to be delayed or even ignored.
134+
135+
Aim for a single logical change per patch. Split more complex works into a
136+
patch series where each patch:
137+
138+
- builds/compiles successfully
139+
- keeps tests and tooling functional
140+
- does not introduce intermediate breakage
141+
- has a clear commit message to explain the change
142+
- significant changes need to be detailed in the cover letter
143+
144+
145+
Be careful when using AI tools
146+
------------------------------
147+
148+
AI tools can be useful for executing, summarizing, or suggesting approaches,
149+
but they can also be confidently wrong and give an illusion of correctness.
150+
Treat AI output as untrusted: verify claims against the code, documentation,
151+
and actual behavior on a reproducer.
152+
153+
Do not send AI-generated changes as raw patches. AI-generated diffs often
154+
contain irrelevant churn, incorrect assumptions, inconsistent style, or subtle
155+
bugs, which creates additional burden for maintainers to review and fix.
156+
157+
Best practice is to write your own patches and have them reviewed by AI before
158+
submitting them, which helps add beneficial improvements to your work.
159+
160+
161+
Kernel features and RCs
162+
-----------------------
163+
164+
LTP tests or fixes for kernel changes that have not yet been released may be
165+
posted to the LTP list for a review but they will not be be accepted until
166+
respective kernel changes are released. Review of such changes is also
167+
considered to be lower priority than rest of the changes. This is because
168+
kernel changes especially in the early RC phase are volatile and could be
169+
changed or reverted.
170+
171+
These patches should also add a [STAGING] keyword into the patch subject, e.g.
172+
"Subject: [PATCH v1][STAGING] fanotify: add test for <feature> (requires v6.19-rc3)"
173+
174+
In a case that a test for unreleased kernel is really needed to be merged we do
175+
not add it to the list of test executed by default and keep it in
176+
:master:`runtest/staging` file until the kernel code is finalized.

doc/index.rst

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -19,6 +19,7 @@
1919
:hidden:
2020
:caption: For developers
2121

22+
developers/ground_rules
2223
developers/setup_mailinglist
2324
developers/writing_tests
2425
developers/test_case_tutorial

0 commit comments

Comments
 (0)