|
| 1 | +=============================================== |
| 2 | +Memory Tagging Extension (MTE) in AArch64 Linux |
| 3 | +=============================================== |
| 4 | + |
| 5 | +Authors: Vincenzo Frascino <vincenzo.frascino@arm.com> |
| 6 | + Catalin Marinas <catalin.marinas@arm.com> |
| 7 | + |
| 8 | +Date: 2020-02-25 |
| 9 | + |
| 10 | +This document describes the provision of the Memory Tagging Extension |
| 11 | +functionality in AArch64 Linux. |
| 12 | + |
| 13 | +Introduction |
| 14 | +============ |
| 15 | + |
| 16 | +ARMv8.5 based processors introduce the Memory Tagging Extension (MTE) |
| 17 | +feature. MTE is built on top of the ARMv8.0 virtual address tagging TBI |
| 18 | +(Top Byte Ignore) feature and allows software to access a 4-bit |
| 19 | +allocation tag for each 16-byte granule in the physical address space. |
| 20 | +Such a memory range must be mapped with the Normal-Tagged memory |
| 21 | +attribute. A logical tag is derived from bits 59-56 of the virtual |
| 22 | +address used for the memory access. A CPU with MTE enabled will compare |
| 23 | +the logical tag against the allocation tag and potentially raise an |
| 24 | +exception on mismatch, subject to system register configuration. |
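The bit layout described above can be sketched with a few pure helpers. This is only an illustration: the macro and function names (``MTE_GRANULE_SIZE``, ``mte_logical_tag()``, ``mte_with_tag()``, ``mte_granule_base()``) are hypothetical and not part of any kernel or libc interface. The helpers only manipulate address bits; they do not read or write allocation tags, which requires the MTE instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names, chosen for this sketch only. */
#define MTE_GRANULE_SIZE 16u      /* bytes covered by one allocation tag */
#define MTE_TAG_SHIFT    56       /* logical tag lives in address bits 59-56 */
#define MTE_TAG_MASK     0xfULL   /* 4-bit tag */

/* Extract the 4-bit logical tag from a 64-bit virtual address. */
static inline unsigned int mte_logical_tag(uint64_t addr)
{
	return (unsigned int)((addr >> MTE_TAG_SHIFT) & MTE_TAG_MASK);
}

/* Return addr with its logical tag replaced by tag (0..15). */
static inline uint64_t mte_with_tag(uint64_t addr, unsigned int tag)
{
	assert(tag <= MTE_TAG_MASK);
	addr &= ~(MTE_TAG_MASK << MTE_TAG_SHIFT);
	return addr | ((uint64_t)tag << MTE_TAG_SHIFT);
}

/* Round an address down to the start of its 16-byte tag granule. */
static inline uint64_t mte_granule_base(uint64_t addr)
{
	return addr & ~(uint64_t)(MTE_GRANULE_SIZE - 1);
}
```

For instance, ``mte_logical_tag()`` applied to an address whose top byte is ``0x05`` yields 5, and an access through that address is compared against the allocation tag of the granule starting at ``mte_granule_base()`` of the same address.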
| 25 | + |
| 26 | +Userspace Support |
| 27 | +================= |
| 28 | + |
| 29 | +When ``CONFIG_ARM64_MTE`` is selected and Memory Tagging Extension is |
| 30 | +supported by the hardware, the kernel advertises the feature to |
| 31 | +userspace via ``HWCAP2_MTE``. |
| 32 | + |
| 33 | +PROT_MTE |
| 34 | +-------- |
| 35 | + |
| 36 | +To access the allocation tags, a user process must enable the Tagged |
| 37 | +memory attribute on an address range using a new ``prot`` flag for |
| 38 | +``mmap()`` and ``mprotect()``: |
| 39 | + |
| 40 | +``PROT_MTE`` - Pages allow access to the MTE allocation tags. |
| 41 | + |
| 42 | +The allocation tag is set to 0 when such pages are first mapped in the |
| 43 | +user address space and preserved on copy-on-write. ``MAP_SHARED`` is |
| 44 | +supported and the allocation tags can be shared between processes. |
| 45 | + |
| 46 | +**Note**: ``PROT_MTE`` is only supported on ``MAP_ANONYMOUS`` and |
| 47 | +RAM-based file mappings (``tmpfs``, ``memfd``). Passing it to other |
| 48 | +types of mapping will result in ``-EINVAL`` returned by these system |
| 49 | +calls. |
| 50 | + |
| 51 | +**Note**: The ``PROT_MTE`` flag (and corresponding memory type) cannot |
| 52 | +be cleared by ``mprotect()``. |
| 53 | + |
| 54 | +**Note**: Memory ranges passed to ``madvise()`` with ``MADV_DONTNEED`` |
| 55 | +or ``MADV_FREE`` may have the allocation tags cleared (set to 0) at any |
| 56 | +point after the system call. |
| 57 | + |
| 58 | +Tag Check Faults |
| 59 | +---------------- |
| 60 | + |
| 61 | +When ``PROT_MTE`` is enabled on an address range and a mismatch between |
| 62 | +the logical and allocation tags occurs on access, there are three |
| 63 | +configurable behaviours: |
| 64 | + |
| 65 | +- *Ignore* - This is the default mode. The CPU (and kernel) ignores the |
| 66 | + tag check fault. |
| 67 | + |
| 68 | +- *Synchronous* - The kernel raises a ``SIGSEGV`` synchronously, with |
| 69 | + ``.si_code = SEGV_MTESERR`` and ``.si_addr = <fault-address>``. The |
| 70 | + memory access is not performed. If ``SIGSEGV`` is ignored or blocked |
| 71 | + by the offending thread, the containing process is terminated with a |
| 72 | + ``coredump``. |
| 73 | + |
| 74 | +- *Asynchronous* - The kernel raises a ``SIGSEGV``, in the offending |
| 75 | + thread, asynchronously following one or multiple tag check faults, |
| 76 | + with ``.si_code = SEGV_MTEAERR`` and ``.si_addr = 0`` (the faulting |
| 77 | + address is unknown). |
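A signal handler can tell the two reporting modes apart by inspecting ``si_code``. The sketch below shows only the classification step as a pure function. The enum and the name ``mte_classify_fault()`` are hypothetical, and the fallback values 8 and 9 are assumed to match ``include/uapi/asm-generic/siginfo.h``; they are only used when the libc headers do not already define the constants.

```c
#include <signal.h>

/* Assumed fallback values for libc headers that predate MTE support. */
#ifndef SEGV_MTEAERR
#define SEGV_MTEAERR 8	/* asynchronous MTE tag check fault */
#endif
#ifndef SEGV_MTESERR
#define SEGV_MTESERR 9	/* synchronous MTE tag check fault */
#endif

/* Hypothetical classification used by this sketch. */
enum mte_fault_kind {
	MTE_FAULT_NONE,		/* not an MTE tag check fault */
	MTE_FAULT_SYNC,		/* si_addr holds the faulting address */
	MTE_FAULT_ASYNC,	/* si_addr is 0, faulting address unknown */
};

/* Map a (signal number, si_code) pair onto the modes listed above. */
static inline enum mte_fault_kind mte_classify_fault(int signo, int si_code)
{
	if (signo != SIGSEGV)
		return MTE_FAULT_NONE;
	if (si_code == SEGV_MTESERR)
		return MTE_FAULT_SYNC;
	if (si_code == SEGV_MTEAERR)
		return MTE_FAULT_ASYNC;
	return MTE_FAULT_NONE;
}
```

In an ``SA_SIGINFO`` handler this would be called as ``mte_classify_fault(sig, info->si_code)``, and ``info->si_addr`` would only be consulted for the synchronous case.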
| 78 | + |
| 79 | +The user can select the above modes, per thread, using the |
| 80 | +``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where |
| 81 | +``flags`` contains one of the following values in the ``PR_MTE_TCF_MASK`` |
| 82 | +bit-field: |
| 83 | + |
| 84 | +- ``PR_MTE_TCF_NONE`` - *Ignore* tag check faults |
| 85 | +- ``PR_MTE_TCF_SYNC`` - *Synchronous* tag check fault mode |
| 86 | +- ``PR_MTE_TCF_ASYNC`` - *Asynchronous* tag check fault mode |
| 87 | + |
| 88 | +The current tag check fault mode can be read using the |
| 89 | +``prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0)`` system call. |
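Because the mode occupies a bit-field inside a larger control word, changing it is a read-modify-write on the value returned by ``PR_GET_TAGGED_ADDR_CTRL``. The sketch below shows that bit manipulation only. The ``PR_MTE_TCF_*`` values are copied from the example at the end of this document, while ``mte_ctrl_set_tcf()`` and ``mte_ctrl_tcf_name()`` are hypothetical helper names.

```c
#include <assert.h>

/* Values as listed in the example at the end of this document. */
#define PR_MTE_TCF_SHIFT	1
#define PR_MTE_TCF_NONE		(0UL << PR_MTE_TCF_SHIFT)
#define PR_MTE_TCF_SYNC		(1UL << PR_MTE_TCF_SHIFT)
#define PR_MTE_TCF_ASYNC	(2UL << PR_MTE_TCF_SHIFT)
#define PR_MTE_TCF_MASK		(3UL << PR_MTE_TCF_SHIFT)

/* Replace the tag check fault mode in ctrl, keeping all other bits. */
static inline unsigned long mte_ctrl_set_tcf(unsigned long ctrl,
					     unsigned long tcf)
{
	assert((tcf & ~PR_MTE_TCF_MASK) == 0);
	return (ctrl & ~PR_MTE_TCF_MASK) | tcf;
}

/* Describe the tag check fault mode encoded in ctrl. */
static inline const char *mte_ctrl_tcf_name(unsigned long ctrl)
{
	switch (ctrl & PR_MTE_TCF_MASK) {
	case PR_MTE_TCF_NONE:
		return "ignore";
	case PR_MTE_TCF_SYNC:
		return "synchronous";
	case PR_MTE_TCF_ASYNC:
		return "asynchronous";
	default:
		return "unknown";
	}
}
```

A thread would typically fetch the current word with ``prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0)``, pass a non-negative result through ``mte_ctrl_set_tcf()``, and hand the new value back via ``prctl(PR_SET_TAGGED_ADDR_CTRL, ctrl, 0, 0, 0)``.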
| 90 | + |
| 91 | +Tag checking can also be disabled for a user thread by setting the |
| 92 | +``PSTATE.TCO`` bit with ``MSR TCO, #1``. |
| 93 | + |
| 94 | +**Note**: Signal handlers are always invoked with ``PSTATE.TCO = 0``, |
| 95 | +irrespective of the interrupted context. ``PSTATE.TCO`` is restored on |
| 96 | +``sigreturn()``. |
| 97 | + |
| 98 | +**Note**: There are no *match-all* logical tags available for user |
| 99 | +applications. |
| 100 | + |
| 101 | +**Note**: Kernel accesses to the user address space (e.g. ``read()`` |
| 102 | +system call) are not checked if the user thread tag checking mode is |
| 103 | +``PR_MTE_TCF_NONE`` or ``PR_MTE_TCF_ASYNC``. If the tag checking mode is |
| 104 | +``PR_MTE_TCF_SYNC``, the kernel makes a best effort to check its user |
| 105 | +address accesses, but it cannot always guarantee that they are checked. |
| 106 | + |
| 107 | +Excluding Tags in the ``IRG``, ``ADDG`` and ``SUBG`` instructions |
| 108 | +----------------------------------------------------------------- |
| 109 | + |
| 110 | +The architecture allows certain tags to be excluded from random |
| 111 | +generation via the ``GCR_EL1.Exclude`` register bit-field. By default, Linux |
| 112 | +excludes all tags other than 0. A user thread can enable specific tags |
| 113 | +in the randomly generated set using the ``prctl(PR_SET_TAGGED_ADDR_CTRL, |
| 114 | +flags, 0, 0, 0)`` system call where ``flags`` contains the tags bitmap |
| 115 | +in the ``PR_MTE_TAG_MASK`` bit-field. |
| 116 | + |
| 117 | +**Note**: The hardware uses an exclude mask but the ``prctl()`` |
| 118 | +interface provides an include mask. An include mask of ``0`` (exclusion |
| 119 | +mask ``0xffff``) results in the CPU always generating tag ``0``. |
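The relationship between the include mask seen by ``prctl()`` and the exclude mask seen by the hardware is a 16-bit complement. The following sketch makes that explicit; the ``PR_MTE_TAG_*`` values are copied from the example at the end of this document, and the three helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Values as listed in the example at the end of this document. */
#define PR_MTE_TAG_SHIFT	3
#define PR_MTE_TAG_MASK		(0xffffUL << PR_MTE_TAG_SHIFT)

/* Include-mask bit for a single tag (0..15). */
static inline uint16_t mte_include_tag(unsigned int tag)
{
	assert(tag < 16);
	return (uint16_t)(1u << tag);
}

/* Position a 16-bit include mask inside the PR_MTE_TAG_MASK bit-field. */
static inline unsigned long mte_include_bits(uint16_t include)
{
	return ((unsigned long)include << PR_MTE_TAG_SHIFT) & PR_MTE_TAG_MASK;
}

/* Exclude mask that corresponds to a given include mask. */
static inline uint16_t mte_include_to_exclude(uint16_t include)
{
	return (uint16_t)~include;
}
```

With these helpers, an include mask of ``0xfffe`` (all non-zero tags) corresponds to an exclude mask of ``0x0001``, and ``mte_include_bits(0xfffe)`` gives the value that would be OR-ed into ``flags``.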
| 120 | + |
| 121 | +Initial process state |
| 122 | +--------------------- |
| 123 | + |
| 124 | +On ``execve()``, the new process has the following configuration: |
| 125 | + |
| 126 | +- ``PR_TAGGED_ADDR_ENABLE`` set to 0 (disabled) |
| 127 | +- Tag checking mode set to ``PR_MTE_TCF_NONE`` |
| 128 | +- ``PR_MTE_TAG_MASK`` set to 0 (all tags excluded) |
| 129 | +- ``PSTATE.TCO`` set to 0 |
| 130 | +- ``PROT_MTE`` not set on any of the initial memory maps |
| 131 | + |
| 132 | +On ``fork()``, the new process inherits the parent's configuration and |
| 133 | +memory map attributes with the exception of the ``madvise()`` ranges |
| 134 | +with ``MADV_WIPEONFORK`` which will have the data and tags cleared (set |
| 135 | +to 0). |
| 136 | + |
| 137 | +The ``ptrace()`` interface |
| 138 | +-------------------------- |
| 139 | + |
| 140 | +``PTRACE_PEEKMTETAGS`` and ``PTRACE_POKEMTETAGS`` allow a tracer to read |
| 141 | +the tags from or set the tags to a tracee's address space. The |
| 142 | +``ptrace()`` system call is invoked as ``ptrace(request, pid, addr, |
| 143 | +data)`` where: |
| 144 | + |
| 145 | +- ``request`` - one of ``PTRACE_PEEKMTETAGS`` or ``PTRACE_POKEMTETAGS``. |
| 146 | +- ``pid`` - the tracee's PID. |
| 147 | +- ``addr`` - address in the tracee's address space. |
| 148 | +- ``data`` - pointer to a ``struct iovec`` where ``iov_base`` points to |
| 149 | + a buffer of ``iov_len`` length in the tracer's address space. |
| 150 | + |
| 151 | +The tags in the tracer's ``iov_base`` buffer are represented as one |
| 152 | +4-bit tag per byte, each corresponding to a 16-byte MTE tag granule in the |
| 153 | +tracee's address space. |
| 154 | + |
| 155 | +**Note**: If ``addr`` is not aligned to a 16-byte granule, the kernel |
| 156 | +will use the corresponding aligned address. |
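Since each buffer byte stands for one 16-byte granule, a tracer has to size ``iov_len`` in granules rather than bytes. The sketch below computes that count for a byte range, assuming the tracer wants the tag of every granule the range overlaps. ``mte_tag_count()`` and ``MTE_GRANULE_SIZE`` are hypothetical names used only here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MTE_GRANULE_SIZE 16u	/* bytes covered by one allocation tag */

/*
 * Number of tag bytes needed to describe [addr, addr + len): one byte
 * per 16-byte granule that the range touches. An unaligned addr is
 * treated as if it were rounded down to its granule.
 */
static inline size_t mte_tag_count(uint64_t addr, size_t len)
{
	uint64_t first, last;

	if (len == 0)
		return 0;
	assert(addr <= UINT64_MAX - (len - 1));	/* range must not wrap */

	first = addr / MTE_GRANULE_SIZE;
	last = (addr + (len - 1)) / MTE_GRANULE_SIZE;
	return (size_t)(last - first + 1);
}
```

A tracer could allocate ``mte_tag_count(addr, len)`` bytes for ``iov_base`` and set ``iov_len`` to the same value before issuing the request. A 4096-byte page then needs 256 tag bytes.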
| 157 | + |
| 158 | +``ptrace()`` return value: |
| 159 | + |
| 160 | +- 0 - tags were copied and the tracer's ``iov_len`` was updated to the |
| 161 | + number of tags transferred. This may be smaller than the requested |
| 162 | + ``iov_len`` if the requested address range in the tracee's or the |
| 163 | + tracer's space cannot be accessed or does not have valid tags. |
| 164 | +- ``-EPERM`` - the specified process cannot be traced. |
| 165 | +- ``-EIO`` - the tracee's address range cannot be accessed (e.g. invalid |
| 166 | + address) and no tags were copied. ``iov_len`` is not updated. |
| 167 | +- ``-EFAULT`` - fault on accessing the tracer's memory (``struct iovec`` |
| 168 | + or ``iov_base`` buffer) and no tags were copied. ``iov_len`` is not updated. |
| 169 | +- ``-EOPNOTSUPP`` - the tracee's address does not have valid tags (never |
| 170 | + mapped with the ``PROT_MTE`` flag). ``iov_len`` is not updated. |
| 171 | + |
| 172 | +**Note**: There are no transient errors for the requests above, so user |
| 173 | +programs should not retry in case of a non-zero system call return. |
| 174 | + |
| 175 | +``PTRACE_GETREGSET`` and ``PTRACE_SETREGSET`` with ``addr == |
| 176 | +NT_ARM_TAGGED_ADDR_CTRL`` allow ``ptrace()`` access to the tagged |
| 177 | +address ABI control and MTE configuration of a process as per the |
| 178 | +``prctl()`` options described in |
| 179 | +Documentation/arm64/tagged-address-abi.rst and above. The corresponding |
| 180 | +``regset`` is 1 element of 8 bytes (``sizeof(long)``). |
| 181 | + |
| 182 | +Example of correct usage |
| 183 | +======================== |
| 184 | + |
| 185 | +*MTE Example code* |
| 186 | + |
| 187 | +.. code-block:: c |
| 188 | +
|
| 189 | + /* |
| 190 | + * To be compiled with -march=armv8.5-a+memtag |
| 191 | + */ |
| 192 | + #include <errno.h> |
| 193 | + #include <stdint.h> |
| 194 | + #include <stdio.h> |
| 195 | + #include <stdlib.h> |
| 196 | + #include <unistd.h> |
| 197 | + #include <sys/auxv.h> |
| 198 | + #include <sys/mman.h> |
| 199 | + #include <sys/prctl.h> |
| 200 | +
|
| 201 | + /* |
| 202 | + * From arch/arm64/include/uapi/asm/hwcap.h |
| 203 | + */ |
| 204 | + #define HWCAP2_MTE (1 << 18) |
| 205 | +
|
| 206 | + /* |
| 207 | + * From arch/arm64/include/uapi/asm/mman.h |
| 208 | + */ |
| 209 | + #define PROT_MTE 0x20 |
| 210 | +
|
| 211 | + /* |
| 212 | + * From include/uapi/linux/prctl.h |
| 213 | + */ |
| 214 | + #define PR_SET_TAGGED_ADDR_CTRL 55 |
| 215 | + #define PR_GET_TAGGED_ADDR_CTRL 56 |
| 216 | + # define PR_TAGGED_ADDR_ENABLE (1UL << 0) |
| 217 | + # define PR_MTE_TCF_SHIFT 1 |
| 218 | + # define PR_MTE_TCF_NONE (0UL << PR_MTE_TCF_SHIFT) |
| 219 | + # define PR_MTE_TCF_SYNC (1UL << PR_MTE_TCF_SHIFT) |
| 220 | + # define PR_MTE_TCF_ASYNC (2UL << PR_MTE_TCF_SHIFT) |
| 221 | + # define PR_MTE_TCF_MASK (3UL << PR_MTE_TCF_SHIFT) |
| 222 | + # define PR_MTE_TAG_SHIFT 3 |
| 223 | + # define PR_MTE_TAG_MASK (0xffffUL << PR_MTE_TAG_SHIFT) |
| 224 | +
|
| 225 | + /* |
| 226 | + * Insert a random logical tag into the given pointer. |
| 227 | + */ |
| 228 | + #define insert_random_tag(ptr) ({ \ |
| 229 | + uint64_t __val; \ |
| 230 | + asm("irg %0, %1" : "=r" (__val) : "r" (ptr)); \ |
| 231 | + __val; \ |
| 232 | + }) |
| 233 | +
|
| 234 | + /* |
| 235 | + * Set the allocation tag on the destination address. |
| 236 | + */ |
| 237 | + #define set_tag(tagged_addr) do { \ |
| 238 | + asm volatile("stg %0, [%0]" : : "r" (tagged_addr) : "memory"); \ |
| 239 | + } while (0) |
| 240 | +
|
| 241 | + int main(void) |
| 242 | + { |
| 243 | + unsigned char *a; |
| 244 | + unsigned long page_sz = sysconf(_SC_PAGESIZE); |
| 245 | + unsigned long hwcap2 = getauxval(AT_HWCAP2); |
| 246 | +
|
| 247 | + /* check if MTE is present */ |
| 248 | + if (!(hwcap2 & HWCAP2_MTE)) |
| 249 | + return EXIT_FAILURE; |
| 250 | +
|
| 251 | + /* |
| 252 | + * Enable the tagged address ABI, synchronous MTE tag check faults and |
| 253 | + * allow all non-zero tags in the randomly generated set. |
| 254 | + */ |
| 255 | + if (prctl(PR_SET_TAGGED_ADDR_CTRL, |
| 256 | + PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_SYNC | (0xfffe << PR_MTE_TAG_SHIFT), |
| 257 | + 0, 0, 0)) { |
| 258 | + perror("prctl() failed"); |
| 259 | + return EXIT_FAILURE; |
| 260 | + } |
| 261 | +
|
| 262 | + a = mmap(0, page_sz, PROT_READ | PROT_WRITE, |
| 263 | + MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); |
| 264 | + if (a == MAP_FAILED) { |
| 265 | + perror("mmap() failed"); |
| 266 | + return EXIT_FAILURE; |
| 267 | + } |
| 268 | +
|
| 269 | + /* |
| 270 | + * Enable MTE on the above anonymous mmap. The flag could instead be |
| 271 | + * passed directly to mmap(), making this step unnecessary. |
| 272 | + */ |
| 273 | + if (mprotect(a, page_sz, PROT_READ | PROT_WRITE | PROT_MTE)) { |
| 274 | + perror("mprotect() failed"); |
| 275 | + return EXIT_FAILURE; |
| 276 | + } |
| 277 | +
|
| 278 | + /* access with the default tag (0) */ |
| 279 | + a[0] = 1; |
| 280 | + a[1] = 2; |
| 281 | +
|
| 282 | + printf("a[0] = %hhu a[1] = %hhu\n", a[0], a[1]); |
| 283 | +
|
| 284 | + /* set the logical and allocation tags */ |
| 285 | + a = (unsigned char *)insert_random_tag(a); |
| 286 | + set_tag(a); |
| 287 | +
|
| 288 | + printf("%p\n", a); |
| 289 | +
|
| 290 | + /* non-zero tag access */ |
| 291 | + a[0] = 3; |
| 292 | + printf("a[0] = %hhu a[1] = %hhu\n", a[0], a[1]); |
| 293 | +
|
| 294 | + /* |
| 295 | + * If MTE is enabled correctly the next instruction will generate an |
| 296 | + * exception. |
| 297 | + */ |
| 298 | + printf("Expecting SIGSEGV...\n"); |
| 299 | + a[16] = 0xdd; |
| 300 | +
|
| 301 | + /* this should not be printed in the PR_MTE_TCF_SYNC mode */ |
| 302 | + printf("...haven't got one\n"); |
| 303 | +
|
| 304 | + return EXIT_FAILURE; |
| 305 | + } |