Users may wish to get the status of the available allocatable/usable memory. Since this is transient information based on the overall state of device usage, the user needs to invoke the extension at each point of interest. This extension provides extended information about the usable memory size available on the device. The extension introduces the ${x}_device_usablemem_size_ext_properties_t struct, which can be passed to ${x}DeviceGetProperties via the `pNext` member of ${x}_device_properties_t.

The following pseudo-code demonstrates a sequence for obtaining extended information about the usable memory size:

.. parsed-literal::

@@ -49,9 +49,8 @@ The following psuedo-code demonstrates a sequence for obtaining extended informa
The user can adjust Event synchronization modes by passing the ${x}_event_sync_mode_desc_t struct as `pNext` during Event creation.

Low power wait
^^^^^^^^^^^^^^^

When the ${X}_EVENT_SYNC_MODE_FLAG_LOW_POWER_WAIT flag is enabled, the driver optimizes Event host synchronization calls such as ${x}EventHostSynchronize to use CPU threads more efficiently. For example, instead of actively polling a memory location, it may use OS methods to put the CPU thread to sleep.
Changing this mode may impact completion latency.
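
The trade-off between active polling and an OS-assisted sleep can be sketched with standard C++ primitives. This is only an illustrative host-side model, not the driver's implementation: ``HostEvent``, ``waitPolling`` and ``waitSleeping`` are hypothetical names that do not exist in the API.

.. code-block:: cpp

    #include <atomic>
    #include <chrono>
    #include <condition_variable>
    #include <mutex>
    #include <thread>

    // Hypothetical host-side model of an event; not a Level Zero type.
    struct HostEvent {
        std::atomic<bool> signaled{false};
        std::mutex m;
        std::condition_variable cv;

        // Called by the producer side when the work completes.
        void signal() {
            {
                std::lock_guard<std::mutex> lock(m);
                signaled.store(true, std::memory_order_release);
            }
            cv.notify_all();
        }

        // Active polling: lowest latency, but keeps a CPU core busy.
        void waitPolling() const {
            while (!signaled.load(std::memory_order_acquire)) {
                std::this_thread::yield();
            }
        }

        // Low power style wait: the thread sleeps in the OS until notified.
        // Returns false if the timeout expires before the event is signaled.
        bool waitSleeping(std::chrono::milliseconds timeout) {
            std::unique_lock<std::mutex> lock(m);
            return cv.wait_for(lock, timeout, [this] {
                return signaled.load(std::memory_order_acquire);
            });
        }
    };

In this model the sleeping waiter pays for a wake-up through the OS scheduler, which is the kind of added completion latency mentioned above, while the polling waiter reacts immediately at the cost of CPU time.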

Interrupts
^^^^^^^^^^

When the ${X}_EVENT_SYNC_MODE_FLAG_SIGNAL_INTERRUPT flag is enabled, the driver may program additional GPU commands related to signaling the Event on the Device. Those commands generate a system interrupt.
The interrupt may be used as an additional signal to wake up a CPU thread that is waiting for Event completion in low power mode.
The driver may select which API calls are applicable for generating interrupts.

Additionally, the user may provide an external interrupt id (${X}_EVENT_SYNC_MODE_FLAG_EXTERNAL_INTERRUPT_WAIT). As in low power mode, OS methods are then used for Event host synchronization calls to optimize waiting for completion.
This mode can be used only with Counter Based Events.

.. _counter-based-events:

Counter Based Events
~~~~~~~~~~~~~~~~~~~~

This type of event, referred to as a Counter Based (CB) Event, does not require an event pool, as the related allocations are managed internally by the driver. This reduces the overhead on the host for managing pool allocations.
The CB Event can only be signaled on the GPU using an in-order command list.

Every in-order command list has an internal submission counter that is updated with each append call. This counter manages internal in-order dependencies. The next append call waits for that counter implicitly.
Note that some operations may be optimized, and the counter value may not directly correspond to the number of append calls.

When a CB Event is passed as a signal event, it points to a specific counter value and memory location. Since the command list manages the counter allocation, this method avoids producing additional GPU memory operations (except timestamps). As a result, users do not need to explicitly control event completion before reusing it.
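
The relationship between an in-order command list counter and a CB Event can be sketched as a small host-only model. Everything here is an assumption made for illustration: ``InOrderCmdListModel``, ``CbEventModel`` and ``appendWithSignal`` are hypothetical names, and the model assumes one counter step per append, which the driver does not guarantee.

.. code-block:: cpp

    #include <cstdint>

    // Hypothetical model; none of these types exist in the Level Zero API.
    struct InOrderCmdListModel {
        uint64_t counterMemory = 0;  // value written as work completes
        uint64_t submitted = 0;      // internal submission counter

        // Each append advances the submission counter and returns the value
        // the counter memory will hold once this append has completed.
        uint64_t append() { return ++submitted; }

        // Simulates the GPU finishing all work up to and including `value`.
        void completeUpTo(uint64_t value) {
            if (value > counterMemory) counterMemory = value;
        }
    };

    struct CbEventModel {
        const uint64_t* counterAddress = nullptr;  // memory location to compare
        uint64_t expectedValue = 0;                // counter value to wait for

        // Signaling replaces the previous state; no reset call is needed.
        void attach(const InOrderCmdListModel& list, uint64_t value) {
            counterAddress = &list.counterMemory;
            expectedValue = value;
        }

        // Completion is a single compare against the counter memory.
        // Reporting an unattached event as incomplete is a choice of this model.
        bool isComplete() const {
            return counterAddress != nullptr && *counterAddress >= expectedValue;
        }
    };

    // Appends one operation and makes `ev` point at its counter value.
    inline uint64_t appendWithSignal(InOrderCmdListModel& list, CbEventModel& ev) {
        const uint64_t value = list.append();
        ev.attach(list, value);
        return value;
    }

Because ``attach`` overwrites both the address and the expected value, the same event object can be passed to a different command list without first waiting for the earlier signal to complete.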

Key features
^^^^^^^^^^^^^^^^^^^^^
@@ -1390,32 +1391,31 @@ Regular Event rely on memory state controlled by the user (explicit Reset calls)
    // cmdList2 can still be running on GPU. It waits for counter=X on memory CL1_alloc.
    // It's also safe to delete the Event object.

    ${x}EventHostSynchronize(event1, UINT32_MAX); // wait for counter=Y on memory CL3_alloc

IPC sharing
^^^^^^^^^^^

As mentioned previously, signaling a CB Event replaces its state. This is why IPC sharing is one-directional: an opened event can be used only for waiting/querying (on host and GPU).

Both Event objects (original and shared) are independent. There is no need to wait for completion before reusing them.
The second process points to the original state until ${x}EventCounterBasedCloseIpcHandle is called.
The original Event state may be changed without waiting for completion; the second process is not affected.

Counter Based Events have dedicated API calls to handle IPC operations: ${x}EventCounterBasedGetIpcHandle, ${x}EventCounterBasedOpenIpcHandle, ${x}EventCounterBasedCloseIpcHandle.

**Timestamps are not allowed for IPC sharing.**

Obtaining counter memory and value
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The user may obtain the counter memory location and value using ${x}EventCounterBasedGetDeviceAddress, for example to wait for completion outside the L0 Driver. If the Event state is replaced by a new append call or by a ${x}CommandQueueExecuteCommandLists call that signals the Event, this API must be called again to obtain the new data.

Multi directional dependencies on Regular command lists
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A Regular command list with overlapping dependencies may be executed multiple times. For example, two command lists are executed in parallel with bi-directional dependencies.
It is important to understand the counter (Event) state transitions in order to correctly reflect the user's intention.

.. parsed-literal::

@@ -1425,8 +1425,8 @@ Its important to understand counter (Event) state transition, to correctly refle
V |
regularCmdList2: (wait for A) -------------> (B) -----> (D)

In this example, all Events are synchronized to the "ready" state after the first execution.
This means that the second execution of `regularCmdList1` again waits for the "ready" `{1->2->3}` state of `regularCmdList2` (first execution) instead of `{4->5->6}`.
This is because `regularCmdList2` has not yet been executed for the second time, so its counters have not been updated.

First execution:
@@ -1452,13 +1452,13 @@ Second execution:

Different approach:

To avoid the above situation, the user must remove all bi-directional dependencies, either by using a single command list (if possible) or by splitting the workload into different command lists with single-directional dependencies.

Using Counter Based Events for such scenarios is not always the optimal usage mode. It may be better to use Regular Events with explicit Reset calls.

External synchronization allocation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The user may optionally specify an externally managed counter allocation and value. This can be done by passing ${x}_event_counter_based_external_sync_allocation_desc_t as an extension of ${x}_event_counter_based_desc_t.

Requirements:
@@ -1472,7 +1472,7 @@ Requirements:

External aggregate storage
^^^^^^^^^^^^^^^^^^^^^^^^^^

An aggregated storage event is a special use case for CB Events. It can be signaled from multiple append calls, but waiting requires only one memory compare operation.
It can be created by passing ${x}_event_counter_based_external_aggregate_storage_desc_t as an extension of ${x}_event_counter_based_desc_t.
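
The idea of many signals feeding one compare can be sketched as a host-only model. This is an assumption-laden illustration rather than the descriptor's actual layout: ``AggregateStorageModel`` and its field names are hypothetical, and the model assumes that every signaling append adds a fixed increment to a single storage location.

.. code-block:: cpp

    #include <atomic>
    #include <cstdint>

    // Hypothetical model of aggregated storage; field names are illustrative.
    struct AggregateStorageModel {
        std::atomic<uint64_t> storage{0};  // single externally managed location
        uint64_t incrementValue = 1;       // added by every signaling append
        uint64_t completionValue = 0;      // value at which the event is complete

        // Each signaling append contributes one increment when it finishes.
        void signalFromAppend() {
            storage.fetch_add(incrementValue, std::memory_order_release);
        }

        // Waiting needs one compare, however many appends signal the event.
        bool isComplete() const {
            return storage.load(std::memory_order_acquire) >= completionValue;
        }
    };

With an increment of 2 and a completion value of 6, for instance, the model reports completion only after the third signaling append has finished.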