You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: docs/source/core.rst
+32-8Lines changed: 32 additions & 8 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -15,7 +15,9 @@ Map, emit, and sink
15
15
map
16
16
sink
17
17
18
-
You can create a basic pipeline by instantiating the ``Streamz`` object and then using methods like ``map``, ``accumulate``, and ``sink``.
18
+
You can create a basic pipeline by instantiating the ``Streamz``
19
+
object and then using methods like ``map``, ``accumulate``, and
20
+
``sink``.
19
21
20
22
.. code-block:: python
21
23
@@ -27,7 +29,10 @@ You can create a basic pipeline by instantiating the ``Streamz`` object and then
27
29
source = Stream()
28
30
source.map(increment).sink(print)
29
31
30
-
The ``map`` and ``sink`` methods both take a function and apply that function to every element in the stream. The ``map`` method returns a new stream with the modified elements while ``sink`` is typically used at the end of a stream for final actions.
32
+
The ``map`` and ``sink`` methods both take a function and apply that
33
+
function to every element in the stream. The ``map`` method returns a
34
+
new stream with the modified elements while ``sink`` is typically used
35
+
at the end of a stream for final actions.
31
36
32
37
To push data through our pipeline we call ``emit``
33
38
@@ -383,14 +388,33 @@ want to read further about :doc:`collections <collections>`
383
388
Metadata
384
389
--------
385
390
386
-
Metadata can be emitted into the pipeline to accompany the data as a list of dictionaries. Most functions will pass the metadata to the downstream function without making any changes. However, functions that make the pipeline asynchronous require logic that dictates how and when the metadata will be passed downstream. Synchronous functions and asynchronous functions that have a 1:1 ratio of the number of values on the input to the number of values on the output will emit the metadata collection without any modification. However, functions that have multiple input streams or emit collections of data will emit the metadata associated with the emitted data as a collection.
391
+
Metadata can be emitted into the pipeline to accompany the data as a
392
+
list of dictionaries. Most functions will pass the metadata to the
393
+
downstream function without making any changes. However, functions
394
+
that make the pipeline asynchronous require logic that dictates how
395
+
and when the metadata will be passed downstream. Synchronous functions
396
+
and asynchronous functions that have a 1:1 ratio of the number of
397
+
values on the input to the number of values on the output will emit
398
+
the metadata collection without any modification. However, functions
399
+
that have multiple input streams or emit collections of data will emit
400
+
the metadata associated with the emitted data as a collection.
387
401
388
402
389
403
Reference Counting and Checkpointing
390
404
------------------------------------
391
405
392
-
Checkpointing is achieved in Streamz through the use of reference counting. With this method, a checkpoint can be saved when and only when data has progressed through all of the the pipeline without any issues. This prevents data loss and guarantees at-least-once semantics.
393
-
394
-
Any node that caches or holds data after it returns increments the reference counter associated with the given data by one. When a node is no longer holding the data, it will release it by decrementing the counter by one. When the counter changes to zero, a callback associated with the data is triggered.
395
-
396
-
References are passed in the metadata as a value of the `ref` keyword. Each metadata object contains only one reference counter object.
406
+
Checkpointing is achieved in Streamz through the use of reference
407
+
counting. With this method, a checkpoint can be saved when and only
408
+
when data has progressed through all of the the pipeline without any
409
+
issues. This prevents data loss and guarantees at-least-once
410
+
semantics.
411
+
412
+
Any node that caches or holds data after it returns increments the
413
+
reference counter associated with the given data by one. When a node
414
+
is no longer holding the data, it will release it by decrementing the
415
+
counter by one. When the counter changes to zero, a callback
416
+
associated with the data is triggered.
417
+
418
+
References are passed in the metadata as a value of the `ref`
419
+
keyword. Each metadata object contains only one reference counter
0 commit comments