Commit 2bf3fae

Merge branch 'master' of github.com:python-streamz/streamz into 287

2 parents: 37caec9 + 9c8d3bb

18 files changed: 1021 additions & 130 deletions

.coveragerc

Lines changed: 1 addition & 0 deletions
@@ -12,3 +12,4 @@ omit =
 
 exclude_lines =
     if __name__ == '__main__':
+    pragma: no cover

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -12,3 +12,4 @@ log
 *.swo
 .cache/
 .ipynb_checkpoints/
+.vscode

.travis.yml

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ language: python
 
 matrix:
   include:
-  - python: 3.6
+  - python: 3.7
 
     env:
     - STREAMZ_LAUNCH_KAFKA=true

README.rst

Lines changed: 4 additions & 2 deletions
@@ -1,11 +1,11 @@
 Streamz
 =======
 
-|Build Status| |Doc Status| |Version Status|
+|Build Status| |Doc Status| |Version Status| |RAPIDS custreamz gpuCI|
 
 Streamz helps you build pipelines to manage continuous streams of data. It is simple to use in simple cases, but also supports complex pipelines that involve branching, joining, flow control, feedback, back pressure, and so on.
 
-Optionally, Streamz can also work with Pandas dataframes to provide sensible streaming operations on continuous tabular data.
+Optionally, Streamz can also work with both `Pandas <https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html>`_ and `cuDF <https://docs.rapids.ai/api/cudf/stable/>`_ dataframes, to provide sensible streaming operations on continuous tabular data.
 
 To learn more about how to use Streamz see documentation at `streamz.readthedocs.org <https://streamz.readthedocs.org>`_.
 
@@ -21,3 +21,5 @@ BSD-3 Clause
    :alt: Documentation Status
 .. |Version Status| image:: https://img.shields.io/pypi/v/streamz.svg
    :target: https://pypi.python.org/pypi/streamz/
+.. |RAPIDS custreamz gpuCI| image:: https://img.shields.io/badge/gpuCI-custreamz-green
+   :target: https://github.com/jdye64/cudf/blob/kratos/python/custreamz/custreamz/kafka.py

docs/source/api.rst

Lines changed: 1 addition & 0 deletions
@@ -95,6 +95,7 @@ Definitions
 
 .. autofunction:: filenames
 .. autofunction:: from_kafka
+.. autofunction:: from_kafka_batched
 .. autofunction:: from_textfile
 
 .. currentmodule:: streamz.dask

docs/source/core.rst

Lines changed: 18 additions & 1 deletion
@@ -156,7 +156,8 @@ Branching and Joining
    zip_latest
 
 You can branch multiple streams off of a single stream. Elements that go into
-the input will pass through to both output streams.
+the input will pass through to both output streams. Note: ``graphviz`` and
+``networkx`` need to be installed to visualize the stream graph.
 
 .. code-block:: python
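The doc change above notes that elements emitted into a stream pass through to every branch. A toy model of that behavior (a simplified sketch for illustration, not the actual streamz API):

```python
# Toy model of stream branching: one source, several downstream branches;
# every emitted element reaches all of them. Not the real streamz classes.
class ToyStream:
    def __init__(self, fn=lambda x: x):
        self.fn = fn
        self.children = []

    def map(self, fn):
        # Create a downstream node that transforms each element.
        child = ToyStream(fn)
        self.children.append(child)
        return child

    def sink(self, collect):
        # Terminal node: hand each element to a plain callback.
        child = ToyStream()
        child.emit = collect
        self.children.append(child)
        return child

    def emit(self, x):
        # Push one element through this node and into every branch.
        y = self.fn(x)
        for child in self.children:
            child.emit(y)


source = ToyStream()
plus_one, doubled = [], []
source.map(lambda x: x + 1).sink(plus_one.append)
source.map(lambda x: x * 2).sink(doubled.append)

for i in range(3):
    source.emit(i)

# Both branches saw every element:
# plus_one == [1, 2, 3], doubled == [0, 2, 4]
```

In real streamz the same shape is written with `Stream()`, `.map`, and `.sink`; the point here is only that a single `emit` fans out to all branches.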
@@ -377,3 +378,19 @@ For operations like this Streamz adds virtually no overhead.
 
 Streams provides higher level APIs for situations just like this one. You may
 want to read further about :doc:`collections <collections>`
+
+
+Metadata
+--------
+
+Metadata can be emitted into the pipeline to accompany the data as a list of dictionaries. Most functions pass the metadata to the downstream function without making any changes. However, functions that make the pipeline asynchronous require logic that dictates how and when the metadata is passed downstream. Synchronous functions, and asynchronous functions with a 1:1 ratio of input values to output values, emit the metadata collection without modification. Functions that have multiple input streams, or that emit collections of data, emit the metadata associated with the emitted data as a collection.
+
+
+Reference Counting and Checkpointing
+------------------------------------
+
+Checkpointing is achieved in Streamz through reference counting. With this method, a checkpoint can be saved when, and only when, data has progressed through all of the pipeline without any issues. This prevents data loss and guarantees at-least-once semantics.
+
+Any node that caches or holds data after it returns increments the reference counter associated with that data by one. When a node is no longer holding the data, it releases it by decrementing the counter by one. When the counter reaches zero, a callback associated with the data is triggered.
+
+References are passed in the metadata as the value of the ``ref`` keyword. Each metadata object contains only one reference counter object.
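The reference-counting scheme the new section describes can be sketched as follows; the `RefCounter` name and its API here are hypothetical illustrations, not streamz internals:

```python
# Illustrative sketch of reference counting for checkpointing: holders
# retain, finished holders release, and reaching zero fires a callback.
# The class and method names are hypothetical, not streamz's own.
class RefCounter:
    def __init__(self, on_zero):
        self.count = 0
        self.on_zero = on_zero  # e.g. a callback that saves a checkpoint

    def retain(self):
        # A node that caches or holds the data increments the counter.
        self.count += 1

    def release(self):
        # A node done with the data decrements; zero fires the callback.
        self.count -= 1
        if self.count == 0:
            self.on_zero()


checkpoints = []
counter = RefCounter(lambda: checkpoints.append("checkpoint saved"))
metadata = [{"ref": counter}]  # references travel under the ``ref`` key

counter.retain()   # a buffering node holds the element
counter.retain()   # a second branch holds it too
counter.release()  # first holder finishes
counter.release()  # last holder finishes -> checkpoint callback fires
```

Until every holder releases, the counter stays positive and no checkpoint is saved, which is what gives the at-least-once guarantee described above.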

docs/source/dataframes.rst

Lines changed: 2 additions & 2 deletions
@@ -103,8 +103,8 @@ following:
     def on_old(self, state, old):
         total, count = state
-        total = total - new.sum()    # switch + for - here
-        count = count - new.count()  # switch + for - here
+        total = total - old.sum()    # switch + for - here
+        count = count - old.count()  # switch + for - here
         new_state = (total, count)
         new_value = total / count
         return new_state, new_value
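The fix above corrects the documented windowed-mean example to subtract the outgoing (`old`) chunk rather than the incoming one. A minimal pandas-free sketch of that incremental mean, with plain lists standing in for dataframe chunks:

```python
# Incremental windowed mean: on_new adds the arriving chunk, on_old
# subtracts the chunk leaving the window. The names mirror the doc
# example; this is a sketch, not streamz's actual aggregation code.
class WindowedMean:
    def on_new(self, state, new):
        total, count = state
        total = total + sum(new)
        count = count + len(new)
        return (total, count), total / count

    def on_old(self, state, old):
        total, count = state
        total = total - sum(old)  # switch + for - here
        count = count - len(old)  # switch + for - here
        return (total, count), total / count


agg = WindowedMean()
state = (0, 0)
state, value = agg.on_new(state, [1, 2, 3])  # window [1, 2, 3] -> mean 2.0
state, value = agg.on_new(state, [4, 5])     # window [1..5]   -> mean 3.0
state, value = agg.on_old(state, [1, 2, 3])  # window [4, 5]   -> mean 4.5
```

Subtracting `new` instead of `old` (the bug being fixed) would remove the chunk that just arrived, leaving the window's running total permanently wrong.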

docs/source/index.rst

Lines changed: 1 addition & 2 deletions
@@ -5,8 +5,7 @@ Streamz helps you build pipelines to manage continuous streams of data. It is
 simple to use in simple cases, but also supports complex pipelines that involve
 branching, joining, flow control, feedback, back pressure, and so on.
 
-Optionally, Streamz can also work with Pandas dataframes to provide sensible
-streaming operations on continuous tabular data.
+Optionally, Streamz can also work with both `Pandas <https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html>`_ and `cuDF <https://docs.rapids.ai/api/cudf/stable/>`_ dataframes, to provide sensible streaming operations on continuous tabular data.
 
 To learn more about how to use streams, visit :doc:`Core documentation <core>`.
1211

setup.py

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@
 
 
 setup(name='streamz',
-      version='0.5.2',
+      version='0.5.3',
       description='Streams',
       url='http://github.com/python-streamz/streamz/',
       maintainer='Matthew Rocklin',

streamz/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -8,4 +8,4 @@
 except ImportError:
     pass
 
-__version__ = '0.5.2'
+__version__ = '0.5.3'
