Commit fc3db09

Merge pull request #362 from jbednar/docfixes
Minor fixes to docs, plus adding a plotting section
2 parents caad325 + f362a7e commit fc3db09

6 files changed: 125 additions & 31 deletions

docs/source/conf.py

Lines changed: 2 additions & 2 deletions

@@ -48,7 +48,7 @@

 # General information about the project.
 project = 'Streamz'
-copyright = '2017, Matthew Rocklin'
+copyright = '2017-2020, Matthew Rocklin'
 author = 'Matthew Rocklin'

 # The version info for the project you're documenting, acts as replacement for

@@ -160,7 +160,7 @@
 # dir menu entry, description, category)
 texinfo_documents = [
     (master_doc, 'Streamz', 'Streamz Documentation',
-     author, 'Streamz', 'One line description of project.',
+     author, 'Streamz', 'Support for pipelines managing continuous streams of data.',
      'Miscellaneous'),
 ]

docs/source/core.rst

Lines changed: 32 additions & 8 deletions
@@ -15,7 +15,9 @@ Map, emit, and sink
    map
    sink

-You can create a basic pipeline by instantiating the ``Streamz`` object and then using methods like ``map``, ``accumulate``, and ``sink``.
+You can create a basic pipeline by instantiating the ``Streamz``
+object and then using methods like ``map``, ``accumulate``, and
+``sink``.

 .. code-block:: python

@@ -27,7 +29,10 @@ You can create a basic pipeline by instantiating the ``Streamz`` object and then

     source = Stream()
     source.map(increment).sink(print)

-The ``map`` and ``sink`` methods both take a function and apply that function to every element in the stream. The ``map`` method returns a new stream with the modified elements while ``sink`` is typically used at the end of a stream for final actions.
+The ``map`` and ``sink`` methods both take a function and apply that
+function to every element in the stream. The ``map`` method returns a
+new stream with the modified elements while ``sink`` is typically used
+at the end of a stream for final actions.

 To push data through our pipeline we call ``emit``
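The map/emit/sink behavior this hunk documents can be sketched in plain Python. This is a stdlib-only analogy of the streamz semantics, not the streamz API; the ``Node`` class and ``increment`` helper are illustrative names:

```python
# Stdlib-only sketch of map/emit/sink semantics (not the streamz API).
class Node:
    def __init__(self, fn=lambda x: x):
        self.fn = fn              # transformation applied to each element
        self.downstreams = []     # nodes fed by this node's output

    def map(self, fn):
        child = Node(fn)
        self.downstreams.append(child)
        return child

    sink = map  # a sink is just a map used for its side effect at the end

    def emit(self, x):
        y = self.fn(x)            # transform, then push to every downstream
        for child in self.downstreams:
            child.emit(y)

def increment(x):
    return x + 1

results = []
source = Node()
source.map(increment).sink(results.append)
for i in range(3):
    source.emit(i)
print(results)  # [1, 2, 3]
```

Each element flows through the whole chain before the next ``emit`` returns, which mirrors the synchronous core behavior described in the surrounding docs.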
@@ -383,14 +388,33 @@ want to read further about :doc:`collections <collections>`
 Metadata
 --------

-Metadata can be emitted into the pipeline to accompany the data as a list of dictionaries. Most functions will pass the metadata to the downstream function without making any changes. However, functions that make the pipeline asynchronous require logic that dictates how and when the metadata will be passed downstream. Synchronous functions and asynchronous functions that have a 1:1 ratio of the number of values on the input to the number of values on the output will emit the metadata collection without any modification. However, functions that have multiple input streams or emit collections of data will emit the metadata associated with the emitted data as a collection.
+Metadata can be emitted into the pipeline to accompany the data as a
+list of dictionaries. Most functions will pass the metadata to the
+downstream function without making any changes. However, functions
+that make the pipeline asynchronous require logic that dictates how
+and when the metadata will be passed downstream. Synchronous functions
+and asynchronous functions that have a 1:1 ratio of the number of
+values on the input to the number of values on the output will emit
+the metadata collection without any modification. In contrast, functions
+that have multiple input streams or emit collections of data will emit
+the metadata associated with the emitted data as a collection.

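The 1:1 versus collection behavior can be sketched in plain Python. This is a stdlib-only analogy, not the streamz API; ``map_node`` and ``partition_node`` are hypothetical names for a 1:1 node and a batching node:

```python
# Stdlib-only sketch (not the streamz API) of how metadata follows data:
# a 1:1 node forwards the metadata list unchanged, while a batching node
# emits the combined metadata of everything in the batch.
def map_node(fn, downstream):
    def node(x, metadata):
        downstream(fn(x), metadata)        # 1:1 -> metadata unmodified
    return node

def partition_node(n, downstream):
    buf, meta = [], []
    def node(x, metadata):
        buf.append(x)
        meta.extend(metadata)              # collect metadata per element
        if len(buf) == n:
            downstream(tuple(buf), list(meta))  # batch -> metadata collection
            buf.clear()
            meta.clear()
    return node

out = []
sink = lambda x, m: out.append((x, m))
pipe = map_node(lambda x: x + 1, partition_node(2, sink))
pipe(1, [{'ref': 'a'}])
pipe(2, [{'ref': 'b'}])
print(out)  # [((2, 3), [{'ref': 'a'}, {'ref': 'b'}])]
```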
 Reference Counting and Checkpointing
 ------------------------------------

-Checkpointing is achieved in Streamz through the use of reference counting. With this method, a checkpoint can be saved when and only when data has progressed through all of the the pipeline without any issues. This prevents data loss and guarantees at-least-once semantics.
-
-Any node that caches or holds data after it returns increments the reference counter associated with the given data by one. When a node is no longer holding the data, it will release it by decrementing the counter by one. When the counter changes to zero, a callback associated with the data is triggered.
-
-References are passed in the metadata as a value of the `ref` keyword. Each metadata object contains only one reference counter object.
+Checkpointing is achieved in Streamz through the use of reference
+counting. With this method, a checkpoint can be saved when and only
+when data has progressed through all of the pipeline without any
+issues. This prevents data loss and guarantees at-least-once
+semantics.
+
+Any node that caches or holds data after it returns increments the
+reference counter associated with the given data by one. When a node
+is no longer holding the data, it will release it by decrementing the
+counter by one. When the counter changes to zero, a callback
+associated with the data is triggered.
+
+References are passed in the metadata as a value of the ``ref``
+keyword. Each metadata object contains only one reference counter
+object.
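A minimal stdlib sketch of the reference-counting scheme just described (illustrative only; the class name and methods here are assumptions, not streamz internals):

```python
# Stdlib-only sketch of reference counting for checkpointing: each datum
# carries a counter; every node that holds the datum retains it, releases
# it when done, and a callback fires once the count drops to zero.
class RefCounter:
    def __init__(self, on_done):
        self.count = 0
        self.on_done = on_done

    def retain(self, n=1):
        self.count += n

    def release(self, n=1):
        self.count -= n
        if self.count == 0:
            self.on_done()    # safe to checkpoint past this datum now

checkpoints = []
ref = RefCounter(lambda: checkpoints.append('datum-0 done'))

ref.retain()     # a buffering node holds the datum
ref.retain()     # a second node also holds it
ref.release()    # the first node emits downstream and lets go
assert checkpoints == []    # still held elsewhere, so no checkpoint yet
ref.release()    # the last holder releases
print(checkpoints)  # ['datum-0 done']
```

Because the callback only fires at zero, a checkpoint can never record data that some node is still holding, which is what gives the at-least-once guarantee described above.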

docs/source/dask.rst

Lines changed: 12 additions & 9 deletions
@@ -36,7 +36,7 @@ Then start a local Dask cluster

    from dask.distributed import Client
    client = Client()

-This operates on a local processes or threads. If you have Bokeh installed
+This operates on local processes or threads. If you have Bokeh installed
 then this will also start a diagnostics web server at
 http://localhost:8787/status which you may want to open to get a real-time view
 of execution.
@@ -49,7 +49,7 @@ Sequential Execution
    map
    sink

-Before we build a parallel stream, lets build a sequential stream that maps a
+Before we build a parallel stream, let's build a sequential stream that maps a
 simple function across data, and then prints those results. We use the core
 ``Stream`` object.

@@ -69,7 +69,7 @@ simple function across data, and then prints those results. We use the core
    for i in range(10):
        source.emit(i)

-This should take ten seconds we call the ``inc`` function ten times
+This should take ten seconds because we call the ``inc`` function ten times
 sequentially.

 Parallel Execution
@@ -101,7 +101,7 @@ You may want to look at http://localhost:8787/status during execution to get a
 sense of the parallel execution.

 This should have run much more quickly depending on how many cores you have on
-your machine. We added a few extra nodes to our stream, lets look at what they
+your machine. We added a few extra nodes to our stream; let's look at what they
 did.

 - ``scatter``: Converted our Stream into a DaskStream. The elements that we
@@ -123,17 +123,20 @@ Gotchas
 +++++++

-An important gotcha with ``DaskStream`` is that it is a subclass ``Stream``, and so can be used as an input
-to any function expecting a ``Stream``. If there is no intervening ``.gather()``, then the downstream node will
-receive Dask futures instead of the data they represent::
+An important gotcha with ``DaskStream`` is that it is a subclass of
+``Stream``, and so can be used as an input to any function expecting a
+``Stream``. If there is no intervening ``.gather()``, then the
+downstream node will receive Dask futures instead of the data they
+represent::

     source = Stream()
     source2 = Stream()
     a = source.scatter().map(inc)
     b = source2.combine_latest(a)

-In this case, the combine operation will get real values from ``source2``, and Dask futures.
-Downstream nodes would be free to operate on the futures, but more likely, the line should be::
+In this case, the combine operation will get real values from
+``source2``, and Dask futures. Downstream nodes would be free to
+operate on the futures, but more likely, the line should be::

     b = source2.combine_latest(a.gather())
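The gotcha in this hunk can be illustrated without a Dask cluster. Here is a stdlib analogy using ``concurrent.futures`` (not Dask, and not the streamz API): a scattered/mapped branch yields future objects, and combining them with plain values mixes types unless you "gather" (resolve) first:

```python
# Stdlib analogy for the DaskStream gotcha: mixing unresolved futures
# with plain values, versus resolving ("gathering") them first.
from concurrent.futures import Future, ThreadPoolExecutor

def inc(x):
    return x + 1

with ThreadPoolExecutor() as pool:
    future = pool.submit(inc, 1)             # like source.scatter().map(inc)
    combined_wrong = (10, future)            # a value combined with a Future
    combined_right = (10, future.result())   # like .gather(): the real value

print(isinstance(combined_wrong[1], Future))  # True
print(combined_right)                         # (10, 2)
```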

docs/source/gpu-dataframes.rst

Lines changed: 16 additions & 12 deletions
@@ -1,13 +1,15 @@
-Streaming GPU DataFrames(cudf)
-------------------------------
+Streaming GPU DataFrames (cudf)
+-------------------------------

-The ``streamz.dataframe`` module provides DataFrame-like interface on streaming
-data as described in ``dataframes`` documentation. It provides support for dataframe
-like libraries such as pandas and cudf. This documentation is specific to streaming GPU
-dataframes(cudf).
+The ``streamz.dataframe`` module provides a DataFrame-like interface
+on streaming data as described in the ``dataframes`` documentation. It
+provides support for dataframe-like libraries such as pandas and
+cudf. This documentation is specific to streaming GPU dataframes using
+cudf.

-The example in the ``dataframes`` documentation is rewritten below using cudf dataframes
-just by replacing ``pandas`` module with ``cudf``:
+The example in the ``dataframes`` documentation is rewritten below
+using cudf dataframes just by replacing the ``pandas`` module with
+``cudf``:

 .. code-block:: python

@@ -23,19 +25,21 @@ just by replacing ``pandas`` module with ``cudf``:
 Supported Operations
 --------------------

-Streaming cudf dataframes support the following classes of operations
+Streaming cudf dataframes support the following classes of operations:

 - Elementwise operations like ``df.x + 1``
 - Filtering like ``df[df.name == 'Alice']``
 - Column addition like ``df['z'] = df.x + df.y``
 - Reductions like ``df.amount.mean()``
 - Windowed aggregations (fixed length) like ``df.window(n=100).amount.sum()``

-The following operations are not supported with cudf(as of version 0.8) yet
+The following operations are not yet supported with cudf (as of version 0.8):

 - Groupby-aggregations like ``df.groupby(df.name).amount.mean()``
 - Windowed aggregations (index valued) like ``df.window(value='2h').amount.sum()``
 - Windowed groupby aggregations like ``df.window(value='2h').groupby('name').amount.sum()``


-Window based Aggregations with cudf are supported just as explained in ``dataframes`` documentation.
-The support for groupby operations will be added in future.
+Window-based Aggregations with cudf are supported just as explained in
+the ``dataframes`` documentation. Support for groupby operations is
+expected to be added in the future.
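The fixed-length windowed aggregation listed above (``df.window(n=100).amount.sum()``) can be sketched with a stdlib deque. This is an illustrative analogy, not the streamz or cudf API:

```python
# Stdlib-only sketch of a fixed-length windowed aggregation: keep the
# last n values in a bounded deque and aggregate over it on each arrival.
from collections import deque

window = deque(maxlen=3)   # like .window(n=3)
sums = []
for amount in [1, 2, 3, 4, 5]:
    window.append(amount)        # the oldest value falls off past n=3
    sums.append(sum(window))     # aggregate over the current window
print(sums)  # [1, 3, 6, 9, 12]
```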

docs/source/index.rst

Lines changed: 2 additions & 0 deletions

@@ -111,8 +111,10 @@ data streaming systems like `Apache Flink <https://flink.apache.org/>`_,

    core.rst
    dataframes.rst
+   gpu-dataframes.rst
    dask.rst
    collections.rst
    api.rst
    collections-api.rst
    async.rst
+   plotting.rst
docs/source/plotting.rst

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
+Visualizing streamz
+===================
+
+A variety of tools are available to help you understand, debug, and
+visualize your streaming objects:
+
+- Most Streamz objects automatically display themselves in Jupyter
+  notebooks, periodically updating their visual representation as text
+  or tables by registering events with the Tornado IOLoop used by Jupyter
+- The network graph underlying a stream can be visualized using ``dot`` to
+  render a PNG using ``Stream.visualize(filename)``
+- Streaming data can be visualized using the optional separate packages
+  hvPlot, HoloViews, and Panel (see below)
+
+
+hvplot.streamz
+--------------
+
+hvPlot is a separate plotting library providing Bokeh-based plots for
+Pandas dataframes and a variety of other object types, including
+streamz DataFrame and Series objects.
+
+See `hvplot.holoviz.org <https://hvplot.holoviz.org>`_ for
+instructions on how to install hvPlot. Once it is installed, you can
+use the Pandas ``.plot()`` API to get a dynamically updating plot in
+Jupyter or in a Bokeh/Panel server:
+
+.. code-block:: python
+
+    import hvplot.streamz
+    from streamz.dataframe import Random
+
+    df = Random()
+    df.hvplot(backlog=100)
+
+See the `streaming section
+<https://hvplot.holoviz.org/user_guide/Streaming.html>`_ of the hvPlot
+user guide for more details, and the ``dataframes.ipynb`` example that
+comes with streamz for a simple runnable example.
+
+
+HoloViews
+---------
+
+hvPlot is built on HoloViews, and you can also use HoloViews directly
+if you want more control over events and how they are processed. See
+the `HoloViews user guide
+<http://holoviews.org/user_guide/Streaming_Data.html>`_ for more
+details.
+
+
+Panel
+-----
+
+Panel is a general-purpose dashboard and app framework, supporting a
+wide variety of displayable objects as "Panes". Panel provides a
+`streamz Pane
+<https://panel.holoviz.org/reference/panes/Streamz.html>`_ for
+rendering arbitrary streamz objects, and streamz DataFrames are
+handled by the Panel `DataFrame Pane
+<https://panel.holoviz.org/reference/panes/DataFrame.html>`_.
