This repository was archived by the owner on Feb 2, 2024. It is now read-only.

Commit 430d377

Update README.rst
Added description for Documentation Generation and coding guidelines for Intel SDC docstrings
1 parent f405f15 commit 430d377

1 file changed

Lines changed: 163 additions & 5 deletions

File tree

README.rst

@@ -14,8 +14,8 @@ Intel® Scalable Dataframe Compiler
1414
:target: https://coveralls.io/github/IntelPython/sdc?branch=master
1515
:alt: Coveralls
1616

17-
Extension for Numba for Pandas compilation
18-
###########################################
17+
Numba* Extension For Pandas* Operations Compilation
18+
###################################################
1919

2020
Intel® Scalable Dataframe Compiler (Intel® SDC), which is an extension of `Numba* <https://numba.pydata.org/>`_
2121
that enables compilation of `Pandas* <https://pandas.pydata.org/>`_ operations. It automatically vectorizes and parallelizes
@@ -71,7 +71,7 @@ These academic papers describe the underlying methods in Intel SDC:
7171

7272

7373
Building Intel® SDC from Source on Linux
74-
----------------------------------
74+
----------------------------------------
7575

7676
We use `Anaconda <https://www.anaconda.com/download/>`_ distribution of
7777
Python for setting up Intel SDC build environment.
@@ -87,7 +87,7 @@ It is possible to build Intel SDC via conda-build or setuptools. Follow one of t
8787
cases below to install Intel SDC and its dependencies on Linux.
8888

8989
Building on Linux with conda-build
90-
~~~~~~~~~~~~~~~~~~~~~~~~~
90+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
9191
::
9292

9393
PYVER=<3.6 or 3.7>
@@ -113,7 +113,7 @@ Building on Linux with setuptools
113113
In case of issues, reinstalling in a new conda environment is recommended.
114114

115115
Building Intel® SDC from Source on Windows
116-
------------------------------------
116+
------------------------------------------
117117

118118
Building Intel® SDC on Windows requires Build Tools for Visual Studio 2019 (with component MSVC v140 - VS 2015 C++ build tools (v14.00)):
119119

@@ -161,9 +161,167 @@ Troubleshooting Windows Build
161161
and add a string value named ``14.0`` whose data is ``C:\Program Files (x86)\Microsoft Visual Studio 14.0\``.
162162
* Sometimes if the conda version or visual studio version being used are not latest then building Intel SDC can throw some vague error about a keyword used in a file. So make sure you are using the latest versions.
163163

164+
165+
Building documentation
166+
----------------------
167+
Building the Intel SDC User's Guide documentation requires a pre-installed Intel SDC package along with a compatible Pandas* version, as well as Sphinx* 2.2.1 or later.
168+
169+
You can install Sphinx* using either ``conda`` or ``pip``:
170+
::
171+
172+
conda install sphinx
173+
pip install sphinx
174+
175+
Currently the build procedure is based on ``make`` files located in the ``./sdc/docs/`` folder. While it is not generally required, we recommend that you clean up any previous documentation build by running
176+
::
177+
178+
make clean
179+
180+
To build HTML documentation you will need to run
181+
::
182+
183+
make html
184+
185+
The built documentation will be located in the ``./sdc/docs/build/html`` directory. To preview the documentation, open the ``index.html``
186+
file.
187+
188+
Sphinx* Generation Internals
189+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
190+
The documentation generation is controlled by ``conf.py`` script automatically invoked by Sphinx.
191+
See `Sphinx documentation <http://www.sphinx-doc.org/en/master/usage/configuration.html>`_ for details.
192+
193+
The API Reference for the Intel SDC User's Guide is auto-generated by inspecting the ``pandas`` and ``sdc`` modules. That is why these modules must be pre-installed for documentation generation using Sphinx*. However, you can skip API Reference auto-generation by setting the environment variable ``SDC_DOC_NO_API_REF_STR=1``.
194+
195+
If the environment variable ``SDC_DOC_NO_API_REF_STR`` is unset, Sphinx's ``conf.py`` invokes the ``generate_api_reference()`` function located in the ``./sdc/docs/source/buildscripts/apiref_generator`` module. This function parses the ``pandas`` and ``sdc`` docstrings for each API, combines them into a single docstring, and writes it into an RST file named after the respective Pandas* API. The auto-generated RST files are
196+
located at ``./sdc/docs/source/_api_ref`` directory.
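
As a rough illustration of the switch described above, the following is a minimal sketch of how a ``conf.py``-style script could decide whether to run the generator. The helper name ``should_generate_api_reference`` is hypothetical and is not taken from the Intel SDC sources; treating any set value as "skip" is also an assumption, since the text only states that generation happens when the variable is unset.

```python
import os


def should_generate_api_reference(environ=os.environ):
    """Return True unless SDC_DOC_NO_API_REF_STR is present in the environment.

    Assumption: any value (even an empty string) counts as "set".
    """
    return "SDC_DOC_NO_API_REF_STR" not in environ


if should_generate_api_reference():
    # In the real conf.py this is where generate_api_reference() would be invoked.
    print("API Reference would be generated")
else:
    print("API Reference generation skipped")
```

Passing the environment as a parameter keeps the check easy to exercise without modifying the real process environment.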
197+
198+
.. note::
199+
Sphinx will automatically clean the ``_api_ref`` directory on the next invocation of the documentation build.
200+
201+
Intel SDC docstring decoration rules
202+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
203+
Since SDC API Reference is auto-generated from respective Pandas* and Intel SDC docstrings there are certain rules that must be
204+
followed to accurately generate the API description.
205+
206+
1. Every SDC API must have a docstring.
207+
If a developer does not provide a docstring, Sphinx will not be able to match the Pandas docstring with the respective SDC one. In this situation Sphinx assumes that SDC does not support the API and includes a note in the API Reference that
208+
**This API is currently unsupported**.
209+
210+
2. Follow 'one function - one docstring' rule.
211+
You cannot have one docstring for multiple APIs, even if those are very similar. Auto-generator assumes every SDC API is covered by
212+
respective docstring. If Sphinx does not find the docstring for particular API then it assumes that SDC does not support such API
213+
and will include respective note in the API Reference that **This API is currently unsupported**.
214+
215+
3. Description (introductory section, the very first few paragraphs without a title) is taken from Pandas*.
216+
Intel SDC developers should not include API description in SDC docstring.
217+
But developers are encouraged to follow Pandas API description naming conventions
218+
so that the combined docstring appears consistent.
219+
220+
4. Parameters, Returns, and Raises sections' description is taken from Pandas* docstring.
221+
SDC developers should not include such descriptions in their SDC docstrings.
222+
Rather developers are encouraged to follow Pandas naming conventions
223+
so that the combined docstring appears consistent.
224+
225+
5. Every SDC docstring must have the following structure:
226+
::
227+
228+
"""
229+
Intel Scalable Dataframe Compiler User Guide
230+
********************************************
231+
Pandas API: <full pandas name, e.g. pandas.Series.nlargest>
232+
233+
<Intel SDC specific sections>
234+
235+
Intel Scalable Dataframe Compiler Developer Guide
236+
*************************************************
237+
<Developer's Guide specific sections>
238+
"""
239+
240+
The first two lines must be the User Guide header. This indicates to Sphinx that the section is intended for the public API
241+
and that it will be combined with the respective Pandas API docstring.
242+
243+
Line 3 must specify which Pandas API this Intel SDC docstring corresponds to. It must start with ``Pandas API:`` followed by
244+
the full Pandas API name. Remember to include the full name: for example, ``nlargest`` is not
245+
sufficient for the auto-generator to perform the match. The full name must be ``pandas.Series.nlargest``.
246+
247+
After User Guide sections in the docstring there can be another header indicating that the remaining part of the docstring belongs to
248+
Developer's Guide and must not be included into User's Guide.
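
The layout rules above can be checked mechanically. The following is a minimal sketch, not the actual ``apiref_generator`` logic: the function ``split_sdc_docstring`` and its return convention are hypothetical, and it only assumes the header texts and the ``Pandas API:`` prefix stated in this section.

```python
import inspect

USER_GUIDE_HEADER = "Intel Scalable Dataframe Compiler User Guide"
DEV_GUIDE_HEADER = "Intel Scalable Dataframe Compiler Developer Guide"
PANDAS_API_PREFIX = "Pandas API:"


def split_sdc_docstring(docstring):
    """Return (pandas_api_name, user_guide_text), or None if the layout does not match."""
    # cleandoc removes the common indentation and leading/trailing blank lines.
    lines = inspect.cleandoc(docstring).splitlines()
    if len(lines) < 3 or lines[0].strip() != USER_GUIDE_HEADER:
        return None
    # Line 2 must be the underline made only of asterisks.
    underline = lines[1].strip()
    if not underline or set(underline) != {"*"}:
        return None
    # Line 3 must carry the full Pandas API name.
    if not lines[2].startswith(PANDAS_API_PREFIX):
        return None
    api_name = lines[2][len(PANDAS_API_PREFIX):].strip()
    if not api_name.startswith("pandas."):
        return None
    # Everything from the Developer Guide header onward is excluded.
    body = lines[3:]
    for index, line in enumerate(body):
        if line.strip() == DEV_GUIDE_HEADER:
            body = body[:index]
            break
    return api_name, "\n".join(body).strip()
```

A docstring whose third line reads ``Pandas API: nlargest`` is rejected by this sketch, mirroring the requirement that the full name be given.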
249+
250+
6. Examples, See Also, References sections are **NOT** taken from Pandas docstring. SDC developers are expected to complete these sections in SDC docstrings.
251+
This is because the respective Pandas sections are sometimes too Pandas-specific and not relevant to SDC. SDC developers have to
252+
rewrite those sections in Intel SDC style. Do not forget the User Guide header and the Pandas API name before adding SDC-specific
253+
sections.
254+
255+
7. Examples section is mandatory for every SDC API. 'One API - at least one example' rule is applied.
256+
Examples are essential part of user experience and must accompany every API docstring.
257+
258+
8. Embed examples into Examples section from ``./sdc/examples``.
259+
Rather than writing an example directly in the docstring (which is error-prone), embed the relevant example script into the docstring. For example,
260+
here is how to embed an example for the ``pandas.Series.get()`` function into the respective Intel SDC docstring:
261+
262+
::
263+
264+
"""
265+
...
266+
Examples
267+
--------
268+
.. literalinclude:: ../../../examples/series_getitem.py
269+
:language: python
270+
:lines: 27-
271+
:caption: Getting Pandas Series elements
272+
:name: ex_series_getitem
273+
274+
.. code-block:: console
275+
276+
> python ./series_getitem.py
277+
55
278+
279+
In the above snippet the script ``series_getitem.py`` is embedded into the docstring. ``:lines: 27-`` skips the lengthy
280+
copyright header of the file. ``:caption:`` provides a meaningful description of the example. It is good practice to have a caption
281+
for every example. ``:name:`` is the Sphinx name that allows referencing the example from other parts of the documentation. It is good
282+
practice to include this field. Please follow the naming convention ``ex_<example file name>`` for consistency.
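
To keep the ``:name:`` field consistent with the ``ex_<example file name>`` convention, the directive can be generated rather than typed by hand. The sketch below is illustrative only: both helper functions are hypothetical, and the default of 27 simply mirrors the ``:lines: 27-`` value used in the snippet above.

```python
from pathlib import PurePosixPath


def example_reference_name(example_path):
    """Build the Sphinx :name: value following the ex_<example file name> convention."""
    return "ex_" + PurePosixPath(example_path).stem


def literalinclude_directive(example_path, caption, first_line=27):
    """Render a literalinclude directive for an example script.

    first_line should point just past the copyright header of the file.
    """
    return "\n".join([
        f".. literalinclude:: {example_path}",
        "   :language: python",
        f"   :lines: {first_line}-",
        f"   :caption: {caption}",
        f"   :name: {example_reference_name(example_path)}",
    ])


print(literalinclude_directive("../../../examples/series_getitem.py",
                               "Getting Pandas Series elements"))
```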
283+
284+
Accompany every example with the expected output using ``.. code-block:: console`` decorator.
285+
286+
287+
**Every Examples section must come with one or more examples illustrating all major variations of supported API parameter combinations. It is highly recommended to illustrate SDC API limitations (e.g. unsupported parameters) in example script comments.**
288+
289+
9. See Also sections are highly encouraged.
290+
It is good practice to include relevant references in the See Also section. Embedding references which are not directly
291+
related to the topic may be distracting if they appear throughout the API description. A good style is to have a dedicated section for
292+
relevant topics.
293+
294+
See Also section may include references to relevant SDC and Pandas as well as to external topics.
295+
296+
A special form of See Also section is References to publications. Pandas documentation sometimes uses References section to refer to
297+
external projects. While it is not prohibited to use References section in SDC docstrings, it is better to combine all references
298+
under See Also umbrella.
299+
300+
10. Notes and Warnings must be decorated with ``.. note::`` and ``.. warning::`` respectively.
301+
Do not use
302+
::
303+
Notes
304+
-----
305+
306+
Warning
307+
-------
308+
309+
Pay attention to indentation and required blank lines. Sphinx is very sensitive to that.
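
One way to catch the discouraged heading style before building the documentation is a small check over the docstring text. This is a minimal sketch under stated assumptions: the function ``find_undecorated_sections`` is hypothetical, it is not part of the Intel SDC build scripts, and it only looks for ``Note(s)`` or ``Warning(s)`` titles underlined with dashes.

```python
import re

# Matches a "Notes"/"Warning" title followed by a dashed underline on the next line.
_HEADING = re.compile(
    r"^[ \t]*(Notes?|Warnings?)[ \t]*\n[ \t]*-{3,}[ \t]*$",
    re.MULTILINE,
)


def find_undecorated_sections(docstring):
    """Return the titles of Notes/Warning sections written as plain headings."""
    return [match.group(1) for match in _HEADING.finditer(docstring)]
```

An empty result means no plain ``Notes`` or ``Warning`` headings were found; it does not verify that ``.. note::`` and ``.. warning::`` blocks are indented correctly.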
310+
311+
11. If SDC API does not support all variations of respective Pandas API then Limitations section is mandatory.
312+
While there is no specific guideline for how the Limitations section must be written, a good style is to follow the Pandas Parameters section
313+
description style and naming conventions.
314+
315+
12. Before committing your code for public SDC API you are expected to:
316+
317+
- have SDC docstring implemented;
318+
- have respective SDC examples implemented and tested;
319+
- have the API Reference documentation generated and visually inspected. New warnings in the documentation build are not allowed.
320+
164321
Running unit tests
165322
------------------
166323
::
167324

168325
python sdc/tests/gen_test_data.py
169326
python -m unittest
327+
