44Compiling With Intel® SDC
55=========================
66
7- .. todo ::
8- Basic compilation controls. What can be compiled and what cannot. How to work around compilation issues.
9- References to relevant discussion in `Numba* `_. Specifics for Series, Dataframes, and other hpat specific
10- data structures
11-
127What if I get a compilation error
138~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
149
1510There are a few reasons why Intel SDC cannot compile your code out-of-the-box.
16-
11+
17121. Intel SDC supports only a subset of `Pandas*`_ APIs.
18132. Intel SDC and `Numba*`_ can compile only a subset of Python data types.
19143. Intel SDC cannot infer the type of a variable at compile time.
@@ -22,38 +17,124 @@ Unsupported APIs
2217-----------------
2318
2419Intel® SDC can compile a variety of the most typical workflows that involve `Pandas*`_ operations, but not all of them.
25- Sometimes it means that your code cannot be compiled out-of-the-box.
26-
27- .. todo ::
28- Give an example here of unsupported `Pandas* `_ API that cannot be compiled as is, e.g. pd.read_excel
29-
30- .. todo ::
31- Give the list of recommendations how to work around such a situation,
32- e.g. getting the function out of jitted region, compilation with nopython=False,
33- using alternative APIs in `Pandas* `_ or `NumPy* `_. Each alternative needs to be illustrated by a code snippet
34-
35- .. todo ::
36- Provide the link to the API Reference section with the list of supported APIs and arguments
37-
20+ Sometimes it means that your code cannot be compiled out-of-the-box:
21+
22+ .. code-block::
23+
24+     import numba
25+     import pandas
26+
27+     @numba.njit
28+     def read_df(filename):
29+         return pandas.read_excel(filename)
30+
31+     read_df("data.xlsx")
32+
33+ Output:
34+ ::
35+
36+     Traceback (most recent call last):
37+     ...
38+     numba.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
39+     Unknown attribute 'read_excel' of type Module(<module 'pandas' from ...)
40+
41+
42+ In such a case you have the following options:
43+
44+ * Replace the unsupported function with a similar supported one
45+   (e.g. use :ref:`pandas.read_csv <pandas.read_csv>` instead of ``pandas.read_excel``):
46+
47+   .. code-block::
48+
49+       import numba
50+       import pandas
51+
52+       @numba.njit
53+       def read_df():
54+           return pandas.read_csv("data.csv")
55+
56+       read_df()
57+
58+ * Use `Numba*`_ `objmode <https://numba.pydata.org/numba-doc/latest/user/withobjmode.html>`_:
59+
60+   .. code-block::
61+
62+       import numba
63+       import pandas
64+
65+       @numba.njit
66+       def cummax():
67+           s = pandas.Series([0, 1, 0, 2, 0, 3, 0, 4])
68+
69+           with numba.objmode(r='intp[:]'):
70+               r = s.cummax().values
71+
72+           return pandas.Series(r)
73+
74+   Note that an array is returned from objmode. Returning a Series or DataFrame from objmode is not trivial.
75+
76+ * Exclude such calls from the jitted region:
77+
78+   .. code-block::
79+
80+       import numba
81+       import pandas
82+
83+       def cummax():
84+           @numba.njit
85+           def create_series():
86+               return pandas.Series([0, 1, 0, 2, 0, 3, 0, 4])
87+
88+           s = create_series()
89+
90+           return s.cummax()
91+
92+
93+ Please note that the last two options involve boxing/unboxing, which can significantly affect performance.
94+
95+ For more details on performance, see :ref:`Getting Performance With Intel® SDC <performance>`.
96+
97+ For the list of supported functions, see :ref:`API Reference <apireference>`.
98+
3899Unsupported Data Types
39100------------------------
40101
41102Another common reason why Intel® SDC or `Numba*`_ cannot compile the code is that it does not support
42- a certain data type. You can work this around by using an alternative data type.
103+ a certain data type. For example, `Numba*`_ does not support heterogeneous lists and dicts:
104+
105+ .. code-block ::
106+
107+ a = [0, 2, 5, "a", "b"]
108+
109+ Literal heterogeneous lists can usually be replaced with tuples:
110+
111+ .. code-block ::
112+
113+ a = (0, 2, 5, "a", "b")
114+
115+ Although heterogeneous dicts are not supported, a dict can be passed as a parameter to :ref:`pandas.DataFrame <pandas.dataframe>`
116+ or :ref:`pandas.read_csv <pandas.read_csv>`:
117+
118+ .. code-block::
119+
120+     data = {'A': np.random.ranf(10), 'B': np.ones(10)}
121+     df = pandas.DataFrame(data=data)
122+
123+
124+ Intel® SDC supports :ref:`pandas.Series <pandas.series>` only with boolean, integer, float, and string element types.
125+ Other types, such as datetime or categorical Series, are not supported.
126+
43127
44- .. todo ::
45- Give examples with dictionaries or datetime, show how one type can be replaced with another
46-
47128Type Inference And Type Stability
48129----------------------------------
49130
50131The last, but certainly not the least, reason why Intel® SDC cannot compile your code is that it cannot infer the type
51132at the time of compilation. The most frequent cause is type instability.
52-
133+
53134Static compilation is a powerful technique for obtaining efficient code, but the flip side is that the
54135compiler must be able to infer all variable types at compile time, and these types must remain stable
55136within the region being compiled.
56-
137+
57138The following is an example of the type-unstable variable ``a``, and hence this code cannot
58139be compiled by `Numba*`_:
59140
@@ -65,9 +146,6 @@ be compiled by `Numba*`_
65146 else:
66147 a = np.ones(10)
67148
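One possible workaround, sketched here under assumptions rather than taken from the Intel SDC documentation, is to make every branch assign a value of the same type. The function name ``make_array`` and the ``np.zeros`` branch are hypothetical; the ``try``/``except`` fallback only keeps the sketch runnable where `Numba*` is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: fall back to plain Python so the sketch still runs without Numba.
    def njit(func):
        return func


@njit
def make_array(flag):
    # Both branches assign a 1-D float64 array, so the type of `a` is stable.
    if flag:
        a = np.zeros(10)  # hypothetical replacement for a non-array value
    else:
        a = np.ones(10)
    return a


stable_true = make_array(True)
stable_false = make_array(False)
```

Because ``a`` has a single type on every path, the compiler can infer it once for the whole function.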
68- .. todo ::
69- Discuss the workaround, show the modified code
70-
71149 The use of the :func:`isinstance` function often indicates type instability and is not supported. Similarly, function calls
72150should also be deterministic. The example below is not supported because the function :func:`f` is not known in advance:
73151
@@ -80,35 +158,34 @@ should also be deterministic. The below example is not supported since the funct
80158 f = np.random.ranf
81159 a = f(10)
82160
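A possible way around this, shown as a sketch under assumptions, is to call a concrete function inside each branch instead of selecting a function object first. The name ``sample`` and the ``np.zeros`` branch are hypothetical; the fallback decorator only keeps the sketch runnable where `Numba*` is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: fall back to plain Python so the sketch still runs without Numba.
    def njit(func):
        return func


@njit
def sample(flag):
    # Each call site names its function explicitly, so every call can be
    # resolved at compile time, and both branches return a float64 array.
    if flag:
        a = np.zeros(10)  # hypothetical alternative branch
    else:
        a = np.random.ranf(10)
    return a


zeros = sample(True)
randoms = sample(False)
```

Only the result ``a`` flows out of the branches, and it has the same type in both.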
83- .. todo ::
84- Discuss the workaround, show the modified code
85- Discuss other typical scenarios when Numba or hpat cannot perform type inference
86-
87161 Dealing With Integer NaN Values
88162-------------------------------
89163
90164The :py:class:`pandas.Series` objects are built upon :py:class:`numpy.ndarray`, which does not support
91- ``NaN `` values for integers. For that reason `Pandas* `_ dynamically converts integer columns to floating point ones
92- when ``NaN `` values are needed. Intel SDC can perform such a conversion only if enough information about
93- ``NaN `` values is available at compilation time. When it is impossible the user is responsible for manual
94- conversion of integer data to floating point data.
95-
96- .. todo ::
97- Show example when Intel SDC can infer ``Nan `` in integer Series. Also show example where information about
98- ``NaN `` cannot be known at compile time and show how it can be worked around
99-
165+ ``NaN`` values for integers and booleans. For that reason `Pandas*`_ dynamically converts integer columns to floating point ones
166+ when ``NaN`` values are needed. Intel SDC does not perform such a conversion, and it is the user's responsibility to manually
167+ convert integer data to floating point data.
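The manual conversion can be sketched with plain `Pandas*` outside any compiled region. This is an illustration only: the variable names are hypothetical, and how the converted Series is then passed to compiled code depends on your workflow.

```python
import numpy as np
import pandas as pd

# Integer data that may later need to hold missing values.
ints = pd.Series([1, 2, 3], dtype=np.int64)

# Convert explicitly to float64 before NaN values are introduced.
floats = ints.astype(np.float64)
floats.iloc[1] = np.nan
```

After the conversion the Series keeps a stable ``float64`` type whether or not ``NaN`` values are present.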
168+
169+
100170Type Inference In I/O Operations
101171--------------------------------
102172
103173If the filename is constant, Intel SDC may be able to determine the file schema at compilation time. This allows
104174it to infer the column types of the respective `Pandas*`_ dataframe.
105-
106- .. todo ::
107- Show example with reading file into dataframe when Intel SDC can do type inferencing at compile time
108-
175+
176+ .. code-block ::
177+
178+ df = pandas.read_csv("data.csv")
179+
109180 If Intel SDC fails to infer types from the file, the schema must be manually specified.
110181
111- .. todo ::
112- Show example how to manually specify the schema
113-
182+ .. code-block::
183+
184+     names = ['A', 'B']
185+     usecols = ['A']
186+     dtypes = {'A': np.float64}
187+     pd.read_csv(file_name, names=names, usecols=usecols, dtype=dtypes)
188+
114189 Alternatively you can take file reading out of the compiled region.
190+
191+ Note: if the data file contains integer data with empty positions (``NaN`` values), it is highly recommended to manually specify the column type as float.