This repository was archived by the owner on Feb 2, 2024. It is now read-only.

Commit f74a7d6

Author: Ehsan Totoni (committed)
Commit message: add readme
1 parent 17cbf05 commit f74a7d6

1 file changed: README.rst
Lines changed: 58 additions & 0 deletions

@@ -0,0 +1,58 @@
*****
HPAT
*****

A compiler-based framework for big data in Python
#################################################

High Performance Analytics Toolkit (HPAT) is a compiler-based framework for big
data analytics and machine learning on cluster/cloud environments that is both
easy to use and extremely fast; it is orders of magnitude faster than
alternatives such as `Apache Spark <http://spark.apache.org/>`_.

HPAT automatically or semi-automatically parallelizes analytics tasks written in
Numpy and pandas, and generates efficient MPI code using Numba and LLVM.
These academic papers describe the underlying methods in HPAT:

- `HPAT paper at ICS'17 <http://dl.acm.org/citation.cfm?id=3079099>`_
- `HPAT at HotOS'17 <http://dl.acm.org/citation.cfm?id=3103004>`_
- `HiFrames on arxiv <https://arxiv.org/abs/1704.02341>`_
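To give a flavor of the programming model, the kind of kernel HPAT targets is plain Numpy code with no explicit parallelism. The sketch below is illustrative only: the ``hpat.jit`` decorator name is an assumption (left commented out so the snippet runs without HPAT), and the step size is added here for numerical stability.

```python
import numpy as np

# Plain Numpy logistic-regression kernel of the kind HPAT parallelizes.
# With HPAT installed it would be decorated for compilation to MPI code;
# the decorator name below is an assumption and is left commented out.
# @hpat.jit
def logistic_regression(X, Y, iterations=20):
    # Full-batch gradient updates; HPAT would partition X and Y across
    # MPI ranks and parallelize the dot products automatically.
    w = np.zeros(X.shape[1])
    for _ in range(iterations):
        grad = np.dot(((1.0 / (1.0 + np.exp(-Y * np.dot(X, w))) - 1.0) * Y), X)
        w -= 0.01 * grad  # small step size added to keep this sketch stable
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
Y = np.sign(rng.standard_normal(200))
Y[Y == 0] = 1.0  # labels are +/-1
w = logistic_regression(X, Y)
print(w.shape)
```

The same function body, undecorated, runs sequentially under plain Python/Numpy, which is what makes the compiler-based approach unobtrusive.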
Installing HPAT
===============

These commands install HPAT and its dependencies, such as Numba, LLVM, and
HDF5, on Ubuntu Linux::

   $ sudo apt install llvm-4.0 make libc6-dev gcc-4.8
   $ # download and install Anaconda python distribution
   $ conda create -n HPAT
   $ source activate HPAT
   $ conda install numpy scipy pandas gcc mpich2 llvmlite
   $ git clone https://github.com/IntelLabs/numba.git
   $ cd numba
   $ git checkout hpat_req
   $ python setup.py develop
   $ cd ..
   $ # download hdf5 and cd inside
   $ CC=mpicc CXX=mpicxx ./configure --enable-parallel
   $ make; make install
   $ cd ..
   $ export HDF5_DIR=/home/user/hdf5-1.10.1/hdf5/
   $ export C_INCLUDE_PATH=$C_INCLUDE_PATH:$HDF5_DIR/include
   $ export CPLUS_INCLUDE_PATH=$CPLUS_INCLUDE_PATH:$HDF5_DIR/include
   $ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HDF5_DIR/lib
   $ export LIBRARY_PATH=$LIBRARY_PATH:$HDF5_DIR/lib
   $ git clone https://github.com/h5py/h5py.git
   $ cd h5py
   $ python setup.py configure --hdf5=$HDF5_DIR
   $ LDSHARED="mpicc -shared" CXX=mpicxx LD=mpicc CC="mpicc" python setup.py install
   $ cd ..
   $ git clone https://github.com/IntelLabs/hpat.git
   $ cd hpat
   $ LDSHARED="mpicxx -shared" CXX=mpicxx LD=mpicxx CC="mpicxx -std=c++11" python setup.py develop
Commands for running the logistic regression example::

   $ python generate_data/gen_logistic_regression.py
   $ mpirun -n 2 python examples/logistic_regression.py
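The first command above generates the input dataset for the example. As a rough illustration only, a random logistic-regression dataset can be produced with plain Numpy; the sizes, file name, and output format here are all assumptions (the real ``gen_logistic_regression.py`` may well write HDF5 instead).

```python
import numpy as np

# Hypothetical stand-in for generate_data/gen_logistic_regression.py:
# random features and +/-1 labels, saved to disk for the training script.
N, D = 10000, 10  # number of samples and features (assumed)
rng = np.random.default_rng(0)
points = rng.standard_normal((N, D))
labels = np.sign(rng.standard_normal(N))
labels[labels == 0] = 1.0  # keep labels strictly in {-1, +1}
np.savez("lr_data.npz", points=points, labels=labels)
print(points.shape, labels.shape)
```

Because HPAT reads the input in parallel across MPI ranks, the actual generator's on-disk format (likely parallel HDF5, given the build steps above) matters; this sketch only shows the shape of the data.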
