*****
HPAT
*****

A compiler-based framework for big data in Python
#################################################

High Performance Analytics Toolkit (HPAT) is a compiler-based framework for big
data analytics and machine learning on cluster/cloud environments that
is both easy to use and extremely fast; it is orders of magnitude faster than
alternatives such as `Apache Spark <http://spark.apache.org/>`_.

HPAT automatically or semi-automatically parallelizes analytics tasks written in
Numpy and pandas, generating efficient MPI code through Numba and LLVM.
The underlying methods are described in these academic papers:

- `HPAT paper at ICS'17 <http://dl.acm.org/citation.cfm?id=3079099>`_
- `HPAT at HotOS'17 <http://dl.acm.org/citation.cfm?id=3103004>`_
- `HiFrames on arxiv <https://arxiv.org/abs/1704.02341>`_

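To illustrate the programming model, the sketch below shows the kind of Numpy
kernel HPAT targets. The function and variable names are illustrative, not
taken from the HPAT repository, and the decorator is left commented out so the
snippet runs with plain Numpy:

```python
import numpy as np

# Sketch of a kernel HPAT can parallelize (illustrative names). With
# HPAT installed, uncommenting the decorator compiles the function and
# distributes the array operations as MPI code:
#
#   import hpat
#   @hpat.jit
def center(n):
    # build an array and subtract its global mean; HPAT would partition
    # the array across ranks and reduce the mean automatically
    a = np.arange(n, dtype=np.float64)
    return a - a.mean()

print(center(4))
```

Run sequentially, the function behaves exactly like ordinary Numpy code, which
is what makes the semi-automatic parallelization approach low-effort to adopt.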
Installing HPAT
===============

These commands install HPAT and its dependencies, such as Numba, LLVM, and
HDF5, on Ubuntu Linux::

   $ sudo apt install llvm-4.0 make libc6-dev gcc-4.8
   $ # download and install the Anaconda Python distribution
   $ conda create -n HPAT
   $ source activate HPAT
   $ conda install numpy scipy pandas gcc mpich2 llvmlite
   $ git clone https://github.com/IntelLabs/numba.git
   $ cd numba
   $ git checkout hpat_req
   $ python setup.py develop
   $ cd ..
   $ # download HDF5 and cd into its source directory
   $ CC=mpicc CXX=mpicxx ./configure --enable-parallel
   $ make; make install
   $ cd ..
   $ export HDF5_DIR=/home/user/hdf5-1.10.1/hdf5/
   $ export C_INCLUDE_PATH=$C_INCLUDE_PATH:$HDF5_DIR/include
   $ export CPLUS_INCLUDE_PATH=$CPLUS_INCLUDE_PATH:$HDF5_DIR/include
   $ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HDF5_DIR/lib
   $ export LIBRARY_PATH=$LIBRARY_PATH:$HDF5_DIR/lib
   $ git clone https://github.com/h5py/h5py.git
   $ cd h5py
   $ python setup.py configure --hdf5=$HDF5_DIR
   $ LDSHARED="mpicc -shared" CXX=mpicxx LD=mpicc CC="mpicc" python setup.py install
   $ cd ..
   $ git clone https://github.com/IntelLabs/hpat.git
   $ cd hpat
   $ LDSHARED="mpicxx -shared" CXX=mpicxx LD=mpicxx CC="mpicxx -std=c++11" python setup.py develop

Commands for running the logistic regression example::

   $ python generate_data/gen_logistic_regression.py
   $ mpirun -n 2 python examples/logistic_regression.py
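The example trains logistic regression with gradient descent over Numpy
arrays. The minimal sketch below shows the general pattern; the data sizes,
learning rate, and names here are assumptions for illustration, not the
repository's actual script:

```python
import numpy as np

def logistic_regression(X, Y, iterations=200, lr=0.1):
    # batch gradient descent on the logistic loss; under HPAT, a
    # jit-compiled version of this loop runs distributed over MPI ranks,
    # with X and Y partitioned by rows
    w = np.zeros(X.shape[1])
    for _ in range(iterations):
        pred = 1.0 / (1.0 + np.exp(-X @ w))          # sigmoid predictions
        w -= lr * (X.T @ (pred - Y)) / X.shape[0]    # averaged gradient step
    return w

# synthetic linearly separable data (stand-in for the generated dataset)
rng = np.random.RandomState(0)
X = rng.randn(100, 3)
Y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(np.float64)
w = logistic_regression(X, Y)
```

Because each gradient step is a reduction over rows, partitioning the rows of
``X`` and ``Y`` across processes is what lets HPAT generate scalable MPI code
for this loop.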