[Proposal] Optimize phase_encode_kernel using LLVM JIT and CUDA Driver API for 5x+ Speedup #1394

### What

This proposal optimizes `phase_encode_kernel` by generating it with LLVM Just-In-Time (JIT) compilation and launching it through the CUDA Driver API. Initial assessments suggest a speedup of at least 5x over the current implementation; the measurement below shows roughly 15x (0.302787 ms / 0.0195 ms ≈ 15.5).

| Implementation | Kernel execution time | Speedup |
|:---|:---|:---|
| CUDA C runtime API | 0.302787 ms | 1x |
| LLVM + JIT | 0.0195 ms | ~15x |

### Why

The goal of this optimization is to maximize hardware efficiency and make full use of the available computing resources. Speeding up `phase_encode_kernel` through runtime compilation and direct use of the CUDA Driver API should reduce compute time and improve resource utilization for workloads that rely on this kernel.

### How

- LLVM + JIT refactoring: Refactor the existing `phase_encode_kernel` to generate LLVM Intermediate Representation (IR) at runtime, so the kernel can be optimized for the specific data shapes and parameters of each call.

- Execution pipeline: Finalize the pipeline that loads the JIT-compiled kernel through the CUDA Driver API and launches it, including proper memory mapping.

- Tentative plan:
  - Step 1: Bridge LLVM JIT with the Rust core
  - Step 2: Load the PTX with the CUDA Driver API
  - Step 3: Wire into the Mahout pipeline
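
To make the first two bullets concrete, here is a minimal Rust sketch of what "specializing for a data shape" and "binding to the driver" could look like on the host side. It is not the proposed implementation: `specialized_ir`, `launch_config`, `LaunchConfig`, and the kernel parameter names are hypothetical, the per-element phase computation is elided, and the IR uses the classic `nvvm.annotations` kernel marker as an assumption about the LLVM version. The idea shown is only that the element count becomes a literal in the IR, and that the launch geometry is derived from the same count before calling a driver entry point such as `cuLaunchKernel`.

```rust
/// 1-D launch geometry that would be passed to `cuLaunchKernel`
/// (hypothetical helper type, not part of any existing API).
#[derive(Debug, PartialEq)]
pub struct LaunchConfig {
    pub grid_dim_x: u32,
    pub block_dim_x: u32,
}

/// Emit LLVM IR text for a kernel specialized to `len` elements.
/// The element count is baked in as a constant, so the bounds check
/// compares against a literal instead of a runtime argument.
pub fn specialized_ir(len: u32) -> String {
    format!(
        r#"target triple = "nvptx64-nvidia-cuda"

declare i32 @llvm.nvvm.read.ptx.sreg.tid.x()
declare i32 @llvm.nvvm.read.ptx.sreg.ctaid.x()
declare i32 @llvm.nvvm.read.ptx.sreg.ntid.x()

define void @phase_encode_len{len}(ptr %input, ptr %state) {{
entry:
  %tid = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
  %bid = call i32 @llvm.nvvm.read.ptx.sreg.ctaid.x()
  %bdim = call i32 @llvm.nvvm.read.ptx.sreg.ntid.x()
  %base = mul i32 %bid, %bdim
  %idx = add i32 %base, %tid
  %in_range = icmp ult i32 %idx, {len}
  br i1 %in_range, label %body, label %exit
body:
  ; per-element phase computation elided in this sketch
  br label %exit
exit:
  ret void
}}

!nvvm.annotations = !{{!0}}
!0 = !{{ptr @phase_encode_len{len}, !"kernel", i32 1}}
"#,
        len = len
    )
}

/// Compute a 1-D grid that covers `len` elements with `block_dim_x`
/// threads per block (ceiling division, at least one block).
pub fn launch_config(len: u32, block_dim_x: u32) -> LaunchConfig {
    assert!(block_dim_x > 0, "block size must be non-zero");
    let blocks = (len as u64 + block_dim_x as u64 - 1) / block_dim_x as u64;
    LaunchConfig {
        grid_dim_x: (blocks as u32).max(1),
        block_dim_x,
    }
}
```

In a real pipeline the IR string would be lowered to PTX by the LLVM NVPTX backend, loaded with `cuModuleLoadData`, resolved with `cuModuleGetFunction`, and launched with the geometry above. Caching the compiled module per element count would avoid paying the JIT cost on every call.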

### Q

1) The phase-encode kernel itself is refactored in C++ using LLVM and JIT, so it is expected to be stored as a `.cpp` file. Should it live alongside the original `.cu` file or in a separate folder?
2) CUDA Driver API calls will be added. Should they live with the original kernel or in a separate `.rs` file?
3) If Mahout accepts this change, should the kernel be renamed, or should a switch such as a `define` be used to select between the two implementations?
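
As one possible answer to question 3, the sketch below keeps both kernels under their current names and selects between them with a compile-time switch plus an optional runtime override. Everything here is hypothetical: the Cargo feature name `llvm-jit`, the `PhaseEncodeBackend` enum, and the override strings are illustrative names, not existing Mahout identifiers.

```rust
/// Which implementation of the phase-encode kernel to dispatch to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PhaseEncodeBackend {
    /// The existing kernel launched through the CUDA runtime API.
    CudaRuntime,
    /// The proposed LLVM JIT kernel launched through the CUDA Driver API.
    LlvmJit,
}

/// Compile-time default: the JIT path is used only when the crate is
/// built with the (hypothetical) `llvm-jit` feature enabled.
pub fn default_backend() -> PhaseEncodeBackend {
    if cfg!(feature = "llvm-jit") {
        PhaseEncodeBackend::LlvmJit
    } else {
        PhaseEncodeBackend::CudaRuntime
    }
}

/// Optional runtime override, for example from a config value;
/// unknown or missing names fall back to the compile-time default.
pub fn select_backend(override_name: Option<&str>) -> PhaseEncodeBackend {
    match override_name {
        Some("llvm-jit") => PhaseEncodeBackend::LlvmJit,
        Some("cuda-runtime") => PhaseEncodeBackend::CudaRuntime,
        _ => default_backend(),
    }
}
```

With this shape the original kernel stays the default, so the new path can be merged and benchmarked without changing behavior for existing users.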
