Pull requests: Dao-AILab/flash-attention

Disable 2CTA fwd non-causal on CUDA 12.9 to work around codegen regression
#2461 opened Apr 15, 2026 by Johnsonms (Collaborator)
Add CLC scheduler heuristic
#2455 opened Apr 13, 2026 by drisspg (Collaborator)
Support batch invariant forward for FA3
#2450 opened Apr 10, 2026 by Edenzzzz
[Cute,Fwd,Sm90] Ceil div in paged kv manager to prevent size 0
#2446 opened Apr 8, 2026 by imbr92 (Contributor)
Add dropout support to CuTe DSL attention kernels
#2439 opened Apr 6, 2026 by blake-snc (Contributor)
[CuTe,Fwd,SM90] Enable head dim 512 for SM90
#2422 opened Apr 1, 2026 by IwakuraRein
Add compress_factor for compressed causal attention
#2418 opened Mar 31, 2026 by jduprat (Contributor)
[Cute,Fwd,Sm90] Support SplitKV
#2415 opened Mar 31, 2026 by imbr92 (Contributor)
chore(tests): move benchmarks to benchmarks/cute/ and reduce test prints
#2408 opened Mar 29, 2026 by NJX-njx (Contributor)
fix(flash_fwd_sm90): zero partial V smem to prevent 0*NaN=NaN in PV GEMM
#2407 opened Mar 29, 2026 by NJX-njx (Contributor)
feat: setup_context for FlashAttnFunc (torch.func.grad)
#2405 opened Mar 28, 2026 by NJX-njx (Contributor)
fix(cute): SM120 forward/bwd and atomic add compatibility
#2404 opened Mar 28, 2026 by NJX-njx (Contributor)