The blogger's environment: Windows 11, cuda=11.8.r11.8, cudnn=8.9.7, git=2.47.1, cmake=4.0.0-rc4, ninja=1.12.1, vs_buildTools=17.4.21, cl=19.34.31948, torch=2.3.1
The environment dependencies for compiling flash-attention are as follows. First check the CUDA toolkit version:
nvcc --version
cuDNN can be installed by following NVIDIA's Installation Guide Windows.
For the MSVC toolchain, open the Visual Studio downloads page and click Visual Studio Build Tools to install the C++ build tools. After installation, verify that the compiler is available:
cl
Then confirm the remaining tools are on PATH:
cmake --version
git --version
ninja --version
nvcc --version
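It is also worth confirming that the installed PyTorch was built against the same CUDA major version and can see cuDNN; a minimal check, assuming torch is already installed in the active Python environment:
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.backends.cudnn.version(), torch.cuda.is_available())"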
Clone the flash-attention repository:
git clone https://github.com/Dao-AILab/flash-attention.git
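If the build later complains about missing cutlass headers, the csrc/cutlass directory may not have been populated (in some versions of the repo it is tracked as a git submodule); in that case the following usually helps:
cd flash-attention
git submodule update --init --recursive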
In setup.py, adjust max_num_jobs_cores according to your machine's CPU core count; increasing it speeds up compilation (a sketch of the relevant logic follows below). Then build the wheel:
python setup.py bdist_wheel
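For reference, the parallel-job logic in setup.py looks roughly like the sketch below; the name max_num_jobs_cores follows the upstream file, but check your local copy before editing, since the exact code differs between versions. Alternatively, the MAX_JOBS environment variable can be set before building instead of editing the file.
import os

# sketch of flash-attention's job-count logic (not a verbatim copy of setup.py):
# the number of parallel compile jobs defaults to half the logical cores;
# raising max_num_jobs_cores speeds up the build if you have spare cores and RAM
if not os.environ.get("MAX_JOBS"):
    max_num_jobs_cores = max(1, os.cpu_count() // 2)
    os.environ["MAX_JOBS"] = str(max_num_jobs_cores)
With the environment-variable route the edit can be skipped entirely, e.g. run set MAX_JOBS=8 in the same shell before python setup.py bdist_wheel.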
When the build finishes, the dist directory contains the whl file flash_attn-2.7.4.post1-cp310-cp310-win_amd64.whl. Install it with:
pip install ./dist/flash_attn-2.7.4.post1-cp310-cp310-win_amd64.whl
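After installation, a quick smoke test can confirm that the extension imports and runs on the GPU; a minimal sketch, assuming a CUDA-capable card and the flash_attn_func interface of flash-attention 2.x:
import torch
from flash_attn import flash_attn_func

# random fp16 q/k/v of shape (batch, seqlen, nheads, headdim) on the GPU
q = torch.randn(1, 128, 8, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 128, 8, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 128, 8, 64, device="cuda", dtype=torch.float16)

out = flash_attn_func(q, k, v)
print(out.shape)  # should print torch.Size([1, 128, 8, 64])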
Since compilation is slow, the blogger shares a pre-compiled whl file here: https://pan.baidu.com/s/1_SCUEjqbNDpioV7UGCWwfQ?pwd=78vx