ms-swift
- NPU Support
- Support Scope at a Glance
- Choose Your Usage Path
- Environment Preparation
- Quick Start: ModelScope Model + Dataset
- Training
- Model Saving, Merge LoRA, and Resume Training
- Inference
- Deployment
- Evaluation
- Release
- FAQ
- Q1: How do I confirm that the current environment detects NPUs correctly?
- Q2: How should I choose between FSDP, DeepSpeed, and Megatron-SWIFT?
- Q3: Do I need to manually disable the NPU model patch?
- Q4: What should I know when using vLLM-Ascend for deployment or RL rollout?
- Q5: What happens if I forget to run source set_env.sh?
- Q6: How do I diagnose a torch and torch_npu version mismatch?
- Q7: What happens if ASCEND_RT_VISIBLE_DEVICES and NPROC_PER_NODE do not match?
- Q8: What should I check first when multi-card training hangs?
- Q9: How do I initially troubleshoot HCCL connection or timeout issues?
- Q10: Why is npu-smi unavailable inside the container?
- Q11: How should I choose between native transformers deployment and vLLM-Ascend deployment?
- Q12: What should I do if vLLM-Ascend reports device type mismatch or undefined symbol?
- Q13: Can FP8 or quantized models be trained directly on NPUs?
- Q14: How do I troubleshoot Megatron-SWIFT importing the wrong Megatron/MindSpeed?
- NPU WeChat Group