Quick Start
Note
Before reading this guide, make sure the Ascend environment and transformers are set up as described in the Installation Guide.
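This guide uses the Meta-Llama-3-8B-Instruct model as an example to show how to run inference with transformers. For inference, transformers offers two interfaces, AutoModelForCausalLM and pipeline; the sections below show how to use each.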
Note
The examples below use Meta-Llama-3-8B-Instruct. See Model Acquisition for how to obtain it.
AutoModelForCausalLM
import torch
import torch_npu  # importing torch_npu makes torch.npu available
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
# Use NPU 0 when available, otherwise fall back to CPU
device = "npu:0" if torch.npu.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load the weights in bfloat16, then move the whole model to the chosen device
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
).to(device)
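The device-selection line above assumes that torch and torch_npu both import cleanly. A minimal sketch of a more defensive variant is shown below; `pick_device` is a hypothetical helper name, not part of transformers or torch_npu, and it simply falls back to "cpu" when either package is missing or fails to load.

```python
import importlib.util


def pick_device(index: int = 0) -> str:
    """Return "npu:<index>" when an Ascend NPU is usable, otherwise "cpu"."""
    # Skip the imports entirely if either package is not installed.
    for name in ("torch", "torch_npu"):
        if importlib.util.find_spec(name) is None:
            return "cpu"
    try:
        import torch
        import torch_npu  # noqa: F401  (importing it adds torch.npu)
    except Exception:
        # Installed but unusable, for example because the driver stack is missing.
        return "cpu"
    return f"npu:{index}" if torch.npu.is_available() else "cpu"


device = pick_device()
print(device)
```

The returned string can be passed wherever the examples in this guide use `device`.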
pipeline
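import transformers
import torch
import torch_npu  # importing torch_npu makes torch.npu available

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
# Use NPU 0 when available, otherwise fall back to CPU
device = "npu:0" if torch.npu.is_available() else "cpu"

# Build a text-generation pipeline that loads the model in bfloat16 on the chosen device
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device=device,
)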
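The listing above only constructs the pipeline. As a rough sketch of how it might then be called, the helper below passes a chat-style message list to a text-generation pipeline and pulls out the assistant's reply. `generate_reply` is a hypothetical name introduced here for illustration, and the output layout (`generated_text` holding the whole conversation, with the new assistant turn last) assumes a transformers version whose text-generation pipeline accepts chat messages.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, str]


def generate_reply(pipe: Callable[..., Any], messages: List[Message], **generate_kwargs: Any) -> str:
    """Run a chat-style text-generation pipeline and return the assistant's reply text."""
    outputs = pipe(messages, **generate_kwargs)
    # For chat input, generated_text is the full conversation;
    # the last entry is the newly generated assistant turn.
    return outputs[0]["generated_text"][-1]["content"]


messages = [
    {"role": "system", "content": "You are a housekeeper chatbot who always responds in polite expression!"},
    {"role": "user", "content": "Who are you? what should you do?"},
]

# With the pipeline object created above, the call would look like:
# print(generate_reply(pipeline, messages, max_new_tokens=256,
#                      do_sample=True, temperature=0.6, top_p=0.9))
```

Taking the pipeline as a parameter keeps the helper independent of how the pipeline was built, so it can be exercised with a stub before the real model is loaded.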
Full workflow
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
import torch_npu  # importing torch_npu makes torch.npu available

# If the model is already downloaded, replace meta-llama/Meta-Llama-3-8B-Instruct with its local path
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
device = "npu:0" if torch.npu.is_available() else "cpu"  # use NPU 0 when available, otherwise CPU

# Load the pretrained tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the pretrained language model in bfloat16
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
).to(device)  # move the model to the selected device

# Define the conversation: a system message and a user message
messages = [
    {"role": "system", "content": "You are a housekeeper chatbot who always responds in polite expression!"},
    {"role": "user", "content": "Who are you? what should you do?"},
]

# Apply the chat template to the messages and return the token IDs as a tensor
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",  # return PyTorch tensors
).to(model.device)

# Stop tokens: the tokenizer's EOS token ID and the end-of-turn token <|eot_id|>
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

# Generate the response
outputs = model.generate(
    input_ids,
    max_new_tokens=256,  # maximum number of newly generated tokens
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,  # sampling temperature; higher values give more diverse output
    top_p=0.9,
)

# Keep only the newly generated tokens by skipping the prompt
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
Example output (sampling is enabled, so the exact text varies between runs):
Good day to you! My name is Housekeeper Helen, and I'm delighted to introduce myself as a friendly and efficient chatbot designed to assist with household tasks and provide helpful information.
As a housekeeper, my primary role is to ensure your home is tidy, organized, and comfortable. I'd be happy to help with:

* Cleaning and organization tips
* Household chore schedules
* Laundry and ironing guidance
* Home maintenance advice
* And any other domestic-related queries you may have!

Please feel free to ask me any questions or request my assistance with a specific task. I'm here to help make your life easier and your home sparkle!
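The last step of the full workflow keeps only the tokens that follow the prompt, and generation stops at one of the terminator IDs. The sketch below illustrates that logic on plain Python lists, without loading a model. `extract_reply_ids` is a hypothetical helper and the token IDs are made up for illustration; in the real script, slicing the output tensor and decoding with `skip_special_tokens=True` play this role.

```python
from typing import Iterable, List, Sequence


def extract_reply_ids(output_ids: Sequence[int], prompt_length: int, terminators: Iterable[int]) -> List[int]:
    """Drop the prompt tokens, then cut the reply at the first terminator token."""
    stop_ids = set(terminators)
    reply = list(output_ids[prompt_length:])
    for position, token_id in enumerate(reply):
        if token_id in stop_ids:
            return reply[:position]
    return reply


# Made-up token IDs, for illustration only
prompt_ids = [1, 2, 3]
end_of_turn_id = 99
output_ids = prompt_ids + [10, 11, 12, end_of_turn_id]

reply_ids = extract_reply_ids(output_ids, len(prompt_ids), [end_of_turn_id])
print(reply_ids)
```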