Quick Start

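Note

Before reading this guide, make sure you have set up the Ascend environment and installed transformers by following the Installation Guide.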

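This guide uses the Meta-Llama-3-8B-Instruct model as an example to show how to run inference with transformers. For inference, transformers offers two interfaces, AutoModelForCausalLM and pipeline; the sections below describe how to use each of them.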

Note

The examples below use the Meta-Llama-3-8B-Instruct model; see Model Download for how to obtain it.
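If the model has been downloaded ahead of time, `from_pretrained` also accepts a local directory in place of the Hub ID. The following is a minimal sketch of choosing between the two; the helper `resolve_model_id` and the environment variable `LLAMA3_LOCAL_DIR` are hypothetical names introduced only for illustration, and the check for `config.json` is an assumption about what a downloaded checkpoint directory contains.

```python
import os

HUB_ID = "meta-llama/Meta-Llama-3-8B-Instruct"


def resolve_model_id(local_dir=None, hub_id=HUB_ID):
    """Return local_dir if it looks like a downloaded checkpoint, else hub_id."""
    # Assumption: a usable checkpoint directory contains a config.json file.
    if local_dir and os.path.isfile(os.path.join(local_dir, "config.json")):
        return local_dir
    return hub_id


# LLAMA3_LOCAL_DIR is a hypothetical environment variable used only in this sketch.
model_id = resolve_model_id(os.environ.get("LLAMA3_LOCAL_DIR"))
print(model_id)
```

The returned value can be passed as `model_id` in the examples that follow.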

AutoModelForCausalLM

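import torch
import torch_npu  # importing torch_npu makes the torch.npu backend available
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
# Use NPU 0 when it is available, otherwise fall back to the CPU
device = "npu:0" if torch.npu.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load the weights in bfloat16, then move the whole model to the selected device.
# device_map="auto" is not combined with .to(device): on a multi-NPU host it may
# shard the model across devices, which conflicts with moving it afterwards.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
).to(device)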
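The device selection above assumes that both `torch` and `torch_npu` import cleanly. As a hedged sketch, the same choice can be wrapped in a small helper that falls back to the CPU when either package is missing; `pick_device` is a hypothetical name, not part of transformers or torch_npu.

```python
def pick_device(index=0):
    """Return "npu:<index>" when an Ascend NPU is usable, otherwise "cpu"."""
    try:
        import torch
        import torch_npu  # noqa: F401  (imported for its side effect on torch.npu)
    except (ImportError, OSError):
        # Either package is missing or its native libraries could not be loaded.
        return "cpu"
    npu = getattr(torch, "npu", None)
    if npu is not None and npu.is_available():
        return f"npu:{index}"
    return "cpu"


device = pick_device()
print(device)
```

Falling back silently is convenient for a quick test, but on a machine that is expected to have an NPU it may be preferable to let the import error surface.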

pipeline

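import transformers
import torch
import torch_npu  # importing torch_npu makes the torch.npu backend available

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
# Use NPU 0 when it is available, otherwise fall back to the CPU
device = "npu:0" if torch.npu.is_available() else "cpu"

# Build a text-generation pipeline that loads the model in bfloat16 on the selected device
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device=device,
)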
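The listing above only constructs the pipeline. A minimal sketch of running one chat turn through it might look like the following, assuming a transformers version whose text-generation pipeline accepts a list of chat messages and returns the full conversation under `generated_text`. The helper name `chat` is hypothetical, and the sampling values are simply the ones used elsewhere in this guide.

```python
def chat(pipe, messages, max_new_tokens=256):
    """Run one chat turn through a text-generation pipeline and return the reply text."""
    outputs = pipe(
        messages,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.6,
        top_p=0.9,
    )
    # Assumption: for chat-style input, "generated_text" holds the whole
    # conversation and its last entry is the newly generated assistant message.
    return outputs[0]["generated_text"][-1]["content"]
```

With the `pipeline` object created above, `chat(pipeline, [{"role": "user", "content": "Who are you?"}])` would return the assistant's reply as a string.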

Full Workflow

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
import torch_npu  # importing torch_npu makes the torch.npu backend available

# If the model was downloaded in advance, replace the Hub ID below with the local path
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
# Use NPU 0 when it is available, otherwise fall back to the CPU
device = "npu:0" if torch.npu.is_available() else "cpu"

# Load the pretrained tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the pretrained language model in bfloat16 and move it to the selected device.
# device_map="auto" is not combined with .to(device): on a multi-NPU host it may
# shard the model across devices, which conflicts with moving it afterwards.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
).to(device)

# Define the conversation: one system message and one user message
messages = [
    {"role": "system", "content": "You are a housekeeper chatbot who always responds in polite expression!"},
    {"role": "user", "content": "Who are you? what should you do?"},
]

# Apply the chat template to the messages and convert the result to a tensor
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",  # return a PyTorch tensor
).to(model.device)

# Define the stop tokens: the tokenizer's EOS token ID and the ID of the
# end-of-turn token <|eot_id|>
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

# Generate the response
outputs = model.generate(
    input_ids,
    max_new_tokens=256,  # maximum number of newly generated tokens
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,  # sampling temperature; higher values give more varied output
    top_p=0.9,
)

# Keep only the newly generated tokens by skipping the prompt tokens
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
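To make the `apply_chat_template` step and the `<|eot_id|>` stop token more concrete, the sketch below approximates, in plain Python, the prompt string that the Llama 3 chat template produces before tokenization. It is an illustration only: `render_llama3_prompt` is a hypothetical helper, and the authoritative template is the one shipped with the model's tokenizer, which may differ in details.

```python
def render_llama3_prompt(messages, add_generation_prompt=True):
    """Approximate the Llama 3 chat template as a plain string."""
    prompt = "<|begin_of_text|>"
    for message in messages:
        # Each turn starts with a role header and ends with the end-of-turn token.
        prompt += f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
        prompt += f"{message['content'].strip()}<|eot_id|>"
    if add_generation_prompt:
        # An open assistant header asks the model to write the next turn.
        prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt


messages = [
    {"role": "system", "content": "You are a housekeeper chatbot who always responds in polite expression!"},
    {"role": "user", "content": "Who are you? what should you do?"},
]
prompt = render_llama3_prompt(messages)
print(prompt)
```

Because every turn is closed by `<|eot_id|>`, the model emits that token when it finishes its own reply, which is why it is listed as a terminator next to the tokenizer's EOS token.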

Example output (sampling is enabled, so the generated text varies between runs):

Good day to you! My name is Housekeeper Helen, and I'm delighted to introduce myself as a friendly and efficient chatbot designed to assist with household tasks and provide helpful information.
As a housekeeper, my primary role is to ensure your home is tidy, organized, and comfortable. I'd be happy to help with:

* Cleaning and organization tips
* Household chore schedules
* Laundry and ironing guidance
* Home maintenance advice
* And any other domestic-related queries you may have!

Please feel free to ask me any questions or request my assistance with a specific task. I'm here to help make your life easier and your home sparkle!