Quick Start

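Note

Before reading this guide, make sure you have prepared the Ascend environment and installed transformers by following the installation guide.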

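This guide uses the Meta-Llama-3-8B-Instruct model as an example to show how to run inference with transformers. For inference, transformers offers two interfaces, AutoModelForCausalLM and pipeline; the sections below show how to use each.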

Note

The examples below use the Meta-Llama-3-8B-Instruct model; see the model acquisition guide for how to obtain it.

AutoModelForCausalLM

import torch
import torch_npu
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
device = "npu:0" if torch.npu.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
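# Load in bfloat16, then move the whole model to the selected device.
# device_map="auto" is omitted here because it already places the weights,
# which makes a following .to(device) redundant and can trigger warnings.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
).to(device)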
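
The device check above assumes that torch_npu imports cleanly. A minimal sketch of a more defensive variant is shown below; resolve_device is a hypothetical helper name (it is not part of transformers or torch_npu), and it simply falls back to the CPU when either package is missing.

```python
def resolve_device(index: int = 0) -> str:
    """Return "npu:<index>" when an Ascend NPU is usable, otherwise "cpu"."""
    try:
        import torch
        import torch_npu  # noqa: F401  (importing it registers the torch.npu namespace)
    except ImportError:
        # Either package is missing, so only the CPU can be used.
        return "cpu"
    return f"npu:{index}" if torch.npu.is_available() else "cpu"


device = resolve_device()
print(device)
```

The returned string can be passed to .to(device) or to the device argument of transformers.pipeline in the same way as the device variable used in this guide.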

pipeline

import transformers
import torch
import torch_npu

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
device = "npu:0" if torch.npu.is_available() else "cpu"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device=device,
)
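
Once the pipeline object exists, it can be called directly with a list of chat messages. The sketch below assumes a transformers version whose text-generation pipeline accepts chat messages and returns the conversation, with the assistant reply appended, under the generated_text key; chat_once is a hypothetical helper name introduced only for illustration.

```python
def chat_once(pipe, messages, **generate_kwargs):
    """Run one chat turn through a text-generation pipeline and return the reply text."""
    outputs = pipe(messages, **generate_kwargs)
    generated = outputs[0]["generated_text"]
    # Chat-style input: generated_text is the message list plus the new assistant message.
    if isinstance(generated, list):
        return generated[-1]["content"]
    # Plain-string input: generated_text is already a string.
    return generated
```

With the pipeline built above, a call could look like chat_once(pipeline, [{"role": "user", "content": "Who are you?"}], max_new_tokens=256).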

Full workflow

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
import torch_npu

# If the model has been downloaded in advance, replace meta-llama/Meta-Llama-3-8B-Instruct with the local path
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
device = "npu:0" if torch.npu.is_available() else "cpu"  # Use NPU 0 if available, otherwise fall back to the CPU

# Load the pretrained tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

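# Load the pretrained language model in bfloat16.
# device_map="auto" is omitted because the model is moved explicitly below;
# combining the two is redundant and can trigger warnings.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
).to(device)  # Move the model to the selected device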

# Define the message list, containing a system message and a user message
messages = [
    {"role": "system", "content": "You are a housekeeper chatbot who always responds in polite expression!"},
    {"role": "user", "content": "Who are you? what should you do?"},
]

# Apply the chat template to the message list and convert the result to a tensor
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"  # Return a PyTorch tensor
).to(model.device)


# Define the stop tokens: the tokenizer's EOS token ID and the ID of the <|eot_id|> end-of-turn token
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

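# Generate the response
outputs = model.generate(
    input_ids,
    max_new_tokens=256,  # Maximum number of new tokens to generate
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,  # Sampling temperature; lower values make the output less diverse
    top_p=0.9,  # Nucleus sampling: sample only from the most likely tokens covering 90% of the probability mass
)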

# Keep only the newly generated tokens by skipping the prompt tokens
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
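
The temperature and top_p arguments passed to generate above control how the next token is sampled. As a rough illustration of the idea only, the following pure-Python sketch applies temperature scaling and nucleus (top-p) filtering to a toy list of logits. It is not the transformers implementation, and sample_top_p and the example logits are hypothetical.

```python
import math
import random


def sample_top_p(logits, temperature=0.6, top_p=0.9, rng=random):
    """Sample one token index using temperature scaling and nucleus (top-p) filtering."""
    # Temperature scaling: values below 1 sharpen the distribution.
    scaled = [x / temperature for x in logits]

    # Softmax, shifted by the maximum for numerical stability.
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Keep the smallest set of most likely tokens whose cumulative probability reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # random.choices renormalizes the weights of the kept tokens before drawing.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


toy_logits = [2.0, 1.0, 0.5, -1.0]
print(sample_top_p(toy_logits))
```

With these toy logits and the default settings, only the two most likely indices survive the top-p filter, so the low-probability tail is never sampled.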

Example output:

Good day to you! My name is Housekeeper Helen, and I'm delighted to introduce myself as a friendly and efficient chatbot designed to assist with household tasks and provide helpful information.
As a housekeeper, my primary role is to ensure your home is tidy, organized, and comfortable. I'd be happy to help with:

* Cleaning and organization tips
* Household chore schedules
* Laundry and ironing guidance
* Home maintenance advice
* And any other domestic-related queries you may have!

Please feel free to ask me any questions or request my assistance with a specific task. I'm here to help make your life easier and your home sparkle!