Quick Start
===========

.. note::

    Before reading this page, make sure the Ascend environment and transformers are set up as described in the :doc:`Installation Guide <./install>`.


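This page uses the Meta-Llama-3-8B-Instruct model as an example to show how to run model inference with transformers.
For inference, transformers provides two interfaces, AutoModelForCausalLM_ and pipeline_. The sections below describe how to use each of them.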

.. note::

    The examples below use Meta-Llama-3-8B-Instruct. See `Model Download <./modeldownload.html>`_ for how to obtain it.

AutoModelForCausalLM
-----------------------------------------------

.. code-block:: python
    :linenos:

    import torch
    import torch_npu  # registers the NPU backend as torch.npu
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
    # Use NPU 0 when available, otherwise fall back to CPU
    device = "npu:0" if torch.npu.is_available() else "cpu"

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # Load the weights in bfloat16 and place the whole model on the selected device
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
    ).to(device)
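
The loading code above assumes that ``torch_npu`` can be imported. As a minimal sketch, the device choice can be wrapped in a helper that also works on machines without the Ascend stack. ``select_device`` is a hypothetical name introduced here for illustration; it is not part of transformers or torch_npu.

.. code-block:: python
    :linenos:

    import importlib.util


    def select_device(index: int = 0) -> str:
        """Return "npu:<index>" when an Ascend NPU is usable, otherwise "cpu".

        Hypothetical helper: it only imports torch and torch_npu if they are installed.
        """
        if importlib.util.find_spec("torch") is None:
            return "cpu"
        import torch

        if importlib.util.find_spec("torch_npu") is not None:
            import torch_npu  # noqa: F401  (importing it registers torch.npu)

            if torch.npu.is_available():
                return f"npu:{index}"
        return "cpu"


    device = select_device()
    print(device)

The returned string can be passed to ``.to(device)`` in place of the inline conditional expression.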


pipeline
-------------------------

.. code-block:: python
    :linenos:

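    import transformers
    import torch
    import torch_npu  # registers the NPU backend as torch.npu

    model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
    # Use NPU 0 when available, otherwise fall back to CPU
    device = "npu:0" if torch.npu.is_available() else "cpu"

    # Build a text-generation pipeline that loads the model in bfloat16
    pipeline = transformers.pipeline(
        "text-generation",
        model=model_id,
        model_kwargs={"torch_dtype": torch.bfloat16},
        device=device,
    )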
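
The block above only constructs the pipeline. The following sketch shows one way it might be called with chat messages. It assumes a recent transformers release in which a text-generation pipeline accepts a list of chat messages and returns the conversation with the assistant reply appended under ``generated_text``; older versions may behave differently. ``build_messages`` and ``chat_once`` are hypothetical helper names, not transformers APIs.

.. code-block:: python
    :linenos:

    def build_messages(system_prompt, user_prompt):
        """Build a two-message conversation in the chat format used on this page."""
        return [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ]


    def chat_once(pipe, messages, max_new_tokens=256):
        """Call a text-generation pipeline once and return the reply text.

        Assumption: for chat input, outputs[0]["generated_text"] is the full
        message list whose last entry is the assistant reply.
        """
        outputs = pipe(
            messages,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.6,
            top_p=0.9,
        )
        generated = outputs[0]["generated_text"]
        if isinstance(generated, list):
            return generated[-1]["content"]
        # Plain-string output (non-chat behavior)
        return generated

With the ``pipeline`` object created above, a call might look like ``print(chat_once(pipeline, build_messages("You are a polite assistant.", "Who are you?")))``.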


Full Workflow
-------------

.. code-block:: python
    :linenos:

    from transformers import AutoModelForCausalLM, AutoTokenizer
    import torch
    import torch_npu  # registers the NPU backend as torch.npu

    # If the model is already downloaded, replace meta-llama/Meta-Llama-3-8B-Instruct with the local path
    model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
    # Use NPU 0 when available, otherwise fall back to CPU
    device = "npu:0" if torch.npu.is_available() else "cpu"

    # Load the pretrained tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # Load the pretrained language model in bfloat16 and move it to the selected device
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
    ).to(device)
    
    # Define the conversation: one system message and one user message
    messages = [
        {"role": "system", "content": "You are a housekeeper chatbot who always responds in polite expression!"},
        {"role": "user", "content": "Who are you? what should you do?"},
    ]

    # Apply the chat template to the messages and tokenize the result into a tensor
    input_ids = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt"  # return a PyTorch tensor
    ).to(model.device)
    

    # Stop tokens: the tokenizer's end-of-sequence token and the end-of-turn token <|eot_id|>
    terminators = [
        tokenizer.eos_token_id,
        tokenizer.convert_tokens_to_ids("<|eot_id|>")
    ]
    
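    # Generate the response
    outputs = model.generate(
        input_ids,
        max_new_tokens=256,  # maximum number of newly generated tokens
        eos_token_id=terminators,
        do_sample=True,
        temperature=0.6,  # sampling temperature; higher values give more varied output
        top_p=0.9,  # nucleus sampling threshold
    )

    # Keep only the newly generated tokens by slicing off the prompt
    response = outputs[0][input_ids.shape[-1]:]
    print(tokenizer.decode(response, skip_special_tokens=True))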

Example output:

.. code-block:: text
    :linenos:

    Good day to you! My name is Housekeeper Helen, and I'm delighted to introduce myself as a friendly and efficient chatbot designed to assist with household tasks and provide helpful information. 
    As a housekeeper, my primary role is to ensure your home is tidy, organized, and comfortable. I'd be happy to help with:

    * Cleaning and organization tips
    * Household chore schedules
    * Laundry and ironing guidance
    * Home maintenance advice
    * And any other domestic-related queries you may have!

    Please feel free to ask me any questions or request my assistance with a specific task. I'm here to help make your life easier and your home sparkle!
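
The script above recovers the reply by slicing the prompt off the generated sequence (``outputs[0][input_ids.shape[-1]:]``). The sketch below illustrates that step with plain Python lists so it can be run without a model. The token IDs are illustrative values only, and ``extract_new_tokens`` is a hypothetical helper; in the real script, ``skip_special_tokens=True`` removes the stop token during decoding.

.. code-block:: python
    :linenos:

    def extract_new_tokens(output_ids, prompt_length, terminators):
        """Drop the prompt tokens and cut the remainder at the first stop token."""
        new_tokens = output_ids[prompt_length:]
        for position, token in enumerate(new_tokens):
            if token in terminators:
                return new_tokens[:position]
        return new_tokens


    # Illustrative IDs: a 3-token prompt followed by 3 generated tokens and a stop token
    prompt_ids = [101, 102, 103]
    output_ids = prompt_ids + [7, 8, 9, 128009]
    terminators = {128001, 128009}

    print(extract_new_tokens(output_ids, len(prompt_ids), terminators))  # prints [7, 8, 9]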