ChatGLM-6B

https://github.com/THUDM/ChatGLM-6B

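ChatGLM-6B is an open-source dialogue language model that supports both Chinese and English. It is based on the General Language Model (GLM) architecture and has 6.2 billion parameters. Combined with model quantization, it can be deployed locally on consumer-grade GPUs (as little as 6 GB of GPU memory at the INT4 quantization level).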
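As a rough sanity check on these figures, the memory taken by the weights alone can be estimated as parameter count times bytes per parameter. The sketch below assumes every parameter is stored at the stated precision and ignores activations, the attention cache and framework overhead, so real usage will be higher than these numbers.

```python
# Rough weight-only memory estimate: parameters x bytes per parameter.
PARAMS = 6.2e9  # 6.2 billion parameters

# Assumed storage size per parameter at each precision
BYTES_PER_PARAM = {"FP16": 2.0, "INT8": 1.0, "INT4": 0.5}


def weight_memory_gb(num_params, bytes_per_param):
    """Return the size of the weights alone in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


estimates = {
    name: weight_memory_gb(PARAMS, size) for name, size in BYTES_PER_PARAM.items()
}
for name, gb in estimates.items():
    print(f"{name}: about {gb:.1f} GB of weights")
```

The INT4 estimate of about 3.1 GB is well below the 6 GB minimum quoted for deployment; the gap is presumably run-time overhead plus any parts of the model kept at higher precision.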

Environment setup

PyTorch

Quick start: https://pytorch.org/get-started/locally/

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117

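Make sure you install the CUDA build of PyTorch; otherwise the model can only run on the CPU.

Check whether the CUDA build is installed: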

import torch
print(torch.cuda.is_available())

If the output is True, the CUDA build is working.
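For a slightly more informative check, the following sketch also reports the GPU name and its total memory, which is the number to compare against the memory requirements of the model. The function name describe_torch_device is made up for this example, and it only inspects device 0.

```python
import importlib.util


def describe_torch_device():
    """Summarize whether PyTorch can see a CUDA GPU (inspects device 0 only)."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch

    if not torch.cuda.is_available():
        return "CPU only (no usable CUDA device detected)"
    props = torch.cuda.get_device_properties(0)
    total_gib = props.total_memory / 1024**3
    return f"{props.name}, {total_gib:.1f} GiB total memory"


print(describe_torch_device())
```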

Python

Download the requirements.txt file from the project's GitHub repository to your machine, then install the dependencies:

pip install -r requirements.txt

This completes the Python environment setup.
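To confirm that the install step actually took effect, you can list the installed versions of the packages you care about. This is a minimal sketch using only the standard library; the package names in the example call are placeholders, so substitute the entries from your copy of requirements.txt.

```python
from importlib import metadata


def installed_versions(package_names):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Example names only; use the entries from your copy of requirements.txt.
for name, version in installed_versions(["torch", "transformers"]).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```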

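Download the pretrained weights

ChatGLM-6B hardware requirements (as of 2023-04-13):

Quantization level	Minimum GPU memory (inference)	Minimum GPU memory (parameter-efficient fine-tuning)
FP16 (no quantization)	13 GB	14 GB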
INT8	8 GB	9 GB
INT4	6 GB	7 GB
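The table can be turned into a small lookup that suggests a quantization level for a given amount of GPU memory. This is only a sketch: the thresholds are copied from the table, the function name pick_quantization is invented for this example, and the minimums leave no headroom for long conversations.

```python
# (level, minimum GB for inference, minimum GB for fine-tuning), from the table
REQUIREMENTS = [
    ("FP16", 13, 14),
    ("INT8", 8, 9),
    ("INT4", 6, 7),
]


def pick_quantization(gpu_memory_gb, finetune=False):
    """Return the least aggressive quantization level that fits, or None."""
    for level, infer_gb, tune_gb in REQUIREMENTS:
        needed = tune_gb if finetune else infer_gb
        if gpu_memory_gb >= needed:
            return level
    return None


print(pick_quantization(8))                 # INT8
print(pick_quantization(8, finetune=True))  # INT4
print(pick_quantization(4))                 # None
```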

The INT4-quantized pretrained weights can be downloaded from https://huggingface.co/THUDM/chatglm-6b-int4/tree/main

git clone https://huggingface.co/THUDM/chatglm-6b-int4
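Hugging Face repositories store large files through Git LFS, so a clone made without git-lfs installed leaves small text pointer files where the weights should be, and loading then fails. The sketch below scans a directory for such pointer stubs; the function name find_lfs_pointers is made up for this example, and it only looks at the top level of the directory.

```python
from pathlib import Path

# Every Git LFS pointer file starts with this line
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(model_dir):
    """Return names of files in model_dir that are still Git LFS pointer stubs."""
    stubs = []
    for path in sorted(Path(model_dir).iterdir()):
        if not path.is_file():
            continue
        with path.open("rb") as handle:
            if handle.read(len(LFS_HEADER)) == LFS_HEADER:
                stubs.append(path.name)
    return stubs
```

Calling find_lfs_pointers on the cloned directory should return an empty list. If it lists any files, install git-lfs and run git lfs pull inside the clone.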

Running the model

Loading

Replace the model path below with the location where you stored the pretrained weights.

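from transformers import AutoTokenizer, AutoModel

# trust_remote_code=True is needed because the model code ships with the checkpoint
tokenizer = AutoTokenizer.from_pretrained("F:\\Chatglm\\chatglm-6b-int4", trust_remote_code=True, revision="")
# Cast to FP16 and move the model to the GPU
model = AutoModel.from_pretrained("F:\\Chatglm\\chatglm-6b-int4", trust_remote_code=True, revision="").half().cuda()
# Switch to inference mode
model = model.eval()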
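If you want a single entry point that also works on a machine without a usable GPU, the loading steps can be wrapped in a function. This is a sketch under assumptions: load_chatglm and its use_gpu parameter are invented here, and the CPU branch (casting to FP32 with float()) follows the pattern in the upstream README for the full model and is an assumption for the INT4 checkpoint.

```python
def load_chatglm(model_path, use_gpu=True):
    """Load the tokenizer and model from a local directory (sketch)."""
    # Imported lazily so that defining this function does not require the library
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    model = AutoModel.from_pretrained(model_path, trust_remote_code=True)
    if use_gpu:
        model = model.half().cuda()  # FP16 on the GPU
    else:
        model = model.float()  # FP32 on the CPU; expect much slower replies
    return tokenizer, model.eval()
```

A call would look like tokenizer, model = load_chatglm("F:\\Chatglm\\chatglm-6b-int4"), with the path replaced by your own.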

Chat without history

response, history = model.chat(tokenizer, "你好", history=[])
print(response)

Chat with history

response, history = model.chat(tokenizer, "你好", history=history)
print(response)
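The two calls above differ only in what is passed as history, so a multi-turn conversation is just a loop that feeds each returned history into the next call. The helper below is a sketch of that pattern; run_dialogue is a name invented for this example, and it assumes only that model.chat returns a (response, history) pair as shown above.

```python
def run_dialogue(model, tokenizer, prompts):
    """Send prompts in order, carrying the conversation history between turns."""
    history = []  # an empty list starts a fresh conversation
    replies = []
    for prompt in prompts:
        # Each call returns the reply plus the updated history for the next turn
        response, history = model.chat(tokenizer, prompt, history=history)
        replies.append(response)
    return replies, history
```

Passing history=[] again at any point starts a new conversation, while reusing the returned history lets the model see the earlier turns.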

References

https://github.com/THUDM/ChatGLM-6B

https://zhuanlan.zhihu.com/p/620455056

https://huggingface.co/THUDM/chatglm-6b-int4/tree/main