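Want to build your own ChatGPT? This tutorial will get you there with half the effort

If you follow AI, chatbots and GPT are probably familiar by now. But have you ever thought about building your own chatbot? Several chatbot frameworks have become popular choices, and one of the most popular is OpenAI's GPT (Generative Pretrained Transformer). To build an efficient chatbot, though, you need to know how to use GPT and the distributed architecture around it; once you do, the whole process becomes simple and fast.

Assuming you already understand the basic principles of GPT, here is how to set up and train a GPT model locally.

Step 1: Set up the environment

Large models such as GPT-2 and GPT-3 need a GPU with plenty of memory, and a network connection of at least 100 Mbps is recommended. A powerful cloud server (for example on Google Cloud, AWS, Microsoft Azure, or Alibaba Cloud) is a good choice: keep your storage on the remote server and train remotely.

Step 2: Install the required libraries

GPT needs PyTorch, a machine learning library flexible enough for many kinds of experiments. We also need pytorch-transformers, a PyTorch-based library of pretrained state-of-the-art NLP models such as BERT and XLNet, which also supports production-oriented model conversion and training. Install both with the following commands:

```
pip install torch
pip install pytorch-transformers
```

Step 3: Prepare the data

Next we prepare the data. The choice and preprocessing of data are critical to the training result. When an existing dataset is available, a large one helps, and a general, openly available dataset such as Book Corpus is best; its download link can be found in the corresponding paper. This tutorial uses the smaller WikiText-2 dataset, which the following notebook commands download and unpack (the leading ! runs a shell command from a notebook cell):

```
!wget https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-v1.zip
!unzip -q wikitext-2-v1.zip
!mkdir -p data && mv wikitext-2/ data/
```

Step 4: Train the model

We are now ready to train. The code below uses the autoregressive GPT-2 model, specifically the pretrained 'gpt2' checkpoint with 117M parameters, and runs a training step on the downloaded text.

```python
import os
from time import time

import torch
from pytorch_transformers import GPT2Tokenizer, GPT2LMHeadModel
# from pt_util import context_timer, print_memory_info

# Assumes the NVIDIA GPU Computing Toolkit is installed and that you
# monitor GPU resources with the nvidia-smi command
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Assume 6 GB of GPU memory
# cuda_max_memory_bytes = 6.0 * 1024 * 1024 * 1024


def get_data_from_file(filename):
    with open(filename, 'r', encoding='utf-8') as f:
        return f.read()


def train():
    tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
    model = GPT2LMHeadModel.from_pretrained('gpt2')
    model = model.to(device)
    model.train()

    input_text = get_data_from_file("data/wikitext-2/wiki.train.tokens")
    input_ids = torch.tensor(tokenizer.encode(input_text)).unsqueeze(0)

    # GPT-2 accepts at most n_ctx tokens per sequence, so train on the first window only
    n_ctx = model.config.n_ctx
    input_ids = input_ids[:, :n_ctx].to(device)

    optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)

    # The training code below fails if the model needs more than 6 GB of GPU memory
    # batch_size, text_length = input_ids.shape
    # model_max_memory = torch.cuda.max_memory_allocated()
    # model_next_memory = batch_size * text_length * model.config.to_dict()['n_embd'] * 4
    # if model_max_memory + model_next_memory > cuda_max_memory_bytes:
    #     torch.cuda.empty_cache()

    start = time()
    # One optimization step: predict each token from the tokens before it
    optimizer.zero_grad()
    output, past = model(input_ids[:, :-1], past=None)
    loss = torch.nn.CrossEntropyLoss()(output.view(-1, output.size(-1)), input_ids[:, 1:].contiguous().view(-1))
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    optimizer.step()
    # print("Time elapsed: ", time() - start, " seconds")
    # print_memory_info('End training')

    # save_pretrained needs an existing directory and returns nothing, so report the path ourselves
    save_dir = './models/'
    os.makedirs(save_dir, exist_ok=True)
    model.save_pretrained(save_dir)
    return "Training successful! Model saved at: {}".format(save_dir)


print(train())
```

With these steps in place, the training script is complete; run it to see the result.

Step 5: Predict with the trained model

Once the model is trained, you can use it to generate text or answer questions: feed it some sample text and it produces a predicted continuation.

```python
import numpy as np
import torch
from pytorch_transformers import GPT2Tokenizer, GPT2LMHeadModel
# from pt_util import context_timer, print_memory_info

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def get_trained_model():
    # Load the weights that train() saved to ./models/
    model = GPT2LMHeadModel.from_pretrained('./models/')
    model.to(device)
    model.eval()
    return model


def predict(model, input_text, seq_len):
    tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
    generated = ''
    generated += input_text
    input_ids = torch.tensor(tokenizer.encode(generated)).unsqueeze(0)
    input_ids = input_ids.to(device)

    banned_tokens = []  # token ids that must never be sampled
    filter_value = -float('Inf')

    for _ in range(seq_len):
        with torch.no_grad():
            outputs = model(input_ids)
        # Logits for the last position, scaled by a temperature of 0.8
        next_token_logits = outputs[0][:, -1, :]
        next_token_logits = next_token_logits / 0.8
        next_token_logits = next_token_logits.squeeze()
        for token in banned_tokens:
            next_token_logits[token] = filter_value
        # Convert to float64 and renormalize so np.random.choice accepts the distribution
        next_token_probs = torch.softmax(next_token_logits, dim=-1).detach().cpu().numpy().astype(np.float64)
        next_token_probs /= next_token_probs.sum()
        next_token = int(np.random.choice(len(next_token_probs), p=next_token_probs))
        # GPT-2 defines no separator token, so stop on its end-of-text token
        if next_token == tokenizer.eos_token_id:
            break
        generated += tokenizer.decode([next_token])
        input_ids = torch.cat((input_ids, torch.ones((1, 1)).long().to(device) * next_token), dim=1)
    # print_memory_info('End predicting')
    return generated


# Load the trained model
model = get_trained_model()
text = "一只小猫咪"
print(predict(model=model, input_text=text, seq_len=50))
```

You can now run the code and load the model and code into your application.

Once you understand how a chatbot is put together, and you can process a dataset and train a model, building a GPT-based chatbot becomes straightforward. If you need finer-grained predictions, apply additional algorithms and further optimization.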
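One common example of an additional algorithm for finer predictions is top-k sampling, which restricts each sampling step to the k most likely tokens. The following is a minimal, framework-free sketch under stated assumptions, not part of the tutorial's own code: the helper names top_k_filter, softmax, and sample_top_k are hypothetical, and plain Python lists stand in for the logits tensor.

```python
import math
import random


def top_k_filter(logits, k, filter_value=-math.inf):
    """Keep the k largest logits and replace all others with filter_value."""
    k = max(1, min(k, len(logits)))
    threshold = sorted(logits, reverse=True)[k - 1]
    return [x if x >= threshold else filter_value for x in logits]


def softmax(logits, temperature=1.0):
    """Numerically stable softmax; filtered (-inf) entries get probability 0."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_top_k(logits, k, temperature=0.8):
    """Sample one token index from the k most likely candidates."""
    probs = softmax(top_k_filter(logits, k), temperature)
    return random.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical logits for a four-token vocabulary
logits = [1.0, 3.0, 2.0, 0.5]
print(top_k_filter(logits, 2))
print(sample_top_k(logits, 2))
```

In predict, the same idea would apply to next_token_logits just before the softmax, in the spot where banned_tokens are already masked with filter_value. Smaller values of k make the output more conservative, and larger values make it more varied.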