In the following tutorial we will explore a simple and fast way to deploy a GPT-2 model at scale - no need to know how Docker or Kubernetes works.
Do you know what's better than AI models in PyCharm or a Jupyter Notebook? AI models running in production! It feels great when anyone can use your model, but deploying an AI model comes with many potential problems.
We will need to create a web service with Flask, recreate the environment in Docker, set up the infrastructure, and deploy the model to Google Cloud or Amazon AWS, right? Fortunately not! That is how Machine Learning Operations used to look. In this tutorial, I will show you how to deploy a GPT-2 model in a few simple clicks with one tool called Syndicai.
Okay, but before we deploy anything, we need a cool AI model. Therefore we will use one of the hottest NLP models straight out of the OpenAI labs - GPT-2. If you already have your own model ready, you can skip this section.
GPT-2 is a "transformer-based language model with 1.5 billion parameters trained on a dataset of 8 million webpages". The main goal of the model is to predict the next word given the collection of previous words. It achieved state-of-the-art results (since surpassed by GPT-3) on a variety of datasets. The most amazing thing about it is that it wasn't trained on any domain-specific NLP task, yet it still outperformed hand-crafted, task-specific models. This way of evaluating a model is called "zero-shot".
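To get a feel for that next-word objective, here is a minimal sketch of single-step prediction with the publicly available gpt2 checkpoint from the Hugging Face transformers library. This snippet is for illustration only and is not part of the deployment:

import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Encode a prompt and ask the model for the most likely next token
input_ids = torch.tensor([tokenizer.encode("Artificial intelligence is")])
with torch.no_grad():
    logits = model(input_ids)[0]  # shape: (batch, sequence_length, vocab_size)

next_token_id = int(logits[0, -1].argmax())  # greedy pick of the next token
print(tokenizer.decode([next_token_id]))

Running this loop repeatedly, feeding each predicted token back in, is exactly how the model generates whole passages of text.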
We have already prepared the GPT-2 model in our GitHub repository. You don't have to do anything with the repository for now.
Let's find out how to skip all the repetitive steps in the deployment process. Enter Syndicai - the tool that takes a GitHub repository and returns a REST API. Under the hood, Syndicai sets up the entire infrastructure with one click. Moreover, it takes care of scaling the resources. The resulting API offers great flexibility because you can connect it to any device.
We also have a tutorial on how to deploy a PyTorch model.
Apart from putting your model in the GitHub repository, you have to upload two additional files there: requirements.txt and syndicai.py.
requirements.txt - a file with all the libraries and frameworks needed to recreate the model's environment:
torch
transformers==2.3.*
wget==3.*
syndicai.py - the main file with the PythonPredictor Python class responsible for model prediction:
import wget
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel, GPT2Config

import generator


class PythonPredictor:
    def __init__(self, config):
        # Build the GPT-2 medium architecture and download the fine-tuned weights
        medium_config = GPT2Config(n_embd=1024, n_layer=24, n_head=16)
        model = GPT2LMHeadModel(medium_config)
        wget.download(
            "https://convaisharables.blob.core.windows.net/lsp/multiref/medium_ft.pkl",
            "/tmp/medium_ft.pkl",
        )

        # The checkpoint stores the language-model head under a different key,
        # so remap it before loading the state dict
        weights = torch.load("/tmp/medium_ft.pkl")
        weights["lm_head.weight"] = weights["lm_head.decoder.weight"]
        weights.pop("lm_head.decoder.weight", None)
        model.load_state_dict(weights)

        # Run on GPU when available
        device = "cuda" if torch.cuda.is_available() else "cpu"
        print(f"using device: {device}")
        model.to(device)
        model.eval()

        self.device = device
        self.model = model
        self.tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

    def predict(self, payload):
        # Encode the input text, append the end-of-text token, and generate a reply
        conditioned_tokens = self.tokenizer.encode(payload["text"]) + [generator.END_OF_TEXT]
        prediction = generator.generate(self.model, conditioned_tokens, self.device)
        return self.tokenizer.decode(prediction)
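One note on the code above: it imports a generator module that lives in the same repository and exposes an END_OF_TEXT token id and a generate function. Below is a hypothetical minimal sketch of what such a module could look like, using plain greedy decoding - the actual file in the repository may implement sampling differently:

# generator.py - hypothetical minimal sketch; the repository's version may differ
import torch

END_OF_TEXT = 50256  # GPT-2's <|endoftext|> token id


def generate(model, conditioned_tokens, device, max_new_tokens=50):
    """Greedily decode tokens until END_OF_TEXT or the length limit is hit."""
    tokens = list(conditioned_tokens)
    generated = []
    for _ in range(max_new_tokens):
        input_ids = torch.tensor([tokens], device=device)
        with torch.no_grad():
            logits = model(input_ids)[0]  # (1, seq_len, vocab_size)
        next_token = int(logits[0, -1].argmax())
        if next_token == END_OF_TEXT:
            break
        tokens.append(next_token)
        generated.append(next_token)
    return generated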
These two files are necessary for the Syndicai tool to be able to recreate the environment and know which function to use for prediction.
When we have the GitHub repository with requirements.txt and syndicai.py ready, we can connect it to the Syndicai platform. In order to do that, go to https://app.syndicai.co/, log in, click New Model on the Overview page, and follow the steps in the form. As soon as you finish, the infrastructure will start building. You will need to wait a couple of minutes for the model to become Active.
For more information about the model preparation or deployment process, go to the Syndicai Docs.
Congratulations!
You have now deployed a model to production and have your REST API ready. To test it quickly, you can paste a sample input script in the Run a model section on the Syndicai platform.
Remember that your model needs to be Active in order to work!
{
  "text": "What is Artificial Intelligence?"
}
If everything works fine, you can now connect the API to any device or service. As an example, you can go to the Showcase page to explore sample implementations.
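For instance, here is how you could call the endpoint from Python with the requests library. The URL below is a placeholder - copy the real endpoint, and any required headers, from your model's page on the Syndicai platform:

import requests

# Placeholder endpoint - replace with the URL shown for your deployed model
API_URL = "https://<your-model-endpoint>"

payload = {"text": "What is Artificial Intelligence?"}
response = requests.post(API_URL, json=payload)
response.raise_for_status()
print(response.text)  # the continuation returned by PythonPredictor.predict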
Today, you have become a slightly better person. You now know how to deploy really cool AI models to production, no matter if you are a Data Scientist, Machine Learning Engineer, Backend Developer, DevOps engineer, or just an enthusiast.
If you found this useful, or you want to see more tutorials like this, please drop us a line by mail or catch us on Slack.