A quick guide on how to deploy GPT-2 at scale.

Michał Zmysłowski

Let’s explore a simple and fast way to deploy a GPT-2 model at scale in minutes, without any Docker or Kubernetes setup. We also have a tutorial on how to deploy a PyTorch model.

Introduction

Do you know what’s better than AI models in PyCharm or Jupyter Notebook? AI models working in production! It feels great when anyone can use your model, but AI model deployment comes with many potential problems.

We will need to create a web service with Flask, recreate the environment in Docker, set up the infrastructure, and deploy the model to Google Cloud or Amazon AWS, right? Fortunately not! That is how Machine Learning Operations used to look. In this tutorial, I will show you how to deploy a GPT-2 model in a few simple clicks with one tool called Syndicai.

Step 1: Develop a GPT-2 model

Okay, but before we deploy anything, we need a cool AI model. We will use one of the hottest NLP models straight out of the OpenAI labs – GPT-2. If you already have your own model ready, you can skip this section.

GPT-2 is a “transformer-based language model with 1.5 billion parameters trained on a dataset of 8 million webpages”. The main goal of the model is to predict the next word given all of the previous words. It achieved state-of-the-art results (since surpassed by GPT-3) on a variety of datasets. The most amazing thing about it is that it wasn’t trained on any domain-specific NLP task, yet it was still superior to models hand-crafted for those tasks. This way of evaluating a model, without any task-specific training, is called “zero-shot”.
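To get a feel for that next-word objective, here is a minimal sketch using the Hugging Face transformers library and the public pretrained gpt2 checkpoint (not the fine-tuned weights we deploy below): the model scores every token in its vocabulary as a possible continuation of the prompt, and we pick the most likely one.

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load the small pretrained GPT-2 checkpoint for illustration.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Score all possible next tokens for a prompt and keep the best one.
input_ids = torch.tensor([tokenizer.encode("The quick brown fox")])
with torch.no_grad():
    logits = model(input_ids)[0]  # shape: (1, sequence_length, vocab_size)
next_token_id = int(torch.argmax(logits[0, -1]))
print(tokenizer.decode([next_token_id]))  # the most likely next word, e.g. " jumps"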

GPT-2 Model Completion Task (source: https://openai.com/blog/better-language-models/#sample2)

We have already prepared the GPT-2 model in our GitHub repository. You don’t have to do anything with the repository for now.

Step 2: Deploy GPT-2 model

Let’s find out how to skip all the repetitive steps in the deployment process. Enter Syndicai – a tool that takes a GitHub repository and returns a REST API. Under the hood, Syndicai sets up the entire infrastructure with one click. Moreover, it takes care of scaling the resources. The resulting API is very flexible because you can connect it to any device.

AI Model Deployment: Traditional Approach vs Syndicai


Prepare a repository with a GPT-2 model

Apart from putting your model in the GitHub repository, you have to upload two additional files there: requirements.txt and syndicai.py.

requirements.txt – a file with all the libraries and frameworks needed to recreate the model’s environment

torch
transformers==2.3.*
wget==3.*

syndicai.py – the main file with the PythonPredictor Python class responsible for model prediction.

import wget
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel, GPT2Config
import generator  # helper module from this repository


class PythonPredictor:
    def __init__(self, config):
        # Build a GPT-2 medium architecture and download the fine-tuned weights.
        medium_config = GPT2Config(n_embd=1024, n_layer=24, n_head=16)
        model = GPT2LMHeadModel(medium_config)
        wget.download(
            "https://convaisharables.blob.core.windows.net/lsp/multiref/medium_ft.pkl",
            "/tmp/medium_ft.pkl",
        )

        # Load the checkpoint on the CPU first so this also works on machines
        # without a GPU, then rename the head weight to match the layout that
        # transformers expects.
        weights = torch.load("/tmp/medium_ft.pkl", map_location="cpu")
        weights["lm_head.weight"] = weights["lm_head.decoder.weight"]
        weights.pop("lm_head.decoder.weight", None)

        model.load_state_dict(weights)

        # Use a GPU when one is available.
        device = "cuda" if torch.cuda.is_available() else "cpu"
        print(f"using device: {device}")
        model.to(device)
        model.eval()

        self.device = device
        self.model = model
        self.tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

    def predict(self, payload):
        # Encode the input text, append the end-of-text token, and let the
        # generator module produce and decode the continuation.
        conditioned_tokens = self.tokenizer.encode(payload["text"]) + [generator.END_OF_TEXT]
        prediction = generator.generate(self.model, conditioned_tokens, self.device)
        return self.tokenizer.decode(prediction)
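Note that syndicai.py also imports a small generator helper module that lives in the same repository. Here is a minimal sketch of what such a module could look like, assuming simple greedy decoding (the actual module in the repository may sample differently):

import torch

END_OF_TEXT = 50256  # id of GPT-2's <|endoftext|> token


def generate(model, conditioned_tokens, device, max_new_tokens=60):
    # Greedily extend the conditioned tokens one at a time and return
    # only the newly generated ids.
    generated = list(conditioned_tokens)
    with torch.no_grad():
        for _ in range(max_new_tokens):
            input_ids = torch.tensor([generated], device=device)
            logits = model(input_ids)[0]  # (1, seq_len, vocab_size)
            next_token = int(torch.argmax(logits[0, -1]))
            generated.append(next_token)
            if next_token == END_OF_TEXT:
                break
    return generated[len(conditioned_tokens):]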

Together with the generator helper, these two files give Syndicai everything it needs to recreate the environment and to know which class to call for prediction.

Connect the repository to Syndicai

When the GitHub repository with requirements.txt and syndicai.py is ready, you can connect it to the Syndicai platform. To do that, go to https://app.syndicai.co/, log in, click New Model on the Overview page, and follow the steps in the form. As soon as you finish, the infrastructure will start building. You will need to wait a couple of minutes for the model to become Active.

Deploy a model via the Syndicai Platform

For more information about the model preparation or deployment process, go to the Syndicai Docs.

Step 3: Integrate the model API

Congratulations!

You have now deployed a model to production and have your REST API ready. To test it out quickly, you can paste a sample input script in the Run a model section on the Syndicai platform.

Remember that your model needs to be Active in order to work!

{
    "text": "What is Artificial Intelligence?"
}
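Once the model is Active, you can also call the REST API programmatically. Below is a minimal sketch in Python; the endpoint URL is a placeholder, so copy the real one from your model’s page on the Syndicai platform.

import requests

# Placeholder endpoint – replace it with the URL shown for your model
# on the Syndicai platform.
MODEL_URL = "https://<your-model-endpoint>"

payload = {"text": "What is Artificial Intelligence?"}
response = requests.post(MODEL_URL, json=payload)
response.raise_for_status()
print(response.json())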

If everything works fine, you can now connect the API to any device or service. As an example, you can go to the Showcase page to explore sample implementations.

Summary

Today, you have become a slightly better person. You now know how to deploy really cool AI models to production, no matter if you are a Data Scientist, Machine Learning Engineer, Backend Developer, DevOps engineer, or just an enthusiast.

If you found this useful, or you want more tutorials like this one, please drop us a line by mail or catch us on Slack.