Content is user-generated and unverified.

Google's BERT Paper Coined "OpenAI GPT" — Not OpenAI

The phrase "OpenAI GPT" and the expansion "Generative Pre-trained Transformer" were first documented on October 11, 2018, in Google's BERT paper, not in any OpenAI publication. OpenAI's original June 2018 release never used the "GPT" abbreviation. The model was simply called "Finetuned Transformer LM", and the technique was described as "generative pre-training." The now-famous "GPT" name is effectively a retronym that emerged four months after the original model's release.

OpenAI's original 2018 release used no "GPT" terminology

When Alec Radford and colleagues published their breakthrough paper on June 11, 2018, they titled it "Improving Language Understanding by Generative Pre-Training". The title uses the gerund "Pre-Training" as a noun, not "pre-trained" as an adjective. The paper never uses the acronym "GPT" anywhere in its 12 pages. In its benchmark tables, the model is consistently labeled "Finetuned Transformer LM (ours)" rather than given any GPT designation.

The GitHub repository OpenAI created was named finetune-transformer-lm, not "gpt" or "gpt-1." Radford's announcement tweet on June 11, 2018, stated that "a single transformer language model can be finetuned to a wide variety of NLP tasks" and likewise never mentioned "GPT." The accompanying OpenAI blog post was titled "Improving Language Understanding with Unsupervised Learning" and also avoided any GPT branding.

BERT paper introduced both "OpenAI GPT" and its expansion

The earliest documented usage of "OpenAI GPT" and "Generative Pre-trained Transformer" appears in arXiv:1810.04805, submitted October 11, 2018. Google researchers Devlin, Chang, Lee, and Toutanova wrote:

"The fine-tuning approach, such as the Generative Pre-trained Transformer (OpenAI GPT) (Radford et al., 2018), introduces minimal task-specific parameters, and is trained on the downstream tasks by simply fine-tuning all pre-trained parameters."

The BERT paper uses "OpenAI GPT" extensively throughout. It appears in Figure 3's caption ("OpenAI GPT uses a left-to-right Transformer"), in the benchmark comparison tables, and in direct architectural comparisons such as "BERT_BASE was chosen to have the same model size as OpenAI GPT for comparison purposes." The authors needed a concise label for the OpenAI model and appear to have created one by combining:

  • "Generative" and "Pre-trained" from the original paper's methodology
  • "Transformer" from the architecture (Vaswani et al., 2017)

This produced the acronym GPT = Generative Pre-trained Transformer, which the BERT authors then attached to "OpenAI" for clarity.

No earlier academic usage has been found

Searches of papers published between June and October 2018 that cite Radford et al. have found no use of "GPT" or "OpenAI GPT" before the BERT paper. Other papers from this period simply cited the work as "Radford et al., 2018" or referred to "generative pre-training" as a technique. The SWAG dataset paper, a benchmark on which the OpenAI model was also evaluated, did not use GPT terminology either.

The evidence strongly indicates Google researchers — not OpenAI — coined the now-ubiquitous "GPT" naming convention, likely because they needed a memorable shorthand for comparative analysis in their highly influential BERT paper.

OpenAI adopted the naming with GPT-2 in February 2019

The "GPT" brand became official when OpenAI released GPT-2 on February 14, 2019. The announcement blog post explicitly placed the new model in a lineage: "GPT-2 (a successor to GPT)." This retroactively established "GPT" (later "GPT-1") as the name of the June 2018 model. Until then, that model had carried only the descriptive label "Finetuned Transformer LM."

| Date | Source | Terminology used |
|---|---|---|
| June 11, 2018 | OpenAI paper | "Finetuned Transformer LM" |
| June 11, 2018 | GitHub repo | finetune-transformer-lm |
| June 11, 2018 | Radford tweet | "transformer language model" |
| October 11, 2018 | BERT paper | "Generative Pre-trained Transformer (OpenAI GPT)" |
| February 14, 2019 | GPT-2 release | "GPT-2", "the original GPT" |
| Post-2019 | Common usage | "GPT-1" (retronym) |

"Generative Pre-trained Transformer" is a constructed retronym

The full phrase "Generative Pre-trained Transformer" does not appear in OpenAI's GPT-1 paper, which uses "Generative Pre-Training." It also does not appear in OpenAI's GPT-2 paper ("Language Models are Unsupervised Multitask Learners"). The BERT authors created the expansion by turning the gerund "pre-training" into the past participle "pre-trained" and appending "Transformer." The result is a grammatical noun phrase whose initials spell GPT.

Today's standard claim that "GPT stands for Generative Pre-trained Transformer" is technically accurate as a description of what the letters now represent, but it is historically misleading. Neither the acronym nor its expansion came from OpenAI. Both were created together by external researchers who needed a convenient label for comparison purposes.

Conclusion

The earliest documented usage of "OpenAI GPT" is October 11, 2018, in arXiv:1810.04805 (the BERT paper by Devlin et al.). This paper also introduced the now-canonical expansion "Generative Pre-trained Transformer." OpenAI's own materials from June 2018 never used "GPT." The term was coined by Google researchers and later adopted by OpenAI when it launched GPT-2 in February 2019. The designation "GPT-1" for the original model is a retronym that did not exist until after its successor was named.
