Decoding Text-to-Text Transformers

by Engati 2 months ago in business

Progress in NLP took a great leap with the introduction of the Transformer architecture. Building on this architecture, language models such as BERT have achieved state-of-the-art results in various NLP tasks.

The big idea

Here’s the central idea behind these language models: train a model on a massive corpus in an unsupervised manner so that it learns language structure, grammar, and semantics.
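To make "unsupervised" concrete, here is a minimal sketch of how unlabeled text can supply its own training signal: some words are hidden and the model is asked to recover them from context. The `[MASK]` symbol, the whitespace tokenization, and the function name are assumptions for illustration. The default rate of 0.15 mirrors the masking rate reported for BERT, and real models use subword tokenizers and more elaborate masking rules.

```python
import random

MASK = "[MASK]"  # placeholder symbol, chosen for illustration


def make_masked_example(sentence, mask_prob=0.15, seed=0):
    """Turn one unlabeled sentence into an (input, targets) training pair."""
    rng = random.Random(seed)
    tokens = sentence.split()  # whitespace split stands in for a real tokenizer
    masked_tokens = []
    targets = {}  # position -> original word the model should predict
    for position, token in enumerate(tokens):
        if rng.random() < mask_prob:
            masked_tokens.append(MASK)
            targets[position] = token
        else:
            masked_tokens.append(token)
    return " ".join(masked_tokens), targets


masked_text, targets = make_masked_example(
    "language models learn grammar and semantics from raw text", mask_prob=0.3
)
print(masked_text)
print(targets)
```

No human labels are involved: the hidden words themselves serve as the targets, which is why very large crawled corpora can be used directly.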

These massive pre-trained models can then be used as encoders to generate contextual and semantic representations of text. With transfer learning, several downstream NLP tasks, such as text classification, sentiment analysis, question answering, and summarising, can then be performed with comparatively little task-specific training.

Text-to-Text Transformers

Google recently made a significant advance in this area by releasing a new model, the Text-to-Text Transfer Transformer, or T5.

How it works

T5 reframes every NLP task into a unified text-to-text format in which both the input and the output of the model are text. Every task uses text as input to the model, which is trained to generate some target text for that task.

This allows the same model, loss function, and hyperparameters to be used across a diverse set of tasks, including translation, linguistic acceptability, sentence similarity, and document summarization.
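The text-to-text framing can be sketched in a few lines. The prefix strings and example pairs below follow the ones shown in the T5 paper's overview figure. The task keys and the helper name `to_text_input` are hypothetical, introduced only for illustration.

```python
# Prefix strings follow the examples in the T5 paper's overview figure.
# The dictionary keys and function name are hypothetical.
PREFIXES = {
    "translate_en_de": "translate English to German: ",
    "cola": "cola sentence: ",
    "summarize": "summarize: ",
}


def to_text_input(task, **fields):
    """Cast one task example into the single input string the model reads."""
    if task == "stsb":
        # Sentence similarity takes two sentences in one input string.
        return "stsb sentence1: {sentence1} sentence2: {sentence2}".format(**fields)
    if task in PREFIXES:
        return PREFIXES[task] + fields["text"]
    raise ValueError(f"unknown task: {task}")


# Every target is plain text too, even a class label or a similarity score.
EXAMPLES = [
    (to_text_input("translate_en_de", text="That is good."), "Das ist gut."),
    (to_text_input("cola", text="The course is jumping well."), "not acceptable"),
    (
        to_text_input(
            "stsb",
            sentence1="The rhino grazed on the grass.",
            sentence2="A rhino is grazing in a field.",
        ),
        "3.8",
    ),
]

for source, target in EXAMPLES:
    print(f"{source!r} -> {target!r}")
```

Because a classification label ("not acceptable") and a regression score ("3.8") are both emitted as strings, one sequence-to-sequence model and one loss function can cover all of these tasks.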

The model was trained on the Colossal Clean Crawled Corpus (C4), a cleaned version of Common Crawl that is two orders of magnitude larger than Wikipedia.

The largest model has 11 billion parameters and, at the time of its release, achieved state-of-the-art results on the GLUE, SuperGLUE, SQuAD, and CNN/Daily Mail benchmarks.

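The pre-trained model can be used as is, without any further fine-tuning, for tasks that were part of its training mixture, such as sentiment analysis, question answering, translation, and summarization. Other NLP/NLU tasks, such as NER and POS tagging, typically need fine-tuning in the same text-to-text format.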
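As a rough illustration of using a released checkpoint without fine-tuning, the sketch below assumes the Hugging Face `transformers` library (with PyTorch and `sentencepiece` installed) and the small public `t5-small` checkpoint. Neither is named in this article, and the helper names are hypothetical.

```python
def with_prefix(task_prefix, text):
    """Build the text input T5 expects: a task prefix followed by the text."""
    return f"{task_prefix}: {text}"


def run_t5(prompt, model_name="t5-small", max_new_tokens=40):
    """Generate output text for one prompt with a pre-trained checkpoint.

    Assumes `transformers`, `torch`, and `sentencepiece` are installed and
    that the checkpoint can be downloaded on first use.
    """
    # Imported lazily so with_prefix() works without the library installed.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained(model_name)
    model = T5ForConditionalGeneration.from_pretrained(model_name)
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (downloads the checkpoint, so it is left commented out):
# print(run_t5(with_prefix("translate English to German", "That is good.")))
```

Switching tasks only means changing the prefix, for example `with_prefix("summarize", article_text)`. The model and the generation call stay the same.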

Originally published at https://www.engati.com.
