Coursera: Transformer Models and BERT Model
10 Jan 2025

Just a record of a quiz taken as a test of knowledge.
Question 1
What is the name of the language modeling technique that is used in Bidirectional Encoder Representations from Transformers (BERT)?
- Long Short-Term Memory (LSTM)
- Transformer
- Recurrent Neural Network (RNN)
- Gated Recurrent Unit (GRU)
Question 2
What is a transformer model?
- A deep learning model that uses self-attention to learn relationships between different parts of a sequence.
- A natural language processing model that uses convolutions to learn relationships between different parts of a sequence.
- A computer vision model that uses fully connected layers to learn relationships between different parts of an image.
- A machine learning model that uses recurrent neural networks to learn relationships between different parts of a sequence.
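The self-attention idea referenced in the options above can be sketched in a few lines. This is a minimal, unbatched, single-head scaled dot-product attention in NumPy; the weight matrices are random stand-ins for trained parameters, not a real model:

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence x of shape (seq_len, d_model)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])          # relevance of every position to every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over positions
    return weights @ v                               # each output mixes values from all positions

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (4, 8): one contextualized vector per input position
```

Because every output position attends over the whole sequence at once, the model learns relationships between any two parts of the sequence without recurrence or convolution.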
Question 3
What kind of transformer model is BERT?
- Encoder-only model
- Decoder-only model
- Encoder-decoder model
- Recurrent Neural Network (RNN) encoder-decoder model
Question 4
What does fine-tuning a BERT model mean?
- Training the model on a specific task by using a large amount of unlabeled data
- Training the model on a specific task and not updating the pre-trained weights
- Training the hyper-parameters of the models on a specific task
- Training the model and updating the pre-trained weights on a specific task by using labeled data
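The distinction the options draw is whether the pre-trained weights are frozen or updated. A toy NumPy sketch of the "update the pre-trained weights on labeled data" idea follows; the single matrix `w_pretrained` is a hypothetical stand-in for an encoder, and the data is random, so this illustrates the gradient flow only:

```python
import numpy as np

rng = np.random.default_rng(1)
w_pretrained = rng.normal(size=(4, 4)) * 0.5   # stand-in for pre-trained encoder weights
w_head = rng.normal(size=(4, 1)) * 0.5         # new task-specific classification head

x = rng.normal(size=(8, 4))                    # small labeled dataset (random stand-in)
y = rng.integers(0, 2, size=(8, 1)).astype(float)

lr, losses = 0.1, []
for _ in range(100):
    h = x @ w_pretrained                       # "encoder" features
    p = 1 / (1 + np.exp(-(h @ w_head)))        # sigmoid output
    losses.append(float(-(y * np.log(p) + (1 - y) * np.log(1 - p)).mean()))
    grad_logits = (p - y) / len(x)             # binary cross-entropy gradient
    grad_head = h.T @ grad_logits
    grad_pre = x.T @ (grad_logits @ w_head.T)  # gradient flows into the pre-trained weights too
    w_head -= lr * grad_head
    w_pretrained -= lr * grad_pre              # fine-tuning: pre-trained weights are updated

print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}")
```

If `w_pretrained` were frozen (no update step), this would be feature extraction rather than fine-tuning.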
Question 5
What is the attention mechanism?
- A way of determining the importance of each word in a sentence for the translation of another sentence
- A way of identifying the topic of a sentence
- A way of predicting the next word in a sentence
- A way of determining the similarity between two sentences
Question 6
What are the encoder and decoder components of a transformer model?
- The encoder ingests an input sequence and produces a sequence of hidden states. The decoder takes in the hidden states from the encoder and produces an output sequence.
- The encoder ingests an input sequence and produces a sequence of tokens. The decoder takes in the tokens from the encoder and produces an output sequence.
- The encoder ingests an input sequence and produces a single hidden state. The decoder takes in the hidden state from the encoder and produces an output sequence.
- The encoder ingests an input sequence and produces a sequence of images. The decoder takes in the images from the encoder and produces an output sequence.
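The "sequence of hidden states" phrasing can be made concrete: the decoder's cross-attention treats the encoder's per-position hidden states as keys and values. A rough NumPy sketch, with random arrays standing in for real encoder output and decoder queries:

```python
import numpy as np

rng = np.random.default_rng(3)
d_model, src_len, tgt_len = 8, 5, 3

# The encoder maps the input sequence to one hidden state per input position.
encoder_states = rng.normal(size=(src_len, d_model))

# The decoder's cross-attention reads those states: queries come from the
# decoder, keys and values from the encoder output.
decoder_queries = rng.normal(size=(tgt_len, d_model))
scores = decoder_queries @ encoder_states.T / np.sqrt(d_model)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
context = weights @ encoder_states            # one context vector per output position
print(context.shape)  # (3, 8)
```

Note the contrast with the "single hidden state" option: here the decoder can look back at all `src_len` encoder positions, not a single compressed vector.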
Question 7
BERT is a transformer model that was developed by Google in 2018. What is BERT used for?
- It is used to solve many natural language processing tasks, such as question answering, text classification, and natural language inference.
- It is used to diagnose and treat diseases.
- It is used to train other machine learning models, such as Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks.
- It is used to generate text, translate languages, and write different kinds of creative content.
Question 8
What are the two sublayers of each encoder in a Transformer model?
- Self-attention and feedforward
- Recurrent and feedforward
- Embedding and classification
- Convolution and pooling
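The two sublayers from the first option can be sketched together. This is a simplified single-head encoder layer in NumPy with residual connections and layer normalization; the weights are random stand-ins, not trained parameters:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    return (x - x.mean(axis=-1, keepdims=True)) / (x.std(axis=-1, keepdims=True) + eps)

def encoder_layer(x, w_q, w_k, w_v, w1, w2):
    """One encoder layer: a self-attention sublayer, then a feedforward sublayer,
    each wrapped in a residual connection and layer normalization."""
    # Sublayer 1: self-attention
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    a = np.exp(scores - scores.max(axis=-1, keepdims=True))
    a /= a.sum(axis=-1, keepdims=True)
    x = layer_norm(x + a @ v)
    # Sublayer 2: position-wise feedforward (ReLU MLP applied to each position)
    ff = np.maximum(x @ w1, 0) @ w2
    return layer_norm(x + ff)

rng = np.random.default_rng(4)
seq_len, d_model, d_ff = 4, 8, 16
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) * 0.3 for _ in range(3))
w1 = rng.normal(size=(d_model, d_ff)) * 0.3
w2 = rng.normal(size=(d_ff, d_model)) * 0.3
out = encoder_layer(x, w_q, w_k, w_v, w1, w2)
print(out.shape)  # (4, 8)
```

BERT stacks this layer repeatedly (12 times in BERT-Base, 24 in BERT-Large).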
Question 9
What are the three different embeddings that are generated from an input sentence in a Transformer model?
- Convolution, pooling, and recurrent embeddings
- Token, segment, and position embeddings
- Recurrent, feedforward, and attention embeddings
- Embedding, classification, and next sentence embeddings
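How the token, segment, and position embeddings combine is simple: BERT sums them element-wise per position. A toy NumPy sketch with random embedding tables and made-up token ids:

```python
import numpy as np

rng = np.random.default_rng(2)
vocab_size, d_model, max_len = 100, 16, 32

token_emb = rng.normal(size=(vocab_size, d_model))   # one vector per vocabulary id
segment_emb = rng.normal(size=(2, d_model))          # sentence A vs. sentence B
position_emb = rng.normal(size=(max_len, d_model))   # one vector per position

# Toy 7-token input: first 4 tokens belong to segment A, last 3 to segment B.
token_ids = np.array([0, 5, 9, 1, 7, 8, 1])
segment_ids = np.array([0, 0, 0, 0, 1, 1, 1])
positions = np.arange(len(token_ids))

# BERT's input representation is the element-wise sum of the three embeddings.
x = token_emb[token_ids] + segment_emb[segment_ids] + position_emb[positions]
print(x.shape)  # (7, 16)
```

The segment embedding is what lets BERT distinguish the two sentences in pair tasks such as natural language inference, while the position embedding injects word order that self-attention alone cannot see.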