Generative AI with LLMs Week 3 (2)
29 Jun 2023
Here are course notes I am taking from the DeepLearning.ai course on Coursera: Generative AI with Large Language Models.
Topic: LLM-powered applications
Model optimizations for deployment
Distillation
W3 Slide p.78-79
- The combined distillation and student losses are used to update the weights of the student model via backpropagation
- In practice, distillation is not as effective for generative decoder models.
- Distillation is typically more effective for encoder-only models, such as BERT, which have a lot of representation redundancy
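A minimal sketch of how the combined loss could be computed, assuming a PyTorch setup where teacher and student logits are already available; the temperature and weighting values are illustrative, not the course's settings:

```python
import torch
import torch.nn.functional as F

def combined_distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Distillation loss: the student mimics the teacher's softened (temperature-scaled) distribution
    distillation_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Student loss: standard cross-entropy against the ground-truth labels
    student_loss = F.cross_entropy(student_logits, labels)
    # The weighted combination is backpropagated to update the student's weights only
    return alpha * distillation_loss + (1 - alpha) * student_loss
```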
Post-Training Quantization (PTQ)
W3 Slide p.81
- Quantization often results in a small percentage reduction in model evaluation metrics, which is often worth it for the cost savings and performance gains.
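As a rough illustration of the idea (not the course's implementation), a minimal sketch of symmetric post-training quantization of a weight tensor to INT8:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    # Symmetric quantization: map the largest absolute weight to 127
    scale = np.abs(weights).max() / 127.0
    quantized = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return quantized, scale

def dequantize(quantized: np.ndarray, scale: float) -> np.ndarray:
    # Recover an approximation of the original FP32 weights
    return quantized.astype(np.float32) * scale

weights = np.random.randn(4, 4).astype(np.float32)
quantized, scale = quantize_int8(weights)
print("max absolute error:", np.abs(weights - dequantize(quantized, scale)).max())
```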
Pruning
W3 Slide p.82
Generative AI Project Lifecycle Cheat Sheet
W3 Slide p.83
Using the LLM in applications
Difficulties models run into
- Information is out-of-date
- Wrong answers on math questions (arithmetic: the LLM is trained to predict the next token, not to perform calculations)
- Hallucination
Retrieval Augmented Generation (RAG)
(back later)
- Allows the LLM to access information in external data sets
W3 Slide p.92
- Lewis et al. 2020 “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks”
- Helps avoid model hallucination
RAG integrates with many types of data sources
External information sources:
- Documents
- Wikis
- Expert Systems
- Web pages
- Databases
- Vector Store
Vector Store
- vector representations of text
- enable fast and efficient relevance search based on similarity
Data preparation for vector store for RAG
Two considerations for using external data in RAG:
- Data must fit inside context window
- Prompt context limit: a few thousand tokens
- A single document may be too large to fit in the window
- Split long sources into short chunks (e.g. LangChain can do this; see the sketch after this list)
- Data must be in a format that allows its relevance to be assessed at inference time: embedding vectors
- Prompt text converted to embedding vectors
- Process each chunk with LLM to produce embedding vectors
- Vector store allows fast searching of datasets and efficient identification of semantically-related text
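A minimal sketch of the chunking and embedding steps, using LangChain's text splitter (class names and defaults may differ across LangChain versions); embed_chunk is a placeholder for whatever embedding model you use:

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Toy stand-in for a long source document
long_document_text = " ".join(["Some long source document about our product."] * 200)

# Split the document into overlapping chunks small enough for the context window
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_text(long_document_text)

def embed_chunk(chunk: str) -> list[float]:
    # Placeholder embedding: replace with a real embedding model (API or local);
    # here just a trivial length-based stand-in so the sketch runs end to end
    return [float(len(chunk))]

# Each chunk becomes an embedding vector to be written to the vector store
vectors = [embed_chunk(chunk) for chunk in chunks]
```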
Vector database search
- Each text in the vector store is identified by a key
- This enables a citation to be included in the completion
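A minimal sketch of similarity search over a keyed vector store, using plain NumPy and cosine similarity; a real vector database would use approximate nearest-neighbour indexing, and the document/chunk keys here are hypothetical:

```python
import numpy as np

# Keyed vector store: chunk key -> embedding vector (toy 3-dimensional vectors)
store = {
    "doc-001#chunk-0": np.array([0.1, 0.9, 0.2]),
    "doc-001#chunk-1": np.array([0.8, 0.1, 0.3]),
    "doc-002#chunk-0": np.array([0.2, 0.7, 0.6]),
}

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def search(query_vector, top_k=2):
    # Rank every stored chunk by similarity to the query embedding
    scored = [(key, cosine_similarity(query_vector, vec)) for key, vec in store.items()]
    scored.sort(key=lambda kv: kv[1], reverse=True)
    return scored[:top_k]

# The returned keys can be quoted as citations alongside the completion
print(search(np.array([0.15, 0.8, 0.3])))
```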
Knowledge check
Q: How does Retrieval Augmented Generation (RAG) enhance generation-based models?
- By making external knowledge available to the model (correct)
- By applying reinforcement learning techniques to augment completions.
- By optimizing model architecture to generate factual completions.
- By increasing the training data size.
Correct: The retriever component retrieves relevant information from an external corpus or knowledge base, which is then used by the model to generate more informed and contextually relevant responses. This incorporation of external knowledge enhances the quality and relevance of the generated content.
Interacting with external applications
W3 Slides P.104,105
Helping LLMs reason and plan with Chain-of-Thought Prompting
- LLMs often cannot handle multi-step reasoning problems on their own
- Strategy: prompt the model to break the problem down into intermediate steps
W3 Slides P.108
- Wei et al. 2022, “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models”
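A minimal one-shot chain-of-thought prompt, in the spirit of the examples in Wei et al. 2022 (the exact wording is illustrative, not copied from the course slides):

```python
# One worked example shows the model how to reason step by step;
# the model is expected to produce a similar reasoning trace for the new question.
cot_prompt = """
Q: A cafeteria had 23 apples. They used 20 to make lunch and bought 6 more. How many apples do they have?
A: The cafeteria started with 23 apples. They used 20, leaving 23 - 20 = 3. They bought 6 more, so 3 + 6 = 9. The answer is 9.

Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
A:"""
```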
Program-aided language models (PAL)
W3 Slides P.113
- Gao et al. 2022, “PAL: Program-aided Language Models”
- pairs an LLM with an external code interpreter
- uses chain-of-thought prompting to generate executable Python scripts
- the Python script is then passed to an interpreter for execution
- Example:
W3 Slides P.115
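A minimal sketch of the PAL idea: the reasoning steps appear as Python comments and the computation as executable statements, which an interpreter then runs (the generated_code string is a hand-written stand-in for what the LLM would produce from a chain-of-thought style prompt):

```python
# Stand-in for the Python script the LLM would generate
generated_code = """
# Roger started with 5 tennis balls.
tennis_balls = 5
# He bought 2 cans of 3 tennis balls each.
bought_balls = 2 * 3
# The answer is the total number of tennis balls.
answer = tennis_balls + bought_balls
"""

# The interpreter, not the LLM, performs the arithmetic
namespace = {}
exec(generated_code, namespace)
print(namespace["answer"])  # 11
```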
ReAct: Combining reasoning and action
ReAct: Synergizing Reasoning and Action in LLMs
- Combines chain-of-thought reasoning with action planning
- Proposed by researchers from Princeton and Google in 2022
- Yao et al. 2022, “ReAct: Synergizing Reasoning and Acting in Language Models”
- Two benchmarks:
- HotpotQA: multi-step question answering
- Fever: Fact verification
Steps
- Question
- Thought
- Action
- Observation (conclusion drawn from the action); go back to the Thought step
Iterate until the full question is solved.
ReAct instructions define the action space
Solve a question answering task with interleaving Thought, Action, Observation steps.
Thought can reason about the current situation, and Action can be three types:
1. Search[entity]: searches the exact entity on Wikipedia and returns the first paragraph if it exists. If not, it will return some similar entities to search.
2. Lookup[keyword]: returns the next sentence containing keyword in the current passage.
3. Finish[answer]: returns the answer and finishes the task.
Here are some examples.
W3 Slides P.131
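A minimal sketch of the ReAct loop around an LLM; call_llm, wiki_search, and wiki_lookup are hypothetical placeholders for a model API and the Wikipedia tools described above:

```python
import re

def call_llm(prompt: str) -> str:
    # Placeholder: call your LLM and return its next Thought/Action text
    raise NotImplementedError

def wiki_search(entity: str) -> str:
    # Placeholder: return the first paragraph of the Wikipedia page for `entity`
    raise NotImplementedError

def wiki_lookup(keyword: str) -> str:
    # Placeholder: return the next sentence containing `keyword` in the current passage
    raise NotImplementedError

def react_loop(question: str, max_steps: int = 5) -> str:
    prompt = f"Question: {question}\n"
    for _ in range(max_steps):
        step = call_llm(prompt)  # the model emits a Thought and an Action
        prompt += step + "\n"
        match = re.search(r"(Search|Lookup|Finish)\[(.*?)\]", step)
        if match is None:
            continue
        action, argument = match.groups()
        if action == "Finish":
            return argument  # the model's final answer
        observation = wiki_search(argument) if action == "Search" else wiki_lookup(argument)
        prompt += f"Observation: {observation}\n"  # feed the observation back to the model
    return "No answer found within the step limit"
```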
LangChain
- provides you with modular pieces that contain the components necessary to work with LLMs:
- (1) prompt templates for many different use cases that you can use to format both input examples and model completions
- (2) memory that you can use to store interactions with an LLM
- (3) pre-built tools that enable you to carry out a wide variety of tasks, including:
- calls to external datasets, and
- various APIs
- (4) Agents: to interpret the input from the user and determine which tool or tools to use to complete the task
W3 Slides P.132
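A minimal sketch of the prompt-template component, assuming a LangChain installation (import paths and class behavior vary between LangChain versions):

```python
from langchain.prompts import PromptTemplate

# A reusable prompt template with named input variables
template = PromptTemplate(
    input_variables=["context", "question"],
    template=(
        "Answer the question using only the context below.\n"
        "Context: {context}\n"
        "Question: {question}\n"
        "Answer:"
    ),
)

# Fill in the template at inference time, then pass the result to an LLM of your choice
prompt_text = template.format(
    context="The store opens at 9am on weekdays.",
    question="What time does the store open on Monday?",
)
print(prompt_text)
```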
ReAct: Reasoning and action
This paper introduces ReAct, a novel approach that integrates verbal reasoning and interactive decision making in large language models (LLMs). While LLMs have excelled in language understanding and decision making, the combination of reasoning and acting has been neglected. ReAct enables LLMs to generate reasoning traces and task-specific actions, leveraging the synergy between them. The approach demonstrates superior performance over baselines in various tasks, overcoming issues like hallucination and error propagation. ReAct outperforms imitation and reinforcement learning methods in interactive decision making, even with minimal context examples. It not only enhances performance but also improves interpretability, trustworthiness, and diagnosability by allowing humans to distinguish between internal knowledge and external information.
In summary, ReAct bridges the gap between reasoning and acting in LLMs, yielding remarkable results across language reasoning and decision making tasks. By interleaving reasoning traces and actions, ReAct overcomes limitations and outperforms baselines, not only enhancing model performance but also providing interpretability and trustworthiness, empowering users to understand the model’s decision-making process.
ReAct.png
Image: The figure provides a comprehensive visual comparison of different prompting methods in two distinct domains. The first part of the figure (1a) presents a comparison of four prompting methods: Standard, Chain-of-thought (CoT, Reason Only), Act-only, and ReAct (Reason+Act) for solving a HotpotQA question. Each method’s approach is demonstrated through task-solving trajectories generated by the model (Act, Thought) and the environment (Obs). The second part of the figure (1b) focuses on a comparison between Act-only and ReAct prompting methods to solve an AlfWorld game. In both domains, in-context examples are omitted from the prompt, highlighting the generated trajectories as a result of the model’s actions and thoughts and the observations made in the environment. This visual representation enables a clear understanding of the differences and advantages offered by the ReAct paradigm compared to other prompting methods in diverse task-solving scenarios.
LLM application architectures
(back later)
W3 Slide P.142