Reasoning with o1 - short course from DeepLearning.ai
09 Jan 2025

Here are some notes taken from the short course Reasoning with o1, offered by DeepLearning.ai.
Introduction to o1
Breakthrough #1
A breakthrough in reasoning: the ability to think for a longer time (at inference time) and get better results.
Breakthrough #2
Teaching it to verify outputs via consensus voting. (See Large Language Monkeys: Scaling Inference Compute with Repeated Sampling)
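To make the idea concrete, here is a minimal sketch of repeated sampling with majority (consensus) voting, assuming an OpenAI-style chat client. The model name, temperature, and exact-string vote matching are simplifying assumptions, not the course's or the paper's exact setup.

```python
from collections import Counter
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def sample_answer(question: str, model: str = "gpt-4o-mini") -> str:
    """Draw one independent sample from the model (model name is a placeholder)."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
        temperature=1.0,  # keep sampling stochastic so the votes can differ
    )
    return response.choices[0].message.content.strip()

def consensus_answer(question: str, n_samples: int = 8) -> str:
    """Repeated sampling + majority vote over the sampled answers."""
    answers = [sample_answer(question) for _ in range(n_samples)]
    answer, _count = Counter(answers).most_common(1)[0]
    return answer

print(consensus_answer("What is 17 * 24? Answer with the number only."))
```

In practice you would extract and normalize a final answer before voting, since verbatim string matching is brittle.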
Improvements observed in:
- ML Benchmarks
- Exams
- PhD-Level Science Questions (GPQA Diamond)
- MMLU Categories
How does OpenAI o1 work?
- uses large-scale RL to generate a chain of thought (CoT; Wei et al. 2022) before answering
- the CoT is longer and higher-quality than what is attained via prompting
- the CoT contains behaviors like:
- Error correction
- Trying multiple strategies
- Breaking down problems into smaller steps
- Example CoTs are shown in OpenAI's research blog post.
- o1 made progress in abstract reasoning compared to GPT-4o, e.g., correctly classifying 16 words into 4 categories.
The Generator-Verifier Gap
- For some problems, verifying a good solution is easier than generating one
- Many puzzles (Sudoku, for example)
- Math
- Programming
- Examples where verification isn’t much easier
- Information retrieval (What is the capital of Bhutan?)
- Image recognition
- When a generator-verifier gap exists and we have a good verifier, we can spend more compute on inference to achieve better performance
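A hedged sketch of that idea: keep sampling candidates and return the first one the (cheap) verifier accepts. Both `generate_candidate` and `verify` are hypothetical placeholders, standing in for a model call and a domain-specific check such as running unit tests or validating a Sudoku grid.

```python
import random
from typing import Callable, Optional

def best_verified(
    generate_candidate: Callable[[], str],
    verify: Callable[[str], bool],
    max_samples: int = 16,
) -> Optional[str]:
    """Trade extra inference compute (more samples) for accuracy by letting a
    cheap verifier pick out a correct solution."""
    for _ in range(max_samples):
        candidate = generate_candidate()
        if verify(candidate):
            return candidate
    return None  # no candidate passed verification within the budget

# Toy usage: "generate" random guesses for x with x**2 == 49, "verify" by substitution.
solution = best_verified(
    generate_candidate=lambda: str(random.randint(-10, 10)),
    verify=lambda c: int(c) ** 2 == 49,
    max_samples=100,
)
print(solution)  # "7" or "-7" once a sampled guess passes the check
```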
Where might this be used?
- [Data Analysis]: Interpreting complex datasets (e.g., genomic sequencing results in biology) and performing advanced statistical reasoning.
- [Mathematical Problem-Solving]: Deriving solutions or proofs for challenging questions in mathematics or theoretical physics.
- [Experimental Design]: Proposing experimental setups in chemistry to test novel reactions, or interpreting the outcomes of complicated physics experiments.
- [Scientific Coding]: Writing and debugging specialized code for computational fluid dynamics models or astrophysics simulations.
- [Biological & Chemical Reasoning]: Solving advanced biology or chemistry questions that require deep domain knowledge.
- [Algorithm Development]: Aiding in creating or optimizing algorithms for data analysis workflows in computational neuroscience or bioinformatics.
- [Literature Synthesis]: Reasoning across multiple research papers to form coherent conclusions in interdisciplinary fields such as systems biology.
Recap
- The o1 family of models scales compute at inference time by producing tokens to reason through the problem
- o1 gives you more intelligence as a trade-off against higher latency and cost
- It can also perform well at tasks that require a test-and-learn approach, where it can iteratively verify its results
- Some great emerging use cases are planning, coding, and domain-specific reasoning like law and STEM subjects
Prompting o1
How should you prompt the o1 models to achieve good results?
Simple & direct
Write prompts that are straightforward and concise. Direct instructions yield the best results with the o1 models.
No explicit CoT required
You can skip step-by-step (‘Chain of Thought’) reasoning prompts. The o1 models can infer and execute these itself without detailed breakdowns.
Structure
Break complex prompts into sections using delimiters like markdown, XML tags, or quotes.
This structured format enhances model accuracy and simplifies your own troubleshooting.
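For example, a delimited prompt might look like the sketch below, assuming the OpenAI Python client; the model name, tag names, and task content are placeholders, not the course's exact example.

```python
from openai import OpenAI

client = OpenAI()

# Delimit each part of the prompt so it is obvious (to the model, and to you
# when troubleshooting) where instructions, context, and output format begin and end.
prompt = """<instructions>
Summarize the key risks in the contract excerpt and cite the relevant clause numbers.
</instructions>

<contract_excerpt>
... paste the contract text here ...
</contract_excerpt>

<output_format>
A bulleted list, one risk per bullet, ending with the clause number in parentheses.
</output_format>"""

response = client.chat.completions.create(
    model="o1-mini",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```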
Show rather than tell
Rather than relying on lengthy explanation, give a contextual example, as in the sketch below, so the model understands the broad domain of your task.
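For instance, instead of describing at length what a "concise clinical note" should look like, one short in-prompt example (hypothetical content below) gives the model the domain context directly.

```python
prompt = """Convert the dictation below into a concise clinical note.

Example of the desired style:
"Pt: 54F. CC: intermittent chest pain, 2 wks. Hx: HTN. Plan: ECG, troponin, follow-up 1 wk."

Dictation:
... paste the dictation here ...
"""
```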