Researchers

Apple and Duke Researchers Present a Reinforcement Learning Approach That Enables LLMs to Provide Intermediate Answers, Enhancing Speed and Accuracy

Long chain-of-thought (CoT) reasoning improves large language models’ performance on complex tasks but comes with drawbacks. The typical “think-then-answer” method slows...

Researchers from University of Waterloo and CMU Introduce Critique Fine-Tuning (CFT): A Novel AI Approach for Enhancing LLM Reasoning with Structured Critique Learning

Traditional approaches to training language models rely heavily on supervised fine-tuning, where models learn by imitating correct responses. While effective...