Article - Do reasoning models use their scratchpad like we do? (2025)


  • Full title: Do reasoning models use their scratchpad like we do? Evidence from distilling paraphrases
  • Author(s): Fabien Roger
  • Year: 2025
  • Link: https://alignment.anthropic.com/2025/distill-paraphrases/
  • Relevant for:

Summary

  • The worry: models might gain extra performance through encoded reasoning (chain-of-thought steganography) hidden in their scratchpads, which would make the scratchpad illegible to overseers
  • Paraphrasing a reasoning trace should destroy any steganographic payload while preserving its legible content; distilling paraphrased reasoning traces into a non-reasoning model recovers pass@1 accuracy roughly matching that of the original reasoning model, suggesting the model is not using encoded reasoning to squeeze extra performance out of its scratchpad
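The comparison above is in terms of pass@1. As a reference point (not part of the post itself), the standard unbiased pass@k estimator from the HumanEval/Codex evaluation setup can be sketched as follows, where `n` samples are drawn per problem and `c` of them are correct:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n total samples of which c are
    correct, passes. pass@1 reduces to c / n."""
    if n - c < k:
        # Fewer incorrect samples than k: some draw must be correct.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

def mean_pass_at_1(results: list[tuple[int, int]]) -> float:
    """Average pass@1 over a benchmark, given (n, c) per problem."""
    return sum(pass_at_k(n, c, 1) for n, c in results) / len(results)
```

Under this metric, "roughly the same pass@1" means the per-problem success rates of the paraphrase-distilled model and the original reasoning model average out to about the same value over the benchmark.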

Flashcards




Related posts