Reasoning or Simply Next Token Prediction? Stress-Testing Large LLMs (arxiv.org) | 2 points by PaulHoule 3 months ago