Repurposing Grokking: From Research Paper to Interactive Website

My research paper examined grokking, a phenomenon in which neural networks memorize their training data quickly but don’t actually learn to generalize until much later in training. The paper drew on six sources to work through a question that none of them fully resolves: does this delayed jump from memorization to generalization mean machines are doing something that looks like real understanding, or is it just a statistical artifact? I worked through Power et al.’s original experiments, DeMoss et al.’s theory about internal compression, and Carvalho et al.’s skeptical pushback. I also used Chase and Simon’s 1973 chess study to connect what machines do to how human expertise works. For this project, I turned that research into an interactive single-page website built for people with no background in machine learning.

An interactive website works differently from an academic paper in some fundamental ways. A paper is linear: you expect the reader to start at the beginning and follow your argument all the way through. A website doesn’t work like that. People scroll at their own pace, stop where something catches their attention, and skip whatever doesn’t. That means every section has to make sense on its own while still fitting into a bigger picture. Websites also lean on visuals much more heavily than papers do. Charts and interactive elements aren’t just decoration; they’re how the content actually gets communicated. The language has to shift too. Technical terms that make sense in an academic paper become roadblocks on a website, where the goal is to draw people in rather than to prove you know the material.

The biggest structural change I made was breaking apart the paper’s thesis-driven argument and rebuilding it around questions. Instead of walking through a six-source synthesis organized around one central claim, the website is set up around the questions someone would actually have if they stumbled onto the topic: What is grokking? What does the training process look like? What’s going on inside the network? How does this relate to how humans learn? Is it real understanding or not? That’s how people use the web. They arrive with a question and click around until something answers it. Each section on the site opens with one of those questions and ends in a way that makes you want to keep scrolling to the next one, so the narrative holds together even though nobody is forced to read it in order.

The centerpiece of the site is an interactive version of the grokking training curve. In my paper I described this in words: training accuracy shoots up to near-perfect almost right away, but test accuracy stays low for hundreds of thousands of steps before it suddenly spikes. That description works when the reader is expected to build a mental picture from text. On a website, leaving it as text felt like a waste of the medium. I built a slider that lets users drag through the training process and watch the two lines move in real time. As they drag, text below the chart updates to explain what’s happening at each stage: memorization, the long gap where nothing seems to change, and then the moment generalization kicks in. It works because the surprise of grokking hits harder when you control the timeline yourself. You sit through that flat plateau, start thinking nothing is going to happen, and then it does. No paragraph can replicate that feeling.

The other major interactive piece was where I departed most from the original research. My paper used Chase and Simon’s chess experiment to draw a parallel between human expertise and machine generalization. Masters recall real chess positions far better than beginners, but on random boards they’re no better than anyone else. The takeaway is that expertise comes from learning structure, not from having better raw memory. I originally planned to put an actual chess board on the website, but once I started building it I realized many people in my audience probably wouldn’t recognize a structured chess position versus a random one. The distinction wouldn’t land. So I replaced it with an eight-by-eight grid memory game. It shows colored blocks for five seconds, and then you try to recall where they were. One round uses a clean line pattern; the other scatters the blocks randomly. People almost always do better on the structured round, and the text afterward connects this back to grokking: structure is easier to remember than randomness, for you and for neural networks. The game loses some of the historical connection to Chase and Simon, but it gets the point across more clearly, which matters more for this audience.

Some smaller changes shaped the site too. I replaced in-text citations with linked references in a footer so readers can follow up without getting pulled out of the flow. The tone is more conversational than the paper’s but still precise; I didn’t want to dumb anything down, just make it less formal. I used analogies where the paper used technical definitions, such as comparing grokking to a student who memorizes every answer key and then one day starts actually solving problems. The visual design also does work that the paper can’t. The site uses a warm cream palette with serif fonts and hover effects that feel tactile, which pushes back against the expectation that machine learning content has to look cold or intimidating.

Building this site actually reinforced something my paper argued in abstract terms: the boundary between memorizing information and understanding it is less clear-cut than it seems. I had to stop reproducing my own argument and instead internalize its structure well enough to rebuild it for a completely different format and audience. In a way, the repurposing project turned into a small-scale version of the thing it was trying to explain.

Works Cited

Carvalho, Breno W., et al. “Grokking Explained: A Statistical Phenomenon.” arXiv, 3 Feb. 2025, arxiv.org/abs/2502.01774.
Chase, William G., and Herbert A. Simon. “Perception in Chess.” Cognitive Psychology, vol. 4, no. 1, 1973, pp. 55–81. Accessed 13 Mar. 2026.
DeMoss, Branton, et al. “The Complexity Dynamics of Grokking.” arXiv, 13 Dec. 2024, arxiv.org/abs/2412.09810.
Hudson, Justin, and Chase Hudson. “Reconstructive Inference Without Memory: Why Some Details Persist in Stateless Human-AI Interaction and Others Do Not.” PhilArchive, 14 Dec. 2025, philarchive.org/rec/HUDRIW.
Humayun, Ahmed Imtiaz, et al. “Deep Networks Always Grok and Here Is Why.” Proceedings of the 41st International Conference on Machine Learning, PMLR, vol. 235, 2024, pp. 20722–20745.
Power, Alethea, et al. “Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets.” arXiv, 6 Jan. 2022, arxiv.org/abs/2201.02177.