When water freezes, it expands. Thus, my latest invention places a weight on a column of water, then freezes the water to lift the weight, doing work. By using a weight heavy enough that the work done is greater than the energy expenditure to freeze the water, I create a perpetual motion machine.

The Second Law of Thermodynamics tells you that this is impossible. But it tells you so in a non-constructive way. It doesn’t say which step is mistaken, what unmentioned physical mechanism is required for its satisfaction, or exactly how it works out mathematically. Even partial explanations—say, water under pressure takes more energy to freeze—aren’t completely satisfactory, leaving questions like: Why does it work out no matter how much weight I load it with? How would it be different for something like mercury that contracts when freezing? The Second Law only tells you it’s hopeless.^{a}

That’s fine when you’re deciding whether or not to invest in my startup, but if you’re trying to learn about the world, you should feel a persistent itch, a gap where an explanation should go. And it’s not the simple sensation of not knowing. You might describe it as tension, confusion, or even paradox.

There are (at least) two major reasons to pay attention to the use of paradox in teaching and learning. The first is that itch—it’s motivating. There’s not merely a blank space in your picture of the world—there’s an apparent contradiction. If you try to fill in the gap from different directions, you get different answers. One of my major qualitative takeaways from my old self-experiment on noticing confusion was that in cases like this I didn’t even need any kind of trigger-action plan to follow up on the noticing, since the need to resolve it was so powerful—and even having written that, I continue to underestimate the power of this effect.

So if you’re psychologically anything like me, for the sake of motivation alone, I suggest you look for ways to manufacture contradictions out of gaps in your knowledge.

The second thing to pay attention to is what happens when you resolve the paradox. On the most benign level, you get a fuller picture of why things are the way they are when you see that they couldn’t be some other way. In a sense, no explanation is complete without this. Going a bit deeper, doing this helps you develop a more integrated mental model—you don’t have recipes for particular calculations, but mechanisms that need to be considered wherever applicable. Explanations can’t just be one-off descriptions of unexplained phenomena, but instead are part of the machinery that keeps everything adding up to normality.

So when debugging, try to prove that your code works. If you don’t know the answer to a question, find a reasonable-seeming answer that evidently doesn’t work and ask why. If you don’t know what something is, ask how it’s different from something that seems like it should be similar. Imagine counterexamples and attempt proofs by contradiction. “Why do things this way?” can manufacture questions of the form “why not do things this other way?”—even if the other way is overly simple, or you don’t have a good explanation for doing it the other way either, replacing the former question with the latter can be very effective. “How does the argument for this position work?” gives you “why doesn’t your argument go through in this other domain?”

Be especially watchful for incomplete explanations—the ones that point to the right answer but don’t know their own limits. The sky is blue rather than red because of the wavelength dependence of Rayleigh scattering, but why is it blue rather than violet?
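To see how that explanation points the right way without knowing its own limits, here is a minimal sketch that applies only the 1/λ⁴ dependence of Rayleigh scattering. The function name and the three representative wavelengths are my own illustrative choices, not measured band centres.

```python
def rayleigh_relative(wavelength_nm, reference_nm=700.0):
    """Scattering strength relative to the reference wavelength.

    Uses only the 1/lambda^4 law; ignores the solar spectrum,
    atmospheric absorption, and the response of the eye.
    """
    return (reference_nm / wavelength_nm) ** 4


# Assumed, approximate wavelengths for each colour.
colours = {"red": 700.0, "blue": 450.0, "violet": 400.0}

for name, nm in colours.items():
    print(f"{name:>6} ({nm:.0f} nm): {rayleigh_relative(nm):.1f}x red")
```

On these numbers violet scatters roughly 1.6 times more strongly than blue, so the wavelength dependence alone would predict a violet sky. Whatever finishes the explanation has to come from outside the scattering law.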

But be careful—the feeling of resolving paradoxes can be extremely satisfying, and the impression of depth of explanation that you get can be unduly persuasive. It’s important to avoid accidentally drilling “things that seem confusing are always nothing to worry about”, even if it’s by resolving the confusion instead of suppressing it. Instead, you want to frame this kind of pedagogy as “things that seem unsatisfactory are worth poking”—using points of confusion not to show the power of the right simple foundations to resolve them, but to expose new ideas and emphasize that they’re necessary to make things work out right.

A lot of E. T. Jaynes’s writing works by setting up veridical paradoxes and knocking them down. It has a powerful psychological effect on a certain mindset^{b}—as exposition it’s both engaging and persuasive. But it can easily leave one in ignorance of why the advocated approach might be unsatisfactory. You begin to feel like anything opaque or confusing has some natural explanation in terms of what you already know. You feel safe black-boxing any mystery as essentially already resolved, as it awaits only the application of the same foundations as everything else.

This resolving-paradoxes-as-rhetorical-style is especially explicit in Jaynes’s “Clearing Up Mysteries”, from a 1988 MAXENT workshop:

The recent emphasis on the data analysis aspect stems from the availability of computers and the failure of “orthodox” statistics to keep up with the needs of science. This created many opportunities for us, about which other speakers will have a great deal to say here. But while pursuing these important applications we should not lose sight of the original goal, which is in a sense even more fundamental to science. Therefore in this opening talk we want to point out a field ripe for exploration by giving three examples, from widely different areas, of how scientific mysteries are cleared up, and paradoxes become platitudes, when we adopt the Jeffreys viewpoint. Once the logic of it is seen, it becomes evident that there are many other mysteries, in all sciences, calling out for the same treatment.

In his first example, diffusion, he states a naive paradox: our particles have a symmetric velocity distribution, so how do we get nonzero flux anywhere? He then describes Einstein’s derivation of Fick’s diffusion law as unsatisfactory, and then gives a clear derivation using Bayes’ theorem.

I had to look up Einstein’s argument^{c}, and it’s true that it derives the diffusion equation in terms of velocity distributions and particle density over time without reference to any instantaneous flux, whereas Jaynes’s version is from the perspective of a single particle at a single instant in time (the rest coming in through the prior: a particle is more likely to *have come from* a denser region) and gives the flux among other results.

But the derivation/presentation I first learned in stat mech^{d} is at least as clear about what I see as the main point (given a surface, more particles pass through from the denser side) without invoking Bayes. It’s also more immediately obvious to me how to adapt it to other problems, although that might just be because I’m not as used to the methods Jaynes likes.

Did Jaynes find that one unsatisfactory too? He doesn’t bring it up, maybe because he hadn’t seen it or because it made a weaker point than talking about how handicapped Einstein was by not knowing the true logic of science. I’d guess it was unsatisfactory to him, since from a certain perspective it still depends on “forward integration” that Jaynes prefers to think about as prior information, although it can be trivially rephrased.

But why should I care about deriving a flux from a probability distribution for a single particle rather than from the perspective of a single surface? That seems less important to me than the clarity of the moral lesson and the general adaptability. And the “paradox” is still resolved by the same moral lesson and by noting that while the statistical velocity distribution of all particles has zero mean, you can still have flux across a specified surface: the mean velocity only tells you about the motion of the particles’ center of mass, which indeed won’t move under free diffusion! It’s natural that taking some subset of the particles should give you a different distribution, and there are natural ways of thinking about it that aren’t in Jaynes’s terms; just think of the motion of rocks in Saturn’s rings.^{e}
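The single-surface picture is easy to check numerically. The following is a minimal sketch under assumptions of my own (a 1D lattice, a linear density gradient, and each particle hopping one site left or right with equal probability); it is neither Einstein's derivation nor Jaynes's. All function names and parameters are hypothetical.

```python
import random


def initial_positions(num_sites=20, top=200, step=10):
    """Place top - step*x particles on site x, so density falls to the right."""
    positions = []
    for x in range(num_sites):
        positions.extend([x] * (top - step * x))
    return positions


def one_step(positions, surface, rng):
    """Draw a velocity of -1 or +1 for every particle with equal probability.

    Returns (sum of all velocities, net crossings of the surface that sits
    between sites `surface` and `surface + 1`, counted rightward positive).
    """
    total_v = 0
    net = 0
    for x in positions:
        v = 1 if rng.random() < 0.5 else -1
        total_v += v
        if x == surface and v == 1:
            net += 1
        elif x == surface + 1 and v == -1:
            net -= 1
    return total_v, net


def run(trials=400, surface=9, seed=0):
    """Average over many independent single steps from the same initial density."""
    rng = random.Random(seed)
    positions = initial_positions()
    sum_v = 0
    sum_net = 0
    for _ in range(trials):
        total_v, net = one_step(positions, surface, rng)
        sum_v += total_v
        sum_net += net
    mean_velocity = sum_v / (trials * len(positions))
    mean_flux = sum_net / trials
    return mean_velocity, mean_flux


mean_velocity, mean_flux = run()
print(f"mean velocity per particle:  {mean_velocity:+.4f}")
print(f"mean net crossings per step: {mean_flux:+.2f}")
```

With 110 particles on the site just left of the surface and 100 just right of it, the expected net crossing is (110 - 100) / 2 = 5 per step toward the sparser side, while the mean velocity stays near zero. The symmetric velocity distribution and the nonzero flux describe different sets of particles, so they do not conflict.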

Thus, a slightly more specific warning: a benevolent & complete Paradox Pedagogy needs to remember that the resolution to a paradox (e.g. how do we get net particle flux anywhere if the velocity distribution is symmetric) has to explain

- a correct solution^{f}
- why other solutions with a claim to correctness are unsatisfactory^{g}
- why it’s not a paradox; i.e., why you shouldn’t have expected the naive reasoning to work^{h}

ideally making it clear that the reason it’s not a paradox is somewhat independent of any particular resolution itself.

If you only give one or two of these, then you’re at greater risk of being unduly persuasive regarding the One-True-Way-ness of your methods. And it’s not just a matter of intellectual etiquette or scrupulous hedging. You really won’t have explained everything—there will be lingering confusion, noticed or otherwise.

How far could this be taken? Could one organize a physics course around rejecting naive arguments for false conclusions? Of course, a lot of classic paradoxes are uninteresting edge cases, but you can take care not to make it a game of “spot the hidden violated assumption that’s never violated in practice”.

How much thermodynamics can you get with “why doesn’t this perpetual motion device work?” What do we lose by presenting quantum mechanics by fiat? Do we beat into students an instinct for suppressing their confusion and accepting non-explanations? My experience is that typical undergrad courses tend to leave students confused about what exactly is special or surprising about quantum mechanics, at least until they start trying to show whether various experimental results have [semi]classical explanations.^{i}
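For the freezing-water device from the opening, one answer to "how much thermodynamics" seems to be: at least the Clausius-Clapeyron relation. What follows is a first-order sketch, not a full treatment. It assumes a small load, neglects compressibility and any change in latent heat, and treats the device as a cycle that freezes under load and melts unloaded. All symbols are my own notation and are defined in the comments.

```latex
% Assumed symbols (none appear in the original text):
%   L          latent heat of fusion per unit mass
%   \Delta v   = v_{ice} - v_{water} > 0, specific volume change on freezing
%   \Delta p   extra pressure supplied by the weight
%   T_h        melting temperature with the weight removed
%   T_c        freezing temperature under load, with \Delta T = T_h - T_c
\begin{align*}
W_{\text{out}} &= \Delta p \, \Delta v
  && \text{net work per unit mass over one freeze-melt cycle} \\
W_{\text{out}} &\le L \, \frac{\Delta T}{T_h}
  && \text{Carnot bound, with heat } L \text{ absorbed on melting at } T_h \\
\Rightarrow \quad \frac{\Delta T}{\Delta p} &\ge \frac{T_h \, \Delta v}{L}
  && \text{the load must depress the freezing point by at least this much}
\end{align*}
```

In the reversible limit the inequality becomes dT/dp = -T Δv / L for the melting curve, which for water comes to roughly 0.0075 K of freezing-point depression per atmosphere. Both the work gained and the temperature gap required grow in proportion to the load, which is why a heavier weight doesn't help. For a substance that contracts on freezing, Δv changes sign and the freezing point rises under load instead.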

It certainly feels like a lot of my new insights come from asking why I can’t just do the stupid, obvious thing, or how two vaguely contradictory things can be compatible—in contrast to, say, thinking really hard about how to solve a given problem. But it’s not the answer to everything; plenty of approaches are better than just thinking real hard.

- And you can get even trickier: you notice that pressure lowers the freezing point. So if you’re operating at the right point on the phase diagram, you can freeze the water by decreasing pressure above it, thus lifting your weight and doing work. Then you let it warm up and melt, then increase the pressure and cool it back down to get where you started in the liquid phase, giving you a different engine cycle where it’s arguably harder to show why it can never give perfect (or better) efficiency.
- as it sounds like Eliezer Yudkowsky experienced and [I think unfortunately] tried to emulate
- section 4 in his Brownian motion paper, which is roughly how I learned to think about transport in condensed matter, I think mostly because it gives a good entry point into the Sommerfeld approximation
- which is basically the one on Wikipedia, and more recent than Einstein’s but still prior to 1988
- From this perspective, Jaynes *avoids* making clear why you shouldn’t have expected the total velocity distribution alone to tell you about flux.
- “you can incorporate information about where a particle came from to get a velocity distribution from the perspective of a single particle, which can be used to calculate the flux”
- “forward integration”, at least to Jaynes
- “the overall mean velocity tells us about the motion of the mean of the particle distribution, involving total flux across all surfaces, which is indeed zero in the absence of boundaries or the density going off to infinity, and is totally consistent with nonzero flux across any given surface; but for a given surface, nearby particles may be more likely to have come from one direction, so it doesn’t make sense to reason from just the overall distribution without incorporating that additional information from the particles of interest”
- Jaynes himself spent a while attempting a semiclassical account of the Lamb shift, which despite some progress ended with losing his $100 bet on it.