The operative limitation is the interaction model, so quote it and translate. Apple's grant US12645292B2, "Head mountable display" (issued June 2, 2026; assignee Apple Inc.), carries just two CPC codes: G06F 3/013 (eye-tracking input) and G06F 3/017 (gesture input). Classification is not claim scope, but when a head-mounted-display patent is classified under gaze and gesture and nothing else, that is a strong signal that the invention is the control scheme, not the optics.
Here's what it reads on. The product everyone pictures is Apple's spatial headset, where you look at a button and pinch your fingers to select it — no controller in hand. That look-and-pinch interaction is exactly what gaze-input plus gesture-input classification describes. The element that does the work is the fusion of where you're looking with what your hands are doing, turned into a UI selection. That is the device's signature, and it is what the claim is anchored to.
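To make "fusion" concrete, here is a minimal sketch of a look-and-pinch selector. It is an illustration of the general idea only, not the patent's claim language and not Apple's implementation. Every name in it (`Button`, `GazeSample`, `HandSample`, `select`) and the 1 cm pinch threshold are hypothetical, chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Assumed value for illustration: fingertips closer than this count as a pinch.
PINCH_THRESHOLD_CM = 1.0


@dataclass(frozen=True)
class Button:
    """A rectangular UI target in 2D view coordinates (hypothetical)."""
    name: str
    x: float
    y: float
    width: float
    height: float

    def contains(self, px: float, py: float) -> bool:
        return (self.x <= px <= self.x + self.width
                and self.y <= py <= self.y + self.height)


@dataclass(frozen=True)
class GazeSample:
    """Where the user is looking, in the same coordinates as the buttons."""
    x: float
    y: float


@dataclass(frozen=True)
class HandSample:
    """Distance between thumb tip and index fingertip."""
    thumb_index_distance_cm: float


def gaze_target(gaze: GazeSample, buttons: Sequence[Button]) -> Optional[Button]:
    """Return the first button under the gaze point, if any."""
    for button in buttons:
        if button.contains(gaze.x, gaze.y):
            return button
    return None


def is_pinching(hand: HandSample) -> bool:
    return hand.thumb_index_distance_cm < PINCH_THRESHOLD_CM


def select(gaze: GazeSample, hand: HandSample,
           buttons: Sequence[Button]) -> Optional[str]:
    """Fuse the two inputs: a selection needs a gaze target AND a pinch."""
    target = gaze_target(gaze, buttons)
    if target is not None and is_pinching(hand):
        return target.name
    return None


buttons = [Button("play", 0, 0, 10, 10), Button("stop", 20, 0, 10, 10)]
looking_at_play = GazeSample(5, 5)
print(select(looking_at_play, HandSample(0.5), buttons))  # gaze on target, pinching
print(select(looking_at_play, HandSample(3.0), buttons))  # gaze on target, hand open
```

The point of the sketch is the conjunction in `select`: neither input alone produces a selection. Gaze supplies the target, the hand supplies the commit, and no controller appears anywhere in the loop.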
Why does Apple anchor a hardware-sounding patent ("head mountable display") to a software-sounding interaction? Because the optics and the headset form factor are crowded art: Magic Leap, Meta, Microsoft, and others have deep optical portfolios. The differentiated, defensible piece is the controller-free interaction model. Read the claim, and its scope tracks that judgment: it is narrow on the interaction, not broad on the hardware.
The discipline to keep here: a grant on a gaze-and-gesture head-mounted display does not give Apple "AR headsets." It gives Apple this specific interaction architecture on a head-mounted device. The claim does not read on a competitor's headset that relies on physical controllers or on a fundamentally different selection mechanism. Scope ends at the look-and-pinch model.
For a strategist, the value is in what the classification reveals about Apple's bet. The company is not trying to out-optics the incumbents; it is trying to own the input paradigm — the way humans tell a spatial computer what they want. A granted patent on that paradigm, anchored to gaze and gesture, is a more durable competitive instrument than any single lens design, because the interaction is the thing users actually feel.