Our from-scratch switch OS is now trading routes with a Cisco Nexus over a real routing protocol. Yesterday I posted that it could forward packets. Today it speaks OSPF.

This was a different kind of hard, and it is worth explaining why. A switch ASIC forwards in hardware at line rate and deliberately ignores the CPU; that is what makes it fast. But OSPF runs on the CPU. So every one of the protocol’s control packets has to be individually trapped and copied up to it: the hellos sent to 224.0.0.5, and the adjacency packets that arrive with a TTL of 1, which the chip otherwise drops on sight. On a commercial stack this is handled for you. We had to bring up the chip’s packet classifier from dead silicon just to start.
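For anyone who hasn't written trap rules, the match logic boils down to a predicate over the IPv4 header. This is a software model only, not the chip's classifier API, which is vendor-specific and not shown here. It assumes what the paragraph above describes: trap OSPF (IP protocol 89) sent to 224.0.0.5, and trap anything arriving with TTL 1. Also trapping 224.0.0.6 (AllDRouters, used for DR/BDR traffic) is an added assumption, and make_ipv4_header is a hypothetical helper for trying it out.

```python
import ipaddress

# Software model of the trap rules; a real ASIC is programmed through
# vendor-specific classifier tables, not a Python function.
OSPF_PROTO = 89  # IANA-assigned IP protocol number for OSPF
ALL_SPF_ROUTERS = ipaddress.IPv4Address("224.0.0.5")
ALL_D_ROUTERS = ipaddress.IPv4Address("224.0.0.6")  # assumption: DR/BDR traffic too


def should_trap_to_cpu(ip_header: bytes) -> bool:
    """Return True if this IPv4 packet should be copied up to the CPU."""
    if len(ip_header) < 20 or ip_header[0] >> 4 != 4:
        return False  # not a complete IPv4 header
    ttl = ip_header[8]
    proto = ip_header[9]
    dst = ipaddress.IPv4Address(bytes(ip_header[16:20]))
    if proto == OSPF_PROTO and dst in (ALL_SPF_ROUTERS, ALL_D_ROUTERS):
        return True  # OSPF multicast, e.g. hellos to 224.0.0.5
    if ttl == 1:
        return True  # hardware would otherwise drop it on the TTL check
    return False


def make_ipv4_header(ttl: int, proto: int, dst: str) -> bytes:
    """Hypothetical test helper: a minimal 20-byte IPv4 header (checksum left 0)."""
    header = bytearray(20)
    header[0] = 0x45  # version 4, header length of 5 32-bit words
    header[8] = ttl
    header[9] = proto
    header[16:20] = ipaddress.IPv4Address(dst).packed
    return bytes(header)
```

On a real chip the same two conditions become two classifier entries whose action is copy-to-CPU, which is exactly where the next problem showed up.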

Then the wall that ate a week. A match-everything test rule set to DROP killed both uplinks instantly — proof the classifier was matching. The exact same rule set to COPY-to-CPU delivered nothing. Match worked, copy did not. Around ninety build-test-reboot cycles disappeared into that gap.

What finally broke it was not another guess. We dumped every configuration register on the chip and diffed it, address for address, against a capture of the OS that used to run on this hardware. In minutes the diff lit up a cluster of registers set on the reference and zero on ours: the chip’s multicast replication engine. That was the missing idea — a copy-to-CPU is internally a multicast replication to the CPU port. No replication engine, no copy, no matter how perfectly the rule matches. We programmed those, added the TTL=1 traps, and the packets reached the daemon.
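The DROP-versus-COPY puzzle is easier to see as a toy model. This sketch is built only on the explanation above, that copy-to-CPU is internally a multicast replication to the CPU port. The class, its fields, and the port name "cpu" are hypothetical and do not correspond to any real SDK.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPipeline:
    """Toy model: COPY-to-CPU is a multicast replication to the CPU port.

    Hypothetical names throughout; this mirrors the explanation in the post,
    not any vendor SDK.
    """
    replication_ports: set = field(default_factory=set)  # ports with a replication entry
    cpu_queue: list = field(default_factory=list)         # packets that reached the CPU

    def apply(self, packet: bytes, action: str) -> bool:
        """Apply a matched rule's action; return True if the packet is still forwarded."""
        if action == "DROP":
            return False  # dropping needs no replication, so it "works" either way
        if action == "COPY":
            if "cpu" in self.replication_ports:
                self.cpu_queue.append(packet)  # the replica delivered to the CPU
            return True  # the original keeps forwarding in hardware
        raise ValueError(f"unknown action: {action!r}")


pipe = ToyPipeline()
pipe.apply(b"hello", "COPY")       # rule matches, but nothing reaches the CPU
pipe.replication_ports.add("cpu")  # program the replication engine
pipe.apply(b"hello", "COPY")       # now the copy lands in cpu_queue
```

In the model, as on the chip, a perfect match with an unprogrammed replication engine yields a silent no-op rather than an error, which is why the gap was so hard to see from the rule side.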

ospfd: Neighbor 10.101.1.241 Loading -> Full.

Stable and bidirectional, and every route the Nexus advertises is now forwarded in hardware.

The takeaway I am keeping: when a known-good reference keeps setting something you don’t, stop guessing one register at a time and diff all of them at once. That single diff found in minutes what ninety cycles could not.
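The diff needs very little code once both register dumps are saved. This is a minimal sketch under stated assumptions: each dump is plain text with one hex address and one hex value per line, and the function names and the 0x10 clustering gap are illustrative choices, not what we actually ran. It lists every differing address, then pulls out clusters that are set on the reference but zero on ours, the pattern that pointed at the replication engine.

```python
def parse_dump(text: str) -> dict[int, int]:
    """Parse 'ADDRESS VALUE' hex pairs, one per line (assumed dump format)."""
    regs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        addr, value = line.split()[:2]
        regs[int(addr, 16)] = int(value, 16)
    return regs


def diff_registers(reference: dict[int, int], ours: dict[int, int]) -> list[tuple[int, int, int]]:
    """Return (address, reference_value, our_value) for every differing address.

    Addresses missing from one dump are treated as 0.
    """
    diffs = []
    for addr in sorted(reference.keys() | ours.keys()):
        ref_val, our_val = reference.get(addr, 0), ours.get(addr, 0)
        if ref_val != our_val:
            diffs.append((addr, ref_val, our_val))
    return diffs


def clusters_set_only_on_reference(diffs: list[tuple[int, int, int]], max_gap: int = 0x10) -> list[list[int]]:
    """Group addresses nonzero on the reference but zero on ours.

    Consecutive hits at most max_gap apart land in the same cluster.
    """
    clusters = []
    for addr, ref_val, our_val in diffs:
        if ref_val == 0 or our_val != 0:
            continue  # only "set there, zero here" is interesting
        if clusters and addr - clusters[-1][-1] <= max_gap:
            clusters[-1].append(addr)
        else:
            clusters.append([addr])
    return clusters
```

Sorting by address is what makes a block of adjacent registers, such as one hardware engine's configuration, stand out as a single cluster instead of scattered lines.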

More fiber being added. More to come.