Lockstep networking just passes around button presses. The magic works because everyone started in the same state (A), and the same sequence of button presses will always produce the same current state (B), because everything in the game is deterministic.
You don't need to send the master state to all the clients at the start of the session because they all have a local copy of the initial state: the .map file and the settings file that adjusts some parameters. They all just load that data into RAM and start it from the same point, because they know everyone else has the same data and the same starting point.
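To make the "same start plus same button presses equals same state" idea concrete, here is a minimal toy sketch. Everything in it (the `Sim` class, the seed standing in for the .map and settings data, the checksum) is made up for illustration and is not how Halo's engine is actually written.

```python
import hashlib
import random

MOVES = {"left": (-1, 0), "right": (1, 0), "up": (0, 1), "down": (0, -1)}


class Sim:
    """Toy deterministic simulation; a hypothetical stand-in, not Halo's engine."""

    def __init__(self, map_seed, player_ids):
        # Stand-in for loading the same .map and settings file on every box.
        self.rng = random.Random(map_seed)
        self.tick = 0
        self.positions = {pid: (0, 0) for pid in player_ids}
        self.ai_x = 0

    def step(self, presses):
        # presses: {player_id: button} for this tick, applied in a fixed order.
        for pid in sorted(presses):
            dx, dy = MOVES[presses[pid]]
            x, y = self.positions[pid]
            self.positions[pid] = (x + dx, y + dy)
        # "Random" AI movement comes from the seeded RNG, so it matches everywhere.
        self.ai_x += self.rng.choice((-1, 1))
        self.tick += 1

    def checksum(self):
        blob = repr((self.tick, sorted(self.positions.items()), self.ai_x))
        return hashlib.sha256(blob.encode()).hexdigest()


def run(map_seed, press_log):
    sim = Sim(map_seed, player_ids=(0, 1))
    for presses in press_log:
        sim.step(presses)
    return sim.checksum()


press_log = [{0: "left", 1: "up"}, {0: "right", 1: "up"}, {0: "up", 1: "down"}]
box_a = run(7, press_log)
box_b = run(7, press_log)
# One different button press is enough to make the simulations diverge.
box_c = run(7, press_log[:-1] + [{0: "down", 1: "down"}])
```

Here `box_a` and `box_b` end up with identical checksums without ever exchanging state, while `box_c` has drifted.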
Now, dropping out a player. This is the easier of the two, since you just need to get rid of one player from the button press relay. It still requires a lot of work, but it's definitely less of a headache. FUN FACT ABOUT REACH/QUIRK OF THIS FUNCTIONALITY: get a checkpoint in co-op. Go past the checkpoint for a while, and have someone quit. You'll notice that it'll say "xxxxxx has left the game" each time you revert - because it's literally replaying the player leaving the game to keep everyone's simulation in sync.
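One way to picture the Reach quirk: if a player leaving is just another event in the recorded stream, then every revert to the checkpoint has to replay it. This is a rough sketch under that assumption; the event tuples and the `replay` helper are hypothetical names, not anything from the game.

```python
import copy


def apply_event(state, event, messages):
    # Events are (kind, player) tuples; an assumed format for illustration only.
    kind, player = event
    if kind == "press":
        state["presses"][player] = state["presses"].get(player, 0) + 1
    elif kind == "leave":
        state["players"].discard(player)
        messages.append(f"{player} has left the game")


def replay(checkpoint, log_since_checkpoint):
    # Reverting means: restore the checkpoint, then re-run everything after it.
    state = copy.deepcopy(checkpoint)
    messages = []
    for event in log_since_checkpoint:
        apply_event(state, event, messages)
    return state, messages


checkpoint = {"players": {"alice", "bob"}, "presses": {}}
log = [("press", "alice"), ("leave", "bob"), ("press", "alice")]

first_state, first_messages = replay(checkpoint, log)
second_state, second_messages = replay(checkpoint, log)
```

Both replays produce the same state and the same "bob has left the game" message, because the departure is part of the replayed input, not a one-off notification.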
Now why can't we just add players to the session? Can't we just add them to the button press relay? Nope. They need to be given the same state as everyone else; not a close estimate, not a good guess, literally 1:1. Or the simulations will drift and the game desyncs.
The game's state can be huge. I don't know the exact size of Halo's game state, but it is going to be considerably big. Let's just take a rough guess and say it's 40MB-100MB. This is a lot of freaking data to just send over the pipe to someone, considering some people are still using 512/512 kbps DSL. And nobody can do anything while the data is being sent to the person joining. Not really feasible. So the binary blob of the game state is right out.
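To put rough numbers on that, here is a back-of-the-envelope calculation using the guessed 40MB-100MB range. It assumes 1 MB = 1,000,000 bytes, a 512 kbit/s uplink, and zero protocol overhead, so real transfers would be slower.

```python
def transfer_seconds(size_mb, link_kbps):
    # Assumes 1 MB = 1,000,000 bytes and 1 kbit = 1,000 bits, no overhead.
    bits = size_mb * 8_000_000
    return bits / (link_kbps * 1000)


low = transfer_seconds(40, 512)    # 625.0 seconds, a bit over 10 minutes
high = transfer_seconds(100, 512)  # 1562.5 seconds, about 26 minutes
```

That is 10 to 26 minutes of everyone sitting frozen for a single joining player, and that's the optimistic case.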
Another option is to send the joining client a recording of the game up to that point, and play the recording in ultrafast-forward until it gets to the current state. You can do neat tricks like capture snapshots of the game state along the way, but this replay file can get huge, even bigger than the game's state. Remember, drop-in has to work if we've been playing on The Ark for 20 minutes or 200 minutes or even 2000 minutes. RTS games that let you join a game in progress tend to use this method. Once again though, you're running afoul of Microsoft's bandwidth limits and home user connections. Whoops. So both our "best" methods of allowing someone to join a lockstep session are technically possible, but not feasible.
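The recording-plus-snapshots trick can be sketched like this. It's a toy under stated assumptions: the `step` function, the `Recording` class, and the snapshot interval are all invented for illustration, and a real game state is vastly bigger than two integers.

```python
def step(state, presses):
    # Toy deterministic update; returns a new dict so old snapshots stay intact.
    return {
        "tick": state["tick"] + 1,
        "score": (state["score"] * 31 + sum(presses)) % 1_000_003,
    }


class Recording:
    def __init__(self, initial, snapshot_every):
        self.snapshot_every = snapshot_every
        self.presses = []               # grows for as long as the session runs
        self.snapshots = {0: initial}   # tick -> full state at that tick
        self.state = initial

    def record(self, presses):
        self.presses.append(tuple(presses))
        self.state = step(self.state, presses)
        if self.state["tick"] % self.snapshot_every == 0:
            self.snapshots[self.state["tick"]] = self.state

    def fast_forward(self, from_tick=0):
        # A joiner loads a snapshot and replays only the presses after it.
        state = self.snapshots[from_tick]
        for presses in self.presses[from_tick:]:
            state = step(state, presses)
        return state


rec = Recording({"tick": 0, "score": 0}, snapshot_every=10)
for t in range(25):
    rec.record([t % 3, 1])

from_start = rec.fast_forward(0)
from_snapshot = rec.fast_forward(20)
```

Both paths land on the live state, but notice the cost: the joiner still needs either the whole press log or a full snapshot shipped to them, which is the same bandwidth problem again.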
Asynchronous networking modes like competitive multiplayer care LESS about being out of sync. The game state is not only smaller, it's not even the same size on all the boxes. The host actually has much more information than the clients and doesn't share everything with them, to keep as little data as possible from being sent down the wire. This is why you lose sprees and perfections in Halo titles if the host has to move to another box: only the host determines if you get a spree medal ("oh hey client 10, I see you got 5 hammer kills in a row, please award this medal to yourself and display it on your UI"), and it does not share the ongoing spree data of every player in the game with everyone else. The new host after a bluescreen doesn't have that data available when it's reconstructing the session. The state is small enough that when you join a session, it just needs to send you the bare minimum to get working. An asynch session can miss a huge chunk of updates but get back on track and "heal" once it digests enough new data on everyone's positions and status.
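A small sketch of both points, the healing and the lost sprees. The `Host` and `Client` classes, the packet layout, and `migrate_host` are hypothetical and only mimic the behavior described above.

```python
class Host:
    def __init__(self):
        self.positions = {}
        self.sprees = {}  # host-only bookkeeping, never put on the wire

    def move(self, player, pos):
        self.positions[player] = pos

    def kill(self, player):
        self.sprees[player] = self.sprees.get(player, 0) + 1
        if self.sprees[player] == 5:
            return f"award spree medal to {player}"
        return None

    def update_packet(self):
        # Only the bare minimum goes to clients: current positions.
        return {"positions": dict(self.positions)}


class Client:
    def __init__(self):
        self.positions = {}

    def receive(self, packet):
        # The newest packet replaces the old view, so missed packets don't matter.
        self.positions = packet["positions"]


def migrate_host(client):
    # The new host can only rebuild from what a client was ever told.
    new_host = Host()
    new_host.positions = dict(client.positions)
    return new_host


host, client = Host(), Client()
for x in range(100):
    host.move("p1", (x, 0))
    host.update_packet()  # pretend every one of these packets was dropped
for _ in range(4):
    host.kill("p1")

client.receive(host.update_packet())  # one packet finally arrives
new_host = migrate_host(client)
```

After a single packet the client's view matches the host's, but `new_host.sprees` is empty: the four kills toward a spree only ever lived on the old host.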
The tradeoff for the smaller game state is that it requires a lot more ongoing bandwidth to keep going. So to have AI with full functionality in asynch, they pretty much have the bandwidth cost of a normal player. The more AI you have, the more bandwidth they cost. Compare that with lockstep, which uses the same bandwidth whether you have 10 or 10,000 AI on one screen. On the upside, you have no button latency. But the asynch bandwidth cost for all the AI is too much for most home connections and Microsoft regulations.
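The scaling difference can be written down as a toy cost model. The byte counts here are placeholder guesses, not measured Halo numbers; only the shape of the two formulas matters.

```python
def lockstep_bytes_per_tick(players, ai_count, input_bytes=8):
    # Only button presses go over the wire; ai_count never enters the cost.
    return players * input_bytes


def async_bytes_per_tick(players, ai_count, entity_bytes=32):
    # Assumes each AI costs about as much as a player, as described above.
    return (players + ai_count) * entity_bytes


lockstep_small = lockstep_bytes_per_tick(4, 10)
lockstep_large = lockstep_bytes_per_tick(4, 10_000)
async_small = async_bytes_per_tick(4, 10)
async_large = async_bytes_per_tick(4, 10_000)
```

Under these assumptions the lockstep cost is flat while the asynch cost grows linearly with every AI added.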
Now put the asynch host on an Azure dedi and you've solved the bandwidth problem.
Sidenote: this is why party film clips barely worked in Halo 3 but full game films tended to work all of the time. The film clips needed all 4 people in the session to start at the same state, which meant the host starting the film, figuring out state frame 0, then sending that state to the other 3 players, then playing all the button presses to the other players. And it could be a pretty hefty binary blob. Connections were so slow you were more likely to time out of the session waiting for the host to send gigantic state files to 3 different clients than actually see the film ever play.
tl;dr a wizard did it