Simulating the audio of a longplay of a Zelda rando seed actually isn't that complex to model.
You'd pretty much just need to track arbitrary pick-up moments for the various items that make sounds, flip back and forth between the overworld and dungeon themes, and string together attack sounds into defeat sounds into item pickups.
It doesn't even need to flow "correctly", as long as you don't do anything so wrong that people using it as chillout audio wince at it (boss growls over the overworld theme, frex).
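A rough sketch of that model in Python, assuming the "audio" is just an ordered list of (theme, sound) events. Every theme and sound name is a made-up placeholder, and the 40% pick-up chance and 2 to 4 fights per segment are arbitrary numbers. The boss-growl rule is a small lookup table of sounds that are only allowed over one theme:

```python
import random

# All theme and sound names below are hypothetical placeholders,
# not a real game's asset list.
THEMES = ("overworld", "dungeon")
ATTACK_SOUNDS = ("sword_swing", "arrow_shot", "boss_growl")
PICKUP_SOUNDS = ("heart", "rupee", "small_key", "item_fanfare")

# Sounds restricted to one theme; anything not listed can play anywhere.
# This encodes the "no boss growls over the overworld theme" rule.
THEME_ONLY = {"boss_growl": "dungeon"}


def allowed(sound: str, theme: str) -> bool:
    """True if `sound` can be layered over `theme` without sounding wrong."""
    required = THEME_ONLY.get(sound)
    return required is None or required == theme


def generate_longplay(seed: int, segments: int = 6) -> list[tuple[str, str]]:
    """Return (theme, sound) events for a fake randomizer longplay."""
    rng = random.Random(seed)  # same seed -> same "playthrough"
    events: list[tuple[str, str]] = []
    for i in range(segments):
        theme = THEMES[i % 2]  # flip back and forth: overworld, dungeon, ...
        # Only draw attack sounds that fit the current theme.
        pool = [s for s in ATTACK_SOUNDS if allowed(s, theme)]
        for _ in range(rng.randint(2, 4)):
            events.append((theme, rng.choice(pool)))     # attack sound
            events.append((theme, "enemy_defeat"))       # into defeat sound
            if rng.random() < 0.4:                       # arbitrary pick-up moment
                events.append((theme, rng.choice(PICKUP_SOUNDS)))
    return events


if __name__ == "__main__":
    for theme, sound in generate_longplay(seed=1234)[:10]:
        print(f"{theme:<9} {sound}")
```

Feeding in the same seed gives back the same event list, so a given "seed" always sounds the same. Mapping the names to real audio clips and timing them is left out.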