First, hide the Next button when the slide is entered (On Enter).
You'll need a conditional advanced action (or shared action) on each of the audio buttons and a variable for each of them.
Variables (each with a default value of 0):
v_btn1Played
v_btn2Played
v_btn3Played
The action will:
1. Set the variable for the clicked button to 1
2. DISABLE all the buttons, so the learner listens to the whole audio
3. DELAY for the number of seconds that the button's audio plays
4. ENABLE all the buttons
5. Check whether all the buttons have been clicked:
IF v_btn1Played is equal to 1 AND
v_btn2Played is equal to 1 AND
v_btn3Played is equal to 1
THEN SHOW the Next button
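
If it helps to see the flow outside Captivate, here is a minimal sketch of the same logic in plain Python. It is only a model of the steps above, not Captivate code: SlideState, on_audio_button_click and the wait parameter are hypothetical names made up for illustration, and the audio lengths are placeholder values.

```python
import time
from dataclasses import dataclass, field


@dataclass
class SlideState:
    # Hypothetical stand-in for the slide: the three user variables,
    # whether the audio buttons are enabled, and whether Next is visible.
    played: dict = field(default_factory=lambda: {
        "v_btn1Played": 0,
        "v_btn2Played": 0,
        "v_btn3Played": 0,
    })
    buttons_enabled: bool = True
    next_visible: bool = False  # hidden on enter


def on_audio_button_click(state, variable, audio_seconds, wait=time.sleep):
    """Model of the action attached to one audio button."""
    state.played[variable] = 1        # 1. mark this button as played
    state.buttons_enabled = False     # 2. disable all the buttons
    wait(audio_seconds)               # 3. delay for the audio length
    state.buttons_enabled = True      # 4. enable all the buttons
    # 5. show Next only when every variable equals 1
    if all(value == 1 for value in state.played.values()):
        state.next_visible = True


if __name__ == "__main__":
    state = SlideState()
    # Assumed audio lengths in seconds; a no-op wait keeps the demo instant.
    clicks = [("v_btn1Played", 4), ("v_btn2Played", 6), ("v_btn3Played", 5)]
    for variable, seconds in clicks:
        on_audio_button_click(state, variable, seconds, wait=lambda s: None)
        print(variable, "clicked, Next visible:", state.next_visible)
```

Because the check runs at the end of every button's action, the order in which the learner clicks the buttons does not matter, and clicking the same button twice does no harm.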