Until then, consider using the layer picker behavior. Create frames that match the appearance you need, then use the layer picker to select a frame based on your audio. I've used it to link light intensity to a robot's voice, and you should be able to apply the same principle.
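
If it helps to see the principle spelled out, here is a rough Python sketch of the mapping from audio loudness to a frame index. It is only an illustration, not how the layer picker works internally. The names `rms`, `pick_frame`, and `frames_for_audio` are made up for this example, and it assumes samples are already normalized to the range -1.0 to 1.0.

```python
import math


def rms(samples):
    """Return the root-mean-square loudness of a chunk of samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def pick_frame(level, frame_count):
    """Map a loudness level (expected 0.0 to 1.0) to a frame index.

    Frame 0 is the quietest appearance, frame_count - 1 the loudest.
    Levels outside the expected range are clamped.
    """
    if frame_count < 1:
        raise ValueError("frame_count must be at least 1")
    clamped = min(max(level, 0.0), 1.0)
    # A level of exactly 1.0 would index one past the end, so cap it.
    return min(int(clamped * frame_count), frame_count - 1)


def frames_for_audio(samples, window, frame_count):
    """Pick one frame per window of samples (hypothetical helper)."""
    return [
        pick_frame(rms(samples[i:i + window]), frame_count)
        for i in range(0, len(samples), window)
    ]


if __name__ == "__main__":
    # Silence followed by a loud burst, with 4 frames to choose from.
    audio = [0.0, 0.0, 1.0, 1.0]
    print(frames_for_audio(audio, window=2, frame_count=4))
```

In the robot case, the frames would be the light drawn at increasing brightness, so louder speech selects a brighter frame.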