Okay, let me explain what's happening. When you think you've put a split in a file, you haven't really done anything of the sort. Bear in mind that Multitrack view is essentially a giant file player, and clips are really just sections of files arranged however you want them on the timeline. You can use them over and over again if you want (looping), but the one thing you can't do with them is physically split the file they've come from - that can only happen in Waveform view.
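To make the "clips are just sections of files" idea concrete, here's a minimal sketch of how a multitrack clip can be modelled as a reference into a source file. This is a hypothetical model for illustration only, not Audition's actual internals: the `Clip` class, its fields and `split_clip` are all made-up names. The point is that splitting a clip produces two new references to the same file, and the audio file itself is never touched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    """Hypothetical model: a clip is a window onto a source file, not audio data."""
    source: str        # path of the source audio file (never modified)
    src_start: int     # first sample taken from the source file
    length: int        # number of samples the clip plays
    timeline_pos: int  # where the clip starts on the multitrack timeline


def split_clip(clip: Clip, at: int) -> tuple[Clip, Clip]:
    """Split a clip `at` samples from its start into two new references."""
    if not 0 < at < clip.length:
        raise ValueError("split point must fall inside the clip")
    left = Clip(clip.source, clip.src_start, at, clip.timeline_pos)
    right = Clip(clip.source, clip.src_start + at,
                 clip.length - at, clip.timeline_pos + at)
    return left, right


# A "split" just yields two clips pointing at the same untouched file.
original = Clip("vocal.wav", src_start=0, length=48000, timeline_pos=0)
left, right = split_clip(original, 20000)

# Looping is the same idea: several clips reusing one section of one file.
loops = [Clip("drums.wav", 0, 1000, i * 1000) for i in range(4)]
```

Because nothing but the references changes, you can move the split point later, or undo it, without any loss.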
If you merge clips, what you're essentially doing is rendering everything you've merged into a localised mix-down, and that creates a new file. Of course that takes up more space, because the new file has all the audio written into it again, while the original source files are still there as well.
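Here's a rough sketch of why a merge costs space, using the same kind of hypothetical clip model (the `Clip` class and `mixdown` function are illustrative assumptions, not anything from Audition). A short loop referenced three times costs nothing extra while it's just clips, but rendering it writes every repetition out as real samples in a new buffer, and the original file still exists alongside it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    """Hypothetical model: a clip references part of a source file."""
    source: str
    src_start: int
    length: int
    timeline_pos: int


def mixdown(clips: list[Clip], sources: dict[str, list[float]]) -> list[float]:
    """Render ("merge") clips into one new sample buffer by summing them."""
    end = max(c.timeline_pos + c.length for c in clips)
    out = [0.0] * end
    for c in clips:
        data = sources[c.source]
        for i in range(c.length):
            out[c.timeline_pos + i] += data[c.src_start + i]
    return out


# One 4-sample source file, looped three times on the timeline.
sources = {"loop.wav": [0.1, 0.2, 0.3, 0.4]}
clips = [Clip("loop.wav", 0, 4, pos) for pos in (0, 4, 8)]

mixed = mixdown(clips, sources)
stored_before = sum(len(s) for s in sources.values())  # 4 samples
stored_after = stored_before + len(mixed)              # 4 + 12 = 16 samples
```

The loop's audio is effectively repeated in the rendered file, which is exactly where the extra size comes from.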
So in fact you're doing it correctly - you just haven't quite appreciated how non-linear editing works and why it behaves this way. The splits don't matter. In fact, if you want to alter anything after the event, it's rather useful that they're still there, because they can be adjusted if need be.
But it's only a display thing. Premiere, for instance, handles this differently, so you can create submixes that don't appear to have splits all over them - but they still do. Audition has never handled it like that, and personally I prefer seeing what's actually there rather than a sanitised representation of it - but maybe that's just me.