I love a healthy debate!
The one major factor you're forgetting is time. Consider a 10-stop ND filter. The light it lets through to the sensor is collected over, say, 30 seconds. You can't expose the sensor for 30 seconds without the 10-stop filter at the same aperture and ISO, as the image will be massively overexposed. And it's the events that happen during those 30 seconds (waves moving, for example) that cause the effect.
A camera can't reproduce that effect over the course of a standard exposure time.
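To put rough numbers on it, here's a quick sketch of the stop arithmetic. It assumes each stop halves the light and that aperture and ISO stay the same; the function names are just ones I made up for illustration.

```python
def nd_factor(stops: float) -> float:
    """Light reduction factor of an ND filter, assuming each stop halves the light."""
    return 2.0 ** stops


def unfiltered_shutter(filtered_seconds: float, stops: float) -> float:
    """Shutter time giving the same exposure without the filter
    (assumes the same aperture and ISO)."""
    return filtered_seconds / nd_factor(stops)


factor = nd_factor(10)
t = unfiltered_shutter(30, 10)

print(f"ND factor: {factor:.0f}x")
print(f"Equivalent unfiltered shutter: {t:.4f} s (about 1/{1 / t:.0f} s)")
# ND factor: 1024x
# Equivalent unfiltered shutter: 0.0293 s (about 1/34 s)
```

So the same exposure without the filter would take roughly 1/34 s, during which the waves barely move. The filter doesn't create the blur; it just makes the 30 seconds possible.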