So a couple of thoughts:

The Timeout errors lead me to believe that you are running out of disk I/O.

Your object-server worker count is a lot higher than I would run for a server with 4 drives; it works out to 8 workers for each drive. I would cut it back to 4-8 and confirm that things are stable there. If they are, you can raise the worker count incrementally, testing after each increase.
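
As a rough sketch, assuming your settings live in the usual object-server.conf, the worker change would look something like this. The value 8 is just the top of the range I mentioned, not a measured optimum for your hardware:

```ini
# object-server.conf (illustrative value, adjust after testing)
[DEFAULT]
# Top of the suggested 4-8 range; start lower if timeouts persist
workers = 8
```

Restart the object server after changing it and rerun the same load test so the results are comparable.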

I would also look at turning your disk caches off. They are likely masking the issue at the beginning. The method varies by OS and by whether you are using a RAID controller, but it is pretty easy to google.

The other thing I would check is whether the background processes are running too aggressively. You could try running with replication, auditors, etc. turned off, just as a test, to see if that has any effect. If it does, it would be worth tuning their concurrency and run delays.
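
If the background processes do turn out to matter, the tuning would be along these lines. This is only a sketch: the numbers are made-up starting points, and the option names can differ between Swift releases (for example, newer releases use interval where older ones use run_pause), so check the sample config shipped with your version:

```ini
# object-server.conf (illustrative values only, not recommendations)
[object-replicator]
# Fewer simultaneous replication jobs means less competing disk I/O
concurrency = 1
# Seconds to wait between replication passes
run_pause = 60

[object-auditor]
# Throttle how quickly the auditor walks files on disk
files_per_second = 5
```

Change one setting at a time and retest, otherwise it is hard to tell which one made the difference.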

That should be a good start.

-- Chuck