Think of the GPU CUs as buckets.
The theoretical maximum teraflops is how much water they can all hold if every one is filled to the brim, multiplied by how many times per minute they are carried from the source pool of water and poured out into the destination pool.
How full each bucket actually gets depends on the efficiency of the game code and the rest of the GPU machine that moves them around.
XSX is like having 52 buckets moving at 182mph.
PS5 is like having 36 buckets moving at 223mph.
Typical game code can only fill each bucket 30% full each time it comes around to the source pool again to fill up.
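To put rough numbers on the buckets so far, here is a minimal sketch in Python. The bucket counts, mph figures and the 30% fill are the ones above. The function names are made up for illustration, and the units are arbitrary "water", not real teraflops.

```python
# Bucket analogy as arithmetic. Units are arbitrary "water";
# the speeds are the post's mph figures, not real clock rates.

def peak_water(buckets, speed):
    """Theoretical maximum: every bucket filled to the brim on every trip."""
    return buckets * speed

def delivered_water(buckets, speed, fill):
    """What actually arrives when each bucket is only `fill` full (0.0 to 1.0)."""
    return peak_water(buckets, speed) * fill

xsx_peak = peak_water(52, 182)
ps5_peak = peak_water(36, 223)

xsx_real = delivered_water(52, 182, 0.30)
ps5_real = delivered_water(36, 223, 0.30)

print("peak:     ", xsx_peak, ps5_peak)
print("30% full: ", round(xsx_real, 1), round(ps5_real, 1))
```

With the same fill percentage on both sides, the gap between the two stays exactly what the peak numbers say it is. The fill percentage is the only knob that can change the picture.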
It takes time for the water to flow into these buckets as they scoop up water.
Sometimes when a bucket scoops up some water it can also catch a fish in it, and we don’t want that because my weird ass analogy says so.
In the 52 bucket version, if a fish is detected in the bucket as it’s moving from pool to pool, it has to empty the entire bucket out and go back and start scooping again.
In the 36 bucket version, if a fish is detected in the bucket as it’s moving from pool to pool, Mr Coherency Engine can tell Mr GPU cache scrubber to yoink the fish out and let the bucket and remaining water carry on its way.
This is simplified and the analogy could be expanded to account for more of what is going on and be a better fit for the kind of process that is happening, but this is a rough notion of what Matt was talking about.
Because the fish’d bucket doesn’t have to empty everything and start again, no trip is ever wasted on a bucket that gets tipped out and sent back, so the 36 bucket system averages a higher fill percentage per bucket.
Because of this, Matt is saying it’s not as simple as counting the number of buckets or the speed they travel.
In a perfect world with no fish the 52 bucket system delivers more water.
In a world with fish in your source water pool, you can’t make that calculation. More than that, Matt seems to think this fish plucking system is actually more significant than counting buckets and bucket speed and multiplying them.
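A toy model of the fish penalty, as a sketch only. `fish_rate` is a made-up knob for the fraction of bucket trips that catch a fish. The model assumes a dumped bucket delivers nothing on that trip and that plucking a fish costs nothing. Both are simplifications, and the real rate in game code is unknown.

```python
# Toy model of the fish penalty. `fish_rate` is hypothetical: the fraction
# of bucket trips that pick up a fish. The real figure is not known.

def delivered_with_dump(buckets, speed, fill, fish_rate):
    """52-bucket style: a fished bucket is emptied, so that trip delivers nothing."""
    return buckets * speed * fill * (1.0 - fish_rate)

def delivered_with_pluck(buckets, speed, fill, fish_rate):
    """36-bucket style: the fish is plucked out and the water still arrives.
    `fish_rate` is accepted but unused because plucking is assumed to be free."""
    return buckets * speed * fill

def break_even_fish_rate(big, small):
    """Fish rate at which the dumping system drops to the plucking system's output.
    `big` and `small` are (buckets, speed) pairs."""
    return 1.0 - (small[0] * small[1]) / (big[0] * big[1])

XSX, PS5 = (52, 182), (36, 223)

for rate in (0.0, 0.1, 0.2):
    dumped = delivered_with_dump(*XSX, 0.30, rate)
    plucked = delivered_with_pluck(*PS5, 0.30, rate)
    print(f"fish rate {rate:.0%}: 52 buckets {dumped:.0f}, 36 buckets {plucked:.0f}")

print(f"break-even fish rate: {break_even_fish_rate(XSX, PS5):.1%}")
```

Under these assumptions the 52 bucket system wins with no fish and loses once roughly 15% of trips get dumped. That threshold comes from the toy model, not from any measurement.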
He’s suggesting the Coherency Engine and GPU cache scrubbers effectively increase CU occupancy: because stale data is selectively pruned instead of entire caches being flushed, there are fewer cache misses and the CUs stall less often.
He’s also suggesting that because whole caches aren’t invalidated, good data doesn’t have to be refetched from system memory, so the system memory bandwidth requirement is lower. The same pages aren’t fetched again and again because the good got thrown out with the bad.
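The bandwidth point can be sketched by counting refetches. This is bookkeeping only, not how the hardware works internally. The page counts are hypothetical, and it assumes every cached page is still needed afterwards.

```python
# Toy cache: a set of page ids. Illustrates the bookkeeping behind
# "the good get thrown out with the bad", nothing more.

def refetches_after_full_flush(cached_pages, stale_pages):
    """Whole cache is invalidated, so every page must be fetched again
    (assuming all of them are still needed)."""
    return len(cached_pages)

def refetches_after_scrub(cached_pages, stale_pages):
    """Only the stale pages are evicted, so only those are fetched again."""
    return len(cached_pages & stale_pages)

cached = set(range(64))   # hypothetical: 64 pages sitting in cache
stale = {3, 17, 42}       # hypothetical: 3 of them were overwritten in memory

flush_cost = refetches_after_full_flush(cached, stale)
scrub_cost = refetches_after_scrub(cached, stale)
print("full flush refetches:", flush_cost)
print("scrub refetches:     ", scrub_cost)
```

Every refetch in the first number that isn't in the second is memory bandwidth spent pulling back data that was fine all along.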
It’s about efficiency vs brute force.
It’s about smarter buckets versus shit loads of them.
It backs up what developers have been murmuring for a while: there’s just not much difference between the two in real world game code. Not as much as comparing theoretical numbers might suggest.
PS5 is quite exotic, and these cache scrubbers are just another example of how streamlined and efficient the entire IO pipeline is, to the point that comparing numbers while assuming all other things are equal doesn’t really work. At least not to the extent some people are thinking.
It’s like having gasoline engines where the spark plugs only fire 30% of the time in real world driving, even though they could fire 100% of the time on an engine stand in ideal conditions. If we wanted to compare two engines with different cylinder counts and maximum RPMs, but neglected to factor in that one has some new and unconventional spark plugs that fire more than 30% of the time in real world driving, just multiplying the cylinder count by the RPM wouldn’t make for an accurate comparison.
PS5’s Coherency Engines and GPU cache scrubbers will make a real world difference to CU occupancy and effective GPU system memory bandwidth. We have no idea to what extent in typical game code. Matt thinks it will be significant.