Xyleph
Tegra X1 can go up to a maximum of 500 GFLOPS for the 32-bit flops that actually matter. No idea why Nvidia is pulling such a confusion tactic here with their 16-bit 1 TFLOPS figure. Nobody in real-world gaming cares about that.
I wouldn't put it that way.
Meanwhile GCN 1.2’s other instruction set improvements are quite interesting. The description of 16-bit FP and Integer operations is actually very descriptive, and includes a very important keyword: low power. Briefly, PC GPUs have been centered around 32-bit mathematical operations for some number of years now since desktop technology and transistor density eliminated the need for 16-bit/24-bit partial precision operations. All things considered, 32-bit operations are preferred from a quality standpoint as they are accurate enough for many compute tasks and virtually all graphics tasks, which is why PC GPUs were limited to (or at least optimized for) partial precision operations for only a relatively short period of time.
However 16-bit operations are still alive and well on the SoC (mobile) side. SoC GPUs are in many ways a 5-10 year old echo of PC GPUs in features and performance, while in other ways they’re outright unique. In the case of SoC GPUs there are extreme sensitivities to power consumption in a way that PCs have never been so sensitive, so while SoC GPUs can use 32-bit operations, they will in some circumstances favor 16-bit operations for power efficiency purposes. Despite the accuracy limitations of a lower precision, if a developer knows they don’t need the greater accuracy then falling back to 16-bit means saving power and depending on the architecture also improving performance if multiple 16-bit operations can be scheduled alongside each other.
Source: http://www.anandtech.com/show/8460/amd-radeon-r9-285-review/2
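The "scheduled alongside each other" point maps directly onto packed-math hardware. Here is a minimal CUDA sketch of the idea (my own illustration, not from the article; the kernel and buffer names are made up): two FP16 values ride in one 32-bit register, and one packed instruction operates on both lanes at once.

#include <cuda_fp16.h>
#include <cstdio>

// Each __half2 packs two FP16 values into one 32-bit register.
// __hmul2 multiplies both lanes with a single instruction, so n
// multiplies process 2*n FP16 values (requires sm_53 or newer).
__global__ void scale_half2(const __half2 *in, __half2 *out, __half2 factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = __hmul2(in[i], factor);
}

int main()
{
    const int n = 1 << 20;                      // 1M half2 = 2M FP16 values
    __half2 *in, *out;
    cudaMallocManaged(&in,  n * sizeof(__half2));
    cudaMallocManaged(&out, n * sizeof(__half2));
    for (int i = 0; i < n; ++i)
        in[i] = __floats2half2_rn(1.5f, 2.5f);  // pack two floats into FP16x2

    scale_half2<<<(n + 255) / 256, 256>>>(in, out, __float2half2_rn(2.0f), n);
    cudaDeviceSynchronize();

    printf("out[0] = (%f, %f)\n",
           __half2float(__low2half(out[0])),
           __half2float(__high2half(out[0])));
    cudaFree(in); cudaFree(out);
    return 0;
}

On hardware with native packed FP16 (Tegra X1 being the case discussed here), this is exactly where the "1 TFLOPS FP16 vs. 500 GFLOPS FP32" doubling comes from.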
You have got it backwards. It is not a case of FP32 catching up to FP16, it is FP16 capabilities being enhanced. It is happening in PC graphics as well; witness AMD's Tonga and where Intel is going since Gen8.
The reasons have been given before. All else being equal, FP16 operations take less power, require less internal (and external) bandwidth, and the hardware takes much less die area for a given level of performance, which lowers cost and improves yield, which lowers cost again. Alternatively, for a given budget of die space and power draw, FP16 yields much better performance. Routinely using a compact numerical representation and only using larger formats when actually needed simply makes sense. Why waste limited resources?
I would contend, and recent developments in the PC graphics space agree, that rather than mobile graphics slavishly following in the footsteps of designs targeting high-end desktop PC/HPC, PC graphics will actually be more influenced by mobile solutions. Personal computing is moving to higher pixel densities (making small errors perceptually irrelevant) and laptops are moving towards lighter designs with longer battery lives, increasing demands on power efficiency. So rather than mobile losing its constraints and becoming more enthusiast-desktop-like (SLI! Crossfire! 1200W PSUs!), which is a ridiculous notion, what is actually happening is that the bulk of personal computing is moving towards mobile constraints.
(Indeed, many who aren't emotionally rooted in PC space would contend that mobile is where the bulk of personal computing takes place these days. Windows PCs have become a (large) computing niche.)
If we project forward, these trends don't seem likely to turn around. New silicon tech is unlikely to make compromises unnecessary, rather the lithographic challenges going forward are increasing. If you want development to move forward, regardless of whether you are a tech hungry consumer, or a manufacturer who needs new stuff to sell, being ever more intelligent about how you use available resources seems like a very good idea.
Source: https://forum.beyond3d.com/threads/native-fp16-support-in-gpu-architectures.56180/#post-1805823
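To put a number on the bandwidth half of that argument, a short CUDA sketch (my own hedged illustration, names invented): the same element count moves half the bytes in FP16, so a memory-bound kernel does half the DRAM traffic for the same result, which is where much of the power saving comes from.

#include <cuda_fp16.h>

// FP32 version: 4 bytes read + 4 bytes written per element.
__global__ void halve_fp32(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i] * 0.5f;
}

// FP16 version: identical arithmetic, but only 2 bytes in and 2 bytes
// out per element. A 1M-element buffer shrinks from 4 MB to 2 MB, so a
// bandwidth-bound kernel moves half the data (sm_53+ for __hmul).
__global__ void halve_fp16(const __half *in, __half *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = __hmul(in[i], __float2half(0.5f));
}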
Sometimes it requires more work to get lower precision calculations to work (with zero image quality degradation), but so far I haven't encountered big problems in fitting my pixel shader code to FP16 (including lighting code). Console developers have a lot of FP16 pixel shader experience because of PS3. Basically all PS3 pixel shader code was running on FP16.
It is still very important to pack the data in memory as tightly as possible, as there is never enough bandwidth to spare. For example, 16 bit (model space) vertex coordinates are still commonly used, the material textures are still DXT compressed (barely 8 bit quality), and the new HDR texture formats (BC6H) commonly used in cube maps have significantly less precision than a 16 bit float. All of these can be processed by 16 bit ALUs in the pixel shader with no major issues. The end result will still eventually be stored to an 8 bit per channel back buffer and displayed.
Could you give us some examples of operations done in pixel shaders that require higher than 16 bit float processing?
EDIT:
One example where 16 bit float processing is not enough: Exponential variance shadow mapping (EVSM) needs both 32 bit storage (32 bit float textures + 32 bit float filtering) and 32 bit float ALU processing.
However, EVSM is not universally possible on mobile platforms right now, as there is no standard support for 32 bit float filtering in mobile devices (OpenGL ES 3.0 just recently added support for 16 bit float filtering; 32 bit float filtering is not yet present). Obviously GPU manufacturers can offer OpenGL ES extensions to add FP32 filtering support if their GPU supports it (as most GPUs should, since this has been a required feature in DirectX since 10.0).
Source: https://forum.beyond3d.com/threads/...n-gpu-architectures.56180/page-2#post-1805847
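The EVSM example is easy to verify numerically. A small host-side CUDA sketch (my own check, assuming the common exponential warp exp(c*d) with c = 40; the constant is an assumption, not from the quoted post) shows the warped depth blowing past FP16's maximum finite value of 65504 over most of the depth range, while FP32 holds it comfortably:

#include <cuda_fp16.h>
#include <cmath>
#include <cstdio>

int main()
{
    const float c = 40.0f;                  // typical EVSM warp constant (assumption)
    for (float d = 0.0f; d <= 1.0f; d += 0.25f) {
        float warped = expf(c * d);         // FP32 keeps the full range
        __half h = __float2half(warped);    // round to FP16 storage
        // Already exp(40 * 0.5) ~ 4.9e8 exceeds FP16's max of 65504,
        // so the converted value overflows to infinity.
        printf("d = %.2f: fp32 = %g, fp16 = %g\n",
               d, warped, __half2float(h));
    }
    return 0;
}

Which is exactly why EVSM needs 32 bit storage, 32 bit filtering, and 32 bit ALU work end to end.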
Register pressure is another bottleneck of the GCN architecture. It's been discussed in many presentations since the current console generation launched. Fp16/int16 are great ways to reduce this bottleneck. GCN3 already introduced fp16/int16, but only for APUs. AMD marketing slides state that GCN4 adds fp16/int16 for discrete GPUs (http://images.anandtech.com/doci/104...542.1449038245). This means that fp16/int16 is now a main feature on all GCN products. Nvidia is only offering fp16 on mobile and professional products. Gaming cards (GTX 1070/1080) don't support it.
Source: https://forum.beyond3d.com/threads/amd-speculation-rumors-and-discussion.56719/page-182#post-1927339
Polaris supports FP16/Int16 natively
http://images.anandtech.com/doci/10446/P3.png?_ga=1.18704828.484432542.1449038245
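The register-pressure angle is also easy to picture in code. A hedged CUDA sketch (illustrative only; GCN's register file differs in detail from Nvidia's, but the 32-bit register granularity is the same idea): eight FP16 partial sums fit into four 32-bit registers, where the FP32 version would occupy eight, leaving more registers free and potentially raising occupancy.

#include <cuda_fp16.h>

__global__ void sum_packed(const __half2 *in, __half2 *out, int n)
{
    // Four __half2 accumulators = eight FP16 partial sums held in four
    // 32-bit registers; the FP32 equivalent would need eight registers.
    __half2 a0 = __float2half2_rn(0.f), a1 = a0, a2 = a0, a3 = a0;

    int tid    = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;
    for (int j = tid * 4; j + 3 < n; j += stride * 4) {
        a0 = __hadd2(a0, in[j + 0]);    // each __hadd2 adds two FP16
        a1 = __hadd2(a1, in[j + 1]);    // lanes in one instruction
        a2 = __hadd2(a2, in[j + 2]);
        a3 = __hadd2(a3, in[j + 3]);
    }
    if (tid < n)
        out[tid] = __hadd2(__hadd2(a0, a1), __hadd2(a2, a3));
}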