In addition to its other capabilities Xenos has a special instruction which is presently unique to this graphics processor and may not necessarily even be available in WGF2.0 and this is the MEMEXPORT function. In simple terms the MEMEXPORT function is a method by which Xenos can push and pull vectorised data directly to and from system RAM. This becomes very useful with vertex shader programs as with the capabilities to scatter and gather to and from system RAM the graphics processor suddenly becomes a very wide processor for general purpose floating point operations. For instance, if a shader operation could be run with the results passed out to memory and then another shader can be performed on the output of the first shader with the first shader's results becoming the input to the subsequent shader.
MEMEXPORT expands the graphics pipeline further forward and in a general purpose and programmable way. For instance, one example of its operation could be to tessellate an object as well as to skin it by applying a shader to a vertex buffer, writing the results to memory as another vertex buffer, then using that buffer run a tessellation render, then run another vertex shader on that for skinning. MEMEXPORT could potentially be used to provide input to the tessellation unit itself by running a shader that calculates the tessellation factor by transforming the edges to screen space and then calculates the tessellation factor on each of the edges dependant on its screen space and feeds those results into the tessellation unit, resulting in a dynamic, screen space based tessellation routine. Other examples for its use could be to provide image based operations such as compositing, animating particles, or even operations that can alternate between the CPU and graphics processor.
With the capability to fetch from anywhere in memory, perform arbitrary ALU operations and write the results back to memory, in conjunction with the raw floating point performance of the large shader ALU array, the MEMEXPORT facility does have the capability to achieve a wide range of fairly complex and general purpose operations; basically any operation that can be mapped to a wide SIMD array can be fairly efficiently achieved and in comparison to previous graphics pipelines it is achieved in fewer cycles and with lower latencies. For instance, this is probably the first time that general purpose physics calculation would be achievable, with a reasonable degree of success, on a graphics processor and is a big step towards the graphics processor becoming much more like a vector co-processor to the CPU.
Seeing as MEMEXPORT operates over the unified shader array the capability is also available to pixel shader programs, however the data would be represented without colour or Z information which is likely to limit its usefulness.
ATI indicate that MEMEXPORT functions can still operate in parallel with both vertex fetch and filtered texture operations.