Happy new year!
Most importantly, here’s the document: GCN Reference Card
(Use it at your own risk, as it may contain errors. The red color marks the glorious GCN3.)
Last autumn I had the opportunity to program a GCN3 Fury card. It is a real 8 TFLOPS beast; even games run incredibly smoothly on it, while I still have a potato of a CPU 😀
So I had a job to do on it, and to do it I first had to upgrade my assembler to support the changes that GCN3 introduced. I spent the first days comparing the old (Southern/Sea Islands) manual against the new one (GCN3 ISA Manual), spotting the differences and making notes on every important change. The new instruction set is totally incompatible with the old one. At first I was a bit angry about that, but later I really came to like the changes. Just 2 things out of the many: Sub DWord Addressing (SDWA) and Data Parallel Processing (DPP). These two (mostly the latter) are kind of redefining how I think about parallel programming from now on. Basically there’s no need to think only in massively parallel terms: we can interweave adjacent threads without any wasted cycles (* there are penalties, though) so that they collaborate on the same job, while using the same memory resources as a single individual thread would.
Long ago I thought about optimizing Scrypt across 4 adjacent lanes. My guess was that it would be useful, since it cuts the memory cost to 1/4. On GCN1 it had to be implemented with the ds_swizzle instruction, but on GCN3 connecting 4 lanes this way is free. Too bad I don’t have time to play with this…
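To show what "connecting 4 lanes" means, here is a minimal sketch in plain Python. It is only a simulation, not GCN assembly and not my Scrypt code: the helpers `quad_perm` and `quad_sum` are hypothetical names (the first is borrowed from the DPP quad permute control), and a Python list stands in for the lanes of a wavefront. Each lane reads a value from another lane of its own group of 4, and after two such exchanges every lane holds the sum of its quad without touching memory.

```python
def quad_perm(values, pattern):
    """Simulate a quad permute: inside every group of 4 adjacent lanes,
    lane i reads the value held by lane pattern[i] of the same group."""
    assert len(values) % 4 == 0, "lane count must be a multiple of 4"
    assert len(pattern) == 4 and all(0 <= p < 4 for p in pattern)
    out = []
    for base in range(0, len(values), 4):
        quad = values[base:base + 4]
        out.extend(quad[p] for p in pattern)
    return out


def quad_sum(values):
    """Butterfly reduction: after two exchanges every lane holds the
    sum of the 4 values in its quad."""
    # Step 1: exchange with the neighbouring lane (lane index XOR 1).
    swapped = quad_perm(values, [1, 0, 3, 2])
    partial = [a + b for a, b in zip(values, swapped)]
    # Step 2: exchange with the lane two positions away (lane index XOR 2).
    swapped = quad_perm(partial, [2, 3, 0, 1])
    return [a + b for a, b in zip(partial, swapped)]


lanes = list(range(8))      # 8 simulated lanes = 2 quads
result = quad_sum(lanes)
print(result)               # [6, 6, 6, 6, 22, 22, 22, 22]
```

On real hardware each exchange would be a cross-lane operand selection instead of a list copy, which is why sharing one job across a quad can be so cheap.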
5 years ago there was a massively parallel revolution (I’d call it hype), when sequential programmers were optimistic enough to wait for new solutions that would make their sequential programs run on massive numbers of threads automatically. Now there is this inter-thread data sharing thing. If memory is the bottleneck, we can think about connecting 2, 4, 16 or even 32 threads to work together on the same job. With this, it became even harder to automatically optimize classic sequential code for new hardware. But those guys are still optimistic as hell. I heard about someone who is willing to wait for these parallel things to be implemented in Java. Geeeeez 😀 Why not start with OCL today?!