GCN specific macros

(The [Download Link] section has been updated. The produced ELF is NOT compatible with cat14.6b. It works well with cat13.4, and if you want a working disassembler, you should use cat12.10.)

With the #include directive it is possible to inline headers, so I made stdgcn.inc to help with ‘everyday’ programming tasks.

A kernel code may start like this:

var code:=asm_isa(
  #include stdgcn.inc
  KernelInitUC(64,64,8192)
  ...

KernelInitUC(WorkGroupSize, VRegCount, LDSBytes) is defined in stdgcn.inc and does the following:

  • sets important kernel parameters: WorkGroupSize, VRegCount and LDSBytes
  • specifies buffers and reads pointers to them. In this case UC means 1 UAV and 1 constant buffer.
  • prepares kernel indexes, and stores them in grpId=s0, lid=v0, gid=v1 aliases respectively.
  • allocates vector and scalar registers for temp variables (more info later), making sure that the pool does not include resource constants and other important registers.
  • measures the start time of the kernel. (well, maybe this should be optional)

 Temp registers

This is a new feature which helps you use variables in a structured form.
Before using it, a register pool must be allocated with the s_temp_range and v_temp_range instructions. For example:

s_temp_range 1..7, 27..103
v_temp_range 2..64

From now on there will be a scope for variables allocated with the v_temp and s_temp instructions:

v_temp X, Y, Z  //note: the data type is always 32bit, for 64bit types you can use arrays
s_temp i,j,k
s_temp data[16] align:16  //allocates a 16 dword array of sregs aligned to 16 dword boundary

 Managing temp register scope

There are two special instructions for this: enter and leave. A block between enter and leave creates a new scope. You can allocate registers with s_temp and v_temp inside the block, and the leave instruction will release all the variables that were allocated inside it. This is very useful inside macros.
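To make the scope rules easier to picture, here is a small Python model of a temp register pool. It is only a sketch: the class name `TempPool`, its methods, and the first-fit allocation strategy are my assumptions for illustration, not the way the assembler actually picks registers.

```python
class TempPool:
    """Toy model of a temp-register pool with enter/leave scopes."""

    def __init__(self, *ranges):
        # Each (lo, hi) pair is inclusive, e.g. (1, 7) models "1..7".
        self.free = sorted(r for lo, hi in ranges for r in range(lo, hi + 1))
        self.names = {}      # variable name -> list of register indexes
        self.scopes = [[]]   # stack of name lists, one list per open scope

    def temp(self, name, count=1, align=1):
        """Allocate `count` consecutive registers; the first is a multiple of `align`."""
        free = set(self.free)
        for start in self.free:
            block = range(start, start + count)
            if start % align == 0 and all(r in free for r in block):
                self.free = [r for r in self.free if r not in block]
                self.names[name] = list(block)
                self.scopes[-1].append(name)
                return self.names[name]
        raise RuntimeError(f"out of temp registers for {name}")

    def enter(self):
        self.scopes.append([])

    def leave(self):
        if len(self.scopes) == 1:
            raise RuntimeError("leave without a matching enter")
        # Release everything allocated since the matching enter().
        for name in self.scopes.pop():
            self.free.extend(self.names.pop(name))
        self.free.sort()


pool = TempPool((1, 7), (27, 103))             # like: s_temp_range 1..7, 27..103
i = pool.temp("i")                             # outer scope variable
pool.enter()
data = pool.temp("data", count=16, align=16)   # like: s_temp data[16] align:16
j = pool.temp("j")
pool.leave()                                   # data and j are released here
k = pool.temp("k")                             # reuses a register freed by leave
```

In this model `i` survives the inner block, while `k` ends up in the same register that `j` occupied, which is the whole point of wrapping macro bodies in enter/leave.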

Program structure macros

_if(), _else, _end: Let you create if/else statements without using jumps and labels. The _if statement has to know what kind of registers you are going to check with it, so the proper form of the _if instruction is one of these:

  • s_if(vccz) //scalar IF checking a scalar flag.
  • s_if_i32(s6>-32769) //scalar if checking 32bit signed integer relation
  • v_if_f64(v10<>s20) //vector if with 64bit float operand (and a 64bit float scalar)

Possible types for s_if are: i32, u32. And for v_if: i32, u32, i64, u64, f32, f64.

_while(), _endw: Makes a while block. You must use the same prefixes and suffixes for _while macro as you would use for the _if macro.

_repeat, _until(): Makes a repeat-until block. Prefix and suffix must be specified for _until().

_break, _continue: Can be used inside a _while-_endw or a _repeat-_until block.
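For readers new to GCN, the following Python sketch models how a vector if/else can run without jumps by masking lanes with the exec register. This is the general GCN technique as I understand it, not necessarily what the v_if macros expand to; `run_if_else` and its parameters are hypothetical names used only for this illustration.

```python
LANES = 64
FULL = (1 << LANES) - 1


def run_if_else(exec_mask, cond_mask, values, then_fn, else_fn):
    """Apply then_fn/else_fn per lane, controlled only by bit masks."""
    saved = exec_mask                       # lanes that were active on entry
    exec_mask = saved & cond_mask           # "if": active lanes where the condition holds
    values = [then_fn(v) if (exec_mask >> lane) & 1 else v
              for lane, v in enumerate(values)]
    exec_mask = saved & ~exec_mask & FULL   # "else": the remaining active lanes
    values = [else_fn(v) if (exec_mask >> lane) & 1 else v
              for lane, v in enumerate(values)]
    exec_mask = saved                       # "end": restore the original mask
    return values, exec_mask


values = list(range(LANES))
# One condition bit per lane, set where the lane's value is less than 4.
cond = sum(1 << lane for lane, v in enumerate(values) if v < 4)
active = FULL & ~(1 << 63)                  # pretend lane 63 was already inactive
result, exec_after = run_if_else(active, cond, values,
                                 lambda v: v * 10, lambda v: -v)
```

Lanes 0 to 3 take the "then" path, the other active lanes take the "else" path, and lane 63 is left untouched because it was inactive before the block started.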

Memory IO macros

dwAddr is a dword index. uavId is 0-based. AOption can be one or more options of the tbuffer_ instructions, for example: glc.

uavWrite(uavId,dwaddr,value)
uavWrite(uavId,dwaddr,value,AOption)
uavRead(uavid, dwaddr,value)
uavRead(uavid, dwaddr,value,AOption)
cbRead(dwaddr,value)

note: They are so slow that they should not be used in an inner loop. But they provide easy access to memory.

 Accessing HW_INFO

These are easy-access macros for the bitfields of the HW_INFO value. The result is placed in the provided scalar reg.

getWaveId(ghwRes) 
getSIMDId(ghwRes) 
getCUId(ghwRes) 
getSHId(ghwRes) 
getSEId(ghwRes) 
getThreadGroupId(ghwRes) 
getVirtualMemoryId(ghwRes)
getRingId(ghwRes) 
getStateId(ghwRes)

And a complicated one that calculates the Global SIMD Id. You can identify the SIMD on which your program is running.

getGlobalSIMDId(ggsRes)
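What the simple bitfield macros above do can be sketched in Python as a shift and a mask. I am assuming here that HW_INFO is the GCN HW_ID hardware register, and the bit positions in the table are how I remember them from AMD's Southern Islands ISA manual, so please check them against the manual before relying on them. The names `HW_ID_FIELDS` and `get_field` are made up for this example.

```python
# (bit offset, bit width) for each field.
# ASSUMPTION: layout recalled from the Southern Islands ISA manual (HW_ID).
HW_ID_FIELDS = {
    "WaveId": (0, 4),
    "SIMDId": (4, 2),
    "CUId": (8, 4),
    "SHId": (12, 1),
    "SEId": (13, 2),
    "ThreadGroupId": (16, 4),
    "VirtualMemoryId": (20, 4),
    "RingId": (24, 3),
    "StateId": (27, 3),
}


def get_field(hw_id, name):
    """Extract one bitfield from a 32-bit HW_ID value."""
    offset, width = HW_ID_FIELDS[name]
    return (hw_id >> offset) & ((1 << width) - 1)


# Build a made-up HW_ID value and read the fields back.
hw_id = 5 | (2 << 4) | (9 << 8) | (1 << 12) | (3 << 13) | (7 << 16)
wave = get_field(hw_id, "WaveId")
simd = get_field(hw_id, "SIMDId")
cu = get_field(hw_id, "CUId")
```

The global SIMD id calculation is not modeled here, because the post does not say how the individual ids are combined.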

 GDS macros

gwAddr: dword index in GDS memory

gdsWrite(gwAddr,gwData)
gdsRead(gwAddr,gwData)
gdsAdd(gwAddr,gwData)

 Global Wave Synch

gwsId: a unique id chosen by you. gwsThreads: the total number of workgroups (or wavefronts, I’m not sure… The wrong one will crash :D)

gwsInit(gwsId,gwsThreads)
gwsBarrier(gwsId)

 Measuring execution time

_getTickInit     //initializes T0 time. All other timing macros will work relative to this.
getTick(gtRes)   //returns current time elapsed from T0 //with lame 32bit calculations
breakOnTimeOut(botTimeoutMS)  //ensures that a loop cannot be infinite. Calls s_endpgm if botTimeoutMS is reached.
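The "32bit calculations" remark can be illustrated with a short Python sketch of wraparound-safe elapsed time. This is only my guess at the kind of arithmetic involved, not the macro's actual code; `elapsed_ticks`, `timed_out`, and especially the `ticks_per_ms` parameter are hypothetical, since the post does not state the counter frequency.

```python
MASK32 = 0xFFFFFFFF


def elapsed_ticks(t0, now):
    """Ticks elapsed since t0, correct across one 32-bit counter wraparound."""
    return (now - t0) & MASK32


def timed_out(t0, now, timeout_ms, ticks_per_ms):
    """True once the elapsed time reaches the timeout (ticks_per_ms is assumed)."""
    return elapsed_ticks(t0, now) >= timeout_ms * ticks_per_ms


# The counter wrapped between the two samples, yet the difference is still right.
delta = elapsed_ticks(0xFFFFFFF0, 0x00000010)
```

The limitation of 32-bit math is that an elapsed time longer than one full counter period cannot be told apart from a short one, so the timeout has to fit well inside that period.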

 Kernel initialization

Must be called right after including stdgcn.inc.

AGrpSize: number of workItems in a workGroup. ANumVGPRS: number of vector regs to allocate. ALdsSizeBytes: LDS size in bytes.

KernelInitUUUC(AGrpSize,ANumVGPRS,ALdsSizeBytes)  //3 UAVs and 1 ConstBuffer

Other buffer variants implemented: UU, UC, U

 
