HetPas ver0.00

* update 160102: Please always check for the latest version in the download area
* update 130514: A new version is available for Cat 13.4
Successfully tested on HD7770 and on HD6970 with the following examples:


Download link -> in the header of this blog.

Software requirements: Windows XP, AMD Catalyst driver
Win7+ users (The app will need a classic XP-like win32 environment):
– Use “Run as Administrator”, because it will generate some temp/result files into C:\.
– Disable Data Execution Prevention, as it will use runtime generated machine code.

What is this?

HetPas is a small script compiler/executor and a small IDE. It supports 3 languages, with syntax highlighting and code insight to speed up development.

The supported languages are:

  • Pascal with some C-inspired additions; this is the main/host language.
  • AMD_IL. Mid-level asm-like language for HD4xxx..HD7xxx cards.
  • GCN ISA. Lowest-level asm language for HD77xx+ gfx cards.

What kernel files can it produce?

  • CAL .elf image with AMD_IL code inside (uses AMD’s internal compiler): works on all cards where AMD_IL works, except HD77xx with new drivers.
  • CAL .elf image with GCN ISA binary (generated with my own compiler): HD77xx+ only.
  • OpenCL .elf image loaded with GCN ISA binary: HD77xx+ only.

What about this release?

It’s the very first release, so it can contain tons of bugs. The GCN ISA compiler is also a reduced one: it lacks some instruction groups, for example the double precision encodings. Anything can change in the future, so don’t use it for serious projects. Just take it as a toy with which you can try out ideas on the GCN architecture.

Is there documentation?

Unfortunately not much: here’s a small reference of language elements -> HetPas Reference

Official documentation for AMD_IL and GCN_ISA -> amd-accelerated-parallel-processing-app-sdk/documentation
Check the documents “AMD Intermediate Language (IL) Specification (v2.0e)” and “AMD Southern Islands Instruction set Architecture”!

Indeed, that’s not much. How to start, then?

(First if you’re a win7 user, you should disable UAC on this program, because it will write many temporary files in the C:\ path. Use Run as Administrator or XP compatibility mode or something.)

Note that at the moment this project is in an early beta/preview stage, so use it at your own risk.
I suggest you first check out some hpas programs in the examples folder and learn from them!

  • HetPasDemo.hpas – Contains many language elements of the host language.
  • mandel.hpas – a small mandelbrot renderer

Then you can choose a gpu target:

a) HD4xxx..HD7xxx with CAL+AMD_IL.

  • AMDIL_CAL_HelloWorld.hpas

b) HD77xx+ with OpenCL+GCN_ISA (Use the latest drivers; I’ve tested with 12-10 on win7 64). Note: this is the most up-to-date target.

  • GCN_OpenCL_HelloWorld.hpas
  • GCN_OpenCL_mandel.hpas – Single Precision mandelbrot renderer
  • GCN_OpenCL_latency_test.hpas – You can measure how many cycles an instruction sequence takes.
  • GCN_OpenCL_Fibonacci_recursive.hpas – Some advanced GCN tricks, like indirect S register addressing and goto to a specific address. This example also demonstrates C style precompiler macros.

c) HD77xx+ with CAL+GCN_ISA (Use the cat11-12 driver on win7 64bit, or 12-2 on linux 32bit). This is a bit deprecated but works flawlessly with the right drivers; with the wrong drivers it simply crashes when you access a UAV.

  • GCN_CAL_mandel.hpas – similar to the OpenCL+GCN_ISA version.
  • GCN_CAL_latency_test.hpas – same as above.
  • GCN_OpenCL_Fibonacci_recursive.hpas – same as above.
  • GCN_CAL_FractalComputeUnit.hpas – This is a big one. I’m not sure if it still works (I don’t want to reinstall old drivers right now), but I included it because it contains serious macro examples: for example the __for__() macro, and array_aliases.

Why am I sharing this?

I really like to program efficient hardware in an efficient way. (I also have some experience using SSE.) And I’m kinda amazed by this fresh, well designed architecture called GCN. Unfortunately there’s no official assembler for it. So feel free to try my reduced assembler to get a sneak peek at GCN asm, but don’t expect too much 😀

Some cool things that you can reach when you’re close to the metal:

  • True x86 like program flow. You can do jumps/calls/rets to any location in gpu memory.
  • 32bit integer ADD with carryOUT and optional carryIN, 24bit integer MAD (good for high precision math)
  • You can use registers like an array (+1 cycle)
  • You can control register usage, so you can stay under 84 or 64 vregs for fast performance, or use all 256 vregs if you have to.
  • It has a QueryPerformanceCounter() equivalent. It’s very complicated to relate its readings to the final kernel duration because of latency hiding, but it can be a good tool for understanding how the chip works internally. (You can identify big stalls with it, and possibly reorder your code lines to perform better with fewer threads.)
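
To illustrate why an ADD with carry out and carry in matters for high precision math, here is a minimal sketch in Python (not HetPas or GCN syntax) that emulates chaining 32bit adds through a carry bit to add wide integers. The helper names (add32_carry, add_multiword, to_limbs, from_limbs) are made up for this illustration; on the GPU the same pattern would be a chain of add/add-with-carry instructions, one per 32bit limb.

```python
MASK32 = 0xFFFFFFFF  # keep results inside one 32bit register


def add32_carry(a, b, carry_in=0):
    """Emulate a 32bit add with carry in; returns (sum, carry_out)."""
    total = a + b + carry_in
    return total & MASK32, total >> 32


def add_multiword(x, y):
    """Add two equal-length lists of 32bit limbs (least significant first)."""
    if len(x) != len(y):
        raise ValueError("operands must have the same number of limbs")
    result = []
    carry = 0
    for a, b in zip(x, y):
        # The carry out of one limb becomes the carry in of the next one.
        limb, carry = add32_carry(a, b, carry)
        result.append(limb)
    return result, carry


def to_limbs(value, count):
    """Split a non-negative integer into `count` 32bit limbs."""
    return [(value >> (32 * i)) & MASK32 for i in range(count)]


def from_limbs(limbs):
    """Join 32bit limbs (least significant first) back into one integer."""
    return sum(limb << (32 * i) for i, limb in enumerate(limbs))


if __name__ == "__main__":
    a = to_limbs(2**64 - 1, 4)  # 128bit operand stored in four "registers"
    b = to_limbs(1, 4)
    limbs, carry = add_multiword(a, b)
    print(hex(from_limbs(limbs)), carry)
```

Without a hardware carry flag, each limb would need extra compare instructions to detect the overflow, which is why direct access to carryOUT/carryIN is handy for big-number arithmetic.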

12 Responses to HetPas ver0.00

  1. Michael Wilson says:

    This is really interesting, but sadly I can’t get it to work on my machine. Running the HelloWorld example, it gets as far as the run command to execute the kernel, but then crashes the display driver (which restarts, but leaves the IDE process hung). I am testing on a machine with two 6-core Xeons and a single 7970, Windows 7 64 bit and the latest Catalyst driver (13.1). Here is the dump from GCN_OpenCL_HelloWorld.hpas if the last section is commented out:

    CALDevice #0 DevHandle:$10000000
    target : CAL_TARGET_TAHITI
    maxResource1DWidth : 16384
    maxResource2DWidth : 16384
    maxResource3DHeight : 16384
    target : CAL_TARGET_TAHITI
    localRAM : 3072 MB
    uncachedRemoteRAM : 2047 MB
    cachedRemoteRAM : 2047 MB
    engineClock : 1000 MHz
    memoryClock : 1425 MHz
    wavefrontSize : 64
    numberOfSIMD : 32
    doublePrecision : true
    localDataShare : true
    globalDataShare : true
    globalGPR : true
    computeShader : true
    memExport : true
    pitch_alignment : 256 elements
    surface_alignment : 4096 bytes
    numberOfUAVs : 256
    bUAVMemExport : false
    b3dProgramGrid : true
    numberOfShaderEngines : 2
    targetRevision : 5
    availLocalRAM : 3032 MB
    availUncachedRemoteRAM : 2030 MB
    availCachedRemoteRAM : 2030 MB

    ShaderType = IL_SHADER_COMPUTE
    TargetChip = t
    ; ————- SC_SRCSHADER Dump ——————
    —-End of ELF dump—-

  2. Michael Wilson says:

    Sorry, to clarify: the AMD_IL version is working, the CAL version is not. So is it just that the Catalyst 13 release broke the assembler?

    • realhet says:

      Hi, and thanks for feedback!

      Do I understand correctly that the GCN_OpenCL_HelloWorld.hpas example worked for you? (If not, please tell me, and I’ll check it.)

      And about the *_CAL_*.hpas examples:
      The CAL API is broken in the recent Catalyst drivers. The *_CAL_*.hpas examples will work only when you have an older driver (older than 12.4, if I remember correctly).
      But CAL is officially deprecated anyway. I’ve made a CAL_deprecated folder in the examples, so from now on only 3 types of programs exist in the examples folder:
      OpenCL_OpenCL_* : OpenCL language (HD4xxx and up)
      AMDIL_OpenCL_* : AMD_IL language (HD4xxx and up)
      GCN_OpenCL_* : GCN ISA language (HD77xx and up)
      All 3 (hopefully) run under the officially supported OpenCL API.

      “so is it just that the Catalyst 13 release broke the assembler?”
      Only the CAL interface is broken.

  3. huanhuan says:

    Thank you for providing this useful tool! I am the one who asked about ds_swizzle and was answered by you.

    • realhet says:

      Hi, you’re welcome!

      I should write more up-to-date examples soon. There are new NAMS style macro and register allocation features, and the current examples don’t show any of that.
      Hopefully I’ll get a GCN card next week, and that mandelbrot example would be much simpler and nicer with macros.

  4. ukasz says:

    Could you tell me what the license on HetPas is? I can’t find that information in the sources. Also, is there a more private way to contact you, like an email?

    • realhet says:

      It’s a free hobby project. Feel free to make and use whatever kernel file you want with it, but I can’t guarantee that every instruction will work as it should. Note: There is a DD (define dword) for those. Also I haven’t had the chance to check it with all the cards, and with every new Catalyst there is a chance that AMD will change something that I have to react to.

      * As I just said, the cal.elf format has been changed in Catalyst 13.4. I’m exploring what they changed… Seems like they just removed the amd_il section from it.

  5. Ryan White says:

    Thank you for sharing everything that you have learned about low-level AMD GPUs. Your shared knowledge is greatly appreciated! The HetPas program has been a great tool in my education of AMD GCN, as have all your posts on AMD’s forums.

    • realhet says:

      Hi, and you’re welcome too!
      I’m planning to do a tiny recursive raytracer showcasing GCN features and my macro system. Stay tuned, I want to publish it in a month. (Now that I’ve said it, I gotta do it :D)

      • Michael Wilson says:

        I’m interested to know how this is going as well. Would it be helpful to have more hardware? I have some spare 7970 cards.

      • realhet says:

        Hi, thanks for the support, but I have the necessary hw to develop now. What I don’t have is free time. I’m in a long project now which unfortunately doesn’t involve GCN at all. But it needed some Intel SSE, so at least I can still do low level things. Also I got some inspiration from using the MASM (JWAsm) compiler recently, so in the future I want to implement goodies such as .if, .while, and maybe masm-like structs.
        In the summer I did a hobby project in GCN asm, so the compiler improved; I was just too lazy to document it, make examples and upload it here. :S This hobby project is a realtime piano string physics simulator (Finite-Difference Time-Domain method), and it needs a lot of work too. (Right now I’m stuck because I gotta learn some math to understand the wave equation and implement longitudinal string vibrations.) Here’s a sample of what sounds it produces now:
        So the assembler is slowly improving and you can do quite complicated projects in it, but its documentation is just as rough as GCN’s was a year ago.
