New example code: TEA encryption with CUDA

I’ve written some more CUDA demonstration code: the Tiny Encryption Algorithm (TEA) implemented in CUDA.
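For readers who haven't met TEA, here is a minimal C sketch of the standard cipher as published by Wheeler and Needham: a 64-bit block, a 128-bit key, and 32 cycles of a simple Feistel round. The function names (tea_encrypt, tea_decrypt, tea_encrypt_ecb) and the TEA_HD macro are hypothetical illustrations, not taken from tea_cuda.cu. The macro lets the same functions compile as plain C on the host or as __host__ __device__ functions under nvcc, which is the kind of per-block routine a CUDA thread would run.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portability macro: empty in plain C, host+device under nvcc. */
#ifdef __CUDACC__
#define TEA_HD __host__ __device__
#else
#define TEA_HD
#endif

#define TEA_DELTA  0x9E3779B9u /* key schedule constant */
#define TEA_ROUNDS 32u         /* 32 cycles = 64 Feistel rounds */

/* Encrypts one 64-bit block v[0..1] in place with the 128-bit key k[0..3]. */
static inline TEA_HD void tea_encrypt(uint32_t v[2], const uint32_t k[4])
{
    uint32_t v0 = v[0], v1 = v[1], sum = 0;
    for (uint32_t i = 0; i < TEA_ROUNDS; ++i) {
        sum += TEA_DELTA;
        v0 += ((v1 << 4) + k[0]) ^ (v1 + sum) ^ ((v1 >> 5) + k[1]);
        v1 += ((v0 << 4) + k[2]) ^ (v0 + sum) ^ ((v0 >> 5) + k[3]);
    }
    v[0] = v0;
    v[1] = v1;
}

/* Inverts tea_encrypt: runs the rounds backwards starting from
   sum = 32 * delta mod 2^32 = 0xC6EF3720. */
static inline TEA_HD void tea_decrypt(uint32_t v[2], const uint32_t k[4])
{
    uint32_t v0 = v[0], v1 = v[1], sum = TEA_DELTA * TEA_ROUNDS;
    for (uint32_t i = 0; i < TEA_ROUNDS; ++i) {
        v1 -= ((v0 << 4) + k[2]) ^ (v0 + sum) ^ ((v0 >> 5) + k[3]);
        v0 -= ((v1 << 4) + k[0]) ^ (v1 + sum) ^ ((v1 >> 5) + k[1]);
        sum -= TEA_DELTA;
    }
    v[0] = v0;
    v[1] = v1;
}

/* Sequential reference loop over nblocks independent 64-bit blocks stored
   as 2 * nblocks words (ECB-style). Because no iteration depends on another,
   the loop body can be handed to one CUDA thread per block (or per pair). */
static void tea_encrypt_ecb(uint32_t *words, size_t nblocks, const uint32_t k[4])
{
    for (size_t i = 0; i < nblocks; ++i)
        tea_encrypt(&words[2 * i], k);
}
```

The ECB-style helper only illustrates why TEA parallelises so easily; whether the original code uses ECB or another mode is not stated here.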

The code demonstrates 100% occupancy, fully coalesced 128-bit memory transactions and the use of page-locked (pinned) host memory. It ran at around 380 MB/s on a GTX 260, compared to about 40 MB/s on a 2 × 2.5 GHz Core 2 Duo (without using SSE), a speedup of roughly 9.5×.

Beware of a few pitfalls when playing with the execution parameters, especially those implicit memory and thread-block alignment requirements from hell!
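To make the alignment pitfall concrete, here is a hedged C sketch under explicit assumptions that may not match tea_cuda.cu: each thread loads one 16-byte (128-bit, uint4) chunk, i.e. two 64-bit TEA blocks, and the kernel has no tail bounds check, so the buffer must be padded to a whole number of thread blocks. The names tea_plan_launch, tea_is_chunk_aligned and tea_launch_geometry are hypothetical. CUDA requires 16-byte alignment for 128-bit accesses; cudaMalloc itself returns memory aligned to at least 256 bytes, so the usual trap is an offset pointer into such a buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: one 128-bit (16-byte) chunk per thread = two TEA blocks. */
#define TEA_CHUNK_BYTES 16u

/* Hypothetical helper type: padded buffer size plus grid size. */
typedef struct {
    size_t padded_bytes; /* bytes to allocate and process (>= nbytes) */
    size_t num_blocks;   /* thread blocks to launch; 0 only if nbytes == 0 */
} tea_launch_geometry;

/* Rounds nbytes up to a whole number of thread blocks' worth of chunks.
   threads_per_block must be > 0; callers must skip the launch when
   num_blocks is 0, since an empty grid is not a valid launch. */
static tea_launch_geometry tea_plan_launch(size_t nbytes, unsigned threads_per_block)
{
    size_t bytes_per_block = (size_t)threads_per_block * TEA_CHUNK_BYTES;
    size_t blocks = (nbytes + bytes_per_block - 1) / bytes_per_block;
    tea_launch_geometry g = { blocks * bytes_per_block, blocks };
    return g;
}

/* 128-bit loads must be 16-byte aligned, so check any pointer offset
   into a device buffer before handing it to the kernel. */
static int tea_is_chunk_aligned(const void *p)
{
    return ((uintptr_t)p % TEA_CHUNK_BYTES) == 0;
}
```

For example, a 1000-byte message with 256 threads per block needs one thread block covering 4096 bytes, so the remaining 3096 bytes have to be allocated and padded rather than left out of bounds.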

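Get it here and compile with 'nvcc -Xptxas -v -maxrregcount=10 tea_cuda.cu'. The -Xptxas -v option makes ptxas report each kernel's register and shared-memory usage, and -maxrregcount=10 caps register use at 10 per thread.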
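As a back-of-the-envelope check of how the -maxrregcount=10 flag in the compile command above relates to the 100% occupancy claim, here is the register budget for the GTX 260 (GT200, compute capability 1.3: 16384 32-bit registers and at most 1024 resident threads, i.e. 32 warps, per multiprocessor, with at most 8 resident blocks). This sketch ignores register allocation granularity and shared-memory use, so treat it as an approximation rather than the full occupancy calculation.

```latex
\begin{aligned}
\text{occupancy} &= \frac{\text{resident warps per multiprocessor}}{32} \\
R_{\max} &= \frac{16384\ \text{registers}}{1024\ \text{threads}} = 16\ \text{registers per thread} \\
10 \times 1024 &= 10240 \le 16384 \\
\text{threads per block} &\ge \frac{1024}{8} = 128
\end{aligned}
```

Any register cap at or below 16 leaves room for a full 1024 resident threads, provided each block has at least 128 threads so that the 8-block limit is not what caps residency.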
