Rev69 committed (Updated)

The update to revision 69 that was committed a few minutes ago is by far the largest update to Pyrit’s codebase ever. It brings a general cleanup of the codebase and a lot of improvements behind the scenes. I’ll update the documentation (what documentation?) in the next few days.

Some of the most important changes:

*  Multi-GPU support should now work on NVidia-CUDA devices.
*  For the moment you can no longer configure the core-layout yourself. The rule of thumb is that Pyrit auto-configures all available GPU-cores and (NumberOfCPUs – NumberOfGPUs) CPU-cores. In general that means a slight performance increase for users of quad-core CPUs.
*  Pyrit no longer filters/expands passwords on its own. That means Pyrit is less picky about what it imports into the local blobspace, so beware the cruft!
*  The whole compile/install process has had a major overhaul and almost looks sane now. The commandline client is now called ‘pyrit’ and installed to /usr/bin/. This overhaul also brings support for creating binary RPMs with distutils’ bdist_rpm.
*  Pyrit now comes in three discrete packages: The core-package which includes the commandline client and the CPU-driven core (VIA-Padlock support is now auto-detected and enabled if available). The two other packages add support for NVidia-CUDA / AMD-Stream, can be compiled/installed optionally and are automatically used if present and supported by hardware. This makes third-party-distribution in binary form (Live-CDs) much easier.
*  The spell ‘Arcane Missiles’ is now working as intended.
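The core-layout rule of thumb above can be sketched in a few lines of Python. This is a hypothetical illustration of the rule as described, not Pyrit's actual code; the function name is made up:

```python
def core_layout(num_cpus, num_gpus):
    """Hypothetical sketch of rev69's auto-configuration rule:
    use every available GPU-core, and leave one CPU core free
    per GPU to keep it fed with work."""
    gpu_cores = num_gpus
    cpu_cores = max(num_cpus - num_gpus, 0)  # never go negative
    return gpu_cores, cpu_cores

# A quad-core CPU with one CUDA GPU: 1 GPU-core + 3 CPU-cores.
print(core_layout(4, 1))  # -> (1, 3)
```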

People who know how to use svn should always use the code directly from trunk.  You can also download rev69 (current version 0.2) as packages:
Core (required)
CUDA (optional)
Stream (optional)

Multi-GPU for ATI is still in the pipeline; there are some major problems with AMD’s SDK at this point, and AMD is showing a general lack of interest in the whole community. Due to my ridiculous lack of hardware, the Multi-GPU capabilities for NVidia are completely untested at this point. Please comment about any problems below.

Thanks to the team at Pentoo for support.

Update:

*  Rev72 fixes compilation on x86_64 and Darwin (Mac OS). Pyrit now also uses ~/.pyrit/ instead of ./ as a basis for the blobspace directories. This fixes a major annoyance introduced with pyrit being installed to /usr/bin.

Major update ahead

I hope to finish a major update to Pyrit within the next few days. It will finally bring support for using more than one GPU which includes using the full potential of the GTX 295. Stay tuned.

Compiling AMD-Stream core with G++ 4.3

G++ 4.3 will throw an error message like the following when compiling the AMD-Stream core.

/usr/local/atibrook/sdk/include/brook/CPU/brtvector.hpp:190: explicit template specialization cannot have a storage class

To fix this:
*  open /usr/local/atibrook/sdk/include/brook/CPU/brtvector.hpp
*  search for the line
#define SPECIALGETAT(TYP) template <> static TYP GetAt (const TYP& in,int i) {return in;}
*  remove the word ‘static‘ from that line.
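If you prefer to apply the edit programmatically, the whole fix boils down to dropping the storage class from that one macro line. A minimal Python sketch (the helper name is made up, and you would run it over the header file yourself):

```python
def drop_static(line):
    """Remove the 'static' storage class from the SPECIALGETAT
    macro line that G++ 4.3 rejects; leave all other lines alone."""
    if "SPECIALGETAT" in line:
        return line.replace(" static ", " ", 1)
    return line

macro = ('#define SPECIALGETAT(TYP) template <> static TYP '
         'GetAt (const TYP& in,int i) {return in;}')
print(drop_static(macro))
```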

Pyrit -> coWPAtty passthrough

Rev67 just got committed with some (minor) changes:

* A new command ‘passthrough‘ has been added to Pyrit’s CLI. It allows piping passwords through Pyrit’s fast CUDA/AMD/Padlock cores and feeding the results directly to coWPAtty. This skips storing passwords/PMKs on disk and can help when using Pyrit on LiveCDs. That way, however, you will not benefit from the huge performance gain that re-using precomputed PMKs can provide later on. If you have the disk space available, you should use the "-f -" option to pass results to stdout while also writing them to the local blobspace. There is a bug in vanilla coWPAtty, so you must apply a patch for this command to work.

* A quite serious bug was fixed that may have corrupted exports to on-disk coWPAtty files.

* Some minor cleanups.

* Fixed a bug with the casting of Arcane Missiles.

The following example will read passwords from ~/cowpatty/dict, pipe them through the CUDA-kernel and feed them into coWPAtty:

./pyrit.py -c 'Nvidia CUDA' -e linksys -f ~/cowpatty/dict passthrough | ./cowpatty -d - -r ~/cowpatty/wpa2psk-linksys.dump -s linksys

New example code: TEA encryption with CUDA

I’ve written some more CUDA demonstration-code: The Tiny Encryption Algorithm implemented in CUDA.

The code demonstrates 100% occupancy, 100% coalesced 128-bit memory transactions and use of page-locked memory. It performed at around 380 MB/s on a GTX 260. Compare that to 40 MB/s on a 2×2.5 GHz Core2Duo (without using SSE).
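For reference, the algorithm the kernel implements is the standard Tiny Encryption Algorithm by Wheeler and Needham. Here is a plain-Python CPU sketch of the usual 32-round encrypt/decrypt, for readers who want to follow along; this is the textbook cipher, not the CUDA code itself:

```python
MASK = 0xFFFFFFFF          # emulate 32-bit unsigned arithmetic
DELTA = 0x9E3779B9         # TEA's key-schedule constant

def tea_encrypt(v0, v1, key):
    """Encrypt one 64-bit block (two 32-bit words) with a 128-bit key."""
    k0, k1, k2, k3 = key
    s = 0
    for _ in range(32):
        s = (s + DELTA) & MASK
        v0 = (v0 + (((v1 << 4) + k0) ^ (v1 + s) ^ ((v1 >> 5) + k1))) & MASK
        v1 = (v1 + (((v0 << 4) + k2) ^ (v0 + s) ^ ((v0 >> 5) + k3))) & MASK
    return v0, v1

def tea_decrypt(v0, v1, key):
    """Invert tea_encrypt by running the rounds backwards."""
    k0, k1, k2, k3 = key
    s = (DELTA * 32) & MASK
    for _ in range(32):
        v1 = (v1 - (((v0 << 4) + k2) ^ (v0 + s) ^ ((v0 >> 5) + k3))) & MASK
        v0 = (v0 - (((v1 << 4) + k0) ^ (v1 + s) ^ ((v1 >> 5) + k1))) & MASK
        s = (s - DELTA) & MASK
    return v0, v1
```

Each block is encrypted independently, so on the GPU every thread can simply run this round loop on its own blocks, which is why TEA maps so well to CUDA.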

Beware some pitfalls when playing with the execution parameters. Especially beware those implicit memory/threadblock alignment requirements from hell!

Get it here and compile with ‘nvcc -Xptxas "-v" -maxrregcount=10 tea_cuda.cu‘.
