© 2008 W.Ehrhardt Last update Sep. 07, 2008
Home CRC/Hash Crypto Misc. Info Links Deutsch

Cryptographic algorithms

On this page there are several cipher algorithms: the 128-bit block ciphers AES and Twofish, the 64-bit block cipher Blowfish, and the stream cipher Salsa20. Please note: In the section other block ciphers there are some more block ciphers with short descriptions and source code archives: AnubisCamelliaSEEDSerpentSHACAL-2XTEA. The streams ciphers ABC and Phelix are discontinued.

The basic/test/demo source codes can be compiled with most current Pascal (TP 5/5.5/6, BP 7, VP 2.1, FPC 1.0/2.0/2.2) and Delphi versions (tested with V1 up to V7/9/10).

Free Pascal users: Please read the note related to range and overflow checking in Debug mode!

Before downloading any software from this site please read this legal notice.


A E S

Here are the AES (Advanced Encryption Standard) related Pascal / Delphi sources: basic AES routines and recommended block cipher modes of operation (with test programs that verify compilation and results), demo programs for file encryption / authentication and decryption / verification, and a look at other implementations.


Last changes: 

The block level routines in aes_2008-08-04.zip include separate units for encryption and decryption, the source code for basic encryption/decryption is split into several include files. At the lowest level there are type definitions and common routines. Key sizes of 128, 192, and 256 bits are supported.

Building on these routines, the following block cipher modes of operation are implemented: CBC, CFB128, CFB8, CTR, ECB, OFB as well as OMAC, CMAC, EAX, XTS. All modes allow plain and cipher text lengths that need not be multiples of the block length (for ECB, CBC, and XTS cipher text stealing is used for the short block; only one short block is allowed and there must be at least one full block). CTR mode can use 4 built-in incrementing functions or a user supplied one.

All AES routines are included in the AES_DLL.DLL, there are two interface units for this DLL (one for Virtual Pascal, the second for the other Win32 compilers). The source code archive contains many test programs which are used to check compilation processes, the implementation against known test values, etc.

With the t_gsp128 test program the CPU cycles per block and the (theoretical) encryption speed in MB/s were measured for the different compilers (this program needs the high resolution timer routines from hrtimer). Here are the results for a Pentium 4, 1.8 GHz, Win98, 128 bit key size; the first data pairs are for full tables, the second for compressed tables (8 byte aligned for 32 bit compiler):

Compiler Cyc/Bl-F MB/s-F Cyc/Bl-C MB/s-C
TP5 6302 4.6 5356 5.4
TP55 6321 4.5 6980 4.1
TP6 1436 20.0 1762 16.3
BP7 1493 19.2 1927 14.9
Delphi2 365 78.6 398 72.1
Delphi3 373 76.9 398 72.1
Delphi4 386 74.3 398 72.1
Delphi5 375 76.5 398 72.1
Delphi6 380 75.5 398 72.1
Delphi7 380 75.5 398 72.1
Delphi9 381 76.3 397 72.3
Delphi10 380 75.0 398 72.1
VPC 426 67.3 425 67.3
FPC 2.2 -O3 416 69.0 417 68.8
FPC 1 GoV2 542 53.0 541 53.1

TP5 and 5.5 have pure 8086 Pascal code, TP6/BP7 use 16-bit 80386 BASM code in the core routines. The Delphi 2..7/9/10, VPC, and FPC sources are written in pure 32-bit Pascal code.



In the following table my routines (WE/.., program t_aes_ws) are compared with other open source packages. The source code to build the test programs for the third party packages is in aes_cmpsrc_2006-07-30.zip. The original sources are required in order to compile the test programs, the URLs are given on the Links page. The times are in s for encrypting 512 MB (8000000 blocks of 64 byte) with 128-bit keys using a 1.8 GHz P4 with Win98.

Package/Compiler CTR CFB OFB ECB CBC OMAC
LibTom117/VC6 13.0 16.3 16.4 12.9 13.9 15.6
LTC117/GCC4.2.1 10.1 14.5 10.6 9.0 9.3 14.2
dcpcrypt2/D6 28.8 32.7 28.6 - 32.7 -
DEC5.1/D6 - 13.9 10.9 10.2 11.8 -
StrSecII/D6 9.0 11.5 9.1 7.7 9.3 -
WE/D3 9.0 8.1 8.1 7.7 9.1 9.1
WE/D6 9.0 8.0 8.0 7.7 8.4 9.1
WE/FPC 2.2 -O3 9.9 9.1 9.1 9.0 11.2 9.0
WE/VPC 2.1 10.4 10.2 10.3 9.3 13.9 12.0
WE/BP7 47.1 41.4 41.4 34.3 51.0 45.3


The comparison with one of the fastest known open source implementations (Brian Gladman using C/ASM) is a little more detailed. With the program t_gspeed the cycles/block of the encryption and decryption core routines are measured for all AES key lengths, in fact with the same method that Gladman uses in his aestmr.cpp file. ASM is Gladman's highly optimized NASM version, C++ his optimized standard version, the other entries are my versions from above.

Func/Bit ASM C++ D6 VPC FPC2 BP7
Enc/128 295 385 370 425 546 1490
Dec/128 293 376 382 405 551 1545
Enc/192 352 439 434 532 648 1768
Dec/192 346 443 451 476 653 1723
Enc/256 403 497 498 580 751 1948
Dec/256 407 507 518 549 755 1971


Since the July 2006 release there are conditional defines to support compressed tables: one 2K encryption table (calculated with t_mkctab) replaces the four 1K tables (same for decryption, here the inverse SBox is no longer needed). Besides using less static memory, compressed tables are considered as a countermeasure against cache timing attacks.

It seems that modern x86 processors can use these inherently unaligned/overlapping data from cache with very little speed penalty, IF (a big if) the whole table is at least aligned to a 8 byte boundary. Unfortunately Delphi has no means to force this alignment for a single table. So there are some other defines and diagnostic variables to deal with the proper alignment (gathered in the configuration file aes_conf.inc). If the compressed tables are not 8 byte aligned, the cycle counts nearly double.

Another countermeasure against cache timing attacks is the forced disabling of large tables for the last rounds if compressed tables are used (the compressed tables have copies of the SBox and InvSBox bytes).

Compressed tables should be used if timing data are available for an attacker, he may be able to recover some or all key bits. Here are some links to background material on AES cache timing attacks:
The basic routines are used to build the simple FCA demo program for file encryption / authentication and decryption / verification (a dual OS program running as 16-bit EXE under DOS and as a 32-bit console application supporting long file names under Win32). It uses two modes, the (old) one was inspired by Brian Gladman's encfile with 128 bit AES-CTR mode and HMAC-SHA1-128; the second mode is EAX. The 96 bit salt is a truncated SHA1 hash over date, time, randseed, and TSC; keys and pass phrase verifier are derived from the pass phrase according to RFC 2898.

Since Nov. 2006 the archive contains the contributed fcaes256 unit started by Giorgio Tani using 256 bit AES-EAX mode or AES-CTR and HMAC-Whirlpool-128; fcaes256 is used in Giorgio's PeaZip project. fca256 is the 256 bit equivalent of FCA.

Last changes: The changed KDF unit and the updated zlib archive are used.

To compile the demo programs from the sources in fca_2008-07-17.zip you need units from aes_2008-08-04.zip, util_YYYY-MM-DD.zip, and crc_hash_YYYY-MM-DD.zip. The FZCA program optionally compresses the data with zlib before encryption / authentication.

Note: the demo programs should only be applied to files with sizes less than 2 GB if compiled with Delphi or 16-bit Pascal (Delphi eof bug and/or 32-bit filesize function). Newer FreePascal versions with 64 bit filesize function may work for such big files.


Twofish

The Twofish algorithm as an alternative to AES was implemented as a response to user requests. The source layout found in tf_2007-06-24.zip is similar to AES, there is code for a DLL and the following modes of operation are supported: CBC, CFB128, CTR, ECB, OFB, OMAC, and EAX. All modes allow plain and cipher text lengths that need not be multiples of the block length and (except OMAC/EAX) support a reset function that re-initializes the chaining variables without the (time consuming) recalculation of the round keys. For the ECB and CBC modes cipher text stealing is used for the last short block, CTR mode can use 4 built-in incrementing functions or a user supplied one.

Last changes:  EAX and OMAC modes of operation

My implementation is compared to the same or updated libraries as used in the AES part, the source for the test programs is in tf_cmpsrc_2006-06-15.zip. The times are in s for encrypting 512 MB (8000000 blocks of 64 byte) with 128-bit keys using a 1.8 GHz P4 with Win98.

Package/Compiler CTR CFB OFB ECB CBC OMAC
LibTom112/VC6 11.9 14.6 14.4 10.3 11.2 14.4
LTC112/GCC4.0.1 13.4 14.9 13.5 12.5 11.7 14.2
dcpcrypt2/D6 21.5 25.4 21.3 - 25.7 -
DEC5.1/D6 - 31.9 28.7 27.7 29.6 -
StrSecII/D6 13.0 14.7 13.0 10.7 12.1 -
WE/D3 14.8 14.0 13.9 13.8 16.0 16.3
WE/D6 15.2 14.0 13.9 13.9 15.6 15.0
WE/FPC2 20.4 20.4 20.3 19.5 21.9 20.1
WE/VPC 17.6 17.3 17.6 17.0 22.9 18.4
WE/BP7 72.0 68.5 67.9 61.6 76.0 72.2


Blowfish

The Blowfish source layout found in bf_2007-06-24.zip is similar to AES. There is code for a DLL and the following modes of operation are supported: CBC, CFB64, CTR, ECB,OFB, OMAC, and EAX . All modes allow plain and cipher text lengths that need not be multiples of the block length (for ECB and CBC cipher text stealing is used for the short block). CTR mode can use 4 built-in incrementing functions or a user supplied one.

All modes (except OMAC/EAX) support a reset function that re-initializes the chaining variables without the (time consuming) recalculation of the round keys.

Last changes:  EAX and OMAC modes of operation

My implementation is compared to the same or updated libraries as used in the AES part, the source for the test programs is in bf_cmpsrc_2006-06-15.zip. The times are in s for encrypting 512 MB (8000000 blocks of 64 byte) with 128 bit keys using a 1.8 GHz P4 with Win98.

Package/Compiler CTR CFB OFB ECB CBC OMAC
LibTom112/VC6 12.0 14.1 13.8 11.3 15.6 16.7
LTC112/GCC4.0.1 13.1 12.8 11.8 9.9 10.5 17.8
dcpcrypt2/D6 21.0 28.5 20.8 - 30.3 -
DEC5.1/D6 - 13.7 12.0 12.1 13.7 -
StrSecII/D6 14.1 15.1 14.0 10.9 13.5 -
WE/D3 14.0 12.2 10.8 10.2 12.4 11.5
WE/D6 14.1 12.3 10.8 10.6 12.4 11.4
WE/FPC2 18.2 17.4 18.8 14.5 16.8 16.5
WE/VPC 20.0 21.0 19.2 19.2 26.3 22.6
WE/BP7 64.8 57.1 57.1 40.7 72.8 58.3


Salsa20

Last change:  The new defaults are 12 rounds for 128 bit keys and 20 rounds for 256 bit keys.

Salsa20 by D.J. Bernstein is a stream cipher that is included in the final eSTREAM Portfolio.

The core of Salsa20 is a hash function with 64 bytes input and 64 bytes output, the input blocks consist of 128 or 256 bits of key material, the nonce or IV, and block numbers. The hashed output is used as a pseudo random key stream or is xored with the plain text to produce the cipher text.

The original ECRYPT Salsa20 specification demands 20 rounds (or 10 double rounds) for the hash function, but D.J. Bernstein himself has formally proposed two variants with 8 and 12 rounds; the portfolio version has 128 bit keys and 12 rounds. My defaults are: 12 rounds for 128 bit keys and 20 rounds for 256 bit keys.

In the following table my implementations (program t_salwe, 1.8 GHz P4 with Win98) are compared with the reference C and ASM code from D.J. Bernstein. The times in s for processing 576 MB (1 Mio 576 bytes blocks) are measured: the columns kx show the values for key stream generation with x rounds, and the ex columns have the corresponding encryption times (all with 128 bit keys). The reference C code for the key stream generation is suboptimal because it encrypts a zero stream.

Compiler k8 k12 k20 e8 e12 e20 Rem.
ASM P4 F12 - - - 2.54 3.62 5.57 (1)
GCC4.01 -O3 8.30 10.28 14.45 7.31 9.34 13.46 (2)
VC6 SP4 /O2 6.48 8.19 11.53 6.32 7.91 11.26 (2)
Delphi 3 3.57 4.93 7.88 8.70 10.05 12.75 -
Delphi 6 3.58 4.94 7.88 8.83 10.18 12.88 -
VPC 2.1.279 4.38 5.72 8.44 7.79 9.12 11.82 -
FPC 2.0.2 4.72 6.06 8.78 9.57 10.92 13.63 -
BP7 Real 4.89 6.24 8.93 23.75 24.90 27.59 (3)

(1) Calculated from D.J. Bernstein's cycles for 576 bytes
(2) Modified reference C code with static rounds variable
(3) ex bottleneck is the bytewise xor via three 16 bit pointers

My source implementation in the archive salsa_2008-08-04.zip uses BASM for the core hash function (TP5.X uses Pascal with some inline).

Salsa20 is used as cryptographic random number generator in the salsar unit, see the PRNG section on the Misc. page.


Other block ciphers

Last change:  32 bit Camellia implementation

The source layout in the archives is as usual: the CBC, CFB, CTR, ECB, OFB modes of operation are implemented; Anubis, Camellia, Serpent, and SEED additionally support OMAC and EAX.

ABC and Phelix

The stream ciphers ABC V3 (by V. Anashin, A. Bogdanov, I. Kizhvatov) and Phelix (by D. Whiting, B. Schneier, S. Lucks, F. Muller) have not been advanced to eSTREAM Phase 3. Therefore I am not planning to update my Pascal implementations. The archived sources are still available in abcv3_2006-08-04.zip and phelix_2006-11-20.zip; please read the Phase II Report before using them.


Home CRC/Hash Crypto Misc. Info Links Deutsch