

62.2.3 Message Digests (aka hashes)

Message digests are widely used in modern information infrastructure. They are derived from (collision free) one-way hash functions.

A hash function (such as ‘md5’ or ‘sha1’) is a function with the following properties:

  1. reduction: data of arbitrary length is mapped onto data of fixed length
  2. dispersion: a change of one bit in input data changes (ideally) half the bits of the hash value.
  3. well-definedness: computing a hash value from the same source data twice yields the same result
  4. efficiency: computing a hash value is efficient (ideally with complexity O(n) in the length n of the input), but it is hard to compute a preimage for a given hash value.

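Often, the last property is too weak in practice; most hash functions therefore aim to satisfy the even stronger requirement of being collision free: it is hard to find any two different inputs that yield the same hash value.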
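The first three properties can be observed directly from elisp. The following is a minimal sketch, assuming an SXEmacs built with OpenSSL support in which the ‘md5’ digest is available; it uses the ossl-digest function described later in this section.

```elisp
;; Sketch only: assumes an SXEmacs built with OpenSSL support
;; and that the `md5' digest is available.

;; reduction: inputs of very different length give digests of equal length
(= (length (ossl-digest 'md5 "a"))
   (length (ossl-digest 'md5 (make-string 10000 ?a))))

;; well-definedness: hashing the same data twice gives the same value
(string= (ossl-digest 'md5 "hash me")
         (ossl-digest 'md5 "hash me"))

;; dispersion: "hash md" and "hash me" differ in a single bit
;; (?d is #x64, ?e is #x65), yet the two hex digests look unrelated
(list (base16-encode-string (ossl-digest 'md5 "hash md"))
      (base16-encode-string (ossl-digest 'md5 "hash me")))
```

The first two forms should evaluate to t; the third returns the two hex strings for visual comparison.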

Message digests fulfill several tasks in daily use. Their most common use is as so-called checksums. Nowadays hash functions are used almost exclusively for their error-detecting capabilities, in place of other checksum algorithms like CRC32.

Beyond that, message digests play an important role in digital signatures. Since public-key cryptosystems map long plaintexts onto long ciphertexts, message digests are used to obscure the length of a plaintext.

Therefore, in digital signatures it is not the message itself that is signed but the hash value of that message. This also guarantees an upper bound on the length of a digital signature, which is (as in real life) rather short compared to the message that was signed.

Okay, after this short introduction to message digests, here are the functions to access them from elisp.

Function: ossl-digest digest string

Return the message digest of string computed by digest. digest may be one of the OpenSSL digests you have compiled. See ossl-available-digests.

Note: You probably want to put a wrapping encoder function (like base16-encode-string) around it, since this returns binary string data.
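As a minimal sketch of the advice above, the check against ossl-available-digests and the hex encoding can be combined in one small helper. The name my-hex-digest is hypothetical and not part of SXEmacs. The sketch also assumes that ossl-available-digests can be called with no arguments and returns a list of digest symbols.

```elisp
;; Hypothetical helper, not part of SXEmacs.
;; Assumption: `ossl-available-digests' can be called with no
;; arguments and returns a list of digest symbols.
(defun my-hex-digest (digest string)
  "Return the hex-encoded DIGEST of STRING, or nil if DIGEST is unavailable."
  (when (memq digest (ossl-available-digests))
    (base16-encode-string (ossl-digest digest string))))

;; hex digest of a short string, or nil if `md5' is not in the list
(my-hex-digest 'md5 "hash me")
```

Returning nil for an unknown digest is a design choice of this sketch only; it avoids passing a digest name to ossl-digest that the underlying library does not provide.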

In order to compute digest sums from files without actually looking at the file contents explicitly, there is the companion function ossl-digest-file which works similarly.

Function: ossl-digest-file digest file

Return the message digest of the contents of file computed by digest. digest may be one of the OpenSSL digests you have compiled. See ossl-available-digests.

Note: You probably want to put a wrapping encoder function (like base16-encode-string) around it, since this returns binary string data.
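A typical use of ossl-digest-file is verifying a downloaded file against a published checksum. The following is a minimal sketch; the helper name my-file-digest-matches-p is hypothetical and not part of SXEmacs.

```elisp
;; Hypothetical helper, not part of SXEmacs.
(defun my-file-digest-matches-p (digest file expected-hex)
  "Return non-nil if the hex-encoded DIGEST of FILE equals EXPECTED-HEX.
The comparison ignores the case of the hex digits."
  (string= (downcase (base16-encode-string (ossl-digest-file digest file)))
           (downcase expected-hex)))
```

It would be called with a digest symbol such as 'md5, the file name, and the checksum string published alongside the file. Both sides are downcased because published checksums are sometimes given in upper-case hex.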

The current implementation of the OpenSSL API in SXEmacs uses the EVP layer of OpenSSL to access the digests.

(base16-encode-string (ossl-digest 'md5 "hash me"))
  ⇒ "17b31dce96b9d6c6d0a6ba95f47796fb"
(base16-encode-string (ossl-digest 'SHA1 "hash me"))
  ⇒ "43f932e4f7c6ecd136a695b7008694bb69d517bd"

Let’s do some performance tests.

;; this is the SXEmacs built-in implementation of MD5
(let ((st (current-btime)))
  (dotimes (i 100000)
    (md5 "Some test string to hash"))
  (- (current-btime) st))
  ⇒ 6194289
  ;; time in microseconds, so this is about 6 seconds
;; now compare to the OpenSSL implementation
(let ((st (current-btime)))
  (dotimes (i 100000)
    (ossl-digest 'md5 "Some test string to hash"))
  (- (current-btime) st))
  ⇒ 10589408
  ;; which is about 10 seconds

As we can see, the built-in implementation is faster when hashing short strings (about 6 seconds versus about 10 seconds). The following example shows performance on long strings, like the buffer string here.

(length (buffer-string))
  ⇒ 16861
;; we begin with the built-in implementation
(let ((st (current-btime))
      (b (buffer-string)))
  (dotimes (i 100000)
    (md5 b))
  (- (current-btime) st))
  ⇒ 74350982
  ;; which is about 74 seconds
;; compare to the OpenSSL API
(let ((st (current-btime))
      (b (buffer-string)))
  (dotimes (i 100000)
    (base16-encode-string
      (ossl-digest 'md5 b)))
  (- (current-btime) st))
  ⇒ 31697926
  ;; which is about 31 seconds

This latter example shows digest hashing “under real conditions” since in practice messages to be hashed are typically in the range of 1000 to 30000 characters. This range is even vastly exceeded when dealing with checksums for files.

Since the built-in md5 implementation cannot handle file streams, we have to turn them into strings. A possible way to achieve this has been suggested by Steve Youngs. I shall illustrate it with a tarball file.

freundt@muck:~> ls -sh ~/temp/pdftex-1.30.3.tar.bz2
3.2M /home/freundt/temp/pdftex-1.30.3.tar.bz2
(let ((st (current-btime))
      (b (with-temp-buffer
           (insert-file-contents-literally
            "~/temp/pdftex-1.30.3.tar.bz2")
           (buffer-string))))
  (dotimes (i 100)
    (md5 b))
  (- (current-btime) st))
  ⇒ 22729718
  ;; which is about 22 seconds

Compared to the file stream function ossl-digest-file:

(let ((st (current-btime)))
  (dotimes (i 100)
    (ossl-digest-file 'md5 "~/temp/pdftex-1.30.3.tar.bz2"))
  (- (current-btime) st))
  ⇒ 4189695
  ;; which is about 4 seconds

Another performance test compares the elisp implementation of sha1 (taken from ‘No Gnus v0.4’) with the one from the OpenSSL API:

(let ((st (current-btime)))
  (dotimes (i 500)
    (sha1-binary "a short test string"))
  (- (current-btime) st))
  ⇒ 2574326
  ;; which is about 2.5 seconds
;; the same with the OpenSSL API
(let ((st (current-btime)))
  (dotimes (i 500)
    (ossl-digest 'sha1 "a short test string"))
  (- (current-btime) st))
  ⇒ 31378
  ;; which is about 0.03 seconds

These results suggest always using the OpenSSL interface in preference to other implementations.
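The timing pattern repeated in the tests above can be factored into a small macro. The following is a minimal sketch; the macro name my-btime-elapsed is hypothetical and not part of SXEmacs, and current-btime is used exactly as in the tests above.

```elisp
;; Hypothetical helper macro, not part of SXEmacs.
;; `current-btime' is used exactly as in the tests above.
(defmacro my-btime-elapsed (count &rest body)
  "Evaluate BODY COUNT times and return the elapsed `current-btime' difference."
  ;; uninterned symbols keep the macro's variables from clashing with BODY
  (let ((start (make-symbol "start"))
        (idx (make-symbol "idx")))
    `(let ((,start (current-btime)))
       (dotimes (,idx ,count)
         ,@body)
       (- (current-btime) ,start))))

;; the short-string MD5 comparison, rewritten with the macro;
;; returns a list of the two elapsed times
(list (my-btime-elapsed 100000 (md5 "Some test string to hash"))
      (my-btime-elapsed 100000 (ossl-digest 'md5 "Some test string to hash")))
```

Absolute timings depend on the machine and build, so only the ratio between the two values is meaningful.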

