Startup hacks and engineering miracles from your exhausted friends at Faraday

No MD5, SHA1, or SHA256 collisions for US addresses

Seamus Abshere on

I calculated hashes of single-family home addresses in the United States:

-- digest() comes from the pgcrypto extension
create extension if not exists pgcrypto;

create table hashtest as (
  select
    house_number_and_street,
    city,
    state,
    digest(upper(house_number_and_street || ',' || city || ',' || state), 'md5') as "md5",
    digest(upper(house_number_and_street || ',' || city || ',' || state), 'sha1') as "sha1",
    digest(upper(house_number_and_street || ',' || city || ',' || state), 'sha256') as "sha256"
  from households
);

For example, here is the key and its MD5 hash for one row:

=> select upper(house_number_and_street || ',' || city || ',' || state) "key", digest(upper(house_number_and_street || ',' || city || ',' || state), 'md5') "md5" from households limit 1;
             key               |                md5
-------------------------------+------------------------------------
 1024 PENINSULA DR,WESTWOOD,CA | \x511cdfb25d6b77d45742ed0407b5c2ef
(1 row)

Then I counted the distinct hashes:

=> select count(distinct md5) md5, count(distinct sha1) sha1, count(distinct sha256) sha256, count(*) from hashtest;
   md5    |   sha1   |  sha256  |  count
----------+----------+----------+----------
 78224992 | 78224992 | 78224992 | 81087108
(1 row)
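
Matching distinct counts are suggestive, but a more direct check is to look for any digest that maps to more than one distinct address key. The query below is a sketch of that check, not something run in the original analysis; it assumes the hashtest table built above, and the names keyed, address_key, digest_hex, and distinct_keys are made up for illustration. If it returns zero rows, no two different keys share a digest.

```sql
-- Sketch only: list any digest shared by more than one distinct address key.
-- Assumes the hashtest table created earlier; an empty result means no collisions.
with keyed as (
  select
    upper(house_number_and_street || ',' || city || ',' || state) as address_key,
    md5,
    sha1,
    sha256
  from hashtest
)
select 'md5' as algorithm, encode(md5, 'hex') as digest_hex, count(distinct address_key) as distinct_keys
from keyed
group by md5
having count(distinct address_key) > 1
union all
select 'sha1', encode(sha1, 'hex'), count(distinct address_key)
from keyed
group by sha1
having count(distinct address_key) > 1
union all
select 'sha256', encode(sha256, 'hex'), count(distinct address_key)
from keyed
group by sha256
having count(distinct address_key) > 1;
```

Grouping by the digest and counting distinct keys separates true collisions (different keys, same digest) from plain repeats (the same address appearing in several rows).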

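Some of the addresses are repeated in the database because the APNs (assessor's parcel numbers) are identical, which is why the row count (81,087,108) is higher than the distinct-hash count (78,224,992). All three algorithms produce the same number of distinct hashes, so the conclusion is that we have 78 million unique addresses and no hash collisions with the algorithms tested.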
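
For a sense of scale, here is a rough birthday-bound estimate of how many collisions to expect by chance. It assumes the digests behave like uniformly random b-bit values and that the inputs are ordinary addresses, not inputs deliberately constructed to collide (such constructions are known for MD5 and SHA-1). The symbols n (number of distinct keys) and b (digest length in bits) are introduced here for illustration.

```latex
\[
  \mathbb{E}[\text{collisions}] \approx \frac{n(n-1)}{2} \cdot 2^{-b}
\]
% MD5, the shortest digest tested: n = 78{,}224{,}992 and b = 128
\[
  \frac{n(n-1)}{2} \approx 3.06 \times 10^{15},
  \qquad 2^{128} \approx 3.40 \times 10^{38},
  \qquad \mathbb{E}[\text{collisions}] \approx 9 \times 10^{-24}
\]
```

SHA-1 (160 bits) and SHA-256 (256 bits) have longer digests, so under the same assumption the expected count is smaller still.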