Looking at RIPEMD-160 Bitcoin Addresses for Fun and No Profit

Keir Finlow-Bates
7 min read · May 11, 2019

Sometimes for fun, I poke around in that wonderful messy data pool that is the Bitcoin blockchain. It’s full of interesting transactions, data, and weird quirky goings-on. Today I’m publishing this article, which looks at strange repetitive Bitcoin addresses, and what they could mean.

A large number of Bitcoin addresses are derived from 160-bit strings that look very odd and decidedly unrandom, which I initially concluded means that many were probably the result of coding errors. However, the highest value addresses have output transactions, which suggests they are vanity addresses.

I deduced that about 89.25 BTC has been lost due to coding errors, which is roughly half a million dollars at today’s exchange rate.

The question still remains: why would someone want to make Bitcoin addresses starting with lots of 1s?

The computer code bit

Last night I found myself wondering, “how many Bitcoin addresses have a balance?”, so I went and searched on GitHub and, of course, almost immediately found a neat Python script that trawls through the Bitcoin chainstate database and generates a file with exactly that data. It’s available at https://github.com/graymauser/btcposbal2csv.

Extracting the transactions

To start with, you need to have an up-to-date copy of the Bitcoin chainstate database. This resides in the chainstate folder of your Bitcoin Core data directory.

It took me about 20 minutes to resolve all the installation problems for the dependencies needed by graymauser’s script, so I took time out to produce https://github.com/kf106/btcposbal2csv, which sorts all that out with a single install script:

git clone https://github.com/kf106/btcposbal2csv
cd btcposbal2csv
sudo ./install.sh
source venv/bin/activate
python btcposbal2csv.py /path_to_Bitcoin/chainstate addresses.csv

It turns out the file it generated on May 9 is 24,157,834 lines long and takes up about 1.2 GB of disk space. Each line shows a Bitcoin address, the balance in satoshis, and the last block in which there was any activity for that address. Very neat.

The file size means that you can’t really use a spreadsheet to start analyzing the data contents. Time to break out sed, awk and other Linux command line scripts that can handle that kind of data quantity…

First step: let’s add up the total number of satoshis the script has recorded:

awk -F"," '{x+=$2}END{print "Satoshi total is " x}' ./addresses.csv

When I run that, I get 1,571,401,006,662,989 satoshis, which is 15.7 million bitcoins. Hmm, the script did warn that it couldn’t process all transactions, and as I write this about 17.7 million have been issued, so 2 million are missing. Never mind — it’s close enough.
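
For anyone who would rather do the same sum in Python, here is a minimal sketch. It assumes the address,balance,last-block column layout described above and skips any row whose balance field is not a plain integer; the function name total_satoshis is my own, and the two sample rows are borrowed from output shown later in this article.

```python
import csv
import io

SATOSHIS_PER_BTC = 100_000_000

def total_satoshis(lines):
    """Sum the balance column (second field) of an iterable of CSV lines."""
    total = 0
    for row in csv.reader(lines):
        # Skip short rows and anything that is not a plain integer balance.
        if len(row) >= 2 and row[1].isdigit():
            total += int(row[1])
    return total

# Two sample rows in the address,balance,last_block layout.
sample = io.StringIO(
    "1111111111111111111114oLvT2,6720404010,491637\n"
    "111111YnPJJ6Zb6Gri3yP9UEi4hbk7xT,3678801024,487760\n"
)
total = total_satoshis(sample)
print(f"Satoshi total is {total} ({total / SATOSHIS_PER_BTC:.2f} BTC)")
```

To run it over the real data, pass an open file handle instead of the sample, for example open("addresses.csv", newline=""). Python integers do not overflow, so the size of the total is not a concern.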

Introducing RIPEMD-160

There is also a script that will add the RIPEMD-160 address to the comma-separated file generated. If you go to https://gobittest.appspot.com/Address you can see a useful page that shows all the steps involved in generating a Bitcoin address from an ECDSA private key (note: for safety’s sake, don’t enter your own private key in this or any other web page). The RIPEMD-160 address is halfway down: it is the RIPEMD-160 hash of the SHA-256 hash of the ECDSA public key, and it keeps the public key hidden until you first spend from the address, which reduces the risk of address compromise. It is usually prefixed with a version byte and encoded in base58 with a checksum, in order to catch typing errors and make it more readable.

python convert2ripemd160.py > ./ripe.csv
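
The last step of that derivation, going from the 160-bit value to the familiar address, can be sketched in a few lines of Python. This is my own illustration of Base58Check encoding, not the code of either script; the function name hash160_to_address is made up, and the version byte 0x00 is an assumption that only covers ordinary mainnet addresses starting with 1.

```python
import hashlib

BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def hash160_to_address(hash160_hex, version=0x00):
    """Base58Check-encode a 20-byte hash given as 40 hex digits."""
    payload = bytes([version]) + bytes.fromhex(hash160_hex)
    # The checksum is the first four bytes of a double SHA-256 of the payload.
    checksum = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    data = payload + checksum
    # Treat the bytes as one big integer and write it out in base 58.
    number = int.from_bytes(data, "big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = BASE58_ALPHABET[remainder] + encoded
    # Each leading zero byte is written as a leading '1'.
    leading_zero_bytes = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zero_bytes + encoded

print(hash160_to_address("00" * 20))
```

The rule in the last comment is why a RIPEMD-160 value with many leading zeros turns into an address with many leading 1s.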

When you run the RIPEMD-160 script on the initial output you end up with a 2.2 GB file, with each line containing the Bitcoin address, the balance, the last block in which it was active, and the RIPEMD-160 address. I called this file ripe.csv. If you want to check the number of lines yourself, just run:

sed -n '$=' ./ripe.csv

Next, I decided to strip off the Bitcoin addresses, balances and block numbers, to just have a look at the RIPEMD-160 addresses. I also decided to sort them. I’m not sure why I did this, but the following commands do just that:

awk -F "\"*,\"*" '{print $4}' ripe.csv > ripeonly.csv
sort ripeonly.csv > ripesort.csv

And now, the very interesting bit. With the head or tail command, you can look at the first ten or last ten lines of a file (ten is the default for both). So look at this output from head:

keir@chainfrog$ head ripesort.csv
0000000000000000000000000000000000000000
0000000000000000000000000000000000000000
0000000000000000000000000000000000000001
0000000000000000000000000000000000000002
0000000000000000000000000000000000000003
0000000000000000000000000000000000000004
0000000000000000000000000000000000000005
0000000000000000000000000000000000000006
0000000000000000000000000000000000000007
0000000000000000000000000000000000000008

And here’s the output from tail:

keir@chainfrog$ tail ripesort.csv 
ffffffffffffffffffffffdb004301555a5a7869
ffffffffffffffffffffffdb004301ffffffffff
ffffffffffffffffffffffffffdb004301ffffff
ffffffffffffffffffffffffffffffc200110801
ffffffffffffffffffffffffffffffffdb004301
ffffffffffffffffffffffffffffffffffc00011
ffffffffffffffffffffffffffffffffffffdb00
fffffffffffffffffffffffffffffffffffffe36
ffffffffffffffffffffffffffffffffffffffff

These do not look like random addresses!

Being lazy is a lot less work

Now, I could do a lot of calculations to determine the probability of lots of repeated digits appearing in a RIPEMD-160 address, but that sounds like a lot of work. Obviously, a few repeated digits are to be expected. But at what point can we conclude that an address is not the product of pseudo-random chance, but rather due to a programmer’s error? (And in the tail output, what is with that “db004301” string that keeps turning up?)

A quick tutorial: RIPEMD-160 takes any string as its input and returns a 160-bit number, which in hexadecimal is 40 digits (20 octets), i.e. forty characters consisting of 0–9 or a–f. That’s about 1.5*10⁴⁸ possible outputs.
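
Those sizes are easy to check for yourself. The snippet below is plain arithmetic on the 160-bit output length and assumes nothing else; the variable names are my own.

```python
HASH_BITS = 160

possible_outputs = 2 ** HASH_BITS  # number of distinct 160-bit values
hex_digits = HASH_BITS // 4        # each hex digit encodes 4 bits
octets = HASH_BITS // 8            # each octet (byte) holds 8 bits

print(f"{possible_outputs:.2e} outputs, {hex_digits} hex digits, {octets} octets")
```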

So my first thought was to produce a 24,157,834-line file with genuinely random RIPEMD-160 outputs. Time to cobble together a bash script! An initial attempt to generate random numbers and pipe them through the hash function failed because of memory problems (despite 16 GB of RAM in my machine). So instead, how about using the actual address list as input to generate a random RIPEMD-160 list:

while read -r line; do echo "$line" | openssl rmd160 -binary | xxd -p; done < ripeonly.csv >> pseudorand.csv

For this, I’m hashing each of the RIPEMD-160 addresses again and putting them into a new file. The hash function RIPEMD-160 is meant to be a cryptographic hash function, so its output should be pseudo-random.
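
As an aside, a control sample can also be drawn straight from a random number generator, one value at a time, so that nothing large is ever held in memory. This is a swapped-in technique rather than what I actually ran: a minimal sketch using Python's standard secrets module, with a made-up function name.

```python
import secrets

def random_hash_lines(count):
    """Yield `count` random 160-bit values as 40-digit lowercase hex strings."""
    for _ in range(count):
        # 20 random bytes are rendered as 40 hex digits.
        yield secrets.token_hex(20)

for line in random_hash_lines(3):
    print(line)
```

Writing each yielded line to a file would give a list the same shape as ripeonly.csv, with no hashing step involved.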

Okay, that takes a bit of time. 12 hours in fact — sometimes being lazy takes a long time. So it’s a good thing that delivery pizza has arrived and it’s time to feed the kids and then put them to bed, watch some TV, and then go to bed myself.

In a given RIPEMD-160 hash represented in hexadecimal, if it is truly random, the chance of a given digit being followed by nine more copies of itself is 1/(16⁹): given a digit (e.g. ‘F’), the chance of the next digit being the same is 1/16, and the one after that another 1/16, and so on nine times. That’s 1/68,719,476,736.

Now, in a 40-digit string, there are 31 opportunities for a digit to be matched nine more times, and there are 24,157,834 addresses, so a rough estimate of the odds of any address in the file containing ten repeated digits is (31*24,157,834)/68,719,476,736, which is about 0.0109. Or simply put, there is about a 1% chance that this could occur naturally.

In general, the odds of N digits repeating somewhere are about

((41-N)*24157834)/(16^(N-1))

So it turns out that 9 repeating digits are quite likely in our file, at 18%. (These calculations are estimates — to do this properly I’d have to calculate the odds of a run, as shown in this Wolfram page, which is a pain in the neck).
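
Here is the same rough estimate as a small Python function, so you can try other run lengths. It follows the reasoning of the ten-digit example: a run of N identical digits needs N-1 matches after the first digit, and there are 41-N places in a 40-digit string where such a run can start. Like the figures above it is an estimate rather than an exact probability, and the names are my own.

```python
ADDRESS_COUNT = 24_157_834  # lines in the address file
HEX_DIGITS = 40             # digits in one RIPEMD-160 value

def expected_runs(run_length, rows=ADDRESS_COUNT):
    """Rough expected number of runs of `run_length` identical hex digits."""
    positions = HEX_DIGITS - run_length + 1    # 41 - N starting positions
    per_position = 1 / 16 ** (run_length - 1)  # N - 1 digits must match the first
    return positions * rows * per_position

for n in (8, 9, 10):
    print(n, round(expected_runs(n), 4))
```

For N=10 this reproduces the roughly 1% figure, and for N=9 the 18% figure.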

The following commands count the number of lines in each file that contain a run of ten identical characters (pseudorandsort.csv is simply pseudorand.csv sorted in the same way as ripeonly.csv was):

grep --count '\(.\)\1\1\1\1\1\1\1\1\1' pseudorandsort.csv
grep --count '\(.\)\1\1\1\1\1\1\1\1\1' ripesort.csv

So, ripesort.csv contains 5476 matches, and pseudorandsort.csv contains 0, as expected. In fact, there are no nine character repetitions in pseudorandsort.csv either, and only 3 eight character repetitions. There is definitely something going on in the list of real Bitcoin addresses. Here’s a sample of some of the matches:

764d544f51655a49586276536d7a0a0000000000
76616e206461616c0a0000000000000000000000
76616e206461616c0a0a0a0a0000000000000000
766173652c2062616c6c6f6f6e2e000000000000
766520796f750a00000000000000000000000000
76657273696f6e3a302e3031ffffffffffffffff
7669430a00000000000000000000000000000000
7669612040636f696e6465736b00000000000000
766ac43156a7f2d26e9099f10000000000000000
766c612055726261637a6b790a00000000000000
76cc000000002b9f24000000000002988b2c230e
76d575876f00000000000000000000000042a0ec
7700000000000000000000000000000000000000

That does look suspicious, doesn’t it?
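
The run test used by the grep commands above translates directly into a Python regular expression: one captured character followed by nine back-references to it. This is a minimal sketch with a made-up function name. The first two sample values are taken from the matches above, and the third is an invented value with no long run.

```python
import re

# Any character followed by nine more copies of itself: a run of ten.
RUN_OF_TEN = re.compile(r"(.)\1{9}")

def count_lines_with_run(lines):
    """Count the lines that contain a run of ten identical characters."""
    return sum(1 for line in lines if RUN_OF_TEN.search(line))

sample = [
    "766520796f750a00000000000000000000000000",  # from the matches above
    "7669612040636f696e6465736b00000000000000",  # from the matches above
    "0123456789abcdef0123456789abcdef01234567",  # invented, no long run
]
print(count_lines_with_run(sample))
```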

Adding up the satoshis

It’s time to return to ripe.csv, the original file with all the information, and add up all the satoshis stored in addresses with ten or more repeated digits. These two commands should do that (the first extracts all the rows with strange RIPEMD-160 addresses, and the second is our old satoshi-adding command again):

grep ",.*,.*,.*\(.\)\1\1\1\1\1\1\1\1\1.*$" ripe.csv > ripeweird.csv
awk -F"," '{x+=$2}END{print "Satoshi total is " x}' ./ripeweird.csv

First I test it on a small sample to check it works (always do this if you are dealing with a lot of data), and then run it over the whole set. It takes about three hours. It’s a good thing I’m doing this in the background while busying myself with other tasks, but I should have done all the file operations in memory using tmpfs rather than on disk. Lesson learned.

The satoshi total is 23734753110. That's about 237 bitcoins, which at today’s prices is about $1.5 million.
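
With hindsight, the filtering and the adding up could be done in a single pass that never writes an intermediate file. The sketch below is an assumption about how that might look, not what I ran: it splits each line on commas, tests only the fourth field for a run of ten, and sums the second field. The function name and the third sample row are invented; the first two rows appear in the sorted output further down.

```python
import csv
import io
import re

RUN_OF_TEN = re.compile(r"(.)\1{9}")

def sum_weird_balances(lines):
    """Return (row_count, satoshi_total) for rows whose fourth field has a ten-character run."""
    count = 0
    total = 0
    for row in csv.reader(lines):
        # Only the RIPEMD-160 column is tested, not the base58 address.
        if len(row) >= 4 and RUN_OF_TEN.search(row[3]):
            count += 1
            total += int(row[1])
    return count, total

sample = io.StringIO(
    "1111111111111111111114oLvT2,6720404010,491637,0000000000000000000000000000000000000000\n"
    "111111YnPJJ6Zb6Gri3yP9UEi4hbk7xT,3678801024,487760,0000000000ad9b82ad38b0f0c0c0f76f3eeca574\n"
    # Invented row (made-up address and values) that should be skipped:
    "1HypotheticalAddressForTesting,5000,100000,0123456789abcdef0123456789abcdef01234567\n"
)
print(sum_weird_balances(sample))
```

Testing only the fourth field matters because a base58 address can itself contain a long run of 1s. Whether this is quicker than grep on the full file is something I have not measured.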

Vanity of vanities

However, after looking at the addresses that contain ten leading zeros (which were the ones with the highest balances) it turns out that they have output transactions. This means that the private keys for these addresses are known.

Sorting the ripeweird.csv file by the number of satoshis and looking at the addresses with the highest balance might be helpful:

sort --field-separator="," -n -k 2 ripeweird.csv > ripeweirdsort.csv
tail ripeweirdsort.csv

This gives the following output:

12H2TgmSqUJnNkRoCAtpKxk83LeJHHhVTA,2106000,297114,0e0000000000000001000000008ba4d185000000
111111GQBM6cV1NPHRrwkUWY1EnGNCVo,15000000,565919,0000000000541e4b4dc4cac85e3d550fbf82ff37
111111XQU2hAwaF2YjLHVkKoqQiFnDEF,100000000,348797,0000000000a614b2c1b4c37f30905d978b352a4b
111111W1yUY2vSgXDQBUp95qGoVwej5v,100000000,353912,00000000009e800cc10ba10802bd6afaf4462073
111111e4BfZw5QZPDBxKNekL2wApVCVd,344129179,512371,0000000000ca6840ea97057ac93b617024d0198c
111111nFTRm5GCoUNS7mURKmZEJAu1j,500044493,550136,00000000000442f272ba5cb84c0259ccd91d10f0
111111811zDiVJNrGoufFmZUdetqXMqV,1600000000,512707,0000000000263cf663dd808113f7428e021f6e05
111111YnPJJ6Zb6Gri3yP9UEi4hbk7xT,3678801024,487760,0000000000ad9b82ad38b0f0c0c0f76f3eeca574
1111111111111111111114oLvT2,6720404010,491637,0000000000000000000000000000000000000000
1111116d87CjjDyP8SF5v1LTvUq22VFg,8957970093,488065,00000000001eb6ba52e378d3be9864eb7238e9f2

The penultimate address, with 67 bitcoins, is definitely a case of programmer error: it has never had any outputs, and it’s highly unlikely that anyone has generated, or ever will generate, a private ECDSA key that ends up with that Bitcoin address.

But the other addresses starting with ten zeroes appear to be vanity addresses. With a vanity address, you keep generating keys and deriving Bitcoin addresses from them until you find one that starts with the characters you want.

So, for example, 1111116d87CjjDyP8SF5v1LTvUq22VFg currently has a balance of 89.5 BTC, but has received nearly 400 BTC over its lifetime.
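
To get a feel for how much work such a vanity search involves, here is a back-of-the-envelope calculation. It assumes the 160-bit value behaves like a uniformly random number, so each leading zero hex digit has a 1 in 16 chance and the average number of keys to try is the reciprocal of the combined probability. The function name is my own.

```python
def expected_attempts(leading_zero_hex_digits):
    """Average number of keys to try before the hash starts with that many zero hex digits."""
    return 16 ** leading_zero_hex_digits

# Ten leading zero hex digits, as in the vanity addresses above.
print(f"{expected_attempts(10):,}")
```

That is 2^40 attempts on average: a lot, but within reach of dedicated hardware, whereas forty zeros would be out of the question.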

Summing up

I manually removed the vanity addresses that are actually active and ran the satoshi count again. This time I’m left with 8,925,496,933 satoshis, which is 89 bitcoins, and therefore worth about half a million dollars.

I have put “retrieving addresses with inputs but no outputs ever” on the list of features to add to the btcposbal2csv script at some point in the future.

Well, that’s it for this weekend — hope you enjoyed that random exploration of bitcoin transactions, and that it’s given you some inspiration to go exploring yourself. The Bitcoin blockchain is full of interesting stuff, so you can easily spend an enjoyable weekend poking around in it, and hopefully this article has given you some tools to do so.
