An idea to implement GPU based vanity address generator safely
Vanity wallet addresses such as multiple leading zeros wallet addresses (0x00000…) have many advantages. One of them is that it helps reduce gas usage on each transaction. For this reason, many MEV bots are using this kind of address. Not only are MEV bots using this, but typical users looking to own a good-looking wallet address are also using this.
We can generate these addresses using tools such as Vanity-eth and Profanity. Profanity is much faster than Vanity-eth in a magnitude of a thousand times as it uses GPU. However, Profanity has been hacked. So, it is best to stay away from Profanity as of now. We must first learn how it gets hacked to fix the issue.
To generate a wallet, one must generate a private key. A private key is simply a 32 bytes number. The safest way to generate a private key is to generate a completely random 32 bytes number. In total, there are 2²⁵⁶ different private keys.
How does Profanity get hacked?
Profanity uses a randomized seed to generate many deterministic wallet addresses correlated to that seed. However, there are only 2³² (approximately 4 billion) different seeds. Its entropy is limited. We have calculated in our previous blog that it requires less than three days for an enormous mining farm to brute force all wallets generated by Profanity up to six leading zeros.
If we look at the code on how Profanity created its seed, only a random number of size uint32 is created from rd(). The final four 64-bit seed (combined to 256-bit) is deterministically calculated from the first uint32 seed.
Why doesn’t Profanity simply random bytes32 to get a fully randomized private key?
That is because all computations done by GPU must be deterministic. The CPU computations are also deterministic by design, but it has access to more external data sources, such as date and time, than the GPU. Hence it can provide better randomness.
Because of these reasons, the CPU must provide a random seed to the GPU. GPU will use provided random seed to generate wallet addresses deterministically from that seed.
How to fix the Profanity issue?
To fix the issue used in the Profanity hack, you need to increase the entropy of the random seed to be at least 2²⁵⁶. We can do it by combining multiple uint32 (2³²) seeds to get a final result. We must combine at least eight uint32 seeds to have 2²⁵⁶ different seeds required.
Next, implement “Marsaglia’s xorshift” or “xorwow” algorithm to have an efficient uniform pseudo-random number generator. Eventually, the CURAND library has provided an easy-to-use interface for the “xorwow” algorithm.
Afterward, combine each random number generated from each thread of GPU to get the final private key, a randomized 32 bytes number (2²⁵⁶ different private keys).
Let’s dive deep into the implementation of each step.
Passing eight different seeds to the GPU
First, Profanity will generate a 256-bit initial seed divided into four 64-bit numbers. Since the initial seed is a 256-bit number, the initial seed is a private key itself. GPU will then use these four 64-bit seed numbers to calculate other private keys deterministically.
Instead of passing a random number from std::random_device to std::mt19937_64, then using the same random number from std::mt19937_64 as a seed to generate four 64-bit numbers (Represent each part of a 256-bit seed), you should generate eight 32-bit numbers directly from std::random_device and then use these numbers to built 64-bit numbers as shown below.
BlackTrace has implemented this step in a Profanity forked repository: https://github.com/BlackTrace/profanity/issues/1.
Implementing “Marsaglia’s xorshift”
“Marsaglia’s xorshift” is an algorithm to generate random numbers with uniform distribution from a seed.
This algorithm is done in a recursive way where a previous random number is used as a seed, and the first seed is generated externally from the CPU.
We will pass each part of the seed (four 64-bit numbers) from the previous step to the rand_xorshift function. rand_xorshift will provide a random number. This number is calculated deterministically from the randomized seed so we can treat the returned number as randomized. The returned number will be recursive as a seed to generate the next random number by passing it to rand_xorshift. This recursive pattern can generate an infinite amount of numbers.
uint rand_xorshift(uint rng_state){
rng_state ^= (rng_state << 13);
rng_state ^= (rng_state >> 17);
rng_state ^= (rng_state << 5);
return rng_state;
}
Credit: https://towardsdatascience.com/how-to-generate-a-vector-of-random-numbers-on-a-gpu-a37230f887a6
Use the CURAND library for “xorwow” (Optional)
“xorwow” is an improvement of “Marsaglia’s xorshift” algorithm. “xorwow” is adding Weyl sequence to the output of “Marsaglia’s xorshift” to amplify the uniformness. The xorwow generator is the CUDA’s default generator in the CURAND library. There is an excellent lecture about the CURAND library at https://wlandau.github.io/gpu/lectures/curand/curand.pdf.
Evaluating the private key against the criteria
The private key generated currently consists of four 64-bit numbers. We calculated the required metrics, such as the number of leading zeros for each 64-bit number generated. Next, combine each metric to get the final metrics. For example, number of leading zeros for each 64-bit number is incorporated into the number of leading zeros for the 256-bit number. If it is better than the previously known best or passes a criteria cutoff, we choose that seed for the next step. Otherwise, we filtered it out.
Combining results to get the final private key
Each private key that passes the criteria will have four 64-bit numbers returned to the CPU. CPU will merge these 64-bit numbers to get a 256-bit private key. The private key will be used as an output by printing to the console.
Summary
Profanity is a vanity wallet address, such as an address with multiple leading zeros (0x0000…) generation tool. Profanity is using GPU to accelerate its performance. Nevertheless, Profanity is hacked with total damage of more than $160M. It is hacked because Profanity randomness is limited to a 32-bit seed number. To fix this, we must pass eight different 32-bit seed numbers combined to get a fully-randomized 256-bit seed number split into four 64-bit numbers to fit the GPU requirement. Each 64-bit number is repeatedly passed into the “xorshift” or “xorwow” algorithm to get limitless 64-bit random numbers from a single seed. Each 64-bit number is evaluated against criteria. If the evaluation is passed, these numbers will be passed on to the CPU to merge into a 256-bit private key.