Hi,
I want to present my thoughts and observations about current lottery algorithm. Also I want to suggest one change which could improve fairness and randomness, since at this point its not exactly random.
I Was not able to upload images between text, so all images I refer to are available below this post.
For simplicity I assume that algorithm whitelists 1000 participants.
Let me start with discussing code from Polkastarter Github profile and explain it line by line: https://github.com/polkastarter/polkastarter-lottery
This method (refer first image) is creating a list of winners. At the very beginning there is an empty list. Later on all top holders are added to the list without any lottery. Its fine because they are top holders and they have guaranted whitelist slot. After that to this list are added participants who hold Golden Paulie. Thats fine as well, they have guaranted whitelist slot. Next there are some priviliged participants added which I dont want to discuss now, so lets forget about this line. And finally at the end all other participants are added to this list.
At this point there is list where at the very beginning we have participants with guaranted slot and all other participants later.
Fine, whats next? Next from this list are removed all duplicated entries (for the reason I will discuss later) and taken first 1000 participants. They are WINNERS. What does it mean? It means we NEED to be high on this list, otherwise we won't be taken.
We already know that top holders, Golden Paulie holders and priviliged users are placed on the list first. What about all the rest? They are positioned based on sampling method (refer second image).
On second image you can see methods named weighted_random_sample and shuffled_elligible_participants. They are computing values (from 0 to 1) based on which participants are placed. The closer to 1, the better.
weighted_random_sample is explicitly responsible for computing values. Its using clever mathematic function to compute participant chance based on some luck factor and number of his tickets. You can find more details here: https://utopia.duth.gr/~pefraimi/research/data/2007EncOfAlg.pdf
NOW IS THE CRUCIAL PART TO UNDERSTAND WHY THIS ALGORITHM IS WRONG
Without going into much details its using function which can be represented as you can see on image (refer third image). On X axis there is a number of tickets and on Y axis there is a value based on which participant is placed on list.
This value is also dependent on luck factor I mentioned earlier. This luck factor should be determined once per participant and then its fair!
So where is the problem? The problem is that luck factor is computed not once per user but 5 * 1000 = 5000 times! If its selected 5000 times for every user then we really dont have much randomness and algorithm HEAVILY favorize those with largest number of tickets.
Why EXACTLY is it a problem? Because if lucky factor is computed that many times for each participant then its not any more luck factor and it could be simplified to constant number. And If its constant number then participants are being choosen ONLY based on number of tickets.
Is it all? Ohh guys, its not. Take a look at shuffled_eligible_participants method. There is a loop running 5000 times. It produces winners but it doesnt guarant uniqueness. Meaning some users will be choosen many times (thats why duplicated entries are removed later) and some users will not be choosen at all.
My proposition: change algorithm so that luck factor is computed once for participant, sort list based on output and choose winners
EDIT: IN A COMMENT YOU CAN FIND SIMULATION I DID, WHICH DRASTICALY CHANGES MY FIRST FINDINGS