In-depth analysis of Lottery Algorithm | Voters

In-depth analysis of Lottery Algorithm

under review

Michał Łabuz

Hi,
I want to present my thoughts and observations about current lottery algorithm. Also I want to suggest one change which could improve fairness and randomness, since at this point its not exactly random.
I Was not able to upload images between text, so all images I refer to are available below this post.
For simplicity I assume that algorithm whitelists 1000 participants.
Let me start with discussing code from Polkastarter Github profile and explain it line by line: https://github.com/polkastarter/polkastarter-lottery
This method (refer first image) is creating a list of winners. At the very beginning there is an empty list. Later on all top holders are added to the list without any lottery. Its fine because they are top holders and they have guaranted whitelist slot. After that to this list are added participants who hold Golden Paulie. Thats fine as well, they have guaranted whitelist slot. Next there are some priviliged participants added which I dont want to discuss now, so lets forget about this line. And finally at the end all other participants are added to this list. 
At this point there is list where at the very beginning we have participants with guaranted slot and all other participants later.
Fine, whats next? Next from this list are removed all duplicated entries (for the reason I will discuss later) and taken first 1000 participants. They are WINNERS. What does it mean? It means we NEED to be high on this list, otherwise we won't be taken.
We already know that top holders, Golden Paulie holders and priviliged users are placed on the list first. What about all the rest? They are positioned based on sampling method (refer second image).
On second image you can see methods named weighted_random_sample and shuffled_elligible_participants. They are computing values (from 0 to 1) based on which participants are placed. The closer to 1, the better.
weighted_random_sample is explicitly responsible for computing values. Its using clever mathematic function to compute participant chance based on some luck factor and number of his tickets. You can find more details here: https://utopia.duth.gr/~pefraimi/research/data/2007EncOfAlg.pdf
NOW IS THE CRUCIAL PART TO UNDERSTAND WHY THIS ALGORITHM IS WRONG
Without going into much details its using function which can be represented as you can see on image (refer third image). On X axis there is a number of tickets and on Y axis there is a value based on which participant is placed on list.
This value is also dependent on luck factor I mentioned earlier. This luck factor should be determined once per participant and then its fair!
So where is the problem? The problem is that luck factor is computed not once per user but 5 * 1000 = 5000 times! If its selected 5000 times for every user then we really dont have much randomness and algorithm HEAVILY favorize those with largest number of tickets.
Why EXACTLY is it a problem? Because if lucky factor is computed that many times for each participant then its not any more luck factor and it could be simplified to constant number. And If its constant number then participants are being choosen ONLY based on number of tickets.
Is it all? Ohh guys, its not. Take a look at shuffled_eligible_participants method. There is a loop running 5000 times. It produces winners but it doesnt guarant uniqueness. Meaning some users will be choosen many times (thats why duplicated entries are removed later) and some users will not be choosen at all. 
My proposition: change algorithm so that luck factor is computed once for participant, sort list based on output and choose winners
EDIT: IN A COMMENT YOU CAN FIND SIMULATION I DID, WHICH DRASTICALY CHANGES MY FIRST FINDINGS

November 29, 2021

John Locke

Has there been any development? It has been close to a month now, for an issue that seems quite important for all participants.

Jaxeon King

I'd love an update on this.

chance tan

Hi polkastarter team, Any updates on this?? this seems a major issue for me. thanks!

Michał Łabuz

Hi Robin Ebers (Chief Strategy),

Its under review for more than 2 weeks, while it could be checked in one hour given all the information I provided. Its quite important bug, heavily affecting investors but still you guys seem to disregard it leaving community without updates. Its really disheartening

Michał Łabuz

Robin Ebers (Chief Strategy) 2 weeks later still no response.

Michał Łabuz

Hi Robin Ebers (Chief Strategy),

Do you have any update? Its been a week since your last post :)

Michał Łabuz

Hi Robin Ebers (Chief Strategy), just to summarize my analysis I am adding some vizualization

Penatron

I'm not sure I got the logic but deff prefer some1 from the team that is algo savvy to take a look.

Robin Ebers (Chief Strategy)

marked this post as

under review

Hi Michał Łabuz— thank you so much. We are currently looking into this in much detail and will revert as part of a larger refactoring of our lottery. There were sufficient reports to make us skeptical, too, and your research here helps us advance this.

Michał Łabuz

Robin Ebers (Chief Strategy): thanks a lot for reply! Waiting for your investigation

Michał Łabuz

Hi again! 
I decided to do some simulation and it appears that my initial findings are completely wrong! Current implementaion favorizes small accounts, instead of whales. For me its completely unintuitive but thats what numbers say. Below you can find my simulation results:
I assumed below configuration:
Total number of participants: 600
Number of winners: 50
Participants with 1 ticket: 180
Participants with 2 tickets: 120
Participants with 5 tickets: 120
Participants with 10 tickets: 60
Participants with 15 tickets: 60
Participants with 20 tickets: 24
Participants with 30 tickets: 18
Participants with 40 tickets: 18
Results for running 1500 times current algorithm:
winners with 1 ticket: 61040
winners with 2 tickets: 75882
winners  with 5 tickets: 161748
winners  with 10 tickets: 126396
winners  with 15 tickets: 151425
winners  with 20 tickets: 66171
winners with 30 tickets: 53097
winners with 40 tickets: 54233
Results for running 1500 times fixed version of algorithm:
winners with 1 ticket: 35299
winners with 2 tickets: 47258
winners  with 5 tickets: 114014
winners  with 10 tickets: 110319
winners  with 15 tickets: 160677
winners  with 20 tickets: 83321
winners with 30 tickets: 88373
winners with 40 tickets: 110731
Robin Ebers (Chief Strategy) please take a close look with the team.
Despite the fact I was wrong at the beginning, I still hold my view that algorithm is not working as expected. It doesnt matter who is favorized

Variant

Such a great analysis.

I hope Polkastarter team would be looking into this matter.

→