# Sampling in Python according to a discrete probability distribution

Keywords: Python

### 1, Probability list + sample list

Task description: we often have a probability list and a sample list, where each probability is the chance that the corresponding sample is selected and the probabilities sum to 1. For example, take [0.7, 0.2, 0.1] and ['iron man', 'Captain America', 'Thor']. The elements of the two lists correspond one to one, so together they state that 'iron man' is selected with probability 0.7, 'Captain America' with probability 0.2, and 'Thor' with probability 0.1. Our goal is to draw one sample from ['iron man', 'Captain America', 'Thor'] according to a discrete probability distribution such as [0.7, 0.2, 0.1] (drawing several samples is of course also possible).
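The two conditions in the task description (lists of equal length, probabilities summing to 1) can be checked before sampling. The following is only a sketch: the helper name `check_distribution` and the tolerance of `1e-9` are assumptions made for illustration, not part of the original code.

```python
import math


def check_distribution(probabilities, samples):
    """Raise ValueError if the two lists do not describe a valid discrete distribution."""
    if len(probabilities) != len(samples):
        raise ValueError('probability list and sample list must have the same length')
    if any(p < 0 for p in probabilities):
        raise ValueError('probabilities must be non-negative')
    # floating-point sums are rarely exactly 1, so compare with a tolerance
    if not math.isclose(sum(probabilities), 1.0, abs_tol=1e-9):
        raise ValueError('probabilities must sum to 1')


# passes silently for the example from the task description
check_distribution([0.7, 0.2, 0.1], ['iron man', 'Captain America', 'Thor'])
```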

In fact, this task is quite simple to implement in Python with `random.choices` from the standard library (available since Python 3.6). See the code for details.

code

```python
import random

# input: probability distribution and the corresponding samples
list_probability = [0.005, 0.015, 0.08, 0.25, 0.3, 0.25, 0.08, 0.015, 0.005]
list_player_role = ['Black widow', 'Spider-Man', 'The Incredible Hulk', 'Thor', 'Iron Man', 'Dr. strange', 'Captain America', 'panther', 'Eagle eye']

# sampling: random.choices returns a list of k items, so take element 0
result = random.choices(list_player_role, weights=list_probability, k=1)[0]
# output: one sample drawn according to the probability distribution
print(result)

# check whether the sampling follows the probability distribution
frequency = [0] * len(list_player_role)
trying_times = 100000
for _ in range(trying_times):
    result = random.choices(list_player_role, weights=list_probability, k=1)[0]
    # count the draw under the index of the sampled role;
    # list.index raises ValueError if the result is not a known role
    frequency[list_player_role.index(result)] += 1

for i in range(len(frequency)):
    print('Role:%s\t probability: %.3f\t frequency: %d/%d=%.4f' % (list_player_role[i], list_probability[i], frequency[i], trying_times, frequency[i]/trying_times))
```

output

Iron Man
Role:Black widow     probability: 0.005     frequency: 489/100000=0.0049
Role:Spider-Man     probability: 0.015     frequency: 1558/100000=0.0156
Role:The Incredible Hulk     probability: 0.080     frequency: 8011/100000=0.0801
Role:Thor     probability: 0.250     frequency: 25094/100000=0.2509
Role:Iron Man     probability: 0.300     frequency: 29957/100000=0.2996
Role:Dr. strange     probability: 0.250     frequency: 24958/100000=0.2496
Role:Captain America     probability: 0.080     frequency: 7867/100000=0.0787
Role:panther     probability: 0.015     frequency: 1551/100000=0.0155
Role:Eagle eye     probability: 0.005     frequency: 515/100000=0.0052

It can be seen that each frequency in the output is close to its corresponding probability, which shows that the sampling does follow the probability distribution we specified.
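The same kind of frequency check can be written more compactly. The sketch below is one possible variant rather than the code used for the output above: it draws all samples in a single `random.choices` call with `k` set to the number of trials and counts them with `collections.Counter`. The helper name `empirical_frequencies` is made up for illustration, and the three-element lists are the small example from the task description.

```python
import random
from collections import Counter


def empirical_frequencies(samples, weights, trials):
    """Draw `trials` samples in one call and return each sample's relative frequency."""
    # random.choices samples with replacement, so one call with k=trials
    # follows the same distribution as `trials` separate calls with k=1
    draws = random.choices(samples, weights=weights, k=trials)
    counts = Counter(draws)
    # a Counter returns 0 for samples that were never drawn
    return {sample: counts[sample] / trials for sample in samples}


list_probability = [0.7, 0.2, 0.1]
list_player_role = ['iron man', 'Captain America', 'Thor']
frequencies = empirical_frequencies(list_player_role, list_probability, 100000)
for role, probability in zip(list_player_role, list_probability):
    print('Role:%s\t probability: %.3f\t frequency: %.4f' % (role, probability, frequencies[role]))
```

The printed frequencies vary from run to run, but with 100000 trials they should land close to the specified probabilities.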

### 2, Probability list only

Task description: no sample list is given, only a probability list, and the output is an index into the probability list. For example, with the input [0.7, 0.2, 0.1], an output of 1 means the entry with probability 0.2 was drawn, an output of 2 means the entry with probability 0.1 was drawn, and an output of 0 means the entry with probability 0.7 was drawn.

code

```python
import random

# input: probability distribution
list_probability = [0.005, 0.015, 0.08, 0.25, 0.3, 0.25, 0.08, 0.015, 0.005]

# sampling: draw an index, weighted by the probability stored at that index
index = list(range(len(list_probability)))
probability_index = random.choices(index, weights=list_probability, k=1)[0]

# output: one index sampled according to the probability distribution
print(probability_index)
```

output

5
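To make it clearer what drawing an index from a probability list involves, here is a sketch of the same task done by hand with cumulative sums (inverse transform sampling). It is an illustrative alternative under stated assumptions, not the method used in the code above; the helper name `sample_index` is hypothetical.

```python
import bisect
import itertools
import random


def sample_index(probabilities):
    """Return an index i with probability probabilities[i] (hypothetical helper)."""
    # cumulative sums, e.g. [0.7, 0.2, 0.1] -> [0.7, 0.9, 1.0] (up to rounding)
    cumulative = list(itertools.accumulate(probabilities))
    # uniform draw in [0, total); scaling by the total tolerates rounding error in the sum
    u = random.random() * cumulative[-1]
    # first position whose cumulative sum is strictly greater than u
    index = bisect.bisect_right(cumulative, u)
    # guard against u landing on the last boundary through rounding
    return min(index, len(probabilities) - 1)


list_probability = [0.7, 0.2, 0.1]
probability_index = sample_index(list_probability)
# print the sampled index and the probability it stands for
print(probability_index, list_probability[probability_index])
```

Each index owns a slice of the interval [0, 1) whose width equals its probability, so a uniform draw falls into slice `i` with exactly that probability.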

The sampling with `random.choices` above was only tested on plain Python lists. Open source libraries such as numpy and pytorch presumably provide corresponding methods as well.
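As an illustration of the NumPy route, the sketch below uses `numpy.random.Generator.choice`, whose `p` parameter takes the probability list. It assumes NumPy is installed and was not part of the original test; the seed value is arbitrary. PyTorch offers `torch.multinomial` for the same purpose, which is not shown here.

```python
import numpy as np

list_probability = [0.005, 0.015, 0.08, 0.25, 0.3, 0.25, 0.08, 0.015, 0.005]
list_player_role = ['Black widow', 'Spider-Man', 'The Incredible Hulk', 'Thor', 'Iron Man', 'Dr. strange', 'Captain America', 'panther', 'Eagle eye']

# seed chosen arbitrarily so that repeated runs give the same draws
rng = np.random.default_rng(seed=0)

# probability list only: an integer first argument means "choose from range(n)"
probability_index = int(rng.choice(len(list_probability), p=list_probability))
print(probability_index)

# probability list + sample list: sample the role itself
result = str(rng.choice(list_player_role, p=list_probability))
print(result)

# several samples in one call via the size parameter
indices = rng.choice(len(list_probability), size=5, p=list_probability)
print(indices.tolist())
```

Unlike `random.choices`, which accepts arbitrary relative weights, `Generator.choice` expects `p` to sum to 1.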


Posted by project168 on Tue, 26 Oct 2021 06:27:52 -0700