Hi all. Firstly, thanks for such a fantastic site. I haven't studied maths for 20 years, so I'm a bit rusty. I’m not sure, but I think it’s a statistics problem, or perhaps a sampling problem.
Anyway, the problem is as follows:
Say I have 1000 ‘bins’ of data. Each bin holds a series of numbers, and each number can be anything from 1 to 100. The bins do not all hold the same number of numbers: some may have 1, some may have 5, etc. How many numbers a bin holds, and what their values are, is basically random. I know which numbers are in each bin, and I can calculate the count of unique numbers for each bin. The question is: how can I calculate the minimum number of bins I would need to sample in order to find out the total number of *unique* numbers across all of the bins?
An example:
Bin1: 5, 6, 2, 6 – 3 unique numbers in the bin (6 is duplicated in the bin)
Bin2: 1, 2, 18, 98, 18, 6 – 5 unique numbers in the bin (18 is duplicated in the bin)
Bin3: 5, 2, 27, 55, 23, 27 – 5 unique numbers in the bin (27 is duplicated in the bin)
Total unique numbers for the above is 9 (because 2, 5 and 6 each appear in more than one bin), and in this case I would only need to sample Bin2 and Bin3 to get all the unique numbers across all bins. All numbers in Bin1 are duplicated in other bins.
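In case it helps anyone check the example, here is a small Python sketch of it. Storing each bin as a set is my own choice of representation (duplicates inside a bin collapse automatically), and the variable names are just for illustration:

```python
# Each bin is stored as a set, so repeated values inside a bin are dropped.
bins = {
    "Bin1": {5, 6, 2, 6},
    "Bin2": {1, 2, 18, 98, 18, 6},
    "Bin3": {5, 2, 27, 55, 23, 27},
}

# Count of unique numbers inside each individual bin
unique_per_bin = {name: len(values) for name, values in bins.items()}

# Every unique number that appears in at least one bin
all_unique = set().union(*bins.values())

# Numbers obtained by sampling only Bin2 and Bin3
covered_by_2_and_3 = bins["Bin2"] | bins["Bin3"]

print(unique_per_bin)
print(len(all_unique))
print(covered_by_2_and_3 == all_unique)
```

The last line prints whether Bin2 and Bin3 on their own already contain every number seen in any bin.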
So, is there a way to ‘solve’ the problem of finding the minimum number of bins to sample to get all unique numbers across the bins?
A slight variation on this is: I am only prepared to sample 10 bins – which bins should I sample to get the most unique numbers?
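From what I can tell, the first question is usually called the set cover problem and the 10-bin variation the maximum coverage problem. Finding the true minimum is NP-hard in general, so a common heuristic is a greedy one: keep picking the bin that adds the most numbers not seen so far. Below is a rough Python sketch of that idea. The names `greedy_pick` and `max_bins` are made up for illustration, and the greedy result is not guaranteed to be the smallest possible set of bins:

```python
def greedy_pick(bins, max_bins=None):
    """Repeatedly pick the bin that adds the most not-yet-seen numbers.

    bins: dict mapping a bin name to an iterable of numbers.
    max_bins: stop after this many picks (None means keep going
              until every number has been covered).
    Returns (bin names in pick order, set of numbers covered).
    """
    sets = {name: set(values) for name, values in bins.items()}
    target = set().union(*sets.values())  # every unique number in any bin
    chosen, covered = [], set()
    while covered != target and (max_bins is None or len(chosen) < max_bins):
        # Bin contributing the largest number of values not covered yet
        best = max(sets, key=lambda name: len(sets[name] - covered))
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered


example = {
    "Bin1": [5, 6, 2, 6],
    "Bin2": [1, 2, 18, 98, 18, 6],
    "Bin3": [5, 2, 27, 55, 23, 27],
}

# Heuristic answer to "how few bins cover everything?"
print(greedy_pick(example))

# Heuristic answer to "which bins should I pick if I may only sample 1?"
print(greedy_pick(example, max_bins=1))
```

For the 1000-bin case, the same function would be called with `max_bins=10` for the variation. Ties between equally good bins are broken by whichever comes first in the dictionary.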
Thanks in advance to all you maths wizards. I truly envy your talents!!