I have no idea what this branch of math is called, so I haven't looked to see whether an answer has already been provided here. If so, please direct me to it.
This is a two-part question for a software function I need to write.
I'm trying to:
1) pre-determine how many n% unique ways there are to intermix items from various lists, and
2) produce output of only those combinations that are n% unique
For example, assume 8 lists with varying items:
- Size list (3 items): small, medium, large
- Color list (7 items): orange, blue, red, purple, yellow, green, black
- Weight list (7 items): 4oz, 12oz, 1lb, 4lb, 10lb, 15lb
- Age list (5 items): ancient, antique, very old, old, new
- Temp list (3 items): cold, warm, hot
- Animal list (6 items): dog, cat, mouse, bear, horse, zebra
- Thing list (4 items): ball, chair, TV, floor, lamp
- Action list (3 items): run, jump, hop
With these 8 lists there are 3*7*7*5*3*6*4*3 = 158,760 possible unique outputs, e.g.:
1) small orange 4oz ancient cold dog ball run
2) small orange 4oz ancient cold dog ball jump
3) small orange 4oz ancient cold dog ball hop
4) small orange 4oz ancient cold dog chair run
5) small orange 4oz ancient cold dog chair jump
6) small orange 4oz ancient cold dog chair hop
... etc
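As a quick check of the arithmetic above, here is a minimal sketch that multiplies the list sizes together. It uses the item counts as stated for each list; the variable names are mine and purely illustrative.

```python
from math import prod

# Item counts as stated above, in column order:
# Size, Color, Weight, Age, Temp, Animal, Thing, Action.
list_sizes = [3, 7, 7, 5, 3, 6, 4, 3]

# One item is picked from every list, so the number of distinct
# outputs is the product of the list sizes.
total_outputs = prod(list_sizes)
print(total_outputs)  # 158760
```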
Each list occupies one column in the output, so with 8 columns each column accounts for 12.5% (1/8) of an output's uniqueness or sameness.
Comparing any of the first 3 outputs to each other, each pair is only 12.5% unique, since they are identical except for the Action column (run, jump, hop). The same is true if you compare outputs 4 through 6 with each other.
If we compare output 1 to output 4, it is again only 12.5% unique, since they differ only in the Thing column (ball vs. chair). If we compare output 1 to outputs 5 or 6, it is 25% unique: the first 6 columns are identical and only the last 2 differ (ball run vs. chair jump and chair hop).
If I were looking for combinations that were at least 25% unique, from these first 6 I could choose (1,5), (1,6), (2,4), (2,6), (3,4) or (3,5).
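To make the comparison I did by hand concrete, here is a rough sketch of it in Python. It treats "percent unique" as the share of columns in which two outputs differ; `percent_unique` and the other names are hypothetical helpers of my own, not from any library.

```python
from itertools import combinations

# The six example outputs, one item per column.
prefix = ["small", "orange", "4oz", "ancient", "cold", "dog"]
outputs = {
    1: prefix + ["ball", "run"],
    2: prefix + ["ball", "jump"],
    3: prefix + ["ball", "hop"],
    4: prefix + ["chair", "run"],
    5: prefix + ["chair", "jump"],
    6: prefix + ["chair", "hop"],
}

def percent_unique(a, b):
    """Percentage of columns in which two equal-length outputs differ."""
    if len(a) != len(b):
        raise ValueError("outputs must have the same number of columns")
    differing = sum(x != y for x, y in zip(a, b))
    return 100 * differing / len(a)

# All pairs of example outputs that are at least 25% unique.
pairs = [
    (i, j)
    for i, j in combinations(sorted(outputs), 2)
    if percent_unique(outputs[i], outputs[j]) >= 25
]
print(pairs)  # [(1, 5), (1, 6), (2, 4), (2, 6), (3, 4), (3, 5)]
```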
The first question: Is there a formula, recursive or not, that can determine how many combinations are at least n% unique? It would need to work with varied data sets, each having a different number of lists, numbering in the hundreds, with each list containing from 1 to hundreds, possibly thousands, of items.
The second question: Is there an algorithm that would generate only n% unique output? That is, instead of generating all possible combinations and then going back to compare them (as I did manually in the example above), it would determine, as each output row is being generated, which columns should be populated with which item from each list.
My fallback is to generate all possible combinations and then programmatically compare each output to all the others, which could take quite a long time to run. Any assistance that could speed up that effort would be greatly appreciated.
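For reference, this is roughly what I mean by the fallback, as a minimal sketch. It assumes a greedy rule (keep an output only if it differs enough from every output already kept), which is my own choice for illustration: the result depends on the enumeration order and is not necessarily the largest possible set. The function name `greedy_unique_outputs` is hypothetical.

```python
from itertools import product

def greedy_unique_outputs(lists, min_percent):
    """Enumerate every combination and keep those that are at least
    min_percent unique relative to all previously kept outputs."""
    n_columns = len(lists)
    kept = []
    for candidate in product(*lists):
        # Compare against every kept output; integer arithmetic avoids
        # rounding issues when testing the percentage threshold.
        if all(
            100 * sum(x != y for x, y in zip(candidate, other))
            >= min_percent * n_columns
            for other in kept
        ):
            kept.append(candidate)
    return kept

# Cut-down version of the example: the first six columns are fixed,
# only Thing (ball, chair) and Action vary, giving outputs 1 to 6.
lists = [
    ["small"], ["orange"], ["4oz"], ["ancient"], ["cold"], ["dog"],
    ["ball", "chair"],
    ["run", "jump", "hop"],
]
kept = greedy_unique_outputs(lists, 25)
for row in kept:
    print(" ".join(row))
# small orange 4oz ancient cold dog ball run
# small orange 4oz ancient cold dog chair jump
```

The work here grows with the number of combinations times the number of kept outputs, which is why I expect it to be slow on the real data.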