Hi guys,
I'm a little stuck in my research, and I don't have enough math/statistics background to tackle it myself.
I need to calculate (estimate) mutual information from my data in order to find dependencies between variables. When the variables are all discrete this is fairly doable by just counting frequencies. Handling all-continuous variables is also doable to my knowledge (with covariance matrices etc.). But what about cases where the variables in my data set can be both discrete and continuous?
For example (each line is one solution vector)
0.2 0 0 1
1.2 0 0 0
3.1 0 0 0
0.4 0 0 1
...
where the first variable is real and the remaining three are, in this case, binary. In this example the first two discrete variables are independent of the real value (and, for simplicity, always 0), while the third one depends on it: if r < 1.0 this discrete value is 1, otherwise 0.
With enough data points I should be able to detect (calculate) this relationship, but I'm not sure how to go about it. Of course, the problems I'm interested in are MUCH larger. I just haven't found (or understood) a methodology for estimating mutual information in this mixed case.
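To make the toy example concrete, here is a minimal Python sketch of the only approach I can think of: cut the real variable into bins and then fall back on the frequency-counting estimate for discrete variables. Everything in it is an assumption for illustration: the uniform range 0 to 4 for r, the sample size, the choice of 10 equal-frequency bins, and the helper names `discrete_mi` and `binned_mi`. I'm not claiming binning is the right way to do this, since the result depends on the bin count.

```python
import numpy as np

def discrete_mi(x, y):
    """Plug-in (frequency-count) mutual information in bits for two discrete sequences."""
    _, xi = np.unique(np.asarray(x).ravel(), return_inverse=True)
    _, yi = np.unique(np.asarray(y).ravel(), return_inverse=True)
    xi, yi = xi.reshape(-1), yi.reshape(-1)
    # Joint frequency table, normalised to a joint probability estimate.
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)   # marginal of x, shape (k, 1)
    py = joint.sum(axis=0, keepdims=True)   # marginal of y, shape (1, m)
    nz = joint > 0                          # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px @ py)[nz])))

def binned_mi(r, d, bins=10):
    """Naive mixed-case estimate: equal-frequency binning of r, then plug-in MI."""
    r = np.asarray(r, dtype=float)
    # Interior quantiles serve as bin edges (assumed binning scheme).
    edges = np.quantile(r, np.linspace(0.0, 1.0, bins + 1)[1:-1])
    return discrete_mi(np.digitize(r, edges), d)

# Synthetic data following the toy example (distribution of r is assumed).
rng = np.random.default_rng(0)
n = 5000
r = rng.uniform(0.0, 4.0, size=n)
d1 = np.zeros(n, dtype=int)        # constant, carries no information about r
d3 = (r < 1.0).astype(int)         # 1 if r < 1.0, otherwise 0

mi_const = binned_mi(r, d1)
mi_dep = binned_mi(r, d3)
print("MI(r, constant variable):", mi_const)
print("MI(r, dependent variable):", mi_dep)
```

The constant variable should come out at zero and the dependent one clearly above zero, but the estimate for the dependent one is lowered by whichever bin happens to straddle the threshold at 1.0, which is exactly the kind of sensitivity I don't know how to handle properly.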
Thanks in advance, and in case it's not clear, I'm a stats n00b ;)
Edit: sorry if this is misplaced. I wasn't sure whether this is more of a probability or an advanced math forum question.