As I understand entropy, it measures how random the components in an architecture are: the more random components you use, the more complex the architecture becomes.
Informally, entropy is the average level of information gained from the possible outcomes of a random event.
The core idea of entropy is as follows:
- If a highly likely event occurs, the message carries very little information.
- Whereas if a highly unlikely event occurs, the message is much more informative.
For example, suppose it will rain on exactly one of the next 365 days. Being told that it is NOT going to rain on some particular day provides very little information about which day the rain falls, because that outcome was already highly probable. On the other hand, being told the exact day it's going to rain is highly informative, because it communicates the occurrence of a very low-probability event.
All that is to say, informational value and probability are inversely related. To describe the relationship, we define the informational value (or surprisal) of event [imath]i[/imath], [imath]I_i[/imath], as:
[math]I_i = \log\left(\frac{1}{f_i}\right), \text{ where } f_i = \frac{x_i}{\sum x_i} \text{ is the probability of event } i[/math]
The reason we use [imath]\log[/imath] is that when [imath]f_i = 1[/imath] (a certain event), the surprisal is zero, [imath]I_i = 0[/imath], and as the probability decreases toward 0, [imath]I_i[/imath] grows without bound.
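To make [imath]I_i[/imath] concrete, here is a minimal Python sketch of the rain example above. The function name surprisal and the base-2 logarithm (so the result is in bits) are my own choices; the formula itself leaves the log base unspecified.

[code]
import math

def surprisal(p):
    """Informational value I = log(1/p) of an event with probability p, in bits (log base 2)."""
    return math.log2(1.0 / p)

# Rain example: rain falls on exactly 1 of the next 365 days, chosen uniformly at random.
print(surprisal(1 / 365))    # ~8.51 bits -- learning the exact rainy day is very informative
print(surprisal(364 / 365))  # ~0.004 bits -- learning one dry day tells you almost nothing
[/code]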
Formally, the entropy measure is defined as the weighted average of all informational values gained:
[math]\text{Entropy} = \sum_i I_i \cdot f_i = \sum_i f_i \log\left(\frac{1}{f_i}\right), \text{ where the } f_i \text{ are the weights}[/math]
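And here is a small Python sketch of the entropy formula itself (again my own illustration, using base-2 logs): convert the raw counts [imath]x_i[/imath] into probabilities [imath]f_i[/imath], then take the weighted average of the surprisals.

[code]
import math

def entropy(counts):
    """Weighted average of surprisals: sum of f_i * log2(1/f_i), in bits."""
    total = sum(counts)
    freqs = [x / total for x in counts]                          # f_i = x_i / sum(x_i)
    return sum(f * math.log2(1.0 / f) for f in freqs if f > 0)   # zero counts contribute nothing

print(entropy([1, 1, 1, 1]))   # 2.0 bits: all outcomes equally likely -> maximum uncertainty
print(entropy([97, 1, 1, 1]))  # ~0.24 bits: one dominant outcome -> very little uncertainty
[/code]

Note how the uniform distribution gives the largest entropy: when every outcome is equally likely, you are maximally uncertain about what will happen.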
Understanding it on a fundamental level helps you see how it can be applied in different contexts, like the example @blamocur gave, enterprise architecture in your PowerPoint, decision trees in machine learning (see the sketch below), etc.
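For instance, in a decision tree a candidate split is typically scored by how much it reduces entropy (the "information gain"). A rough sketch of that idea, using the same entropy formula as above and a hypothetical binary split, might look like this:

[code]
import math

def entropy(counts):
    """Entropy (in bits) of the distribution given by raw class counts x_i."""
    total = sum(counts)
    return sum((x / total) * math.log2(total / x) for x in counts if x > 0)

def information_gain(parent, left, right):
    """Entropy reduction achieved by splitting a node into two child nodes."""
    n = sum(parent)
    weighted_children = (sum(left) / n) * entropy(left) + (sum(right) / n) * entropy(right)
    return entropy(parent) - weighted_children

# A split that cleanly separates two classes of 5 samples each removes all uncertainty:
print(information_gain([5, 5], [5, 0], [0, 5]))  # 1.0 bit gained
[/code]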