Fano & Huffman Algorithms

clytaemnestra

Hi, I've got a task to implement the Fano and Huffman algorithms on a given list of probabilities and to calculate the optimal code length. I've done it in Excel (screenshots attached; if needed I can attach the sheet as well), but I'm not sure whether my results are correct.

Two colleagues of mine got the same length with both algorithms, but I got different lengths (1.76 and 1.77). Have I made a mistake somewhere? I'm not really sure about the codes in Huffman's algorithm, though; it took me quite a lot of time to figure out.

Thank you in advance.

order   probability
x1      0.01
x2      0.01
x3      0.01
x4      0.04
x5      0.02
x6      0.01
x7      0.76
x8      0.01
x9      0.01
x10     0.05
x11     0.05
x12     0.02

Fano algorithm:
[screenshots attached]
 
I've implemented the algorithms on my end and got the same results you did, assuming the "length" is the average code length: the number of bits in each symbol's code multiplied by that symbol's frequency, summed over all symbols.
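As a concrete check (the expectedLength helper below is my own, hypothetical addition, not something from the task), that length can be computed from any array of { weight, code } pairs, such as the code tables later in this post:
JavaScript:
// Average code length: sum over all symbols of (bits in the code) * (symbol frequency).
function expectedLength(coded) {
    let total = 0;
    for (let item of coded)
        total += item.weight * item.code.length;
    return total;
}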

Several of the symbols have the same frequencies, so the exact binary codes can differ depending on how ties are broken when sorting. That shouldn't affect the total data length.

Using JavaScript, I produced an array of nodes for testing with:
JavaScript:
let nodes = [
    { text: "x1" , weight: 0.01 },
    { text: "x2" , weight: 0.01 },
    { text: "x3" , weight: 0.01 },
    { text: "x4" , weight: 0.04 },
    { text: "x5" , weight: 0.02 },
    { text: "x6" , weight: 0.01 },
    { text: "x7" , weight: 0.76 },
    { text: "x8" , weight: 0.01 },
    { text: "x9" , weight: 0.01 },
    { text: "x10", weight: 0.05 },
    { text: "x11", weight: 0.05 },
    { text: "x12", weight: 0.02 }
];

Both algorithms will be sorting these lists and/or sub-lists in descending order of weight. This can be done in the following way:
JavaScript:
nodes.sort((a,b) => b.weight - a.weight);


Shannon-Fano coding uses the following algorithm:
  • Sort the list of symbols in descending order of frequency
  • Divide the items into two parts, such that the sums of the weights of each part are as close to equal as possible
  • The first part is prefixed with 0 and the second part is prefixed with 1
  • Continue the previous two steps for each of the new sub-lists, and so on, until every item has its own bit sequence
My JavaScript implementation is as follows:
JavaScript:
function shannonFano(items) {

    // The sub-list contains one item
    if (items.length == 1)
        return items[0];

    // Ensure the items are sorted
    items.sort((a,b)=>b.weight - a.weight);

    // Sum the weights of all items in the sub-list
    let totalWeight = 0;
    for (let item of items)
        totalWeight += item.weight;

    // Locate the optimal pivot position
    let pivot   = 0;
    let left    = 0;
    let right   = totalWeight;
    let winDiff = totalWeight;
    for (let x = 1; x < items.length; x++) {

        // "Move" one item from the right sub-list to the left sub-list
        let weight = items[x - 1].weight;
        left      += weight;
        right     -= weight;

        // Compare the difference in total weights of the potential sub-lists
        let diff = Math.abs(right - left);
        if (winDiff <= diff)
            continue;

        // Record a new potential pivot position
        winDiff = diff;
        pivot   = x;
    }

    // Produce a node whose children are the sub-lists about the pivot
    return [
        pivot == 1 ? items[0] :
            shannonFano(items.slice(0, pivot)),
        pivot == items.length - 1 ? items[pivot] :
            shannonFano(items.slice(pivot))
    ];
}
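The function returns a nested-array tree rather than the code table itself, so the bit strings still have to be read off by walking the tree. Here is a minimal sketch of one way to do that (the collectCodes helper is my own, hypothetical addition, not part of the original post); it works because every internal node keeps its children at positions 0 and 1, while every leaf still carries its text property:
JavaScript:
// Walk the tree recursively, appending "0" for the first child and "1" for the
// second, and record the finished bit string at every leaf.
function collectCodes(node, prefix = "", out = []) {
    if (node.text !== undefined) {
        // Leaf: one of the original { text, weight } symbols
        out.push({ text: node.text, weight: node.weight, code: prefix });
        return out;
    }
    collectCodes(node[0], prefix + "0", out);
    collectCodes(node[1], prefix + "1", out);
    return out;
}

For example, collectCodes(shannonFano(nodes)) yields one { text, weight, code } entry per symbol.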

This produces the following binary codes:
Code:
x1  0.01 11011
x2  0.01 11100
x3  0.01 11101
x4  0.04 1011
x5  0.02 1100
x6  0.01 11110
x7  0.76 0
x8  0.01 111110
x9  0.01 111111
x10 0.05 100
x11 0.05 1010
x12 0.02 11010

Calculating the total data length yields 1.77.


Huffman coding uses the following algorithm:
  • Sort the list of symbols in descending order of frequency
  • Remove the two least-frequent symbols from the list
  • Produce a node whose children are the removed symbols; prefix the left child with 0 and the right child with 1
  • Add the node back to the list as a new symbol whose frequency is the sum of its children's
  • Repeat the previous four steps until only 2 symbols remain. These symbols are prefixed with 0 for the most-frequent and 1 for the least-frequent
My JavaScript implementation is as follows:

JavaScript:
function huffman(items) {

    // Process items until only 2 remain
    while (items.length > 2) {

        // Ensure the items are ordered by frequency
        items.sort((a,b) => b.weight - a.weight);

        // Produce a node from the two least-frequent items
        let right = items.pop();
        let left  = items.pop();
        items.push({
            [0]   : left,
            [1]   : right,
            weight: left.weight + right.weight
        });

    }

    // Return the remaining nodes
    return items;
}
This produces the following binary codes:
Code:
x1  0.01 100110
x2  0.01 100111
x3  0.01 100100
x4  0.04 1000
x5  0.02 1011
x6  0.01 100101
x7  0.76 0
x8  0.01 101010
x9  0.01 101011
x10 0.05 110
x11 0.05 111
x12 0.02 10100
Calculating the total data length yields 1.76.
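For reference, the whole pipeline can be reproduced with the hypothetical collectCodes and expectedLength helpers sketched earlier in this post (again, my own additions, not part of the original task):
JavaScript:
// Build the Huffman tree from a fresh copy of the node list (huffman() consumes
// its input array), read the codes off the tree, and total the average length.
let tree  = huffman(nodes.map(n => ({ ...n })));
let codes = collectCodes(tree);
console.log(codes);                            // one { text, weight, code } entry per symbol
console.log(expectedLength(codes).toFixed(2)); // should print "1.76"

The exact bit patterns can differ from the table above when equal weights are sorted differently, but the total should still come out to 1.76.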
 
Two colleagues of mine got the same length in both algorithms [...]

If you're concerned about the fact that the data lengths are different between the two algorithms, bear in mind that Shannon-Fano does not always produce code words of optimal length, while Huffman does. It's certainly possible to produce optimal code words with Shannon-Fano depending on how you sort and split the lists, which may be what happened when your colleagues ran their tests.
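In fact, the 0.01 gap is visible directly in the two code tables above: Huffman gives x11 a 3-bit code where Shannon-Fano gave it 4 bits (saving 0.05 * 1 = 0.05 bits of expected length), at the price of one extra bit on each of x1, x2, x3 and x6 (costing 4 * 0.01 = 0.04), for a net saving of 0.01, which is exactly the difference between 1.77 and 1.76.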
 