首页 > 代码库 > 关于HuffmanCoding的简单分析

关于HuffmanCoding的简单分析

 

 

1.what‘s problem we faced?

/**
*    Q: what‘s problem we faced?
*
*    A: Data compression is still a problem, even now. we want to compress
*        the space of data. This desire is more and more stronger when we
*        need to deal with some operation about data transmission. Before
*        we start this article, it may be helpful if you try to provide a valid way
*        to compress data . I tried, but failed obviously. That why I write this
*        article. ^_^
*/

 

2. How can I solve it?

/**
*    Q: How can I solve it?
*
*    A: Where have problem is where have an answer, although it not always
*        the best one. In 1951, a algorithm was introduced by David A. Huffman.
*        It is different from the normal code and is a variable length code, which
*        have different length of code for different symbol. Now, there are two
*        problems:
*
*        No.1: is  variable length code possible? How can we know the length
*                of current symbol?
*
*                The answer is prefix code. Think about this, a tree like following:
*
*                                        
*                                         O
*                                   1 /     \ 0
*                                    O       O
*                               1 /    \ 0   c
*                                O      O
*                                a       b
*
*                This is a simple binary tree. There are three leaf node: a, b ,and c.we
*                label all of left branch as 1, and all of right branch as 0. So if we want
*                to arrive the leaf node a, the path is 11. In a similar way, we can get
*                all of nodes:
*                        a : 11
*                        b : 10
*                        c : 0
*
*                By accident, we get a variable length code.
*
*
*        No.2: How can we use variable length code to compress a series of symbol?
*
*                Now that we have a ability about variable length code. Some funny thing
*                will happen. Image this, In a data, which consist of a series of symbols,
*                some of symbols have occur at high proportion. some of symbols has occur
*                at low proportion. If we use some shorter code to indicate those symbols
*                which have a high proportion, the space of data will smaller than ever.
*                That is what we want.
*
*        Now, we have been know that we could compress a data by use variable length
*        code. However, the next problem is what kind of variable length code is what we
*        want. what kind of code is optimal ?
*/

 

3. What is HuffmanCoding ?

/**
*    Q: What is HuffmanCoding ?
*
*    A:Now,the problem is how can I create a optimal tree ? Do you have any idea?
*        Huffman was introduced a algorithm. It is looks like greedy algorithm. It is may
*        be simple, but the result is valid( this will be demonstrated below). The simplest
*        construction algorithm use a priority queue where the node with lowest probability
*        is given highest priority, the steps as following:
*
*        1. create a leaf node for each symbol, and add it to the priority queue.
*        2. while there is more than one node in the queue:
*            1. remove two nodes that have the highest priority.
*            2. create a new node as the parent node of the two nodes above. the
*                probability of this one is equal to the sum of the two nodes‘ probabilities.
*            3. add the new node to the queue.
*        3. the remaining node is the root of this tree. Read it‘s code as we do above.
*
*/

 

4. is it optimal ?

/**
*    Q: is it optimal ?
*
*    A: Hard to say. I haven‘t a valid method to measure this. About this issue, it is necessary to hear
*        about other people‘s advice. I believe there must be some exciting advice. By the way, this article
*        is just talk about compress of independent symbol, another important issue is about related symbol.
*        That maybe a serious problem.
*
*/

 

5. source code

/***    Here is an simple example*/#include <stdio.h>#include <iostream>/***    In a Huffman tree, some of nodes is valid symbol, and other is a combine node, which*    haven't a valid symbol. we need to label it in our nodes.*/enum ELEM_TYPE {        ET_VALID,        ET_INVALID,        ET_MAX,};typedef int    INDEX;/***    this is a container, we push all of element to it, and pop element by a priority. It is*    a class template since we don't know the type of data element.*/template <class ELEM>class Container {        public:                Container( int capacity);                ~Container( );                /*            *    push a element to this container.            */                bool push( ELEM item);                /*            *    pop a element from this container, the smallest one have the most priority.            *    Of course, the element must have provide a reload function for operator '<'.            */                bool pop( ELEM &item );		        private:                bool _find_idle( INDEX &num);                bool _set_elem( INDEX num, ELEM &elem);                bool _get_elem( INDEX num, ELEM &elem);		                ELEM                *ele;                ELEM_TYPE    *stat;                int                        cap;};template <class ELEM>Container<ELEM>::Container(  int capacity){        this->ele = new ELEM[capacity] ;        this->stat = new ELEM_TYPE[capacity];        int        i;        for( i=0; i<capacity; i++)                this->stat[i] = ET_INVALID;        this->cap = capacity ;}template <class ELEM>Container<ELEM>::~Container(  ){        if( this->ele!=NULL )                delete []this->ele;        if( this->stat!=NULL )                delete []this->stat;        this->cap = 0;}template <class ELEM>bool Container<ELEM>::push( ELEM item){        INDEX        num = -1;        if( (!this->_find_idle( num))                ||(!this->_set_elem( num, item)))                return false;        return true;}template <class ELEM>bool Container<ELEM>::pop( ELEM &item ){        INDEX    i = 0;        INDEX    Min;        /*       *    find the first valid element.       */        while( (this->stat[i]!=ET_VALID)                        &&( i<this->cap))                            i++;        for( Min = i ; i<this->cap; i++)        {                if(  ( this->stat[i]==ET_VALID)                      &&( this->ele[i]<this->ele[Min]))                    {                            Min = i;                    }        }        return this->_get_elem( Min, item);}template <class ELEM>bool Container<ELEM>::_find_idle( INDEX &num){        INDEX        i;        for( i=0; i<this->cap; i++)        {                if( this->stat[i]==ET_INVALID )                {                        num = i;                        return true;                }        }        return false;}template <class ELEM>bool Container<ELEM>::_set_elem( INDEX num, ELEM &elem){        if( (num>=this->cap)                ||(num<0) )                    return false;        this->stat[num] = ET_VALID;        this->ele[num] = elem;        return true;}template <class ELEM>bool Container<ELEM>::_get_elem( INDEX num, ELEM &elem){        if( (num<0)                ||(num>=this->cap))                    return false;        this->stat[num] = ET_INVALID;        elem =  this->ele[num];        return true;}/***    define a type of symbol. It will be used to record all information about a symbol.*/typedef char SYMINDEX;typedef int SYMFRE;class Symbol {        public:                /*            *    In the Huffman tree, we need to compute the sum of two child symbol.            *    For convenience,build a reload function is necessary.            */                Symbol operator + ( Symbol &s);                SYMINDEX        sym;                SYMFRE            freq;};Symbol Symbol::operator +( Symbol &s){        Symbol        ret;        ret.sym = '\0';        ret.freq = this->freq + s.freq;        return ret;}/***    define a node of binary tree. It will be used to create a Huffman tree.*/class HTreeNode {        public:                /*            *    In the container, we need compare two nodes. So this node must            *    provide a reload function about '<'.            */                bool operator< ( HTreeNode &n);                HTreeNode        *lchild;                HTreeNode        *rchild;                Symbol                sym;};bool HTreeNode::operator < ( HTreeNode &n){        return this->sym.freq<n.sym.freq? true: false;}/***    This is the core structure. It will build a Huffman coding based on our input symbol.*/class HuffmanCoding {        public:                HuffmanCoding( );                ~HuffmanCoding( );                bool Set( Symbol s[], int num);                bool Work( void);	private:                /*            *    create a Huffman tree.            */                bool CreateTree(Symbol s[], int num );                bool DestroyTree( );                /*            *    read Huffman coding from a Huffman tree.            */                bool ReadCoding( );                bool TravelTree( HTreeNode *parent, char *buf, INDEX cur);                Symbol                *sym ;                int                        sym_num ;                HTreeNode        *root ;};HuffmanCoding::HuffmanCoding( ){        this->sym = NULL;        this->sym_num = 0;        this->root = NULL;}HuffmanCoding::~HuffmanCoding( ){        if( this->sym!=NULL)                delete []this->sym;        this->sym_num = 0;        this->DestroyTree( );}/***    receive data from outside. Actually, this function is not necessary.But for make the *    algorithm looks like more concise,maybe this function is  necessary.*/bool HuffmanCoding::Set( Symbol s [ ], int num){        this->DestroyTree( );        this->sym = new Symbol[num];        for( int i=0; i<num; i++)                this->sym[i] = s[i];        if( NULL!=this->sym)        {                this->sym_num = num;                return true;        }        else        {                this->sym_num = 0;                return false;        }}/***    The core function. In this function, we create a Huffman tree , then read it.*/bool HuffmanCoding::Work( void){	        //Create a Huffman tree	        if( !this->CreateTree( this->sym, this->sym_num))                return false;        //read Huffman coding        if( !this->ReadCoding( ))                return false;        return true;}bool HuffmanCoding::CreateTree( Symbol s[], int num){         /*       *    create a priority tank. It always pop the element of the highest priority in the tank.       */        Container<HTreeNode>	tank(num);        for( int i=0; i<this->sym_num; i++)        {                HTreeNode        node;                node.lchild = NULL;                node.rchild = NULL;                node.sym = s[i];                tank.push( node);        }        /*       *    always pop two nodes, if fail, that's means there is only one node remain and it       *    is the root node of this Huffman tree.       */        HTreeNode        node1;        HTreeNode        node2;        while(  tank.pop( node1)                        && tank.pop( node2) )        {                HTreeNode        parent;                parent.lchild = new HTreeNode;                parent.rchild = new HTreeNode;                *parent.lchild = node1;                *parent.rchild = node2;                parent.sym = node1.sym + node2.sym;                /*            *    push new node to the tank.            */                tank.push( parent);        }        this->root = new HTreeNode(node1);        return true;}bool HuffmanCoding::DestroyTree( ){        return false;}bool HuffmanCoding::ReadCoding( ){        char        *code;        code = new char[this->sym_num + 1];        /*       *    travel the Huffman tree and print the code of all valid symbols.       */        this->TravelTree( this->root, code, 0);        delete []code;        return true;}#define        LCHAR    '1'#define        RCHAR    '0'bool HuffmanCoding::TravelTree( HTreeNode *parent, char *buf, INDEX cur){        buf[cur] = '\0';        if( (parent->lchild==NULL)                &&(parent->rchild==NULL) )                {//end node                        printf("[ %c] : %s\n", parent->sym.sym, buf);	        }        if( parent->lchild!=NULL )        {                buf[cur] = LCHAR;                this->TravelTree( parent->lchild, buf, cur + 1);        }        if( parent->rchild!=NULL )        {                buf[cur] = RCHAR;                this->TravelTree( parent->rchild, buf, cur + 1);        }        return true;}static Symbol	sArr[ ] = {        { '0', 0},        { '1', 1},        { '2', 2},        { '3', 3},        { '4', 4},        { '5', 5},        { '6', 6},        { '7', 7},        { '8', 8},        { '9', 9},};int main(){        HuffmanCoding    hcoding;        hcoding.Set( sArr, 10);        hcoding.Work( );        return 0;}