首页 > 代码库 > 关于HuffmanCoding的简单分析
关于HuffmanCoding的简单分析
1.what‘s problem we faced?
/**
* Q: what‘s problem we faced?
*
* A: Data compression is still a problem, even now. we want to compress
* the space of data. This desire is more and more stronger when we
* need to deal with some operation about data transmission. Before
* we start this article, it may be helpful if you try to provide a valid way
* to compress data . I tried, but failed obviously. That why I write this
* article. ^_^
*/
2. How can I solve it?
/**
* Q: How can I solve it?
*
* A: Where have problem is where have an answer, although it not always
* the best one. In 1951, a algorithm was introduced by David A. Huffman.
* It is different from the normal code and is a variable length code, which
* have different length of code for different symbol. Now, there are two
* problems:
*
* No.1: is variable length code possible? How can we know the length
* of current symbol?
*
* The answer is prefix code. Think about this, a tree like following:
*
*
* O
* 1 / \ 0
* O O
* 1 / \ 0 c
* O O
* a b
*
* This is a simple binary tree. There are three leaf node: a, b ,and c.we
* label all of left branch as 1, and all of right branch as 0. So if we want
* to arrive the leaf node a, the path is 11. In a similar way, we can get
* all of nodes:
* a : 11
* b : 10
* c : 0
*
* By accident, we get a variable length code.
*
*
* No.2: How can we use variable length code to compress a series of symbol?
*
* Now that we have a ability about variable length code. Some funny thing
* will happen. Image this, In a data, which consist of a series of symbols,
* some of symbols have occur at high proportion. some of symbols has occur
* at low proportion. If we use some shorter code to indicate those symbols
* which have a high proportion, the space of data will smaller than ever.
* That is what we want.
*
* Now, we have been know that we could compress a data by use variable length
* code. However, the next problem is what kind of variable length code is what we
* want. what kind of code is optimal ?
*/
3. What is HuffmanCoding ?
/**
* Q: What is HuffmanCoding ?
*
* A:Now,the problem is how can I create a optimal tree ? Do you have any idea?
* Huffman was introduced a algorithm. It is looks like greedy algorithm. It is may
* be simple, but the result is valid( this will be demonstrated below). The simplest
* construction algorithm use a priority queue where the node with lowest probability
* is given highest priority, the steps as following:
*
* 1. create a leaf node for each symbol, and add it to the priority queue.
* 2. while there is more than one node in the queue:
* 1. remove two nodes that have the highest priority.
* 2. create a new node as the parent node of the two nodes above. the
* probability of this one is equal to the sum of the two nodes‘ probabilities.
* 3. add the new node to the queue.
* 3. the remaining node is the root of this tree. Read it‘s code as we do above.
*
*/
4. is it optimal ?
/**
* Q: is it optimal ?
*
* A: Hard to say. I haven‘t a valid method to measure this. About this issue, it is necessary to hear
* about other people‘s advice. I believe there must be some exciting advice. By the way, this article
* is just talk about compress of independent symbol, another important issue is about related symbol.
* That maybe a serious problem.
*
*/
5. source code
/*** Here is an simple example*/#include <stdio.h>#include <iostream>/*** In a Huffman tree, some of nodes is valid symbol, and other is a combine node, which* haven't a valid symbol. we need to label it in our nodes.*/enum ELEM_TYPE { ET_VALID, ET_INVALID, ET_MAX,};typedef int INDEX;/*** this is a container, we push all of element to it, and pop element by a priority. It is* a class template since we don't know the type of data element.*/template <class ELEM>class Container { public: Container( int capacity); ~Container( ); /* * push a element to this container. */ bool push( ELEM item); /* * pop a element from this container, the smallest one have the most priority. * Of course, the element must have provide a reload function for operator '<'. */ bool pop( ELEM &item ); private: bool _find_idle( INDEX &num); bool _set_elem( INDEX num, ELEM &elem); bool _get_elem( INDEX num, ELEM &elem); ELEM *ele; ELEM_TYPE *stat; int cap;};template <class ELEM>Container<ELEM>::Container( int capacity){ this->ele = new ELEM[capacity] ; this->stat = new ELEM_TYPE[capacity]; int i; for( i=0; i<capacity; i++) this->stat[i] = ET_INVALID; this->cap = capacity ;}template <class ELEM>Container<ELEM>::~Container( ){ if( this->ele!=NULL ) delete []this->ele; if( this->stat!=NULL ) delete []this->stat; this->cap = 0;}template <class ELEM>bool Container<ELEM>::push( ELEM item){ INDEX num = -1; if( (!this->_find_idle( num)) ||(!this->_set_elem( num, item))) return false; return true;}template <class ELEM>bool Container<ELEM>::pop( ELEM &item ){ INDEX i = 0; INDEX Min; /* * find the first valid element. */ while( (this->stat[i]!=ET_VALID) &&( i<this->cap)) i++; for( Min = i ; i<this->cap; i++) { if( ( this->stat[i]==ET_VALID) &&( this->ele[i]<this->ele[Min])) { Min = i; } } return this->_get_elem( Min, item);}template <class ELEM>bool Container<ELEM>::_find_idle( INDEX &num){ INDEX i; for( i=0; i<this->cap; i++) { if( this->stat[i]==ET_INVALID ) { num = i; return true; } } return false;}template <class ELEM>bool Container<ELEM>::_set_elem( INDEX num, ELEM &elem){ if( (num>=this->cap) ||(num<0) ) return false; this->stat[num] = ET_VALID; this->ele[num] = elem; return true;}template <class ELEM>bool Container<ELEM>::_get_elem( INDEX num, ELEM &elem){ if( (num<0) ||(num>=this->cap)) return false; this->stat[num] = ET_INVALID; elem = this->ele[num]; return true;}/*** define a type of symbol. It will be used to record all information about a symbol.*/typedef char SYMINDEX;typedef int SYMFRE;class Symbol { public: /* * In the Huffman tree, we need to compute the sum of two child symbol. * For convenience,build a reload function is necessary. */ Symbol operator + ( Symbol &s); SYMINDEX sym; SYMFRE freq;};Symbol Symbol::operator +( Symbol &s){ Symbol ret; ret.sym = '\0'; ret.freq = this->freq + s.freq; return ret;}/*** define a node of binary tree. It will be used to create a Huffman tree.*/class HTreeNode { public: /* * In the container, we need compare two nodes. So this node must * provide a reload function about '<'. */ bool operator< ( HTreeNode &n); HTreeNode *lchild; HTreeNode *rchild; Symbol sym;};bool HTreeNode::operator < ( HTreeNode &n){ return this->sym.freq<n.sym.freq? true: false;}/*** This is the core structure. It will build a Huffman coding based on our input symbol.*/class HuffmanCoding { public: HuffmanCoding( ); ~HuffmanCoding( ); bool Set( Symbol s[], int num); bool Work( void); private: /* * create a Huffman tree. */ bool CreateTree(Symbol s[], int num ); bool DestroyTree( ); /* * read Huffman coding from a Huffman tree. */ bool ReadCoding( ); bool TravelTree( HTreeNode *parent, char *buf, INDEX cur); Symbol *sym ; int sym_num ; HTreeNode *root ;};HuffmanCoding::HuffmanCoding( ){ this->sym = NULL; this->sym_num = 0; this->root = NULL;}HuffmanCoding::~HuffmanCoding( ){ if( this->sym!=NULL) delete []this->sym; this->sym_num = 0; this->DestroyTree( );}/*** receive data from outside. Actually, this function is not necessary.But for make the * algorithm looks like more concise,maybe this function is necessary.*/bool HuffmanCoding::Set( Symbol s [ ], int num){ this->DestroyTree( ); this->sym = new Symbol[num]; for( int i=0; i<num; i++) this->sym[i] = s[i]; if( NULL!=this->sym) { this->sym_num = num; return true; } else { this->sym_num = 0; return false; }}/*** The core function. In this function, we create a Huffman tree , then read it.*/bool HuffmanCoding::Work( void){ //Create a Huffman tree if( !this->CreateTree( this->sym, this->sym_num)) return false; //read Huffman coding if( !this->ReadCoding( )) return false; return true;}bool HuffmanCoding::CreateTree( Symbol s[], int num){ /* * create a priority tank. It always pop the element of the highest priority in the tank. */ Container<HTreeNode> tank(num); for( int i=0; i<this->sym_num; i++) { HTreeNode node; node.lchild = NULL; node.rchild = NULL; node.sym = s[i]; tank.push( node); } /* * always pop two nodes, if fail, that's means there is only one node remain and it * is the root node of this Huffman tree. */ HTreeNode node1; HTreeNode node2; while( tank.pop( node1) && tank.pop( node2) ) { HTreeNode parent; parent.lchild = new HTreeNode; parent.rchild = new HTreeNode; *parent.lchild = node1; *parent.rchild = node2; parent.sym = node1.sym + node2.sym; /* * push new node to the tank. */ tank.push( parent); } this->root = new HTreeNode(node1); return true;}bool HuffmanCoding::DestroyTree( ){ return false;}bool HuffmanCoding::ReadCoding( ){ char *code; code = new char[this->sym_num + 1]; /* * travel the Huffman tree and print the code of all valid symbols. */ this->TravelTree( this->root, code, 0); delete []code; return true;}#define LCHAR '1'#define RCHAR '0'bool HuffmanCoding::TravelTree( HTreeNode *parent, char *buf, INDEX cur){ buf[cur] = '\0'; if( (parent->lchild==NULL) &&(parent->rchild==NULL) ) {//end node printf("[ %c] : %s\n", parent->sym.sym, buf); } if( parent->lchild!=NULL ) { buf[cur] = LCHAR; this->TravelTree( parent->lchild, buf, cur + 1); } if( parent->rchild!=NULL ) { buf[cur] = RCHAR; this->TravelTree( parent->rchild, buf, cur + 1); } return true;}static Symbol sArr[ ] = { { '0', 0}, { '1', 1}, { '2', 2}, { '3', 3}, { '4', 4}, { '5', 5}, { '6', 6}, { '7', 7}, { '8', 8}, { '9', 9},};int main(){ HuffmanCoding hcoding; hcoding.Set( sArr, 10); hcoding.Work( ); return 0;}