A relatively simple method for compressing data works by creating a so-called Huffman tree for a file and using it to compress and decompress the data it contains. For most applications,binary Huffman trees are used (i.e., each node is either a leaf or has exactly two sub-nodes). One can, however, construct Huffman trees with an arbitrary number of sub-trees (i.e, ternary or, in general, N-ary trees).
A Huffman tree for a file containing Z different characters has Z leaves. The path from the root to a leaf that represents a certain character determines its encoding; each step towards the leaf determines an encoding character (which can be 0, 1, . . . , (N - 1)). By placing often-occurring characters closer to the root, and less often occurring characters further away from the root, the desirable compression is achieved. Strictly speaking, such a tree is a Huffman tree only if the resulting encoding takes the minimal number of N-ary symbols to encode the complete file.
For this problem, we only consider trees where each node is either an internal node, or a leaf encoding a character; there are no dangling leaves that do not encode for a character.The figure below shows a sample ternary Huffman tree; the characters 'a' and 'e' are encoded using one ternary symbol; the less frequently occurring characters 's' and 'p' are encoded using two ternary symbols; and the very rare symbols 'x', 'q' and 'y' are encoded using three ternary symbols each.
Of course, if we want to decompress a list of N-ary symbols later on, it is important to know which tree was used to compress the data. This can be done in several ways. In this problem we use the following method: the stream of data will be preceded by a header consisting of the encoded versions of the Z characters occurring in the original file, in lexicographical order.
Given the number of source symbols Z, a value N denoting the 'arity' of the Huffman tree, and a header, give the mapping from source symbols to encoded symbols.