A newly developed grammar-based lossless source coding theory and its implementation was proposed in 1999 and 2000, respectively, by Yang and Kieffer. The code first transforms the original data sequence into an irreducible context-free grammar, which is then compressed using arithmetic coding. In the study of grammarbased coding for mammography applications, we encountered two issues: processing time and limited number of single-character grammar G variables. For the first issue, we discover a feature that can simplify the matching subsequence search in the irreducible grammar transform process. Using this discovery, an extended grammar code technique is proposed and the processing time of the grammar code can be significantly reduced. For the second issue, we propose to use double-character symbols to increase the number of grammar variables. Under the condition that all the G variables have the same probability of being used, our analysis shows that the double- and single-character approaches have the same compression rates. By using the methods proposed, we show that the grammar code can outperform three other schemes: Lempel-Ziv-Welch (LZW), arithmetic, and Huffman on compression ratio, and has similar error tolerance capabilities as LZW coding undersimilar circumstances.
Li, Xiaoli; Krishnan, Sridhar; and Ma, Ngok-Wah, "Application of grammar-based codes for lossless compression of digital mammograms" (2006). Electrical and Computer Engineering Publications and Research. Paper 18.