PSI 5813 DIGITAL SIGNAL COMPRESSION
 

FIELD:  ELECTRONIC SYSTEMS

No. OF COURSE CREDITS:

Theoretical classes :                 3
Seminars and other classes:   0
Self-study hours:                      7

COURSE DURATION IN WEEKS:    12

PROFESSOR IN CHARGE:  Miguel Arjona Ramírez

AIMS: 

This course covers the main topics in quantization, linear prediction, transform coding and lossless compression. The subject matter presents the main techniques useful for building systems that efficiently meet design specifications involving a trade-off between the desired signal fidelity and the required information transmission rate. It thereby establishes the fundamentals for more advanced study, research and design of speech, audio, image and video compression systems.

JUSTIFICATION: 

Signal compression techniques are applied in the transmission and compact storage of speech, audio and video signals, as well as in the handling of text files and still images. The range of applications for these techniques keeps growing as new digital communication services are deployed over the Internet, and their sophistication increases to meet the rising demand for improvements in multimedia communications such as teleconferencing and digital television.

TOPICAL OUTLINE:

1. Lossless coding and lossy coding 
1.1. Self-information and entropy.
1.2. Symbol emission frequencies. 
1.3. Huffman codes.
1.4. Differential entropy and mutual information.
1.5. The fundamental lossy coding problem: Rate and distortion. Distortion measures. Algorithms and complexity.

1.6. Sampling theorem. Sampling of two-dimensional signals and moving pictures.

1.7. Lossless coding of sequences of symbols: Arithmetic coding.

2. Quantization

2.1. Quantizing notions: sample, input-output characteristic, quantization error.

2.2. Uniform quantizer: input-output characteristic types, quantization regions.

2.3. Statistical model for quantization error.

2.4. Nonuniform quantizers: compressor, expander, A and µ companding laws.

2.5. Training vector quantizers: Linde-Buzo-Gray (LBG) algorithm.

2.6. Pulse code modulation (PCM) of speech, audio and video signals.

3. Adaptive quantization

3.1. Short-term energy: blockwise estimation and recursive estimation.

3.2. Estimation modes for the parameters of an adaptive quantizer: feed-forward estimation and feedback estimation.

3.3. Quantizer step-size adaptation.

3.4. Adaptive gain control of input signal.

4. Fixed prediction with adaptive quantization

4.1. Differential signal and prediction-quantization loop.

4.2. Basic differential PCM (DPCM), prediction gain.

4.3. Long-term spectrum of speech signals and autocorrelation models for images.

4.4. Prediction applied to lossless image coding.

4.5. Adaptive DPCM (ADPCM) and adaptation logic.

4.6. Delta modulation.

5. Linear prediction

5.1. Predicting the speech signal based on its short-term spectrum.

5.2. Variable predictor.

5.3. Predictive analysis.

6. Adaptive predictive coding

6.1. APC with feedback or feed-forward adaptive prediction.

6.2. Adaptive prediction coders with a long-term predictor.

6.3. Noise feedback coding.

6.4. Residual-excited linear predictive coder.

6.5. Vector representation of the excitation signal.

6.6. Code-excited linear predictive (CELP) coder.

7. Transform coding

7.1. Transforms and base vectors or base matrices.

7.2. Transform gain.

7.3. Karhunen-Loève transform (KLT).

7.4. Discrete cosine transform (DCT).

7.5. Quantizing and coding transform coefficients: bit allocation, zonal sampling, zigzag scanning, run-length encoding (RLE) and Huffman coding.

7.6. The Joint Photographic Experts Group (JPEG) still image coder.

8. Subband coding

8.1. Subband coding (SBC): principles and comparison with transform coding (AT) as a special case.

8.2. Downsampling and upsampling.

8.3. Perfect reconstruction.

8.4. Quadrature-mirror filter (QMF) bank.

8.5. Subband coding gain.

8.6. Bit allocation among the subbands based on the power spectrum of the signal.

8.7. Tree-structured filter banks.

8.8. Aural masking: Signal-to-mask ratio (SMR), noise-to-mask ratio (NMR), masking thresholds.

8.9. Moving Picture Experts Group (MPEG) audio coding.


 

BIBLIOGRAPHY:
 
[1] N. S. Jayant and P. Noll, Digital coding of waveforms. Englewood Cliffs: Prentice-Hall, 1984. 
[2] B. S. Atal, V. Cuperman and A. Gersho, Eds., Advances in Speech Coding. Dordrecht: Kluwer Academic Publishers, 1991.
[3] B. S. Atal, V. Cuperman and A. Gersho, Eds., Speech and audio coding for wireless and network applications. Dordrecht: Kluwer Academic Publishers, 1993.
[4] T. P. Barnwell III, K. Nayebi and C. H. Richardson, Speech coding: A computer laboratory textbook. New York: John Wiley & Sons, 1995.
[5] K. Sayood, Introduction to data compression, 2nd ed. Morgan Kaufmann, 2000.
[6] S. Furui, Digital speech processing, synthesis, and recognition. New York: Marcel Dekker, 1985.
[7] W. B. Kleijn and K. K. Paliwal, Eds., Speech Coding and Synthesis. Amsterdam: Elsevier Science, 1995.
[8] L. R. Rabiner and R. W. Schafer, Digital processing of speech signals. Englewood Cliffs: Prentice-Hall, 1978.



EVALUATION
Exercises will be assigned at each class, with solutions due at the next class. In addition, there will be an intermediate examination, and a research project will be carried out during the course, with an initial planning report and a final complete report.
The final mark will be obtained as
       N = 0.7P + 0.3E, 
where P is the average of the examination and the complete report marks, and E is the average of the exercise marks.
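
As an illustration of the formula above, the short sketch below computes N from hypothetical marks. The function name, the sample marks and the 0-10 scale are assumptions made only for this example; the syllabus does not state the marking scale. With an examination mark of 7.0, a report mark of 8.0 and exercise marks of 6.0, 8.0 and 10.0, P = 7.5, E = 8.0 and N = 0.7 x 7.5 + 0.3 x 8.0 = 7.65.

```python
from statistics import mean


def final_mark(exam, report, exercises):
    """Return N = 0.7*P + 0.3*E for the given marks (hypothetical helper)."""
    p = mean([exam, report])  # P: average of the examination and complete report marks
    e = mean(exercises)       # E: average of the exercise marks
    return 0.7 * p + 0.3 * e


# Hypothetical marks, assuming a 0-10 scale (not stated in the syllabus).
n = final_mark(exam=7.0, report=8.0, exercises=[6.0, 8.0, 10.0])
print(round(n, 2))
```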

 
Office hours: Wednesdays from 15:45 to 16:45 in room D2-14
Class hours: Wednesdays from 17:00 to 20:00


Professor: Miguel Arjona Ramírez 
                     room D2-14, tel.: 3091-5606, e-mail: miguel at lps dot usp dot br

Signal Processing Laboratory

SOFTWARE FOR THE EXERCISES