Speech Coding

PEE 5761 SPEECH CODING

FIELD: ELECTRONIC SYSTEMS

No. OF COURSE CREDITS:

Theoretical classes: 3
Seminars and other classes: 0
Self-study hours: 7

COURSE DURATION IN WEEKS: 12

PROFESSOR IN CHARGE: Miguel Arjona Ramírez

AIMS:

To acquaint students with up-to-date speech coding techniques and, through the exercise of critical thinking, to motivate them to improve current techniques and to develop new ones.

JUSTIFICATION:

Speech coding techniques are applied both to the transmission and to the compact storage of speech signals. They are used for sharing transmission channels in digital wireline and cellular telephony and, when combined with cryptography, provide a higher degree of security. In addition, in the increasingly popular multimedia environments, shared communication channels may also carry video or data; there, speech coders capable of operating at multiple rates enable a trade-off between quality of service and the number of channels. This capability is necessary to cope with the increasing demand for telephony services over packet networks, especially over the Internet.

TOPICAL OUTLINE:
1. Introduction
1.1. Applications of speech coding.
1.2. Self-information and entropy.
1.3. Capacity of the telephone channel and transmission rate.
1.4. Phonetic information rate.
1.5. Rate and distortion. Distortion measures.
1.6. Functional analysis of a speech coder.
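
Topics 1.2 and 1.3 rest on two short formulas: the entropy of a discrete memoryless source, H = -Σ p_i log2 p_i, and the Shannon capacity of a band-limited Gaussian channel, C = B log2(1 + SNR). The sketch below is only an illustration; the four-symbol source and the telephone-channel figures (a 300 to 3400 Hz band with 30 dB SNR) are assumed values, not course data.

```python
import math

def self_information(p):
    """Self-information, in bits, of an event with probability p."""
    return -math.log2(p)

def entropy(probs):
    """Entropy, in bits/symbol, of a discrete memoryless source."""
    return sum(p * self_information(p) for p in probs if p > 0)

def channel_capacity(bandwidth_hz, snr_db):
    """Shannon capacity (bit/s) of an ideal band-limited AWGN channel."""
    snr = 10 ** (snr_db / 10)          # convert dB to a power ratio
    return bandwidth_hz * math.log2(1 + snr)

# Hypothetical four-symbol source
print(entropy([0.5, 0.25, 0.125, 0.125]))   # 1.75 bits/symbol

# Illustrative telephone channel: 3100 Hz bandwidth, 30 dB SNR (assumed values)
print(round(channel_capacity(3400 - 300, 30)))
```

Under these assumed figures the second call gives roughly 31 kbit/s.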

2. Quantization
2.1. Quantizing notions: sample, input-output characteristic, quantization error.
2.2. Uniform quantizer: input-output characteristic types, quantization regions.
2.3. Signal-to-noise ratio (SNR) and segmental SNR (SNRSEG).
2.4. Assumptions for a statistical model of quantization error.
2.5. Stochastic processes regarded as signal generators.
2.6. Quantization error and the 6 dB/bit rule.
2.7. Nonuniform quantizers: compressor, expander, A and µ companding laws.
2.8. Optimum quantizers, M-law companding.
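
A minimal sketch of topics 2.2, 2.6 and 2.7, assuming a midrise uniform quantizer over [-1, 1] and a uniformly distributed test input: each extra bit should raise the measured SNR by about 6 dB. The compressor and expander use the continuous µ-law with µ = 255 (the value adopted in G.711, which uses a segmented approximation of this curve). All function names are illustrative.

```python
import math
import random

def uniform_quantize(x, bits, xmax=1.0):
    """Midrise uniform quantizer with 2**bits levels spanning [-xmax, xmax]."""
    levels = 2 ** bits
    step = 2 * xmax / levels
    # Quantization-region index, clamped so inputs beyond xmax overload
    k = max(-levels // 2, min(levels // 2 - 1, math.floor(x / step)))
    return (k + 0.5) * step  # reconstruct at the midpoint of the region

def snr_db(signal, reconstructed):
    """Signal-to-quantization-noise ratio in dB."""
    sig = sum(s * s for s in signal)
    err = sum((s - r) ** 2 for s, r in zip(signal, reconstructed))
    return 10 * math.log10(sig / err)

def mu_law_compress(x, mu=255.0, xmax=1.0):
    """Continuous µ-law compressor characteristic."""
    return xmax * math.copysign(math.log1p(mu * abs(x) / xmax) / math.log1p(mu), x)

def mu_law_expand(y, mu=255.0, xmax=1.0):
    """Expander: the inverse of mu_law_compress."""
    return xmax * math.copysign(((1 + mu) ** (abs(y) / xmax) - 1) / mu, y)

random.seed(0)
x = [random.uniform(-1, 1) for _ in range(50000)]
for bits in (4, 6, 8):
    print(bits, round(snr_db(x, [uniform_quantize(s, bits) for s in x]), 1))

# 8-bit quantization of a low-level input, with and without companding
quiet = [0.01 * s for s in x]
uniform = [uniform_quantize(s, 8) for s in quiet]
companded = [mu_law_expand(uniform_quantize(mu_law_compress(s), 8)) for s in quiet]
print(round(snr_db(quiet, uniform), 1), round(snr_db(quiet, companded), 1))
```

The last line contrasts the two 8-bit schemes on a signal 40 dB below full scale, where the compressor's fine steps near zero give companding a large advantage.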

3. Adaptive quantization
3.1. Short-term energy: blockwise estimation and recursive estimation.
3.2. Estimation modes for the parameters of an adaptive quantizer: feed-forward estimation and feedback estimation.
3.3. Quantizer step-size adaptation.
3.4. Adaptive gain control of input signal.
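
Topics 3.1 to 3.3 can be illustrated with a feedback (backward) adaptive quantizer: the step size is proportional to a recursively estimated short-term RMS value, and that estimate is computed only from the quantized output, so a decoder can reproduce it without side information. This is a sketch; the smoothing constant alpha = 0.9, the step factor 0.5 and the 3-bit resolution are assumed values, and the function names are hypothetical.

```python
import math

def _step(sigma2, step_factor):
    # Step size proportional to the estimated RMS value
    return step_factor * math.sqrt(sigma2)

def _update(sigma2, y, alpha, sigma2_min):
    # Recursive (exponential) short-term energy estimate from a quantized sample
    return max(sigma2_min, alpha * sigma2 + (1 - alpha) * y * y)

def aqb_encode(x, bits=3, alpha=0.9, step_factor=0.5, sigma2_min=1e-8):
    levels = 2 ** bits
    sigma2, codes = 1.0, []
    for s in x:
        step = _step(sigma2, step_factor)
        k = max(-levels // 2, min(levels // 2 - 1, math.floor(s / step)))
        codes.append(k)
        y = (k + 0.5) * step  # local decoder: track exactly what the decoder sees
        sigma2 = _update(sigma2, y, alpha, sigma2_min)
    return codes

def aqb_decode(codes, alpha=0.9, step_factor=0.5, sigma2_min=1e-8):
    sigma2, out = 1.0, []
    for k in codes:
        step = _step(sigma2, step_factor)
        y = (k + 0.5) * step
        out.append(y)
        sigma2 = _update(sigma2, y, alpha, sigma2_min)
    return out

# A sinusoid whose amplitude drops by 40 dB halfway through
x = [math.sin(2 * math.pi * n / 40) for n in range(1000)]
x += [0.01 * math.sin(2 * math.pi * n / 40) for n in range(1000)]
y = aqb_decode(aqb_encode(x))
```

After a short transient the step size follows the 40 dB level change, so both halves are quantized with similar SNR using the same 3-bit codes.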

4. Fixed prediction with adaptive quantization
4.1. Differential signal and prediction-quantization loop.
4.2. Basic differential PCM (DPCM), prediction gain, slope overload.
4.3. Adaptive DPCM (ADPCM) and adaptation logics.
4.4. Delta modulation: oversampling factor, continuously variable slope delta modulation (CVSD) and Jayant's multiplier adaptation rule.
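
For topic 4.4, a minimal adaptive delta modulator can use a Jayant-style one-bit multiplier rule: the step size is multiplied by P when two consecutive bits agree (a sign of slope overload) and by 1/P when they differ (granular noise). The value P = 1.5, the step limits and the pure-integrator predictor without leakage are assumptions made for brevity.

```python
import math

def _next_step(step, bit, prev_bit, P, step_min, step_max):
    # Grow on repeated bits, shrink on alternating bits, then clamp
    step *= P if bit == prev_bit else 1 / P
    return min(step_max, max(step_min, step))

def adm_encode(x, step0=0.01, P=1.5, step_min=1e-4, step_max=1.0):
    bits, xhat, step, prev = [], 0.0, step0, 1
    for s in x:
        bit = 1 if s >= xhat else -1      # one bit per sample
        step = _next_step(step, bit, prev, P, step_min, step_max)
        xhat += bit * step                # local decoder (integrator) in the encoder
        bits.append(bit)
        prev = bit
    return bits

def adm_decode(bits, step0=0.01, P=1.5, step_min=1e-4, step_max=1.0):
    out, xhat, step, prev = [], 0.0, step0, 1
    for bit in bits:
        step = _next_step(step, bit, prev, P, step_min, step_max)
        xhat += bit * step
        out.append(xhat)
        prev = bit
    return out

# A sinusoid sampled at 100 times its Nyquist rate (oversampling factor 100)
x = [math.sin(2 * math.pi * n / 200) for n in range(2000)]
y = adm_decode(adm_encode(x))
err = sum((a - b) ** 2 for a, b in zip(x[50:], y[50:]))
print(10 * math.log10(sum(a * a for a in x[50:]) / err))
```

Because the step-size rule depends only on transmitted bits, the decoder mirrors the encoder's integrator exactly.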

5. Linear prediction vocoders
5.1. Linear speech production model and the short-term spectrum.
5.2. Predicting the speech signal.
5.3. Variable predictor.
5.4. Predictive analysis: Normal equations - Autocorrelation and covariance methods.
5.5. The Levinson-Durbin algorithm, the Schur-Le Roux-Gueguen algorithm, Itakura-Saito partial correlation (PARCOR) algorithm and the Burg algorithm.
5.6. Linear prediction representations by line spectral pairs (LSPs) and log area ratios (LARs).
5.7. The LPC Vocoder: linear prediction and excitation model.
5.8. Parallel-processing and autocorrelation-based pitch detectors.
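
The Levinson-Durbin recursion of topic 5.5 solves the autocorrelation-method normal equations of topic 5.4 in O(p²) operations and yields the reflection (PARCOR) coefficients as a by-product. The sketch assumes the predictor form x̂[n] = Σ a_k x[n-k]; some texts use the opposite sign for a_k or for the reflection coefficients. The AR(2) test process and its coefficients are hypothetical.

```python
import random

def autocorrelation(x, max_lag):
    """Biased short-time autocorrelation r[0..max_lag] of a (windowed) frame."""
    n = len(x)
    return [sum(x[m] * x[m + lag] for m in range(n - lag)) for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Return (a, k, err): predictor coefficients a[1..p], reflection
    coefficients k[1..p] and the final prediction-error energy."""
    a = [0.0] * (order + 1)   # a[0] unused; a[j] multiplies x[n-j]
    k = []
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        ki = acc / err
        k.append(ki)
        prev = a[:]
        a[i] = ki
        for j in range(1, i):
            a[j] = prev[j] - ki * prev[i - j]
        err *= 1 - ki * ki    # error energy shrinks at every order
    return a[1:], k, err

# Hypothetical AR(2) source: x[n] = 1.3 x[n-1] - 0.6 x[n-2] + white noise
random.seed(0)
x, x1, x2 = [], 0.0, 0.0
for _ in range(4000):
    xn = 1.3 * x1 - 0.6 * x2 + random.gauss(0.0, 1.0)
    x.append(xn)
    x2, x1 = x1, xn
a, k, err = levinson_durbin(autocorrelation(x, 2), 2)
print(a, k)   # a should come out close to [1.3, -0.6]
```

With the autocorrelation method every reflection coefficient satisfies |k_i| < 1, which guarantees a stable synthesis filter.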

6. Adaptive predictive coding
6.1. APC with feedback or feed-forward adaptive prediction.
6.2. Adaptive prediction coders with a long-term predictor.
6.3. Noise feedback coding.
6.4. Residual-excited linear predictive coder (RELP).
6.5. Vector representation of the excitation signal.

7. Analysis-by-synthesis excitation search
7.1. Code-excited linear predictive (CELP) coder.
7.2. Adaptive codebook: its structure and search algorithms.
7.3. Fixed codebooks: stochastic codebook, overlapped codevectors, center-clipped codevectors, sparse stochastic codebooks.
7.4. Multipulse codebooks and sequential multistage searches.
7.5. Algebraic multipulse codebooks (ACELP), focused search and joint position and amplitude search (JPAS).
7.6. Conjugate fixed codebooks and vector-basis-structured codebooks.
7.7. Perceptual weighting and postfiltering.

8. Subband coding and transform coding
8.1. Introduction to subband coding (SBC).
8.2. Critically decimated filter banks.
8.3. Tree-structured filter banks.
8.4. Bit allocation among the subbands based on the power spectrum of the signal.
8.5. Orthogonal transform coder (TC).
8.6. Karhunen-Loève transform (KLT).
8.7. Discrete cosine transform (DCT).
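
Topics 8.4 to 8.7 can be sketched together: an orthonormal DCT-II of a block (computed directly in O(N²) for clarity rather than speed), its inverse, and the classical high-rate bit-allocation rule b_k = b̄ + ½ log2(σ_k² / geometric mean of the σ²), which assigns more bits to components with larger variance. The block and the variances are hypothetical; a practical coder would also round the allocation to integers and clip negative values.

```python
import math

def _scale(k, n):
    # Orthonormal DCT scaling factor
    return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

def dct_ii(x):
    """Orthonormal DCT-II of a block x[0..N-1]."""
    n = len(x)
    return [_scale(k, n) * sum(x[m] * math.cos(math.pi * (m + 0.5) * k / n) for m in range(n))
            for k in range(n)]

def idct_ii(c):
    """Inverse of dct_ii (an orthonormal DCT-III)."""
    n = len(c)
    return [sum(_scale(k, n) * c[k] * math.cos(math.pi * (m + 0.5) * k / n) for k in range(n))
            for m in range(n)]

def allocate_bits(variances, avg_bits):
    """High-rate allocation: avg_bits + 0.5*log2(variance / geometric mean)."""
    logs = [math.log2(v) for v in variances]
    mean_log = sum(logs) / len(logs)   # log2 of the geometric mean
    return [avg_bits + 0.5 * (lv - mean_log) for lv in logs]

# A smooth, highly correlated block: its energy concentrates in low-order coefficients
block = [math.cos(0.2 * m) + 0.1 * m for m in range(8)]
print([round(c, 3) for c in dct_ii(block)])

# Hypothetical coefficient variances, 2 bits/coefficient on average
print(allocate_bits([16.0, 4.0, 1.0, 0.25], 2.0))
```

The DCT is used here because, for highly correlated sources, it closely approximates the signal-dependent KLT while having a fixed, fast-computable basis.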

BIBLIOGRAPHY:
[1] N. S. Jayant and P. Noll, Digital coding of waveforms. Englewood Cliffs: Prentice-Hall, 1984.
[2] B. S. Atal, V. Cuperman and A. Gersho, Eds., Advances in Speech Coding. Dordrecht: Kluwer Academic Publishers, 1991.
[3] B. S. Atal, V. Cuperman and A. Gersho, Eds., Speech and audio coding for wireless and network applications. Dordrecht: Kluwer Academic Publishers, 1993.
[4] T. P. Barnwell III, K. Nayebi and C. H. Richardson, Speech coding: A computer laboratory textbook. New York: John Wiley & Sons, 1995.
[5] S. Furui, Digital speech processing, synthesis, and recognition. New York: Marcel Dekker, 1985.
[6] W. B. Kleijn and K. K. Paliwal, Eds., Speech Coding and Synthesis. Amsterdam: Elsevier Science, 1995.
[7] L. R. Rabiner and R. W. Schafer, Digital processing of speech signals. Englewood Cliffs: Prentice-Hall, 1978.

EVALUATION
Exercises will be assigned in each class and are due at the following class. In addition, students will take an intermediate examination and a final examination.
The final mark will be obtained as
       N = 0.7P + 0.3E,
where P is the average of the two examination marks, and E is the average of the exercise marks.
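
As a worked example with hypothetical marks: examination marks of 6.0 and 8.0 give P = 7.0, and an exercise average of E = 9.0 then gives
       N = 0.7 × 7.0 + 0.3 × 9.0 = 4.9 + 2.7 = 7.6.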

Signal Processing Laboratory