Tinkering with Compressed Speech

Martin McCormick martin at dc.cis.okstate.edu
Wed Oct 6 13:49:16 UTC 2004


	I tried an experiment this past weekend to see how hard it is
to write code that compresses audio the way the old APH
pitch-restoring speech compressors did.  Mine still works, but it is
almost 30 years old and I know it will bite the dust one day.

	I was also aiming for simplicity first, so I went for a program
whose output speech is exactly twice the speed of its input.

	For those of you who like to play with this sort of thing, the
easiest way to get started is to use /dev/dsp in Linux.  It behaves
just like a file: when you write 8-bit audio at 8,000 samples per
second to it, you hear the sound from your sound card, provided the
card is working correctly.  If you read from it, you get an 8,000
byte-per-second stream that you can direct into a file or do whatever
else you like with.
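If you don't have a raw recording handy, you can make a test signal yourself.  Here is a small sketch, assuming /dev/dsp is in its OSS default format of unsigned 8-bit mono samples at 8,000 per second, with 128 as the silent midline.  The names make_square_tone and write_samples are made up for this example.  It uses a square wave rather than a sine so it needs nothing beyond the standard library.

```c
#include <stdio.h>
#include <stddef.h>

#define RATE 8000  /* samples per second (assumed /dev/dsp default) */

/* Fill buf with n samples of an unsigned 8-bit square wave at freq_hz.
 * 128 is the silent midline; the wave swings amp steps above and below
 * it.  Assumes 0 < freq_hz <= RATE / 2 and amp <= 127. */
void make_square_tone(unsigned char *buf, size_t n, unsigned freq_hz,
                      unsigned char amp)
{
    for (size_t i = 0; i < n; i++) {
        /* number of half-periods elapsed before sample i */
        unsigned long half = (unsigned long)i * 2 * freq_hz / RATE;
        buf[i] = (half % 2 == 0) ? (unsigned char)(128 + amp)
                                 : (unsigned char)(128 - amp);
    }
}

/* Write n samples to an already-open stream (a raw file or /dev/dsp).
 * Returns the number of samples actually written. */
size_t write_samples(FILE *out, const unsigned char *buf, size_t n)
{
    return fwrite(buf, 1, n, out);
}
```

To hear one second of a 400 Hz tone, open the device with fopen("/dev/dsp", "wb"), call make_square_tone(tone, RATE, 400, 60) on an 8,000-byte buffer and pass it to write_samples.  Point the stream at an ordinary file instead and you get a raw file you can feed to cp2x.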

	What I did was to write a little program that opens a raw
audio file and begins counting samples.  The old speech compressors
were based on 20-millisecond segments of sound, which works out to 50
segments per second.  At 8,000 samples per second, each 1/50-second
segment is 160 samples long.  In my experiment, the program passes the
first 160 samples from the file to the output and then throws away
the next 160.  When the counter hits 320, it resets and begins passing
audio again.
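The keep-160, drop-160 rule can also be written as a function that works on a buffer, which makes it easy to test without a sound card.  This is a sketch of the same logic as cp2x rather than a copy of it; compress2x is a hypothetical name.

```c
#include <stddef.h>

#define SEGMENT 160  /* 20 ms at 8,000 samples/s: 8000 * 0.020 = 160 */

/* Copy the first SEGMENT samples of every 2 * SEGMENT-sample cycle from
 * in to out and skip the rest, the same rule cp2x applies.  out needs
 * room for at most n samples.  Returns the number of samples kept. */
size_t compress2x(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (i % (2 * SEGMENT) < SEGMENT)   /* first 20 ms of each 40 ms */
            out[kept++] = in[i];
    }
    return kept;
}
```

For a whole number of 40 ms cycles the output is exactly half as long as the input: 640 samples in gives 320 out, and output sample 160 is input sample 320.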

	The result is audio at twice the normal tempo but still at the
correct pitch.  It also has the distortion we find in the older
devices.  What you actually hear sounds like static, because the
voice waveform gets cut at one point at the end of a segment and
then resumes abruptly at a different point in its cycle at the
beginning of the next segment.
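To see why the splice sounds like static, take the simplest case of a steady tone instead of a voice.  Each time 160 samples are dropped, a tone of f Hz skips 160 * f / 8000 cycles, and only the fractional part of that number shows up as a jump in the waveform.  So the join is seamless only when f is a multiple of 50 Hz: a 150 Hz tone loses exactly 3 cycles and joins cleanly, while a 120 Hz tone loses 2.4 cycles and jumps 0.4 of a cycle at every splice.  The helper below (a hypothetical name, and a pure-tone idealization of real speech, whose pitch keeps changing) computes that fraction.

```c
#define RATE 8000  /* samples per second */
#define DROP 160   /* samples thrown away at each splice */

/* Fraction of a cycle by which a steady tone of freq_hz jumps at each
 * splice.  0.0 means the dropped stretch held a whole number of cycles,
 * so the waveform continues smoothly across the join. */
double splice_phase_jump(unsigned freq_hz)
{
    /* cycles dropped = DROP * freq_hz / RATE; keep only the fraction */
    return (double)((DROP * (unsigned long)freq_hz) % RATE) / RATE;
}
```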

	I guess my next experiment will be to make the segments start
and end at slightly different times to try to preserve the waveform
being compressed.  I hope this will make the sound smoother.
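One way to try that next experiment, sketched under my own assumptions: after each kept segment, look up to 40 samples (5 ms) either side of the usual restart point for the spot whose preceding 8 samples best match the last 8 samples just played, and resume there.  WINDOW, MATCH, best_restart and compress2x_matched are hypothetical names, and the values are untuned guesses.  This is loosely in the spirit of waveform-similarity methods such as WSOLA, but it does a hard splice with no crossfade.

```c
#include <stddef.h>
#include <stdlib.h>

#define SEGMENT 160  /* keep 20 ms at 8,000 samples/s */
#define WINDOW   40  /* hypothetical: search up to 5 ms either side */
#define MATCH     8  /* hypothetical: compare this many samples */

/* Sum of absolute differences between the MATCH samples just before
 * candidate restart point j and the last MATCH samples played, which
 * end just before index end. */
static long match_cost(const unsigned char *in, size_t end, size_t j)
{
    long cost = 0;
    for (size_t m = 1; m <= MATCH; m++)
        cost += labs((long)in[j - m] - (long)in[end - m]);
    return cost;
}

/* Pick the restart point within WINDOW of nominal whose lead-in best
 * matches the end of the last kept segment.  Candidates are tried
 * nearest-first, so ties go to the one closest to nominal.  nominal is
 * at least 2 * SEGMENT, so j - MATCH never goes below zero. */
static size_t best_restart(const unsigned char *in, size_t n,
                           size_t end, size_t nominal)
{
    size_t best = nominal;
    long best_cost = match_cost(in, end, nominal);
    for (size_t d = 1; d <= WINDOW; d++) {
        size_t cand[2] = { nominal - d, nominal + d };
        for (int k = 0; k < 2; k++) {
            if (cand[k] >= n)
                continue;
            long c = match_cost(in, end, cand[k]);
            if (c < best_cost) {
                best_cost = c;
                best = cand[k];
            }
        }
    }
    return best;
}

/* Keep SEGMENT samples, then restart at the best-matching point near
 * where the plain keep-160/drop-160 rule would restart.  out needs room
 * for n samples.  Returns the number of samples kept. */
size_t compress2x_matched(const unsigned char *in, size_t n,
                          unsigned char *out)
{
    size_t kept = 0, start = 0;
    while (start < n) {
        size_t end = start + SEGMENT;
        if (end > n)
            end = n;
        for (size_t i = start; i < end; i++)
            out[kept++] = in[i];
        size_t nominal = start + 2 * SEGMENT;
        if (nominal >= n)
            break;
        start = best_restart(in, n, end, nominal);
    }
    return kept;
}
```

Each cycle still keeps 160 samples but now consumes anywhere from 280 to 360 input samples, so the speedup wanders a little around 2x instead of being exactly 2x.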

	I remember hearing, in late high school or early college (the
late sixties and early seventies for me), about speech compressors
that used a rotating head like a video recorder to do the audio
slicing.  I may be exaggerating, but a price of $50,000 comes to mind.
These were probably modified video tape recorders originally built
for television studios.  Only institutions with lots of dough could
have bought one of those, and I bet they were a real beast to
maintain, kind of like the first Kurzweils.

	When the first electronic speech compressors came out in the
early seventies, I longed for one, but they still cost over a thousand
dollars.

	Finally, around 1975, the APH began selling its pitch-restoring
device at a price reasonable enough that we common folk could buy
one.

	My test program, which is definitely not yet a replacement for
one of those devices, runs entirely in software on the existing
sound card hardware.  It has less background noise than the APH box,
but the static I mentioned is pretty distracting, so it trades one
form of discomfort for another.

	If you want to play with it, make sure your sound card works
first.  If you have ALSA installed on your system, it should make your
sound card work the way a UNIX sound device is supposed to; its OSS
emulation is what provides /dev/dsp.  Here is the little program I
wrote, which I called cp2x.  Have fun, but don't blame me if your
computer catches fire or eats your cat.  Nothing in here is normally
considered dangerous.  Save the source as cp2x.c and compile it with

gcc -o cp2x cp2x.c

You could even compile with

gcc cp2x.c

and then your executable is called a.out.  Since gcc names every
program compiled without -o a.out, the next program you compile that
way overwrites it, so this isn't a very smart move if you plan to use
it for more than a few seconds.  Cut here for source.

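#include <stdio.h>
#include <stdlib.h>

/* cp2x: play a raw 8-bit, 8,000 samples/s audio file through /dev/dsp
 * at twice normal speed by keeping 20 ms (160 samples) and dropping
 * the next 20 ms.  Usage: cp2x file.raw */

#define SEGMENT 160	/* 20 ms at 8,000 samples per second */

int main(int argc, char **argv)
{
    FILE *soundinput;
    FILE *sounddev;
    int c;			/* int, so EOF can be told apart from a sample */
    int pos = 0;		/* position within each 320-sample cycle */
    const char *s4 = "/dev/dsp";

    if (argc < 2) {
        fprintf(stderr, "usage: %s file.raw\n", argv[0]);
        return 1;
    }

    if ((soundinput = fopen(argv[1], "rb")) == NULL) {
        perror(argv[1]);
        return 1;
    }

    if ((sounddev = fopen(s4, "wb")) == NULL) {
        perror(s4);
        fclose(soundinput);
        return 1;
    }

    while ((c = getc(soundinput)) != EOF) { /* read loop */
        if (pos < SEGMENT)
            putc(c, sounddev);	/* keep the first 20 ms */
        pos++;
        if (pos == 2 * SEGMENT)
            pos = 0;		/* drop the second 20 ms, then repeat */
    } /* read loop */

    fclose(soundinput);
    fclose(sounddev);
    return 0;
}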




More information about the Blinux-list mailing list