This is only a preview of the September 2001 issue of Silicon Chip. You can view 35 of the 104 pages in the full issue, including the advertisements.
MP3
Changing the way you listen to music
MP3 is the new music buzzword. It
“crunches” bloated audio tracks into
compact files, for playback via your
PC or a dedicated MP3 player. What’s
in it for you? – read on and find out!
By JIM ROWE
DESPITE THE JARGON, there’s
nothing magical about MP3; it’s
simply a technique for compressing digital audio, so that it needs
less storage space and is faster to transmit from one place to another – via the
Internet, for example.
In many ways, MP3 is rather like the
JPEG format that’s used to compress
digital image files, so that more of them
can be stored on your PC’s hard disc.
In fact, the MP3 version of a piece of
music can be 12-14 times smaller than
the equivalent WAV file but still sound
almost identical.
You can get an even better appreciation of just how effective MP3 is
by looking at file sizes. Typically, a
4-minute CD-quality music track occupies about 45MB of disk space in WAV
format but this shrinks to only about
4MB in MP3 format for near-CD sound
quality. That’s a saving of about 40MB
per track or 800MB for a 20-track CD!
If necessary, even greater file compression ratios are achievable – it all
depends on how much sound quality
you’re willing to sacrifice.
Is it legal?
Being able to use a technique like
MP3 to “shrink wrap” music into
much smaller electronic packages
has made it easier for people to swap
music files over the Internet. It’s
this aspect that the music industry
doesn’t like, because the popularity
of MP3 and other digital compression
techniques has resulted in a dramatic
increase in music “piracy”.
However, just because people use
MP3 to illegally obtain (or distribute)
copyright music doesn’t mean there’s
anything illegal about MP3 itself.
MP3 is really nothing more than a
file format and a lot of MP3s that are
available via the Internet are quite
legitimate. MP3s are often posted on
the Internet by new bands as a means
of self-promotion, for example.
You can also make MP3s from
your own audio CDs, LPs and tapes
although, technically speaking, this
can constitute a breach of copyright.
However, many people take the view
that it’s OK to copy provided they own
the recordings and the MP3s are for
personal use only.
There are plenty of software tools
available for “ripping” tracks off an
audio CD and storing them on your
hard disk in WAV format. After that
the audio data has to be encoded in
MP3 format. Some programs only do
ripping while others only do encoding
but there are also plenty of combination ripper/encoders available.
www.siliconchip.com.au
Microsoft’s “Windows Media Player 7” can play MP3s, conventional audio CDs
and a host of other audio formats as well. It includes a playlist editor, supports
extracting track titles from a CDDB and there are a number of interesting
“visualisation” effects to choose from during playback. You can also customise
the appearance of the player by applying different “skins”.
Many of these software tools are
available as freeware or shareware
and can be readily downloaded from
various sites on the Internet. You’ll
also find MP3 rippers/encoders on
computer magazine cover CDs. We’ll
take a closer look at making your own
MP3s later on.
Why compress?
As you’re probably aware, audio
CDs and other digital media store
music as a stream of binary numbers
(ie, 1s and 0s). Each number specifies
the amplitude of the original analog
audio signal at a particular sampling
instant. In the case of an audio CD, the
sampling rate is 44.1kHz – ie, there
are 44,100 samples every second –
for each of the two stereo channels
and each sample is stored as a 16-bit
binary number.
This means that for every second
of a stereo recording, 88,200 of these
16-bit numbers must be stored on the
CD. So when you’re playing the CD,
the digital music “data” has to be read
off the CD at the rate of 1,411,200 (16
x 88,200) bits per second, or about
1.41Mb/s (megabits per second).
In practice, the total data rate when
you’re playing a CD is actually about
three times this, or about 4.3Mb/s,
because additional “housekeeping”
data is needed for error correction,
etc. After decoding and error correction, the 16 bits for each audio sample are fed through digital to analog
converters (DACs), to deliver the two
analog audio signals for a stereo amplifier.
This type of digital recording is
known as “linear pulse-code modulation” (or LPCM), because of the way it
saves the samples as “code” numbers
whose binary value corresponds directly to the amplitude of the original
audio at the sampling instant. LPCM
certainly delivers excellent audio
quality but this comes at a fairly heavy
price in terms of data storage space and
transmission time.
The Creative Nomad IIc personal MP3
player connects to your PC’s USB port,
comes with 32MB of RAM and costs
$299 from Dick Smith Electronics.
Also available from DSE is the deluxe
Nomad II model for $398.00.
Even in its “raw” form as a
WAV file on a computer hard disk,
16-bit/44.1kHz LPCM needs 1,411,200
bits or 176,400 bytes of storage space
for every second of stereo audio. That’s
just over 10MB (megabytes) per minute, which is why you need a big hard
disk if you use your PC to make your
own audio CDs.
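For readers who like to check the sums, the LPCM figures above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function names are our own, and a megabyte is assumed to mean one million bytes.

```python
# CD-audio parameters quoted in the article
SAMPLE_RATE_HZ = 44_100   # samples per second, per channel
CHANNELS = 2              # stereo
BITS_PER_SAMPLE = 16


def lpcm_bits_per_second(sample_rate=SAMPLE_RATE_HZ, channels=CHANNELS,
                         bits=BITS_PER_SAMPLE):
    """Raw LPCM data rate in bits per second (no error-correction overhead)."""
    return sample_rate * channels * bits


def lpcm_bytes_for(seconds, **kwargs):
    """Storage needed for a whole number of seconds of LPCM audio, in bytes."""
    return lpcm_bits_per_second(**kwargs) * seconds // 8


if __name__ == "__main__":
    rate = lpcm_bits_per_second()
    print(f"{rate:,} bits/s = {rate // 8:,} bytes/s")
    # Assumption: 1MB is taken as 1,000,000 bytes
    print(f"one minute of stereo: {lpcm_bytes_for(60) / 1e6:.2f}MB")
```

Changing the keyword arguments (say, `channels=1` or `bits=8`) shows how quickly the storage requirement moves with each parameter.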
You can also see why LPCM isn’t
really suitable for carrying the audio
for digital TV or radio, or for sending music over the Internet. Even a
3-minute pop song would involve
over 30MB of data, which would take
about 93 minutes to download using
your 56kb/s modem!
Packing it in
There are two different ways of
compressing digital data. One method
simply involves analysing the data for
redundancy (ie, data repetition) and
then encoding it more efficiently. In
other words, it “packs” it more tightly and efficiently. At the other end,
mirror-image decoding techniques are
used to expand it again, restoring the
original data exactly.
This is known as “lossless” compression and it’s the kind of compression used for squeezing computer data
into “zip” files.
Lossless compression can achieve
fairly large reduction factors with data
that has a lot of redundancy, such as
video (where one field image is often
almost identical to the one before).
But it’s not as effective with data that
doesn’t have much redundancy, like
music or speech.
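The difference redundancy makes is easy to demonstrate with Python’s standard zlib module, which implements the DEFLATE method commonly used inside zip files. In this sketch the test data is our own invention: a repeated phrase stands in for highly redundant data, and random bytes stand in for data with almost no redundancy (real music falls somewhere in between).

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Original size divided by compressed size (higher means better packing)."""
    return len(data) / len(zlib.compress(data, 9))


# Highly redundant data: the same 13 bytes repeated 4000 times
repetitive = b"Silicon Chip " * 4000

# Stand-in for data with little redundancy: random bytes of the same length
random_like = os.urandom(len(repetitive))

# Lossless means the round trip restores the original exactly
restored = zlib.decompress(zlib.compress(repetitive))

if __name__ == "__main__":
    print("round trip exact:", restored == repetitive)
    print(f"repetitive data ratio: {compression_ratio(repetitive):.0f}:1")
    print(f"random data ratio:     {compression_ratio(random_like):.2f}:1")
```

The random data typically comes out no smaller than it went in, which is why lossless packing alone is not enough for audio.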
That brings us to so-called “lossy”
digital compression. It is called
“lossy” because it prevents the data
from being restored exactly to what
it was before – just “near enough” for
practical purposes. These techniques
are based on the idea of “perceptual
coding”, which involves analysing
the data on the basis of what we know
about human perception (sight and
hearing) and looking for content that
either won’t be perceived or is unlikely
continued on page 10
Psychoacoustics: Fooling The Ear
AS YOU MAY already know, a
typical normal ear’s frequency
response varies considerably over
the audio range and is also quite
dependent on the volume of the
sound. Our hearing is most sensitive
at about 2-5kHz and least sensitive
at frequencies below 100Hz. The
response also varies a lot more at
low volume levels than at high levels.
In other words, our ears are quite
non-linear and have a rather bumpy
frequency response to boot.
More recently, it’s also been discovered that because of the way our
hearing receptor “hair cells” work
inside the ear’s cochlea, the ear isn’t
very good at hearing all of the components of a complex sound. In particular, a loud sound at one frequency
tends to dominate our perception of
all sounds in a band of frequencies,
extending either side of the loud
sound. This is called “masking” and
is illustrated in Fig.1.
What happens is that a relatively
loud sound (signal A) “pulls up” the
ear’s hearing threshold at frequencies on either side, so that if there
are other sounds present in that
frequency band at a lower level
(like sound B), they simply won’t be
heard. Essentially, the lower level
sounds are “masked out”, because
of the way our hearing receptors are
desensitised at frequencies on either
side of sound A.
In practice, the width of this masking effect varies logarithmically with
frequency. For example, a loud sound
at 100Hz masks out other sounds
from 50-150Hz, while another at
1000Hz masks out sounds from 500-1500Hz and one at 10kHz masks
frequencies from 5-15kHz. The
higher the frequency, the wider the
masking curve in Hertz – see Fig.2.
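The three examples just given all follow the same rough rule: the masked band runs from about half the masking frequency to about one and a half times it. The sketch below simply encodes that rule of thumb. It is our own reading of the examples, not a proper psychoacoustic model, and the function name and the 0.5 and 1.5 factors are assumptions.

```python
def masked_band_hz(masker_hz, lower_factor=0.5, upper_factor=1.5):
    """Approximate band (low, high) in Hz masked by a loud tone at masker_hz.

    Rule of thumb taken from the article's examples; the factors are
    assumptions, and real masking curves also depend on level and time.
    """
    return masker_hz * lower_factor, masker_hz * upper_factor


def is_masked(quiet_hz, masker_hz):
    """True if a much quieter tone falls inside the masker's band."""
    low, high = masked_band_hz(masker_hz)
    return low <= quiet_hz <= high


if __name__ == "__main__":
    for masker in (100, 1000, 10_000):
        low, high = masked_band_hz(masker)
        print(f"{masker}Hz masks {low:.0f}-{high:.0f}Hz "
              f"(a band {high - low:.0f}Hz wide)")
```

Note how the band width in Hertz grows tenfold each time the masking frequency does, which is the logarithmic behaviour described above.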
The width of these masking curves
also varies with the volume. At low
levels, only frequencies quite close
to the main sound are masked but
the masking widens as the sound
level is increased – Fig.3. This means
that the ear is best at distinguishing
adjacent sound frequencies at low
volume levels.
There’s another aspect of masking,
too. As well as varying with frequency
and volume, masking also varies with
time. So when a fairly loud sound A
starts at time T1 and ends at time
T2, its masking effect doesn’t just
last while it’s present but fades away
relatively slowly after it ends (Fig.4).
It can even start slightly before the
loud sound is perceived – called
“pre-masking”.
This “temporal masking” effect
also varies with the duration of the
masking sound. The masking fades
relatively quickly after brief loud
sounds but takes longer to fade after
long-duration loud sounds.
Fig.1: when you’re listening to a reasonably loud sound, you can’t really hear
quieter sounds nearby, due to the masking effect. (Plot of sound pressure level in dB rel. 0.2nBar @ 1kHz against frequency from 10Hz to 10kHz, showing masking sound A, its masking threshold, masked sound B and the normal hearing threshold in quiet.)
So it’s been established that weak
sounds at frequencies close to louder
sounds simply can’t be heard. In addition, the masking effect varies with
frequency, volume and time duration
in a fairly predictable way. It’s this
knowledge that is used to program
the operation of perceptual encoders,
like those used for MP3.
Encoder operation
Without going into things too deeply, the encoders operate in two main
ways. First, they decide which audio
components can safely be removed,
because they’ll be masked and inaudible anyway – so they’re perceptually
redundant.
Second, they make decisions regarding how many (or how few) bits
need to be used to encode the audio,
on a dynamic “instant by instant”
basis. To allow this to be done, the
analog signal is not converted to
digital samples as a single entity but
is first filtered into a set of frequency
sub-bands – typically 32 – with bandwidths of about 1/3 of an octave. The
signal components in each sub-band
are then sampled independently and
the encoder then analyses each of
their amplitudes.
Then, by predicting the way the ear
will handle each of these sub-band
signals and the interaction between
them, the encoder decides how many
bits are needed to convey each one
with sufficient accuracy and clarity.
Louder sub-band components will
be encoded with a larger number of
bits and softer components with a
smaller number of bits. Sub-bands
where the signal level is below the
threshold of hearing aren’t even encoded at all.
This is called “adaptive sub-band
coding”.
What’s the point
But what’s the point? Well, you’ll
recall that LPCM uses a brute-force
approach, with fixed-length 16-bit
numbers to represent every sample
of the signal regardless of its amplitude. This gives low “quantising
noise” (theoretically 96dB below
maximum level, for 16-bit sampling)
and hence a large dynamic range. If
we reduce the number of bits used
to represent each sample, this lowers
the amount of digital data being sent
but the sampling would be cruder – ie, the quantising noise would
increase.
In fact the noise increases by 6dB
each time we use one less bit, so if
we drop to only 8-bit sampling we get
a signal-to-noise ratio of only 48dB.
So although 8-bit LPCM gives half
the file size and transmission time of
16-bit LPCM, it also sounds pretty
terrible. And 4-bit LPCM would be
even worse.
Varying the bits
On the other hand, by combining
sub-band coding with a knowledge
of the ear’s behaviour, it becomes
possible to vary the number of bits
used to encode each of the signal
components, so that the quantising
noise in each sub-band is still kept
below the ear’s threshold of hearing
(taking into account the effects of
masking). This is done dynamically,
so that the number of bits needed
to convey the signal is not fixed but
varies up and down, depending at
any time on the signal itself.
The net result is a dramatic reduction in the total number of data
bits needed to store or transmit the
audio, but with almost no “perceptible”
difference in the sound of the signal
decoded at the other end.
Note that the decoded audio signal may well end up lacking many
components or details that were in
the original and may also have quite
a bit of additional noise due to the
cruder sampling of quieter signal
components. But the crucial point
is that these shortcomings are near
enough to inaudible.
If all this sub-band filtering, analysis and adaptive coding sounds pretty
complicated, that’s because it is. In
fact, this whole approach to signal
compression only became feasible
in the last 15 years or so, with the
development of digital filtering and
signal processing techniques.
It certainly wasn’t possible back in
the old analog days but now it can all
be done digitally by some dedicated
LSI chips or software running on a
PC.
Fig.2: the width of the masking threshold “skirts” varies logarithmically with
frequency. (Plot of sound pressure level in dB rel. 0.2nBar @ 1kHz against frequency from 10Hz to 10kHz, showing masking thresholds and the normal hearing threshold in quiet.)
Fig.3: the shape of the masking curve also varies with the volume of the
masking sound, being much wider for loud sounds. (Same axes, with curves labelled 20dB, 40dB, 60dB, 80dB and 100dB and the normal hearing threshold in quiet.)
Fig.4: masking also varies with time, taking quite a while to fade after a
loud masking sound ends.
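The “6dB per bit” rule from the panel, and the idea of giving each sub-band only as many bits as it needs, can be sketched in a few lines of Python. This is a deliberately simplified illustration, not how a real MP3 encoder is written: the function names and the example sub-band levels are hypothetical, and the model assumes the quantising noise simply sits 6.02dB lower for every bit used.

```python
import math

DB_PER_BIT = 20 * math.log10(2)   # about 6.02dB of dynamic range per bit


def dynamic_range_db(bits):
    """Theoretical gap between maximum level and quantising noise."""
    return bits * DB_PER_BIT


def bits_needed(signal_db, masking_threshold_db):
    """Fewest bits that keep quantising noise at or below the threshold.

    Simplified model (an assumption): noise lies bits * 6.02dB below the
    signal level. A sub-band at or below its threshold gets no bits at all.
    """
    margin_db = signal_db - masking_threshold_db
    if margin_db <= 0:
        return 0
    return math.ceil(margin_db / DB_PER_BIT)


if __name__ == "__main__":
    print(f"16-bit: {dynamic_range_db(16):.0f}dB, 8-bit: {dynamic_range_db(8):.0f}dB")
    # Hypothetical (signal level, masking threshold) pairs in dB for four sub-bands
    sub_bands = [(80, 50), (60, 0), (30, 40), (45, 42)]
    for level, threshold in sub_bands:
        print(f"signal {level}dB, threshold {threshold}dB -> "
              f"{bits_needed(level, threshold)} bits")
```

In this toy example the four sub-bands need 5, 10, 0 and 1 bits respectively, against a fixed 16 bits each for LPCM, which is where the data reduction comes from.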
MP3
Changing the way you listen to
music – ctd from page 7
to be missed if it’s removed.
The data that’s judged as “perceptually redundant” is then removed,
allowing the remaining data to be
compressed considerably. In other
words, some of the audio information
is “thrown away” on the basis that you
won’t be able to hear the difference.
By the way, lossy and lossless compression aren’t mutually exclusive
– you can use them both together, for
even more efficient data reduction. It’s
this kind of “double shot” compression that’s used to squeeze up to two
hours of digital video and 5.1-channel
audio on a DVD and to pack up to 74
minutes of high quality stereo on a
MiniDisc.
It also happens to be the kind of
compression used in MP3, to crunch
down digital audio files by a factor of
10-14 times or more.
But how does the perceptual encoding part of lossy compression actually
work? And how does the audio encoder decide which parts of the data can
be safely chopped out, without being
missed? Take a look at the accompanying panel on psychoacoustics to find
out more on this subject.
MPEG-1, Layer 3
Now that we’ve looked at the broad
principles on which digital audio
compression is based, let’s turn our
attention to MP3 itself. By the way, its
full official name is “MPEG-1 Layer
3”, which reveals that it’s one implementation of the group of digital data
compression technologies known as
MPEG-1, developed and standardised
by the Moving Picture Experts Group.
MPEG-1 began as a technology to
compress digital audio and video so
they could be stored on CDs – ie, for
video CDs. As such, MPEG-1 audio
encoding was developed from two
earlier technologies called MUSICAM
(Masking-pattern Universal Sub-band
Integrated Coding And Multiplexing)
and ASPEC (Adaptive Spectral Perceptual Entropy Coding).
There are essentially three “layers”
of MPEG-1 audio encoding, each involving a different level of processing
complexity and offering a different
degree of compression or “data reduction”. Layer 1 is the least complex in
terms of processing and is designed
for applications that don’t need a huge
amount of data reduction. It reduces
the audio data by about 4:1 and needs
a data rate of about 384kb/s to give
stereo reproduction of near-CD
quality.
Layer 2 involves more complex processing but
reduces the audio data by between 6:1 and 8:1.
It gives near-CD stereo reproduction at data
rates of 192kb/s and above. Layer 2 is used
for the audio on video CDs and for digital TV
audio.
Layer 3 (ie, MP3) involves the most
complex processing, but also achieves
the highest degree of data reduction – between 10:1 and 12:1. This
allows it to provide near-CD stereo
reproduction at data rates of 112kb/s
or 128kb/s, or “FM stereo” quality at
64kb/s (22:1 reduction). Even a data
rate of just 32kb/s can give respectable “AM mono” quality, with 15kHz
sampling and a bandwidth of about
7.5kHz – see Table 1.
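The reduction figures quoted for the three layers follow directly from dividing the CD’s raw LPCM rate by each layer’s data rate. Here is a small sketch of that division; the function name is our own, and the ratios are relative to 16-bit/44.1kHz stereo only.

```python
# Raw CD-quality LPCM rate: 44,100 samples x 2 channels x 16 bits = 1411.2kb/s
CD_LPCM_KBPS = 44_100 * 2 * 16 / 1000


def reduction_ratio(kbps):
    """How many times smaller the stream is than CD-quality LPCM."""
    return CD_LPCM_KBPS / kbps


if __name__ == "__main__":
    rates = [("Layer 1", 384), ("Layer 2", 192),
             ("Layer 3", 128), ("Layer 3", 112), ("Layer 3", 64)]
    for layer, kbps in rates:
        print(f"{layer} at {kbps}kb/s: about {reduction_ratio(kbps):.1f}:1")
```

Rounded off, these work out at roughly 4:1, 7:1, 11:1, 13:1 and 22:1.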
Obviously, the big appeal of MP3
is this ability to give near-CD quality
stereo with files only 1/12 the size of
LPCM files, or FM stereo quality with
files half that size again. That’s why it’s
become so popular for downloading
music files over the Internet – because
an MP3 file of a typical 3-minute song
might take only seven or eight minutes
to download, instead of 90 minutes or
so for the equivalent WAV file.
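To see where download times like these come from, here is a rough calculation for a 3-minute song. The big assumption is the modem’s real-world throughput: we have used 45kb/s as a plausible figure for a 56kb/s dial-up connection, but the speed you actually get will vary. The function and variable names are our own.

```python
def download_minutes(size_bytes, link_bits_per_second):
    """Time to transfer size_bytes over a link, ignoring protocol overheads."""
    return size_bytes * 8 / link_bits_per_second / 60


SONG_SECONDS = 180                         # a typical 3-minute song
wav_bytes = 176_400 * SONG_SECONDS         # 16-bit/44.1kHz stereo LPCM
mp3_bytes = 128_000 // 8 * SONG_SECONDS    # MP3 encoded at 128kb/s

# Assumption: effective throughput of a 56kb/s modem on a typical phone line
EFFECTIVE_MODEM_BPS = 45_000

if __name__ == "__main__":
    for label, size in (("WAV", wav_bytes), ("MP3", mp3_bytes)):
        minutes = download_minutes(size, EFFECTIVE_MODEM_BPS)
        print(f"{label}: {size / 1e6:.1f}MB, about {minutes:.0f} minutes")
```

Under that assumption the WAV file takes well over an hour and a half, while the MP3 is done in under nine minutes.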
This also makes MP3 files very
attractive for storing music on a PC
hard disk or in the memory chips of a
portable MP3 music player. As stated
earlier, MP3 lets you cram a complete
4-minute track into just 4MB for near-CD quality but if you’re happy with
lower quality, it can be even smaller.
Looking for MP3 music software? If
so, www.mp3.com is the place to go.
Iomega’s HipZip MP3 player
uses 40MB PocketZip disks
as the storage medium.
MP3 for all
How can you take advantage of MP3
yourself? Well, there are two fairly easy
ways to get MP3 music files.
One way of getting MP3s is to
download them from the Internet, from
the many web sites that specialise in
making MP3 files available. Popular
sites for this are www.mp3.com, www.scour.net and www.riffage.com but
be warned – they’re often very busy
and your browser may not be able to
access them.
One obvious drawback here is
that the web sites may not have the
particular pieces of music you want.
They’re a bit of a lucky dip in this
regard.
The other main way to get MP3s is
to make them yourself (see “Making
MP3s”), by converting the tracks on
existing audio CDs, LPs and tapes.
This involves using a PC with a CD-ROM drive plus some readily available
software. It’s a two-step process: you
first turn the music into a WAV file
(known as “ripping”), then encode
it into an MP3 file (the “encoding” stage).
To convert a track from an audio
CD, for example, you first need to
read the track and save it on your
hard disk as a WAV file. This can be
done using either one of the common
music editing programs (like Cool Edit,
Sound Forge or CD Spin Doctor) or a
ripper program. Many of these can be
downloaded from the Internet, from
sites like www.mp3.com.
Similarly, with a track from a tape or
LP record, you again have to use one of
the music editing programs to record
it through your sound card.
Once you have the music on your
hard disk in WAV file form, you
then use an MP3 encoding program
to produce the MP3 equivalent file.
Again there are many MP3 encoding
programs that you can download from
the net. Alternatively, you can get a
combined ripper/encoder that does
everything in one seamless operation.
Silicon Chip’s MP3 Jukebox
Our MP3 Jukebox is
basically a standard PC
fitted with an infrared
remote control receiver
and an LCD screen to
display the track titles.
The universal remote
control handpiece lets
you select from up
to 99 playlists, each
containing up to 199
songs – just by pressing
the buttons.
Playing MP3s
Once you have the music you want
in MP3 form, there are various ways
you can play it. One way is to play it
on your PC via its sound card and amplifier/speakers, using an MP3-capable
software program. If you’re running
Windows 98/Me/NT, the latest Windows Media Player will play MP3
files directly (as well as conventional
audio CDs).
Alternatively, you can use a freeware MP3 player such as “Winamp”.
As before, you can download
these players from web sites like
www.mp3.com or from a computer
magazine CD-ROM.
Another approach is to record the
MP3 files on a CD-R disc, using your
CD-writer drive and a program like
“Easy CD Creator”. You can then play
the files from the disc, either on your
PC using Windows Media Player or
Winamp, or on one of the latest DVD
players that can play CD-R discs with
MP3 files.
MP3 on the move
Yet another approach is to download
the MP3 files from your computer into
one of the shirt-pocket sized portable
MP3 players, like the Diamond Rio 500
or 600, or the Creative Labs Nomad II.
Many of these players have a USB port,
so you can download the files into the
player’s memory chips or card quite
quickly. Most of the players can store
up to an hour or so of high-quality
128kb/s stereo.
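How much music a player holds depends on its memory size and the data rate you encode at. The sketch below estimates playing time from those two figures. It assumes 1MB means one million bytes and that all of the memory is available for music; the function name is our own.

```python
def playing_minutes(memory_mb, kbps=128):
    """Approximate playing time for a given memory size and MP3 data rate.

    Assumptions: 1MB = 1,000,000 bytes and all of the memory holds music.
    """
    total_bits = memory_mb * 1_000_000 * 8
    return total_bits / (kbps * 1000) / 60


if __name__ == "__main__":
    for memory_mb in (32, 64):
        for kbps in (128, 64):
            print(f"{memory_mb}MB at {kbps}kb/s: "
                  f"about {playing_minutes(memory_mb, kbps):.0f} minutes")
```

On these figures, a 32MB player such as the Nomad IIc holds a little over half an hour at 128kb/s, and it takes about 64MB to reach an hour or so. Halving the data rate doubles the playing time.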
In short, MP3 is quite a useful tool
for making digital audio widely available in surprisingly compact form. No
wonder it’s become so popular!
HERE’S A FANTASTIC WAY to play your
MP3s. What we’ve done is design a remote
control receiver and LCD display that plugs into the serial (RS232) port
of your PC and is controlled by a universal remote control.
An accompanying software program interfaces the unit with Winamp. With
this setup, you can play your MP3s by remote control and all the track data
is displayed on the LCD. The remote can select between 99 playlists, each
listing up to 199 songs.
You can either build the remote control receiver directly into your PC or
mount it externally. In fact, it doesn’t even have to be in the same room as
your PC. Instead, you could mount the remote control in your loungeroom
and connect it via a serial (RS232) cable to a PC located in an adjacent
room – eg, a bedroom or study.
Of course, you would also have to run audio cables to connect the output
from your PC’s soundcard back to your amplifier. You have to keep these
cables short, though – any more than 4-5 metres and you could quickly run
into hum and stability problems (not to mention high-frequency losses).
One neat solution is to use a dedicated PC as an MP3 Jukebox. This could
be sprayed charcoal gray and mounted next to your existing hifi gear. Once
it’s working, you don’t really need a keyboard, mouse or monitor, since our
remote control setup lets you power the unit down when not in use (provided
you have an ATX motherboard, that is).
In short, it’s up to you how you use the remote control unit. The first article
on our MP3 Jukebox is on page 24 of this month’s issue.