MP3: Changing The Way You Listen To Music - September 2001

Feature: MP3: Changing The Way You Listen To Music by Jim Rowe

This is only a preview of the September 2001 issue of Silicon Chip.

MP3: Changing the way you listen to music

MP3 is the new music buzzword. It “crunches” bloated audio tracks into compact files, for playback via your PC or a dedicated MP3 player. What’s in it for you? Read on and find out!

By JIM ROWE

Despite the jargon, there’s nothing magical about MP3; it’s simply a technique for compressing digital audio, so that it needs less storage space and is faster to transmit from one place to another – via the Internet, for example. In many ways, MP3 is rather like the JPEG format that’s used to compress digital image files, so that more of them can be stored on your PC’s hard disk. In fact, the MP3 version of a piece of music can be 12-14 times smaller than the equivalent WAV file but still sound almost identical.

You can get an even better appreciation of just how effective MP3 is by looking at file sizes. Typically, a 4-minute CD-quality music track occupies about 42MB of disk space in WAV format but this shrinks to only about 4MB in MP3 format for near-CD sound quality. That’s a saving of nearly 40MB per track, or more than 750MB across 20 such tracks! If necessary, even greater file compression ratios are achievable – it all depends on how much sound quality you’re willing to sacrifice.

Is it legal?

Being able to use a technique like MP3 to “shrink wrap” music into much smaller electronic packages has made it easier for people to swap music files over the Internet. It’s this aspect that the music industry doesn’t like, because the popularity of MP3 and other digital compression techniques has resulted in a dramatic increase in music “piracy”.

However, just because people use MP3 to illegally obtain (or distribute) copyright music doesn’t mean there’s anything illegal about MP3 itself. MP3 is really nothing more than a file format and a lot of the MP3s that are available via the Internet are quite legitimate. MP3s are often posted on the Internet by new bands as a means of self-promotion, for example.
You can also make MP3s from your own audio CDs, LPs and tapes although, technically speaking, this can constitute a breach of copyright. However, many people take the view that it’s OK to copy provided they own the recordings and the MP3s are for personal use only.

There are plenty of software tools available for “ripping” tracks off an audio CD and storing them on your hard disk in WAV format. After that, the audio data has to be encoded in MP3 format. Some programs only do ripping while others only do encoding but there are also plenty of combination ripper/encoders available. Many of these software tools are available as freeware or shareware and can be readily downloaded from various sites on the Internet. You’ll also find MP3 rippers/encoders on computer magazine cover CDs. We’ll take a closer look at making your own MP3s later on.

Microsoft’s “Windows Media Player 7” can play MP3s, conventional audio CDs and a host of other audio formats as well. It includes a playlist editor, supports extracting track titles from a CDDB and there are a number of interesting “visualisation” effects to choose from during playback. You can also customise the appearance of the player by applying different “skins”.

Why compress?

As you’re probably aware, audio CDs and other digital media store music as a stream of binary numbers (ie, 1s and 0s). Each number specifies the amplitude of the original analog audio signal at a particular sampling instant. In the case of an audio CD, the sampling rate is 44.1kHz – ie, there are 44,100 samples every second – for each of the two stereo channels and each sample is stored as a 16-bit binary number.

This means that for every second of a stereo recording, 88,200 of these 16-bit numbers must be stored on the CD. So when you’re playing the CD, the digital music “data” has to be read off the CD at the rate of 1,411,200 (16 x 88,200) bits per second, or about 1.41Mb/s (megabits per second).
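The arithmetic above is easy to check for yourself. The short Python sketch below simply multiplies out the CD figures quoted in the text; the function and variable names are our own, chosen for illustration, and a megabit is taken to be one million bits.

```python
# Figures quoted in the text for CD audio.
SAMPLE_RATE_HZ = 44_100   # samples per second, per channel
CHANNELS = 2              # stereo
BITS_PER_SAMPLE = 16


def lpcm_bit_rate(sample_rate_hz, channels, bits_per_sample):
    """Raw LPCM audio data rate in bits per second."""
    return sample_rate_hz * channels * bits_per_sample


bit_rate = lpcm_bit_rate(SAMPLE_RATE_HZ, CHANNELS, BITS_PER_SAMPLE)
byte_rate = bit_rate // 8

print(f"{bit_rate:,} bits per second")              # 1,411,200
print(f"{bit_rate / 1e6:.2f} Mb/s")                 # 1.41
print(f"{byte_rate:,} bytes per second")            # 176,400
print(f"{byte_rate * 60 / 1e6:.1f} MB per minute")  # 10.6
```

The same function works for other formats: halving the bits per sample or dropping to mono halves the data rate, which is why raw LPCM offers so little room to save space without hurting quality.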
In practice, the total data rate when you’re playing a CD is actually about three times the audio data rate, or about 4.3Mb/s, because additional “housekeeping” data is needed for error correction, etc. After decoding and error correction, the 16 bits for each audio sample are fed through digital-to-analog converters (DACs), to deliver the two analog audio signals for a stereo amplifier.

This type of digital recording is known as “linear pulse-code modulation” (or LPCM), because of the way it saves the samples as “code” numbers whose binary value corresponds directly to the amplitude of the original audio at the sampling instant.

The Creative Nomad IIc personal MP3 player connects to your PC’s USB port, comes with 32MB of RAM and costs $299 from Dick Smith Electronics. Also available from DSE is the deluxe Nomad II model for $398.00.

LPCM certainly delivers excellent audio quality but this comes at a fairly heavy price in terms of data storage space and transmission time. Even in its “raw” form as a WAV file on a computer hard disk, 16-bit/44.1kHz LPCM needs 1,411,200 bits or 176,400 bytes of storage space for every second of stereo audio. That’s just over 10MB (megabytes) per minute, which is why you need a big hard disk if you use your PC to make your own audio CDs.

You can also see why LPCM isn’t really suitable for carrying the audio for digital TV or radio, or for sending music over the Internet. Even a 3-minute pop song would involve over 30MB of data, which would take about 93 minutes to download using a 56kb/s modem (which seldom sustains its rated speed in practice)!

Packing it in

There are two different ways of compressing digital data. One method simply involves analysing the data for redundancy (ie, data repetition) and then encoding it more efficiently. In other words, it “packs” the data more tightly. At the other end, mirror-image decoding techniques are used to expand it again, restoring the original data exactly.
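To see this kind of exact pack-and-unpack in action, here is a minimal sketch using Python’s standard zlib module, which implements the DEFLATE method commonly used inside zip files. The test data is our own assumption: a short repeated pattern stands in for highly redundant data and random bytes stand in for data with almost no redundancy. Real music would fall somewhere between the two.

```python
import os
import zlib


def round_trip(data: bytes):
    """Compress, then decompress; return (compressed size, restored bytes)."""
    packed = zlib.compress(data, 9)
    return len(packed), zlib.decompress(packed)


# Highly redundant data: the same short pattern repeated many times.
redundant = b"0123456789ABCDEF" * 4096   # 65,536 bytes
# Essentially no redundancy: random bytes (a stand-in, not real audio).
random_like = os.urandom(65_536)

for label, data in (("redundant", redundant), ("random", random_like)):
    size, restored = round_trip(data)
    assert restored == data  # every byte comes back unchanged
    print(f"{label}: {len(data):,} -> {size:,} bytes")
```

The repeated pattern should shrink to a tiny fraction of its original size, while the random bytes should hardly shrink at all. In both cases the restored data is identical to the original, byte for byte.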
Compression of this exactly reversible kind is known as “lossless” compression and it’s the kind used for squeezing computer data into “zip” files. Lossless compression can achieve fairly large reduction factors with data that has a lot of redundancy, such as video (where one field image is often almost identical to the one before). But it’s not as effective with data that doesn’t have much redundancy, like music or speech.

That brings us to so-called “lossy” digital compression. It is called “lossy” because the data can’t be restored exactly to what it was before – just “near enough” for practical purposes. These techniques are based on the idea of “perceptual coding”, which involves analysing the data on the basis of what we know about human perception (sight and hearing) and looking for content that either won’t be perceived or is unlikely to be missed if it’s removed. The data that’s judged as “perceptually redundant” is then removed, allowing the remaining data to be compressed considerably. In other words, some of the audio information is “thrown away” on the basis that you won’t be able to hear the difference.

By the way, lossy and lossless compression aren’t mutually exclusive – you can use them both together, for even more efficient data reduction. It’s this kind of “double shot” compression that’s used to squeeze up to two hours of digital video and 5.1-channel audio on a DVD and to pack up to 74 minutes of high quality stereo on a MiniDisc. It also happens to be the kind of compression used in MP3, to crunch down digital audio files by a factor of 10-14 times or more.

But how does the perceptual encoding part of lossy compression actually work? And how does the audio encoder decide which parts of the data can be safely chopped out, without being missed? Take a look at the accompanying panel on psychoacoustics to find out more on this subject.

Psychoacoustics: Fooling The Ear

As you may already know, a typical normal ear’s frequency response varies considerably over the audio range and is also quite dependent on the volume of the sound. Our hearing is most sensitive at about 2-5kHz and least sensitive at frequencies below 100Hz. The response also varies a lot more at low volume levels than at high levels. In other words, our ears are quite non-linear and have a rather bumpy frequency response to boot.

More recently, it’s also been discovered that because of the way our hearing receptor “hair cells” work inside the ear’s cochlea, the ear isn’t very good at hearing all of the components of a complex sound. In particular, a loud sound at one frequency tends to dominate our perception of all sounds in a band of frequencies, extending either side of the loud sound. This is called “masking” and is illustrated in Fig.1.

What happens is that a relatively loud sound (signal A) “pulls up” the ear’s hearing threshold at frequencies on either side, so that if there are other sounds present in that frequency band at a lower level (like sound B), they simply won’t be heard. Essentially, the lower level sounds are “masked out”, because of the way our hearing receptors are desensitised at frequencies on either side of sound A.

Fig.1: when you’re listening to a reasonably loud sound, you can’t really hear quieter sounds nearby, due to the masking effect. (Plot of sound pressure level, in dB rel. 0.2nBar @ 1kHz, against frequency from 10Hz to 10kHz, showing masking sound A, the masking threshold, masked sound B and the normal hearing threshold in quiet.)

In practice, the width of this masking effect varies logarithmically with frequency. For example, a loud sound at 100Hz masks out other sounds from 50-150Hz, while another at 1000Hz masks out sounds from 500-1500Hz and one at 10kHz masks frequencies from 5-15kHz. The higher the frequency, the wider the masking curve in Hertz – see Fig.2.

Fig.2: the width of the masking threshold “skirts” varies logarithmically with frequency.

The width of these masking curves also varies with the volume. At low levels, only frequencies quite close to the main sound are masked but the masking widens as the sound level is increased – Fig.3. This means that the ear is best at distinguishing adjacent sound frequencies at low volume levels.

Fig.3: the shape of the masking curve also varies with the volume of the masking sound, being much wider for loud sounds. (Curves are shown for masking sounds at 20dB, 40dB, 60dB, 80dB and 100dB.)

There’s another aspect of masking, too. As well as varying with frequency and volume, masking also varies with time. So when a fairly loud sound A starts at time T1 and ends at time T2, its masking effect doesn’t just last while it’s present but fades away relatively slowly after it ends (Fig.4). It can even start slightly before the loud sound is perceived – called “pre-masking”.

This “temporal masking” effect also varies with the duration of the masking sound. The masking fades relatively quickly after brief loud sounds but takes longer to fade after long-duration loud sounds.

Fig.4: masking also varies with time, taking quite a while to fade after a loud masking sound ends.

So it’s been established that weak sounds at frequencies close to louder sounds simply can’t be heard. In addition, the masking effect varies with frequency, volume and time duration in a fairly predictable way. It’s this knowledge that is used to program the operation of perceptual encoders, like those used for MP3.

Encoder operation

Without going into things too deeply, the encoders operate in two main ways. First, they decide which audio components can safely be removed, because they’ll be masked and inaudible anyway – so they’re perceptually redundant. Second, they make decisions regarding how many (or how few) bits need to be used to encode the audio, on a dynamic “instant by instant” basis.

To allow this to be done, the analog signal is not converted to digital samples as a single entity but is first filtered into a set of frequency sub-bands – typically 32 – with bandwidths of about 1/3 of an octave. The signal components in each sub-band are then sampled independently and the encoder then analyses each of their amplitudes.

Then, by predicting the way the ear will handle each of these sub-band signals and the interaction between them, the encoder decides how many bits are needed to convey each one with sufficient accuracy and clarity. Louder sub-band components will be encoded with a larger number of bits and softer components with a smaller number of bits. Sub-bands where the signal level is below the threshold of hearing aren’t encoded at all. This is called “adaptive sub-band coding”.

What’s the point?

But what’s the point? Well, you’ll recall that LPCM uses a brute-force approach, with fixed-length 16-bit numbers to represent every sample of the signal regardless of its amplitude. This gives low “quantising noise” (theoretically 96dB below maximum level, for 16-bit sampling) and hence a large dynamic range.

If we reduce the number of bits used to represent each sample, this lowers the amount of digital data being sent but the sampling would be cruder – ie, the quantising noise would increase. In fact the noise increases by 6dB each time we use one less bit, so if we drop to only 8-bit sampling we get a signal-to-noise ratio of only 48dB. So although 8-bit LPCM gives half the file size and transmission time of 16-bit LPCM, it also sounds pretty terrible. And 4-bit LPCM would be even worse.

Varying the bits

On the other hand, by combining sub-band coding with a knowledge of the ear’s behaviour, it becomes possible to vary the number of bits used to encode each of the signal components, so that the quantising noise in each sub-band is still kept below the ear’s threshold of hearing (taking into account the effects of masking).

This is done dynamically, so that the number of bits needed to convey the signal is not fixed but varies up and down, depending at any time on the signal itself. The net result is a dramatic reduction in the total number of data bits needed to store or transmit the audio, but with almost no “perceptible” difference in the sound of the signal decoded at the other end.

Note that the decoded audio signal may well end up lacking many components or details that were in the original and may also have quite a bit of additional noise due to the cruder sampling of quieter signal components. But the crucial point is that these shortcomings are near enough to inaudible.

If all this sub-band filtering, analysis and adaptive coding sounds pretty complicated, that’s because it is. In fact, this whole approach to signal compression only became feasible in the last 15 years or so, with the development of digital filtering and signal processing techniques. It certainly wasn’t possible back in the old analog days but now it can all be done digitally by some dedicated LSI chips or software running on a PC.

MPEG-1, Layer 3

Now we’ve looked at the broad principles on which digital audio compression is based, let’s turn our attention to MP3 itself.
By the way, its full official name is “MPEG-1 Audio Layer 3”, which reveals that it’s one implementation of the group of digital data compression technologies known as MPEG-1, developed and standardised by the Moving Picture Experts Group. MPEG-1 began as a technology to compress digital audio and video so they could be stored on CDs – ie, for video CDs. As such, MPEG-1 audio encoding was developed from two earlier technologies called MUSICAM (Masking-pattern adapted Universal Sub-band Integrated Coding And Multiplexing) and ASPEC (Adaptive Spectral Perceptual Entropy Coding).

There are essentially three “layers” of MPEG-1 audio encoding, each involving a different level of processing complexity and offering a different degree of compression or “data reduction”.

Layer 1 is the least complex in terms of processing and is designed for applications that don’t need a huge amount of data reduction. It reduces the audio data by about 4:1 and needs a data rate of about 384kb/s to give stereo reproduction of near-CD quality.

Layer 2 involves more complex processing but reduces the audio data by between 6:1 and 8:1. It gives near-CD stereo reproduction at data rates of 192kb/s and above. Layer 2 is used for the audio on video CDs and for digital TV audio.

Layer 3 (ie, MP3) involves the most complex processing, but also achieves the highest degree of data reduction – between 10:1 and 12:1. This allows it to provide near-CD stereo reproduction at data rates of 112kb/s or 128kb/s, or “FM stereo” quality at 64kb/s (about 21:1 reduction). Even a data rate of just 32kb/s can give respectable “AM mono” quality, with 15kHz sampling and a bandwidth of about 7.5kHz – see Table 1.

Obviously, the big appeal of MP3 is this ability to give near-CD quality stereo with files only 1/12 the size of LPCM files, or FM stereo quality with files half that size again.
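The reduction ratios quoted for each layer follow directly from the data rates. As a rough check, the Python sketch below divides the 16-bit stereo CD rate by each coded rate and also estimates the size of a 4-minute track. The helper names are our own, and the 32kb/s “AM mono” case is left out because it isn’t a like-for-like comparison with a stereo CD.

```python
# 16-bit, 44.1kHz stereo LPCM, in kilobits per second.
CD_BIT_RATE_KBPS = 44_100 * 2 * 16 / 1000   # 1411.2


def reduction_ratio(coded_kbps, source_kbps=CD_BIT_RATE_KBPS):
    """How many times smaller the coded stream is than CD-rate LPCM."""
    return source_kbps / coded_kbps


def file_size_mb(coded_kbps, minutes):
    """File size in megabytes (1MB = 1,000,000 bytes) for a given duration."""
    return coded_kbps * 1000 * minutes * 60 / 8 / 1e6


rates = [
    ("Layer 1", 384),
    ("Layer 2", 192),
    ("Layer 3", 128),
    ("Layer 3", 112),
    ("Layer 3, FM stereo quality", 64),
]

for label, kbps in rates:
    print(f"{label:28s} {kbps:4d}kb/s  {reduction_ratio(kbps):5.1f}:1  "
          f"{file_size_mb(kbps, 4):5.2f}MB for 4 minutes")
```

At 128kb/s this works out to about 11:1 and just under 4MB for a 4-minute track, which is in line with the figures given in the article.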
Small file sizes are why MP3 has become so popular for downloading music files over the Internet – an MP3 file of a typical 3-minute song might take only seven or eight minutes to download, instead of 90 minutes or so for the equivalent WAV file.

This also makes MP3 files very attractive for storing music on a PC hard disk or in the memory chips of a portable MP3 music player. As stated earlier, MP3 lets you cram a complete 4-minute track into just 4MB for near-CD quality but if you’re happy with lower quality, it can be even smaller.

Looking for MP3 music software? If so, www.mp3.com is the place to go.

Iomega’s HipZip MP3 player uses 40MB PocketZip disks as the storage medium.

MP3 for all

How can you take advantage of MP3 yourself? Well, there are two fairly easy ways to get MP3 music files.

One way is to download them from the Internet, from the many web sites that specialise in making MP3 files available. Popular sites for this are www.mp3.com, www.scour.net and www.riffage.com but be warned – they’re often very busy and your browser may not be able to access them. One obvious drawback here is that the web sites may not have the particular pieces of music you want. They’re a bit of a lucky dip in this regard.

The other main way to get MP3s is to make them yourself (see “Making MP3s”), by converting the tracks on existing audio CDs, LPs and tapes. This involves using a PC with a CD-ROM drive plus some readily available software. It’s a two-step process: you first turn the music into a WAV file (known as “ripping”), then encode it into an MP3 file (the “encoding” stage).

To convert a track from an audio CD, for example, you first need to read the track and save it on your hard disk as a WAV file. This can be done using either one of the common music editing programs (like Cool Edit, Sound Forge or CD Spin Doctor) or a ripper program.
Many of these can be downloaded from the Internet, from sites like www.mp3.com. Similarly, with a track from a tape or LP record, you again have to use one of the music editing programs to record it through your sound card.

Once you have the music on your hard disk in WAV file form, you then use an MP3 encoding program to produce the MP3 equivalent file. Again there are many MP3 encoding programs that you can download from the net. Alternatively, you can get a combined ripper/encoder that does everything in one seamless operation.

Silicon Chip’s MP3 Jukebox: our MP3 Jukebox is basically a standard PC fitted with an infrared remote control receiver and an LCD screen to display the track titles. The universal remote control handpiece lets you select from up to 99 playlists, each containing up to 199 songs – just by pressing the buttons.

Playing MP3s

Once you have the music you want in MP3 form, there are various ways you can play it. One way is to play it on your PC via its sound card and amplifier/speakers, using an MP3-capable software program. If you’re running Windows 98/Me/NT, the latest Windows Media Player will play MP3 files directly (as well as conventional audio CDs). Alternatively, you can use a freeware MP3 player such as “Winamp”. As before, you can download these players from web sites like www.mp3.com or from a computer magazine CD-ROM.

Another approach is to record the MP3 files on a CD-R disc, using your CD-writer drive and a program like “Easy CD Creator”. You can then play the files from the disc, either on your PC using Windows Media Player or Winamp, or on one of the latest DVD players that can play CD-R discs with MP3 files.

MP3 on the move

Yet another approach is to download the MP3 files from your computer into one of the shirt-pocket sized portable MP3 players, like the Diamond Rio 500 or 600, or the Creative Labs Nomad II. Many of these players have a USB port, so you can download the files into the player’s memory chips or card quite quickly.
Most of the players can store up to an hour or so of high-quality 128kb/s stereo.

In short, MP3 is quite a useful tool for making digital audio widely available in surprisingly compact form. No wonder it’s become so popular!

Here’s a fantastic way to play your MP3s. What we’ve done is design a remote control receiver and LCD display that plugs into the serial (RS232) port of your PC and is controlled by a universal remote control. An accompanying software program interfaces the unit with Winamp. With this setup, you can play your MP3s by remote control and all the track data is displayed on the LCD. The remote can select between 99 playlists, each listing up to 199 songs.

You can either build the remote control receiver directly into your PC or mount it externally. In fact, it doesn’t even have to be in the same room as your PC. Instead, you could mount the remote control receiver in your loungeroom and connect it via a serial (RS232) cable to a PC located in an adjacent room – eg, a bedroom or study.

Of course, you would also have to run audio cables to connect the output from your PC’s soundcard back to your amplifier. You have to keep these cables short, though – any more than 4-5 metres and you could quickly run into hum and stability problems (not to mention high-frequency losses).

One neat solution is to use a dedicated PC as an MP3 Jukebox. This could be sprayed charcoal gray and mounted next to your existing hifi gear. Once it’s working, you don’t really need a keyboard, mouse or monitor, since our remote control setup lets you power the unit down when not in use (provided you have an ATX motherboard, that is).

In short, it’s up to you how you use the remote control unit. The first article on our MP3 Jukebox is on page 24 of this month’s issue.