Fun with Bash, and converting file formats under Linux

As you may or may not already know, I’ve been biking to and from work whenever I can (and when my body permits) as of late.  With gas at 138 cents a litre, who can blame me?  I did the first few trips without music, but I have to say, having something to set the rhythm of my pedalling to makes things so much easier.

Only problem was, my music collection was in .ogg file format.

Yes, I suppose I referred to them as mp3s in a previous post.  I guess that’s from force of habit, as I changed them to .ogg format about six months ago.  I did it both so they would work better with my various Linux systems, including a MythTV box that I have yet to resurrect from the dead (but that is for another post), and so that I wouldn’t feel bad about violating the patents on the Moving Picture Experts Group’s MPEG-1 Audio Layer III format, held by Fraunhofer IIS, who evidently don’t much like Linux geeks like me (never mind the legality of the music itself).  The problem with them working better with other Linux systems is that they unfortunately work worse, that is to say not at all, with older devices that were made at the outset of mp3’s adoption into the commercial world.  Case in point: the only portable device I own that can handle mp3s is an mp3-CD player from about six years ago.

So today I converted my entire collection back to mp3 again.  As I was at work, trying to get a particularly stubborn laptop reformatted while dealing with various other crises that other members of the IT team brought to my attention, I couldn’t really focus much effort on the actual conversion, so researching on the web, finding special software and such, was pretty much out.  However, I had gotten all of the tools I needed when I did the conversion to ogg, so I was set.

First up is oggdec, a command line utility that decompresses an ogg file to its basic .wav data.  For those of you not in the know, a .wav file, depending on the format of .wav, is about as close to perfect fidelity as you can get for a digital version of a sound.  As a result, it’s incredibly large.  Ogg Vorbis and mp3, by contrast, are both lossy formats: some small details of the sound, particularly above and below the human hearing range, get tossed, and the rest of the sound is passed through a compression scheme that lets you store nearly-perfect quality using much less disk space.  (Compare this to a .jpg image, which can be anywhere from perfectly clear to lossy to the point where you can barely make out what it’s supposed to be.)
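
To put rough numbers on “incredibly large”, here is a back-of-the-envelope calculation.  It assumes CD-quality audio (44,100 samples a second, two channels, 16 bits per sample) and a 128 kbit/s mp3; those figures are assumptions for illustration, not measurements of this particular collection.

```shell
#!/bin/bash
# Assumed figures for illustration only: CD-quality PCM and a 128 kbit/s encode.
sample_rate=44100      # samples per second
channels=2             # stereo
bytes_per_sample=2     # 16-bit samples
mp3_bitrate=128000     # bits per second

# Uncompressed audio: every sample of every channel is stored in full.
wav_bytes_per_min=$(( sample_rate * channels * bytes_per_sample * 60 ))

# Compressed audio: the bitrate fixes the size regardless of sample format.
mp3_bytes_per_min=$(( mp3_bitrate / 8 * 60 ))

echo "wav: $wav_bytes_per_min bytes per minute"
echo "mp3: $mp3_bytes_per_min bytes per minute"
echo "ratio: roughly $(( wav_bytes_per_min / mp3_bytes_per_min ))x"
```

Under those assumptions a minute of wav comes to 10,584,000 bytes against 960,000 for the mp3, which is about an elevenfold difference.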

Then, once it’s a wav file, the same package that allows for reading mp3s on Linux in the first place can also create them out of the raw audio data.  That package is called LAME, short for “LAME Ain’t an MP3 Encoder”.  Oh, sure, it CAN encode mp3s, but it doesn’t want to be typecast.  Never mind that they named it after what it “ain’t”.

So, those two tools were already there and ready on my computer.  All I had to do was write up a very quick Bash script to go through my collection one file at a time, convert each ogg to wav, then each wav to mp3, and then delete the wav that was left behind, since it was ten times bigger than the original song in either format.  I came up with this quick and dirty shell script (with line numbers; remove them to use this!):

1 #!/bin/bash
2
3 for i in *.ogg; do {
4	oggdec "$i"
5	lame "`echo $i | sed -e 's/.ogg/.wav/g'`"
6	rm -rf "`echo $i | sed -e 's/.ogg/.wav/g'`"
7 }; done

And now, after a brief sidebar while I figured out how to get that code to look the way it does, here’s that script line by line, just because this post isn’t nearly long enough to make up for the last week of absence!

The first line is called a “shebang” line. No, not like the Ricky Martin song, and you should go hang yourself for even thinking it. In geek parlance, a number sign (# or pound for you mouth-breathers) is a “hash”, and an exclamation mark (!) is a “bang”. I don’t know why geeks developed single-syllable nicknames for just about every bit of punctuation, but they did. Anyway, say hashbang a million times fast, and I guess you eventually start saying shebang instead. Or something. Anyway, that line is for pointing your shell script to where your local copy of Bash is, or if you’re programming in a different language like say Python (another topic for another day!), where your Python interpreter is.

The second line is blank. We will not speak of it.

Line 3 is what actually tripped me up for a bit. It originally read, “for i in `ls *.ogg`; do {” — backticks mean, do the command within them, and use the output of that command as your looping variable. For loops are essentially a loop through every item in a list, changing the value of “i” or whatever loop variable name you use to the value of the current item, so that you can run commands on whatever you had in that item list. The reason my original way didn’t work (sounds like it’d do the same, right? list every .ogg in a folder, then do some commands to the filename?), is because “ls”, the directory command, will produce a list of all the files in the folder to standard output, and the for command will see each word in every filename (separated by spaces) as its own filename unto itself. Since every one of my oggs had spaces in them, it would fail to process every single one of them. Just doing “for i in *.ogg; do {” instead, accurately performs your commands on each filename instead of goofing it all up by separating out every word of every filename as its own filename. If you followed that, gold star. If not, just always remember to do it like I did in the block of code above.
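
The difference is easy to see in a throwaway directory.  This sketch makes up two filenames with spaces in them, then counts how many items each style of loop sees.  The filenames and variable names are invented for the demo.

```shell
#!/bin/bash
# Work in a temporary directory so nothing real gets touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch "My Song One.ogg" "Another Tune.ogg"   # made-up filenames

# Wrong: the output of ls is split on whitespace, so every word is an item.
ls_count=0
for i in `ls *.ogg`; do
    ls_count=$(( ls_count + 1 ))
done

# Right: the glob expands to one item per file, spaces and all.
glob_count=0
for i in *.ogg; do
    glob_count=$(( glob_count + 1 ))
done

echo "via ls:   $ls_count items"
echo "via glob: $glob_count items"

# Clean up the temporary directory.
cd / && rm -r "$demo_dir"
```

With these two made-up files, the ls version sees five items (one per word) and the glob version sees two (one per file).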

The next line is the first command to run on each file — oggdec.  That’s the part that pulls the raw audio out of the ogg files.  It saves them as .wav instead of .ogg.

Line 5 is for running the file through LAME.  However, since $i actually contains the full name of the ogg file, and not the converted-to-wav file, I had to do some trickery.  Let’s look at what’s inside the backticks:

echo $i | sed -e 's/.ogg/.wav/g'

What this does is output the contents of $i, being the original filename with .ogg at the end, to stdout (standard output, which is usually your monitor).  Then the pipe takes what echo sends to stdout and feeds it into sed’s stdin (standard input, which is normally your keyboard, or a file given on the command line).  sed is a giant among little utilities that can be used to replace parts of text.  Here, I use its substitute command, in the format of s/searchstring/replaceitwith/g , where searchstring is a regular expression.  (s means substitute, the slashes separate the two parameters, and g means keep replacing even after the first match.  That’s not necessary in this case, but again, force of habit is tough to deal with.)
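
Here is that substitution run on its own, with a made-up filename, next to an alternative that skips sed entirely.  Bash’s ${i%.ogg} parameter expansion strips a trailing .ogg from the value of i, and $( ) does the same job as backticks.  Treat it as a sketch of another way to do it, not a correction of the script above.

```shell
#!/bin/bash
# Hypothetical filename, just for the demo.
i="My Song One.ogg"

# The approach from the post: pipe the name through sed.
via_sed=$(echo "$i" | sed -e 's/.ogg/.wav/g')

# Alternative: strip a trailing ".ogg" with parameter expansion, then add ".wav".
via_expansion="${i%.ogg}.wav"

echo "$via_sed"
echo "$via_expansion"
```

Both lines print My Song One.wav for this filename.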

After it’s done converting the wav to an mp3, there’s no need to keep it on disk since it’s huge, so line 6 consigns the bloated bit of flotsam to the digital dustbin.  Or erases it, whatever.  My way was more poetic.

And if you can’t figure out what line 7 does, you’re reading the wrong post.  I wrote a good one about Portal the other day.  Go read that instead.

So that’s the anatomy of a Bash script.  Quick and dirty, and if you know what you’re doing, you can do some pretty convoluted stuff with it in no time at all.
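
For comparison, a somewhat more defensive take on the same loop might look like the sketch below.  It assumes oggdec’s -o option and LAME’s optional second argument for naming output files; the function name convert_dir is invented here, and none of this is the script that actually ran.

```shell
#!/bin/bash
# Sketch only: convert every .ogg in a directory to .mp3 via a temporary .wav.
# Assumes oggdec and lame are installed; convert_dir is a made-up name.
convert_dir() {
    local dir="${1:-.}" ogg wav mp3
    for ogg in "$dir"/*.ogg; do
        # With no matches the glob stays literal, so skip anything that isn't a file.
        [ -e "$ogg" ] || continue
        wav="${ogg%.ogg}.wav"
        mp3="${ogg%.ogg}.mp3"
        # Decode to wav, and only encode if the decode succeeded.
        if oggdec -o "$wav" "$ogg"; then
            lame "$wav" "$mp3"
        fi
        # The wav is only an intermediate step, so drop it either way.
        rm -f "$wav"
    done
}

# Usage (path is a placeholder): convert_dir /path/to/music
```

Wrapping the loop in a function keeps the block harmless to paste into a terminal, since nothing runs until convert_dir is called with a directory.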

There was one thing that bothered me about my script: it produced song filenames that ended in .wav.mp3, because I hadn’t accounted for LAME’s naming conventions or specified an output name.  So I wrote a one-liner right on the terminal (take that, DOS!) to rename all of those to just filename.mp3 instead.  Here it is, if you’re interested.

for i in *.wav.mp3; do mv "$i" "`echo $i | sed -e 's/\.wav\.mp3/.mp3/g'`"; done

Note the backslashes before the periods in this regular expression — in regex’es, a period has a special meaning, that being “match any character at all for this one particular place”.  If you precede the period by a backslash, that means “match this character only if it’s a period”.  Useful to know that there’s a difference.
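
A made-up filename shows why the distinction matters.  The sketch below runs the unescaped .ogg pattern from the conversion script against a name that happens to contain “ogg” somewhere else, then repeats it with the period escaped and the match anchored to the end of the name with $.

```shell
#!/bin/bash
# Hypothetical filename chosen so the unescaped pattern misfires.
name="Doggone.ogg"

# Unescaped period: "." matches any character, so "Dogg" matches too.
loose=$(echo "$name" | sed -e 's/.ogg/.wav/g')

# Escaped period, anchored with $: only a literal ".ogg" at the very end matches.
strict=$(echo "$name" | sed -e 's/\.ogg$/.wav/')

echo "loose:  $loose"
echo "strict: $strict"
```

The loose pattern turns Doggone.ogg into .wavone.wav, while the strict one gives Doggone.wav.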

Oh, and as a postscript, Amarok was used to randomly select 180 songs to add to a playlist, and I used the built-in “burn to CD” menu option which sent those 180 freshly minted mp3s to a CD as a data CD.  The conversion process took roughly 3 hours for 7 gigs of music, but the effort I put into this post far outweighs the actual effort it took to write the script this is all based on.  And I got to listen to a good variety of music on my bike ride home.

I’ll be back to insert links to various things tomorrow.  Right now it’s bedtime.
