How to extract string from filename
tony at baechler.net
Wed Jul 29 11:13:29 UTC 2015
The recent discussion on shell scripts got me thinking. A couple of posters
invited people to post problems they're having with scripts to the list, so
here is mine. I have not actually written a script for this because I'm not
sure how to go about it. I would normally use cut, but I need to cut from
right to left.
The cut help doesn't indicate a way to do this. You can only cut from the
beginning of the line or a range of bytes. The problem is that each line (a
filename, to be exact) is a different length, so it's impossible to
know what range of bytes I need.
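One possible way to get a right-to-left cut, offered only as a sketch: reverse each line with rev, let cut count fields from what is now the left, then reverse the result back. The function name field_from_right and the sample filename are made up for illustration, and the sketch assumes the fields are separated by underscores and that rev (from util-linux) is installed.

```shell
#!/bin/sh
# Print the Nth underscore-separated field, counted from the right.
# $1 = field number (1 = last field), $2 = the string to cut.
field_from_right() {
    printf '%s\n' "$2" | rev | cut -d_ -f"$1" | rev
}

# Hypothetical filename, for illustration only.
field_from_right 2 'Some_Title_b0123abc_default.mp3'   # prints b0123abc
```

Because the count starts at the right, the length of whatever comes before the wanted field does not matter.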
What I'm trying to do is extract the BBC PID from the downloaded files.
It's a lower case alphanumeric string which starts with a letter and is
eight characters long. In my case, the first letter is always "b" or "p," so if
I could use something like grep to just extract the first lower case letter
followed by a number up to the next underscore, that would be good. I don't
think grep will just print a matching phrase, only the matching line. Here
are some example filenames:
As you can see, they all follow a similar format. If I could go from right
to left, I would simply cut "_default.mp3" and extract the preceding 8
bytes, but I can't figure out how. What I'm trying to do is first extract
the PIDs, hopefully preserving the filenames in the process. Once they are
extracted (or printed to stdout), I want to use wget to download the BBC
programme page. If you go to www.bbc.co.uk/programme/bXXXXXXX, you'll get a
web page displaying the broadcast date, description and notes. I would like
to download those pages.
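The steps described above could be sketched roughly as follows, using the shell's own suffix stripping rather than cut. This is a minimal sketch under stated assumptions, not a tested solution: the function names (extract_pid, programme_url, fetch_pages) are invented here, the URL path is copied from this message and the http scheme is a guess, and the files are assumed to be in the current directory and to end in "_default.mp3".

```shell
#!/bin/sh
# Suffix and URL path as given in the message; adjust if yours differ.
SUFFIX=_default.mp3
BASE_URL=http://www.bbc.co.uk/programme   # scheme is an assumption

# Print the 8 characters that precede $SUFFIX in a filename.
# Fails (non-zero status) if the name does not end in $SUFFIX or
# if those 8 characters do not look like a PID.
extract_pid() {
    case $1 in
        *"$SUFFIX") ;;
        *) return 1 ;;
    esac
    stem=${1%"$SUFFIX"}        # drop the suffix from the right
    head=${stem%????????}      # everything except the last 8 characters
    pid=${stem#"$head"}        # what remains is the last 8 characters
    # A lowercase letter followed by 7 lowercase letters or digits.
    printf '%s\n' "$pid" | LC_ALL=C grep -Eqx '[a-z][a-z0-9]{7}' || return 1
    printf '%s\n' "$pid"
}

# Build the programme page address for a PID.
programme_url() {
    printf '%s/%s\n' "$BASE_URL" "$1"
}

# For every matching file, print "PID<TAB>filename" and fetch the page.
fetch_pages() {
    for f in *"$SUFFIX"; do
        [ -e "$f" ] || continue    # no matching files: the glob stays literal
        pid=$(extract_pid "$f") || { echo "skipping: $f" >&2; continue; }
        printf '%s\t%s\n' "$pid" "$f"
        wget -O "$pid.html" "$(programme_url "$pid")"
    done
}
```

Nothing is downloaded until fetch_pages is called from the directory holding the files. Each page is saved as PID.html, and the PID and filename pairs go to stdout, so the original filenames are left untouched.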
Any help with this would be greatly appreciated. Thanks in advance.
Tony Baechler, Baechler Access Technology Services
Putting accessibility at the forefront of technology
mailto:bats at batsupport.com
Phone: 1-619-746-8310 Fax: 1-619-449-9898