How to extract string from filename

Tim Chase blinux.list at thechases.com
Wed Jul 29 15:23:58 UTC 2015


Running your file names through a couple of tests, it's doable with
a single sed:

  ls | sed 's/.*_\([pb][^_]*\)_default\.mp3$/\1/'
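
To try this without touching a real directory, here is a minimal sketch
that feeds the four example names from your message through a slightly
stricter form of the expression (the dot is escaped and the match is
anchored at the end of the name).  The function name extract_pid is
made up for illustration, and printf stands in for ls:

```shell
#!/bin/sh
# extract_pid is a hypothetical helper name, not part of any tool.
# It reads file names on stdin and prints the captured PID for each one.
extract_pid() {
  # .*_            greedy: everything up to the underscore before the PID
  # \([pb][^_]*\)  the PID: "p" or "b" followed by non-underscore characters
  # _default\.mp3$ the fixed tail, with the dot escaped and anchored at the end
  sed 's/.*_\([pb][^_]*\)_default\.mp3$/\1/'
}

# The example names from the original question stand in for "ls" output.
printf '%s\n' \
  '5_live_Science_-_Coding_and_Computers_b062dj5j_default.mp3' \
  'Witness_-_The_Sinking_of_the_USS_Indianapolis_p02wdykn_default.mp3' \
  'Discovery_-_A_Scientific_View_of_Agriculture_p0053gbd_default.mp3' \
  'Click_-_05_10_2010_p00b18gp_default.mp3' |
  extract_pid
# prints:
#   b062dj5j
#   p02wdykn
#   p0053gbd
#   p00b18gp
```

Names that don't fit the pattern pass through sed unchanged, so if the
directory holds other files you may prefer "sed -n" with a trailing "p"
on the substitution, which prints only the lines where it matched.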

grep does have a "-o" flag that prints only the matching text rather
than the entire line, but to isolate the bit you want, you'd have to
give the pattern more context (which grep would then print as well).
Otherwise, you end up with things like "puters" (from "Computers" in
your examples), because the match runs from a "p" or "b" up to the
next underscore.
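
A rough sketch of that trade-off, assuming a grep that supports "-o"
(GNU and BSD grep do; it is not in POSIX).  The function names
loose_match and pid_via_grep are invented for this example.  The second
one gives grep the extra context and then uses cut to throw the context
away again:

```shell
#!/bin/sh
name='5_live_Science_-_Coding_and_Computers_b062dj5j_default.mp3'

# Hypothetical helper: too little context, so every run that starts with
# "p" or "b" is printed, including "puters" from "Computers".
loose_match() {
  grep -o '[pb][^_]*'
}

# Hypothetical helper: anchor the match on the surrounding underscores and
# the fixed tail, then keep only the second "_"-separated field (the PID).
pid_via_grep() {
  grep -o '_[pb][^_]*_default\.mp3$' | cut -d_ -f2
}

printf '%s\n' "$name" | loose_match    # output includes "puters"
printf '%s\n' "$name" | pid_via_grep   # prints b062dj5j
```

The cut step works here because the matched text always has the shape
"_PID_default.mp3", so the PID is field two when splitting on
underscores.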

Hope this helps.

-tim

On July 29, 2015, Tony Baechler wrote:
> Hi all,
> 
> The recent discussion on shell scripts got me thinking.  A couple
> of posters invited people to post problems they're having with
> scripts to the list, so here goes.
> 
> I have not actually written a script for this because I'm not sure
> how to go about it.  I would normally use cut, but I need to cut
> from right to left. The cut help doesn't indicate a way to do
> this.  You can only cut from the beginning of the line or a range
> of bytes.  The problem is that each line (a filename, to be exact)
> is of a different length, so it's impossible to know what range of
> bytes I need.
> 
> What I'm trying to do is extract the BBC PID from the downloaded
> files. It's a lower case alphanumeric string which starts with a
> letter and is eight characters.  In my case, the first letter is
> always "b" or "p," so if I could use something like grep to just
> extract the first lower case letter followed by a number up to the
> next underscore, that would be good.  I don't think grep will just
> print a matching phrase, only the matching line.  Here are some
> example filenames:
> 
> 5_live_Science_-_Coding_and_Computers_b062dj5j_default.mp3
> Witness_-_The_Sinking_of_the_USS_Indianapolis_p02wdykn_default.mp3
> Discovery_-_A_Scientific_View_of_Agriculture_p0053gbd_default.mp3
> Click_-_05_10_2010_p00b18gp_default.mp3
> 
> As you can see, they all follow a similar format.  If I could go
> from right to left, I would simply cut "_default.mp3" and extract
> the preceding 8 bytes, but I can't figure out how.  What I'm
> trying to do is first extract the PIDs, hopefully preserving the
> filenames in the process.  Once they are extracted (or printed to
> stdout), I want to use wget to download the BBC programme page.  If
> you go to www.bbc.co.uk/programme/bXXXXXXX, you'll get a web page
> displaying the broadcast date, description and notes.  I would like
> to download those pages.
> 
> Any help with this would be greatly appreciated.  Thanks in advance.
> 
> --------------------
> Tony Baechler, Baechler Access Technology Services
> Putting accessibility at the forefront of technology
> mailto:bats at batsupport.com
> Phone: 1-619-746-8310   Fax: 1-619-449-9898
> 



