Using sed to straighten quotes.

Jeffery Mewtamer mewtamer at gmail.com
Mon Apr 17 00:17:15 UTC 2017


I often convert various document formats to plain text because the
conversion is generally easier than trying to navigate a program that
can read the document in its original format. The problem is that, even
when the document is in English or another language that uses the Roman
alphabet, the converted .txt contains characters my text-mode screen
reader can't read properly (it pronounces the character as "thorn").

Left and right double curly quotes and single right curly quotes are
the most common offenders, and they also throw off my screen reader's
pronunciation of contractions and possessives. Ellipses, em-dashes,
and accented letters cause problems as well.

Most of these problems can be fixed manually, though it means I often
spend as much time correcting the file as I do reading it.

I know how to use sed to do global search and replace on plain text
files, at least where both the string to be found and the string it's
to be replaced with can be typed, but most of the replacements I'd
like to make have search strings containing characters not on my
keyboard.
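
For reference, the kind of replacement I can already do looks roughly
like the sketch below, where both strings are plain ASCII. The function
name and the sample words are placeholders made up for illustration.

```shell
# Replace every "colour" with "color"; both strings can be typed directly.
# fix_spelling is a hypothetical name used only for this example.
fix_spelling() {
    sed 's/colour/color/g'
}

# Example use on a line of text read from standard input.
printf 'colour me a colour\n' | fix_spelling
```

The same pattern works on a file by passing its name to sed in place
of the piped input.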

So, how do I tell sed to replace a left double curly quote with a
straight double quote, an ellipsis with three periods, or an e with an
acute accent with a plain e, among other such things? And if this is
beyond sed's capabilities, could someone suggest another command-line
tool that can automate this task?

-- 
Sincerely,

Jeffery Wright
Bachelor of Computer Science
President Emeritus, Nu Nu Chapter, Phi Theta Kappa.
Former Secretary, Student Government Association, College of the Albemarle.




More information about the Blinux-list mailing list