Text Manipulation/Replacement
Cameron Simpson
cs at zip.com.au
Tue Sep 23 00:11:46 UTC 2008
On 22Sep2008 14:57, Ubence Quevedo <r0d3nt at pacbell.net> wrote:
| I've used pdftotext to convert a pdf document to text and then used
| a combination of grep and awk to single out data and replace formatting
| that I didn't need.
|
| The output data eventually looks like this:
| 12,123456789
| ,0987654321
|
| But I want it to look like this:
| 12,123456789,0987654321
|
| I've tried many different things with awk, but I can't get it to replace
| \r, with just a ,
Do you want to only do this when the following line starts with a comma?
A little state machine might do (untested):
    :again
    # append the next input line, unless we are already on the last one
    $!N
    # does the appended line start with a comma?
    /\n,/{
    # yes: remove the embedded newline, joining the two lines
    s/\n//
    # and check the line after that as well
    b again
    }
    # otherwise print the first line and restart with the remainder
    P
    D
Put that in a file called "sedf" and try:
    sed -f sedf < olddata > newdata
and see how it goes. The "$!N" skips the read on the last line, so the last line should not get eaten.
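If you would rather stay in awk, something along these lines might also do. It is only a sketch: the function name join_comma_lines is made up for illustration, and stripping a trailing carriage return is an assumption, in case the pdftotext output has DOS line endings (the "\r" in the question hints at that, but it is not confirmed). It buffers one record and glues any line starting with a comma onto it:

```shell
# Join every line that starts with a comma onto the line before it.
# join_comma_lines is a hypothetical helper; it reads stdin and writes stdout.
join_comma_lines() {
    awk '
        { sub(/\r$/, "") }                  # assumption: drop a trailing CR if present
        /^,/    { buf = buf $0; next }      # continuation line: append to the buffer
        NR > 1  { print buf }               # a new record starts: flush the previous one
                { buf = $0 }                # begin buffering the new record
        END     { if (NR > 0) print buf }   # flush the final record
    '
}

printf '12,123456789\n,0987654321\n' | join_comma_lines
# prints: 12,123456789,0987654321
```

To run it on real data, use "join_comma_lines < olddata > newdata". Because the last record is flushed in the END block, it is not lost.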
--
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/
Heaven could change from chocolate to vanilla without violating perfection.
- arromdee at jyusenkyou.cs.jhu.edu (Ken Arromdee)
More information about the fedora-list
mailing list