Text processing
Paul Howarth
paul at city-fan.org
Fri Feb 3 16:06:02 UTC 2006
Dan Track wrote:
> Hi
>
> I've got the following output
>
> Col1  Col2  Col3  Col5
> 1     000   001   Yes
> 2     000   001
> 3     000   001
> 4     Yes         Yes
> 4     000   001
> 4     000   001
> 5     000   001
> 5     Yes   001
> 6     000   001   Yes
>
> As you can see, the column widths vary in size. What I need to do is
> find the number in Col1 for every row that has a "Yes" in Col5.
> How can I do this?
> I've tried the following
> cat file | tr -s ' ' ' ' | tr -s '\t' ' ' | cut -d ' ' -f 6
>
> But I get a result like this
>
> Col1 Col2 Col3 Col5
> 1 000 001 Yes
> 2 000 001
> 3 000 001
> 4 Yes Yes
> 4 000 001
> 4 000 001
> 5 000 001
> 5 Yes 001
> 6 000 001 Yes
>
> As you can see one of the "Yes" statements has moved into the third
> column, so that's a wrong move.
>
> Any help would be appreciated
The problem here, I think, is that some of your columns are empty, so for
instance:
Col1  Col2  Col3  Col5
4           Yes   Yes
appears the same as:
Col1  Col2  Col3  Col5
4     Yes         Yes
to most Unix text-processing tools that separate fields based on whitespace.
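To see this for yourself, here is a small sketch. The two sample rows and the split_fields helper are made up for illustration, and the spacing is only a guess at what your real output looks like:

```shell
# Two rows that differ only in which column is blank.
# (Made-up sample rows; the spacing is an assumption about the real layout.)
row_a='4           Yes   Yes'   # Col2 empty
row_b='4     Yes         Yes'   # Col3 empty

# With the default field separator, awk treats any run of blanks as a
# single separator, so both rows split into the same three fields.
split_fields() {
    printf '%s\n' "$1" | awk '{ print NF ": " $1 "|" $2 "|" $3 }'
}

split_fields "$row_a"
split_fields "$row_b"
```

Both calls print the same line, so once the whitespace has been squeezed there is no way to tell which column was the empty one.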
If you're actually looking for lines where the last field is "Yes", you
could just do:
$ awk '$NF == "Yes"' file
If all you want is the number in the first field, you'd have:
$ awk '$NF == "Yes" { print $1 }' file
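As a rough worked example, the sketch below runs that command on sample data modelled on your table. The exact spacing is assumed, the sample, by_last_field and by_position names are hypothetical, and row 7 is an extra row I made up to show one caveat: a row with "Yes" in Col3 and nothing in Col5 also has "Yes" as its last field. If your output really is fixed-width, testing the character position of Col5 (assumed here to start at character 19) avoids that.

```shell
# Sample input modelled on the table in the question; spacing is assumed.
# Row 7 is hypothetical: "Yes" in Col3 with Col5 left blank.
sample() {
    cat <<'EOF'
Col1  Col2  Col3  Col5
1     000   001   Yes
2     000   001
3     000   001
4     Yes         Yes
4     000   001
4     000   001
5     000   001
5     Yes   001
6     000   001   Yes
7     000   Yes
EOF
}

# Test the last whitespace-separated field, as in the command above.
by_last_field() {
    sample | awk '$NF == "Yes" { print $1 }'
}

# Alternative for fixed-width output: look at the characters where Col5
# sits (assumed to begin at character 19) instead of the last field.
by_position() {
    sample | awk 'substr($0, 19, 3) == "Yes" { print $1 }'
}

by_last_field   # prints 1, 4, 6 and also the hypothetical row 7
by_position     # prints 1, 4, 6
```

If Col5 is never blank on a row that has "Yes" elsewhere as its last entry, the simpler $NF test is enough; the substr variant only helps when the column positions are stable.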
Paul.