Grep expression to filter out lines of the form [alnum][punct][alnum]
Hi, first post. I thought this would be simple ...
I haven't been able to find an example of a similar problem/solution.
I have thousands of text files with thousands of lines of content in the form
<word><space><word><space><number>
Example:
example 1
useful when 1
for. 1
,boy wonder 1
,hary-horse wondered 2
In the above example I want to exclude line 3 because it contains internal punctuation.
I'm trying to use GNU grep 2.25 but am not having any luck.
My initial attempt was this (however, it does not allow "-" internal to the pattern):
grep -v [:alnum:]*[:punct:]*[:alnum:]* filename
So I tried
grep -v [:alnum:]*[:space:]*[!]*["]*[#]*[$]*[%]*[&]*[']*[(]*[)]*[*]*[+]*[,]*[.]*[/]*[:]*[;]*[<]*[=]*[>]*[?]*[@]*[[]*[\]*[]]*[^]*[_]*[`]*[{]*[|]*[}]*[~]*[.]*[:space:]*[:alnum:]* filename
However, I need to factor in spaces and "-", as these are acceptable internal to the string.
I had been trying the [:punct:] set, but I see it contains "-", so that will not work.
I have a stored procedure in T-SQL to process these files, but I would prefer to preprocess them prior to loading if possible, as the routine takes seconds per file.
Has anyone been able to achieve something similar?
On the face of it, you're looking for a 'word space word space number' schema, assuming a 'word' is 'one alphanumeric optionally followed by 0 or 1 occurrences of 0 or more alphanumeric or punctuation characters and ending with an alphanumeric', a 'space' is 'one or more spaces', and a 'number' is 'one or more digits'.
In terms of grep -E (aka egrep):
grep -E '[[:alnum:]]([[:alnum:][:punct:]]*[[:alnum:]])?[[:space:]]+[[:alnum:]]([[:alnum:][:punct:]]*[[:alnum:]])?[[:space:]]+[[:digit:]]+'
That contains:
[[:alnum:]]([[:alnum:][:punct:]]*[[:alnum:]])?
which detects a word with punctuation surrounded by alphanumerics, and:
[[:space:]]+ and [[:digit:]]+
which look for 1 or more spaces or digits.
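To see what the 'word' piece accepts on its own, here is a small sketch that tests single tokens against it. The helper name is_word and the token list are made up for illustration; grep's -x option is used so that the whole token has to match, which is stricter than how the piece behaves inside the full, unanchored expression.

```shell
# The 'word' sub-pattern, stored in a variable for reuse.
word='[[:alnum:]]([[:alnum:][:punct:]]*[[:alnum:]])?'

# is_word TOKEN: succeeds only if the whole token matches the sub-pattern.
# -E selects extended regular expressions, -x requires a full-line match,
# -q suppresses output so only the exit status is used.
is_word() {
    printf '%s\n' "$1" | grep -Eqx "$word"
}

for token in 'boy' 'hary-horse' "o'reilly" 'for.' ',boy'; do
    if is_word "$token"; then
        echo "$token: word"
    else
        echo "$token: not a word"
    fi
done
```

Tokens that start and end with an alphanumeric pass, with any punctuation allowed in between; a token with leading or trailing punctuation does not.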
Using a mildly extended data file, this produces:
$ cat data
example 1
useful when 1
for. 1
,boy wonder 1
,hary-horse wondered 2
o'reilly books 23
coelecanths, dodos etc 19
$ grep -E '[[:alnum:]]([[:alnum:][:punct:]]*[[:alnum:]])?[[:space:]]+[[:alnum:]]([[:alnum:][:punct:]]*[[:alnum:]])?[[:space:]]+[[:digit:]]+' data
example 1
useful when 1
,boy wonder 1
,hary-horse wondered 2
o'reilly books 23
coelecanths, dodos etc 19
$
It eliminates the "for. 1" line, as required.
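Since the question mentions thousands of files to preprocess before loading, the same expression could be applied in a loop. This is only a sketch: the function name filter_dir, the directory arguments, and the .txt suffix are assumptions, not anything stated in the question.

```shell
# Same expression as above, stored once so it can be reused.
pattern='[[:alnum:]]([[:alnum:][:punct:]]*[[:alnum:]])?[[:space:]]+[[:alnum:]]([[:alnum:][:punct:]]*[[:alnum:]])?[[:space:]]+[[:digit:]]+'

# filter_dir IN OUT: write a filtered copy of every *.txt file in IN to OUT.
# (Hypothetical helper; names and file suffix are assumptions.)
filter_dir() {
    in_dir=$1
    out_dir=$2
    mkdir -p "$out_dir"
    for f in "$in_dir"/*.txt; do
        # If nothing matched the glob, $f is the literal pattern: skip it.
        [ -e "$f" ] || continue
        # grep exits with status 1 when no line matches; that is not an
        # error here, so only other statuses are treated as failures.
        grep -E "$pattern" "$f" > "$out_dir/$(basename "$f")" || [ "$?" -eq 1 ]
    done
}
```

Calling it as filter_dir raw clean would leave the originals in raw untouched and put the matching lines of each file into a file of the same name under clean.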