python - Find and Edit Text File
I'm looking to find out if there is a way of automating this process. I have 300,000 rows of data that I need to download on a daily basis. There are a couple of rows that need to be edited before the data can be uploaded to SQL.
jordan || michael | 23 | bulls | chicago
bryant | kobe ||| 8 || la
What I want to accomplish is to have 4 vertical bars per row. Normally, I search for the keyword, edit the row manually, and save. These are the 2 anomalies in the data:
- Find "jordan" and remove the 1 excess vertical bar "|" right after it.
- Find "kobe" and remove the 2 excess vertical bars "|" right after it.
The correct format is below:
jordan | michael | 23 | bulls | chicago
bryant | kobe | 8 || la
I'm not sure if this can be done in VBScript or Python. Any help would be appreciated. Thanks!
Python or VBScript could be used, but they are overkill for something this simple. Try sed:
$ sed -E 's/(jordan *)\|/\1/g; s/(kobe *)\| *\|/\1/g' file
jordan | michael | 23 | bulls | chicago
bryant | kobe | 8 || la
To save the result to a new file:
sed -E 's/(jordan *)\|/\1/g; s/(kobe *)\| *\|/\1/g' file >newfile
Or, to change the existing file in-place (keeping a backup with the .bak suffix):
sed -E -i.bak 's/(jordan *)\|/\1/g; s/(kobe *)\| *\|/\1/g' file
How it works
sed reads and processes a file line by line. In our case, we only need the substitute command, which has the form s/old/new/g. Here, old is a regular expression and, if it is found, it is replaced with new. The optional g at the end of the command tells sed to perform the substitution 'globally', meaning not just once but as many times as the pattern appears on the line.
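For readers more at home in Python, the effect of the trailing g can be sketched with re.sub, which replaces every match by default and only the first when count=1 is passed. The sample string and variable names below are made up for illustration; they are not part of the question's data.

```python
import re

line = "a | b | c"  # hypothetical sample line, not from the question's data

# Like s/ \| /,/ without the g flag: only the first match on the line changes.
first_only = re.sub(r" \| ", ",", line, count=1)

# Like s/ \| /,/g: every match on the line changes.
every_match = re.sub(r" \| ", ",", line)

print(first_only)   # a,b | c
print(every_match)  # a,b,c
```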
s/(jordan *)\|/\1/g
This tells sed to look for jordan followed by zero or more spaces followed by a vertical bar, and to remove the vertical bar.
In more detail, the parens in (jordan *) tell sed to save the string jordan followed by zero or more spaces as a group. On the replacement side, we reference that group with \1.
s/(kobe *)\| *\|/\1/g
Similarly, this tells sed to look for kobe followed by zero or more spaces, a vertical bar, zero or more spaces, and a second vertical bar, and to remove both vertical bars.
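To see exactly what each pattern matches and what the group keeps, here is a small sketch using Python's re module, which accepts the same patterns. It runs against the two sample rows from the question; the names JORDAN and KOBE are only illustrative.

```python
import re

JORDAN = re.compile(r"(jordan *)\|")
KOBE = re.compile(r"(kobe *)\| *\|")

m = JORDAN.search("jordan || michael | 23 | bulls | chicago")
jordan_whole = m.group(0)  # everything the pattern matched: name, space, one bar
jordan_kept = m.group(1)   # what \1 puts back: name and space, no bar

m = KOBE.search("bryant | kobe ||| 8 || la")
kobe_whole = m.group(0)    # name, space, two bars
kobe_kept = m.group(1)     # name and space, no bars

print(repr(jordan_whole), repr(jordan_kept))  # 'jordan |' 'jordan '
print(repr(kobe_whole), repr(kobe_kept))      # 'kobe ||' 'kobe '
```

Because the replacement is only \1, the matched bars are dropped while the name and its trailing spaces stay in place.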
Using Python
Using the same logic as above, here is a Python program:
$ cat kobe.py
import re

with open('file') as f:
    for line in f:
        # Drop the one excess bar after "jordan".
        line = re.sub(r'(jordan *)\|', r'\1', line)
        # Drop the two excess bars after "kobe".
        line = re.sub(r'(kobe *)\| *\|', r'\1', line)
        print(line.rstrip('\n'))
$ python kobe.py
jordan | michael | 23 | bulls | chicago
bryant | kobe | 8 || la
To save the result to a new file:
python kobe.py >newfile
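Since the stated goal is 4 vertical bars per row, a variant that writes the new file itself and reports any row that still has a different bar count might look like the sketch below. The function names fix_line and fix_file and the constant EXPECTED_BARS are hypothetical, and the check assumes every valid row has exactly 4 bars, as described in the question.

```python
import re

JORDAN = re.compile(r"(jordan *)\|")
KOBE = re.compile(r"(kobe *)\| *\|")
EXPECTED_BARS = 4  # assumption taken from the question: 4 vertical bars per row


def fix_line(line):
    """Apply the two substitutions from the answer to one line."""
    line = JORDAN.sub(r"\1", line)
    return KOBE.sub(r"\1", line)


def fix_file(src, dst):
    """Write corrected rows to dst and return the line numbers
    (1-based) whose bar count is still not EXPECTED_BARS."""
    bad = []
    with open(src) as fin, open(dst, "w") as fout:
        for number, line in enumerate(fin, start=1):
            fixed = fix_line(line)
            if fixed.count("|") != EXPECTED_BARS:
                bad.append(number)
            fout.write(fixed)
    return bad
```

Calling fix_file('file', 'newfile') would then produce newfile and a list of rows to inspect by hand. The file is read one line at a time, so 300,000 rows do not need to fit in memory at once. Blank lines would also be reported, since they contain no bars.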