Sunday, January 23, 2011

Remove HTML tags: two sed one-liners

1. Quick and easy one-liner to remove all HTML tags from the file blah:
sed 's/<[^>]*>//g' blah

2. Remove HTML tags and format the JavaScript just enough to make the file blah readable (assumes GNU sed, which turns \n in the replacement into a newline):
sed 's/<[^>]*>//g' blah | sed 's/;\(.\)/;\n\1/g' | sed 's/{/{\n/g'

Note that both only work when each HTML tag sits entirely on one line. sed processes its input one line at a time, so a tag broken across two lines is left in place.
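
If tags do span lines, one possible workaround is to have sed pull in the next line whenever an unclosed < is left over after stripping. This is only a sketch: strip_tags is a made-up helper name, and it assumes every remaining < really starts a tag. A stray < in the text (for example in "a < b") would make it pull in the rest of the file.

```shell
# strip_tags: hypothetical helper, not part of the original one-liners.
# Reads the named files, or stdin when no file is given.
strip_tags() {
  sed '
    :a
    # remove every complete tag currently in the pattern space
    s/<[^>]*>//g
    # an unclosed "<" is left: append the next line (if any) and retry
    /</{
      $!{
        N
        ba
      }
    }
  ' "$@"
}

# usage: a tag broken across two lines
printf 'hello <b\nclass="x">world</b>\n' | strip_tags
# prints: hello world
```

Inside the pattern space [^>] also matches the embedded newline, which is what lets the same substitution remove a tag that started on the previous line.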
