Comment

Peter Bengtsson

But would you use it in a production application where the HTML isn't perfectly pure?

Parent comment

Gregory J. Baker

Beg to disagree with your conclusion: "Regular expressions are ... weak in power"

import html
import re

html_text = '''

This should get stripped.

Please keep.

Foo & Bar

This should also get stripped.
'''
warnings = '<.*(warning){1}.*>'
no_display = '<.*(display: none){1}.*>'
disp_warn = re.compile(rf'{no_display}|{warnings}')
html_tags = re.compile(r'<.*?>')
clean_txt = html.unescape(html_tags.sub('', disp_warn.sub('', html_text)))

Will give you:

Please keep.
Foo & Bar
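For anyone who wants to run the snippet above, here is a minimal sketch. The tags inside html_text did not survive in the comment, so the elements and attributes below (a "warning" class, an inline "display: none" style, one element per line) are assumptions for illustration only, and the combined pattern is written as a raw f-string literal. The helper name clean is also hypothetical.

```python
import html
import re

# Hypothetical markup: the real tags are not shown in the comment, so these
# elements and attributes are assumed purely for illustration.
html_text = '''
<p class="warning">This should get stripped.</p>
<p>Please keep.</p>
<p>Foo &amp; Bar</p>
<div style="display: none">This should also get stripped.</div>
'''

warnings = '<.*(warning){1}.*>'
no_display = '<.*(display: none){1}.*>'
disp_warn = re.compile(rf'{no_display}|{warnings}')
html_tags = re.compile(r'<.*?>')


def clean(text):
    # Drop flagged elements first, then any remaining tags, then decode entities.
    return html.unescape(html_tags.sub('', disp_warn.sub('', text)))


clean_txt = clean(html_text)
print(clean_txt.strip())

# The same markup collapsed onto a single line: the greedy `.*` now runs
# from the first `<` to the last `>`, so everything is removed.
one_line = html_text.replace('\n', '')
print(repr(clean(one_line)))
```

Under these assumptions the first call keeps "Please keep." and "Foo & Bar". The second call returns an empty string, because `.` does not cross newlines and the patterns only stay within one element when each element sits on its own line. That is the kind of input the reply's question about imperfect production HTML is pointing at.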