.NET: Strip/Remove HTML Tags from Text Using Regex

How to remove the tags from HTML text using C# and regular expressions.

A single call to Regex.Replace() removes all HTML tags. The pattern <[^>]*> matches a <, then any run of characters other than >, then the closing >.

string html = "This is a <b>test</b>!<img src='test.jpg' />";
string text = Regex.Replace(html, "<[^>]*>", string.Empty);

Here is the above code snippet used in a complete example.

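using System;
using System.Text.RegularExpressions;

namespace ConsoleAppRemoveHtml
{
	class Program
	{
		static void Main(string[] args)
		{
			string html = "This is a <b>test</b>!<img src='test.jpg' />";
			string text = Regex.Replace(html, "<[^>]*>", string.Empty); // replace every <...> tag with nothing
			Console.WriteLine(html); // original markup
			Console.WriteLine(text); // text with the tags removed
		}
	}
}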

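This code will remove HTML comments, too, as long as the comment text does not contain a > character. A comment that does contain one is only removed up to that first >.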

Sample program output:

This is a <b>test</b>!<img src='test.jpg' />
This is a test!
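
The note about HTML comments can be checked with a small sketch. It assumes comments should be removed in full, so it runs a dedicated comment pattern before the tag pattern. The class name HtmlCommentDemo and the helper StripTags are made up for this illustration and are not part of the example above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlCommentDemo
{
    // Hypothetical helper: strips comments first, then any remaining tags.
    public static string StripTags(string html)
    {
        // Singleline lets '.' match newlines, so multi-line comments are covered.
        string noComments = Regex.Replace(html, "<!--.*?-->", string.Empty, RegexOptions.Singleline);
        return Regex.Replace(noComments, "<[^>]*>", string.Empty);
    }

    public static void Main()
    {
        string html = "Keep<!-- if a > b --> this";

        // The tag pattern alone stops at the first '>' inside the comment.
        Console.WriteLine(Regex.Replace(html, "<[^>]*>", string.Empty)); // prints: Keep b --> this

        // Removing comments first takes out the whole comment.
        Console.WriteLine(StripTags(html)); // prints: Keep this
    }
}
```

For input without comments, StripTags behaves the same as the one-line Regex.Replace() call.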

To remove only parts of an HTML document, or only the inline tags, take a look at these options.

using System.Text.RegularExpressions;

Regex inline_tags = new Regex(@"<\/?(?:i|b|a|span|strong|cite|em|code)\b[^>]*>", RegexOptions.Compiled); // removes common inline tags but leaves the text they contain; \b keeps <img>, <br>, <body> etc. from matching
Regex strike = new Regex(@"<s\b[^<]*(?:(?!<\/s>)<[^<]*)*<\/s>", RegexOptions.Compiled); // removes struck-through text: the s tag and everything contained within it
Regex sup = new Regex(@"<sup\b[^<]*(?:(?!<\/sup>)<[^<]*)*<\/sup>", RegexOptions.Compiled); // removes the sup tag and everything contained within
Regex sub = new Regex(@"<sub\b[^<]*(?:(?!<\/sub>)<[^<]*)*<\/sub>", RegexOptions.Compiled); // removes the sub tag and everything contained within
Regex header = new Regex(@"<header\b[^<]*(?:(?!<\/header>)<[^<]*)*<\/header>", RegexOptions.Compiled); // removes the header tag and everything contained within
Regex footer = new Regex(@"<footer\b[^<]*(?:(?!<\/footer>)<[^<]*)*<\/footer>", RegexOptions.Compiled); // removes the footer tag and everything contained within
Regex head = new Regex(@"<head\b[^<]*(?:(?!<\/head>)<[^<]*)*<\/head>", RegexOptions.Compiled); // removes the head tag and everything contained within
Regex nav = new Regex(@"<nav\b[^<]*(?:(?!<\/nav>)<[^<]*)*<\/nav>", RegexOptions.Compiled); // removes the nav tag and everything contained within
Regex comment = new Regex("<!--.*?-->", RegexOptions.Compiled | RegexOptions.Singleline); // removes HTML comments, including multi-line ones

string StripSomeHTML(string html)
{
	html = comment.Replace(html, string.Empty);
	html = head.Replace(html, string.Empty);
	html = header.Replace(html, string.Empty);
	html = footer.Replace(html, string.Empty);
	html = nav.Replace(html, string.Empty);
	html = sub.Replace(html, " ");
	html = sup.Replace(html, " ");
	html = strike.Replace(html, " ");
	html = inline_tags.Replace(html, string.Empty);
	return html;
}
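
One detail worth checking in the inline-tag pattern is whether the tag name is followed by a word boundary (\b). The sketch below compares the two variants on a small input. The class name InlineTagBoundaryDemo and the field names Loose and Strict are made up for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InlineTagBoundaryDemo
{
    // Same tag list as above, without and with a word boundary after the tag name.
    public static readonly Regex Loose = new Regex(@"<\/?(?:i|b|a|span|strong|cite|em|code)[^>]*>");
    public static readonly Regex Strict = new Regex(@"<\/?(?:i|b|a|span|strong|cite|em|code)\b[^>]*>");

    public static void Main()
    {
        string html = "<body><i>Hi</i><br><img src='x.jpg'></body>";

        // Without \b, "b" also matches the start of <body> and <br>, and "i" the start of <img>.
        Console.WriteLine(Loose.Replace(html, string.Empty));  // prints: Hi

        // With \b, only the listed tag names match.
        Console.WriteLine(Strict.Replace(html, string.Empty)); // prints: <body>Hi<br><img src='x.jpg'></body>
    }
}
```

If the goal is to strip only the listed inline tags, the variant with \b is the one that does so.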

These are just some common examples. To remove or strip other tags, copy the matching pattern and substitute the tag name.
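
Since the patterns that remove a tag together with its contents differ only in the tag name, they could also be generated. The following is a minimal sketch under that assumption. The class TagRegexFactory and the method ForTagWithContent are hypothetical names, not part of the code above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagRegexFactory
{
    // Hypothetical helper: builds the "remove the tag and everything inside it"
    // pattern shown above for an arbitrary tag name.
    public static Regex ForTagWithContent(string tagName)
    {
        string t = Regex.Escape(tagName); // guard against regex metacharacters in the name
        return new Regex($@"<{t}\b[^<]*(?:(?!<\/{t}>)<[^<]*)*<\/{t}>", RegexOptions.Compiled);
    }

    public static void Main()
    {
        Regex script = ForTagWithContent("script");
        string html = "<p>Hello</p><script>alert('x < y');</script><p>World</p>";

        Console.WriteLine(script.Replace(html, string.Empty)); // prints: <p>Hello</p><p>World</p>
    }
}
```

As with the hand-written patterns, a match ends at the first closing tag of that name, so nested elements of the same name are not handled.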

