PROWAREtech

.NET: Strip/Remove HTML Tags from Text Using Regex

How to remove the tags from HTML text using C# and regular expressions.

See related: Find Keywords in Text and strip SCRIPT tags from HTML text

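Regex.Replace() removes every HTML tag in a single call. The pattern <[^>]*> matches a "<", any run of characters other than ">", and the closing ">"; each match is replaced with an empty string. Only the tags themselves are removed: the text between tags (including the contents of SCRIPT and STYLE elements) stays in place, and character entities such as &amp; are not decoded.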


string html = "This is a <b>test</b>!<img src='test.jpg' />";
string text = Regex.Replace(html, "<[^>]*>", string.Empty);

Here is the above code snippet used in a complete example.


using System;
using System.Text.RegularExpressions;

namespace ConsoleAppRemoveHtml
{
	class Program
	{
		static void Main(string[] args)
		{
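			string html = "This is a <b>test</b>!<img src='test.jpg' />";

			// Replace every "<...>" run with an empty string, leaving only the text.
			string text = Regex.Replace(html, "<[^>]*>", string.Empty);

			Console.WriteLine(html); // original markup
			Console.WriteLine(text); // markup with the tags removed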
		}
	}
}

Sample program output:

This is a <b>test</b>!<img src='test.jpg' />
This is a test!
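
Real-world HTML usually needs a little more cleanup than the tag pattern alone provides. The following is a minimal sketch, not a definitive implementation: the HtmlText class and its ToPlainText method are hypothetical names introduced here for illustration. It assumes that SCRIPT and STYLE contents should be dropped, entities decoded, and runs of whitespace collapsed. It uses the same <[^>]*> pattern as above plus WebUtility.HtmlDecode from System.Net. Regular expressions are not a full HTML parser, so malformed markup may still slip through.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

namespace ConsoleAppRemoveHtml
{
	// Hypothetical helper for illustration; not part of the original example.
	static class HtmlText
	{
		// Assumption: <script> and <style> blocks should be removed together with their contents.
		static readonly Regex ScriptOrStyle = new Regex(
			@"<(script|style)\b[^>]*>.*?</\1\s*>",
			RegexOptions.IgnoreCase | RegexOptions.Singleline);

		// Same pattern as the article: "<", any run of non-">" characters, ">".
		static readonly Regex Tag = new Regex("<[^>]*>");

		// One or more whitespace characters, including line breaks.
		static readonly Regex Whitespace = new Regex(@"\s+");

		public static string ToPlainText(string html)
		{
			if (string.IsNullOrEmpty(html))
				return string.Empty;

			string text = ScriptOrStyle.Replace(html, string.Empty);
			text = Tag.Replace(text, string.Empty);

			// Decode entities only after the tags are gone, so that escaped
			// text such as "&lt;b&gt;" is kept as literal text, not stripped as a tag.
			text = WebUtility.HtmlDecode(text);

			return Whitespace.Replace(text, " ").Trim();
		}
	}

	class Program
	{
		static void Main(string[] args)
		{
			string html = "<p>Fish &amp; chips</p>\n<script>alert('x < y');</script>  <b>today</b>!";
			Console.WriteLine(HtmlText.ToPlainText(html)); // prints: Fish & chips today!
		}
	}
}
```

Because the tags are replaced with an empty string, text from adjacent block elements is joined without a separator; replace them with a space instead if word boundaries between elements matter.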
