PROWAREtech








.NET: Strip/Remove HTML Tags from Text Using Regex
How to remove the tags from HTML text using C# and regular expressions.
See related: Find Keywords in Text and strip SCRIPT tags from HTML text
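Removing all HTML tags is straightforward with Regex.Replace(). The pattern <[^>]*> matches a < character, any run of characters other than >, and the closing >; each match is replaced with an empty string. Only the tags themselves are removed: the text between tags, including the contents of SCRIPT and STYLE elements, is left in place.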
string html = "This is a <b>test</b>!<img src='test.jpg' />";
string text = Regex.Replace(html, "<[^>]*>", string.Empty);
Here is the above code snippet used in a complete example.
using System;
using System.Text.RegularExpressions;

namespace ConsoleAppRemoveHtml
{
    class Program
    {
        static void Main(string[] args)
        {
            string html = "This is a <b>test</b>!<img src='test.jpg' />";
            string text = Regex.Replace(html, "<[^>]*>", string.Empty);
            Console.WriteLine(html);
            Console.WriteLine(text);
        }
    }
}
Sample program output:
This is a <b>test</b>!<img src='test.jpg' />
This is a test!
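The same pattern can be wrapped in a reusable helper. The sketch below is one possible arrangement, not part of the original example: the class name HtmlText and the method StripTags are hypothetical, and the extra step of decoding entities such as &amp; with WebUtility.HtmlDecode is an assumption about what callers usually want from plain text.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper class; the name is illustrative only.
static class HtmlText
{
    // Same pattern as in the article, compiled once and reused across calls.
    private static readonly Regex TagPattern =
        new Regex("<[^>]*>", RegexOptions.Compiled);

    // Removes every tag, then decodes HTML entities (assumed extra step).
    public static string StripTags(string html)
    {
        if (string.IsNullOrEmpty(html))
            return string.Empty;

        string withoutTags = TagPattern.Replace(html, string.Empty);
        return WebUtility.HtmlDecode(withoutTags);
    }
}

class Program
{
    static void Main()
    {
        // Prints: Fish & chips
        Console.WriteLine(HtmlText.StripTags("Fish &amp; <i>chips</i>"));
    }
}
```

Decoding happens after the tags are stripped, so escaped markup in the input (for example &lt;b&gt;) ends up as literal text in the result rather than being removed as a tag. As with the original snippet, a > character inside an attribute value will end a match early, so for untrusted or complex markup an HTML parser is likely the safer choice.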