Monday, May 22, 2017

Sample SQL Server function to Strip HTML tags from a HTML string

If you want to remove the HTML tags from a HTML string and retrieve only plain text, the below SQL Server function can be used. It is just removing all the HTML tags by identifying '<' and '>'.

SQL Server Function :
create function [dbo].[StripHTML] 
(
 @HTMLText varchar(max)
)
returns varchar(max) 
as begin
 declare @Start int
    declare @end int
    declare @Length int
    set @Start = charindex('<',@HTMLText)
    set @end = charindex('>',@HTMLText,charindex('<',@HTMLText))
    set @Length = (@end - @Start) + 1
    while @Start > 0 and @end > 0 and @Length > 0
    begin
        set @HTMLText = stuff(@HTMLText,@Start,@Length,'')
        set @Start = charindex('<',@HTMLText)
        set @end = charindex('>',@HTMLText,charindex('<',@HTMLText))
        set @Length = (@end - @Start) + 1
    end
    return ltrim(rtrim(@HTMLText))
end

Sample Input:

select dbo.StripHTML('
<!DOCTYPE html><html><body><h1>My First Heading. </h1><p>My first paragraph.</p></body></html>
')

Output:

My First Heading. My first paragraph.