If you want to remove the HTML tags from a HTML string and retrieve only plain text, the below SQL Server function can be used. It is just removing all the HTML tags by identifying '<' and '>'.
SQL Server Function :
create function [dbo].[StripHTML]
(
@HTMLText varchar(max)
)
returns varchar(max)
as begin
declare @Start int
declare @end int
declare @Length int
set @Start = charindex('<',@HTMLText)
set @end = charindex('>',@HTMLText,charindex('<',@HTMLText))
set @Length = (@end - @Start) + 1
while @Start > 0 and @end > 0 and @Length > 0
begin
set @HTMLText = stuff(@HTMLText,@Start,@Length,'')
set @Start = charindex('<',@HTMLText)
set @end = charindex('>',@HTMLText,charindex('<',@HTMLText))
set @Length = (@end - @Start) + 1
end
return ltrim(rtrim(@HTMLText))
end
Sample Input:
select dbo.StripHTML('
<!DOCTYPE html><html><body><h1>My First Heading. </h1><p>My first paragraph.</p></body></html>
')
Output:
My First Heading. My first paragraph.
No comments:
Post a Comment