To remove the HTML tags from an HTML string and keep only the plain text, you can use the SQL Server function below. It does not parse the HTML; it simply deletes everything from each '<' up to and including the next '>'.
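As a rough sketch of a single pass of this approach, the snippet below removes only the first tag from a sample string. The variable @s and its value are illustrative assumptions, not part of the original function.

```sql
-- Illustrative only: @s is a hypothetical sample value.
declare @s varchar(max) = '<p>Hello</p>'
declare @Start int = charindex('<', @s)           -- position of the first '<' (1)
declare @end int = charindex('>', @s, @Start)     -- position of the next '>' (3)
-- Delete the characters from @Start through @end, inclusive.
select stuff(@s, @Start, (@end - @Start) + 1, '') -- returns: Hello</p>
```

Repeating this step until no '<' and '>' pair is left removes the remaining tags.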
SQL Server function:
- create function [dbo].[StripHTML]
- (
- @HTMLText varchar(max)
- )
- returns varchar(max)
- as begin
- -- @Start is the position of the next '<'; @end is the first '>' at or after it
- declare @Start int
- declare @end int
- declare @Length int
- set @Start = charindex('<',@HTMLText)
- set @end = charindex('>',@HTMLText,@Start)
- set @Length = (@end - @Start) + 1
- -- Remove one tag per iteration until no complete '<...>' pair remains
- while @Start > 0 and @end > 0 and @Length > 0
- begin
- set @HTMLText = stuff(@HTMLText,@Start,@Length,'')
- set @Start = charindex('<',@HTMLText)
- set @end = charindex('>',@HTMLText,@Start)
- set @Length = (@end - @Start) + 1
- end
- return ltrim(rtrim(@HTMLText))
- end
Sample Input:
- select dbo.StripHTML('<!DOCTYPE html><html><body><h1>My First Heading. </h1><p>My first paragraph.</p></body></html>')
Output:
- My First Heading. My first paragraph.
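Because the function only looks for '<' and '>', a few edge cases are worth trying. The calls below are a small sketch that assumes dbo.StripHTML has been created exactly as defined in this post; the input strings are made-up examples.

```sql
-- Assumes dbo.StripHTML exists as defined in this post; inputs are hypothetical.
select dbo.StripHTML('<b>bold</b> text')   -- bold text
select dbo.StripHTML('a < b and c > d')    -- a  d   (text between '<' and '>' is removed too)
select dbo.StripHTML('Tom &amp; Jerry')    -- Tom &amp; Jerry   (entities are not decoded)
select dbo.StripHTML('1 < 2')              -- 1 < 2   (no closing '>', so nothing is removed)
```

In other words, the function is suited to stripping markup from simple HTML fragments, but plain text that itself contains '<' and '>' can lose content, and HTML entities are left as they are.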