Monday, May 22, 2017

Sample SQL Server function to Strip HTML tags from a HTML string

If you want to remove the HTML tags from a HTML string and retrieve only plain text, the below SQL Server function can be used. It is just removing all the HTML tags by identifying '<' and '>'.

SQL Server Function :
  1. create function [dbo].[StripHTML]
  2. (
  3. @HTMLText varchar(max)
  4. )
  5. returns varchar(max)
  6. as begin
  7. declare @Start int
  8. declare @end int
  9. declare @Length int
  10. set @Start = charindex('<',@HTMLText)
  11. set @end = charindex('>',@HTMLText,charindex('<',@HTMLText))
  12. set @Length = (@end - @Start) + 1
  13. while @Start > 0 and @end > 0 and @Length > 0
  14. begin
  15. set @HTMLText = stuff(@HTMLText,@Start,@Length,'')
  16. set @Start = charindex('<',@HTMLText)
  17. set @end = charindex('>',@HTMLText,charindex('<',@HTMLText))
  18. set @Length = (@end - @Start) + 1
  19. end
  20. return ltrim(rtrim(@HTMLText))
  21. end

Sample Input:

  1. select dbo.StripHTML('
  2. <!DOCTYPE html><html><body><h1>My First Heading. </h1><p>My first paragraph.</p></body></html>
  3. ')

Output:

  1. My First Heading. My first paragraph.

No comments:

Post a Comment