Identify Unicode Junk Characters in a String – T – SQL

In this blog I will tell you about Unicode Characters.

Most common problem when dealing with data is –
“I have a string with an unwanted Unicode character, how to identify?”
“Does anybody have any ideas on how to get rid of unwanted junk character from a text string? “

First you have to identify for yourself what you mean by “unwanted/junk characters”. Generally spoken “junk characters” are characters which you don’t want to have in your Data.

For example, in English texts every Western European or Chinese characters are junk. It purely depends on what exactly you want to accomplish.

For that we need to know what this Unicode character is.

Each character on a computer (printable and non-printable) has a number known as its Unicode character code or value, no matter what platform, device, application or language. These characters are not intended for use in Excel worksheets and can cause a variety of errors if present.

Some Junk Characters :-

CharacterUnicode
©169
ª170
«171
¬172
®174
¯175
°176
±177
²178
³179
´180
µ181
182
·183
¸184
¹185
º186
»187
¿191

Here I will give you the method of identifying Unicode in a string.

For that we must make a list of Unicode and its related characters.

Create Table:-

CREATE TABLE [dbo].[ch_char_validation_tb](

            [chr_id] [nvarchar](255) NULL,

            [chr_char] [nvarchar](255) NULL,

            [chr_description] [nvarchar](255) NULL,

            [chr_unicode] [int] NULL,

            [chr_is_valid] [bit] NULL

) ON [PRIMARY]

Sample Data (This is sample junk characters, you must make your list): –

Reference – List of Unicode characters – Wikipedia

INSERT [dbo].[ch_char_validation_tb] ([chr_id], [chr_char], [chr_description], [chr_unicode], [chr_is_valid]) VALUES (N’33’, N’Space’, N’Space’, 32, 0)

INSERT [dbo].[ch_char_validation_tb] ([chr_id], [chr_char], [chr_description], [chr_unicode], [chr_is_valid]) VALUES (N’34’, N’!’, N’Exclamation mark’, 33, 1)

INSERT [dbo].[ch_char_validation_tb] ([chr_id], [chr_char], [chr_description], [chr_unicode], [chr_is_valid]) VALUES (N’35’, N'”‘, N’Quotation mark’, 34, 1)

INSERT [dbo].[ch_char_validation_tb] ([chr_id], [chr_char], [chr_description], [chr_unicode], [chr_is_valid]) VALUES (N’36’, N’#’, N’Number sign, Hashtag, Octothorpe, Sharp’, 35, 0)

INSERT [dbo].[ch_char_validation_tb] ([chr_id], [chr_char], [chr_description], [chr_unicode], [chr_is_valid]) VALUES (N’37’, N’$’, N’Dollar sign’, 36, 0)

INSERT [dbo].[ch_char_validation_tb] ([chr_id], [chr_char], [chr_description], [chr_unicode], [chr_is_valid]) VALUES (N’38’, N’%’, N’Percent sign’, 37, 1)

INSERT [dbo].[ch_char_validation_tb] ([chr_id], [chr_char], [chr_description], [chr_unicode], [chr_is_valid]) VALUES (N’39’, N’&’, N’Ampersand’, 38, 1)

INSERT [dbo].[ch_char_validation_tb] ([chr_id], [chr_char], [chr_description], [chr_unicode], [chr_is_valid]) VALUES (N’40’, N””, N’Apostrophe’, 39, 1)

INSERT [dbo].[ch_char_validation_tb] ([chr_id], [chr_char], [chr_description], [chr_unicode], [chr_is_valid]) VALUES (N’41’, N'(‘, N’Left parenthesis’, 40, 1)

INSERT [dbo].[ch_char_validation_tb] ([chr_id], [chr_char], [chr_description], [chr_unicode], [chr_is_valid]) VALUES (N’42’, N’)’, N’Right parenthesis’, 41, 1)

INSERT [dbo].[ch_char_validation_tb] ([chr_id], [chr_char], [chr_description], [chr_unicode], [chr_is_valid]) VALUES (N’43’, N’*’, N’Asterisk’, 42, 1)

INSERT [dbo].[ch_char_validation_tb] ([chr_id], [chr_char], [chr_description], [chr_unicode], [chr_is_valid]) VALUES (N’44’, N’+’, N’43’, N’Plus sign’, 43, 1)

Function to Compare String and Table Unicode: –

ALTER Function [dbo].[char_validation_fn]

(

      @nstring NVARCHAR(MAX)

)

RETURNS VARCHAR(MAX)

BEGIN

            DECLARE @position INT, @chars nvarchar(2000)

            SET @position = 1

            Declare @tempchr TABLE 

                                    (

                                                chr_message name_dt

                                    )

            WHILE @position <= DATALENGTH(@nstring)

            BEGIN

                        insert into @tempchr

                        select chr_char

                        from ch_char_validation_tb

                        where chr_unicode = UNICODE(SUBSTRING(@nstring, @position, 1))

                        and  chr_is_valid = 0

                        SET @position = @position + 1

            END

            select @chars = rtrim(ltrim(isnull(@chars,”))) + rtrim(ltrim(isnull(chr_message,”))) + ‘|’ from @tempchr

RETURN (@chars)

END

GO

Call Function and Test:-

select dbo.char_validation_fn (‘t t#e$r’)

Use it in Stored procedure.

Keep reading, share your thoughts, experiences. Feel free to contact us to discuss more. If you have any suggestion / feedback / doubt, you are most welcome.

Stay tuned on Knowledge-Junction, will come up with more such articles.

🙂

Thanks for reading

Advertisements