SQL pattern matching

SQL Pattern Matching Issues with Unicode Datatypes

How to use SQL Pattern Matching

When doing SQL pattern matching as you perform Select Queries in SQL it’s frustrating enough that data is often not quite what is expected with rogue characters and extra spaces that we hadn’t planned for. But there is another little twist with regard to trailing spaces which waits to trip you up in Microsoft T-SQL.

Data Types

When defining columns to store character data in tables we can either store as ASCII characters or Unicode characters. ASCII data types include Char() and VarChar(). Unicode data types include NChar() and NVarChar(). The difference between these will impact on the way the Like operator behaves.

Syntax

match_expression [ NOT ] LIKE pattern [ ESCAPE escape_character ]

The wildcard that we will discuss here is ‘%’ which represents any string of zero or more characters.

The Important Bits

When Unicode data (nchar or nvarchar data types) are used with LIKE, trailing blanks are significant and you will need to allow for them. Conversely, for non-Unicode data (ASCII), trailing blanks are not significant.

When all arguments (match_expression, pattern, and escape_character, if present) are ASCII character data types, ASCII pattern matching is performed. If any one of the arguments is of Unicode data type, all arguments are converted to Unicode and Unicode pattern matching is performed.

Example: ASCII pattern matching with char column

(NB: examples are using Microsoft T-SQL syntax and operating from SQL Server Management Console, hence ‘Go’ and no semi-colons.)

CREATE TABLE TestLike (FullName char(20))
Go
INSERT INTO TestLike VALUES ('John Doe')
Go
SELECT * 
FROM TestLike
WHERE FullName LIKE '% Doe'   -- returns 1 row
Go
Drop Table TestLike
Go

Note: The 12 unused characters (20 – len(‘John Doe’)) are ignored, a wildcard is not necessary because the data type is Non Unicode (ASCII). Although the WHERE clause does not provide for this, 1 row is however returned.

Example: Unicode pattern matching with nchar column

CREATE TABLE TestLike (FullName nchar(20))
Go
INSERT INTO TestLike VALUES ('John Doe')
Go
SELECT *
FROM TestLike
WHERE FullName LIKE '% Doe'   -- no rows returned
Go
Drop Table TestLike
Go

Note: The 12 unused characters (20 – len(‘John Doe’)) are significant because the data type is Unicode. The WHERE clause does not provide for this and therefore no rows are returned.

Exmple: Unicode pattern matching with nchar column and RTRIM

CREATE TABLE TestLike (FullName nchar(20))
Go
INSERT INTO TestLike VALUES ('John Doe')
Go
SELECT * 
FROM TestLike
WHERE RTRIM(FullName) LIKE '% Doe'   -- returns 1 row
Go
Drop Table TestLike
Go

Note: The 12 unused characters (20 – len(‘John Doe’)) are significant because the data type is Unicode. The Where clause does provide for this (RTrim) and therefore 1 row is returned.

Example: Unicode pattern matching with nchar column and allowing for trailing blanks with a % wildcard.

CREATE TABLE TestLike (FullName nchar(20))
Go
INSERT INTO TestLike VALUES ('John Doe')
Go
SELECT * 
FROM TestLike
WHERE FullName LIKE '% Doe%'   -- returns 1 row
Go
Drop Table TestLike
Go

Note: The 12 unused characters (20 – len(‘John Doe’)) are significant because the data type is Unicode. The Where clause does provide for this (%) and therefore 1 row is returned.

Conclusion

Assume nothing!

Click here for more details on Microsoft SQL training courses.

Microsoft SQL Pattern Matching Links

Microsoft SQL using Pattern Matching

Stuart Tayler
27 May 2016