Problem
Let’s say there’s a table
CREATE TABLE dbo.Settings (
...,
CharactersToRemove NVARCHAR(50)
);
How do you select rows where CharactersToRemove
is equal to a whitespace character (U+0020)?
This question took me by surprise. For me, it was obvious that you can just write
SELECT *
FROM dbo.Settings
WHERE CharactersToRemove = N' '
But this query returns not only the rows with a whitespace character but also rows where CharactersToRemove
is an empty string.
This happens because SQL Server checks if strings have equal length before comparing them. If not, it pads the shorter string with whitespaces. This is defined in SQL-92 standard (section 8.2, General Rule 3a, pp. 207-208):
If the length in characters of X is not equal to the length in characters of Y, then the shorter string is effectively replaced, for the purposes of comparison, with a copy of itself that has been extended to the length of the longer string by concatenation on the right of one or more pad char- acters, where the pad character is chosen based on CS. If CS has the NO PAD attribute, then the pad character is an implementation-dependent character different from any char- acter in the character set of X and Y that collates less than any string under CS. Otherwise, the pad character is a <space>
So according to this rule SELECT CASE WHEN 'a' = 'a ' THEN 1 ELSE 0 END
returns 1.
Solution?
Ok, then LEN()
function should help, right?
SELECT *
FROM dbo.Settings
WHERE CharactersToRemove = N' ' and LEN(CharactersToRemove) = 1
This query returns nothing. Why?
Because LEN()
doesn’t count trailing whitespaces (link), so LEN(N’ ‘)
is 0
.
Solution
Working solution uses the DATALENGTH()
function:
SELECT *
FROM dbo.Settings
WHERE CharactersToRemove = N' ' AND DATALENGTH(CharactersToRemove) = 2
The result of DATALENGTH()
may depend on the collation and the version of SQL Server. In my case, DATALENGTH(N' ')
is 2
.