Vigenere By SQL
After that previous post, I suppose it was natural for me to wonder what it would take to apply Vigenere cypher logic using T-SQL.
I knew right off that the code would look much different. Mainly, because my self-imposed rule is that I try very, very hard never to use a loop. I’ve written it before, but loops in T-SQL make the server work on your data row by row, which is madness, considering the millions of man-hours that engineers have spent optimizing its ability to work on large data sets in parallel.
I could go deeper into that rant, but you’ve probably heard it before.
Anyway, I stayed up late the other night to see what I could do, and it turned out to be easier than I expected.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 |
DECLARE @keyph varchar(100), @cleartext varchar(100) SELECT @keyph = 'thisisamountain', @cleartext = 'The quick brown fox.' ;WITH t1 AS (SELECT 1 N UNION ALL SELECT 1 N), t2 AS (SELECT 1 N FROM t1 x, t1 y), t3 AS (SELECT 1 N FROM t2 x, t2 y), tally AS (SELECT ROW_NUMBER() OVER (PARTITION BY 1 ORDER BY x.N)AS N from t3 x, t3 y, t3 z), letters AS (SELECT N , SUBSTRING (@cleartext, N, 1) AS ct , ASCII(LOWER(SUBSTRING (@keyph, CASE WHEN N % LEN(@keyph) = 0 THEN LEN(@keyph) ELSE N % LEN(@keyph) END , 1))) - ASCII('a') AS AsciiKpOffset , ASCII(SUBSTRING (@cleartext, N, 1)) AS AsciiCt , ASCII('A') AS AsciiAUc , ASCII('Z') AS AsciiZUc , ASCII('a') AS AsciiALc , ASCII('z') AS AsciiZLc FROM tally), logic AS (SELECT CASE WHEN AsciiCt BETWEEN AsciiAUc AND AsciiZUc AND AsciiCt + AsciiKpOffset > AsciiZUc THEN CHAR(AsciiCt + AsciiKpOffset - 26) WHEN AsciiCt BETWEEN AsciiAUc AND AsciiZUc AND AsciiCt + AsciiKpOffset <= AsciiZUc THEN CHAR(AsciiCt + AsciiKpOffset) WHEN AsciiCt BETWEEN AsciiALc AND AsciiZLc AND AsciiCt + AsciiKpOffset > AsciiZLc THEN CHAR(AsciiCt + AsciiKpOffset - 26) WHEN AsciiCt BETWEEN AsciiALc AND AsciiZLc AND AsciiCt + AsciiKpOffset <= AsciiZLc THEN CHAR(AsciiCt + AsciiKpOffset) ELSE ct END AS CypherChar FROM letters WHERE AsciiCt IS NOT NULL) SELECT (SELECT CypherChar [text()] FROM logic FOR XML PATH, TYPE).value(N'.[1]', N'nvarchar(max)') AS CypherText |
I can think of absolutely zero practical applications for this. Please let me know if you come across one.
So what’s going on here? Let’s break it down.
- CTE’s t1, t2, t3, and tally build a CTE Talley table.
- The CTE named letters splits the keyphrase and the cleartext message into a result set with one letter on each row. We also have a column named N with the row number, starting with one. Surrounding those results, you have the ASCII() function, so that we’re dealing with numbers rather than letters in the next step. This is the heart of the query. Let’s add a column to show the letter of the keyphrase, and take a look at just this intermediate result set.
123456789101112131415DECLARE @keyph varchar(100), @cleartext varchar(100)SELECT @keyph = 'thisisamountain', @cleartext = 'The quick brown fox.';WITH t1 AS (SELECT 1 N UNION ALL SELECT 1 N),t2 AS (SELECT 1 N FROM t1 x, t1 y),t3 AS (SELECT 1 N FROM t2 x, t2 y),tally AS (SELECT ROW_NUMBER() OVER (PARTITION BY 1 ORDER BY x.N)AS N from t3 x, t3 y, t3 z),letters AS (SELECT N, SUBSTRING (@cleartext, N, 1) AS ct, SUBSTRING (@keyph, N, 1) AS kp, ASCII(LOWER(SUBSTRING (@keyph, CASE WHEN N % LEN(@keyph) = 0 THEN LEN(@keyph) ELSE N % LEN(@keyph) END , 1))) - ASCII('a') AS AsciiKpOffset, ASCII(SUBSTRING (@cleartext, N, 1)) AS AsciiCtFROM tally)SELECT * FROM lettersWHERE ASCIIct IS NOT NULL
Which gives us this result set:
Column by column, we start with N, which just shows which position which character is in. Then, we have ct, which is our clear text, split letter by letter. The kp column is our key phrase split letter by letter. The next column is the same, save that we use a mod operation for the SUBSTRING() start position rather than N, so that the passphrase repeats again after we reach the end.
AsciiKpOffset is just that repeated key phrase, transformed to lower case, converted from ASCII, and then subtracted from an ASCII ‘a’. This gives us the number of characters to shift based on our key phrase. Now that we’ve calculated this, the rest of our implementation is almost trivial.
The final column is just the ASCII number of our clear text letter.
- The CTE named logic is just my letter-shifting logic as it was described in the previous post. My CASE statement has two WHENs for upper case letters and two for lower case. In each group, one case handles when we have to subtract 26 because our shift ran us off the end of the alphabet, and the other case handles the shift without the subtract.
- Finally, I use the well-documented pseudo-hack of using FOR XML PATH to put all of the cells for all of my rows back into a single record.
The code to decrypt is almost the same. We just subtract the offset instead of add. And this short paragraph is my way of telling you that I didn’t bother to work out that bit of code for myself.
You could easily wrap this into an inline table-valued function. Please don’t use a multi-line table valued function for this, or I’ll have to get back to my earlier rant!
I’ve added this to the GitHub too.