Coding Vigenere Cyphers

Recently, I rewatched The Prestige, the very good Christopher Nolan move where Hugh Jackman and Christian Bale play rival magicians in the late 1800s. Jackman’s character has an old journal, written in cypher text that he slowly and meticulously needs to work out over the course of the movie.

I’ve always been fascinated by encryption, and I remembered that he used a Vigenere Cypher, but I had to look back up how those work.

Ceasar Cypher

Imagine a cypher where you just switch each number an agreed-upon number of letters up the alphabet. If you’re number is 3, then you would substitute D for A, E for B, etc. X would loop around and become A, Y would be B, etc.

This is a Ceasar Cypher, reportedly used by Julius himself, but I bet it didn’t take too long to break that code. If you think about it, the cypher that millions solve in the daily paper is an order of magnitude more difficult.

Enter Vigenere

To make it more difficult, you can shift each letter, but by a different amount for each position. So the first letter might shift by 3 characters, the second letter by 19, the third by 8. Instead of a shared number, you use a key phrase. We do our shift based on each letter’s position in the alphabet, using a = 0, b = 1, etc.

For example, if our key phrase is “ahoy”, then we shift the first number by 0 characters (a = 0), our second character by 7 characters (h = 7), our third character by 14 (o = 14), and our fourth character by 24 (y = 24). After that, we start back with a and shift by zero.

You can see how tedious this would be to decipher by hand, especially with a very long key phrase. And even today, if you avoided dictionary words and had a keyphrase that was very long, I think this would be difficult to decipher.

I thought it would be fun to code this algorithm. When I made that decision, I was on a bit of a Swift kick, so I did it in Swift. If you want to try, you can copy and paste from here, or use Github links that I’ll provide at the end. (I also made an implementation in javascript, so if Swift isn’t your thing, you can jump down just a bit.)

Swift Implementation

Here’s a class with all of the cypher/decipher logic:

And here’s an extension to String that lets you use it on any string:

Then to use it, you just need to apply the new methods to your string:

Javascript Implementation

Once my Swift project was old enough to be almost completely out of mind, I had a client looking for a relatively secure way to share an account password. I wrote up a document explaining how to get and use a PGP certificate key pair, but process is still just too technical. A day or so later, I thought about this algorithm, and figured I could do the same thing using javascript on a web form.

, I also decided that I needed a web form that does the same thing. That meant writing my same functions in javascript. I use an object to keep all of the functions and the keyword together in a namespace.

Then to use it, just set the key path, and call either transformString() or untransformString().

Again, here’s the page where I implemented this code.

Security
This isn’t going to slow down the NSA one bit. If your keyphrase is really short, it’s almost not secure at all. You can improve your security by using only one case (or mixing cases at random), removing spaces and punctation, using very long key phrases, and/or avoiding dictionary words in your key phrases.

It’s probably good enough for sharing a passcode with a client, keeping your kids’ Christmas list secret, or making illicit plans with a date. (I’m talking to young people hiding their plans from parents here. Please don’t use my code to sneak around on your spouse, you creeps!)

Download source code from GitHub.

Let me know if you use any of this for your own fun little project!

Posted in Javascript, Swift | Tagged , , , | Leave a comment

SQL For Fun: Parsing the iTunes XML

What happens if one part of your personality is data-parsing nerd and another part is music lover nerd?

You end up trying to answer questions like, “Which songs did I have rated with 4 stars last year that are lower than that now?”

Or, “How many songs that were added to my library before this year started have been played this year?”

iTunes makes it easy to create smart playlists that can answer a lot of these questions. But iTunes keeps your current play count and your current rating. You can’t get information about the state of your music library in the past and then run queries to see how things have changed.

It does, however, let you export a snapshot of your library data to an XML format, and if we save that XML with time stamps in a SQL database, we can answer these questions ourselves.
Continue reading

Posted in Apple, iTunes, MS SQL, Transact SQL | Leave a comment

Using Tally Table to Remove Invalid XML Characters

A couple of weeks ago, I was called in to troubleshoot an error occurring in a SQL stored procedure. The procedure was selecting a variety of information out as XML data. The error occurred because on of the columns being included was a text data column, which included raw text notes brought in to the system from a variety of sources. Some of these sources included characters that are invalid in XML.

Thinking this was a common issue, I went to google. I found the common solution was something like this:

This code is fine. It works well, and I like using the PATINDEX function to evaluate a string of characters against a REGEX expression. (I have a newly discovered appreciation for REGEX. I should post more about that…)

How Do We Improve This?

One thing sat poorly with me about this solution, though. I don’t like the WHILE loop.

Transact-SQL is built to work on big sets of data all at once. When you use a WHILE loop, you’re forcing the database to work on one piece of information at a time.

Over the last several years, I’ve convinced myself that any data manipulation that your programmer brain wants to do with a WHILE loop can be done more quickly using a Tally table. If you aren’t using Tally tables regularly, go ahead and read from this link. No single article has improved my skill set more than this.

So, I wrote an equivalent function to remove my XML characters.

What Are We Doing, Exactly?

The Common Table Expressions t1, t2, t3, t4, t5, and cteTally just build a Tally table — a result set that spits out numbers in order. In this case, I’ve cross-joined enough rows to return numbers from 1 to 4,294,967,296 which is ridiculous overkill.

The CTE named cteEachChar splits our text into a list of individual characters.

In this subquery, N is the column representing the number from cteTally. The column returned as Ch, then, is the Nth character of our test string. You end up with one character on each row if you select directly out of this CTE.

The WHERE clause uses the LEN function so that there’s no need for the server to evaluate rows up to 4,294,967,296. I would think it helps performance when there’s a short string, but I didn’t test it to see for sure. The more important term in the WHERE clause is our same PATINDEX, which removes those rows where the characters are invalid for XML. Mission complete!

Well, almost complete. We still have to merge our single-character rows back into one big line of characters.

That brings us to our last CTE. Here, we use a trick for combining information from different rows into a single result using the FOR XML construct.

I wish I knew who to whom to credit this construct. I’ve been using it for a long time to combine different rows’ information into a delimited list. (SELECT ‘, ‘ + Ch [text()]), then use SUBSTRING to get rid of the first comma and space.

My original function puts it all back together again, but without my delimiter, it changes my spaces to “ ”. The final select replaces those with the actual space that I want.

No real reason that I specified the column name next to the name of the CTE in this case rather than in the select statement itself. I normally like having column names in the SELECT, because I find that a little easier to read. But I can be flexible with it. 🙃

How Does This Really Perform?

In the data set where I hit this problem, the old script and my new made no discernible performance difference. They were both more or less instant. But if I bundle this up as a function, someone somewhere is going to apply it to millions of rows, each potentially having millions of characters of data. So let’s test with something big.

I created a test string, made up of the sentence, “The quick brown fox jumped over the lazy dogs.” I dropped 800 invalid XML characters into the middle of this, and then replicated that big character string a million times. Then, I added a variable for start time, and a DATEDIFF function at the end so we can see how long it took to run. Here are the final queries and results from my underpowered development machine.

(As with all SQL performance tests, much of this is dependent on hardware and on what else our servers are doing, so we can compare two results, but anyone’s specific result on a specific machine may vary widely.)

The results on my underpowered development box:

Now, the full query and the results on the same test data using the Tally Table method:

Repeated tests showed execution times in the same general area.

It seems like a lot of work to knock out 500 milliseconds, but I look at it as reducing the time by two orders of magnitude. Perhaps someday, this method reduces someone’s job from running into 100 hours down to running in one hour.

Posted in MS SQL, REGEX, Transact SQL | Tagged , , , , | Leave a comment

Using SQL to Mege Three Incomplete Data Sets

You can join data sets. LEFT JOIN, RIGHT JOIN, even OUTER APPLY. There’s nothing to learn here.

Last week, I had what seemed to be a simple join of three data sets. My permutation was a lot more complicated than this, but I had to boil it down to this example in order to think it through.

What makes it tricky is that none of the three sets of data have ALL of the keys that we need in order to represent our full, inclusive result set.

Let’s look at an example.

Table 1:

Key Animal
A aardvark
B bird
C cat

Table 2:

Key Fruit
B banana
A apple
P pear

Table 3:

Key Activity
A act
C climb
P push
R run

And then we’re looking for a result like this:

Key Animal Fruit Action
A aardvark apple act
B bird banana NULL
C cat NULL climb
P NULL pear push
R NULL NULL run

Let’s create our data and figure it out.

We’re going to need a full outer join between the three tables, so let’s start with that.

The key that makes this tricky is in that 2nd outer join. Because our key column might or might not be in either of our first two tables, we have to tell the server to try the join to t1, but also tell it to look at t2 if the key isn’t there in t1.

If we were to go to four tables, we would have to catch all of the permutations in order to be sure our join would work correctly. It would quickly get difficult to read, and I suspect, difficult for the server to perform efficiently.

We can use the COALESCE operator to simplify.

With this syntax, we can just keep adding columns on there as we add tables. It’s not exactly telling server to do the same thing. It’s saying that if t1.col1 matches, join there. Otherwise, move along to t2.col1. But since our keys are common in these sets, the result is exactly what we want.

Speaking of results, we’re not quit there yet.

act

col1 animal col1 fruit col1 activity
A aardvark A apple A
B bird B banana NULL NULL
C cat NULL NULL C climb
NULL NULL P pear P push
NULL NULL NULL NULL R run

Let’s use COALESCE again and put this one to bed.

Bingo!

Key Animal Fruit Action
A aardvark apple act
B bird banana NULL
C cat NULL climb
P NULL pear push
R NULL NULL run
Posted in MS SQL, Transact SQL | Tagged , , , , , , , | Leave a comment

Find DTS and FTP Job Steps From msdb

I started a new full-time contract this month, which is always weird. Not having a full understanding of the project, access rights roadblocks, and unfamiliarity with the code base all team up to make days feel long and unproductive.

One of the things I needed to do is to find where data is regularly getting imported into a database. I came up with this query on msdb to show me scheduled tasks that run FTP commands or DTS packages and had been run during this calendar year.

[/crayon]
 
The items in the where clause are pretty much self-explainitory and can be changed to meet the need of any SQL detective. It’s one I’ll keep in my toolbox for a while, I suspect.

Aside: what’s with the 1 = 1 at the top of the WHERE clause?

If I’m doing a lot of commenting and uncommenting of conditions, I like to just start my WHEREs with 1 = 1. That way, I don’t have to worry about whether or not I need to include the AND when I comment something. I can just start dropping “–” at the start of each line and not think about it any more than that.

Posted in MS SQL | Tagged , , | Leave a comment

Logical Tests Using OR in Crystal Reports

Consider these two formulas in Crystal Reports:

[/crayon]

[/crayon]

One would think that these two should behave the same, right?

In fact, the results are different if Table.Value is null.

In the first case, “if isnull({Table.Value}) gets evaluated first. Since that’s true, and the other condition is connected with an OR, the system (apparently) doesn’t evaluate the other condition. You get “Result 1”, which is what I would want and expect.

In the second case, tonumber({Table.Value}) gets evaluated first. The tonumber() function on a null value returns neither true nor false. That makes sense, I guess, but you can’t evaluate a null in an OR and get a true or false either. So for this formula, neither result is presented.

Admittedly, I hit this using an embarassingly old version of Crystal Reports. I’ll have to test to see if the same thing happens on something more modern.

But for now, the lesson is to put isnull() parts at the beginning of logical tests.

Posted in Uncategorized | Tagged , , , , , , , , , | Leave a comment

Update Across Joined Tables in MySQL

I had to do a MySQL update on records that were retrieved from a two-table join today. I shouldn’t have been surprised to find that I needed a different syntax than I know from MS SQL, where I would write,

[/crayon]

I always thought this was a goofy syntax, but I’ve gotten used to it. If it isn’t clear, I’m finding all of the records in fTime that belong to a particular record in parent table fProject with the conditions that the WorkDate is before a particular date and the ExportStatus is 0. Those are just columns in fTime. They aren’t particularly important to this discussion.

To do the same thing in MySql, I have to write,

[/crayon]

Now that I look at it, the MySql syntax is a lot more intuitive to me. Too bad I’ll never remember it, since I use MySql so much less than I use MS SQL.

Of course, with MySql, I have to be really specific with my date format too. But that’s another topic for another day.

Posted in MS SQL, mysql, Transact SQL | Tagged , , , , | Leave a comment

Take Control of Mac File Paths

Since I added a Mac to my repertoire of workstations, I’m getting better and better about finding quick and easy ways to get around. Sooner or later, I’ll be just as efficient on my Mac as I am on my PCs.

One thing that I do to save a lot of time on the PC is to copy paths from the Explorer address bar, and paste them in on other places. I didn’t have a quick way to do anything similar on my Mac.

Today, I created a Service Workflow to help me out. (I’m not 100% sure about what makes something a workflow service yet, but this worked.)

From Automater, select New, and then select Service. Then, just make your settings like this:

You’ll probably need to click on the image to see it full-sized.

Then, chose Save, and name it something like “Copy Path to Clipboard.”

After that, you can right-click any file or folder, move the arrow down to the Services menu, and “Copy Path to Clipboard” is right there for you.

To do the reverse — open a box where you can type a known path — just use Command + Shift + G. I’m not nuts about having to memorize keyboard shortcuts, but I’m starting to get them, little by little.

Command + Shift + G even works in an “Open File” dialog in any app that you happen to be using. You don’t have to actually be in Finder itself.

Posted in Apple, OS X | Tagged , , , , , | Leave a comment

Easy Foreign Key Drop and Restore for SQL Server

From time to time, we’ve all had a database query to run that was stymied by foreign key constraints.

In my case, it was a request to restore a table’s values from one copy of the system to another. We knew that after the information was refreshed, there wouldn’t be any key violations. But SQL server was coded not to trust us, so I couldn’t even run the truncate statement to get rid of the original data.

Of course you want the foreign keys to exist. It’s the best way to prevent buggy code from creating situations where rows that should point to a parent point to nothing at all. If you have an Orders table, and that table has a column with an ID number that points to a Customers table, you can’t let your software delete the Customers record, because then you have Orders for nobody.

But once in a while, the king of the database needs to override the rules in order to get something done.

Unfortunately, there’s no “super delete” function that lets you bypass the constraint.

I decided to write a stored procedure that would save the relevant information about my constraints, drop them, and restore them when I was ready to have them back. I leaned heavily on the work posted by Sushant Saha on DatabaseJournal.com.

I ended up with a stored procedure with usage like this:

[/crayon]

The first parameter accepts any number of table names. I’ll warn you now that the way I wrote this means that it won’t work for table names that has spaces in them. (No one does that though, do they?)

The second parameter is ‘D’ to drop constraints or ‘R’ to restore them. That way, I can run the procedure to drop them, do my other work, then flip the D to an R, and run it again to restore.

I can make it this easy because the procedure also leaves behind a table named dbo.FKeys, which displays an easily readable representation of the foreign key contstratints in the database. Columns show the name of the constraint, names of primary and foreign tables, and names of primary and foreign columns.

The final parameter can be flipped to a 1 to display the ALTER TABLE statements that will be applied rather than actually applying them. You would be wise to run it like this before letting it fly from scratch.

Running this with a 1 in the last parameter will also update FKeys, so if you’ve applied an update that may have added foreign key constraints, you can refresh the user-friendly readout of them without chaning anything in your database structure by running it with a 1 at the end.

I took this exercise out to four columns, becase I was sure that none of my primary keys need more than that. It’s easily adapted to more, though, if you have such needs.

The code for the procedure is below.

[/crayon]

One would be well advised to be cautious playing with this stuff from scripts copied from other people’s websites, but I hope its either educational or helpful to someone, somewhere down the road.

Posted in Database Design, MS SQL, Transact SQL | Tagged , , , , , , | Leave a comment

Assigning Variables When No Row Exists

A SQL Server tidbit that I never knew before today. This was a test on SQL 2000. I’d be interest to know if you get the same result on something more current.

[Update: still just like this on SQL 2008]

What do you suppose this returns?

[/crayon]
A – NULL
B – asfdfs
C – an error message
D – OrderName

Correct Answer: B.

I would have expected A. It turns out that I had a variable assignment in a loop, and from time to time, it wasn’t getting assigned a new value. The problem was on rows where the select statement returns no rows. In that case, my varible still had the value from its previous iteration.

Posted in MS SQL, Transact SQL | Leave a comment