SQL For Fun: Parsing the iTunes XML

What happens if one part of your personality is data-parsing nerd and another part is music lover nerd?

You end up trying to answer questions like, “Which songs did I have rated with 4 stars last year that are lower than that now?”

Or, “How many songs that were added to my library before this year started have been played this year?”

iTunes makes it easy to create smart playlists that can answer a lot of these questions. But iTunes keeps your current play count and your current rating. You can’t get information about the state of your music library in the past and then run queries to see how things have changed.

It does, however, let you export a snapshot of your library data to an XML format, and if we save that XML with time stamps in a SQL database, we can answer these questions ourselves.
Continue reading

Posted in Apple, iTunes, MS SQL, Transact SQL | Leave a comment

Using Tally Table to Remove Invalid XML Characters

A couple of weeks ago, I was called in to troubleshoot an error occurring in a SQL stored procedure. The procedure was selecting a variety of information out as XML data. The error occurred because on of the columns being included was a text data column, which included raw text notes brought in to the system from a variety of sources. Some of these sources included characters that are invalid in XML.

Thinking this was a common issue, I went to google. I found the common solution was something like this:

This code is fine. It works well, and I like using the PATINDEX function to evaluate a string of characters against a REGEX expression. (I have a newly discovered appreciation for REGEX. I should post more about that…)

How Do We Improve This?

One thing sat poorly with me about this solution, though. I don’t like the WHILE loop.

Transact-SQL is built to work on big sets of data all at once. When you use a WHILE loop, you’re forcing the database to work on one piece of information at a time.

Over the last several years, I’ve convinced myself that any data manipulation that your programmer brain wants to do with a WHILE loop can be done more quickly using a Tally table. If you aren’t using Tally tables regularly, go ahead and read from this link. No single article has improved my skill set more than this.

So, I wrote an equivalent function to remove my XML characters.

What Are We Doing, Exactly?

The Common Table Expressions t1, t2, t3, t4, t5, and cteTally just build a Tally table — a result set that spits out numbers in order. In this case, I’ve cross-joined enough rows to return numbers from 1 to 4,294,967,296 which is ridiculous overkill.

The CTE named cteEachChar splits our text into a list of individual characters.

In this subquery, N is the column representing the number from cteTally. The column returned as Ch, then, is the Nth character of our test string. You end up with one character on each row if you select directly out of this CTE.

The WHERE clause uses the LEN function so that there’s no need for the server to evaluate rows up to 4,294,967,296. I would think it helps performance when there’s a short string, but I didn’t test it to see for sure. The more important term in the WHERE clause is our same PATINDEX, which removes those rows where the characters are invalid for XML. Mission complete!

Well, almost complete. We still have to merge our single-character rows back into one big line of characters.

That brings us to our last CTE. Here, we use a trick for combining information from different rows into a single result using the FOR XML construct.

I wish I knew who to whom to credit this construct. I’ve been using it for a long time to combine different rows’ information into a delimited list. (SELECT ‘, ‘ + Ch [text()]), then use SUBSTRING to get rid of the first comma and space.

My original function puts it all back together again, but without my delimiter, it changes my spaces to “ ”. The final select replaces those with the actual space that I want.

No real reason that I specified the column name next to the name of the CTE in this case rather than in the select statement itself. I normally like having column names in the SELECT, because I find that a little easier to read. But I can be flexible with it. 🙃

How Does This Really Perform?

In the data set where I hit this problem, the old script and my new made no discernible performance difference. They were both more or less instant. But if I bundle this up as a function, someone somewhere is going to apply it to millions of rows, each potentially having millions of characters of data. So let’s test with something big.

I created a test string, made up of the sentence, “The quick brown fox jumped over the lazy dogs.” I dropped 800 invalid XML characters into the middle of this, and then replicated that big character string a million times. Then, I added a variable for start time, and a DATEDIFF function at the end so we can see how long it took to run. Here are the final queries and results from my underpowered development machine.

(As with all SQL performance tests, much of this is dependent on hardware and on what else our servers are doing, so we can compare two results, but anyone’s specific result on a specific machine may vary widely.)

The results on my underpowered development box:

Now, the full query and the results on the same test data using the Tally Table method:

Repeated tests showed execution times in the same general area.

It seems like a lot of work to knock out 500 milliseconds, but I look at it as reducing the time by two orders of magnitude. Perhaps someday, this method reduces someone’s job from running into 100 hours down to running in one hour.

Posted in MS SQL, REGEX, Transact SQL | Tagged , , , , | Leave a comment

Using SQL to Mege Three Incomplete Data Sets

You can join data sets. LEFT JOIN, RIGHT JOIN, even OUTER APPLY. There’s nothing to learn here.

Last week, I had what seemed to be a simple join of three data sets. My permutation was a lot more complicated than this, but I had to boil it down to this example in order to think it through.

What makes it tricky is that none of the three sets of data have ALL of the keys that we need in order to represent our full, inclusive result set.

Let’s look at an example.

Table 1:

Key Animal
A aardvark
B bird
C cat

Table 2:

Key Fruit
B banana
A apple
P pear

Table 3:

Key Activity
A act
C climb
P push
R run

And then we’re looking for a result like this:

Key Animal Fruit Action
A aardvark apple act
B bird banana NULL
C cat NULL climb
P NULL pear push
R NULL NULL run

Let’s create our data and figure it out.

We’re going to need a full outer join between the three tables, so let’s start with that.

The key that makes this tricky is in that 2nd outer join. Because our key column might or might not be in either of our first two tables, we have to tell the server to try the join to t1, but also tell it to look at t2 if the key isn’t there in t1.

If we were to go to four tables, we would have to catch all of the permutations in order to be sure our join would work correctly. It would quickly get difficult to read, and I suspect, difficult for the server to perform efficiently.

We can use the COALESCE operator to simplify.

With this syntax, we can just keep adding columns on there as we add tables. It’s not exactly telling server to do the same thing. It’s saying that if t1.col1 matches, join there. Otherwise, move along to t2.col1. But since our keys are common in these sets, the result is exactly what we want.

Speaking of results, we’re not quit there yet.

act

col1 animal col1 fruit col1 activity
A aardvark A apple A
B bird B banana NULL NULL
C cat NULL NULL C climb
NULL NULL P pear P push
NULL NULL NULL NULL R run

Let’s use COALESCE again and put this one to bed.

Bingo!

Key Animal Fruit Action
A aardvark apple act
B bird banana NULL
C cat NULL climb
P NULL pear push
R NULL NULL run
Posted in MS SQL, Transact SQL | Tagged , , , , , , , | Leave a comment

Find DTS and FTP Job Steps From msdb

I started a new full-time contract this month, which is always weird. Not having a full understanding of the project, access rights roadblocks, and unfamiliarity with the code base all team up to make days feel long and unproductive.

One of the things I needed to do is to find where data is regularly getting imported into a database. I came up with this query on msdb to show me scheduled tasks that run FTP commands or DTS packages and had been run during this calendar year.

[/crayon]
 
The items in the where clause are pretty much self-explainitory and can be changed to meet the need of any SQL detective. It’s one I’ll keep in my toolbox for a while, I suspect.

Aside: what’s with the 1 = 1 at the top of the WHERE clause?

If I’m doing a lot of commenting and uncommenting of conditions, I like to just start my WHEREs with 1 = 1. That way, I don’t have to worry about whether or not I need to include the AND when I comment something. I can just start dropping “–” at the start of each line and not think about it any more than that.

Posted in MS SQL | Tagged , , | Leave a comment

Logical Tests Using OR in Crystal Reports

Consider these two formulas in Crystal Reports:

[/crayon]

[/crayon]

One would think that these two should behave the same, right?

In fact, the results are different if Table.Value is null.

In the first case, “if isnull({Table.Value}) gets evaluated first. Since that’s true, and the other condition is connected with an OR, the system (apparently) doesn’t evaluate the other condition. You get “Result 1”, which is what I would want and expect.

In the second case, tonumber({Table.Value}) gets evaluated first. The tonumber() function on a null value returns neither true nor false. That makes sense, I guess, but you can’t evaluate a null in an OR and get a true or false either. So for this formula, neither result is presented.

Admittedly, I hit this using an embarassingly old version of Crystal Reports. I’ll have to test to see if the same thing happens on something more modern.

But for now, the lesson is to put isnull() parts at the beginning of logical tests.

Posted in Uncategorized | Tagged , , , , , , , , , | Leave a comment

Update Across Joined Tables in MySQL

I had to do a MySQL update on records that were retrieved from a two-table join today. I shouldn’t have been surprised to find that I needed a different syntax than I know from MS SQL, where I would write,

[/crayon]

I always thought this was a goofy syntax, but I’ve gotten used to it. If it isn’t clear, I’m finding all of the records in fTime that belong to a particular record in parent table fProject with the conditions that the WorkDate is before a particular date and the ExportStatus is 0. Those are just columns in fTime. They aren’t particularly important to this discussion.

To do the same thing in MySql, I have to write,

[/crayon]

Now that I look at it, the MySql syntax is a lot more intuitive to me. Too bad I’ll never remember it, since I use MySql so much less than I use MS SQL.

Of course, with MySql, I have to be really specific with my date format too. But that’s another topic for another day.

Posted in MS SQL, mysql, Transact SQL | Tagged , , , , | Leave a comment

Take Control of Mac File Paths

Since I added a Mac to my repertoire of workstations, I’m getting better and better about finding quick and easy ways to get around. Sooner or later, I’ll be just as efficient on my Mac as I am on my PCs.

One thing that I do to save a lot of time on the PC is to copy paths from the Explorer address bar, and paste them in on other places. I didn’t have a quick way to do anything similar on my Mac.

Today, I created a Service Workflow to help me out. (I’m not 100% sure about what makes something a workflow service yet, but this worked.)

From Automater, select New, and then select Service. Then, just make your settings like this:

You’ll probably need to click on the image to see it full-sized.

Then, chose Save, and name it something like “Copy Path to Clipboard.”

After that, you can right-click any file or folder, move the arrow down to the Services menu, and “Copy Path to Clipboard” is right there for you.

To do the reverse — open a box where you can type a known path — just use Command + Shift + G. I’m not nuts about having to memorize keyboard shortcuts, but I’m starting to get them, little by little.

Command + Shift + G even works in an “Open File” dialog in any app that you happen to be using. You don’t have to actually be in Finder itself.

Posted in Apple, OS X | Tagged , , , , , | Leave a comment

Easy Foreign Key Drop and Restore for SQL Server

From time to time, we’ve all had a database query to run that was stymied by foreign key constraints.

In my case, it was a request to restore a table’s values from one copy of the system to another. We knew that after the information was refreshed, there wouldn’t be any key violations. But SQL server was coded not to trust us, so I couldn’t even run the truncate statement to get rid of the original data.

Of course you want the foreign keys to exist. It’s the best way to prevent buggy code from creating situations where rows that should point to a parent point to nothing at all. If you have an Orders table, and that table has a column with an ID number that points to a Customers table, you can’t let your software delete the Customers record, because then you have Orders for nobody.

But once in a while, the king of the database needs to override the rules in order to get something done.

Unfortunately, there’s no “super delete” function that lets you bypass the constraint.

I decided to write a stored procedure that would save the relevant information about my constraints, drop them, and restore them when I was ready to have them back. I leaned heavily on the work posted by Sushant Saha on DatabaseJournal.com.

I ended up with a stored procedure with usage like this:

[/crayon]

The first parameter accepts any number of table names. I’ll warn you now that the way I wrote this means that it won’t work for table names that has spaces in them. (No one does that though, do they?)

The second parameter is ‘D’ to drop constraints or ‘R’ to restore them. That way, I can run the procedure to drop them, do my other work, then flip the D to an R, and run it again to restore.

I can make it this easy because the procedure also leaves behind a table named dbo.FKeys, which displays an easily readable representation of the foreign key contstratints in the database. Columns show the name of the constraint, names of primary and foreign tables, and names of primary and foreign columns.

The final parameter can be flipped to a 1 to display the ALTER TABLE statements that will be applied rather than actually applying them. You would be wise to run it like this before letting it fly from scratch.

Running this with a 1 in the last parameter will also update FKeys, so if you’ve applied an update that may have added foreign key constraints, you can refresh the user-friendly readout of them without chaning anything in your database structure by running it with a 1 at the end.

I took this exercise out to four columns, becase I was sure that none of my primary keys need more than that. It’s easily adapted to more, though, if you have such needs.

The code for the procedure is below.

[/crayon]

One would be well advised to be cautious playing with this stuff from scripts copied from other people’s websites, but I hope its either educational or helpful to someone, somewhere down the road.

Posted in Database Design, MS SQL, Transact SQL | Tagged , , , , , , | Leave a comment

Assigning Variables When No Row Exists

A SQL Server tidbit that I never knew before today. This was a test on SQL 2000. I’d be interest to know if you get the same result on something more current.

[Update: still just like this on SQL 2008]

What do you suppose this returns?

[/crayon]
A – NULL
B – asfdfs
C – an error message
D – OrderName

Correct Answer: B.

I would have expected A. It turns out that I had a variable assignment in a loop, and from time to time, it wasn’t getting assigned a new value. The problem was on rows where the select statement returns no rows. In that case, my varible still had the value from its previous iteration.

Posted in MS SQL, Transact SQL | Leave a comment

Tracing MySql Transactions

Anyone who has had to troubleshoot or support systems built on a SQL Server database knows and loves the Profiler Tool, which allows you to see all of the queries submitted to a database.

Web page database interaction tends to be simpler, and until recently, I never had to troubleshoot a situation that was so complex that I needed to trace its database interactions. This week, though, I was looking at a function to save some records. I had a print_r at the top to prove to myself that the records were there, and I saw other records in the list in the database after the run, but my last ones were missing.

It had to be that the records were being deleted behind the insert. I needed a trace to figure out what was coming behind me to delete perfectly good records.

It turns out to be pretty easy. First, I edited the configuration file for mysql, which I found in /etc/mysql/my.cnf. (All of my work was on an Ubuntu server running both Apache and MySql. Needless to say, there are other environments where paths and file names are different.)

I found this section of the config file:

[/crayon]

I changed the general_log value from 0 to 1, and restarted mysql.

[/crayon]

I then ran my page, and opened the log file named in the configuration: /var/log/mysql/mysql.log.

[/crayon]

Posted in Linix/Apache, mysql, php, Web Apps | Tagged , , , , , , , , | Leave a comment