Monday 19 April 2010

Two SQL tips

Here's something that I've overlooked but naturally managed to get right (most of the time) when writing SQL commands - the order of the statements. This is how it should be done:

select ...
    from ...
        where ...
            group by ...
                having ...
                    order by ...;

It might seem obvious but it is important to remember it. Here are two sayings that make it easy to remember the order:

Some French Workers Glue Hardwood Often.
San Francisco, Where The Greatful Dead Heads Originated.
 
In case you're wondering, I didn't come up with those.
 
Another tip with SQL, is that when you use UNION, the statements coming after the WHERE clause must come after the last SQL SELECT and WHERE statement. This is how to write it properly:
 
SELECT id, name, password, 'Student' AS position FROM students
WHERE username LIKE '%ar%'
UNION
SELECT id, name, password, 'Staff' AS position FROM staff
WHERE username LIKE '%ar%' 
ORDER BY position;
 
This makes the whole dataset ordered by position. If you try to order individual SELECTs it will throw an error.

Wednesday 14 April 2010

Some SAS Snippets

Don't show the current date on reports:

options nodate;

To load/import SAS datasets (sas7bdat files) you tell SAS about the library (folder) which they're in:

libname Morty "Z:\path\to\Dummy Data\";

This will create a library icon that will show up on the left in the library explorer. I want to copy the files to my work folder using more manageable names:

data Dec09;
    set Morty.cybdec09;
run;
data Jan10;
    set Morty.cybjan10;
run;
data Feb10;
    set Morty.cybfeb10;
run;

Next, I plan to merge the datasets into a single table by a mutual variable (column) name, Account_no, but I need to sort the datasets by that before merging.

proc sort data=Dec09;
    by Account_no;
run;
proc sort data=Jan10;
    by Account_no;
run;
proc sort data=Feb10;
    by Account_no;
run;


data allmort;
    merge Dec09 Jan10 Feb10;
    by Account_no;
run;

What if I want to insert Datalines as CSV (delimited by commas) by hand instead of the horrible default manual space-padded SAS format:

data trydel;
    input usrname $ finger_count;
    infile datalines delimiter=",";
    datalines;
Adam, 10
Bob, 11
Chris, 9
Dan, 10
Eddy, 12
Fred, 10
Gary, 7
Hugh, 5
Ian, 10
Joe, 10
Kev, 10
Lee, 10
Mike, 9
Norm, 10
Otis, 8
Pat, 9
;
run;
 
Sort that data by a variable (column):
 
proc sort data=trydel;
    title "Everybody by finger count";
    by DESCENDING finger_count DESCENDING usrname;
run;

SAS proc sort-by arguments are not seperated with commas and you should type DESCENDING in full and before the variable (column) name, unlike SQL where DESC and ASC are sufficient as acronyms and it comes after the variable name. Also the SAS keyword ASCENDING doesn't exist! It sorts ascending by default.

Make a standard report without SQL:

proc print data=trydel noobs;
    title "More than 8 Fingers";
    where finger_count > 8;
run;

"Noobs" (NO OBServation count) makes sure that the SAS report doesn't show a row count next to the data.

Now use SQL to get the data you need:

proc sql;
    title "SQL Where Names gt Bob";
    SELECT usrname, finger_count format=z2.
    FROM Work.trydel WHERE usrname > "Bob"
    ORDER BY finger_count DESC, usrname DESC;
quit;

Note that the library name Work (or wherever your dataset is), has to be explicitly named before the dataset (table) name, like so: Work.dataset. (Library.Table).
Also note the "format=z2." part. It formats the number with leading zeros, so instead of showing 7, 9, 11, 10, it will show 07, 09, 11, 10. Of course, you need to find out how many digits to set, so if you wanted to show more leading zeros, put a larger number after z.

Another thing that is different is the string length checking SQL function, often named STRLEN in other diallects. In SAS it's just "LENGTH". Usage: SELECT * FROM table WHERE LENGTH(usrname) > 2.

Other useful Proc SQL functions include IS NULL / IS NOT NULL and IS MISSING / IS NOT MISSING.

This is a good resource: http://en.wikiversity.org/wiki/Data_Analysis_using_the_SAS_Language

Monday 5 April 2010

I have a job now!

I am employed now! After what feels like forever, about a year and a half of searching! I'm well pleased. It's at the Leeds Yorkshire Bank head offices. Thankfully, they had faith in this graduate and gave him a chance - more organizations aught to do that with graduates.