Wednesday, 29 September 2010

Difficult SAS issue - restructuring datasets with proc transpose

I've been trying to solve this problem for a few hours and have tried many different things. In the end, it was two sets of a process that solved the problem with proc transpose. It's worth documenting so I'm blogging it.

The Problem:

I have an example dataset that contains observations like so:


(dataset: dayobs)
days_range,    value,    count
0-9,    300,    6
60-69,    250,    4
300-309,    76,    1

I wanted it to show all the ranges with value and count set to zero if it wasn't already set:

days_range,    value,    count
0-9,    300,    6
10-19,     0,     0
20-29,     0,     0
30-39,     0,     0
40-49,     0,     0
50-59,     0,     0
60-69,    250,    4
... etc ... 
290-299,    0,    0
300-309,    76,    1

That was easy, simply make a dataset that lists the day_range from 1-10 ... 390-400.

data ranges;
    input days_range $;
    format days_range $7.;
    datalines;
0-9
10-19
... etc ...
    ;
run;

and then merge both the datasets - dayobs and ranges - to create a new dataset called daysrange:

data daysrange;
    merge ranges dayobs;
    by days_range;
run;


OK. Now, I wanted to transpose it into a single row so I can insert that into a big dataset that logs the changes every day. I want to make it look like so:

v1_9, c1_9, v10_19, c10_19, v20_29, c20_29, ...etc... v60_69, c60_69, ...etc
300, 6, 0, 0, 0, 0, ...etc... 250, 4, ...etc

It might seem obvious what to do but I tried to be clever and did this:

data rowifieddaysrange;
    set daysrange;


    if days_range='0-9' then do;
        c1_9=count;
        v1_9=value;
    end;
    else if days_range='10-19' then do;
        c10_19=count;
        v10_19=value;
    end;
 etc...


keep c1_9 v1_9 c10_19 v10_19 etc...;
run;

and this is what the data looked a bit like in the end:

0, 0, 0, 0, 0, 300, 0, 0, 0, 0, 0, 0, 6
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
250, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0
etc.

Basically, it had the correct columns, but they were spaced out across different rows. No matter what I tried - I look at all the different sas procs and their options - I was unable to collapse/compact it into one row... if anyone knows the solution to this problem I'd really like to see it please. Ta.

OK. So guess what the solution is... It turns out you need to use two sets of proc transpose and a couple of merges, to deal with each variable individually - then finally merging the two separate datasets together at the end! :


*transpose the daysrange dataset first by count;
proc transpose data=daysrange out=tdaysrange(drop=_:) prefix=c;
    id days_range;
    var count;
run;
*now transpose the table with all the categories;
proc transpose data=ranges out=tranges(drop=_:) prefix=c;

    id days_range;
run;
*now merge them;
data cdataset;
    merge tranges tdaysrange;
run;


*transpose the daysrange dataset next by value;
proc transpose data=daysrange out=vdaysrange(drop=_:) prefix=v;
    id days_range;
   var value;
run;
*now transpose the table with all the categories;
proc transpose data=ranges out=vranges(drop=_:) prefix=v;
    id days_range;
run;
*now merge them;
data vdataset;
    merge vranges vdaysrange;
run;


*now join the the value and count datasets into one dataset!
data rowifieddaysrange;
    merge vdataset cdataset;
run;

And finally it's all in one row that looks something like this:


v1_9, c1_9, v10_19, c10_19, v20_29, c20_29, ...etc... v60_69, c60_69, ...etc
300, 6, 0, 0, 0, 0, ...etc... 250, 4, ...etc

That's a lot of work for something that aught to be straightforward!

No comments:

Post a Comment