Sarah,
I couldn't decide whether to post my response in your thread, or the one
that Peter started regarding "Bad Ways to Code Data". But, since my
findings may be of help to you, decided that this would be the best place
for it.
Someone had asked me, a few weeks ago, how one might deal with a situation
where they received an Excel file that was created in two parts: one on a
system where mmddyyyy was used as the system date format and the other on
a system where ddmmyyyy had been used.
Then, just yesterday, I was shown the worst set of Excel date entries I
had ever seen. What had happend, basically, is that various members of a
secretarial pool were asked to help fill in this spreadsheet whenever
their time came to provide a break for the receptionist. One of the
column's was for date and it was a mixture of almost every date type
imaginable, including text (with various abbreviations), mmddyyyy,
ddmmyyyy, and numerous others.
The one saving grace was that, except for the text fields, some type of
date format had actually been applied.
Interestingly, proc import could correctly read all of the date formatted
fields and only choked on the dates that had been entered as text.
However, if one had to export the spreadsheet as a comma-delimited file,
SAS had no clue.
That got me thinking that Excel must convey the format information, if it
knows it, to SAS. The answer I found was an Excel function:
cell("format",some_cell_reference_e.g.,_a2)
Thus, given the following dates in a spreadsheet (that actually represent
the period between 1JAN2009 and 13JAN2009, inclusive):
1/1/2009
2/1/2009
3/1/2009
4/1/2009
5/1/2009
1/6/2009
1/7/2009
1/8/2009
1/9/2009
10JAN2009
11-Jan-09
12-Jan-09
"January 13, 2009"
one could create a new column using the cell function, thus providing a
csv file that looked like:
date,format
1/1/2009,D1
2/1/2009,D1
3/1/2009,D1
4/1/2009,D1
5/1/2009,D1
1/6/2009,D4
1/7/2009,D4
1/8/2009,D4
1/9/2009,D4
10JAN2009,D1
11-Jan-09,D1
12-Jan-09,D1
"January 13, 2009",G
The codes in the format column represent:
G=general
D1=d-mmm-yy or dd-mmm-yy
D4=m/d/yy or m/d/yy h:mm or mm/dd/yy
With that additional information the solution was easy:
data test;
infile "k:\art\DateTest.csv" dsd dlm=","
lrecl=32768 missover firstobs=2;
informat rawdate $20.;
informat format $2.;
input rawdate format;
format date date9.;
if anyalpha(rawdate) then
Date=inputn (rawdate , 'anydtdte' , 20 );
else if format eq 'D1' then
Date=inputn (rawdate , 'ddmmyy' , 10 );
else if format eq 'D4' then
Date=inputn (rawdate , 'mmddyy' , 10 );
run;
Hopefully you don't need this, as the solutions already offered looked
like they included all that you needed.
If not, hope this helps. More can be found regarding the cell function at:
http://www.techonthenet.com/excel/formulas/cell.php
Also, Koen Vverman has an excellent SAS-L post that shows how one can
accomplish getting and using such functions in the form of an old-style
Excel macro via DDE:
http://www.listserv.uga.edu/cgi-bin/wa?A2=ind0209A&L=sas-l&P=12088
Art
--------
On Mon, 23 Nov 2009 13:15:46 -0600, Pashi, Arthur
<Arthur.Pashi@DHSS.MO.GOV> wrote:
>Hi, Sarah,
>
>Some other suggestions or data clean up to perform within EXCEL prior to
>turning to SAS. Please see below.
>
>Arthur K. Pashi
>
>-----Original Message-----
>From: Peters, Lisa
>Sent: Monday, November 23, 2009 1:07 PM
>To: Pashi, Arthur
>Subject: RE: Question about read in Excel date data
>
>Hey Arthur,
>
>Does Sarah know that she could change the format of the cells that are
>in the incorrect format by
>1-Highlighting all incorrectly formatted cells
>2-Right clicking on them
>3-Clicking "Format Cells"
>4-And then (if "Date" is not selected) select "Date"
>5-Then select the format you wish the cells to be in.
>
>I don't know how many Sarah has, or if she has fixed the problem, but
>that would be the easiest solution. Naturally I'm only an H&I typist, so
>I'm not very aware of what all she is working with, but I am proficient
>in Excel. If she has any problems with excel data, feel free to have her
>contact me.
>
>Thanks!
>
>Lisa J. Peters
>
>-----Original Message-----
>From: Pashi, Arthur
>Sent: Monday, November 23, 2009 8:18 AM
>To: Peters, Lisa
>Subject: FW: Question about read in Excel date data
>
>Good morning, Lisa,
>
>I am forwarding this item to you to let you know how your work is very
>important. Someone down the line may face challenges using the data you
>code or enter in a given database or spreadsheet.
>
>Have a great day,
>
>Arthur K. Pashi
>
>-----Original Message-----
>From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
>Sarah Cox
>Sent: Monday, November 23, 2009 8:13 AM
>To: SAS-L@LISTSERV.UGA.EDU
>Subject: Question about read in Excel date data
>
>HI, all,
>
>Just received a excel file with a column named getdate. The problem is
>at
>the first 500 rows the format for the date is mm/dd/yyyy(01/29/2007).
>But
>the format for the rest of the rows is dd/mm/yyyy(29/01/2007) . Tried to
>read into SAS, but looks like SAS treat the value with format dd/mm/yyyy
>as
>missing, so all the value with dd/mm/yyyy can't be read in correctly.
>
>Any valuable suggestion on how to deal with this will be really
>appreciated!
>
>Thanks
>
>Sarah
|