Reading in multiple raw data files with headers

Hello all,

I am trying to read in and stack multiple raw data files (file1, file2,
etc.) which contain the same variables (var1, var2, etc.) but refer to
different time periods. I also want to add a variable to the final dataset
so that I can tell which file an observation originally came from. The
following program works fine when I try to read in text files without a
line of variable names at the top (omitting the firstobs=2 option).
However, I get 'invalid data' error messages when I try to read in
tab-delimited text files with a line of variable names at the top.

data allfiles;
    infile 'C:\sasdata\file*.txt' dlm='09'x firstobs=2 eov=newfile missover;
    input var1 var2 var3 var4 var5;
    retain time 1;
    if newfile then do;
         newfile=0;
         time+1;
    end;
run;

The error message refers to the first line of each of file2, file3, etc. I
conclude that SAS only applies the firstobs=2 option ONCE when reading
in and stacking multiple files. Is there any way that I can fix this
problem apart from going through the text files and manually deleting the
line of variable names for each file?

I'm using SAS version 8.2.

Many thanks,

Katherine
5/19/2006 4:36:23 AM
comp.soft-sys.sas


Kat

one way of many:

- use a filevar= variable in the infile statement that points to a list of 
raw input files, with firstobs=2 also added
- separate the infile statement from an input statement nested within a do 
loop

* example *

* there are 3 raw text files you want to concatenate, each has a variable 
name header row, all 3 are stored in the same windows folder
* simplifying assumption --  file paths are stored in a sas data set

a1.txt
-----
  x     y
100 200
  3     4

a2.txt
-----
  x    y
  5    6
700  800

a3.txt
-----
   x    y
100 900
   8  9

* file paths *
data rawdata;
length fnames $21;
input fnames&;
cards;
C:\sas testing\a1.txt
C:\sas testing\a2.txt
C:\sas testing\a3.txt
;

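data stacked;
set rawdata;
/* filevar= opens the file named in fnames, firstobs=2 skips its header row */
infile in filevar = fnames firstobs=2 end = lastfile;
/* read every remaining record of the current file */
do until (lastfile);
   input x y;
   source=scan(fnames,3,'\');
   output;
end;
run;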

* result *

  x     y   source
----------------------
100   200   a1.txt
  3     4   a1.txt
  5     6   a2.txt
700   800   a2.txt
100   900   a3.txt
  8     9   a3.txt
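
A minimal, untested sketch of how the same technique might be adapted to the tab-delimited files in the original question. The file names file1.txt and file2.txt, the data set name filelist, and the variable name fname are assumptions for illustration only; the variable names, delimiter, and missover option are taken from the original post.

```sas
/* hypothetical list of input files, one path per line */
data filelist;
  length fname $ 200;
  input fname &;
  datalines;
C:\sasdata\file1.txt
C:\sasdata\file2.txt
;
run;

data allfiles;
  set filelist;
  /* one DATA step iteration per file, so _n_ serves as the file counter */
  time = _n_;
  /* filevar= opens the file named in fname, firstobs=2 skips its header line */
  infile in filevar=fname dlm='09'x firstobs=2 missover end=lastfile;
  /* read the remaining records of the current file */
  do while (not lastfile);
    input var1 var2 var3 var4 var5;
    output;
  end;
run;
```

Because each file is opened separately through filevar=, the header line is skipped for every file rather than only for the first one.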

Erico

"That's right, I'm a math person, I'm not one of the
4 out of 3 people who have trouble with fractions"

eeyre (54)
5/19/2006 7:15:05 AM
Dear Katherine,

Starting from your code, I would propose the following *untested* amendment:

data allfiles;
  infile 'C:\sasdata\file*.txt' dlm='09'x firstobs=2 eov=newfile missover;
  retain time 1;
  if newfile then do; * First record of each next, not first file;
    newfile=0; * Is this really necessary? Not auto-0 next record? ;
    time+1;
    INPUT; * Skip each first line of new file, no variables specified;
  end;
  ELSE input var1 var2 var3 var4 var5; * Read the variables as desired;
run;

Try this and tell us whether it works.

Regards - Jim.
--
Jim Groeneveld, Netherlands
Statistician, SAS consultant
home.hccnet.nl/jim.groeneveld


jim2stat (828)
5/19/2006 11:18:32 AM
You can do this quite easily with a system pipe. Here is some code; you
may need to adapt it.

/*
SAMPLE DATA IN 4 FILES WITH UNKNOWN NAMES
12011997 M 09/08/04
12011997 F 01/26/01
....
*/

/* USE PIPE TO RETURN A LIST OF FILENAMES */
filename indata pipe 'dir "C:\temp" /b';

data filedata;
length path $100;
/* OPEN A PIPE FOR INPUT */
infile indata truncover;
/* INPUT FILE NAMES */
input file $20.;
/* SKIP DIRECTORY ENTRIES THAT ARE NOT DATA FILES */
if index(file, "yourheader") = 0 then delete;
/* SPECIFY FULL PATH OF FILES */
path = 'C:\temp\'||file;
put path;

/* OPEN EACH FILE */
infile dummy filevar = path end = done truncover;
do while(not done);
/* INPUT DATA IN THE FILE, DATE STARTS IN COLUMN 12 */
input Account $ 1-8 Sex $ 10-10 @12 Date mmddyy8.;
format Date Date.;
output;
end;
run;


--
WenSui Liu
(http://spaces.msn.com/statcompute/blog)
Senior Decision Support Analyst
Health Policy and Clinical Effectiveness
Cincinnati Children Hospital Medical Center
liuwensui (937)
5/19/2006 4:16:24 PM
Katherine,

Unfortunately, while I had just written and tested a solution that worked,
my office is currently upgrading our system and I lost access to what I
had written.  I'll do my best to recall, but no guarantees.

If it doesn't work, write me offline and I'll send you what I know had
worked when I can again access my system on Tuesday.

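data allfiles;
  infile 'C:\sasdata\file*.txt' dlm='09'x eov=newfile missover;
  input @;  /* load the next record and hold it so that newfile is current */
  retain newfile;
  /* header line of the first file or of a later file */
  if _n_ eq 1 or newfile then do;
    newfile=0;
    time+1;
  end;
  else do;
    input @1 var1 var2 var3 var4 var5;
    output;
  end;
run;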

HTH,
Art
art297 (4213)
5/19/2006 11:07:45 PM
See below for some important corrections.

On Fri, 19 May 2006 07:18:32 -0400, Jim Groeneveld <jim2stat@YAHOO.CO.UK>
wrote:

>Dear Katherine,
>
>Departing from your code I would propose the following *untested* amendment:
>
>data allfiles;
>  infile 'C:\sasdata\file*.txt' dlm='09'x firstobs=2 eov=newfile missover;
>  retain time 1;

Insert here:

   input @;

Otherwise code will be inspecting the EOV flag for the previous record.

>  if newfile then do; * First record of each next, not first file;
>    newfile=0; * Is this really necessary? Not auto-0 next record? ;

Above is necessary; not automatically reset.

>    time+1;
>    INPUT; * Skip each first line of new file, no variables specified;

Change INPUT to DELETE. That will not only release the buffer but also
prevent outputting an observation full of missing values.

>  end;
>  ELSE input var1 var2 var3 var4 var5; * Read the variables as desired;
>run;
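
Putting the three corrections together, the amended step would look roughly like this. This is an untested sketch, not code posted in the thread; it keeps firstobs=2 from the quoted code so that the header line of the first file is still skipped.

```sas
data allfiles;
  infile 'C:\sasdata\file*.txt' dlm='09'x firstobs=2 eov=newfile missover;
  retain time 1;
  /* load the next record and hold it, so that newfile describes this record */
  input @;
  /* first record (the header line) of the second and later files */
  if newfile then do;
    newfile = 0;   /* not reset automatically */
    time + 1;
    delete;        /* discard the header line and start the next iteration */
  end;
  else input var1 var2 var3 var4 var5;
run;
```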
nospam1405 (4666)
5/22/2006 12:15:58 AM