Reading a delimited file

I want to read a file that uses semicolons (;) as the delimiter. The file is organized into 4 columns, where each row contains a variable name, value, units, and an in/out designation.

Example:
 
Variable.With.Various.Unique.Levels.name1;Value;Units;in
Variable.With.Various.Unique.Levels.name2;Value;Units;out

Value could be anything
1
10946.0386113404
19173.112909708834,0,-722.71884819229354
"2006.6,0; 2000.60,131.63; 1982.65,272.563;"      <-- This one is my problem
or any string

Units will always be a string
The last column will always be a string, 'in' or 'out'

I would just use textscan with ; as the delimiter, but Value can sometimes contain semicolons between double quotes (" "), which makes textscan harder to use. I would like anything inside the double quotes to be read as a single string to be processed later.

Ideas?

Please ask for clarification if needed.  
Reply Camron 5/26/2010 7:21:04 PM

On May 27, 7:21 am, "Camron Call" <camronc...@gmail.cam> wrote:
> I want to read a file that is organized with semicolons (;) as a delimiter. The file is organized into 4 columns where each row contains a variable name, value, units, in/out designation.
>
> Example:
>
> Variable.With.Various.Unique.Levels.name1;Value;Units;in
> Variable.With.Various.Unique.Levels.name2;Value;Units;out
>
> Value could be anything
> 1
> 10946.0386113404
> 19173.112909708834,0,-722.71884819229354
> "2006.6,0; 2000.60,131.63; 1982.65,272.563;"      <-- This one is my problem
> or any string
>
> Units will always be a string
> The last column will always be a string, 'in' or 'out'
>
> I would just use textscan with ; as delim, but Value can sometimes have ; that are between double quotes. " " Making textscan harder to use. I would like anything inside the double quotes to be read as a string to be processed later.
>
> Ideas?
>
> Please ask for clarification if needed.

So why not read everything in as a string?

BTW, your examples are as clear as mud.
The numerical example does not seem to match:
Variable.With.Various.Unique.Levels.name1;Value;Units;in

Can you just show us a few lines from the file without editorial
comments?
Reply TideMan 5/26/2010 8:54:21 PM


The rows are formatted with:
variable name ; value ; units ; in/out

The data is a text file with rows like the following:

Aircraft.Mass.Calibration.Name1;1;;in
Aircraft.Mass.Calibration.Name2;1;;in
Aircraft.Mass.Calibration.Name3;12345.6789;lb;in
Aircraft.Mass.Certification.Name4;123456;lb;in
Aircraft.Mass.Certification.Name5;12345;lb;in
Aircraft.Mass.Design.Category1.Name6;12345.6789,0,-1234.92;;out
Aircraft.Mass.Design.Name7;12345.678;lb;out
Aircraft.Operations.Name8;1.234;;in
Aircraft.Operations.Name9;123;kts;in
Aircraft.Operations.Name10;1.2;;in
Aircraft.Components.Category2.Name11;7556.63837287839;gal;out
Aircraft.Components.Category3.Category4.Name12;"2006.6,0;2000.60,131.63; ";;out 
FlightPerformance.DesignMission.Category5.Name13;True;;in
FlightPerformance.DesignMission.Category6.Name14;"10; 30; 50; 70; 90; 110; ";;in

I wondered if there is any way to get this data into the workspace without using fgetl in a loop and parsing each line individually.  The final goal is to preserve the variable hierarchy in a cell array, or structure or dataset or something and be able to access the variable name, value, units, and in/out. 
Reply Camron 5/26/2010 11:26:06 PM

On May 27, 11:26 am, "Camron Call" <camronc...@gmail.cam> wrote:
> The rows are formatted with:
> variable name ; value ; units ; in/out
>
> The data is a text file with rows like the following:
>
> Aircraft.Mass.Calibration.Name1;1;;in
> Aircraft.Mass.Calibration.Name2;1;;in
> Aircraft.Mass.Calibration.Name3;12345.6789;lb;in
> Aircraft.Mass.Certification.Name4;123456;lb;in
> Aircraft.Mass.Certification.Name5;12345;lb;in
> Aircraft.Mass.Design.Category1.Name6;12345.6789,0,-1234.92;;out
> Aircraft.Mass.Design.Name7;12345.678;lb;out
> Aircraft.Operations.Name8;1.234;;in
> Aircraft.Operations.Name9;123;kts;in
> Aircraft.Operations.Name10;1.2;;in
> Aircraft.Components.Category2.Name11;7556.63837287839;gal;out
> Aircraft.Components.Category3.Category4.Name12;"2006.6,0;2000.60,131.63; ";;out
> FlightPerformance.DesignMission.Category5.Name13;True;;in
> FlightPerformance.DesignMission.Category6.Name14;"10; 30; 50; 70; 90; 110; ";;in
>
> I wondered if there is any way to get this data into the workspace without using fgetl in a loop and parsing each line individually. The final goal is to preserve the variable hierarchy in a cell array, or structure or dataset or something and be able to access the variable name, value, units, and in/out.

Aah, I see the problem now.
The Value column is fairly chaotic, isn't it?
Sometimes it has a number - that's easy.
Sometimes it has text - that's easy
But sometimes it is a string enclosed in double quotes, and if that is
the case the stuff inside the double quotes is delimited by either a
semicolon or a comma.  So, textscan should ignore the semicolons if
they are inside double quotes.

I'm afraid this is so idiosyncratic that I don't see any alternative
to using fgetl in a loop and parsing line by line.
Not much help I'm afraid.
Reply TideMan 5/26/2010 11:48:50 PM
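
For what it's worth, the quoted-semicolon case may be manageable without an explicit fgetl loop, because regexp accepts a cell array of lines. The following is only a sketch under assumptions: every row has exactly four columns, only the value column is ever quoted, and a quoted value never contains an escaped double quote. The sample rows are copied from the listing above; the names lines, pat, tok and rows are chosen for illustration.

```matlab
% Sample rows copied from the listing in this thread. In practice they could
% come from something like regexp(fileread(fname), '\r?\n', 'split').
lines = { ...
    'Aircraft.Mass.Calibration.Name3;12345.6789;lb;in', ...
    'Aircraft.Mass.Design.Category1.Name6;12345.6789,0,-1234.92;;out', ...
    'Aircraft.Components.Category3.Category4.Name12;"2006.6,0;2000.60,131.63; ";;out ', ...
    'FlightPerformance.DesignMission.Category5.Name13;True;;in'};

% Column 2 is either a double-quoted string (which may contain ';')
% or a run of characters that are not semicolons.
pat = '^([^;]*);("[^"]*"|[^;]*);([^;]*);([^;]*)$';
tok = regexp(lines, pat, 'tokens', 'once');   % one 1x4 cell per matching line

ok   = ~cellfun('isempty', tok);              % rows that matched the pattern
rows = vertcat(tok{ok});                      % N-by-4 cell array of strings
rows(:, 4) = strtrim(rows(:, 4));             % drop trailing blanks after in/out
```

Each row of rows then holds name, value, units and in/out as strings, with the quoted value still wrapped in its double quotes for later processing. Lines that do not fit the pattern are silently dropped by the ok mask, so it may be worth checking all(ok) on real data.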

Camron Call wrote:

> Aircraft.Components.Category2.Name11;7556.63837287839;gal;out
> Aircraft.Components.Category3.Category4.Name12;"2006.6,0;2000.60,131.63; ";;out
> FlightPerformance.DesignMission.Category5.Name13;True;;in
> FlightPerformance.DesignMission.Category6.Name14;"10; 30; 50; 70; 90; 110; ";;in

> I wondered if there is any way to get this data into the workspace 
> without using fgetl in a loop and parsing each line individually.  The 
> final goal is to preserve the variable hierarchy in a cell array, or 
> structure or dataset or something and be able to access the variable 
> name, value, units, and in/out.

If it were me, I would pre-process the input file into another form, 
using perl or awk or sed or an editor like vi.
Reply Walter 5/27/2010 3:06:31 AM
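
The same pre-processing idea can also be tried inside MATLAB rather than with perl, awk or sed. This is a sketch, not what Walter proposed verbatim: it masks the semicolons that sit inside double quotes, splits on the remaining ones, and then restores them. The placeholder character '|' is an assumption; it must not occur anywhere in the real data. Quotes are assumed to be balanced on each line.

```matlab
line = 'FlightPerformance.DesignMission.Category6.Name14;"10; 30; 50; 70; 90; 110; ";;in';

% A ';' is inside quotes when an odd number of '"' characters follows it
% on the same line (assumes balanced quotes, no escaped quotes).
inQuotes = ';(?=[^"]*"(?:[^"]*"[^"]*")*[^"]*$)';
masked   = regexprep(line, inQuotes, '|');    % '|' is an assumed-safe placeholder

parts = regexp(masked, ';', 'split');         % plain split; empty fields are kept
parts = strrep(parts, '|', ';');              % put the original semicolons back
```

After this, parts is a 1-by-4 cell array and parts{2} is the quoted value with its semicolons intact. The masked text could equally be written to a temporary file and read with textscan using ';' as the delimiter.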

That's fine. I can write the code to do it in a loop. So I would really like to know the best way to save the data and preserve its variable names and hierarchy.

How do you make nested structures programmatically if I want it to be organized like:
Aircraft.Mass.Calibration.Name1
so that I can type --> Aircraft.Mass.Calibration.Name1.value and get the value string, or get the units with Aircraft.Mass.Calibration.Name1.units

or even have Name1 be a cell array with 4 entries, so as to access it by something like:
Aircraft.Mass.Calibration.Name1{1}

Would this type of data fit well into a dataset type?
Reply Camron 5/27/2010 3:43:21 PM
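
One possible way to build the nested structure from the dotted name is setfield with a comma-separated list of field names. This is a sketch under assumptions: every level of the name is a valid MATLAB identifier, and no name is both a leaf and a prefix of another name. The container name db and the field names value, units and inout are chosen for illustration.

```matlab
name  = 'Aircraft.Mass.Calibration.Name3';
entry = struct('value', '12345.6789', 'units', 'lb', 'inout', 'in');

keys = regexp(name, '\.', 'split');   % {'Aircraft','Mass','Calibration','Name3'}
db   = struct();                      % 'db' is a hypothetical container name
db   = setfield(db, keys{:}, entry);  % intermediate levels are created as needed

units = db.Aircraft.Mass.Calibration.Name3.units;   % direct dotted access
same  = getfield(db, keys{:}, 'units');             % same lookup built from strings
```

Calling setfield once per parsed row, with the same db, should accumulate the whole hierarchy. Typing Aircraft.Mass... directly would additionally need something like Aircraft = db.Aircraft.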

On May 28, 3:43 am, "Camron Call" <camronc...@gmail.cam> wrote:
> That's fine. I can write the code to do it in a loop. So I would really like to know what would be the best way to save the data and preserve its variable names and hierarchy.
>
> How do you make nested structures programmatically if I want it to be organized like:
> Aircraft.Mass.Calibration.Name1
> so that I can type --> Aircraft.Mass.Calibration.Name1.value and get the value string or get the units by Aircraft.Mass.Calibration.Name1.units
>
> or even have Name1 be a cell array with 4 entries. So as to access it by something like:
> Aircraft.Mass.Calibration.Name1{1}
>
> Would this type of data fit well into a dataset type?

Yes, but even better, IMHO, is to use an array of structures.
There would be 4 fields corresponding to your 4 columns (name, value,
units, inout), and each record would be an index in the structure.
So, for example, s(5).name would be
'Aircraft.Mass.Certification.Name5' (from your file listing above).
and s(5).units would be 'lb'.
You can generate this structure in the loop as you read the file in
line by line:
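% fid is assumed to come from an earlier fopen call
for id = 1:1000                        % upper bound on the number of lines
    line = fgetl(fid);
    if ~ischar(line), break, end       % fgetl returns -1 at EOF
    % Parse each line here, then fill in the four fields
    % (the empty strings below are placeholders):
    s(id).name  = '';                  % variable name (column 1)
    s(id).value = '';                  % value, kept as a string (column 2)
    s(id).units = '';                  % units (column 3)
    s(id).inout = '';                  % 'in' or 'out' (column 4)
end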

Reply TideMan 5/27/2010 8:09:40 PM
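
If the rows have already been split into an N-by-4 cell array of strings, the array of structures described above can probably be built in one step with cell2struct instead of field-by-field assignment. The sketch below assumes that layout; the three rows are copied from the listing earlier in the thread, and the variable names rows, hit and speed are illustrative only.

```matlab
% Rows as {name, value, units, inout}; values copied from the thread's listing.
rows = { ...
    'Aircraft.Mass.Calibration.Name3',   '12345.6789', 'lb',  'in'; ...
    'Aircraft.Mass.Certification.Name5', '12345',      'lb',  'in'; ...
    'Aircraft.Operations.Name9',         '123',        'kts', 'in'};

% One struct per row, one field per column.
s = cell2struct(rows, {'name', 'value', 'units', 'inout'}, 2);

% Look a record up by its full variable name.
hit   = s(strcmp({s.name}, 'Aircraft.Operations.Name9'));
speed = str2double(hit.value);   % str2double gives NaN for non-numeric text
```

Keeping value as a string and converting on demand sidesteps the mixed content of that column (numbers, comma lists, quoted strings, True).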
