Speeding up the reading of large binary files with complex structures



Hi,

I am trying to read a large binary file whose data is structured in a known format made up of a variable number of blocks. The blocks themselves also have variable length. The code below shows the kind of structure I intend to read.

The problem is that the file is very large, and Matlab takes more than 10 minutes to read a 300 MB product (for example, snapshot_counter = 2500, point_counter = 100000, num_bt = 12 on average).

The snapshot_counter loop is fast, but the point_counter loop with its nested num_bt loop is very slow.

Could someone help me find a way to read this kind of file quickly (my target is a few seconds)?

Thanks in advance,
Fernando

============================================= 

function [MIR1C,error] = read_MIR_SC_F1C_vectorized(datablock)

id = fopen(datablock);
snapshot_counter = fread(id,1,'uint32');

for dsr_counter = 1:snapshot_counter
    MIR1C.List_of_Snapshot_Informations(dsr_counter).Snapshot_Time.days=        fread(id, 1, 'int32');
    MIR1C.List_of_Snapshot_Informations(dsr_counter).Snapshot_Time.seconds=     fread(id, 1, 'uint32');
    MIR1C.List_of_Snapshot_Informations(dsr_counter).Snapshot_Time.microseconds=fread(id, 1, 'uint32');
    MIR1C.List_of_Snapshot_Informations(dsr_counter).('Snapshot_ID')=            fread(id, 1, 'uint32'); 
    MIR1C.List_of_Snapshot_Informations(dsr_counter).('Snapshot_OBET')=          fread(id, 1, 'uint64');            
    MIR1C.List_of_Snapshot_Informations(dsr_counter).('X_Position')=             fread(id, 1, 'float64'); 
    MIR1C.List_of_Snapshot_Informations(dsr_counter).('Y_Position')=             fread(id, 1, 'float64');   
    MIR1C.List_of_Snapshot_Informations(dsr_counter).('Z_Position')=             fread(id, 1, 'float64');   
    MIR1C.List_of_Snapshot_Informations(dsr_counter).('X_Velocity')=             fread(id, 1, 'float64');  
    MIR1C.List_of_Snapshot_Informations(dsr_counter).('Y_Velocity')=             fread(id, 1, 'float64');   
    MIR1C.List_of_Snapshot_Informations(dsr_counter).('Z_Velocity')=             fread(id, 1, 'float64');    
    MIR1C.List_of_Snapshot_Informations(dsr_counter).('Vector_Source')=          fread(id, 1, 'uchar');     
    MIR1C.List_of_Snapshot_Informations(dsr_counter).('Q0')=                     fread(id, 1, 'float64');
    MIR1C.List_of_Snapshot_Informations(dsr_counter).('Q1')=                     fread(id, 1, 'float64');
    MIR1C.List_of_Snapshot_Informations(dsr_counter).('Q2')=                     fread(id, 1, 'float64');
    MIR1C.List_of_Snapshot_Informations(dsr_counter).('Q3')=                     fread(id, 1, 'float64');
    MIR1C.List_of_Snapshot_Informations(dsr_counter).('TEC')=                    fread(id, 1, 'float64');
    MIR1C.List_of_Snapshot_Informations(dsr_counter).('Geomag_F')=               fread(id, 1, 'float64');
    MIR1C.List_of_Snapshot_Informations(dsr_counter).('Geomag_D')=               fread(id, 1, 'float64');
    MIR1C.List_of_Snapshot_Informations(dsr_counter).('Geomag_I')=               fread(id, 1, 'float64');
    MIR1C.List_of_Snapshot_Informations(dsr_counter).('Sun_RA')=                 fread(id, 1, 'float32');
    MIR1C.List_of_Snapshot_Informations(dsr_counter).('Sun_DEC')=                fread(id, 1, 'float32');
    MIR1C.List_of_Snapshot_Informations(dsr_counter).('Sun_BT')=                 fread(id, 1, 'float32');
    MIR1C.List_of_Snapshot_Informations(dsr_counter).('Accuracy')=               fread(id, 1, 'float32');
    MIR1C.List_of_Snapshot_Informations(dsr_counter).('Radiometric_Accuracy')=   fread(id, 2, 'float32');
    MIR1C.List_of_Snapshot_Informations(dsr_counter).('X_Band')=      fread(id, 1, 'uchar');
    MIR1C.List_of_Snapshot_Informations(dsr_counter).('Quality_Information_Software_Error_Flag') = fread (id,1,'uchar');
    MIR1C.List_of_Snapshot_Informations(dsr_counter).('Quality_Information_Instrument_Error_Flag') = fread (id,1,'uchar');
    MIR1C.List_of_Snapshot_Informations(dsr_counter).('Quality_Information_ADF_Error_Flag') = fread (id,1,'uchar');
    MIR1C.List_of_Snapshot_Informations(dsr_counter).('Quality_Information_Calibration_Error_Flag') = fread (id,1,'uchar');
end

point_counter = fread(id,1,'uint32');

for dsr_counter = 1:point_counter
    MIR1C.List_of_Grid_Point_Data(dsr_counter).Grid_Point_ID=        fread(id, 1, 'uint32');
    MIR1C.List_of_Grid_Point_Data(dsr_counter).Grid_Point_Latitude=  fread(id, 1, 'float32');
    MIR1C.List_of_Grid_Point_Data(dsr_counter).Grid_Point_Longitude= fread(id, 1, 'float32');
    MIR1C.List_of_Grid_Point_Data(dsr_counter).Grid_Point_Altitude=  fread(id, 1, 'float32');
    MIR1C.List_of_Grid_Point_Data(dsr_counter).Grid_Point_Mask=      fread(id, 1, 'uchar');
    num_bt = fread(id, 1, 'uint16');
    for bt_element = 1:num_bt
        MIR1C.List_of_Grid_Point_Data(dsr_counter).Flags(bt_element) = fread(id, 1, 'uint16');
        MIR1C.List_of_Grid_Point_Data(dsr_counter).BT_Value_Real(bt_element) = fread(id, 1, 'float32');
        MIR1C.List_of_Grid_Point_Data(dsr_counter).BT_Value_Imag(bt_element) = fread(id, 1, 'float32');
        MIR1C.List_of_Grid_Point_Data(dsr_counter).Pixel_Radiometric_ACcuracy(bt_element) = fread(id, 1, 'uint16');
        MIR1C.List_of_Grid_Point_Data(dsr_counter).Incidence_Angle(bt_element) = fread(id, 1, 'uint16')*90/(2^16);
        MIR1C.List_of_Grid_Point_Data(dsr_counter).Azimuth_Angle(bt_element) = fread(id, 1, 'uint16')*360/(2^16);
        MIR1C.List_of_Grid_Point_Data(dsr_counter).Faraday_Rotation_Angle(bt_element) = fread(id, 1, 'uint16')*360/(2^16);
        MIR1C.List_of_Grid_Point_Data(dsr_counter).Geometric_Rotation_Angle(bt_element) = fread(id, 1, 'uint16')*360/(2^16);
        MIR1C.List_of_Grid_Point_Data(dsr_counter).Snapshot_ID_of_Pixel(bt_element) = fread(id, 1, 'uint32');
        MIR1C.List_of_Grid_Point_Data(dsr_counter).Footprint_Axis1(bt_element) = fread(id, 1, 'uint16');
        MIR1C.List_of_Grid_Point_Data(dsr_counter).Footprint_Axis2(bt_element) = fread(id, 1, 'uint16');
    end        
end

fclose(id);
Reply Fernando 1/20/2011 5:34:07 PM

I often use MEMMAPFILE to read large binary files.
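
As a rough illustration of the idea, the sketch below maps a file as plain bytes, which does not require a fixed record layout. The file name, the counter and the three float32 values are invented for the demo, and the bytes are assumed to be in the machine's native byte order.

```matlab
% Build a small demo file (hypothetical name) so the sketch is self-contained.
fname = [tempname '.bin'];
fid = fopen(fname, 'w');
fwrite(fid, uint32(3), 'uint32');               % e.g. a record counter
fwrite(fid, single([1.5 2.5 3.5]), 'float32');  % three demo values
fclose(fid);

% Map the whole file as raw bytes; data is paged in only when indexed.
m = memmapfile(fname, 'Format', 'uint8');
bytes = m.Data;                                 % uint8 column vector (a copy)

% Decode fields by slicing bytes and reinterpreting them (native byte order).
n      = typecast(bytes(1:4), 'uint32');
values = typecast(bytes(5 : 4 + 4*double(n)), 'single');

clear m                                         % release the mapping first
delete(fname);
```

Mapping as uint8 leaves all layout decisions to the indexing code, so variable-length records are handled there rather than in the memmapfile format description.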

Bruno
Reply Bruno 1/20/2011 7:00:21 PM


Thanks Bruno for your suggestion, but I think that I cannot use it, since the file has a variable length defined by three parameters.
One of the parameters defines the number of records in the file, and another parameter inside each record defines the length of that record.

I think that memmapfile is not flexible enough. Nevertheless, thanks for your suggestion.

Fernando

Reply Fernando 1/20/2011 8:54:04 PM

"Fernando" wrote in message <ih9rmf$2js$1@fred.mathworks.com>...
> Hi,
> 
> I am trying to read a large binary file where the data is structured in a known format composed by a variable number of blocks. Also, these blocks have variable length. See the code attached below to look at the kind of structure that I am intending to read.
> 
> The problem is that the file is very large and Matlab spends more than 10 minutes to read a product of 300 MB (for example, snapshot_counter = 2500, point_counter =100000, num_bt = 12 in average).
> 
> The loop for the snapshot_counter is fast, but the loop regarding to the point_counter with the nested loop num_bt is very slow.
> 
> Could someone help me to find a solution to read fast these kind of files (my target is in a few seconds)?
> 
> Thanks in advance,
> Fernando


I don't see any pre-allocation in your code. Every time you expand your arrays, MATLAB has to copy all the pointer data and/or value data to make room for your new stuff. See suggestions below.


> ============================================= 
> 
> function [MIR1C,error] = read_MIR_SC_F1C_vectorized(datablock)
> 
> id = fopen(datablock);
> snapshot_counter = fread(id,1,'uint32');
> 

Before getting into the following loop, create the structure for MIR1C of the necessary size and fields to hold snapshot_counter records.

> for dsr_counter = 1:snapshot_counter
>     MIR1C.List_of_Snapshot_Informations(dsr_counter).Snapshot_Time.days=        fread(id, 1, 'int32');
>     MIR1C.List_of_Snapshot_Informations(dsr_counter).Snapshot_Time.seconds=     fread(id, 1, 'uint32');
>     MIR1C.List_of_Snapshot_Informations(dsr_counter).Snapshot_Time.microseconds=fread(id, 1, 'uint32');
>     MIR1C.List_of_Snapshot_Informations(dsr_counter).('Snapshot_ID')=            fread(id, 1, 'uint32'); 
>     MIR1C.List_of_Snapshot_Informations(dsr_counter).('Snapshot_OBET')=          fread(id, 1, 'uint64');            
>     MIR1C.List_of_Snapshot_Informations(dsr_counter).('X_Position')=             fread(id, 1, 'float64'); 
>     MIR1C.List_of_Snapshot_Informations(dsr_counter).('Y_Position')=             fread(id, 1, 'float64');   
>     MIR1C.List_of_Snapshot_Informations(dsr_counter).('Z_Position')=             fread(id, 1, 'float64');   
>     MIR1C.List_of_Snapshot_Informations(dsr_counter).('X_Velocity')=             fread(id, 1, 'float64');  
>     MIR1C.List_of_Snapshot_Informations(dsr_counter).('Y_Velocity')=             fread(id, 1, 'float64');   
>     MIR1C.List_of_Snapshot_Informations(dsr_counter).('Z_Velocity')=             fread(id, 1, 'float64');    
>     MIR1C.List_of_Snapshot_Informations(dsr_counter).('Vector_Source')=          fread(id, 1, 'uchar');     
>     MIR1C.List_of_Snapshot_Informations(dsr_counter).('Q0')=                     fread(id, 1, 'float64');
>     MIR1C.List_of_Snapshot_Informations(dsr_counter).('Q1')=                     fread(id, 1, 'float64');
>     MIR1C.List_of_Snapshot_Informations(dsr_counter).('Q2')=                     fread(id, 1, 'float64');
>     MIR1C.List_of_Snapshot_Informations(dsr_counter).('Q3')=                     fread(id, 1, 'float64');
>     MIR1C.List_of_Snapshot_Informations(dsr_counter).('TEC')=                    fread(id, 1, 'float64');
>     MIR1C.List_of_Snapshot_Informations(dsr_counter).('Geomag_F')=               fread(id, 1, 'float64');
>     MIR1C.List_of_Snapshot_Informations(dsr_counter).('Geomag_D')=               fread(id, 1, 'float64');
>     MIR1C.List_of_Snapshot_Informations(dsr_counter).('Geomag_I')=               fread(id, 1, 'float64');
>     MIR1C.List_of_Snapshot_Informations(dsr_counter).('Sun_RA')=                 fread(id, 1, 'float32');
>     MIR1C.List_of_Snapshot_Informations(dsr_counter).('Sun_DEC')=                fread(id, 1, 'float32');
>     MIR1C.List_of_Snapshot_Informations(dsr_counter).('Sun_BT')=                 fread(id, 1, 'float32');
>     MIR1C.List_of_Snapshot_Informations(dsr_counter).('Accuracy')=               fread(id, 1, 'float32');
>     MIR1C.List_of_Snapshot_Informations(dsr_counter).('Radiometric_Accuracy')=   fread(id, 2, 'float32');
>     MIR1C.List_of_Snapshot_Informations(dsr_counter).('X_Band')=      fread(id, 1, 'uchar');
>     MIR1C.List_of_Snapshot_Informations(dsr_counter).('Quality_Information_Software_Error_Flag') = fread (id,1,'uchar');
>     MIR1C.List_of_Snapshot_Informations(dsr_counter).('Quality_Information_Instrument_Error_Flag') = fread (id,1,'uchar');
>     MIR1C.List_of_Snapshot_Informations(dsr_counter).('Quality_Information_ADF_Error_Flag') = fread (id,1,'uchar');
>     MIR1C.List_of_Snapshot_Informations(dsr_counter).('Quality_Information_Calibration_Error_Flag') = fread (id,1,'uchar');
> end
> 
> point_counter = fread(id,1,'uint32');
> 

Before getting into the next loop, allocate the MIR1C.List_of_Grid_Point_Data structure to the necessary size to hold point_counter number of records with the necessary fields.

> for dsr_counter = 1:point_counter
>     MIR1C.List_of_Grid_Point_Data(dsr_counter).Grid_Point_ID=        fread(id, 1, 'uint32');
>     MIR1C.List_of_Grid_Point_Data(dsr_counter).Grid_Point_Latitude=  fread(id, 1, 'float32');
>     MIR1C.List_of_Grid_Point_Data(dsr_counter).Grid_Point_Longitude= fread(id, 1, 'float32');
>     MIR1C.List_of_Grid_Point_Data(dsr_counter).Grid_Point_Altitude=  fread(id, 1, 'float32');
>     MIR1C.List_of_Grid_Point_Data(dsr_counter).Grid_Point_Mask=      fread(id, 1, 'uchar');
>     num_bt = fread(id, 1, 'uint16');

Before getting into the next loop, allocate the MIR1C.List_of_Grid_Point_Data(dsr_counter).Flags, ...BT_Value_Real, ...BT_Value_Imag, etc. arrays to contain num_bt entries.

>     for bt_element = 1:num_bt
>         MIR1C.List_of_Grid_Point_Data(dsr_counter).Flags(bt_element) = fread(id, 1, 'uint16');
>         MIR1C.List_of_Grid_Point_Data(dsr_counter).BT_Value_Real(bt_element) = fread(id, 1, 'float32');
>         MIR1C.List_of_Grid_Point_Data(dsr_counter).BT_Value_Imag(bt_element) = fread(id, 1, 'float32');
>         MIR1C.List_of_Grid_Point_Data(dsr_counter).Pixel_Radiometric_ACcuracy(bt_element) = fread(id, 1, 'uint16');
>         MIR1C.List_of_Grid_Point_Data(dsr_counter).Incidence_Angle(bt_element) = fread(id, 1, 'uint16')*90/(2^16);
>         MIR1C.List_of_Grid_Point_Data(dsr_counter).Azimuth_Angle(bt_element) = fread(id, 1, 'uint16')*360/(2^16);
>         MIR1C.List_of_Grid_Point_Data(dsr_counter).Faraday_Rotation_Angle(bt_element) = fread(id, 1, 'uint16')*360/(2^16);
>         MIR1C.List_of_Grid_Point_Data(dsr_counter).Geometric_Rotation_Angle(bt_element) = fread(id, 1, 'uint16')*360/(2^16);
>         MIR1C.List_of_Grid_Point_Data(dsr_counter).Snapshot_ID_of_Pixel(bt_element) = fread(id, 1, 'uint32');
>         MIR1C.List_of_Grid_Point_Data(dsr_counter).Footprint_Axis1(bt_element) = fread(id, 1, 'uint16');
>         MIR1C.List_of_Grid_Point_Data(dsr_counter).Footprint_Axis2(bt_element) = fread(id, 1, 'uint16');
>     end        
> end
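
A minimal sketch of the pre-allocation described above, using only a few of the fields. The values of point_counter and num_bt are assumed here; in the real reader they come from fread.

```matlab
point_counter = 1000;   % assumed value; read from the file in practice

% One template record listing the fields, replicated once to full size.
template = struct('Grid_Point_ID', 0, 'Grid_Point_Latitude', 0, ...
                  'Flags', [], 'BT_Value_Real', []);   % other fields omitted
List_of_Grid_Point_Data = repmat(template, point_counter, 1);

for k = 1:point_counter
    num_bt = 12;                            % assumed; read from the file in practice
    rec = template;                         % fill a scalar struct first
    rec.Flags         = zeros(num_bt, 1);   % size the per-point arrays up front
    rec.BT_Value_Real = zeros(num_bt, 1);
    List_of_Grid_Point_Data(k) = rec;       % one assignment into the struct array
end
```

The struct array is created once at its final size, so the loop never has to grow it.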


James Tursa
Reply James 1/21/2011 12:59:21 AM

Hi James,

I have just tried to allocate the memory of the structure in advance, but I do not get any observable improvement.

The problem is with the point_counter loop, which has about 100000 iterations and takes more than 10 minutes. The snapshot_counter loop only takes a couple of seconds.

Perhaps, a mex file could be the solution. What do you think?

Fernando

ps. I have seen software reading those files in less than 20 sec, so I am wondering if I am able to get similar performances with Matlab.



Reply Fernando 1/21/2011 9:47:04 AM

"Fernando" wrote in message <ihbkmo$pv8$1@fred.mathworks.com>...
> Hi James,
> 
> I have just tried to allocate the memory of the structure in advance, but I do not get any observable improvement.
> 
> The problem is with the point_counter loop which has about 100000 iterations and it takes more than 10 minutes. The snaphsot_counter loop only takes a couple of seconds.
> 
> Perhaps, a mex file could be the solution. What do you think?
> 
> Fernando

Organizing the data as an array of structures is inefficient and may be a killer for your code, both for reading and for later processing. I strongly advise revising your data structure if you can.

You can read the data in blocks of bytes and then cast to the proper data types (using TYPECAST).

But again, it is crucial to design the data structure properly before going further.

Bruno
Reply Bruno 1/21/2011 10:35:05 AM

Dear Fernando,

I cannot imagine that pre-allocation has no measurable effect. Perhaps something went wrong?

Some further ideas:

> snapshot_counter = fread(id,1,'uint32');
> for dsr_counter = 1:snapshot_counter
Using UINT32 as index is usually faster than a DOUBLE. So you can try this:
  snapshot_counter = fread(id,1,'*uint32');
  for dsr_counter = uint32(1):snapshot_counter
 
You could also use integer types for the other fields.

> MIR1C.List_of_Snapshot_Informations(dsr_counter).('Snapshot_ID') =    fread(id, 1, 'uint32'); 

Direct access is faster than a dynamic field name:
MIR1C.List_of_Snapshot_Informations(dsr_counter).Snapshot_ID

Create a scalar struct first and insert it into the struct array afterwards:
>   for dsr_counter = 1:point_counter
>     MIR1C.List_of_Grid_Point_Data(dsr_counter).Grid_Point_ID=        fread(id, 1, 'uint32');
>     MIR1C.List_of_Grid_Point_Data(dsr_counter).Grid_Point_Latitude=  fread(id, 1, 'float32');
> ...
  for dsr_counter = 1:point_counter
    tmp.Grid_Point_ID = fread(id, 1, 'uint32');
    tmp.Grid_Point_Latitude = fread(id, 1, 'float32');
    ...
    MIR1C.List_of_Grid_Point_Data(dsr_counter) = tmp;
  end

> ... *90/(2^16);
A waste of time. Calculate such constants once outside the loop! Matlab is not C, where such constant expressions are precalculated by the compiler already. Perhaps the JIT of Matlab2010b fixes such expressions, but at least older Matlab versions don't and "2^16" is a time-consuming call of the expensive POWER for doubles.

> fread(id, 1, 'uint32');
The values of strings in an M-file are stored in a table. Accessing a string from this table needs some time. If a specific string appears repeatedly in an M-file, it is usually faster to define it as a single variable:
  u32 = 'uint32';
  ...
  fread(id, 1, u32);
A variable must be found in a lookup table also, but this can be much shorter! However, the effects depend strongly on the JIT-accelerator. So you have to check for each specific function, if the string variables are faster.
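
A small sketch of the last two tips, with the constant and the precision string hoisted out of the loop. The file name and the three uint16 angle codes are invented for the demo, and whether this is measurably faster depends on the Matlab version.

```matlab
% Demo file with a few uint16 angle codes (hypothetical name and values).
fname = [tempname '.bin'];
fid = fopen(fname, 'w');
fwrite(fid, uint16([0 16384 32768]), 'uint16');
fclose(fid);

u16       = 'uint16';     % precision string stored once in a variable
scale_360 = 360 / 2^16;   % constant computed once, outside the loop

fid = fopen(fname, 'r');
n = 3;
azimuth = zeros(n, 1);    % pre-allocated result
for k = 1:n
    azimuth(k) = fread(fid, 1, u16) * scale_360;   % code -> degrees
end
fclose(fid);
delete(fname);
```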

If you get the pre-allocation to work correctly in addition, I assume a speedup of 50% (+- 45%, of course).

Good luck, Jan
Reply Jan 1/21/2011 10:40:07 AM

Hi Bruno,

I agree with you that the data structure is inefficient, but I cannot modify the file format since the file format is imposed by the data provider. However, I can try to organize the data in a different way after reading the binary data. So far, the Matlab structure is reflecting the format of the binary file.

Fernando

Reply Fernando 1/21/2011 1:41:04 PM

"Fernando" wrote in message <ihc2dg$c2k$1@fred.mathworks.com>...
>  So far, the Matlab structure is reflecting the format of the binary file.

Not quite. The data of a Matlab array of structures is not stored contiguously and serially, the way it is organized in the file or in a C structure. You should organize the data differently after reading.
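
To make the distinction concrete, here is a small sketch contrasting an array of structs with a struct of arrays. The field values are invented, and only two of the grid-point fields are shown.

```matlab
n = 5;   % number of grid points (assumed)

% Array of structs: one element per point, each field value stored separately.
aos = repmat(struct('Grid_Point_ID', 0, 'Grid_Point_Latitude', 0), n, 1);
for k = 1:n
    aos(k).Grid_Point_ID       = k;
    aos(k).Grid_Point_Latitude = 10 * k;
end

% Struct of arrays: one contiguous vector per field.
soa.Grid_Point_ID       = (1:n).';
soa.Grid_Point_Latitude = 10 * (1:n).';

% Vectorized operations work directly on the contiguous vectors ...
mean_lat_soa = mean(soa.Grid_Point_Latitude);
% ... whereas the struct array must first be gathered into a vector.
mean_lat_aos = mean([aos.Grid_Point_Latitude]);
```

For the variable-length per-point data, one option under the same layout is a single flat vector per field plus the start index of each point's entries.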

Bruno
Reply Bruno 1/21/2011 3:26:05 PM

Hi Bruno,

After reading Jan's e-mail, I decided to run another test that only reads the data without storing it. I focused on the point_counter loop:

--------
for dsr_counter = 1:point_counter
    fread(id, 1, 'uint32');
    fread(id, 1, 'float32');
    fread(id, 1, 'float32');
    fread(id, 1, 'float32');
    fread(id, 1, 'uchar');
    num_bt = fread(id, 1, 'uint16');
    fread(id,num_bt*28,'uchar');
end
---------

This code does not depend on how I store the data in Matlab; it just reads the file without storing anything. Also, I read the whole num_bt block in a single call, simulating that I would decode that information later.

The result: with point_counter = 100000 and num_bt = 10 (on average, since the num_bt block has variable length), the loop takes 10 minutes to read the product. The size of the file is 300 MB.

So far, I am running out of ideas to speed up the reading process.

Any idea is more than welcome.
Thanks
Fernando



Reply Fernando 1/21/2011 4:32:04 PM

"Fernando" wrote in message <ihcce4$eu3$1@fred.mathworks.com>...

> Any idea is more than welcome.

Again, the best way is to read the *big* block of *binary* data at once and use TYPECAST to convert it to the correct types. This must be fast. The pain is computing where the data of the various fields is located.
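
One possible way to compute those positions, sketched on a synthetic buffer. It assumes the grid-point layout from the code at the top of the thread (17 header bytes, a uint16 num_bt, then 28 bytes per BT element) and native byte order; the IDs and counts are invented.

```matlab
% Synthetic buffer: three grid-point records with 2, 0 and 1 BT elements.
counts = [2 0 1];
ids    = uint32([11 22 33]);
buf = uint8([]);
for k = 1:numel(counts)
    header = [typecast(ids(k), 'uint8'), zeros(1, 13, 'uint8')];  % ID + lat, lon, alt, mask
    nb     = typecast(uint16(counts(k)), 'uint8');                % num_bt field
    body   = zeros(1, 28 * counts(k), 'uint8');                   % BT elements
    buf    = [buf, header, nb, body];                             %#ok<AGROW> demo only
end
buf = buf(:);   % flat uint8 column, like the output of one big fread

% Single pass over memory: only num_bt is decoded, to find each record start.
n_rec  = numel(counts);
start  = zeros(n_rec, 1);   % 1-based index of the first byte of each record
num_bt = zeros(n_rec, 1);
pos = 1;
for k = 1:n_rec
    start(k)  = pos;
    num_bt(k) = double(typecast(buf(pos + 17 : pos + 18), 'uint16'));
    pos = pos + 19 + 28 * num_bt(k);   % 19 fixed bytes + variable part
end

% With the starts known, a fixed-offset field is gathered for all records at once.
id_idx = bsxfun(@plus, start.', (0:3).');   % 4-by-n_rec byte indices
grid_point_id = typecast(buf(id_idx(:)), 'uint32');
```

The loop still runs once per record, but it works on an in-memory array and decodes a single field, so the per-record fread calls are avoided.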

Bruno
Reply Bruno 1/21/2011 6:50:05 PM

"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <ihckgt$b3c$1@fred.mathworks.com>...
> "Fernando" wrote in message <ihcce4$eu3$1@fred.mathworks.com>...
> 
> > Any idea is more than welcome.
> 
> Again, the best way is to read the *big* block of *binary* data at once, and use TYPECAST to convert to correct type. This must be fast. The pain is to compute where is the data positions of various fields.
> 
> Bruno

Not only a pain, but may not even be possible (at least directly). Consider this excerpt:

             :
    MIR1C.List_of_Snapshot_Informations(dsr_counter).('Z_Velocity')= fread(id, 1, 'float64');
    MIR1C.List_of_Snapshot_Informations(dsr_counter).('Vector_Source')= fread(id, 1, 'uchar');
    MIR1C.List_of_Snapshot_Informations(dsr_counter).('Q0')= fread(id, 1, 'float64');
             :

Apparently there is an 8-byte float, followed by a 1-byte char, followed by another 8-byte float. If you read that directly into memory the 8-byte floats will not both be aligned on 8-byte boundaries. Trying to access the values directly in memory as 8-byte floats would not be guaranteed to work on the processor (i.e., the equivalent of a type-pun using an invalid address for an 8-byte float), so one may be forced to copy the appropriate data to new memory that is properly aligned for the data type involved before working with it.  Then throw in the variable length of the "records", and it's a mess ...

James Tursa
Reply James 1/21/2011 7:40:08 PM

> Apparently there is an 8-byte float, followed by a 1-byte char, followed by another 8-byte float. If you read that directly into memory the 8-byte floats will not both be aligned on 8-byte boundaries. 

I disagree. If you read the file into a large UINT8 array, the file data is mapped into memory as it is, and no alignment shift occurs. Afterwards the program only needs to extract the right indices and then cast. I have done that a few times with complicated files of up to 2 GB, and it works.
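
A small sketch of this on the float64 / uchar / float64 excerpt discussed above, treated here as a stand-alone 17-byte record. The record count and values are invented, and native byte order is assumed.

```matlab
n = 4;                                   % number of records (assumed)
z_velocity    = [1.5 2.5 3.5 4.5];
vector_source = uint8([1 2 3 4]);
q0            = [0.1 0.2 0.3 0.4];

% Pack the records byte by byte, as they would sit in the file.
buf = zeros(17, n, 'uint8');
for k = 1:n
    buf(:, k) = [typecast(z_velocity(k), 'uint8'), vector_source(k), ...
                 typecast(q0(k), 'uint8')].';
end
buf = buf(:);                            % flat uint8 column, like fread output

% Byte indices of each field for every record (17-byte stride).
base   = (0:n-1) * 17;                   % 0-based start of each record
zv_idx = bsxfun(@plus, base, (1:8).');   % bytes 1..8 of each record
q0_idx = bsxfun(@plus, base, (10:17).'); % bytes 10..17, not 8-byte aligned

% Indexing copies the bytes into a new array; typecast then reinterprets it.
zv_out = typecast(buf(zv_idx(:)), 'double');
vs_out = buf(base + 9);
q0_out = typecast(buf(q0_idx(:)), 'double');
```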

Bruno
Reply Bruno 1/21/2011 7:52:04 PM

"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <ihco54$cub$1@fred.mathworks.com>...
> 
> > Apparently there is an 8-byte float, followed by a 1-byte char, followed by another 8-byte float. If you read that directly into memory the 8-byte floats will not both be aligned on 8-byte boundaries. 
> 
> I disagree. If you read in large array of UINT8, the file data are map as it is on memory, there is no alignment shift occurs. Afterward the program only needs to extract the right index then cast.

Depends on what you mean by "extract the right index". If you mean that the data is copied from some original address to new memory ala memcpy or the like and then cast to the desired type, then yes it will work because the new memory is going to be aligned properly for the data type. But if you mean that you simply grab the original address and then try to cast *that* memory into your desired type (e.g., float or double) then no, that is *not* guaranteed to work. The processor is not required to handle misaligned floating point data properly and the behavior is not defined if you try to do it. It *may* work on your particular processor, but it may also give the wrong answer or even bomb on other processors.

James Tursa
Reply James 1/21/2011 10:05:21 PM

> 
> Depends on what you mean by "extract the right index". If you mean that the data is copied from some original address to new memory ala memcpy or the like and then cast to the desired type, then yes it will work because the new memory is going to be aligned properly for the data type. But if you mean that you simply grab the original address and then try to cast *that* memory into your desired type (e.g., float or double) then no, that is *not* guaranteed to work. The processor is not required to handle misaligned floating point data properly and the behavior is not defined if you try to do it. It *may* work on your particular processor, but it may also give the wrong answer or even bomb on other processors.

What do you mean by "grab the original address"? I am not suggesting any low-level manipulation, just straightforward use of Matlab:

 fid=fopen('Myfile.bin','r')

Buffer = fread(fid,Inf,'uchar=>uchar');
mydata = typecast(Buffer([1:8 11:18 21:28]),'double')
....
fclose(fid);

What is wrong with that?

Bruno
Reply Bruno 1/21/2011 10:25:23 PM

"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <ihd14j$4qe$1@fred.mathworks.com>...
> 
> > 
> > Depends on what you mean by "extract the right index". If you mean that the data is copied from some original address to new memory ala memcpy or the like and then cast to the desired type, then yes it will work because the new memory is going to be aligned properly for the data type. But if you mean that you simply grab the original address and then try to cast *that* memory into your desired type (e.g., float or double) then no, that is *not* guaranteed to work. The processor is not required to handle misaligned floating point data properly and the behavior is not defined if you try to do it. It *may* work on your particular processor, but it may also give the wrong answer or even bomb on other processors.
> 
> What do you mean by "grab the original address"? I'm not suggesting any low-level manipulation, just straightforward use of MATLAB:
> 
>  fid=fopen('Myfile.bin','r')
> 
> Buffer = fread(fid,Inf,'uchar=>uchar');
> mydata = typecast(Buffer([1:8 11:18 21:28]),'double')
> ...
> fclose(fid);
> 
> What is wrong with that?

Nothing, because you copied the original data from potentially misaligned (for the double type) memory to properly aligned memory when you did the slice Buffer([1:8 11:18 21:28]). My caution was against any type of low level use that tried to avoid this copy. (e.g., using something like your pointer class and then accessing that memory directly as a double floating point value).

James Tursa
Reply James 1/22/2011 12:39:05 AM

To avoid any confusion for others who read this post: reading a big binary block, extracting the binary data by proper indexing, and then using TYPECAST to convert the data should work. There is no risk of misalignment, regardless of the type of processor the program is running on.
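
As a minimal sketch of that sequence, assuming a made-up 17-byte layout (8-byte double, 1-byte char, 8-byte double) rather than the real file format, and building the byte buffer in memory instead of reading it with fread:

```matlab
% Hypothetical record layout (an assumption for this sketch):
% 8-byte double, 1-byte char, 8-byte double = 17 bytes in total.
a = pi;
c = uint8('x');
b = -2.5;

% Stand-in for: buffer = fread(fid, Inf, 'uchar=>uchar');
buffer = [typecast(a, 'uint8'), c, typecast(b, 'uint8')]';   % 17x1 uint8

% Indexing copies the selected bytes into a new array;
% typecast then reinterprets that copy as the requested type.
first  = typecast(buffer(1:8),   'double');
tag    = char(buffer(9));
second = typecast(buffer(10:17), 'double');
```

The second double starts at byte 10, which is not a multiple of 8 in the file, but that does not matter here because the slice buffer(10:17) is a fresh array.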

Bruno
Reply Bruno 1/22/2011 8:19:04 AM

Hi Bruno, James,

I have been following your suggestions.

First I read all the data in the memory using:
 -------
buffer = fread(id,Inf,'uchar=>uchar');
-----

This is a very fast command, as you have already pointed out. It reads 300 MB in less than 3 sec.

Then it is time for the typecasting. I have developed the following code for the typecasting of the problematic part of the file. This part is a list of point_counter (about 100000) elements, where each element has a variable size determined by its corresponding num_bt counter. If I skip the variable part of the elements, as in the following code, I can decode the rest of the data in about 20 sec.

=====================
grid_point_id = zeros(mir.point_counter,1);
grid_point_latitude = zeros(mir.point_counter,1);
grid_point_longitude = zeros(mir.point_counter,1);
grid_point_altitude = zeros(mir.point_counter,1);
grid_point_mask = zeros(mir.point_counter,1);

pos =double(0);

for k=1:mir.point_counter   
    grid_point_id(k) = typecast(buffer(pos+1:pos+4),'uint32');
    grid_point_latitude(k) = typecast(buffer(pos+5:pos+8),'single');
    grid_point_longitude(k) = typecast(buffer(pos+9:pos+12),'single');
    grid_point_altitude(k) = typecast(buffer(pos+13:pos+16),'single');
    grid_point_mask(k) = buffer(pos+17);
    num_bt = double(typecast(buffer(pos+18:pos+19),'uint16'));
    pos = 28 * num_bt + pos + 19;
end
==============

However, when I try to decode the variable part of each element, the script takes more than 20 minutes.

I think that the problem is having a loop inside another loop (see code below). I have checked the average number of iterations of the num_bt loop and it is 140. What do you think? Do you know any solution to improve the speed?

Thanks in advance

=========
buffer = mir.buffer;

grid_point_id = zeros(mir.point_counter,1);
grid_point_latitude = zeros(mir.point_counter,1);
grid_point_longitude = zeros(mir.point_counter,1);
grid_point_altitude = zeros(mir.point_counter,1);
grid_point_mask = zeros(mir.point_counter,1);
grid_data = zeros(mir.point_counter,1);
pos =double(0);
t = tic;
for k=1:mir.point_counter   
    grid_point_id(k) = typecast(buffer(pos+1:pos+4),'uint32');
    grid_point_latitude(k) = typecast(buffer(pos+5:pos+8),'single');
    grid_point_longitude(k) = typecast(buffer(pos+9:pos+12),'single');
    grid_point_altitude(k) = typecast(buffer(pos+13:pos+16),'single');
    grid_point_mask(k) = buffer(pos+17);
    num_bt = double(typecast(buffer(pos+18:pos+19),'uint16'));
    
    pos = pos +19;
    
    tmp.flags = zeros(num_bt,1);
    tmp.bt = zeros(num_bt,2);
    tmp.info = zeros(num_bt,5);
    tmp.snapshot_ID = zeros(num_bt,1);
    tmp.footprint = zeros(num_bt,2);
    
    for m = 1:num_bt
        tmp.flags(m) = typecast(buffer(pos+1:pos+2),'uint16');
        tmp.bt(m,:) = typecast([buffer(pos+3:pos+6)' buffer(pos+7:pos+10)'],'single');
        tmp.info(m,:) = typecast([buffer(pos+11:pos+12)' buffer(pos+13:pos+14)' buffer(pos+15:pos+16)' buffer(pos+17:pos+18)' buffer(pos+19:pos+20)'],'uint16');
        tmp.snapshot_ID(m) = typecast(buffer(pos+21:pos+24),'uint32');
        tmp.footprint(m,:) = typecast([buffer(pos+25:pos+26)' buffer(pos+27:pos+28)'],'uint16');
        pos = 28 + pos;
    end
        
end
t2=toc(t);  


==========


 
Reply Fernando 1/22/2011 12:36:04 PM

"Fernando" wrote in message <iheivk$ar9$1@fred.mathworks.com>...
>     
>     for m = 1:num_bt
>         tmp.flags(m) = typecast(buffer(pos+1:pos+2),'uint16');
>         tmp.bt(m,:) = typecast([buffer(pos+3:pos+6)' buffer(pos+7:pos+10)'],'single');
>         tmp.info(m,:) = typecast([buffer(pos+11:pos+12)' buffer(pos+13:pos+14)' buffer(pos+15:pos+16)' buffer(pos+17:pos+18)' buffer(pos+19:pos+20)'],'uint16');
>         tmp.snapshot_ID(m) = typecast(buffer(pos+21:pos+24),'uint32');
>         tmp.footprint(m,:) = typecast([buffer(pos+25:pos+26)' buffer(pos+27:pos+28)'],'uint16');
>         pos = 28 + pos;
>     end
>         
> end

I strongly recommend avoiding calls to TYPECAST inside a for loop. The command should be used on big arrays.

For example, for FLAGS do something like this:

offsetrec = (0:num_bt-1)*28;
first = 1+offsetrec;
last = 2+offsetrec;
iflags = mcolon(first,last); % FEX
flags = typecast(buffer(iflags),'uint16');
% etc...

Where mcolon() can be found on FEX:
http://www.mathworks.com/matlabcentral/fileexchange/29854-multiple-colon
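
If installing mcolon is not an option, the same indices can probably be built with plain bsxfun when every field has a fixed width. This is a swapped-in technique, not what mcolon does internally, and the synthetic buffer and flag values below are made up for the sketch:

```matlab
% Synthetic data (assumption): 3 records of 28 bytes each,
% with a uint16 flag stored in bytes 1-2 of every record.
rec_len  = 28;
num_bt   = 3;
flags_in = uint16([7 300 65535]);
buffer   = zeros(rec_len*num_bt, 1, 'uint8');
for m = 1:num_bt
    buffer((m-1)*rec_len + (1:2)) = typecast(flags_in(m), 'uint8');
end

% Start offset of each record, as in the snippet above.
offsetrec = (0:num_bt-1) * rec_len;             % 1 x num_bt

% One column of byte indices per record: rows are bytes 1 and 2.
iflags = bsxfun(@plus, (1:2)', offsetrec);      % 2 x num_bt

% A single typecast call for all records.
flags = typecast(buffer(iflags(:)), 'uint16');  % num_bt x 1 uint16
```

For a 4-byte field the only change is the byte range, e.g. (3:6)' instead of (1:2)'.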

Again: if you insist on putting your data in an inefficient array of structures (rather than a structure of arrays), you'll spend the time here and elsewhere. Believe me, if you don't organize the data structure differently you might regret it later.
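
To illustrate the difference, here is a small sketch with made-up field values (only the field names flags and snapshot_ID are borrowed from the code in this thread):

```matlab
n = 5;

% Array of structures: one struct element per record, filled in a loop.
aos = struct('flags', cell(n,1), 'snapshot_ID', cell(n,1));
for m = 1:n
    aos(m).flags       = uint16(m);
    aos(m).snapshot_ID = uint32(100 + m);
end

% Structure of arrays: one plain numeric array per field.
soa.flags       = uint16(1:n)';
soa.snapshot_ID = uint32(100 + (1:n))';

% With a structure of arrays, a query over all records is a single
% vectorized expression and needs no loop over struct elements.
sel = soa.snapshot_ID(soa.flags > 2);
```

The structure of arrays also matches what a single typecast on a big index array returns, so the parsed result can be stored directly.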

Bruno
Reply Bruno 1/22/2011 1:59:04 PM

Thanks Bruno. With your proposed code and the mcolon function, I could remove the inner loop and reduce the typecasting to 2 min.

Regarding not using an array of structures, I fully agree with you. Actually, the tmp structure is not an array of structures, since there is no index on tmp. It is just a placeholder for something else, since I do not know how to preallocate memory in advance for an array that stores all the "flags", for instance. The number of elements in the flags array would be derived from point_counter and the num_bts.

The problem is that all the num_bts are known only at the end of the parsing of the buffer.
Reply Fernando 1/22/2011 4:05:05 PM

"Fernando" wrote in message <ihev7h$itt$1@fred.mathworks.com>...
> Thanks Bruno. With your proposed code and the mcolon function, I could remove the inner loop and reduce the typecasting to 2 min.
> 
> Regarding not using an array of structures, I fully agree with you. Actually, the tmp structure is not an array of structures, since there is no index on tmp. It is just a placeholder for something else, since I do not know how to preallocate memory in advance for an array that stores all the "flags", for instance. The number of elements in the flags array would be derived from point_counter and the num_bts.
> 
> The problem is that all the num_bts are known only at the end of the parsing of the buffer.

Do two-pass parsing: the first pass only retrieves the num_bts by chain jumping through the buffer. Once all num_bts are known, parse the rest in the second pass.

Bruno
Reply Bruno 1/22/2011 4:19:04 PM

"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <ihf01o$6nr$1@fred.mathworks.com>...

> 
> Do two-pass parsing: the first pass only retrieves the num_bts by chain jumping through the buffer. Once all num_bts are known, parse the rest in the second pass.
> 

Note that you only need a FOR-LOOP for the first pass, to get the num_bts; it will take about a second.

Then do the rest of the parsing *without* a nested for-loop (in MATLAB language this is called "vectorizing" the parsing).
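
A rough sketch of the first pass, assuming the layout from Fernando's code (a 19-byte header with num_bt as a uint16 in bytes 18-19, followed by num_bt records of 28 bytes). The synthetic buffer and the names hdr_len, rec_len and point_start are made up for the example:

```matlab
hdr_len = 19;
rec_len = 28;

% Build a synthetic buffer with three grid points (assumed counts).
num_bt_in = [2 0 3];
buffer = zeros(0, 1, 'uint8');
for k = 1:numel(num_bt_in)
    hdr = zeros(hdr_len, 1, 'uint8');
    hdr(18:19) = typecast(uint16(num_bt_in(k)), 'uint8');
    buffer = [buffer; hdr; zeros(rec_len*num_bt_in(k), 1, 'uint8')];
end

% First pass: chain jump from header to header, reading only num_bt.
point_counter = numel(num_bt_in);
grid_num_bt = zeros(1, point_counter);
pos = 0;
for k = 1:point_counter
    grid_num_bt(k) = double(typecast(buffer(pos+18:pos+19), 'uint16'));
    pos = pos + hdr_len + rec_len*grid_num_bt(k);
end

% Everything the second pass needs is now known up front:
% the byte offset of every grid point and the total record count.
point_start = [0 cumsum(hdr_len + rec_len*grid_num_bt)];
total_bt    = sum(grid_num_bt);
flags       = zeros(total_bt, 1, 'uint16');   % preallocation is now possible
```

After the loop, pos should equal the number of bytes consumed, which is a cheap sanity check that the layout assumptions hold.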

Bruno
Reply Bruno 1/22/2011 4:34:03 PM

"Fernando" wrote in message <iheivk$ar9$1@fred.mathworks.com>...
> Hi Bruno, James,
> 
> I have been following your suggestions.
> 
> First I read all the data in the memory using:
>  -------
> buffer = fread(id,Inf,'uchar=>uchar');
> -----
> 
> This is a very fast command, as you have already pointed out. It reads 300 MB in less than 3 sec.
> 
> Then it is time for the typecasting. I have developed the following code for the typecasting of the problematic part of the file. This part is a list of point_counter (about 100000) elements, where each element has a variable size determined by its corresponding num_bt counter. If I skip the variable part of the elements, as in the following code, I can decode the rest of the data in about 20 sec.
> 
> =====================
> grid_point_id = zeros(mir.point_counter,1);
> grid_point_latitude = zeros(mir.point_counter,1);
> grid_point_longitude = zeros(mir.point_counter,1);
> grid_point_altitude = zeros(mir.point_counter,1);
> grid_point_mask = zeros(mir.point_counter,1);
> 
> pos =double(0);
> 
> for k=1:mir.point_counter   
>     grid_point_id(k) = typecast(buffer(pos+1:pos+4),'uint32');
>     grid_point_latitude(k) = typecast(buffer(pos+5:pos+8),'single');
>     grid_point_longitude(k) = typecast(buffer(pos+9:pos+12),'single');
>     grid_point_altitude(k) = typecast(buffer(pos+13:pos+16),'single');
>     grid_point_mask(k) = buffer(pos+17);
>     num_bt = double(typecast(buffer(pos+18:pos+19),'uint16'));
>     pos = 28 * num_bt + pos + 19;
> end

FYI, the MATLAB built-in typecast function does a deep data copy. This slows things down a bit. If you want to speed up this part of the code you might consider the version of typecast that I wrote and submitted to the FEX that does a shared data copy instead of a deep data copy. You can find it here:

http://www.mathworks.com/matlabcentral/fileexchange/17476-typecast-c-mex-function

James Tursa
Reply James 1/22/2011 5:50:04 PM

Thank you for your comments!

It is true that parsing only num_bt is very fast: only 4 sec.

The loop used for the vectorized parsing now takes two minutes. It is a big improvement, indeed!

But my goal is less than 1 minute (actually my real goal is to parse those data in 10-20 sec). Perhaps I should try to apply this approach in a MEX file.
Reply Fernando 1/22/2011 5:57:03 PM

"Fernando" wrote in message <ihf5pf$e4m$1@fred.mathworks.com>...

> 
> But my goal is less than 1 minute (actually my real goal is to parse those data in 10-20 sec). Perhaps, I should try to apply this approach in a mex file.

10-20 sec is a reasonable goal; I'm sure it is achievable. Please post your code if you want further suggestions.

Bruno
Reply Bruno 1/22/2011 7:22:04 PM

Hi,

I have just followed your comments.

To simplify the exposition of the problem, I have reduced the code to the following 2 loops:

for k=1:mir.point_counter   
    grid_num_bt(k) = double(typecast(buffer(pos+18:pos+19),'uint16'));  
    ref_pos = pos +19;
    pos = 28 * grid_num_bt(k) + ref_pos;        
end

The loop above takes 4 sec to execute. In the loop that would parse the buffer, I have now skipped the typecasting, so the time needed by typecast is not counted. Even so, the following loop requires almost 2 min. Please also note that I am not storing the data, I am just reading it. Therefore, I am not using indexes for the variables flags, bt_real...

Do you think that it can be improved?

Thanks

pos =0;
for k=1:mir.point_counter   
     ref_pos = pos +19;

    offsetrec = (0:grid_num_bt(k)-1)*28 +ref_pos;
    first = 1 + offsetrec;
    last = 2 + offsetrec;
    iflags = mcolon(first,last);
    flags = buffer(iflags);
    first = 3 + offsetrec;
    last = 6 + offsetrec;
    ibt_real = mcolon(first,last);
    bt_real = buffer(ibt_real);
    first = 7 + offsetrec;
    last = 10 + offsetrec;
    ibt_imag = mcolon(first,last);
    bt_imag = buffer(ibt_imag);
    first = 11 + offsetrec;
    last = 20 + offsetrec;
    iinfo = mcolon(first,last);
    info = buffer(iinfo);
    first = 21 + offsetrec;
    last = 24 + offsetrec;
    isnapshot_ID = mcolon(first,last);
    snapshot_ID = buffer(isnapshot_ID);
    first = 25 + offsetrec;
    last = 28 + offsetrec;
    ifootprint = mcolon(first,last);
    footprint = buffer(ifootprint);
    pos = 28 * grid_num_bt(k) + ref_pos;
        
end

Reply Fernando 1/22/2011 7:53:04 PM

No, don't use a LOOP; just build the indexes and typecast once:

post = [0 cumsum(28*grid_num_bt+19)];
ref_pos = postab(1:end-1)+19;

% Use colon operator, don't forget to run MEX -setup
% and call mcolon_install() before using it
mcolonops('f');

first = ref_pos;
last = first + (grid_num_bt-1)*28;
offsetrec = first:28:last; % this kind of colon indexing is possible after mcolonops('f')
% typecast here

first = 1 + offsetrec;
last = 2 + offsetrec;
flags = buffer(first:last);
% typecast here

first = 3 + offsetrec;
last = 6 + offsetrec;
bt_real = buffer(first:last);
% typecast here

first = 7 + offsetrec;
last = 10 + offsetrec;
bt_imag = buffer(first:last);
% typecast here

first = 11 + offsetrec;
last = 20 + offsetrec;
info = buffer(first:last);
% typecast here

first = 21 + offsetrec;
last = 24 + offsetrec;
snapshot_ID = buffer(first:last);
% typecast here

first = 25 + offsetrec;
last = 28 + offsetrec;
footprint = buffer(first:last);
% typecast here

% Split the arrays in chunks using values stored in grid_num_bt(...)
% ...

% restore colon mode
mcolonops('c');

% Bruno
Reply Bruno 1/22/2011 8:30:05 PM

Sorry for the typo, the first two commands should be:

pos = [0 cumsum(28*grid_num_bt+19)];
ref_pos = pos(1:end-1)+19;

Bruno
Reply Bruno 1/22/2011 8:43:03 PM

Sorry, it seems the MATLAB JIT accelerator/parser takes over mcolon. Let's start again with longer code, but it should work now:

%%
grid_num_bt = reshape(grid_num_bt, 1, []);

pos = [0 cumsum(28*grid_num_bt+19)];
ref_pos = pos(1:end-1)+19;

% Use mcolon operator, don't forget to run MEX -setup
% and call mcolon_install() before using it
mcolonops('f');

first = ref_pos;
last = first + (grid_num_bt-1)*28;
offsetrec = first:28:last;

first = 1 + offsetrec;
last = 2 + offsetrec;
iflags = first:last;
flags = buffer(iflags);
% typecast here

first = 3 + offsetrec;
last = 6 + offsetrec;
ibt_real = first:last;
bt_real = buffer(ibt_real);
% typecast here

first = 7 + offsetrec;
last = 10 + offsetrec;
ibt_imag = first:last;
bt_imag = buffer(ibt_imag);
% typecast here

first = 11 + offsetrec;
last = 20 + offsetrec;
iinfo = first:last;
info = buffer(iinfo);
% typecast here

first = 21 + offsetrec;
last = 24 + offsetrec;
isnapshot_ID = first:last;
snapshot_ID = buffer(isnapshot_ID);
% typecast here

first = 25 + offsetrec;
last = 28 + offsetrec;
ifootprint = first:last;
footprint = buffer(ifootprint);
% typecast here

% Split the arrays in chunks using size values stored in grid_num_bt(...)
% ...

% restore colon mode
mcolonops('c');

% Bruno
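
For readers without the mcolon package, a hedged sketch of the same vectorized idea using only built-in functions is given below. It is not the code above: it swaps the overloaded colon for repelem (assumed available, R2015a or later) and bsxfun, shows only the flags field, and runs on a small synthetic buffer whose counts and flag values are made up. The final step uses mat2cell for the "split in chunks" part.

```matlab
hdr_len = 19;
rec_len = 28;
grid_num_bt = [2 0 3];                    % assumed result of the first pass
flags_in = uint16([11 12 31 32 33]);      % made-up flag values

% Build a synthetic buffer: header, then num_bt records per grid point.
buffer = zeros(0, 1, 'uint8');
r = 0;
for k = 1:numel(grid_num_bt)
    hdr = zeros(hdr_len, 1, 'uint8');
    hdr(18:19) = typecast(uint16(grid_num_bt(k)), 'uint8');
    buffer = [buffer; hdr];
    for m = 1:grid_num_bt(k)
        r = r + 1;
        rec = zeros(rec_len, 1, 'uint8');
        rec(1:2) = typecast(flags_in(r), 'uint8');
        buffer = [buffer; rec];
    end
end

% Offset of the first record of every grid point.
pos     = [0 cumsum(rec_len*grid_num_bt + hdr_len)];
ref_pos = pos(1:end-1) + hdr_len;

% For every record: owning grid point and 0-based index within it.
total     = sum(grid_num_bt);
owner     = repelem(1:numel(grid_num_bt), grid_num_bt);
first_rec = cumsum([0 grid_num_bt(1:end-1)]);
within    = (0:total-1) - first_rec(owner);
offsetrec = ref_pos(owner) + rec_len*within;        % 1 x total

% Byte indices of the 2-byte flags field, then one typecast for all.
iflags = bsxfun(@plus, (1:2)', offsetrec);
flags  = typecast(buffer(iflags(:)), 'uint16');     % total x 1

% Split into one chunk per grid point using grid_num_bt.
flags_by_point = mat2cell(flags, grid_num_bt(:), 1);
```

The other fields follow the same pattern with their own byte ranges, e.g. (3:6)' for bt_real.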
Reply Bruno 1/22/2011 9:06:03 PM

Hi Bruno,

Many thanks! Your proposal reduces the parsing to less than 20 sec.

Thanks.
Reply Fernando 1/22/2011 11:33:03 PM

Just thought I'd chime in to say thanks for this exhaustive thread. It allowed me to dramatically speed up my own parsing routine, which was similar to Fernando's in that each message was of arbitrary length, to be determined by reading the header.
Reply k3nn.s3b3sta (3) 8/15/2012 10:30:22 PM
