Class Design for ASCII Input Tables

I have been tasked with porting an old scientific program to C++.  I am
new to C++ and OOP and could use some help in designing a set of classes
to handle the input tables.  The program data is provided via a set of
about 100 ASCII input data tables (not to be confused with database
tables). The program can handle multiple versions of each table (the
exact number of each table is provided).  Each table has data that has no
relationship to any other table.  Each table has a fixed maximum number
of independent variables (1, 2, or 3), although many tables can be
provided at less than the maximum. For example, a 3-D table can be input
as a 2-D table if one parameter is not available. The format of each
table is similar, but there will be a need to handle special situations.
The class(es) will have methods to allocate the array of structures to
hold the number of each table, a method to allocate the array to hold the
data of a specific table (one of the structure members), methods to read
and write the tables, and finally a method for table lookup.

How many classes do you think are required to handle the input?  Any
other thoughts or suggestions on the class design would really be
appreciated….
sgihal56
12/18/2016 8:12:11 PM
comp.lang.c++

On 18.12.2016 21:12, sgihal56@gmail.com wrote:
> I been tasked to port an old scientific program to C++.

Then it matters what language it was written in originally.


> I am new to C++ and OOP programming and could use some help in
> designing a set of classes to handle the input tables.

The C++ standard library lacks a 2D array class, and you need that.

There's one in the Boost library, but as I recall it's designed for speed
rather than safety.

Oh look, there's a simple one in the SO C++ documentation!

<url: 
http://stackoverflow.com/documentation/c%2b%2b/3017/arrays/10246/a-dynamic-size-matrix-using-stdvector-for-storage#t=201612182117279263232>

[code]
// A dynamic size matrix using std::vector for storage.

//--------------------------------------------- Machinery:
#include <algorithm>        // std::copy
#include <assert.h>         // assert
#include <initializer_list> // std::initializer_list
#include <vector>           // std::vector
#include <stddef.h>         // ptrdiff_t

namespace my {
     using Size = ptrdiff_t;
     using std::initializer_list;
     using std::vector;

     template< class Item >
     class Matrix
     {
     private:
         vector<Item>    items_;
         Size            n_cols_;

         auto index_for( Size const x, Size const y ) const
             -> Size
         { return y*n_cols_ + x; }

     public:
         // Guard against division by zero for a default-constructed matrix.
         auto n_rows() const -> Size
         { return n_cols_ == 0? 0 : Size( items_.size() )/n_cols_; }
         auto n_cols() const -> Size { return n_cols_; }

         auto item( Size const x, Size const y )
             -> Item&
         { return items_[index_for(x, y)]; }

         auto item( Size const x, Size const y ) const
             -> Item const&
         { return items_[index_for(x, y)]; }

         Matrix(): n_cols_( 0 ) {}

         Matrix( Size const n_cols, Size const n_rows )
             : items_( n_cols*n_rows )
             , n_cols_( n_cols )
         {}

         Matrix( initializer_list< initializer_list<Item> > const& values )
             : items_()
             , n_cols_( values.size() == 0? 0 : values.begin()->size() )
         {
             for( auto const& row : values )
             {
                 assert( Size( row.size() ) == n_cols_ );
                 items_.insert( items_.end(), row.begin(), row.end() );
             }
         }
     };
}  // namespace my

//--------------------------------------------- Usage:
using my::Matrix;

auto some_matrix()
     -> Matrix<int>
{
     return
     {
         {  1,  2,  3,  4,  5,  6,  7 },
         {  8,  9, 10, 11, 12, 13, 14 },
         { 15, 16, 17, 18, 19, 20, 21 }
     };
}

#include <iostream>
#include <iomanip>
using namespace std;
auto main() -> int
{
     Matrix<int> const m = some_matrix();
     assert( m.n_cols() == 7 );
     assert( m.n_rows() == 3 );
     for( int y = 0, y_end = m.n_rows(); y < y_end; ++y )
     {
         for( int x = 0, x_end = m.n_cols(); x < x_end; ++x )
         {
             cout << setw( 4 ) << m.item( x, y );        // ← Note: not `m[y][x]`!
         }
         cout << '\n';
     }
}
[/code]



> The program data is provided via a set of about 100 ASCII input data
> tables (not to be confused with database tables). The program can
> handle multiple versions of each table (the exact number of each
> table is provided).  Each table has data that has no relationship to
> any other table.  Each table has a fixed maximum number of
> independent variables (1, 2, or 3) although many tables can be
> provided at less than the maximum. For example, a 3-D table can be
> input as a 2-D table if one parameter is not available. The format of
> each table is similar but there will be a need to handle special
> situations.  The class(s) will have methods to allocate the array of
> structures to hold the number of each table, method to allocate the
> array to hold the data of a specific table (one of the structure
> members), method to read and write the tables and final a method for
> table lookup.
>
> How many classes do you think is required to handle the input?

One.


> Any other thoughts or suggestions on the class design would really be
> appreciated….

For ordinary main processor execution it's a good idea to use 
`std::vector` to handle the memory management for you.

Depending on what you're doing you might find `std::valarray` useful. 
It's a 1990's design for doing operations possibly in parallel on all 
items of an array or arrays. However, I have yet to see it used and I 
have not used it myself (it's a bit complex here and there), and I don't 
think that's the way to go if you want to leverage GPU processing.

I guess some others will chime in with suggestions about what to use for 
GPU processing, which you probably will want to use.


Cheers & hth.,

- Alf
Alf
12/18/2016 9:24:44 PM
On Sun, 18 Dec 2016 12:12:11 -0800 (PST), sgihal56@gmail.com wrote:

>I been tasked to port an old scientific program to C++.  I am new to
>C++ and OOP programming and could use some help in designing a set of
>classes to handle the input tables.  The program data is provided via a
>set of about 100 ASCII input data tables (not to be confused with
>database tables). The program can handle multiple versions of each table
>(the exact number of each table is provided).  Each table has data that
>has no relationship to any other table.  Each table has a fixed maximum
>number of independent variables (1, 2, or 3) although many tables can be
>provided at less than the maximum. For example, a 3-D table can be input
>as a 2-D table if one parameter is not available. The format of each
>table is similar but there will be a need to handle special situations.
>The class(s) will have methods to allocate the array of structures to
>hold the number of each table, method to allocate the array to hold the
>data of a specific table (one of the structure members), method to read
>and write the tables and final a method for table lookup.
>
>How many classes do you think is required to handle the input?  Any
>other thoughts or suggestions on the class design would really be
>appreciated….

Can you give an example of what you're talking about? If you were
going to write a detailed statement of the problem to be solved, what
would that look like?

Is the old program still running? What language does it use? Can you
post the source or, if it's too long, a link to the source?

Louis
Louis
12/18/2016 9:24:46 PM
The original program was written in Fortran 66 and is still operational.
Management has directed us to start migrating our old codes to C++. The
source code is proprietary (sorry). All tables are stored in single
dimensional arrays (common for old code) so a 2D array is not necessary.
There are separate table lookup functions for single, double and triple
independent variables.

Alf...how would I design a single class to handle 1, 2, and 3 independent
variables? Would the class contain code specific to each number of
independent variables?  Would I pass the table number (used to identify
the tables) so the class can determine how many independent variables are
required?

Thanks again for the help.
sgihal56
12/18/2016 10:13:30 PM
On 18.12.2016 22:12, sgihal56@gmail.com wrote:
> I been tasked to port an old scientific program to C++.  I am new to
> C++ and OOP programming and could use some help in designing a set of
> classes to handle the input tables.  The program data is provided via
> a set of about 100 ASCII input data tables (not to be confused with
> database tables). The program can handle multiple versions of each
> table (the exact number of each table is provided).  Each table has
> data that has no relationship to any other table.  Each table has a
> fixed maximum number of independent variables (1, 2, or 3) although
> many tables can be provided at less than the maximum. For example, a
> 3-D table can be input as a 2-D table if one parameter is not
> available. The format of each table is similar but there will be a
> need to handle special situations.  The class(s) will have methods to
> allocate the array of structures to hold the number of each table,
> method to allocate the array to hold the data of a specific table
> (one of the structure members), method to read and write the tables
> and final a method for table lookup.
>
> How many classes do you think is required to handle the input?  Any
> other thoughts or suggestions on the class design would really be
> appreciated….
>

The first question is why this program needs to be ported to C++. If it
is just to avoid dealing with multiple compilers, then find and
use an automatic converter from the original language to C, then rename
the resulting files from .c to .cpp, and voila, you are done!

I see another misconception in your post: apparently you think that C++
and OOP are somehow synonymous. They are not. C++ does support OOP,
but it supports many other methodologies too. A couple of classes might
well appear in a decent port of an old library, but it is not a given
that classes would play a central role there.

Paavo
12/18/2016 10:17:08 PM
Paavo,

Converters tend to generate horrible code.  In addition to porting the code to C++, we also plan a major update to the program's overall architecture to make it much easier to maintain and modify.

What would be nice is to see a UML diagram of what a table class would contain. At this point I am just looking for a top-level class overview to see how it might work....Thanks


sgihal56
12/18/2016 10:29:36 PM
On Sun, 18 Dec 2016 14:13:30 -0800 (PST), sgihal56@gmail.com wrote:

>The original program was written in Fortran 66 and is still operational.
>Management has directed us to start migrating our old codes to C++. The
>source code is proprietary (sorry). All tables are stored in single
>dimensional arrays (common for old code) so a 2D array is not necessary.
>There are separate table lookup functions for single, double and triple
>independent variables.
>
>Alf...how would I design a single class to handle 1, 2, and 3
>independent variables? Would the class contain code specific to each
>number of independent variables?  Would I pass the table number (used to
>identify the tables) so the class can determine how many independent
>variables are required?
>
>Thanks again for the help.

Are you talking about table lookup? Is a table with three independent
variables a way to represent a function f(x, y, z), with each row of
the table containing values for x, y, z and f, and do you want to
search the table for x, y and z and then return f?

Or do you need, say, linear interpolation, where the table gives you
f(1, 2, 3) and f(2, 2, 3) and you need to compute f(1.5, 2, 3)?
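
For the interpolation case, a minimal sketch of a 1-D lookup might look
like this. The name `interpolate_1d`, the parallel-vector layout, and the
clamping at both ends of the table are assumptions made for illustration;
the original Fortran routines may well behave differently (for example,
they might extrapolate).

```cpp
#include <algorithm>  // std::upper_bound
#include <cassert>    // assert
#include <cstddef>    // std::size_t
#include <vector>     // std::vector

// Hypothetical helper: linear interpolation in a 1-D table.
// xs must be strictly increasing and the same length as fs (at least one
// entry). Values outside the table range are clamped to the end values,
// which is an assumption, not known behavior of the original program.
double interpolate_1d(const std::vector<double>& xs,
                      const std::vector<double>& fs,
                      double x)
{
    assert(!xs.empty() && xs.size() == fs.size());
    if (x <= xs.front()) return fs.front();
    if (x >= xs.back())  return fs.back();

    // First breakpoint strictly greater than x; it exists because
    // x < xs.back(), and its index is at least 1 because x > xs.front().
    const auto hi_it = std::upper_bound(xs.begin(), xs.end(), x);
    const std::size_t hi = static_cast<std::size_t>(hi_it - xs.begin());
    const std::size_t lo = hi - 1;

    const double t = (x - xs[lo]) / (xs[hi] - xs[lo]);
    return fs[lo] + t * (fs[hi] - fs[lo]);
}
```

A 2-D or 3-D lookup can be built by applying the same step along each
axis in turn.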

Can you give any examples of what the program does? Or would that risk
revealing proprietary information?

You might try converting the program by hand to a C-like subset of
C++, getting it to work and verifying it by comparing test output to
the original, and then refactoring as needed.

Louis
Louis
12/18/2016 10:49:00 PM
On 18.12.2016 23:13, sgihal56@gmail.com wrote:
> The original program was written in Fortran 66 and is still
> operational. Management has directed us to start migrating our old
> codes to C++. The source code is proprietary (sorry). All tables are
> stored in single dimensional arrays (common for old code) so a 2D
> array is not necessary.

Oh sorry, I read “tables” as “two dimensional collections of numbers”.

The “one” class comment was for representing the input in that situation.


> There are separate table lookup functions for
> single, double and triple independent variables.
>
> Alf...how would I design a single class to handle 1, 2, and 3
> independent variables?

Now I gather we're talking processing, not representing the input.

“Handle” is a verb, and classes are generally not doers. In general a 
class collects related operations on some state. So wherever the current 
code has umpteen functions that all operate on some particular state 
passed as a first or last argument, that is a candidate for a class with 
that state as state, and those functions as member functions.

A class can simplify things by restricting access. In particular, if 
there is a general assumption about the state that the current code 
needs to check in many places because the state could have been modified 
in a bad way, then a class lets you restrict modification to only good 
ways. If the common assumption (called a class invariant) is established 
by every constructor, and is maintained by every public operation, then 
it's guaranteed and does not need to be checked.
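
As a sketch of the class-invariant idea applied to lookup tables: the
hypothetical `Breakpoints` class below checks once, in its constructor,
that the independent-variable values are non-empty and strictly
increasing, and it offers no way to modify them afterwards. The class
name and the choice of invariant are assumptions, not something taken
from the original program.

```cpp
#include <cstddef>    // std::size_t
#include <stdexcept>  // std::invalid_argument
#include <utility>    // std::move
#include <vector>     // std::vector

// Hypothetical class: holds the breakpoints of one independent variable.
// Invariant: values_ is non-empty and strictly increasing.
class Breakpoints
{
public:
    explicit Breakpoints(std::vector<double> values)
        : values_(std::move(values))
    {
        if (values_.empty())
            throw std::invalid_argument("Breakpoints: no values");
        for (std::size_t i = 1; i < values_.size(); ++i)
            if (!(values_[i - 1] < values_[i]))
                throw std::invalid_argument(
                    "Breakpoints: not strictly increasing");
    }

    // Read-only access, so the invariant cannot be broken later.
    std::size_t size() const { return values_.size(); }
    double operator[](std::size_t i) const { return values_[i]; }
    double front() const { return values_.front(); }
    double back() const { return values_.back(); }

private:
    std::vector<double> values_;
};
```

Lookup code that receives a `Breakpoints` object can then rely on the
ordering without re-checking it on every call.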

A class can also simplify things by just reducing clutter.

For example, a point p is less clutter than three individual variables 
x, y and z, and it can support clean notation for e.g. calculating 
distances, moving a point a given distance, and so on.
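
A minimal sketch of such a point type, with hypothetical names
(`Point`, `distance_between`) chosen only for illustration:

```cpp
#include <cmath>  // std::sqrt

// Hypothetical 3-D point type, for illustration only.
struct Point
{
    double x, y, z;
};

// Component-wise sum and difference.
inline Point operator+(const Point& a, const Point& b)
{
    return {a.x + b.x, a.y + b.y, a.z + b.z};
}

inline Point operator-(const Point& a, const Point& b)
{
    return {a.x - b.x, a.y - b.y, a.z - b.z};
}

// Euclidean distance between two points.
inline double distance_between(const Point& a, const Point& b)
{
    const Point d = a - b;
    return std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
}
```

With this, `distance_between(p, q)` replaces an expression over six
separate variables, and moving a point is just `p + offset`.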


> Would the class contain code specific to each
> number of independent variables?  Would I pass the table number (used
> to identify the tables) so the class can determine how many
> independent variables are required?

I think much more information is needed for me and others to get a 
sufficiently clear picture of things to make concrete recommendations.

But on the level of general advice, what I remember from my consulting 
days about what was most important and difficult to make others aware 
of, you need to think seriously and up front about ERROR HANDLING.

In modern C++ this is done in two main ways:

• Logic errors such as precondition breaches (breach of contract) are 
generally detected by assertions, using e.g. `assert` and 
`static_assert`. Essentially, where this is done one prefers a crash to 
a possibly more orderly processing, because the process state might be 
completely fouled up.

• Failures, such as failure to obtain a resource, are reported to calling
code via exceptions (the `throw` statement), and cleanup where an
exception passes through some code is done automatically by object
destructors. The C++ cleanup strategy is called RAII. Do read up on it.
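
The two mechanisms can be sketched side by side as follows. The function
names `value_at` and `load_values`, and the idea that a table file is
nothing but whitespace-separated numbers, are assumptions made for this
example only.

```cpp
#include <cassert>    // assert
#include <cstddef>    // std::size_t
#include <fstream>    // std::ifstream
#include <stdexcept>  // std::runtime_error
#include <string>     // std::string
#include <vector>     // std::vector

// Logic error: an out-of-range index is a bug in the caller, so assert.
inline double value_at(const std::vector<double>& values, std::size_t i)
{
    assert(i < values.size());  // precondition
    return values[i];
}

// Failure: a missing or unreadable file is not a bug, so throw.
// The std::ifstream destructor closes the file on every exit path,
// including when an exception is thrown (RAII).
inline std::vector<double> load_values(const std::string& path)
{
    std::ifstream in(path);
    if (!in)
        throw std::runtime_error("cannot open " + path);

    std::vector<double> values;
    double v;
    while (in >> v)
        values.push_back(v);

    // Reading stopped before end of file: something was not a number.
    if (!in.eof())
        throw std::runtime_error("bad number in " + path);
    return values;
}
```

Note that `load_values` contains no explicit close call and no cleanup
code in an error branch; the destructors do that work.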

Due to the way failures are handled, with exceptions, dynamic memory 
management is generally best delegated to standard containers such as 
`std::vector`, and/or to smart pointers such as `std::unique_ptr` and 
`std::shared_ptr`. Then the container or smart pointer owns the memory, 
and is responsible for deallocation, reallocation etc. Your code should 
have no raw pointers that /own/ memory.
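
A small sketch of that ownership rule, using a hypothetical `Table`
struct and helper names that are not from the original program:

```cpp
#include <cstddef>  // std::size_t
#include <memory>   // std::unique_ptr
#include <vector>   // std::vector

// Hypothetical table type: the vector owns the storage, so no destructor,
// copy constructor, or assignment operator has to be written by hand.
struct Table
{
    int                 id;
    std::vector<double> data;
};

// The returned unique_ptr owns the Table, which is deleted automatically
// when the pointer goes out of scope, also when an exception is thrown.
inline std::unique_ptr<Table> make_table(int id, std::size_t n_values)
{
    std::unique_ptr<Table> t(
        new Table{id, std::vector<double>(n_values, 0.0)});
    return t;
}

// A reference (or raw pointer) is fine for code that only uses the table
// and does not own it.
inline double sum(const Table& t)
{
    double total = 0.0;
    for (double v : t.data) total += v;
    return total;
}
```

In many cases a plain `Table` value or a `std::vector<Table>` is enough,
and the smart pointer is only needed when the object must outlive the
scope that created it.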


Cheers!,

- Alf
Alf
12/18/2016 10:51:39 PM
On 18.12.16 at 23:29, sgihal56@gmail.com wrote:
> Converters tend to generate horrible code.

Yes and no. Fortran 66 is so simple that it can be converted to C
almost mechanically, and f2c from netlib is not that bad. The resulting
code is typically still horrible, mostly because it was horrible before
(six-character identifiers and similar nonsense).

> In addition to porting
> the code to C++, we also plan on a major update to the program
> overall architecture to make it much easier to maintain and modify.

That makes sense.

> What would be nice is to see a UML diagram of what a table class
> would contain. At this point I am just look for a top level class
> overview to see how it might work....

Nobody can do that without knowing the program. Only you can judge what 
classes or containers are actually needed. Maybe it is possible to 
separate the real algorithm from the I/O stuff. Then try to strip down 
the original program until it only does computations from one array to 
another array. Then convert to C (can be done by a converter) and write 
the I/O stuff afresh.


	Christian

Christian
12/19/2016 2:32:50 PM
[Please do not mail me a copy of your followup]

1. Create an automated acceptance test suite around the existing system
2. Port the existing system before you add new features
3. Use TDD and the automated acceptance tests from 1) on your
   replacement system

TL;DR:

When porting legacy code from one language to another, you are
essentially talking about a complete rewrite.  Unfortunately, unless
you do this in an automated way, you are likely to introduce many bugs
along the way.

So how do we guard against such bugs, minimize them, and find them as
soon as possible?

First, start by writing an automated test suite around your current
system.  Think of these as integration tests, regression tests, or
acceptance tests, but they aren't unit tests.  Try to create tests
that surround the major subsystems of your code base.  A great tool
for expressing acceptance tests is FitNesse <http://fitnesse.org>,
which gives you a nice editable wiki as a way of expressing your tests
as tables and allows you to write any additional explanatory notes or
links to images or other files directly in the wiki pages describing
your tests.  This keeps the tests and all the other information about
the subsystem together.

This automated suite of tests can be run against the existing system
and your replacement system to identify discrepancies between the two.
Given that your original system is FORTRAN 66, it most likely operates
on input files and creates output files.  It is very easy to wrap a
test harness around this by creating the input files from the FitNesse
wiki table data, invoke the system under test, and then read the
output files for comparison against the expected results in the
FitNesse wiki table.

Second, try to change only one thing at once.  In your followup posts
you described how the desire to port the code was motivated by the
need to more easily extend the existing system with new features.
Don't try to add new features at the same time as you are trying to
capture the existing behavior.  Your first goal should be to create
new code that can replace the existing code with no change in
observable behavior.

How far "deep" you want the existing behavior preserved depends on the
organization of your existing system.  It may be useful to literally
replace the existing FORTRAN 66 subroutines with C++ functions
that are declared 'extern "C"' so that they can be linked into the
FORTRAN code.  However, it could be considerably easier to replace
things one subsystem at a time instead of one function at a time.
Perhaps your FORTRAN code consists of multiple executables which each
do a very specific thing.  Each executable can be thought of as a
subsystem to be converted.  If your FORTRAN code is a single, large,
monolithic executable, then the subsystem boundaries are going to be
inside that executable.  Look for groups of functions that work
together to identify subsystems.  It is highly unlikely that every
subroutine interacts with every other subroutine.  It is more likely
they are connected in clusters.  If you don't know the code well
enough to identify the clusters, use a source code analyzer to
identify the interactions.  FORTRAN COMMON blocks are a way that
functions are coupled that is unique to FORTRAN, so don't forget to
check for those.
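
For the function-at-a-time approach, a C++ replacement callable from
FORTRAN might be sketched like this. The routine name `TLU1` and its
argument list are hypothetical, and the linkage conventions in the
comment are assumptions that vary between compilers, so check your
compiler's documentation before relying on them.

```cpp
#include <cassert>  // assert

// Hypothetical replacement for a FORTRAN function
//     DOUBLE PRECISION FUNCTION TLU1(X, XS, FS, N)
// ASSUMPTIONS: the FORTRAN compiler lower-cases external names and
// appends one underscore, passes every argument by reference, and maps
// DOUBLE PRECISION to double and default INTEGER to int.
// XS must be strictly increasing; results are clamped outside its range.
extern "C" double tlu1_(const double* x, const double* xs,
                        const double* fs, const int* n)
{
    assert(*n >= 1);
    if (*x <= xs[0])      return fs[0];
    if (*x >= xs[*n - 1]) return fs[*n - 1];

    int hi = 1;
    while (xs[hi] < *x) ++hi;  // terminates because *x < xs[*n - 1]
    const double t = (*x - xs[hi - 1]) / (xs[hi] - xs[hi - 1]);
    return fs[hi - 1] + t * (fs[hi] - fs[hi - 1]);
}
```

The `extern "C"` linkage stops the C++ compiler from mangling the name,
so the linker can match it to the symbol the FORTRAN caller expects.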

Third, write your new code using test-driven development.  Make your
new code easy to unit test by writing the test first and then satisfy
the test with new code in your replacement module.  If you're using
FitNesse for the regression acceptance tests, have a way to run those
against your new system to know that your new system is reproducing
the behavior of the old system.  If you kept your subsystem boundaries
relatively high level (e.g. not at the level of individual FORTRAN
subroutines), then you will have more freedom in the C++ that you
write in your replacement subsystem.
-- 
"The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline>
            The Terminals Wiki <http://terminals-wiki.org>
     The Computer Graphics Museum <http://computergraphicsmuseum.org>
  Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com>
legalize
12/19/2016 7:08:10 PM
sgihal56@gmail.com writes:

> I been tasked to port an old scientific program to C++.  I am new to
> C++ and OOP programming and could use some help in designing a set of
> classes to handle the input tables.  The program data is provided via
> a set of about 100 ASCII input data tables (not to be confused with
> database tables).  The program can handle multiple versions of each
> table (the exact number of each table is provided).  Each table has
> data that has no relationship to any other table.  Each table has a
> fixed maximum number of independent variables (1, 2, or 3) although
> many tables can be provided at less than the maximum.  For example, a
> 3-D table can be input as a 2-D table if one parameter is not
> available.  The format of each table is similar but there will be a
> need to handle special situations.  The class(s) will have methods to
> allocate the array of structures to hold the number of each table,
> method to allocate the array to hold the data of a specific table
> (one of the structure members), method to read and write the tables
> and final a method for table lookup.
>
> How many classes do you think is required to handle the input?  Any
> other thoughts or suggestions on the class design would really be
> appreciated….

Assuming I understand you correctly, what you want is something
like an array, or "parallel arrays", indexed by something numeric
or close to numeric (integers?), out of which you can extract
three (or up to three) independent values for each index.  Is
this a fair statement of your problem?

If it is, I would start with something like this (disclaimer: not
tested!):

  #include <stddef.h>     // size_t
  #include <vector>

  struct xyz_table {
    struct xyz { double x, y, z; };
    std::vector <xyz> stuff;

    void add( double x ){  add( x, 0, 0 );  }
    void add( double x, double y ){  add( x, y, 0 );  }
    void add( double x, double y, double z ){  stuff.push_back( {x,y,z} );  }

    xyz& operator[]( size_t n ){  return stuff[n];  }
  };

  void
  use_xyz_table_example(){
    xyz_table foo;

    foo.add( 1, 2, 3 );
    foo.add( 2, 3, 4 );
    foo.add( 3, 4, 5 );
    foo.add( 4, 5, 6 );
    foo.add( 5, 6, 7 );
    foo.add( 6, 7, 8 );

    foo[3].x = 7;
    foo[5].z = foo[2].y;
  }

and not worry about any extra space taken up by "unused" values,
nor about flagging access to unavailable "coordinates".

Of course, if those things matter, it isn't hard to have two
more classes (disclaimer: not compiled):


  struct xy_table {
    struct xy { double x, y; };
    std::vector <xy> stuff;

    void add( double x, double y ){  stuff.push_back( {x,y} );  }

    xy& operator[]( size_t n ){  return stuff[n];  }
  };


  struct x_table {
    struct x_only { double x; };    // renamed so the type and its member differ
    std::vector <x_only> stuff;

    void add( double x ){  stuff.push_back( {x} );  }

    x_only& operator[]( size_t n ){  return stuff[n];  }
  };


and use x_table, xy_table, or xyz_table, as appropriate.

If the data type being stored is some type other than double,
that can be made (one or more) template parameters.  (And
hopefully you know enough to find out how to do that so I don't
need to explain it...)
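
For completeness, a sketch of the template version (untested against the
poster's real data; the name `xyz_table_of` and the default arguments
replacing the three `add` overloads are choices made for this example):

```cpp
#include <cstddef>  // std::size_t
#include <vector>   // std::vector

// Sketch: the stored value type becomes a template parameter.
template <class Value>
struct xyz_table_of
{
    struct xyz { Value x, y, z; };
    std::vector<xyz> stuff;

    // Missing coordinates default to a value-initialized Value (0 for
    // arithmetic types).
    void add(Value x, Value y = Value(), Value z = Value())
    {
        stuff.push_back(xyz{x, y, z});
    }

    xyz&       operator[](std::size_t n)       { return stuff[n]; }
    const xyz& operator[](std::size_t n) const { return stuff[n]; }
};
```

Usage is then `xyz_table_of<double> foo;` or `xyz_table_of<float> foo;`,
with the rest of the calling code unchanged.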
Tim
12/23/2016 6:24:16 AM