f



Piece of code to read and process data from a file, separating chars and ints

Hello!

I will be grateful for your reviews and comments on this first piece I have=
=20
written. This is the first part of the bigger program, which I have mention=
ed=20
in other posts.=20
The code below reads each line from a csv file, parses character and numeri=
cal
values, and stores those in two arrays - array_names and array_values.=20

Before looking at the code, please, read my explanations/questions/wrongdoi=
ngs;
this list is very important and contains lots of caveats.

(1) I have not yet implemented what Barry and others have suggested on=20
how to work with csv files in case they have inconvenient format, i.e. file=
s,
in which lines end with some unexpected blank spaces, commas, etc, like
the ones I come up with every time I save excel file as csv. I will do my b=
est
to implement these features.

(2) Attn. Richard: I have disobeyed you, Sir ) My apologies. I have impleme=
nted
the dynamic memory allocation to follow one of the headsups suggested by Ke=
ith.
This is the only one feature that is =E2=80=9Csort of=E2=80=9D more advance=
d, but I do believe it is=20
necessary for this program, not only because the lines might be arbitrarily=
 long,=20
but because it is a great practice, and it is not such a fancy feature afte=
r all.

As to the memory allocation and reallocation: I have quite a few test versi=
ons,
and in the one that precedes this code, I do not break mallocing and reallo=
cing
into separate functions - all calls to malloc and realloc are within three =
main
functions. I know that it is always better to initialize a new pointer with=
 malloc
or realloc, and, if the allocation was successful, assign the value of that=
 new
pointer to the original pointer. I do that in the previous code, but I fail=
ed=20
to implement this feature once I have separated all mallocing and reallocin=
g
into separate functions, even though I repeat the same code "word for word"=
,
except for changes required by function's arguments' types.=20
Later I will post this code (with new pointers inside memory functions)
to illustrate this point, and ask for help on this matter.

Memory leaks: I have problems re-installing Valgrind. But even without
it I know I have lots of memory leaks. I address one of the most important
ones within the program =E2=80=93 it is a comment written in upper case let=
ters
at the end of the function  int read_file_data(FILE *input_file),
and in this comment I suggest how to free unused memory allocated
for both arrays, names and values. Please, let me know if it
sounds reasonable.=20

(3) For now the program records all data into two arrays: array of chars an=
d=20
array of ints. I have put both into the scope intentionally because I have =
not=20
yet decided if I leave all the data in the form of arrays, or create a=20
structure, or a few structures. I will come back with this question here, o=
r=20
in another thread.
Therefore, please, do not yet comment on these arrays.=20

(4) As to strtol: I have tried to implement the advice Richard gave me,=20
i.e. to write a correct version of the code that uses strtol function, but
I failed and got same problems as with malloc and new pointers. More
on this feature in another post, which I have started few days ago.=20

(5) Please, help me with the style: some of my functions have rather
long arguments=E2=80=99 lists due to types and names used; is it possible t=
o=20
break such long lines into separate ones? Same for if statements.=20

(6) To shorten the appearance of the code for your convenience, I have
 removed lots of printf debugging I used while writing this code.=20

Thank you very much!
I hope you understand that I am just learning, and hence this piece
does require numerous improvements =E2=80=93 I will be happy to implement
some of them, and leave other ones till later.=20

INPUT for test:
Years,2011,2012,2013,2014,2015
Sales,1062,1252,1587,1934,2519
Cost of Goods Sold,654,814,1009,1190,1499
  Gross Profit,408,438,578,744,1020
"Selling, General, and Admin Exp",254,271,364,454,576
  Operating Income before Depr,154,167,214,290,444
Depreciation and Amortization,25,31,38,52,70
  Operating Profit,129,136,176,238,374
Interest Expense,4,3,3,1,4
Other Gains and Losses,0,7,10,0,-1
  Pretax Income,125,126,163,237,371
Income Tax Expense,55,52,65,92,141
  Net Income,70,74,98,145,230

OUTPUT:
(NAMES)
Years
Sales
Cost of Goods Sold
  Gross Profit
Selling, General, and Admin Exp
  Operating Income before Depr
Depreciation and Amortization
  Operating Profit
Interest Expense
Other Gains and Losses
  Pretax Income
Income Tax Expense
  Net Income
(VALUES)
2011 2012 2013 2014 2015 0=20
1062 1252 1587 1934 2519 0=20
654 814 1009 1190 1499 0=20
408 438 578 744 1020 0=20
254 271 364 454 576 0=20
154 167 214 290 444 0=20
25 31 38 52 70 0=20
129 136 176 238 374 0=20
4 3 3 1 4 0=20
0 7 10 0 -1 0=20
125 126 163 237 371 0=20
55 52 65 92 141 0=20
70 74 98 145 230 0=20
SUCCESS

CODE:

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <ctype.h>
#include <string.h>

#define BUFFER_SIZE 10
#define ARRAY_LINES 5
#define ARRAY_COLUMNS 5
#define INCOME_STATEMENT "files/income_test.csv"

char **array_names =3D NULL;
int **array_values =3D NULL;

int read_file_data(FILE *input_file);
int fill_row_names(char *use_buffer, size_t *line_names, size_t *index, siz=
e_t *new_names_len);
int fill_row_values(char *use_buffer, size_t *line_values, size_t *index, s=
ize_t *values_index, size_t *new_values_len);
int allocate_arrays(void);
int chars_memalloc(char **buffer, size_t *buff_size);
int ints_memalloc(int **array, size_t *array_size);
int chars_realloc(char **buffer, size_t *buff_size);
int ints_realloc(int **buffer, size_t *buff_size);
int names_row_realloc(size_t *new_names_lines, size_t *new_names_len);
int values_row_realloc(size_t *new_values_lines, size_t *new_values_len);

int main(int argc, char *argv[])
{
    //Check for correct number of arguments
    if(argc > 2)
    {
        fputs("Sorry, the program processes only one file.\n", stderr);
        return EXIT_FAILURE;
    }
    //Determine a file to use
    const char *data_file =3D (argc =3D=3D 2)? argv[1] : INCOME_STATEMENT;
    int read_data;
   =20
    //Allocate memory for array_names and array_values
    if((allocate_arrays()) =3D=3D EXIT_FAILURE)
        return EXIT_FAILURE;
   =20
    //Open the file, read and store the data
    FILE *fp =3D fopen(data_file, "r");
    if ( fp )
    {
        if((read_data =3D read_file_data(fp)) =3D=3D EXIT_SUCCESS)
            puts("SUCCESS");
        else
        {
            puts("The function failed");
            return EXIT_FAILURE;
        }
    }
    //fopen() returned NULL
    else
    {
        perror(data_file);
        return EXIT_FAILURE;
    }
    fclose(fp);
    return 0;
}

/** Function to read and process the data
 from the file line by line
 **/
int read_file_data(FILE *input_file)
{
    size_t buff_size =3D BUFFER_SIZE;
    size_t buff_len =3D 0;
    size_t temp_buff_len =3D 0;
    size_t line_names =3D 0;
    size_t line_values =3D 0;
    size_t index;
    size_t new_names_lines =3D ARRAY_LINES;
    size_t new_names_len =3D ARRAY_COLUMNS;
    size_t new_values_lines =3D ARRAY_LINES;
    size_t new_values_len =3D ARRAY_COLUMNS;
   =20
    //DEBUG: USE VALUES_INDEX TO PRINT OUT ARRAY
    size_t values_index;
   =20
    char *buffer =3D NULL;
    //Allocate memory for the buffer to store a line
    if((chars_memalloc(&buffer, &buff_size)) =3D=3D EXIT_FAILURE)
        return EXIT_FAILURE;
   =20
    //Read each line from the file, store in the buffer
    while(fgets(buffer, buff_size, input_file) !=3D NULL)
    {
        //Track the length of the filled buffer
        buff_len =3D strlen(buffer);
        char *new_buffer =3D NULL;
        char *use_buffer =3D NULL;
        char *temp_buffer =3D NULL;
       =20
        //If the line has not been read entirely
        if(buffer[buff_len - 1] !=3D '\n')
        {
            //Allocate memory for the new_buffer
            if((chars_memalloc(&temp_buffer, &buff_size)) =3D=3D EXIT_FAILU=
RE)
                return EXIT_FAILURE;
           =20
            strcpy(temp_buffer, buffer);
            temp_buff_len =3D buff_len;
            *buffer =3D '\0';
           =20
            //Add memory to the buffer and
            //Continue reading the line until the newline
            while(temp_buffer[temp_buff_len - 1] !=3D '\n')
            {
                buff_size *=3D 2;
               =20
                //Re_allocate memory for the use_buffer
                if((chars_realloc(&temp_buffer, &buff_size)) =3D=3D EXIT_FA=
ILURE)
                    return EXIT_FAILURE;
               =20
                fgets(&temp_buffer[temp_buff_len], buff_size, input_file);
                temp_buff_len =3D strlen(temp_buffer);
            }
            new_buffer =3D temp_buffer;
            free(temp_buffer);
        }
       =20
        //Allocate memory for the use_buffer
        if((chars_memalloc(&use_buffer, &buff_size)) =3D=3D EXIT_FAILURE)
            return EXIT_FAILURE;
       =20
        use_buffer =3D (new_buffer =3D=3D NULL) ? buffer : new_buffer;
       =20
        index =3D 0;
        values_index =3D 0;
       =20
        /* Add memory to array_names if the number of rows
         has reached the allocated number */
        if(line_names =3D=3D (new_names_lines - 1))
        {
            new_names_lines *=3D 2;
           =20
            //Re_allocate memory for array_names
            if((names_row_realloc(&new_names_lines, &new_names_len)) =3D=3D=
 EXIT_FAILURE)
                return EXIT_FAILURE;
        }
       =20
        /* Store row names in array_names and column values
         in array_values */
        if((fill_row_names(use_buffer, &line_names, &index, &new_names_len)=
) =3D=3D EXIT_SUCCESS)
        {
            line_names++;
           =20
            /* Add memory to array_values if the number of rows
             has reached the allocated number */
            if(line_values =3D=3D (new_values_lines - 1))
            {
                new_values_lines *=3D 2;
                //Re_allocate memory for array_values
                if((values_row_realloc(&new_values_lines, &new_values_len))=
 =3D=3D EXIT_FAILURE)
                    return EXIT_FAILURE;
            }
            //Store numerical values in array_values
            if((fill_row_values(use_buffer, &line_values, &index, &values_i=
ndex, &new_values_len)) =3D=3D EXIT_SUCCESS)
                line_values++;
            else
            {
                printf("The function_values failed\n");
                return EXIT_FAILURE;
            }
        }
        else
        {
            printf("The function_names failed\n");
            return EXIT_FAILURE;
        }
    }
   =20
    /*******************/
    //DEBUG: CHECK IF NAMES ARE FILLED CORRECTLY
    printf("(NAMES)\n");
    for(size_t i =3D 0; i < line_names; i++)
    {
        for (int j =3D 0; array_names[i][j] !=3D '\0'; j++)
            printf("%c", array_names[i][j]);
        putchar('\n');
    }
    /*******************/
   =20
    /*******************/
    //DEBUG: CHECK IF VALUES ARE FILLED CORRECTLY
    printf("(VALUES)\n");
    for(size_t i =3D 0; i < line_values; i++)
    {
        for (size_t j =3D 0; j < values_index; j++)
            printf("%i ", array_values[i][j]);
        putchar('\n');
    }
    /*******************/
    /*******
     METHOD TO FREE ADDITIONAL MEMORY ALLOCATED FOR
     EACH ARRAY (NAMES AND VALUES):
     CREATE TEMPORARY ARRAYS TO STORE ARRAY VALUES;
     ASSIGN VALUES OF BOTH NAMES & VALUES ARRAYS
     TO TEMPORARY ONES; FREE NAMES & VALUES ARRAYS;
     COPY CONTENTS OF BOTH ARRAY BACK FROM TEMPORARY
     ONES, SET TEMP ARRAYS TO NULL.
     *******/
   =20
    free(buffer);
   =20
    return EXIT_SUCCESS;
}

/** Function to parse row names **/
int fill_row_names(char *use_buffer, size_t *line_names, size_t *index, siz=
e_t *new_names_len)
{
    //Allocate memory for the row
    if((chars_memalloc(&array_names[*line_names], new_names_len)) =3D=3D EX=
IT_FAILURE)
        return EXIT_FAILURE;
   =20
    char *ptr_buffer =3D NULL;
    ptr_buffer =3D use_buffer;
    size_t names_column;
   =20
    //Parse row names from each line into 'array_names'
    for ( names_column =3D 0; ( !isdigit(*ptr_buffer));
         names_column++, ptr_buffer++, (*index)++ )
    {
        /* Add memory if the number of columns has reached
         the allocated number */
        if(names_column =3D=3D (*new_names_len - 1))
        {
            *new_names_len *=3D 2;
            //Re_allocate memory
            if((chars_realloc(&array_names[*line_names],new_names_len)) =3D=
=3D EXIT_FAILURE)
                return EXIT_FAILURE;
        }
        //Check if the line starts with double quotes
        if(*ptr_buffer =3D=3D '"')
        {
            //Skip double quotes
            ptr_buffer++;
            (*index)++;
            //Store every character between double quotes
            while(*ptr_buffer !=3D '"')
            {
                array_names[*line_names][names_column] =3D *ptr_buffer++;
                names_column++;
                (*index)++;
            }
            ptr_buffer++;
            (*index)++;
        }
        //End if the next character is comma followed by number
        if(*ptr_buffer =3D=3D ',' && isdigit(*(ptr_buffer + 1)))
        {
            array_names[*line_names][names_column] =3D '\0';
            ptr_buffer++;
            (*index)++;
            return EXIT_SUCCESS;
        }
        array_names[*line_names][names_column] =3D *ptr_buffer;
    }
   =20
    return EXIT_FAILURE;
}

/** Function to parse numerical characters,
 and convert into array of integers
 **/
int fill_row_values(char *use_buffer, size_t *line_values, size_t *index, s=
ize_t *values_index, size_t *new_values_len)
{
    //Allocate memory for the row in array_values
    if((ints_memalloc(&array_values[*line_values], new_values_len)) =3D=3D =
EXIT_FAILURE)
        return EXIT_FAILURE;
   =20
    char *ptr_buffer =3D NULL;
    ptr_buffer =3D &use_buffer[*index];
    size_t values_column;
   =20
    //Parse each line's comma-separated values into 'array_values'
    for ( values_column =3D 0; *ptr_buffer !=3D '\0';
         values_column++, ptr_buffer++, (*index)++)
    {
        /* Add memory if the number of columns has reached
         the allocated number */
        if(values_column =3D=3D (*new_values_len - 1))
        {
            *new_values_len *=3D 2;
            //Re_allocate memory
            if((ints_realloc(&array_values[*line_values],new_values_len)) =
=3D=3D EXIT_FAILURE)
                return EXIT_FAILURE;
        }
       =20
        array_values[*line_values][values_column] =3D
        (int)strtol(ptr_buffer, &ptr_buffer, 10);
        (*values_index)++;
    }
    putchar('\n');
    //Check if conversion was successful
    if(*ptr_buffer)
    {
        printf("Conversion failed; not converted string: %s\n",
               ptr_buffer);
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}

/** Function to allocate memory for
 arrays names and values
 **/
int allocate_arrays(void)
{
    //Allocate memory for 2D array of row names
    array_names =3D malloc(ARRAY_LINES * sizeof(*array_names));
    //Allocate memory for 2D array of row values
    array_values =3D malloc(ARRAY_LINES * sizeof(*array_values));
   =20
    if(!array_names || !array_values)
    {
        fputs("Sorry, couldn't allocate memory for the array before reading=
 a file.\n", stderr);
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}

/** Function to allocate initial memory for
 all character arrays
 **/
int chars_memalloc(char **buffer, size_t *buff_size)
{
    *buffer =3D malloc(*buff_size * (sizeof **buffer));
    if(!(*buffer))
    {
        fputs("Sorry, buffer memory allocation failed.\n", stderr);
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}

/** Function to allocate initial memory for
 rows of numbers array
 **/
int ints_memalloc(int **array, size_t *array_size)
{
    *array =3D malloc(*array_size * (sizeof **array));
    if(!(*array))
    {
        fputs("Sorry, buffer memory allocation failed.\n", stderr);
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}

/** Function to re_allocate memory for all character arrays **/
int chars_realloc(char **buffer, size_t *buff_size)
{
    *buffer =3D realloc(*buffer, *buff_size * (sizeof **buffer));
   =20
    if(!(*buffer))
    {
        fputs("Sorry, buffer memory re_allocation failed.\n", stderr);
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}

/** Function to re_allocate memory for values array **/
int ints_realloc(int **buffer, size_t *buff_size)
{
    *buffer =3D realloc(*buffer, *buff_size * (sizeof **buffer));
   =20
    if(!(*buffer))
    {
        fputs("Sorry, buffer memory re_allocation failed.\n", stderr);
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}

/** Function to re_allocate memory for rows
 of char array
 **/
int names_row_realloc(size_t *new_names_lines, size_t *new_names_len)
{
    array_names =3D realloc(array_names, (*new_names_lines) * (*new_names_l=
en));
   =20
    if(!(array_names))
    {
        fputs("Sorry, array_names memory re_allocation failed.\n", stderr);
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}

/** Function to re_allocate memory for rows
 of number array
 **/
int values_row_realloc(size_t *new_values_lines, size_t *new_values_len)
{
    array_values =3D realloc(array_values, (*new_values_lines) * (*new_valu=
es_len));
   =20
    if(!(array_values))
    {
        fputs("Sorry, array_names memory re_allocation failed.\n", stderr);
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}

Thank you!
0
Alla
12/23/2016 1:02:33 PM
comp.lang.c 30656 articles. 5 followers. spinoza1111 (3246) is leader. Post Follow

11 Replies
916 Views

Similar Articles

[PageSpeed] 7

On 23/12/16 13:02, Alla _ wrote:
> Hello!
>
> I will be grateful for your reviews and comments

Are you sure about that? You seem to have ignored a lot of previous 
reviews and comments, by your own admission.

> (1) I have not yet implemented what Barry and others have suggested on
> how to work with csv files in case they have inconvenient format, i.e. files,
> in which lines end with some unexpected blank spaces, commas, etc, like
> the ones I come up with every time I save excel file as csv. I will do my best
> to implement these features.

That doesn't bother me, although it might bother others. It's just an 
assignment. Assignments generally don't expect you to bullet-proof your 
code to the extent that would be usual in the "real world" of commercial 
production code. The important thing is to be aware of the issues that 
can arise, so that you will know how to deal with them when you have to. 
In the meantime, if your program does basic input checking, I think 
that's sufficient. The level of robustness that is sometimes appropriate 
in a commercial environment would just obscure the lessons that you are 
trying to learn about C programming. That /doesn't/ mean you should be 
off-hand about such matters, but you can cut yourself a /little/ slack.

>
> (2) Attn. Richard: I have disobeyed you, Sir )

Now /that/ bothers me.

> My apologies.

Oh, in that case you're forgiven. Maybe. Let's wait and see.

> I have implemented
> the dynamic memory allocation to follow one of the headsups suggested by Keith.
> This is the only one feature that is “sort of” more advanced, but I do believe it is
> necessary for this program, not only because the lines might be arbitrarily long,
> but because it is a great practice, and it is not such a fancy feature after all.

Well, that's fair enough if it works. But if it doesn't work, there's 
going to be trouble.

> As to the memory allocation and reallocation: I have quite a few test versions,
> and in the one that precedes this code, I do not break mallocing and reallocing
> into separate functions - all calls to malloc and realloc are within three main
> functions. I know that it is always better to initialize a new pointer with malloc
> or realloc, and, if the allocation was successful, assign the value of that new
> pointer to the original pointer.

For realloc, yes. The reason is simple: if the re-allocation *fails*, 
you still have a pointer to the original data, and sometimes that can be 
good enough --- it may be possible, for example, to adopt an alternative 
algorithm that doesn't need the extra memory, perhaps at the expense of 
speed. In a student exercise, I wouldn't worry too much about it, but 
don't get into the habit of taking short-cuts like this.

> Memory leaks: I have problems re-installing Valgrind. But even without
> it I know I have lots of memory leaks.

That's because you're going too quickly. Because you continually ignore 
advice to slow down, you are well on the way to learning how to write 
bad programs quickly. It is a well-trodden path.

> (3) For now the program records all data into two arrays: array of chars and
> array of ints. I have put both into the scope intentionally because I have not
> yet decided if I leave all the data in the form of arrays, or create a
> structure, or a few structures. I will come back with this question here, or
> in another thread.
> Therefore, please, do not yet comment on these arrays.

This is a direct result of not designing the program before you write 
it, and that's one of the many bad consequences of rushing ahead.

> (5) Please, help me with the style: some of my functions have rather
> long arguments’ lists due to types and names used; is it possible to
> break such long lines into separate ones? Same for if statements.

Functional decomposition is a skill. It's easy to do, but it's harder to 
do well. A good, clean, simple design will typically lead to short 
argument lists.

> OUTPUT:
> (NAMES)
> Years
<snip lots of result data>
> 70 74 98 145 230 0
> SUCCESS

That's not what I get. I get:

$ ./alla test6.csv
*** Error in `./alla': realloc(): invalid next size: 0x0000000000bc4a80 ***
Aborted (core dumped)

The message "invalid next size", while no doubt accurate, is a little 
misleading, but it's generally a clue that you've completely screwed up 
your memory at some point in the program before this symptom occurs.

No doubt someone else here will have the time to spend tracking down the 
exact problem for you. But my own advice would be to scrap it and start 
again, this time paying attention to all the advice you've been 
studiously ignoring.

MORE HASTE, LESS SPEED.

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
0
Richard
12/23/2016 2:28:00 PM
On Friday, December 23, 2016 at 5:28:08 PM UTC+3, Richard Heathfield wrote:
> On 23/12/16 13:02, Alla _ wrote:
> > Hello!

<snip>
> > OUTPUT:
> > (NAMES)
> > Years
> <snip lots of result data>
> > 70 74 98 145 230 0
> > SUCCESS
> 
> That's not what I get. I get:
> 
> $ ./alla test6.csv
> *** Error in `./alla': realloc(): invalid next size: 0x0000000000bc4a80 ***
> Aborted (core dumped)
> 
> The message "invalid next size", while no doubt accurate, is a little 
> misleading, but it's generally a clue that you've completely screwed up 
> your memory at some point in the program before this symptom occurs.
> 
> No doubt someone else here will have the time to spend tracking down the 
> exact problem for you. But my own advice would be to scrap it and start 
> again, this time paying attention to all the advice you've been 
> studiously ignoring.
> 
Thank you very much for your comments. 
I have not ignored any advices, this is an unfair accusation. The only 
extra feature I use is dynamic memory allocation. But it is an important
one for my program. I have chosen a fairly easy program for my project,
 the one that does help me learn C. And I am not rushing ahead. 
Of course, I hope for help here: I have worked a lot on this piece,
constructing it out of tiny pieces, using lots of debugging (deleted
those because they produce lots of visual noise), etc. And I have learned
a lot during this work, but I have reached the point at which I need 
help, if it is possible. 
0
Alla
12/23/2016 2:39:54 PM
Alla _ <modelling.data@gmail.com> writes:
<snip>
> ... The only 
> extra feature I use is dynamic memory allocation. But it is an important
> one for my program.

Why?  All the examples show relatively small input files and the rows
appear to particular accounting items not general data.  The columns are
years (so the number of them is logically unbounded), but how many will
you actually need?

An important skill in design is making it good enough.  Would, say, a
couple of centuries years of accounting data be enough?  And are you
likely to have more than a few hundred accounting items?  I would do a
bounded data version first and add dynamic allocation later if I wanted
to learn about it.

> I have chosen a fairly easy program for my project,

That depends how much you decide to do.  CSV files can have arbitrarily
long lines, fields can span lines, numbers can contain commas to mean
more than one thing, text can have an arbitrary character encoding and
so on.  So you decide to simplify -- ignore long lines, multi-line
fields and assume the data are all in some simple format.  You keep
simplifying until you have written some version of the program.  You
don't yet have a version of the program, so you have not yet simplified
enough!

All programmers do this when they are able to.  When they are prevented
from doing this because of some rigid set of requirements laid down
before hand, the effect is often a disaster.  At the very least, the
result will be expensive.

As you know, I am not against some forms of "rushing ahead".  By all
means work on five programs at once, or develop a web app along side this
C program.  That's all good.  But for any one task you need to simplify
until it's manageable and then add features one by one.  Here I mean
logically one by one -- you can be working on adding three features at
once, but that mist be in separate versions of the program that get
merged when all three are working.

<snip>
-- 
Ben.
0
Ben
12/23/2016 4:29:11 PM
On Friday, December 23, 2016 at 7:29:20 PM UTC+3, Ben Bacarisse wrote:
> Alla _ <modelling.data@gmail.com> writes:
> <snip>
> > ... The only 
> > extra feature I use is dynamic memory allocation. But it is an important
> > one for my program.
> 
> Why?  All the examples show relatively small input files and the rows
> appear to particular accounting items not general data.  The columns are
> years (so the number of them is logically unbounded), but how many will
> you actually need?
> 
> An important skill in design is making it good enough.  Would, say, a
> couple of centuries years of accounting data be enough?  And are you
> likely to have more than a few hundred accounting items?  I would do a
> bounded data version first and add dynamic allocation later if I wanted
> to learn about it.
> 
> > I have chosen a fairly easy program for my project,
> 
> That depends how much you decide to do.  CSV files can have arbitrarily
> long lines, fields can span lines, numbers can contain commas to mean
> more than one thing, text can have an arbitrary character encoding and
> so on.  So you decide to simplify -- ignore long lines, multi-line
> fields and assume the data are all in some simple format.  You keep
> simplifying until you have written some version of the program.  You
> don't yet have a version of the program, so you have not yet simplified
> enough!
> 
> All programmers do this when they are able to.  When they are prevented
> from doing this because of some rigid set of requirements laid down
> before hand, the effect is often a disaster.  At the very least, the
> result will be expensive.
> 
> As you know, I am not against some forms of "rushing ahead".  By all
> means work on five programs at once, or develop a web app along side this
> C program.  That's all good.  But for any one task you need to simplify
> until it's manageable and then add features one by one.  Here I mean
> logically one by one -- you can be working on adding three features at
> once, but that mist be in separate versions of the program that get
> merged when all three are working.
Thank you very much. (I am not snipping, because I am replying to the whole
post). 
I am not planning to process hundreds of items; the program will have very 
strict assumptions and limits. Still I truly want to use dynamic memory here. 
Actually I thought that I have done the basic part that uses fixed array bounds, and 
I have initially considered that one as a first test step, and during these last days
I was working only on implementing dynamic memory allocation for buffers
and for arrays (I have lots of versions and pieces, which I have written before
implementing dynamic memory allocation). 
Shall I now show the same piece but without the dynamic memory? 
Actually we have already discussed one part of it, in which I fill in 
the array of values. 
I am truly not rushing ahead, and, unfortunately, I am not yet capable of 
working at a few programs simultaneously; a lot to learn before 
I am able to do that, if ever. 
I would like to create this program as a basis, and then add features,
which will help to process more complex csv files. 
0
Alla
12/23/2016 4:44:10 PM
On 23/12/16 14:39, Alla _ wrote:
> And I have learned
> a lot during this work, but I have reached the point at which I need
> help, if it is possible.

Okay. Unfortunately, I really don't have the time to investigate your 
code. What I /can/ do is give you some pointers(!) on what I would 
consider to be a good approach.

Looking at the output you posted, it seems to me that what you are doing 
is reading lines of this form:

title_n,value_n_0[,value_n_x]*

into an internal representation that you later display in this form:

(NAMES)
title0
title1
....
titlen
(VALUES)
value_0_0[ value_0_x]*
value_1_0[ value_0_x]*
....
value_n_0[ value_n_x]*
SUCCESS

That is, you echo the titles separately, one per line, and then echo the 
values in space-separated groups, one group per line.

Just as with all programs, then, we need to read the data, process it, 
and write it. So main() will look like this:

get the filename
read the data
process the data
write the processed data
report on whether the program succeeded
clean up

Getting the filename is trivial, so we'll skip over that.

To read the data, we need a structure to read the data into. We want it 
to be capable of expanding to fit the data we are given. So something 
like this will be in order:

struct DataLine_
{
   char *title;
   int *value;
   size_t num_values;
   size_t max_values;
};

typedef struct DataLine_ DataLine;

And ONE of those represents ONE line of the file. But we have multiple 
lines, and we don't know how many, so we need a structure to manage 
multiple lines:

struct Data_
{
   DataLine *line;
   size_t num_lines;
   size_t max_lines;
};

typedef struct Data_ Data;

Thus, given Data d, we will have:

d---line[0]---line[1]---line[2]---...---line[n-1]
       |         |         |               |
     title     title      title          title
     value[0]  value[0]   value[0]      value[0]
     value[1]  value[1]   value[1]      value[1]
etc.

Before we can do anything, we have to be able to read a line from input. 
Since we don't know in advance how long the line will be, we would like 
a function that can read /any/ length line (up to some sensible 
maximum). Rather than use char line[MAXLINE] and fgets, though, we'd 
prefer to allocate only (more or less) what we need.

Our line-reading function should therefore be able to allocate space as 
it goes along, and maintain that space, resizing it when required. So we 
want something like this:

static int fgetline(char **line,
                     size_t *size,
                     size_t maxrecsize,
                     FILE *fp,
                     size_t *plen)

Where line is a pointer to a char pointer; size is a pointer to the size 
of the memory we currently have; maxrecsize is the absolute upper limit 
above which we think something screwy is going on (I chose 4096, which 
should be enough to give you 400 years of data, BUT the function doesn't 
allocate 4096 straight away - it stretches the memory only (more or 
less) as far as it needs to. fp is of course the stream pointer. Since 
it's cheap to do, we'll also allow the function to report the length of 
the string it reads. It will read a line right up to the '\n'.

Now for read_data(). It should:

* open the file
* loop through the lines
* addline() - see below
* free the file-reading string memory
* close the file

What about process_data()? Well, this is easy. It should do... nothing!

How about write_data()? It should:

print (NAMES)
print the names, one per line
print (VALUES)
print the values, one group per line, separated by spaces

It should NOT print "SUCCESS", because this isn't part of the data. It's 
just a status report.

Okay, what about addline()? It should:

* ensure there is enough room for the new line of data
* create the line - which I've put in a separate function

What about create_line()? It should:

* extract the title from the string
* store it
* loop through the rest of the string, like this:
*** if a new value is found
****** find its extent
****** make sure there's enough room for it
****** convert it to int
****** store the value


And that's it. That's our design.

I have implemented this program for you. It's about 100 lines longer 
than your program, but that's partly because I've added quite a few 
comments. Also, my programming style tends to use a lot of vertical space.

But I hope you'll find it reasonably clean and simple to understand.

I welcome comments from "the regulars" on how it might be improved. (In 
my defence, however I did write this in a tearing hurry!)

KNOWN BUGS:

1) the error handling could be significantly improved by better 
reporting of exactly what went wrong, but I was trying to keep things 
simple.

2) the functional decomposition could be better, given the time to do it.

(I think that's all.)

I suggest that you read it very carefully. I have commented it 
reasonably generously in order to help you with that task.

Here is the implementation:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>

#define INCOME_STATEMENT "files/income_test.csv"
#define MAXLINE 4096 /* the longest input line
                         you are prepared to accept */

struct DataLine_
{
   char *title;
   int *value;
   size_t num_values;
   size_t max_values;
};

typedef struct DataLine_ DataLine;

struct Data_
{
   DataLine *line;
   size_t num_lines;
   size_t max_lines;
};

typedef struct Data_ Data;

/* read a line from a stream:
    line = pointer to char *. If the char * is
             NULL, that's fine.
    size = size of existing allocated buffer. Must be 0 or
             the size provided by a previous call.
    maxrecsize = the most memory you are prepared to commit
                   to this purpose.
    plen = if not NULL, will be used for storing the length
           of the string that was read (which may be less
           than size).

    return value: 0 = line was read
                  1 = EOF - no data to read
                 -1 = couldn't provide initial allocation
                        of BUFSIZ bytes!
                 -2 = user-defined limit (maxrecsize) would
                        be exceeded
                 -3 = couldn't resize the allocated area to
                        the new size
                 -4 = error on input stream
  */
static int fgetline(char **line,
                     size_t *size,
                     size_t maxrecsize,
                     FILE *fp,
                     size_t *plen)
{
   int rc = 0;
   size_t count = 0;
   size_t newsize = 0;
   int ch = 0;
   char *tmp = NULL;

   /* although line and size can /point/ to NULL, they
        mustn't /be/ NULL - if they are, that's an
        error in the program, not the data. So we'll
        use assert() to test them.
    */
   assert(fp   != NULL);
   assert(line != NULL);
   assert(size != NULL);

   /* if we're already at the end of the file,
      there's nothing we can do. */
   if(feof(fp))
   {
     rc = 1;
   }
   /* if we don't have memory, we should get some */
   if(0 == rc && NULL == *line)
   {
     *line = malloc(BUFSIZ);
     if(*line != NULL)
     {
       *size = BUFSIZ;
     }
     else
     {
       rc = -1; /* can't get initial memory */
     }
   }

   /* main loop: keep reading a character until
      we hit newline or EOF */
   while(0 == rc && (ch = fgetc(fp)) != '\n' && ch != EOF)
   {
     /* if we don't have enough room, we need to get
        some more. count is the number of characters
        we've already stored, so we need room for
        those, + 1 for the character we just
        read, + 1 for the null terminator. */

     if(count + 2 >= *size)
     {
       /* multiplying the memory size by a factor means
          that we don't have to keep going back to the
          OS too often */
       newsize = 3 * (count + 2 + *size) / 2;

       /* observe user-specified limit */
       if(newsize > maxrecsize)
       {
         newsize = maxrecsize;
       }
       if(maxrecsize <= *size)
       {
         rc = -2;
       }
       else
       {
         /* get the new memory */
         tmp = realloc(*line, newsize);
         if(NULL == tmp) /* if it failed, don't touch
                            the existing memory */
         {
           rc = -3;
         }
         else
         {
           *line = tmp;
           tmp = NULL;
           *size = newsize;
         }
       }
     }
     /* if everything worked, store the character */
     if(0 == rc)
     {
       (*line)[count++] = ch;
     }
   }
   if(0 == rc)
   {
     /* terminate the string */
     (*line)[count] = '\0';

     /* does the user want the string length? */
     if(plen != NULL)
     {
       *plen = count;
     }
   }

   /* stream error? */
   if(0 == rc && ferror(fp))
   {
     rc = -4;
   }

   return rc;
}

/* create a substring of s, starting at left, and
    finishing ONE BEFORE right.
  */
static char *substring(const char *s,
                        size_t left,
                        size_t right)
{
   /* allocate enough memory for substring and
      for null terminating character */
   char *new = malloc(right + 1 - left);
   if(new != NULL)
   {
     memcpy(new, s + left, right - left);
     new[right - left] = '\0';
   }
   return new;
}

/* find the end of the current CSV field.
    The grammar here is very simplistic:
    (1) fields cannot cross line boundaries
    (2) commas delimit fields UNLESS the
        first character is a double-quote "
        in which case the double-quote trumps it.
    (3) double-quotes /within/ fields are not supported.

    line is the line to be parsed.
    left is the index of the first character in the
      current field. If it's a quote, this will get
      bumped on by one.
    right will end up containing the index of the
      first character that is NOT in the field.
  */
static void csv_delimit(const char *line,
                         size_t *left,
                         size_t *right)
{
   int delimiter = ',';

   assert(line != NULL);
   assert(left != NULL);
   assert(right != NULL);

   if(line[*left] == '"')
   {
     delimiter = '"';
     ++*left;
   }
   *right = *left;
   while(line[*right] != '\0' && line[*right] != delimiter)
   {
     ++*right;
   }
}

/* create_line manages the details of parsing a line
    of CSV data and turning it into a DataLine structure.
    It dynamically allocates enough room to hold all
    the fields on the line.
    Note that the first field is treated as a label.

    data = pointer to the overall data structure
    line = the line just read from the CSV file
    len = the length of the line
  */
static int create_line(Data *data, char *line, size_t len)
{
   DataLine *curr = NULL;
   size_t left = 0;
   size_t right = 0;
   int rc = EXIT_SUCCESS;

   assert(data != NULL);
   assert(line != NULL);
   assert(len > 0);

   /* we're going to be using data->line[data->num_lines]
      rather a lot, so let's set up a shorter alias for it.
    */
   curr = data->line + data->num_lines;

   /* first, get the title (the label) */
   csv_delimit(line, &left, &right);
   assert(right < len);
   if(right > left)
   {
     curr->title = substring(line, left, right);
     if(curr->title != NULL)
     {
       /* 'right' now has the index of the
            end of the field. Keep looping
            until we hit the end of the string.
        */
       while(EXIT_SUCCESS == rc && line[right] != '\0')
       {
         /* we have some data to process -
              but is there enough room for it?
          */
         if(curr->num_values ==
            curr->max_values)
         {
           size_t newlinecount = 3 *
                    (curr->max_values + 1) / 2;
           int *new = realloc(curr->value,
                              newlinecount * sizeof *new);
           if(new != NULL)
           {
             curr->value = new;
             curr->max_values = newlinecount;
           }
           else
           {
             /* couldn't get the memory we need */
             rc = EXIT_FAILURE;
           }
         }
         /* If nothing has gone wrong, we're
            all set to add the data */
         if(EXIT_SUCCESS == rc)
         {
           left = right + 1;
           csv_delimit(line, &left, &right);
           if(right > left)
           {
             char *endptr = NULL;

             curr->value[curr->num_values] =
               strtol(line + left, &endptr, 10);

             if(endptr == line + left)
             {
               /* no conversion was performed:
                  this means we have non-numeric
                  data in a numeric field! */
               rc = EXIT_FAILURE;
             }
             else
             {
               /* keep track of the number of
                  number fields */
               ++curr->num_values;
             }
           }
         }
       }
     }
     else
     {
       rc = EXIT_FAILURE;
     }
   }

   if(EXIT_SUCCESS == rc)
   {
     ++data->num_lines;
   }
   return rc;
}

/* addline ensures there is enough room for a new
    element within the Data structure. It does
    /not/ allocate space for the DataLine struct.
    That is done in create_line.

    data = a pointer to a structure that can manage
             lines of data taken from a CSV file
    line = the line we just read from a text file
    len  = the length of that line
  */

static int addline(Data *data, char *line, size_t len)
{
   int rc = EXIT_SUCCESS;
   /* do we need more memory? */
   if(data->num_lines == data->max_lines)
   {
     /* exercise: the + 1 in the next line is
        important. Why? */

     size_t newsize = 3 * (data->max_lines + 1) / 2;

     /* get the new memory */
     DataLine *new = realloc(data->line,
                             newsize * sizeof *new);
     if(new != NULL)
     {
       data->line = new;
       /* ensure that each element is blanked out */
       for(data->max_lines = data->num_lines;
           data->max_lines < newsize;
           data->max_lines++)
       {
         data->line[data->max_lines].title = NULL;
         data->line[data->max_lines].value = NULL;
         data->line[data->max_lines].num_values = 0;
         data->line[data->max_lines].max_values = 0;
         /*
            If that list had been any longer, I'd have
            done this:
            DataLine d = {0};
            data->line[data->max_lines] = d;
          */
       }
       data->max_lines = newsize;
     }
     else
     {
       rc = EXIT_FAILURE;
     }
   }
   if(EXIT_SUCCESS == rc)
   {
     /* all the housekeeping is done, so let's
        create the line.
      */
     rc = create_line(data, line, len);
   }
   return rc;
}

/* read_data is the high-level management function
    for reading the data from the file into our
    struct.
    data = pointer to a struct that can store CSV data
    filename = the CSV filename
  */
static int read_data(Data *data, const char *filename)
{
   int rc = EXIT_SUCCESS;

   FILE *fp = fopen(filename, "r");
   if(fp != NULL)
   {
     int readresult = 0;
     char *line = NULL;
     size_t line_size = 0;
     size_t len = 0;

     while(EXIT_SUCCESS == rc &&
           (readresult = fgetline(&line,
                                  &line_size,
                                  MAXLINE,
                                  fp,
                                  &len)) == 0 &&
           len > 0)
     {
       rc = addline(data, line, len);
     }
     free(line);
     line = NULL;

     fclose(fp);
   }
   else
   {
     fprintf(stderr,
             "Error: Sorry, can't open %s for reading.\n",
             filename);
     rc = EXIT_FAILURE;
   }

   return rc;
}

static int process_data(Data *data)
{
   /* nothing to do. But if there /were/ something
      to do, this is where it would be done.
    */
   return EXIT_SUCCESS;
}

static int write_data(Data *data)
{
   int rc = EXIT_SUCCESS;

   size_t datum = 0;

   printf("(NAMES)\n");
   for(datum = 0; datum < data->num_lines; datum++)
   {
     printf("%s\n", data->line[datum].title);
   }
   printf("(VALUES)\n");
   for(datum = 0; datum < data->num_lines; datum++)
   {
     size_t detail = 0;
     for(detail = 0;
         detail < data->line[datum].num_values;
         detail++)
     {
       /* print each field, trailing all of them
          (except the last) with a space.
        */
       printf("%d%s", data->line[datum].value[detail],
              detail + 1 < data->line[datum].num_values ?
              " " : "");
     }
     putchar('\n');
   }
   if(ferror(stdout))
   {
     rc = EXIT_FAILURE;
   }
   return rc;
}

/* cleanup */
static void release(Data *d)
{
   if(d != NULL)
   {
     size_t datum = 0;
     for(datum = 0; datum < d->num_lines; datum++)
     {
       free(d->line[datum].title);
       free(d->line[datum].value);
     }
     free(d->line);
     d->line = NULL;
   }
}

/* main
    argv[1] = filename to process. If absent,
                a default is used.
    This function reads the data, processes it,
    writes the report, cleans up, and quits.
  */

int main(int argc, char **argv)
{
   const char *filename = INCOME_STATEMENT;

   int rc = EXIT_SUCCESS;
   Data data = {0};

   if(argc > 1)
   {
     filename = argv[1];
     if(argc > 2)
     {
       fputs("Warning: Sorry, currently only one input"
             " file is supported.\n", stderr);
     }
   }

   rc = read_data(&data, filename);
   if(EXIT_SUCCESS == rc)
   {
     rc = process_data(&data);
   }
   if(EXIT_SUCCESS == rc)
   {
     rc = write_data(&data);
   }
   if(EXIT_SUCCESS == rc)
   {
     fprintf(stderr, "SUCCESS\n");
   }
   else
   {
     fprintf(stderr, "FAILURE\n");
   }
   release(&data);

   return rc;
}

Input:

Years,2011,2012,2013,2014,2015
Sales,1062,1252,1587,1934,2519
Cost of Goods Sold,654,814,1009,1190,1499
   Gross Profit,408,438,578,744,1020
"Selling, General, and Admin Exp",254,271,364,454,576
   Operating Income before Depr,154,167,214,290,444
Depreciation and Amortization,25,31,38,52,70
   Operating Profit,129,136,176,238,374
Interest Expense,4,3,3,1,4
Other Gains and Losses,0,7,10,0,-1
   Pretax Income,125,126,163,237,371
Income Tax Expense,55,52,65,92,141
   Net Income,70,74,98,145,230

Output:

(NAMES)
Years
Sales
Cost of Goods Sold
   Gross Profit
Selling, General, and Admin Exp
   Operating Income before Depr
Depreciation and Amortization
   Operating Profit
Interest Expense
Other Gains and Losses
   Pretax Income
Income Tax Expense
   Net Income
(VALUES)
2011 2012 2013 2014 2015
1062 1252 1587 1934 2519
654 814 1009 1190 1499
408 438 578 744 1020
254 271 364 454 576
154 167 214 290 444
25 31 38 52 70
129 136 176 238 374
4 3 3 1 4
0 7 10 0 -1
125 126 163 237 371
55 52 65 92 141
70 74 98 145 230
SUCCESS

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
0
Richard
12/23/2016 7:34:11 PM
Alla _ <modelling.data@gmail.com> writes:

> On Friday, December 23, 2016 at 7:29:20 PM UTC+3, Ben Bacarisse wrote:
>> Alla _ <modelling.data@gmail.com> writes:
>> <snip>
>> > ... The only 
>> > extra feature I use is dynamic memory allocation. But it is an important
>> > one for my program.
>> 
>> Why?
<snip>

> I am not planning to process hundreds of items; the program will have very 
> strict assumptions and limits. Still I truly want to use dynamic
> memory here.

I just wanted to know why!

> Actually I thought that I have done the basic part that uses fixed
> array bounds,

Ah, sorry.  My mistake.

That should make debugging simpler.  At every stage where you make a
change, run all your tests.  You will then have two very similar
programs and only a small code change that could be responsible for the
error.  Of course, not all bug show up in testing, but I presume you are
seeing a problem in testing or you would not be asking here.

If I get time I'll take a look but I don't have the advantage of being
able to look only at the changes!

<snip>
-- 
Ben.
0
Ben
12/23/2016 8:31:31 PM
Alla _ <modelling.data@gmail.com> writes:
[...]
> (2) Attn. Richard: I have disobeyed you, Sir ) My apologies. I have
> implemented the dynamic memory allocation to follow one of the
> headsups suggested by Keith.  This is the only one feature that is
> “sort of” more advanced, but I do believe it is necessary for this
> program, not only because the lines might be arbitrarily long, but
> because it is a great practice, and it is not such a fancy feature
> after all.

I'm sorry if my advice led you astray.

What I meant to suggest (and I think I did, but I'm too lazy to
go back and check) is that if you want to handle arbitrarily long
lines, you should implement a function to do that *separately*,
not as part of your large CSV-processing program.

For example, you might write a function:

    char *get_line(FILE *in);

(Note: not "getline", there's a POSIX function by that name) that
returns a pointer to an allocated string.  Internally, get_line()
would probably use fgets(), malloc(), and realloc().

Implement and debug that function *separately*, using a program
that does nothing but call get_line() to read lines from a file.
Once you've tested it with as many cases as you can think of (empty
lines, short lines, very long lines, multiple calls, correctly
handling EOF, correctly handling a trailing line with no '\n'),
*then* you can drop that function into your CSV program.

As it is, you haven't even written a separate line-reading function.
You've created some wrappers around your realloc() calls (which
I'm not sure was a useful thing to do), but your allocation logic
is mixed up with your input loop.

What I think you *should* have done was write your CSV program to
use fgets(), handling lines up to, say, 1000 characters, and test it
with input data that meets that restriction.  And once that works,
you could drop in your get_line() function if you insist on doing
that extra work.

As it is, you have something going wrong that's causing realloc()
to fail -- and it's going to be difficult to track down that error.

Another observation (I haven't read your entire program): Several of
your functions return EXIT_FAILURE and EXIT_SUCCESS to indicate whether
they failed or succeeded.  That might seem like a good idea, but in my
opinion it's poor style.  Those values specify the return status of your
main program.  A common convention is for a function to return an int
status, with 0 denoting success and non-0 denoting failure (note that
EXIT_SUCCESS is not necessarily 0).  Or your functions could return bool
if you don't need to distinguish among different kinds of failures.

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
0
Keith
12/23/2016 10:10:05 PM
On Friday, December 23, 2016 at 11:31:44 PM UTC+3, Ben Bacarisse wrote:
> Alla _ <modelling.data@gmail.com> writes:
> 
> > On Friday, December 23, 2016 at 7:29:20 PM UTC+3, Ben Bacarisse wrote:
> >> Alla _ <modelling.data@gmail.com> writes:
> >> <snip>
> >> > ... The only 
> >> > extra feature I use is dynamic memory allocation. But it is an important
> >> > one for my program.
> >> 
> >> Why?
> <snip>
> 
> > I am not planning to process hundreds of items; the program will have very 
> > strict assumptions and limits. Still I truly want to use dynamic
> > memory here.
> 
> I just wanted to know why!
I use dynamic memory only because I want it, and I think it is a very 
important tool, and it is a truly great practice in C.
> 
> > Actually I thought that I have done the basic part that uses fixed
> > array bounds,
> 
> Ah, sorry.  My mistake.
> 
> That should make debugging simpler.  At every stage where you make a
> change, run all your tests.  You will then have two very similar
> programs and only a small code change that could be responsible for the
> error.  Of course, not all bug show up in testing, but I presume you are
> seeing a problem in testing or you would not be asking here.
> 
Thank you very much. Actually I didn't see any problems at the run time.
I have posted this piece because I know how bad I am at coding yet, and 
therefore only advices I get here help me to improve my understanding. 
Also I was hoping to get an advice on whether it is a good or bad idea
to create a temporary array to temporarily store values (after reading 
the whole file), clear memory allocated for array_names and array_values,
and copy back values from temporary arrays, thus freeing all unused
memory in arrays. I am referring to this in the code. 
I will do the following now: I will again compare two codes (the one with
dynamic memory allocation and the one without it), see what I can improve,
and only then come back here and repost the code again. 
Please, do not spend your time now looking at the code I published 
yesterday. 

Offtopic: interesting that after I have read your message yesterday,
I have seen a news on some interesting job offer, so I thought that 
maybe, if I find time for that, I will create a tiny website. It's not a 
programming job, of course, and not even what I am aiming at now; it's
a creative job, so a website might be an interesting way to show
them something. 
> If I get time I'll take a look but I don't have the advantage of being
> able to look only at the changes!
> 
> <snip>
> -- 
> Ben.
0
Alla
12/24/2016 4:52:40 AM
On Saturday, December 24, 2016 at 1:09:39 AM UTC+3, Keith Thompson wrote:
> Alla _ <modelling.data@gmail.com> writes:
> [...]
> > (2) Attn. Richard: I have disobeyed you, Sir ) My apologies. I have
> > implemented the dynamic memory allocation to follow one of the
> > headsups suggested by Keith.  This is the only one feature that is
> > =E2=80=9Csort of=E2=80=9D more advanced, but I do believe it is necessa=
ry for this
> > program, not only because the lines might be arbitrarily long, but
> > because it is a great practice, and it is not such a fancy feature
> > after all.
>=20
> I'm sorry if my advice led you astray.
>=20
Please, do not worry - I am not easily led astray, and your advices
were very helpful and useful; I would have used dynamic memory
allocation anyway, that was my goal, and your advice has only
helped me.=20
> What I meant to suggest (and I think I did, but I'm too lazy to
> go back and check) is that if you want to handle arbitrarily long
> lines, you should implement a function to do that *separately*,
> not as part of your large CSV-processing program.
>=20
> For example, you might write a function:
>=20
>     char *get_line(FILE *in);
>=20
> (Note: not "getline", there's a POSIX function by that name) that
> returns a pointer to an allocated string.  Internally, get_line()
> would probably use fgets(), malloc(), and realloc().
>=20
> Implement and debug that function *separately*, using a program
> that does nothing but call get_line() to read lines from a file.
> Once you've tested it with as many cases as you can think of (empty
> lines, short lines, very long lines, multiple calls, correctly
> handling EOF, correctly handling a trailing line with no '\n'),
> *then* you can drop that function into your CSV program.
>=20
This was my main mistake, as I see now. Yes, I have first constructed
a program that reads data into fixed sized arrays, and then I have=20
mistakenly brought in mallocing and reallocing right into that=20
version, without testing the new approach separately. I see.=20
I will think how to correct that now, and write a separate get_line()
function.=20
> As it is, you haven't even written a separate line-reading function.
> You've created some wrappers around your realloc() calls (which
> I'm not sure was a useful thing to do), but your allocation logic
> is mixed up with your input loop.
>=20
True. Even visually looks rather convoluted.
> What I think you *should* have done was write your CSV program to
> use fgets(), handling lines up to, say, 1000 characters, and test it
> with input data that meets that restriction.  And once that works,
> you could drop in your get_line() function if you insist on doing
> that extra work.
>=20
Yes. I have addressed this few paragraphs above. I have written
a fixed-array program, but have not written a separate one with=20
dynamic memory allocation; this is a big mistake that made my=20
life much harder )=20

> As it is, you have something going wrong that's causing realloc()
> to fail -- and it's going to be difficult to track down that error.
>=20
> Another observation (I haven't read your entire program): Several of
> your functions return EXIT_FAILURE and EXIT_SUCCESS to indicate whether
> they failed or succeeded.  That might seem like a good idea, but in my
> opinion it's poor style.  Those values specify the return status of your
> main program.  A common convention is for a function to return an int
> status, with 0 denoting success and non-0 denoting failure (note that
> EXIT_SUCCESS is not necessarily 0).  Or your functions could return bool
> if you don't need to distinguish among different kinds of failures.
>=20
I will correct that. Thank you. I thought about using boolean values but=20
I recall reading somewhere here that all these boolean returns are not=20
the best programming practice. Is it wrong? If so, I'd be happy to use=20
them - looks more neat than these huge macros )
0
Alla
12/24/2016 5:09:24 AM
On Friday, December 23, 2016 at 10:34:26 PM UTC+3, Richard Heathfield wrote:
> On 23/12/16 14:39, Alla _ wrote:
<snip>
First of all, thank you very much for your very helpful explanations, and for the 
code. Please, forgive me but I will not look at your code now; I am 
saving it and will look at it only when I am done with my one, and
then use your code to improve my work. 
Once again - you came up with more than a hundred lines of code in just no
time. I truly feel as if I am one year old in terms of my brain ability compared
to yours. Tears on my eyes coupled with a happy smile - at least I see my 
goal, and sort of have a road map; so maybe.. one day.. Though they say
the older your get the harder it is to change your thinking process. I don't 
believe in these limits and don't believe people who say these things, 
but it does require a lot of effort.
> Just as with all programs, then, we need to read the data, process it, 
> and write it. So main() will look like this:
> 
> get the filename
> read the data
> process the data
> write the processed data
> report on whether the program succeeded
> clean up
> 
Yes, that's the plan. 

> Getting the filename is trivial, so we'll skip over that.
> 
> To read the data, we need a structure to read the data into. We want it 
> to be capable of expanding to fit the data we are given. So something 
> like this will be in order:
> 
> struct DataLine_
> {
>    char *title;
>    int *value;
>    size_t num_values;
>    size_t max_values;
> };
> 
> typedef struct DataLine_ DataLine;
> 
> And ONE of those represents ONE line of the file. But we have multiple 
> lines, and we don't know how many, so we need a structure to manage 
> multiple lines:
> 
> struct Data_
> {
>    DataLine *line;
>    size_t num_lines;
>    size_t max_lines;
> };
> 
> typedef struct Data_ Data;
> 
> Thus, given Data d, we will have:
> 
> d---line[0]---line[1]---line[2]---...---line[n-1]
>        |         |         |               |
>      title     title      title          title
>      value[0]  value[0]   value[0]      value[0]
>      value[1]  value[1]   value[1]      value[1]
> etc.
> 
Oh, that is such a relief for me. I thought about using the 
structure that should contain structures as its members, because
that is the only useful way I could think of, which is suitable for further data
processing, given the data I have. But being sure I will get flak for only 
mentioning this idea, I have postponed bringing it up. 

<snip>
0
Alla
12/24/2016 5:31:45 AM
What happens when the name at the start of use_buffer, in the following
code that you posted, when array_names[*line_names] is not large enough 
to hold the name?

On 2016-12-23, Alla _ <modelling.data@gmail.com> wrote:
<snip>
> /** Function to parse row names **/
> int fill_row_names(char *use_buffer, size_t *line_names, size_t *index, size_t *new_names_len)
> {
>     //Allocate memory for the row
>     if((chars_memalloc(&array_names[*line_names], new_names_len)) == EXIT_FAILURE)
>         return EXIT_FAILURE;
>     
>     char *ptr_buffer = NULL;
>     ptr_buffer = use_buffer;
>     size_t names_column;
>     
>     //Parse row names from each line into 'array_names'
>     for ( names_column = 0; ( !isdigit(*ptr_buffer));
>          names_column++, ptr_buffer++, (*index)++ )
>     {
>         /* Add memory if the number of columns has reached
>          the allocated number */
>         if(names_column == (*new_names_len - 1))
>         {
>             *new_names_len *= 2;
>             //Re_allocate memory
>             if((chars_realloc(&array_names[*line_names],new_names_len)) == EXIT_FAILURE)
>                 return EXIT_FAILURE;
>         }
>         //Check if the line starts with double quotes
>         if(*ptr_buffer == '"')
>         {
>             //Skip double quotes
>             ptr_buffer++;
>             (*index)++;
>             //Store every character between double quotes
>             while(*ptr_buffer != '"')
>             {
>                 array_names[*line_names][names_column] = *ptr_buffer++;
>                 names_column++;
>                 (*index)++;
>             }
>             ptr_buffer++;
>             (*index)++;
>         }
<snip>
0
A
12/24/2016 2:49:17 PM
Reply: