Program to read csv file, store characters and numbers in separate arrays

Hello!

I am not a fan of exclamation marks or any other text-stressing effects,
so instead I am resorting to a detailed and somewhat verbose description below.
I hope it makes clear what I am trying to achieve, which I often
fail to do )

Please take a look at the program, which:
1) opens the csv file whose name is provided by the user or, if the user has not
provided a file, opens a specific file (my test file); I will post the contents
of two test csv files below the program;

2) reads the contents of the file line by line, and counts the length of each line;

3) while reading each line:
     - stores character values in array_names;
     - converts numeric characters into integers and stores those in array_values;

4) I use printf as a debugging tool to check how each step of the program
performs; all these calls are separated from the program text by additional
newlines (yes, making the program vertically longer; sorry) and by comments
in capital letters.

Important part: 
(1) please, please, please, do me a great favor and first read the
program to the end before you start saying how bad everything is and what
horrific code I have produced yet again, and don't stop at the first
lines of the code; the output I get is below the program and both csv inputs;
(2) please note that I have decided to present this as one whole piece of
code without separating the program into functions; I will definitely separate
it later - I plan to move all the pieces into separate functions;
(3) I will use a header file, in which I will store preprocessor commands, like
#define and #include, as well as functions' declarations, etc.;
for now, (2) and (3) are part of the same file;
(4) please, ignore array and buffer sizes for now; I have used these numbers
only for this test version for certain reasons. 
(5) as I have written elsewhere, I am not testing for all possible conditions,
because there will be a separate notification for the user with a description
of the csv file parameters required by the program; this is a student's program,
therefore I am making lots of assumptions and standardizing the input;
(6) I am working on certain improvements to this part, and will post them 
later on; those are regarding the way I store values from the buffer. 
(7) lengths of each line: I use strlen to see what is happening with the length
of the line; I have two files - the first one is the one I have converted and it 
has two additional, yet undetected values at the end of each line before the 
'\0'; another file, which is definitely a correct csv file, has been kindly 
converted by Richard, and this one prints out additional 0 at the end of 
each line upon converting numerical characters into integers.

Thank you very much!

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>


#define BUFFER_SIZE 100
//#define ARRAY_SIZE(a) (sizeof(a)/sizeof(*(a)))
#define INCOME_STATEMENT "files/income_test.csv"

int main(int argc, char *argv[])
{
    //check for correct number of arguments
    if(argc > 2)
    {
        printf("Usage: program_name [file_name]");
        return EXIT_FAILURE;
    }
    //determine a file to use
    char *data_file = (argc == 2)? argv[1] : INCOME_STATEMENT;
    
    FILE *fp = fopen(data_file, "r");
    
    if ( fp )
    {
        char buffer[BUFFER_SIZE] = "";
        char array_names[50][50];
        int  array_values[50][50];
        char *ptr_buffer = NULL;
        size_t i, j, k;
        
        //Read each line from the file, store in the buffer
        for ( i = 0; fgets(buffer, sizeof(buffer), fp) != NULL; ++i )
        {
            size_t buffer_length = strlen(buffer);
            
            //DEBUG: CHECK IF BUFFER IS FILLED CORRECTLY
            printf("(1) %s", buffer);
            printf("(1.1) buffer length = %zu\n", buffer_length);
            //DEBUG: CHECK IF NAMES ARE FILLED CORRECTLY
            printf("(2) ");
            
            
            //Parse row names from each line into 'array_names'
            for ( j = 0, ptr_buffer = buffer; ( !isdigit(*ptr_buffer));  j++, ptr_buffer++ )
            {
                //Check if the line starts with double quotes
                if(*ptr_buffer == '"')
                {
                    //Skip double quotes
                    ptr_buffer++;
                    //Store every character between double quotes
                    while(*ptr_buffer != '"')
                    {
                        array_names[i][j] = *ptr_buffer++;

                        //DEBUG: CHECK IF NAMES ARE FILLED CORRECTLY
                        printf("%c", array_names[i][j]);
                        j++;
                        //Decrement the length of the buffer
                        buffer_length--;
                    }
                    
                    ptr_buffer++;
                }
                
/** THE IF STATEMENT THAT CHECKS FOR DOUBLE QUOTES WILL GO INTO A SEPARATE FUNCTION **/
                if(*ptr_buffer == ',' && isdigit(*(ptr_buffer + 1)))
                    break; 
                array_names[i][j] = *ptr_buffer;
                
                //DEBUG: CHECK IF NAMES ARE FILLED CORRECTLY
                printf("%c", array_names[i][j]);
                //Decrement the length of the buffer
                buffer_length--;
            }
            array_names[i][j] = '\0';
            putchar('\n');
            
            //Move to the next character in the line beyond comma
            ptr_buffer++;
            buffer_length--;
            
            //DEBUG: CHECK BUFFER LENGTH AFTER FILLING NAMES
            printf("(2.2) buffer length = %zu\n", buffer_length);

            
            //DEBUG: CHECK IF VALUES ARE FILLED CORRECTLY
            printf("(3) ");

            //Parse the comma-separated values from each line into 'array_values'
            for ( k = 0; *ptr_buffer != '\0' /** && k < buffer_length - 2**/;
                 k++, ptr_buffer++)
            {
                array_values[i][k] = (int)strtol(ptr_buffer, &ptr_buffer,
                                                 10);
                
                //DEBUG: CHECK IF VALUES ARE FILLED CORRECTLY
                printf("%d ", array_values[i][k]);
            }
            printf("\n\n");
        }
        if (!feof(fp))
        {
            puts("Something went wrong with the provided file\n");
            return EXIT_FAILURE;
        }
        fclose(fp);
       
    }    
    //fopen() returned NULL
    else
    {
        perror(data_file);
    }
    return 0;
}

First csv file, converted by me:
Years,2011,2012,2013,2014,2015,
Sales,1062,1252,1587,1934,2519,
Cost of Goods Sold,654,814,1009,1190,1499,
  Gross Profit,408,438,578,744,1020,
SG&A,254,271,364,454,576,
  Operating Income before Depr,154,167,214,290,444,
Depreciation and Amortization,25,31,38,52,70,
  Operating Profit,129,136,176,238,374,
Interest Expense,4,3,3,1,4,
Other Gains and Losses,0,7,10,0,-1,
  Pretax Income,125,126,163,237,371,
Income Tax Expense,55,52,65,92,141,
  Net Income,70,74,98,145,230,

Second csv file (a correct one in terms of csv format):
Years,2011,2012,2013,2014,2015
Sales,1062,1252,1587,1934,2519
Cost of Goods Sold,654,814,1009,1190,1499
  Gross Profit,408,438,578,744,1020
"Selling, General, and Admin Exp",254,271,364,454,576
  Operating Income before Depr,154,167,214,290,444
Depreciation and Amortization,25,31,38,52,70
  Operating Profit,129,136,176,238,374
Interest Expense,4,3,3,1,4
Other Gains and Losses,0,7,10,0,-1
  Pretax Income,125,126,163,237,371
Income Tax Expense,55,52,65,92,141
  Net Income,70,74,98,145,230

First output (I show only the beginning of each output to shorten
the text):

(1) Years,2011,2012,2013,2014,2015,
(1.1) buffer length = 32
(2) Years
(2.2) buffer length = 26
(3) 2011 2012 2013 2014 2015 0 

(1) Sales,1062,1252,1587,1934,2519,
(1.1) buffer length = 32
(2) Sales
(2.2) buffer length = 26
(3) 1062 1252 1587 1934 2519 0 

(1) Cost of Goods Sold,654,814,1009,1190,1499,
(1.1) buffer length = 44
(2) Cost of Goods Sold
(2.2) buffer length = 25
(3) 654 814 1009 1190 1499 0 0 

(1)   Gross Profit,408,438,578,744,1020,
(1.1) buffer length = 38
(2)   Gross Profit
(2.2) buffer length = 23
(3) 408 438 578 744 1020 0 0 

(1) SG&A,254,271,364,454,576,
(1.1) buffer length = 27
(2) SG&A
(2.2) buffer length = 22
(3) 254 271 364 454 576 0 0 

etc.

Second output:
(1) Years,2011,2012,2013,2014,2015
(1.1) buffer length = 32
(2) Years
(2.2) buffer length = 26
(3) 2011 2012 2013 2014 2015 0 

(1) Sales,1062,1252,1587,1934,2519
(1.1) buffer length = 32
(2) Sales
(2.2) buffer length = 26
(3) 1062 1252 1587 1934 2519 0 

(1) Cost of Goods Sold,654,814,1009,1190,1499
(1.1) buffer length = 43
(2) Cost of Goods Sold
(2.2) buffer length = 24
(3) 654 814 1009 1190 1499 0 

(1)   Gross Profit,408,438,578,744,1020
(1.1) buffer length = 37
(2)   Gross Profit
(2.2) buffer length = 22
(3) 408 438 578 744 1020 0 

(1) "Selling, General, and Admin Exp",254,271,364,454,576
(1.1) buffer length = 55
(2) Selling, General, and Admin Exp
(2.2) buffer length = 23
(3) 254 271 364 454 576 0 

etc.
Alla
12/15/2016 1:05:12 PM
comp.lang.c


Alla _ <modelling.data@gmail.com> writes:
[...]
> I always use only the following flag, which you have taught me to use:
> gcc -Wall -Werror -Wextra -pedantic -std=c99 program.c -o program
> Shall I use some more?

That's fine.

Warning: The following could be a distraction, and you can consider
ignoring it.

You could consider using -std=c11 rather than -std=c99 if you
have a sufficiently new version of gcc, but unless you're using
C11-specific features (which you probably aren't) it doesn't matter.
(With recent versions, say gcc 6.0 and up, C11 support is roughly
as good as C99 support.  Also, the default is "-std=gnu11" rather
than the "-std=gnu90" of older releases.  But the default doesn't
matter if you're specifying "-std=..." anyway.)

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
Keith
12/15/2016 1:01:01 AM
On 15/12/16 13:05, Alla _ wrote:
> Hello!
>
> (7) lengths of each line: I use strlen to see what is happening with the length
> of the line; I have two files - the first one is the one I have converted and it
> has two additional, yet undetected values at the end of each line before the
> '\0'; another file, which is definitely a correct csv file, has been kindly
> converted by Richard, and this one prints out additional 0 at the end of
> each line upon converting numerical characters into integers.

In this article, I'm only going to explain that trailing 0. And I'm 
going to try to do it in such a way that you can find the problem for 
yourself.

Here is a sample from the file I sent you:

Income Statement,2011,2012,2013,2014,2015

Here is a sample from the file that is causing you problems with a 
trailing 0:

Years,2011,2012,2013,2014,2015,

Let's compare them directly:

Income Statement,2011,2012,2013,2014,2015
Years,2011,2012,2013,2014,2015,

Now let's ignore the first field of each:

2011,2012,2013,2014,2015
2011,2012,2013,2014,2015,


Do you see the difference? It's very slight, but it's very significant. 
Knowing the difference will, I think, enable you to see the problem, and 
it's a problem with your data, not your program. (Which isn't to say 
that the problem could not be dealt with in code. But in this case, I 
think it would be far simpler just to fix the data.)
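
For readers who would rather deal with it in code, here is a minimal sketch of one way to do so. It relies only on the documented behaviour of strtol: when no digits can be converted, it returns 0 and leaves the end pointer equal to the start pointer, so an empty field at the end of a line can be told apart from a genuine 0. The same check also stops at a carriage return left over from Windows-style line endings. The function name parse_values and its parameters are invented for this illustration; they are not part of the program under review.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not from the original program): parse comma-separated
 * integers from 'text' into 'values', storing at most 'max' of them, and
 * return how many were stored. Parsing stops at the first empty field,
 * so nothing is stored for the field after a trailing comma. */
size_t parse_values(const char *text, int *values, size_t max)
{
    size_t count = 0;

    assert(text != NULL && values != NULL);

    while (count < max && *text != '\0')
    {
        char *end = NULL;
        long value = strtol(text, &end, 10);

        if (end == text)
        {
            /* No digits were converted: empty field or end of line. */
            break;
        }
        values[count++] = (int)value;

        /* Step over a single separating comma, if there is one. */
        text = (*end == ',') ? end + 1 : end;
    }
    return count;
}
```

Called on the text that follows the first field, this stores five values for both sample lines above, with or without the final comma.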

If I get time, I'll post code annotations separately.

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
Richard
12/15/2016 1:48:15 PM
[the code]

On 15/12/16 13:05, Alla _ wrote:

> (3) I will use a header file, in which I will store preprocessor commands, like
> #define and #include, as well as functions' declarations, etc.;

A well-designed header contains *only* the following:

1) inclusion guards;
2) macro definitions (#defines);
3) type definitions (typedefs);
4) function prototypes;
5) comments;
6) #include directives *ONLY* for those headers that are required for 
the immediate purposes of the header itself.

For example, here's a quick header I threw together a year or so ago:

/* CISPRNG - Cryptographically In-Secure
  * Pseudo-Random Number Generator
  *
  * Written by Richard Heathfield,
  * 20 October 2015.
  *
  * You will also need cisprng.c and a
  * C compiler.
  *
  * The author places this code into the
  * public domain, and so it may be freely
  * used for any legal purpose, although
  * I would recommend against using it
  * in circumstances where you really need
  * a cryptographically secure PRNG, because
  * it isn't one.
  */
#ifndef H_CISPRNG_H_
#define H_CISPRNG_H_

#define CISPRNG_RAND_MAX (0x7FFFFFFFUL)

/* Single state versions */
/* set the seed */
void cisprng_set(unsigned long seed);
/* get a PRN */
unsigned long cisprng_random(void);
/* get a PRN in a range */
unsigned long cisprng_range(unsigned long low,
                             unsigned long high);
/* Custom state versions */
unsigned long cisprng_csrandom(unsigned long *pseed);
unsigned long cisprng_csrange(unsigned long *pseed,
                               unsigned long low,
                               unsigned long high);
#endif
/* end of cisprng.h */

Note that it doesn't define any types. But if it did, they would go 
after the #defines but before the function prototypes. Nor does it 
include any headers. But *if* I'd used, say, size_t *in the header*, I'd 
have included <stddef.h> in the header. But I would *not* include 
<stddef.h> in the header just because the corresponding C file used 
size_t (which, actually, it doesn't). A module header is not a place to 
hide #includes.

> #include <stdio.h>
> #include <stdlib.h>
> #include <ctype.h>
> #include <string.h>
>
>
> #define BUFFER_SIZE 100
> //#define ARRAY_SIZE(a) (sizeof(a)/sizeof(*(a)))
> #define INCOME_STATEMENT "files/income_test.csv"
>
> int main(int argc, char *argv[])
> {
>     //check for correct number of arguments
>     if(argc > 2)
>     {
>         printf("Usage: program_name [file_name]");

You might want a \n on there. Or use puts() instead if you prefer.

Otherwise, this happens:

Usage: program_name [file_name]me@mymachine$

which probably isn't what you wanted to see.

>         return EXIT_FAILURE;
>     }
>     //determine a file to use
>     char *data_file = (argc == 2)? argv[1] : INCOME_STATEMENT;

I don't suppose you plan to change the filename, so why not make this const?

const char *data_file = (argc == 2)? argv[1] : INCOME_STATEMENT;

This will remove the *only* diagnostic message that gcc gave me (and by 
the way, to get only one warning when using my flag choices is quite an 
achievement!).

What should the program do if there are several arguments? At the 
moment, if I tell it:

./alla test1.csv test2.csv

I get this message:

Usage: program_name [file_name]

If the program can only handle one data file, I would expect to see 
something like:

Sorry, I can only process one file. Ignoring subsequent arguments...

which you would code like this:

const char *data_file = (argc > 1)? argv[1] : INCOME_STATEMENT;
if(argc > 2)
{
   fputs("Sorry, I can only process one file."
         " Ignoring subsequent arguments...\n",
         stderr);
}

>
>     FILE *fp = fopen(data_file, "r");
>
>     if ( fp )
>     {
>         char buffer[BUFFER_SIZE] = "";
>         char array_names[50][50];
>         int  array_values[50][50];

I am taking you at your word and forbearing to comment on these values, 
but for the sake of a predictable life I would suggest that you 
initialise these two arrays:

char array_names[50][50] = {0};
int  array_values[50][50] = {0};

which sets the first element to 0 explicitly and, because elements without 
an explicit initialiser are initialised as if they had static storage 
duration, sets everything else to 0 as well.
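
A small sketch to check that claim, assuming nothing beyond standard C: only element [0][0] is given an explicit initialiser, and the function reports whether every element of the array reads back as zero. The names ROWS, COLS, all_zero and brace_zero_fills_whole_array are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 50
#define COLS 50

/* Return 1 if every element of a rows-by-COLS table is zero, else 0. */
static int all_zero(int (*table)[COLS], size_t rows)
{
    size_t r, c;

    assert(table != NULL);

    for (r = 0; r < rows; r++)
        for (c = 0; c < COLS; c++)
            if (table[r][c] != 0)
                return 0;
    return 1;
}

/* Only element [0][0] has an explicit initialiser; the remaining
 * elements are zeroed implicitly. Returns 1 when that holds. */
int brace_zero_fills_whole_array(void)
{
    int array_values[ROWS][COLS] = {0};

    return all_zero(array_values, ROWS);
}
```

Without the "= {0}", the same local array would hold indeterminate values and the check would be meaningless.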

>         char *ptr_buffer = NULL;
>         size_t i, j, k;
>
>         //Read each line from the file, store in the buffer
>         for ( i = 0; fgets(buffer, sizeof(buffer), fp) != NULL; ++i )

Since i isn't part of the loop control logic, I would suggest that you 
replace this line with the much more usual:

   i = 0;
   while(fgets(buffer, sizeof buffer, fp) != NULL)
   {
     /* stuff */
     i++;
   }

And if i is a line number, why not call it line_number? That's much more 
descriptive than i.

The loop body looked more or less okay to me, except that it's a touch 
on the large side. Think about functional decomposition.
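
As one possible starting point for that decomposition, here is a sketch of the name-parsing step pulled out into its own function. It is an illustration under stated assumptions, not the program's actual design: the name parse_name, its parameters and its return convention are invented here, and it handles only a simple pair of enclosing double quotes, not escaped quotes inside a field.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy the first CSV field of 'line' into 'name'
 * (capacity 'size', including the terminating '\0'), dropping an optional
 * pair of enclosing double quotes. Characters that do not fit are skipped.
 * Returns a pointer to the character after the field's separating comma,
 * or to where the field ended if there was no comma. */
const char *parse_name(const char *line, char *name, size_t size)
{
    size_t n = 0;
    int quoted;

    assert(line != NULL && name != NULL && size > 0);

    quoted = (*line == '"');
    if (quoted)
        line++;                       /* skip the opening quote */

    while (*line != '\0' && *line != '\n' && *line != '\r')
    {
        /* A quoted field ends at the closing quote, a plain one at a comma. */
        if (quoted ? (*line == '"') : (*line == ','))
            break;
        if (n + 1 < size)
            name[n++] = *line;
        line++;
    }
    name[n] = '\0';

    if (quoted && *line == '"')
        line++;                       /* skip the closing quote */
    if (*line == ',')
        line++;                       /* skip the separator */
    return line;
}
```

The loop in main would then shrink to a call to this function followed by the value parsing, each of which can be tested on its own.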

>         if (!feof(fp))
>         {
>             puts("Something went wrong with the provided file\n");
>             return EXIT_FAILURE;
>         }

Why not use ferror(fp) instead? It's a more direct description of what 
you're trying to find out:

   if(ferror(fp))
   {
     puts("Something went wrong with the provided file\n");
     return EXIT_FAILURE;
   }

And while we're on the subject, diagnostic messages sit much better on 
stderr than on stdout. That way, they can be separated from the 
program's normal output, like this:

./alla test1.csv > test1.out 2> test1.log

which will put stdout stuff into test1.out and stderr stuff into test1.log.
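
Putting the last few suggestions together (a while loop, a descriptive counter name, ferror, and stderr for diagnostics), a sketch might look like this. The function name read_all_lines and its return convention are assumptions made for the example; the per-line processing is left as a comment.

```c
#include <assert.h>
#include <stdio.h>

#define BUFFER_SIZE 100

/* Hypothetical helper: read fp to the end with fgets and store the number of
 * successful reads in *line_count (one per line, as long as every line fits
 * in the buffer). Returns 0 on success, or -1 after writing a message to
 * stderr if a read error occurred. */
int read_all_lines(FILE *fp, size_t *line_count)
{
    char buffer[BUFFER_SIZE];
    size_t line_number = 0;

    assert(fp != NULL && line_count != NULL);

    while (fgets(buffer, sizeof buffer, fp) != NULL)
    {
        /* process the line held in buffer here */
        line_number++;
    }
    *line_count = line_number;

    if (ferror(fp))
    {
        fputs("Something went wrong with the provided file\n", stderr);
        return -1;
    }
    return 0;
}
```

Because the message goes to stderr, it ends up in the log file rather than in the redirected normal output.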

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
Richard
12/15/2016 2:12:44 PM
On 15/12/16 15:12, Richard Heathfield wrote:
> [the code]
> 
> On 15/12/16 13:05, Alla _ wrote:
> 
>> (3) I will use a header file, in which I will store preprocessor
>> commands, like
>> #define and #include, as well as functions' declarations, etc.;
> 
> A well-designed header contains *only* the following:
> 
> 1) inclusion guards;
> 2) macro definitions (#defines);
> 3) type definitions (typedefs);
> 4) function prototypes;
> 5) comments;
> 6) #include directives *ONLY* for those headers that are required for
> the immediate purposes of the header itself.
> 
> For example, here's a quick header I threw together a year or so ago:
> 
> /* CISPRNG - Cryptographically In-Secure
>  * Pseudo-Random Number Generator
>  *
>  * Written by Richard Heathfield,
>  * 20 October 2015.
>  *
>  * You will also need cisprng.c and a
>  * C compiler.
>  *
>  * The author places this code into the
>  * public domain, and so it may be freely
>  * used for any legal purpose, although
>  * I would recommend against using it
>  * in circumstances where you really need
>  * a cryptographically secure PRNG, because
>  * it isn't one.
>  */
> #ifndef H_CISPRNG_H_
> #define H_CISPRNG_H_
> 
> #define CISPRNG_RAND_MAX (0x7FFFFFFFUL)
> 
> /* Single state versions */
> /* set the seed */
> void cisprng_set(unsigned long seed);
> /* get a PRN */
> unsigned long cisprng_random(void);
> /* get a PRN in a range */
> unsigned long cisprng_range(unsigned long low,
>                             unsigned long high);
> /* Custom state versions */
> unsigned long cisprng_csrandom(unsigned long *pseed);
> unsigned long cisprng_csrange(unsigned long *pseed,
>                               unsigned long low,
>                               unsigned long high);
> #endif
> /* end of cisprng.h */
> 
> Note that it doesn't define any types. But if it did, they would go
> after the #defines but before the function prototypes. Nor does it
> include any headers. But *if* I'd used, say, size_t *in the header*, I'd
> have included <stddef.h> in the header. But I would *not* include
> <stddef.h> in the header just because the corresponding C file used
> size_t (which, actually, it doesn't). A module header is not a place to
> hide #includes.
> 

I hope I am not going to cause too much confusion to a learner here, but
I have a few comments about your rules for headers.  It may be that you
agree with me, and just did not write your complete rule set.

I think it is important to emphasise that you should only have macros,
types, function prototypes, etc., in a header if these declarations will
be used by other modules.  Anything that is used only within the
implementation C file, should be in that file alone - and not the
header.  So if you have a function "cisprng_calculate" that is not
exported, it should not be declared in the header file, and it should be
declared "static" in the C file.

Regarding macros, I would always limit the use of macros to when you
actually need them to be macros.  If usage of CISPRNG_RAND_MAX permits,
I would prefer:

static const unsigned long int cisprng_rand_max = 0x7ffffffful;

The rules of "const" objects in C mean that you often /do/ need to make
such things #define'd values rather than static consts.

Assuming you are using C99 or C11 rather than C90 (I know some people
still use C90), then "static inline" functions are often a better
alternative to function-like macros, and can go in headers as well.
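
A small sketch of both points, assuming a C99 or later compiler. All of the names here (NAME_SIZE, name_size, SQUARE_MACRO, square, set_name_char) are invented for the illustration. It shows one place where a macro is still needed, the size of a file-scope array, and a static inline function standing in for a function-like macro.

```c
#include <assert.h>
#include <stddef.h>

#define NAME_SIZE 50                    /* macro: an integer constant expression */
static const size_t name_size = 50;     /* const object: not a constant expression in C */

static char name[NAME_SIZE];            /* fine: the size is a constant expression */
/* static char bad[name_size];             rejected: a file-scope array needs a constant size */

#define SQUARE_MACRO(x) ((x) * (x))     /* function-like macro: evaluates x twice */

/* C99 alternative: the argument is evaluated once and has a declared type. */
static inline unsigned long square(unsigned long x)
{
    return x * x;
}

/* Store c at index i of name; in ordinary expressions such as this
 * bounds check, the const object works perfectly well. */
void set_name_char(size_t i, char c)
{
    assert(i < name_size);
    name[i] = c;
}
```

The difference between the macro and the inline function only shows up with arguments that have side effects, such as SQUARE_MACRO(i++), which is one reason to prefer the function when the compiler allows it.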


I would also put more emphasis on the comments.  Ideally, the comments
in a header should tell the reader all they need to know about the
functions and how to use them.  It makes it easier for people using the
code - they don't have to look elsewhere for documentation.  Of course,
there are limits - comments should not be so extensive that it is hard
to see the code!

David
12/15/2016 3:14:21 PM
On 15/12/16 15:14, David Brown wrote:
> On 15/12/16 15:12, Richard Heathfield wrote:
>> [the code]
>>
>> On 15/12/16 13:05, Alla _ wrote:
>>
>>> (3) I will use a header file, in which I will store preprocessor
>>> commands, like
>>> #define and #include, as well as functions' declarations, etc.;
>>
>> A well-designed header contains *only* the following:
>>
>> 1) inclusion guards;
>> 2) macro definitions (#defines);
>> 3) type definitions (typedefs);
>> 4) function prototypes;
>> 5) comments;
>> 6) #include directives *ONLY* for those headers that are required for
>> the immediate purposes of the header itself.
>>
>> For example, here's a quick header I threw together a year or so ago:
>>
>> /* CISPRNG - Cryptographically In-Secure
>>  * Pseudo-Random Number Generator
>>  *
>>  * Written by Richard Heathfield,
>>  * 20 October 2015.
>>  *
>>  * You will also need cisprng.c and a
>>  * C compiler.
>>  *
>>  * The author places this code into the
>>  * public domain, and so it may be freely
>>  * used for any legal purpose, although
>>  * I would recommend against using it
>>  * in circumstances where you really need
>>  * a cryptographically secure PRNG, because
>>  * it isn't one.
>>  */
>> #ifndef H_CISPRNG_H_
>> #define H_CISPRNG_H_
>>
>> #define CISPRNG_RAND_MAX (0x7FFFFFFFUL)
>>
>> /* Single state versions */
>> /* set the seed */
>> void cisprng_set(unsigned long seed);
>> /* get a PRN */
>> unsigned long cisprng_random(void);
>> /* get a PRN in a range */
>> unsigned long cisprng_range(unsigned long low,
>>                             unsigned long high);
>> /* Custom state versions */
>> unsigned long cisprng_csrandom(unsigned long *pseed);
>> unsigned long cisprng_csrange(unsigned long *pseed,
>>                               unsigned long low,
>>                               unsigned long high);
>> #endif
>> /* end of cisprng.h */
>>
>> Note that it doesn't define any types. But if it did, they would go
>> after the #defines but before the function prototypes. Nor does it
>> include any headers. But *if* I'd used, say, size_t *in the header*, I'd
>> have included <stddef.h> in the header. But I would *not* include
>> <stddef.h> in the header just because the corresponding C file used
>> size_t (which, actually, it doesn't). A module header is not a place to
>> hide #includes.
>>
>
> I hope I am not going to cause too much confusion to a learner here, but
> I have a few comments about your rules for headers.  It may be that you
> agree with me, and just did not write your complete rule set.

I think we might be able to save Alla some confusion here by suggesting 
that she avert her eyes. :-)

> I think it is important to emphasise that you should only have macros,
> types, function prototypes, etc., in a header if these declarations will
> be used by other modules.

Yes, but.

With me so far? :-)

I was going to dig out a quotation from King Lear about not taking 
things far enough, but it turned out not to be quite so apt as I'd 
hoped. Oh well.

Module, right? Well, we have to say what we mean by "module". We might 
reasonably define it as a source and its associated header, in which 
case I think your point is fine. But we might also define it as a set of 
related functions, which might be spread over several sources. If we 
define it that way (and, frankly, I do), then it becomes reasonable to 
think of a module as having two headers: one public, and one private. 
The private header is shared across all the sources that go to make up 
that module, and the public header is shared across other modules.

So, for example, we might have an abstract data type that is defined in 
mylittlemoduleinternal.h and shared across mylittlemoduleA.c, 
mylittlemoduleB.c, and mylittlemoduleC.c. And then we have a bunch of 
prototypes listed in mylittlemodule.h, for sharing with other modules.

Would you agree with that?

> Anything that is used only within the
> implementation C file, should be in that file alone - and not the
> header.  So if you have a function "cisprng_calculate" that is not
> exported, it should not be declared in the header file, and it should be
> declared "static" in the C file.

But if it is needed by other sources within the same module, that gives 
us a problem. One possible solution is to share a function pointer in 
the private header --- but it's messy.

> Regarding macros, I would always limit the use of macros to when you
> actually need them to be macros.  If usage of CISPRNG_RAND_MAX permits,
> I would prefer:
>
> static const unsigned long int cisprng_rand_max = 0x7ffffffful;

Yeah, but this is C90, and it's an array size. :-)

Okay, obviously it isn't. But I'm sure you take my point. That one could 
indeed have been a const, but that isn't always possible.

> The rules of "const" objects in C mean that you often /do/ need to make
> such things #define'd values rather than static consts.

Yes.

> Assuming you are using C99 or C11 rather than C90 (I know some people
> still use C90),

You hear that, folks? David thinks I'm *some people*! Fame at last!

> then "static inline" functions are often a better
> alternative to function-like macros, and can go in headers as well.
>
> I would also put more emphasis on the comments.  Ideally, the comments
> in a header should tell the reader all they need to know about the
> functions and how to use them.

Up to a point, Lord Copper.

> It makes it easier for people using the
> code - they don't have to look elsewhere for documentation.  Of course,
> there are limits - comments should not be so extensive that it is hard
> to see the code!

And that's why we have external docs.

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
Richard
12/15/2016 3:37:55 PM
On Thursday, December 15, 2016 at 4:48:30 PM UTC+3, Richard Heathfield wrote:
> On 15/12/16 13:05, Alla _ wrote:
> > Hello!
> >
> > (7) lengths of each line: I use strlen to see what is happening with the length
> > of the line; I have two files - the first one is the one I have converted and it
> > has two additional, yet undetected values at the end of each line before the
> > '\0'; another file, which is definitely a correct csv file, has been kindly
> > converted by Richard, and this one prints out additional 0 at the end of
> > each line upon converting numerical characters into integers.
> 
> In this article, I'm only going to explain that trailing 0. And I'm 
> going to try to do it in such a way that you can find the problem for 
> yourself.
> 
> Here is a sample from the file I sent you:
> 
> Income Statement,2011,2012,2013,2014,2015
> 
> Here is a sample from the file that is causing you problems with a 
> trailing 0:
> 
> Years,2011,2012,2013,2014,2015,
> 
> Let's compare them directly:
> 
> Income Statement,2011,2012,2013,2014,2015
> Years,2011,2012,2013,2014,2015,
> 
> Now let's ignore the first field of each:
> 
> 2011,2012,2013,2014,2015
> 2011,2012,2013,2014,2015,
> 
> 
> Do you see the difference? It's very slight, but it's very significant. 
> Knowing the difference will, I think, enable you to see the problem, and 
> it's a problem with your data, not your program. (Which isn't to say 
> that the problem could not be dealt with in code. But in this case, I 
> think it would be far simpler just to fix the data.)
> 
Thank you very much. Yes, I see the difference in that trailing comma. But the file
that you have sent me (which is the second example I use; I have only
corrected the name of the row in that file - years instead of income 
statement) still produces one extra zero, as can be seen in the output. 
Where could that zero come from?
Alla
12/15/2016 4:22:10 PM
On Thursday, December 15, 2016 at 5:12:52 PM UTC+3, Richard Heathfield wrote:
> [the code]
> 
> On 15/12/16 13:05, Alla _ wrote:
> 
> > (3) I will use a header file, in which I will store preprocessor commands, like
> > #define and #include, as well as functions' declarations, etc.;
> 
> A well-designed header contains *only* the following:
> 
> 1) inclusion guards;
> 2) macro definitions (#defines);
> 3) type definitions (typedefs);
> 4) function prototypes;
Thank you very much for this list. This is what I have planned 
to include.
> 5) comments;
> 6) #include directives *ONLY* for those headers that are required for 
> the immediate purposes of the header itself.
Thank you - I have seen somewhere that all headers are sometimes
included in a single header file (not only those directly needed
by that header file itself), which sounded strange;
thank you for confirming that this is an incorrect approach, so I will
not repeat that bad practice. 
> 
> For example, here's a quick header I threw together a year or so ago:
> 
> /* CISPRNG - Cryptographically In-Secure
>   * Pseudo-Random Number Generator
>   *
>   * Written by Richard Heathfield,
>   * 20 October 2015.
>   *
>   * You will also need cisprng.c and a
>   * C compiler.
>   *
>   * The author places this code into the
>   * public domain, and so it may be freely
>   * used for any legal purpose, although
>   * I would recommend against using it
>   * in circumstances where you really need
>   * a cryptographically secure PRNG, because
>   * it isn't one.
>   */
> #ifndef H_CISPRNG_H_
> #define H_CISPRNG_H_
> 
> #define CISPRNG_RAND_MAX (0x7FFFFFFFUL)
> 
> /* Single state versions */
> /* set the seed */
> void cisprng_set(unsigned long seed);
> /* get a PRN */
> unsigned long cisprng_random(void);
> /* get a PRN in a range */
> unsigned long cisprng_range(unsigned long low,
>                              unsigned long high);
> /* Custom state versions */
> unsigned long cisprng_csrandom(unsigned long *pseed);
> unsigned long cisprng_csrange(unsigned long *pseed,
>                                unsigned long low,
>                                unsigned long high);
> #endif
> /* end of cisprng.h */
> 
> Note that it doesn't define any types. But if it did, they would go 
> after the #defines but before the function prototypes. Nor does it 
> include any headers. But *if* I'd used, say, size_t *in the header*, I'd 
> have included <stddef.h> in the header. But I would *not* include 
> <stddef.h> in the header just because the corresponding C file used 
> size_t (which, actually, it doesn't). A module header is not a place to 
> hide #includes.
<snip>
Alla
12/15/2016 4:26:41 PM
On 15/12/16 16:22, Alla _ wrote:
> On Thursday, December 15, 2016 at 4:48:30 PM UTC+3, Richard Heathfield wrote:
>> On 15/12/16 13:05, Alla _ wrote:
>>> Hello!
>>>
>>> (7) lengths of each line: I use strlen to see what is happening with the length
>>> of the line; I have two files - the first one is the one I have converted and it
>>> has two additional, yet undetected values at the end of each line before the
>>> '\0'; another file, which is definitely a correct csv file, has been kindly
>>> converted by Richard, and this one prints out additional 0 at the end of
>>> each line upon converting numerical characters into integers.
>>
>> In this article, I'm only going to explain that trailing 0. And I'm
>> going to try to do it in such a way that you can find the problem for
>> yourself.
>>
>> Here is a sample from the file I sent you:
>>
>> Income Statement,2011,2012,2013,2014,2015
>>
>> Here is a sample from the file that is causing you problems with a
>> trailing 0:
>>
>> Years,2011,2012,2013,2014,2015,
>>
>> Let's compare them directly:
>>
>> Income Statement,2011,2012,2013,2014,2015
>> Years,2011,2012,2013,2014,2015,
>>
>> Now let's ignore the first field of each:
>>
>> 2011,2012,2013,2014,2015
>> 2011,2012,2013,2014,2015,
>>
>>
>> Do you see the difference? It's very slight, but it's very significant.
>> Knowing the difference will, I think, enable you to see the problem, and
>> it's a problem with your data, not your program. (Which isn't to say
>> that the problem could not be dealt with in code. But in this case, I
>> think it would be far simpler just to fix the data.)
>>
> Thank you very much. Yes, I see the difference in that trailing comma. And the file
> that you have sent me (which is the second example I use; I have only
> corrected the name of the row in that file - years instead of income
> statement) produces one extra zero as can be seen in the output.

No, it doesn't. I just tested it here, and it's fine.

I think you've got your test files muddled up.

> Where could that zero come from?

My data:
2011,2012,2013,2014,2015
Your data:
2011,2012,2013,2014,2015,

My data doesn't produce the trailing zero. Yours does. Hence there is 
something significant about that trailing comma. Hint: how does your 
parsing routine know when it has processed the last field in one record 
line?

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
0
Richard
12/15/2016 4:28:54 PM
On 15/12/2016 15:14, David Brown wrote:

> Regarding macros, I would always limit the use of macros to when you
> actually need them to be macros.  If usage of CISPRNG_RAND_MAX permits,
> I would prefer:
>
> static const unsigned long int cisprng_rand_max = 0x7ffffffful;

You said elsewhere that assembly syntax is not literature, but this is 
verging on War and Peace.

It took a couple of readings to establish it was only declaring one thing.

-- 
bartc
0
BartC
12/15/2016 4:30:59 PM
On Thursday, December 15, 2016 at 5:12:52 PM UTC+3, Richard Heathfield wrote:
> [the code]
> 
> On 15/12/16 13:05, Alla _ wrote:
> 
<snip>
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <ctype.h>
> > #include <string.h>
> >
> >
> > #define BUFFER_SIZE 100
> > //#define ARRAY_SIZE(a) (sizeof(a)/sizeof(*(a)))
> > #define INCOME_STATEMENT "files/income_test.csv"
> >
> > int main(int argc, char *argv[])
> > {
> >     //check for correct number of arguments
> >     if(argc > 2)
> >     {
> >         printf("Usage: program_name [file_name]");
> 
> You might want a \n on there. Or use puts() instead if you prefer.
> 
> Otherwise, this happens:
> 
> Usage: program_name [file_name]me@mymachine$
> 
> which probably isn't what you wanted to see.
> 
> >         return EXIT_FAILURE;
> >     }
> >     //determine a file to use
> >     char *data_file = (argc == 2)? argv[1] : INCOME_STATEMENT;
> 
> I don't suppose you plan to change the filename, so why not make this const?
> 
> const char *data_file = (argc == 2)? argv[1] : INCOME_STATEMENT;
> 
> This will remove the *only* diagnostic message that gcc gave me (and by 
> the way, to get only one warning when using my flag choices is quite an 
> achievement!).
Oh ) I tried, and very glad to read this. Thank you!
I always use only the following flags, which you have taught me to use:
gcc -Wall -Werror -Wextra -pedantic -std=c99 program.c -o program
Shall I use some more?
> 
<snip>
0
Alla
12/15/2016 4:32:33 PM
On 15/12/16 16:26, Alla _ wrote:
 > Richard Heathfield wrote:
<snip>
>> 6) #include directives *ONLY* for those headers that are required for
>> the immediate purposes of the header itself.
> Thank you - I have seen somewhere that sometimes all headers
> can be included in the single header file (not only directly needed
> for processing things within this header file), which sounded strange;
> thank you for confirming that it is an incorrect approach, so I will
> not repeat that bad practice.

It's bad practice, yes. I wouldn't go so far as to say that it is 
"incorrect" - it doesn't break the rules of the language. It's just not 
a very good idea. It's a bit like putting porridge and custard on toast.

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
0
Richard
12/15/2016 4:33:08 PM
On 15/12/16 16:30, BartC wrote:
> On 15/12/2016 15:14, David Brown wrote:
>
>> Regarding macros, I would always limit the use of macros to when you
>> actually need them to be macros.  If usage of CISPRNG_RAND_MAX permits,
>> I would prefer:
>>
>> static const unsigned long int cisprng_rand_max = 0x7ffffffful;
>
> You said elsewhere that assembly syntax is not literature, but this is
> verging on War and Peace.

"War and Peace" is 587,287 words, so even if we take every token as a 
"word", this is only 0.0000153247496 of a war. Barely even a playground 
skirmish.

> It took a couple of readings to establish it was only declaring one thing.

You need to do a lot more C programming. One learns, after a while, to 
take these things in at a single glance.

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
0
Richard
12/15/2016 4:36:56 PM
On 15/12/16 16:32, Alla _ wrote:
> On Thursday, December 15, 2016 at 5:12:52 PM UTC+3, Richard Heathfield wrote:
>> [the code]
>>
>> On 15/12/16 13:05, Alla _ wrote:
>>
> <snip>
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>> #include <ctype.h>
>>> #include <string.h>
>>>
>>>
>>> #define BUFFER_SIZE 100
>>> //#define ARRAY_SIZE(a) (sizeof(a)/sizeof(*(a)))
>>> #define INCOME_STATEMENT "files/income_test.csv"
>>>
>>> int main(int argc, char *argv[])
>>> {
>>>     //check for correct number of arguments
>>>     if(argc > 2)
>>>     {
>>>         printf("Usage: program_name [file_name]");
>>
>> You might want a \n on there. Or use puts() instead if you prefer.
>>
>> Otherwise, this happens:
>>
>> Usage: program_name [file_name]me@mymachine$
>>
>> which probably isn't what you wanted to see.
>>
>>>         return EXIT_FAILURE;
>>>     }
>>>     //determine a file to use
>>>     char *data_file = (argc == 2)? argv[1] : INCOME_STATEMENT;
>>
>> I don't suppose you plan to change the filename, so why not make this const?
>>
>> const char *data_file = (argc == 2)? argv[1] : INCOME_STATEMENT;
>>
>> This will remove the *only* diagnostic message that gcc gave me (and by
>> the way, to get only one warning when using my flag choices is quite an
>> achievement!).
> Oh ) I tried, and very glad to read this. Thank you!
> I always use only the following flags, which you have taught me to use:
> gcc -Wall -Werror -Wextra -pedantic -std=c99 program.c -o program
> Shall I use some more?

That's plenty for now.

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
0
Richard
12/15/2016 4:37:52 PM
On Thursday, December 15, 2016 at 7:29:03 PM UTC+3, Richard Heathfield wrote:
> On 15/12/16 16:22, Alla _ wrote:
> > On Thursday, December 15, 2016 at 4:48:30 PM UTC+3, Richard Heathfield wrote:
> >> On 15/12/16 13:05, Alla _ wrote:
> >>> Hello!
> >>><snip>
> >>
> > Thank you very much. Yes, I see the difference in that trailing comma. And the file
> > that you have sent me (which is the second example I use; I have only
> > corrected the name of the row in that file - years instead of income
> > statement) produces one extra zero as can be seen in the output.
> 
> No, it doesn't. I just tested it here, and it's fine.
> 
> I think you've got your test files muddled up.
> 
> > Where could that zero come from?
> 
> My data:
> 2011,2012,2013,2014,2015
> Your data:
> 2011,2012,2013,2014,2015,
> 
Please, take a look at two versions of csv file I have posted in the first message:
the first one does have that trailing comma, and the second doesn't.
The first version has two trailing zeros, the second has one trailing zero;
I think that the zero I see when processing your file comes from the 
newline character, which is all right because I don't stop the loop before 
the '\n'. 
I will look at all this again.
> My data doesn't produce the trailing zero. Yours does. Hence there is 
> something significant about that trailing comma. Hint: how does your 
> parsing routine know when it has processed the last field in one record 
> line?
I use the null terminating character as the end of the line; initially I wanted
to use the newline character, but had problems with the condition; I didn't
get the correct output. That is why I have opened the csv topic in another 
thread. I have seen your hex program, so I have to give it a shot to see 
what's happening with my conversions. 
> 
> -- 
> Richard Heathfield
> Email: rjh at cpax dot org dot uk
> "Usenet is a strange place" - dmr 29 July 1999
> Sig line 4 vacant - apply within
0
Alla
12/15/2016 5:25:34 PM
On 15/12/16 17:25, Alla _ wrote:
<snip>
> Please, take a look at two versions of csv file I have posted in the first message:
> the first one does have that trailing comma, and the second doesn't.
> The first version has two trailing zeros, the second has one trailing zero;
> I think that the zero I see when processing your file comes from the
> newline character, which is all right because I don't stop the loop before
> the '\n'.

I'm going to show you four sets of test data, and four sets of results.

The first set of test data is the CSV file I sent you.
The second set is your first CSV file (the uppermost one in your article).
The third set is your second CSV file (the lowermost one in your article).
The fourth set is a CSV file that I have modified to illustrate a point, 
which we'll come to in its proper place.

First, the test data I sent you:

++++ inc_st_test.csv ++++
Income Statement,2011,2012,2013,2014,2015
Sales,1062,1252,1587,1934,2519
Cost of Goods Sold,654,814,1009,1190,1499
   Gross Profit,408,438,578,744,1020
"Selling, General, and Admin Exp",254,271,364,454,576
   Operating Income before Depr,154,167,214,290,444
Depreciation and Amortization,25,31,38,52,70
   Operating Profit,129,136,176,238,374
Interest Expense,4,3,3,1,4
Other Gains and Losses,0,7,10,0,-1
   Pretax Income,125,126,163,237,371
Income Tax Expense,55,52,65,92,141
   Net Income,70,74,98,145,230
++++ end of inc_st_test.csv ++++

Results (first 5 lines only):
(1) Income Statement,2011,2012,2013,2014,2015
(1.1) buffer length = 42
(2) Income Statement
(2.2) buffer length = 25
(3) 2011 2012 2013 2014 2015

Note the ABSENCE of trailing 0s in those results.

Now for the first file you posted. Note the trailing commas.

++++ test1.csv ++++
Years,2011,2012,2013,2014,2015,
Sales,1062,1252,1587,1934,2519,
Cost of Goods Sold,654,814,1009,1190,1499,
   Gross Profit,408,438,578,744,1020,
SG&A,254,271,364,454,576,
   Operating Income before Depr,154,167,214,290,444,
Depreciation and Amortization,25,31,38,52,70,
   Operating Profit,129,136,176,238,374,
Interest Expense,4,3,3,1,4,
Other Gains and Losses,0,7,10,0,-1,
   Pretax Income,125,126,163,237,371,
Income Tax Expense,55,52,65,92,141,
   Net Income,70,74,98,145,230,
++++ end of test1.csv ++++

Results (first 5 lines only):
(1) Years,2011,2012,2013,2014,2015,
(1.1) buffer length = 32
(2) Years
(2.2) buffer length = 26
(3) 2011 2012 2013 2014 2015 0

Note the trailing 0.

Now for the second CSV file you posted, with NO trailing commas:

++++ test2.csv ++++
Years,2011,2012,2013,2014,2015
Sales,1062,1252,1587,1934,2519
Cost of Goods Sold,654,814,1009,1190,1499
   Gross Profit,408,438,578,744,1020
"Selling, General, and Admin Exp",254,271,364,454,576
   Operating Income before Depr,154,167,214,290,444
Depreciation and Amortization,25,31,38,52,70
   Operating Profit,129,136,176,238,374
Interest Expense,4,3,3,1,4
Other Gains and Losses,0,7,10,0,-1
   Pretax Income,125,126,163,237,371
Income Tax Expense,55,52,65,92,141
   Net Income,70,74,98,145,230
++++ end of test2.csv ++++

Again, note that there are no trailing commas.

Results (first 5 lines only):
(1) Years,2011,2012,2013,2014,2015
(1.1) buffer length = 31
(2) Years
(2.2) buffer length = 25
(3) 2011 2012 2013 2014 2015

Note the ABSENCE of trailing 0s.

Now for an experiment.

For my fourth test, I will deliberately place some extra commas at the 
end of each line.

++++ test3.csv ++++
Years,2011,2012,2013,2014,2015,,,
Sales,1062,1252,1587,1934,2519,,,
Cost of Goods Sold,654,814,1009,1190,1499,,,
   Gross Profit,408,438,578,744,1020,,,
SG&A,254,271,364,454,576,,,
   Operating Income before Depr,154,167,214,290,444,,,
Depreciation and Amortization,25,31,38,52,70,,,
   Operating Profit,129,136,176,238,374,,,
Interest Expense,4,3,3,1,4,,,
Other Gains and Losses,0,7,10,0,-1,,,
   Pretax Income,125,126,163,237,371,,,
Income Tax Expense,55,52,65,92,141,,,
   Net Income,70,74,98,145,230,,,
++++ end of test3.csv ++++

Note that there are THREE trailing commas at the end of each line. 
Prediction: 3 trailing 0s.

Results (first 5 lines only):
(1) Years,2011,2012,2013,2014,2015,,,
(1.1) buffer length = 34
(2) Years
(2.2) buffer length = 28
(3) 2011 2012 2013 2014 2015 0 0 0

Three trailing 0s. Hypothesis confirmed.

The file I sent you has NO trailing commas, and NO trailing 0s.

The file that I doctored to have THREE trailing commas has THREE 
trailing 0s.

I conclude that each trailing comma produces a trailing 0 to match.
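
One plausible mechanism, sketched under assumptions (I am not claiming this is what the posted program does line for line): if the parser converts whatever follows each comma without checking that any digits were actually consumed, an empty field comes out as 0. The helper name parse_ints below is hypothetical. It uses the end pointer from strtol to tell a real number from an empty field, and skips the empty ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not from the original program): extract the
 * integers from a comma-separated record, skipping empty fields
 * instead of storing them as 0. Returns the number of values stored. */
static size_t parse_ints(const char *line, int *out, size_t max)
{
    size_t n = 0;
    const char *p = line;

    assert(line != NULL && out != NULL);

    while (*p != '\0' && *p != '\n' && n < max) {
        char *end;
        long v = strtol(p, &end, 10);

        if (end != p) {
            /* strtol consumed at least one digit: a real value */
            out[n++] = (int)v;
        }
        /* if end == p the field held no number; an unchecked
         * conversion such as atoi() would have yielded 0 here */

        p = end;
        while (*p != '\0' && *p != '\n' && *p != ',') {
            p++;                /* skip anything up to the next comma */
        }
        if (*p == ',') {
            p++;                /* step over the separator */
        }
    }
    return n;
}
```

Called on a line with no trailing comma, one trailing comma, or three trailing commas, this stores the same five values each time, because the empty fields are never stored.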

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
0
Richard
12/15/2016 7:31:16 PM
Richard Heathfield <rjh@cpax.org.uk> writes:
> [the code]
>
> On 15/12/16 13:05, Alla _ wrote:
>
>> (3) I will use a header file, in which I will store preprocessor
>> commands, like #define and #include, as well as functions'
>> declarations, etc.;
>
> A well-designed header contains *only* the following:
>
> 1) inclusion guards;
> 2) macro definitions (#defines);
> 3) type definitions (typedefs);
> 4) function prototypes;
> 5) comments;
> 6) #include directives *ONLY* for those headers that are required for 
> the immediate purposes of the header itself.

I'd also include struct, union, and enum type declarations if
appropriate.

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
0
Keith
12/15/2016 8:23:03 PM
On Thu, 15 Dec 2016 05:05:12 -0800 (PST), Alla _
<modelling.data@gmail.com> wrote:

>Hello!
>
>I am not a fan of using exclamation marks, and any other text stressing effects, 
>so, instead of those, by resorting to a detailed and a bit verbose description below,
>I hope to make myself clear in terms of what I am trying to achieve, which I often
>fail to do )
>
>Please, take a look at program, which:

<snip> 

>Second csv file (a correct one in terms of csv format):
>Years,2011,2012,2013,2014,2015

<snip>

>Second output:
>(1) Years,2011,2012,2013,2014,2015
>(1.1) buffer length = 32
>(2) Years
>(2.2) buffer length = 26
>(3) 2011 2012 2013 2014 2015 0 

The code you show coupled with the input data you show CANNOT produce
the output you show!

There are only 30 text characters in your string.  The '\n' is the
31st and the terminating '\0' is the 32nd.  Since strlen does not
count the '\0', buffer_length should contain 31, not the 32 your
program prints.
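
The arithmetic can be checked with a small sketch. fgets keeps the '\n' in the buffer, so strlen counts it. The helper name chomp is made up for this example; it cuts the line at the first carriage return or newline and returns the length of what is left.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: strip a trailing "\n" or "\r\n" left in the
 * buffer by fgets() and return the length of what remains. */
static size_t chomp(char *line)
{
    assert(line != NULL);
    line[strcspn(line, "\r\n")] = '\0';   /* cut at first CR or LF */
    return strlen(line);
}
```

For the sample line, strlen gives 31 before the call and chomp returns 30. Add a single blank after 2015 and the raw length becomes 32.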

There is something you are not showing us:  
    Possibly trailing data in the file.  That is why several have
asked for a hex dump.
    Possibly different code than what is in your message.

In fact, a trailing blank after 2015 produces exactly the result you
see.  Now the question becomes - why are you not using the correct CSV
file Richard gave you instead of ones you generate yourself with
various errors?
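
For anyone without a hex dump tool to hand, here is a rough sketch of one for a single line. The name hex_line is invented for this example. It writes each byte as two hex digits, so that on an ASCII system a trailing blank shows up as 20, a carriage return as 0d and a newline as 0a.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write the bytes of s into out as two-digit
 * hex values separated by single spaces. Stops early if out is too
 * small. Returns the number of input bytes dumped. */
static size_t hex_line(const char *s, char *out, size_t out_size)
{
    size_t i = 0;
    size_t used = 0;

    assert(s != NULL && out != NULL && out_size > 0);
    out[0] = '\0';

    /* each step writes at most 3 characters plus the terminator */
    while (s[i] != '\0' && used + 3 < out_size) {
        used += (size_t)sprintf(out + used, "%s%02x",
                                i ? " " : "",
                                (unsigned)(unsigned char)s[i]);
        i++;
    }
    return i;
}
```

Feeding it the buffer straight after fgets, before any stripping, is what shows whether something unexpected sits between the last digit and the newline.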

-- 
Remove del for email
0
Barry
12/15/2016 8:36:08 PM
On 15/12/16 20:23, Keith Thompson wrote:
> Richard Heathfield <rjh@cpax.org.uk> writes:
>> [the code]
>>
>> On 15/12/16 13:05, Alla _ wrote:
>>
>>> (3) I will use a header file, in which I will store preprocessor
>>> commands, like #define and #include, as well as functions'
>>> declarations, etc.;
>>
>> A well-designed header contains *only* the following:
>>
>> 1) inclusion guards;
>> 2) macro definitions (#defines);
>> 3) type definitions (typedefs);
>> 4) function prototypes;
>> 5) comments;
>> 6) #include directives *ONLY* for those headers that are required for
>> the immediate purposes of the header itself.
>
> I'd also include struct, union, and enum type declarations if
> appropriate.

Hmmm. I wouldn't, because I'd typedef them (#3). But yes, I will concede 
that that is also an appropriate category for a header.
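
To make the list concrete, here is a sketch of how a header for the CSV program might be laid out. Everything in it is hypothetical: the file name csvrow.h and the names csv_kind, csv_row, csv_row_count and csv_classify are invented for illustration. It shows an enum declaration, a typedef'd struct, and <stddef.h> included only because the header itself uses size_t. The function definitions at the end would live in the matching .c file; they appear here only so that the sketch compiles as one piece.

```c
/* csvrow.h - hypothetical header for one parsed CSV record */
#ifndef H_CSVROW_H_
#define H_CSVROW_H_

#include <stddef.h>   /* needed by the header itself: size_t below */

#define CSVROW_MAX_NAME   50
#define CSVROW_MAX_VALUES 50

/* an enum declaration */
enum csv_kind { CSV_TEXT, CSV_NUMBER };

/* a typedef'd struct */
typedef struct csv_row {
    char   name[CSVROW_MAX_NAME];
    int    values[CSVROW_MAX_VALUES];
    size_t count;                 /* number of values in use */
} csv_row;

/* function prototypes */
size_t csv_row_count(const csv_row *row);
enum csv_kind csv_classify(const char *field);

#endif
/* end of csvrow.h */

/* csvrow.c - would begin with #include "csvrow.h", followed by the
 * headers that only the implementation needs */
#include <assert.h>
#include <ctype.h>

size_t csv_row_count(const csv_row *row)
{
    assert(row != NULL);
    return row->count;
}

/* Rough classification by first character only (after an optional
 * sign); enough for the sketch, not a full validator. */
enum csv_kind csv_classify(const char *field)
{
    assert(field != NULL);
    if (*field == '-' || *field == '+') {
        field++;
    }
    return isdigit((unsigned char)*field) ? CSV_NUMBER : CSV_TEXT;
}
```

Note that <assert.h> and <ctype.h> stay out of the header: only the implementation uses them.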

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
0
Richard
12/15/2016 8:39:24 PM
Richard Heathfield <rjh@cpax.org.uk> writes:
> On 15/12/16 13:05, Alla _ wrote:
<snip>
>>         char buffer[BUFFER_SIZE] = "";
>>         char array_names[50][50];
>>         int  array_values[50][50];
<snip>
>>         //Read each line from the file, store in the buffer
>>         for ( i = 0; fgets(buffer, sizeof(buffer), fp) != NULL; ++i )
>
> Since i isn't part of the loop control logic,

Ah, but it should be.

> I would suggest that you
> replace this line with the much more usual:
>
>   i = 0;
>   while(fgets(buffer, sizeof buffer, fp) != NULL)
>   {
>     /* stuff */
>     i++;
>   }

Maybe Alla_ is planning to grow the array, but I can't help commenting
on the code in front of me, and with a fixed-size array, I'd want to
include a test on i.  That could be written:

  for (i = 0; i < 50 && fgets(buffer, sizeof(buffer), fp); i++)

but that is verging on for abuse.  So maybe

  i = 0;
  while (i < 50 && fgets(buffer, sizeof(buffer), fp)) {
      i++;
  }

or

  i = 0;
  while (fgets(buffer, sizeof(buffer), fp))
      if (i < 50) {
          ...
          i++;
      }

depending on the exact intent.
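
For completeness, here is the first while form as a compilable sketch. The function name read_rows and the MAX_ROWS macro are assumptions made for this example; the limit of 50 matches the arrays quoted above, and each line is read directly into its own row rather than into a separate buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define BUFFER_SIZE 100
#define MAX_ROWS    50

/* Hypothetical helper: read at most MAX_ROWS lines from fp into
 * rows, stopping at end of file or when the array is full.
 * Returns the number of lines stored. */
static size_t read_rows(FILE *fp, char rows[][BUFFER_SIZE])
{
    size_t i = 0;

    assert(fp != NULL);

    /* the bound is tested first, so fgets never writes past the array */
    while (i < MAX_ROWS && fgets(rows[i], BUFFER_SIZE, fp) != NULL) {
        i++;
    }
    return i;
}
```

Because the bound comes first in the condition, a file with more than 50 lines simply stops being read after the fiftieth; whether that is the right behaviour is the "exact intent" question.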

<snip>
-- 
Ben.
0
Ben
12/15/2016 9:13:13 PM
On 15/12/16 16:37, Richard Heathfield wrote:
> On 15/12/16 15:14, David Brown wrote:
>> On 15/12/16 15:12, Richard Heathfield wrote:
>>> [the code]
>>>
>>> On 15/12/16 13:05, Alla _ wrote:
>>>
>>>> (3) I will use a header file, in which I will store preprocessor
>>>> commands, like
>>>> #define and #include, as well as functions' declarations, etc.;
>>>
>>> A well-designed header contains *only* the following:
>>>
>>> 1) inclusion guards;
>>> 2) macro definitions (#defines);
>>> 3) type definitions (typedefs);
>>> 4) function prototypes;
>>> 5) comments;
>>> 6) #include directives *ONLY* for those headers that are required for
>>> the immediate purposes of the header itself.
>>>
>>> For example, here's a quick header I threw together a year or so ago:
>>>
>>> /* CISPRNG - Cryptographically In-Secure
>>>  * Pseudo-Random Number Generator
>>>  *
>>>  * Written by Richard Heathfield,
>>>  * 20 October 2015.
>>>  *
>>>  * You will also need cisprng.c and a
>>>  * C compiler.
>>>  *
>>>  * The author places this code into the
>>>  * public domain, and so it may be freely
>>>  * used for any legal purpose, although
>>>  * I would recommend against using it
>>>  * in circumstances where you really need
>>>  * a cryptographically secure PRNG, because
>>>  * it isn't one.
>>>  */
>>> #ifndef H_CISPRNG_H_
>>> #define H_CISPRNG_H_
>>>
>>> #define CISPRNG_RAND_MAX (0x7FFFFFFFUL)
>>>
>>> /* Single state versions */
>>> /* set the seed */
>>> void cisprng_set(unsigned long seed);
>>> /* get a PRN */
>>> unsigned long cisprng_random(void);
>>> /* get a PRN in a range */
>>> unsigned long cisprng_range(unsigned long low,
>>>                             unsigned long high);
>>> /* Custom state versions */
>>> unsigned long cisprng_csrandom(unsigned long *pseed);
>>> unsigned long cisprng_csrange(unsigned long *pseed,
>>>                               unsigned long low,
>>>                               unsigned long high);
>>> #endif
>>> /* end of cisprng.h */
>>>
>>> Note that it doesn't define any types. But if it did, they would go
>>> after the #defines but before the function prototypes. Nor does it
>>> include any headers. But *if* I'd used, say, size_t *in the header*, I'd
>>> have included <stddef.h> in the header. But I would *not* include
>>> <stddef.h> in the header just because the corresponding C file used
>>> size_t (which, actually, it doesn't). A module header is not a place to
>>> hide #includes.
>>>
>>
>> I hope I am not going to cause too much confusion to a learner here, but
>> I have a few comments about your rules for headers.  It may be that you
>> agree with me, and just did not write your complete rule set.
>
> I think we might be able to save Alla some confusion here by suggesting
> that she avert her eyes. :-)
>
>> I think it is important to emphasise that you should only have macros,
>> types, function prototypes, etc., in a header if these declarations will
>> be used by other modules.
>
> Yes, but.
>
> With me so far? :-)

Yes...

>
> I was going to dig out a quotation from King Lear about not taking
> things far enough, but it turned out not to be quite so apt as I'd
> hoped. Oh well.
>
> Module, right? Well, we have to say what we mean by "module". We might
> reasonably define it as a source and its associated header, in which
> case I think your point is fine.

Yes, that is what I was meaning by "module".  Of course you are correct 
that I should have defined it - especially as it could mean other things.

> But we might also define it as a set of
> related functions, which might be spread over several sources. If we
> define it that way (and, frankly, I do), then it becomes reasonable to
> think of a module as having two headers: one public, and one private.
> The private header is shared across all the sources that go to make up
> that module, and the public header is shared across other modules.

Agreed.

If you prefer to call this sort of multi-file "lump" a "module", what do 
you call a header/C file "lump" ?

It is a real shame that C has no good way of dealing with this sort of 
thing properly - no namespaces, and only two levels of identifier 
visibility (static to a compilation unit, or global for the whole program).

>
> So, for example, we might have an abstract data type that is defined in
> mylittlemoduleinternal.h and shared across mylittlemoduleA.c,
> mylittlemoduleB.c, and mylittlemoduleC.c. And then we have a bunch of
> prototypes listed in mylittlemodule.h, for sharing with other modules.
>
> Would you agree with that?

Yes, that seems reasonable.  It is not exactly how I would organise it, 
but close enough for now.

>
>> Anything that is used only within the
>> implementation C file, should be in that file alone - and not the
>> header.  So if you have a function "cisprng_calculate" that is not
>> exported, it should not be declared in the header file, and it should be
>> declared "static" in the C file.
>
> But if it is needed by other sources within the same module, that gives
> us a problem. One possible solution is to share a function pointer in
> the private header --- but it's messy.

Yes.

In this particular case, the program is small enough that such 
multi-file modules are not going to be necessary, so a simpler system is 
possible.  But the simpler system does not scale to a more hierarchical 
organisation without changes such as you suggest.

>
>> Regarding macros, I would always limit the use of macros to when you
>> actually need them to be macros.  If usage of CISPRNG_RAND_MAX permits,
>> I would prefer:
>>
>> static const unsigned long int cisprng_rand_max = 0x7ffffffful;
>
> Yeah, but this is C90, and it's an array size. :-)
>
> Okay, obviously it isn't. But I'm sure you take my point. That one could
> indeed have been a const, but that isn't always possible.
>

As I note below.

In some circumstances you can use a enum constant instead of a #define'd 
constant when a "static const" won't work - but I find that to be rather 
artificial unless it is part of a collection of constants for which an 
enum makes sense.
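
A small sketch of the distinction, assuming the constant is wanted as an array size at file scope. In C (unlike C++) a static const object is not a constant expression, so it cannot size a file-scope array; a #define or an enum constant can. All the names below are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS_MACRO 50                 /* a macro: plain text substitution */
enum { ROWS_ENUM = 50 };              /* an integer constant expression   */
static const size_t rows_const = 50;  /* an object, not a constant        */

int by_macro[ROWS_MACRO];
int by_enum[ROWS_ENUM];
/* int by_const[rows_const];    rejected: a file-scope array needs a
 *                              constant expression for its size */

/* Hypothetical helpers: report how many elements each array has. */
static size_t capacity_macro(void)
{
    return sizeof by_macro / sizeof by_macro[0];
}

static size_t capacity_enum(void)
{
    return sizeof by_enum / sizeof by_enum[0];
}

static size_t capacity_const(void)
{
    assert(rows_const > 0);
    return rows_const;        /* fine as an ordinary run-time value */
}
```

The enum form has the advantage of obeying scope rules, but it is limited to values that fit in an int, which is part of why it feels artificial for a lone constant.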

>> The rules of "const" objects in C mean that you often /do/ need to make
>> such things #define'd values rather than static consts.
>
> Yes.
>
>> Assuming you are using C99 or C11 rather than C90 (I know some people
>> still use C90),
>
> You hear that, folks? David thinks I'm *some people*! Fame at last!

I am sure you have good reasons for using C90.  But I think it usually 
makes sense for someone relatively new to C to use C99 features when 
they make coding clearer (following Heathfield's rule, naturally) - C99 
is the current dominant C standard used, I believe.  (And C11 doesn't 
add much that I find useful - at least not beyond the gcc extensions I 
have used for years.  Static assertions are great, but you could do them 
with an ugly macro in C90 and C99.)
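
For the curious, the "ugly macro" is usually some variant of the negative-array-size trick. This is a sketch of that well-known idiom, not something from a standard header; the macro name STATIC_ASSERT and the tag names are invented here.

```c
#include <assert.h>
#include <limits.h>

/* Declares a typedef for an array of size 1 when cond is true and
 * of size -1 (a compile-time error) when it is false. The tag keeps
 * each typedef name unique. */
#define STATIC_ASSERT(cond, tag) \
    typedef char static_assert_##tag[(cond) ? 1 : -1]

STATIC_ASSERT(CHAR_BIT >= 8, char_has_at_least_8_bits);
STATIC_ASSERT(ULONG_MAX >= 0x7FFFFFFFUL, ulong_holds_cisprng_rand_max);

/* Hypothetical helper so the sketch has something to call: if this
 * compiles at all, both checks above have already passed. */
static int checks_compiled(void)
{
    assert(sizeof(static_assert_char_has_at_least_8_bits) == 1);
    return 1;
}
```

The error message when a check fails is about an array of negative size rather than about the condition itself, which is where the ugliness comes in.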

>
>> then "static inline" functions are often a better
>> alternative to function-like macros, and can go in headers as well.
>>
>> I would also put more emphasis on the comments.  Ideally, the comments
>> in a header should tell the reader all they need to know about the
>> functions and how to use them.
>
> Up to a point, Lord Copper.
>
>> It makes it easier for people using the
>> code - they don't have to look elsewhere for documentation.  Of course,
>> there are limits - comments should not be so extensive that it is hard
>> to see the code!
>
> And that's why we have external docs.
>

If the usage documentation can be written in the header (without passing 
"the point"), then that is a good thing.  External documentation can go 
into more detail, examples, etc., but there is always the risk of 
getting out of synchronisation between the code and the document or 
simply not having the document at hand when you are working with the 
code.  Documentation in the header file is /always/ quickly at hand when 
you are working with the code!


0
David
12/15/2016 10:20:06 PM
On 15/12/16 17:36, Richard Heathfield wrote:
> On 15/12/16 16:30, BartC wrote:
>> On 15/12/2016 15:14, David Brown wrote:
>>
>>> Regarding macros, I would always limit the use of macros to when you
>>> actually need them to be macros.  If usage of CISPRNG_RAND_MAX permits,
>>> I would prefer:
>>>
>>> static const unsigned long int cisprng_rand_max = 0x7ffffffful;
>>
>> You said elsewhere that assembly syntax is not literature, but this is
>> verging on War and Peace.
>
> "War and Peace" is 587,287 words, so even if we take every token as a
> "word", this is only 0.0000153247496 of a war. Barely even a playground
> skirmish.
>
>> It took a couple of readings to establish it was only declaring one
>> thing.
>
> You need to do a lot more C programming. One learns, after a while, to
> take these things in at a single glance.
>

In my programming, it would likely be:

static const uint32_t cisprng_rand_max = 0x7fffffff;

But my code generally doesn't need to be as portable as Richard likes 
his code to be.  While my code might be used with 16-bit or 32-bit ints, 
on cpus from 8-bit up to 64-bit, there are none that I have come across 
that don't have uint32_t.

(But then, I can use C99 and not just C90 :-) )

0
David
12/15/2016 10:29:33 PM
On 15/12/16 22:20, David Brown wrote:
> On 15/12/16 16:37, Richard Heathfield wrote:

<snip>

>> Module, right? Well, we have to say what we mean by "module". We might
>> reasonably define it as a source and its associated header, in which
>> case I think your point is fine.
>
> Yes, that is what I was meaning by "module".  Of course you are correct
> that I should have defined it - especially as it could mean other things.
>
>> But we might also define it as a set of
>> related functions, which might be spread over several sources. If we
>> define it that way (and, frankly, I do), then it becomes reasonable to
>> think of a module as having two headers: one public, and one private.
>> The private header is shared across all the sources that go to make up
>> that module, and the public header is shared across other modules.
>
> Agreed.
>
> If you prefer to call this sort of multi-file "lump" a "module", what do
> you call a header/C file "lump" ?

A lump. What else?

The correct term is "preprocessing translation unit". PTU, anyone? But I 
tend to eschew the P and stick to "translation unit" (which is actually 
the correct term for what remains after preprocessing, but I'm not too 
fussed about the difference except when arguing about the difference!).

> It is a real shame that C has no good way of dealing with this sort of
> thing properly - no namespaces, and only two levels of identifier
> visibility (static to a compilation unit, or global for the whole program).

Yes. I'd like to see an hierarchy of visibility. But it isn't going to 
happen, and I can live with that.

>> So, for example, we might have an abstract data type that is defined in
>> mylittlemoduleinternal.h and shared across mylittlemoduleA.c,
>> mylittlemoduleB.c, and mylittlemoduleC.c. And then we have a bunch of
>> prototypes listed in mylittlemodule.h, for sharing with other modules.
>>
>> Would you agree with that?
>
> Yes, that seems reasonable.  It is not exactly how I would organise it,
> but close enough for now.

I think that's important, actually. What I mean is, okay, you and I 
would tend to choose to organise stuff in different ways, but those 
differences are likely to be fairly minor, and each of us would 
recognise that the other has made reasonable choices that just happen to 
be different to our own reasonable choice. This is what cultural 
diversity is all about!

>>> Anything that is used only within the
>>> implementation C file, should be in that file alone - and not the
>>> header.  So if you have a function "cisprng_calculate" that is not
>>> exported, it should not be declared in the header file, and it should be
>>> declared "static" in the C file.
>>
>> But if it is needed by other sources within the same module, that gives
>> us a problem. One possible solution is to share a function pointer in
>> the private header --- but it's messy.
>
> Yes.

Yeeesss. It's... it isn't just messy, is it? It's actually icky. There 
is no /good/ solution, as far as I know.

What I actually do in this situation (when I get to choose, anyway) is 
to declare the function in the private header, and make it extern (not 
explicitly, but by not making it static), and then just not telling the 
user-programmer it's there! That way, even though they can in theory 
call it, first they have to find out about it, and they can only do that 
by going to some serious and deliberate investigative effort.
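
Squeezed into one listing for the sake of illustration, that arrangement might look like the sketch below. The file names are the ones used earlier in this thread; the function names mlm_total and mlm_internal_add are hypothetical, and in real use each commented section would be its own file.

```c
/* ---- mylittlemodule.h (public): what users are told about ---- */
int mlm_total(const int *v, int n);

/* ---- mylittlemoduleinternal.h (private): included only by the
 * module's own sources and left out of the user documentation ---- */
int mlm_internal_add(int a, int b);

/* ---- mylittlemoduleA.c ---- */
#include <assert.h>
#include <stddef.h>

/* Not static, so it has external linkage and the module's other
 * sources can call it; users could too, if they went looking. */
int mlm_internal_add(int a, int b)
{
    return a + b;
}

/* ---- mylittlemoduleB.c ---- */
int mlm_total(const int *v, int n)
{
    int sum = 0;
    int i;

    assert(v != NULL || n == 0);
    for (i = 0; i < n; i++) {
        sum = mlm_internal_add(sum, v[i]);   /* cross-source call */
    }
    return sum;
}
```

Nothing stops a determined user from declaring mlm_internal_add themselves and calling it; the protection is purely by convention.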

> In this particular case, the program is small enough that such
> multi-file modules are not going to be necessary, so a simpler system is
> possible.  But the simpler system does not scale to a more hierarchical
> organisation without changes such as you suggest.

Namespaces would be pleasant, wouldn't they? But we're not going to get 
them, and that's that.

>>> Assuming you are using C99 or C11 rather than C90 (I know some people
>>> still use C90),
>>
>> You hear that, folks? David thinks I'm *some people*! Fame at last!
>
> I am sure you have good reasons for using C90.  But I think it usually
> makes sense for someone relatively new to C to use C99 features when
> they make coding clearer (following Heathfield's rule, naturally)

My blushes, Watson!

My reasons for using C90 are as follows:

1) it works everywhere;
2) C99 doesn't add anything I need that I can't get from C++. So, if I 
need those features, I'll just write that code in C++ instead, which 
means I get lots of toys to play with.

> If the usage documentation can be written in the header (without passing
> "the point"), then that is a good thing.

A brief summary, yes.

> External documentation can go
> into more detail, examples, etc., but there is always the risk of
> getting out of synchronisation between the code and the document

But that's true of the internal documentation as well --- it can and 
does get out of synch with the code. The solution to synchronisation 
problems is discipline, and that's required whether the documentation is 
internal or external.

> or
> simply not having the document at hand when you are working with the
> code.  Documentation in the header file is /always/ quickly at hand when
> you are working with the code!

As long as you know the code well enough to know where to look, or know 
what you're looking for well enough to execute a grep. External docs can 
cover several headers in one indexed document. They can offer hyperlinks 
to specific function descriptions, they can contain diagrams (and I 
don't mean ASCII art), and they can use all the tricks of the word 
processing trade to make the document more readable than can generally 
be managed in eighty columns of Courier 12.

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
Richard
12/15/2016 11:27:40 PM
On Friday, December 16, 2016 at 11:54:40 AM UTC+3, Richard Heathfield wrote:
> On 16/12/16 08:39, Alla _ wrote:
> > On Thursday, December 15, 2016 at 5:12:52 PM UTC+3, Richard Heathfield wrote:
> >> [the code]
> >>
> <snip>
> >>>     {
> >>>         char buffer[BUFFER_SIZE] = "";
> >>>         char array_names[50][50];
> >>>         int  array_values[50][50];
> >>
> >> I am taking you at your word and forbearing to comment on these values,
> >> but for the sake of a predictable life I would suggest that you
> >> initialise these two arrays:
> >>
> >> char array_names[50][50] = {0};
> >> int  array_values[50][50] = {0};
> >>
> >> which will take advantage of the default static initialisation rule to
> >> set everything to 0.
> >>
> > Is it a typo, or did you intend to use only {}, instead of {{}}?
> 
> It isn't a typo. {0} always works, for any kind of aggregate type. But 
> if you would prefer to use {{0}}, you {{{{{{{can}}}}}}}.
My compiler complained about {0} )
<snip>
Alla
12/16/2016 1:01:01 AM
On Thursday, December 15, 2016 at 5:12:52 PM UTC+3, Richard Heathfield wrote:
> [the code]
> 
> On 15/12/16 13:05, Alla _ wrote:
> 
<snip>
> > int main(int argc, char *argv[])
> > {
> >     //check for correct number of arguments
> >     if(argc > 2)
> >     {
> >         printf("Usage: program_name [file_name]");
> 
> You might want a \n on there. Or use puts() instead if you prefer.
> 
> Otherwise, this happens:
> 
> Usage: program_name [file_name]me@mymachine$
> 
Yes, I missed this one; I have forgotten to add the newline.
> which probably isn't what you wanted to see.
> 
> >         return EXIT_FAILURE;
> >     }
> >     //determine a file to use
> >     char *data_file = (argc == 2)? argv[1] : INCOME_STATEMENT;
> 
> I don't suppose you plan to change the filename, so why not make this const?
> 
> const char *data_file = (argc == 2)? argv[1] : INCOME_STATEMENT;
> 
> This will remove the *only* diagnostic message that gcc gave me (and by 
> the way, to get only one warning when using my flag choices is quite an 
> achievement!).
> 
> What should the program do if there are several arguments? At the 
> moment, if I tell it:
> 
> ./alla test1.csv test2.csv
> 
> I get this message:
> 
> Usage: program_name [file_name]
> 
> If the program can only handle one data file, I would expect to see 
> something like:
> 
> Sorry, I can only process one file. Ignoring subsequent arguments...
> 
> which you would code like this:
> 
> const char *data_file = (argc > 1)? argv[1] : INCOME_STATEMENT;
> if(argc > 2)
> {
>    fputs("Sorry, I can only process one file."
>          " Ignoring subsequent arguments...\n",
>          stderr);
> }
> 
Done, with a very slight change in word choice.
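
For reference, the suggestion above can be wrapped in a small helper. This is only a sketch: the function name choose_data_file and its fallback parameter are assumptions made for the example (the real program uses the INCOME_STATEMENT macro, whose value is not shown here).

```c
#include <stdio.h>

/* Picks the file to process: the first command-line argument if there
   is one, otherwise the fallback. Extra arguments are reported on
   stderr and then ignored. */
const char *choose_data_file(int argc, char *argv[], const char *fallback)
{
    if (argc > 2)
    {
        fputs("Sorry, I can only process one file."
              " Ignoring subsequent arguments...\n",
              stderr);
    }
    return (argc > 1) ? argv[1] : fallback;
}
```

From main() this would be called as choose_data_file(argc, argv, INCOME_STATEMENT), with the result stored in a const char *.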
> >
> >     FILE *fp = fopen(data_file, "r");
> >
> >     if ( fp )
> >     {
> >         char buffer[BUFFER_SIZE] = "";
> >         char array_names[50][50];
> >         int  array_values[50][50];
> 
> I am taking you at your word and forbearing to comment on these values, 
> but for the sake of a predictable life I would suggest that you 
> initialise these two arrays:
> 
> char array_names[50][50] = {0};
> int  array_values[50][50] = {0};
> 
> which will take advantage of the default static initialisation rule to 
> set everything to 0.
> 
Is it a typo, or did you intend to use only {}, instead of {{}}? 
And, yes, I missed this initialization for some reason, although I did 
not forget to use it for the buffer array. I remember you taught me 
to always use this type of initialization. You see, although you and everyone
here keep griping about me not heeding your advice, no matter how I have 
tried to convince you of the opposite, I do remember most of it,
because I have it either written out by hand, or copied into a file, or 
stored in my head (I hope all of it will eventually end up in my 
const memory buffer :) )  :D
> >         char *ptr_buffer = NULL;
> >         size_t i, j, k;
> >
> >         //Read each line from the file, store in the buffer
> >         for ( i = 0; fgets(buffer, sizeof(buffer), fp) != NULL; ++i )
> 
> Since i isn't part of the loop control logic, I would suggest that you 
> replace this line with the much more usual:
> 
>    i = 0;
>    while(fgets(buffer, sizeof buffer, fp) != NULL)
>    {
>      /* stuff */
>      i++;
>    }
> 
Indeed. Done.
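
A sketch of the loop after that change, wrapped in a function so that it can be tried on its own. The name count_lines and the buffer size of 256 are placeholders for this example, and the counter is only a true line count as long as every line fits in the buffer.

```c
#include <stdio.h>

#define BUFFER_SIZE 256  /* placeholder size, chosen for this sketch */

/* Reads fp to the end and returns how many times fgets succeeded.
   That equals the number of lines provided no line is longer than
   BUFFER_SIZE - 1 characters. */
size_t count_lines(FILE *fp)
{
    char buffer[BUFFER_SIZE] = "";
    size_t line_number = 0;

    while (fgets(buffer, sizeof buffer, fp) != NULL)
    {
        /* per-line processing of buffer would go here */
        line_number++;
    }
    return line_number;
}
```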
> And if i is a line number, why not call it line_number? That's much more 
> descriptive than i.
> 
Done. I have also changed j and k names into names_column_number
and values_column_number - these are lengthy names and look a bit
heavy when used as indexes, but they are descriptive, so I will leave 
them this way for now. 

> The loop body looked more or less okay to me, except that it's a touch 
> on the large side. Think about functional decomposition.
As I have mentioned in my description, this is exactly what I intend to do; 
it won't be easy for me, but what has been? ) and has to be done anyway
> 
> >         if (!feof(fp))
> >         {
> >             puts("Something went wrong with the provided file\n");
> >             return EXIT_FAILURE;
> >         }
> 
> Why not use ferror(fp) instead? It's a more direct description of what 
> you're trying to find out:
> 
>    if(ferror(fp))
>    {
> 
Because I did not know of it, and have not used it anywhere yet. 
Changed. Done )
> And while we're on the subject, diagnostic messages sit much better on 
> stderr than on stdout. That way, they can be separated from the 
> program's normal output, like this:
> 
> ./alla test1.csv > test1.out 2> test1.log
> 
> which will put stdout stuff into test1.out and stderr stuff into test1.log.
>
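
Both points (ferror rather than feof, and stderr for diagnostics) can be sketched together like this. The function name drain_and_check and the wording of the message are made up for the example.

```c
#include <stdio.h>

/* Reads fp line by line until fgets returns NULL, then checks why the
   loop stopped. Returns 0 on a clean end of file, -1 on a read error. */
int drain_and_check(FILE *fp)
{
    char buffer[256];

    while (fgets(buffer, sizeof buffer, fp) != NULL)
    {
        /* normal results would be written to stdout here */
    }

    if (ferror(fp))
    {
        /* diagnostics go to stderr, so "2> file.log" can capture them */
        fputs("Read error on the data file\n", stderr);
        return -1;
    }
    return 0;
}
```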
Alla
12/16/2016 8:39:53 AM
On 16/12/16 08:39, Alla _ wrote:
> On Thursday, December 15, 2016 at 5:12:52 PM UTC+3, Richard Heathfield wrote:
>> [the code]
>>
<snip>
>>>     {
>>>         char buffer[BUFFER_SIZE] = "";
>>>         char array_names[50][50];
>>>         int  array_values[50][50];
>>
>> I am taking you at your word and forbearing to comment on these values,
>> but for the sake of a predictable life I would suggest that you
>> initialise these two arrays:
>>
>> char array_names[50][50] = {0};
>> int  array_values[50][50] = {0};
>>
>> which will take advantage of the default static initialisation rule to
>> set everything to 0.
>>
> Is it a typo, or did you intend to use only {}, instead of {{}}?

It isn't a typo. {0} always works, for any kind of aggregate type. But 
if you would prefer to use {{0}}, you {{{{{{{can}}}}}}}.
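
A small sketch that checks the claim: both spellings leave every element zero, and only the brace depth differs. The helper names are invented for this example.

```c
#include <stddef.h>

/* Returns 1 if every element of an N x 50 int array is zero, else 0. */
static int all_zero_ints(int (*values)[50], size_t rows)
{
    size_t r, c;

    for (r = 0; r < rows; r++)
        for (c = 0; c < 50; c++)
            if (values[r][c] != 0)
                return 0;
    return 1;
}

/* Initialises one array each way and reports whether both are all zero. */
int demo_zero_init(void)
{
    int flat[50][50]   = {0};    /* the idiom: one 0, the rest implicit */
    int nested[50][50] = {{0}};  /* same result, one brace per level    */

    return all_zero_ints(flat, 50) && all_zero_ints(nested, 50);
}
```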

>> And if i is a line number, why not call it line_number? That's much more
>> descriptive than i.
>>
> Done. I have also changed j and k names into names_column_number
> and values_column_number - these are lengthy names and look a bit
> heavy when used as indexes, but they are descriptive, so I will leave
> them this way for now.

It doesn't matter whether they "look heavy". What matters is whether the 
code is easy to read. It's easier to read if people know what the 
indices represent. Thanks to modern editors and their 'word completion' 
feature, the only real excuse (and it always /was/ just an excuse) for 
using short names --- that they take longer to type --- has gone.

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
Richard
12/16/2016 8:54:33 AM
On Thursday, December 15, 2016 at 10:31:28 PM UTC+3, Richard Heathfield wrote:
> On 15/12/16 17:25, Alla _ wrote:
> <snip>
> > Please, take a look at two versions of csv file I have posted in the first message:
> > the first one does have that trailing comma, and the second doesn't.
> > The first version has two trailing zeros, the second has one trailing zero;
> > I think that the zero I see when processing your file comes from the
> > newline character, which is all right because I don't stop the loop before
> > the '\n'.
> 
> I'm going to show you four sets of test data, and four sets of results.
> 
> The first set of test data is the CSV file I sent you.
> The second set is your first CSV file (the uppermost one in your article).
> The third set is your second CSV file (the lowermost one in your article).
> The fourth set is a CSV file that I have modified to illustrate a point, 
> which we'll come to in its proper place.
> 
> First, the test data I sent you:
> 
> ++++ inc_st_test.csv ++++
> Income Statement,2011,2012,2013,2014,2015
> Sales,1062,1252,1587,1934,2519
> Cost of Goods Sold,654,814,1009,1190,1499
>    Gross Profit,408,438,578,744,1020
> "Selling, General, and Admin Exp",254,271,364,454,576
>    Operating Income before Depr,154,167,214,290,444
> Depreciation and Amortization,25,31,38,52,70
>    Operating Profit,129,136,176,238,374
> Interest Expense,4,3,3,1,4
> Other Gains and Losses,0,7,10,0,-1
>    Pretax Income,125,126,163,237,371
> Income Tax Expense,55,52,65,92,141
>    Net Income,70,74,98,145,230
> ++++ end of inc_st_test.csv ++++
> 
> Results (first 5 lines only):
> (1) Income Statement,2011,2012,2013,2014,2015
> (1.1) buffer length = 42
> (2) Income Statement
> (2.2) buffer length = 25
> (3) 2011 2012 2013 2014 2015
> 
> Note the ABSENCE of trailing 0s in those results.
> 
Great. We are working with exactly the same file, 
and you don't get any trailing zeros, while I get one
- see in (3).

(1) Income Statement,2011,2012,2013,2014,2015
(1.1) buffer length = 43
(2) Income Statement
(2.2) buffer length = 26
(3) 2011 2012 2013 2014 2015 0 

This is the evidence that there is something wrong
with my ... have no idea what - machine, os, windows
programs, bash...?

<snip>
Alla
12/16/2016 9:19:27 AM
On Thursday, December 15, 2016 at 10:31:28 PM UTC+3, Richard Heathfield wrote:
> On 15/12/16 17:25, Alla _ wrote:
> <snip>
> > Please, take a look at two versions of csv file I have posted in the first message:
> > the first one does have that trailing comma, and the second doesn't.
> > The first version has two trailing zeros, the second has one trailing zero;
> > I think that the zero I see when processing your file comes from the
> > newline character, which is all right because I don't stop the loop before
> > the '\n'.
> 
> I'm going to show you four sets of test data, and four sets of results.
> 
> The first set of test data is the CSV file I sent you.
> The second set is your first CSV file (the uppermost one in your article).
> The third set is your second CSV file (the lowermost one in your article).
> The fourth set is a CSV file that I have modified to illustrate a point, 
> which we'll come to in its proper place.
> 
> First, the test data I sent you:
> 
> ++++ inc_st_test.csv ++++
> Income Statement,2011,2012,2013,2014,2015
> Sales,1062,1252,1587,1934,2519
> Cost of Goods Sold,654,814,1009,1190,1499
>    Gross Profit,408,438,578,744,1020
> "Selling, General, and Admin Exp",254,271,364,454,576
>    Operating Income before Depr,154,167,214,290,444
> Depreciation and Amortization,25,31,38,52,70
>    Operating Profit,129,136,176,238,374
> Interest Expense,4,3,3,1,4
> Other Gains and Losses,0,7,10,0,-1
>    Pretax Income,125,126,163,237,371
> Income Tax Expense,55,52,65,92,141
>    Net Income,70,74,98,145,230
> ++++ end of inc_st_test.csv ++++
> 
> Results (first 5 lines only):
> (1) Income Statement,2011,2012,2013,2014,2015
> (1.1) buffer length = 42
> (2) Income Statement
> (2.2) buffer length = 25
> (3) 2011 2012 2013 2014 2015
> 
> Note the ABSENCE of trailing 0s in those results.
> 
> Now for the first file you posted. Note the trailing commas.
> 
> ++++ test1.csv ++++
> Years,2011,2012,2013,2014,2015,
> Sales,1062,1252,1587,1934,2519,
> Cost of Goods Sold,654,814,1009,1190,1499,
>    Gross Profit,408,438,578,744,1020,
> SG&A,254,271,364,454,576,
>    Operating Income before Depr,154,167,214,290,444,
> Depreciation and Amortization,25,31,38,52,70,
>    Operating Profit,129,136,176,238,374,
> Interest Expense,4,3,3,1,4,
> Other Gains and Losses,0,7,10,0,-1,
>    Pretax Income,125,126,163,237,371,
> Income Tax Expense,55,52,65,92,141,
>    Net Income,70,74,98,145,230,
> ++++ end of test1.csv ++++
> 
> Results (first 5 lines only):
> (1) Years,2011,2012,2013,2014,2015,
> (1.1) buffer length = 32
> (2) Years
> (2.2) buffer length = 26
> (3) 2011 2012 2013 2014 2015 0
> 
> Note the trailing 0.
> 
> Now for the second CSV file you posted, with NO trailing commas:
> 
> ++++ test2.csv ++++
> Years,2011,2012,2013,2014,2015
> Sales,1062,1252,1587,1934,2519
> Cost of Goods Sold,654,814,1009,1190,1499
>    Gross Profit,408,438,578,744,1020
> "Selling, General, and Admin Exp",254,271,364,454,576
>    Operating Income before Depr,154,167,214,290,444
> Depreciation and Amortization,25,31,38,52,70
>    Operating Profit,129,136,176,238,374
> Interest Expense,4,3,3,1,4
> Other Gains and Losses,0,7,10,0,-1
>    Pretax Income,125,126,163,237,371
> Income Tax Expense,55,52,65,92,141
>    Net Income,70,74,98,145,230
> ++++ end of test2.csv ++++
> 
> Again, note that there are no trailing commas.
> 
> Results (first 5 lines only):
> (1) Years,2011,2012,2013,2014,2015
> (1.1) buffer length = 31
> (2) Years
> (2.2) buffer length = 25
> (3) 2011 2012 2013 2014 2015
> 
> Note the ABSENCE of trailing 0s.
> 
> Now for an experiment.
> 
> For my fourth test, I will deliberately place some extra commas at the 
> end of each line.
> 
> ++++ test3.csv ++++
> Years,2011,2012,2013,2014,2015,,,
> Sales,1062,1252,1587,1934,2519,,,
> Cost of Goods Sold,654,814,1009,1190,1499,,,
>    Gross Profit,408,438,578,744,1020,,,
> SG&A,254,271,364,454,576,,,
>    Operating Income before Depr,154,167,214,290,444,,,
> Depreciation and Amortization,25,31,38,52,70,,,
>    Operating Profit,129,136,176,238,374,,,
> Interest Expense,4,3,3,1,4,,,
> Other Gains and Losses,0,7,10,0,-1,,,
>    Pretax Income,125,126,163,237,371,,,
> Income Tax Expense,55,52,65,92,141,,,
>    Net Income,70,74,98,145,230,,,
> ++++ end of test3.csv ++++
> 
> Note that there are THREE trailing commas at the end of each line. 
> Prediction: 3 trailing 0s.
> 
As we discussed with Keith previously, it does seem that whenever
I save a file as csv I get some trailing commas - they appear as the very 
last row, which has a few commas, only commas; and they don't disappear
even if I manually delete about 500 rows in the excel file; what's also 
interesting is that I seem to get some hidden commas at the end of each
row, even in the file you have sent me, i.e. in the file correctly converted
to csv. This issue has nothing to do with C, therefore I guess I have to 
stop posting questions on this matter here, and try to figure out
how to solve this problem - I wish I knew more about computers and 
software to at least know where to look for the solution; I think I have to 
report it to Microsoft and see what they say. 
I am sorry for posting so many words on a topic not related to C. I know
that people here might get annoyed with it, but I do hope for your 
understanding.
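
On the C side, one way to stop a stray field from turning into a 0 is to check whether the conversion consumed anything at all. This is a sketch only, assuming each field is converted with strtol (the conversion code of the actual program is not shown in this part of the thread); parse_field is a hypothetical name.

```c
#include <stdlib.h>

/* Converts one field to a long. Returns 1 and stores the result in
   *value if the field starts with a number; returns 0 and leaves
   *value alone if nothing numeric was found (an empty field, a lone
   space or a bare newline). */
int parse_field(const char *field, long *value)
{
    char *end = NULL;
    long parsed = strtol(field, &end, 10);

    if (end == field)
        return 0;   /* strtol consumed no characters */

    *value = parsed;
    return 1;
}
```

With a check like this, the empty fields created by trailing commas can be skipped, while a genuine 0 in the data is still stored.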

<snip>
Alla
12/16/2016 9:25:45 AM
On 16/12/16 09:09, Alla _ wrote:
> On Friday, December 16, 2016 at 11:54:40 AM UTC+3, Richard Heathfield wrote:
>> On 16/12/16 08:39, Alla _ wrote:
>>> On Thursday, December 15, 2016 at 5:12:52 PM UTC+3, Richard Heathfield wrote:
>>>> [the code]
>>>>
>> <snip>
>>>>>     {
>>>>>         char buffer[BUFFER_SIZE] = "";
>>>>>         char array_names[50][50];
>>>>>         int  array_values[50][50];
>>>>
>>>> I am taking you at your word and forbearing to comment on these values,
>>>> but for the sake of a predictable life I would suggest that you
>>>> initialise these two arrays:
>>>>
>>>> char array_names[50][50] = {0};
>>>> int  array_values[50][50] = {0};
>>>>
>>>> which will take advantage of the default static initialisation rule to
>>>> set everything to 0.
>>>>
>>> Is it a typo, or did you intend to use only {}, instead of {{}}?
>>
>> It isn't a typo. {0} always works, for any kind of aggregate type. But
>> if you would prefer to use {{0}}, you {{{{{{{can}}}}}}}.
> My compiler complained about {0} )

Yes, so does mine. But it's wrong to do so. {0} is a C idiom, and 
perfectly correct.

It's probably a bit early in your C career to be telling you this, 
because the knowledge can lead to a certain devil-may-care attitude to 
warnings (and such an attitude is entirely wrong!), but here goes.

An implementation's diagnostic messages have a purpose, which is to draw 
your attention to possible mistakes in your program. And nowadays, they 
are very nearly always right.

But "very nearly always" isn't quite the same thing as "always". 
Sometimes they will point out something that they think is a problem, 
but that you *know for sure* is not a problem. In such circumstances, 
you have a choice: to modify the code anyway so as to shut up the 
warning, or to leave things alone and put up with the constant nagging 
(perhaps adding a comment to the code).

In this case, when faced with a diagnostic message for code that you 
didn't think should produce one, you did *exactly* the right thing --- 
you asked.

The *wrong* things to do would have been:

* ignore the message completely, without asking;
* change the code to get rid of the warning, without finding out what 
the warning meant.

It's all about understanding.

If you know exactly why a diagnostic message has been produced and what 
that message means, then you are in a good position to make an 
independent judgement based on your experience. But if you don't 
understand the message, you can't just ignore it. You need to find out 
what it means.

Some development teams have a house rule, and generally it's a good one, 
that the code should compile completely cleanly, without any diagnostic 
messages of any kind. In such circumstances, you should modify the code, 
but add a comment explaining what was there before and why you changed 
it. But, unless and until you join such a team, you are free to make 
your own decisions.

My own view is based on the fact that I use several different compilers, 
and code that clears up a warning in one of them will sometimes 
/generate/ a warning in another, and one can waste a lot of time trying 
to find an exact expression that will satisfy every compiler. If I 
*know* the code is right (and I always take my own knowledge with a 
pinch of salt, because sometimes what I know isn't actually correct), 
then I will generally choose to put up with the warning.

This is about knowing the rules. This is about understanding what a 
diagnostic message is telling you. And if you can't work it out, then ask.

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
Richard
12/16/2016 9:32:46 AM
On Thursday, December 15, 2016 at 11:16:40 PM UTC+3, Keith Thompson wrote:
> Alla _ <modelling.data@gmail.com> writes:
> [...]
> > I always use only the following flag, which you have taught me to use:
> > gcc -Wall -Werror -Wextra -pedantic -std=c99 program.c -o program
> > Shall I use some more?
> 
> That's fine.
> 
> Warning: The following could be a distraction, and you can consider
> ignoring it.
> 
> You could consider using -std=c11 rather than -std=c99 if you
> have a sufficiently new version of gcc, but unless you're using
> C11-specific features (which you probably aren't) it doesn't matter.
> (With recent versions, say gcc 6.0 and up, C11 support is roughly
> as good as C99 support.  Also, the default is "-std=gnu11" rather
> than the "-std=gnu90" of older releases.  But the default doesn't
> matter if you're specifying "-std=..." anyway.)
> 
As I am now (for a few months already) a lucky (or not? :) ) user
of El Capitan (10.11.6), I guess I have quite a new gcc. But I have 
also found out, serendipitously, that Apple now uses clang 
instead of gcc, and hence, when I tried to learn gdb yet again,
I stumbled upon the fact that I now have to learn a different
debugger provided by Apple, and it is so difficult that I had to 
abandon that attempt yet again. Sad that I can't use gdb. 
This was a long way of saying that yes, most likely, I have a new
version of clang. 
Alla
12/16/2016 9:33:33 AM
On Thursday, December 15, 2016 at 11:36:18 PM UTC+3, Barry Schwarz wrote:
> On Thu, 15 Dec 2016 05:05:12 -0800 (PST), Alla _
> <modelling.data@gmail.com> wrote:
> 
> >Hello!
> >
> >I am not a fan of using exclamation marks, and any other text stressing effects, 
> >so, instead of those, by resorting to a detailed and a bit verbose description below,
> >I hope to make myself clear in terms of what I am trying to achieve, which I often
> >fail to do )
> >
> >Please, take a look at program, which:
> 
> <snip> 
> 
> >Second csv file (a correct one in terms of csv format):
> >Years,2011,2012,2013,2014,2015
> 
> <snip>
> 
> >Second output:
> >(1) Years,2011,2012,2013,2014,2015
> >(1.1) buffer length = 32
> >(2) Years
> >(2.2) buffer length = 26
> >(3) 2011 2012 2013 2014 2015 0 
> 
> The code you show coupled with the input data you show CANNOT produce
> the output you show!
> 
> There are only 30 text characters in your string.  The '\n' is the
> 31st and the terminating '\0' is the 32nd.  Since strlen does not
> count the '\0', buffer_length should contain 31, not the 32 your
> program prints.
> 
> There is something you are not showing us:  
>     Possibly trailing data in the file.  That is why several have
> asked for a hex dump.
>     Possibly different code than what is in your message.
> 
What has led you yet again to suspect me of some hidden motives?
Did you have a chance to see all those messages on the problem I face 
with my csv files, and the reason I show the buffer length in the program?
Yes, indeed, the length is bigger than I had expected it to be, and that 
is why I have posted so many questions on why this can be the case,
and what is happening with my csv files. 
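
Barry's count can be checked with a few lines of C. The sketch below hard-codes the header line twice, once clean and once with a single stray space before the newline; the function name is invented for the example. A carriage return before the newline would add one byte in the same way, though that is only a guess at what the file might contain.

```c
#include <string.h>

/* Returns the strlen that fgets would leave in the buffer for the
   header line: 30 visible characters plus '\n', optionally with one
   stray space before the newline. */
size_t header_length(int with_trailing_space)
{
    const char *clean  = "Years,2011,2012,2013,2014,2015\n";
    const char *padded = "Years,2011,2012,2013,2014,2015 \n";

    return strlen(with_trailing_space ? padded : clean);
}
```

header_length(0) gives the 31 that Barry expects, and header_length(1) gives the 32 reported in the output.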

> In fact, a trailing blank after 2015 produces exactly the result you
> see.  Now the question becomes - why are you not using the correct CSV
> file Richard gave you instead of ones you generate yourself with
> various errors?
See my posts above, and see my replies to Richard. I think I have already
tried to explain that it is always better to think of the positive 
motives people might have, rather than trying to find absent ulterior 
motives; as with the code, quoting Ben, always check for success
rather than failure, and paraphrasing this - always look for positives,
not negatives ) 
> 
Alla
12/16/2016 9:44:02 AM
On Friday, December 16, 2016 at 12:32:54 PM UTC+3, Richard Heathfield wrote:
> On 16/12/16 09:09, Alla _ wrote:
> > On Friday, December 16, 2016 at 11:54:40 AM UTC+3, Richard Heathfield wrote:
> >> On 16/12/16 08:39, Alla _ wrote:
> >>> On Thursday, December 15, 2016 at 5:12:52 PM UTC+3, Richard Heathfield wrote:
> >>>> [the code]
> >>>>
> >> <snip>
> >>>>>     {
> >>>>>         char buffer[BUFFER_SIZE] = "";
> >>>>>         char array_names[50][50];
> >>>>>         int  array_values[50][50];
> >>>>
> >>>> I am taking you at your word and forbearing to comment on these values,
> >>>> but for the sake of a predictable life I would suggest that you
> >>>> initialise these two arrays:
> >>>>
> >>>> char array_names[50][50] = {0};
> >>>> int  array_values[50][50] = {0};
> >>>>
> >>>> which will take advantage of the default static initialisation rule to
> >>>> set everything to 0.
> >>>>
> >>> Is it a typo, or did you intend to use only {}, instead of {{}}?
> >>
> >> It isn't a typo. {0} always works, for any kind of aggregate type. But
> >> if you would prefer to use {{0}}, you {{{{{{{can}}}}}}}.
> > My compiler complained about {0} )
> 
> Yes, so does mine. But it's wrong to do so. {0} is a C idiom, and 
> perfectly correct.
> 
> It's probably a bit early in your C career to be telling you this, 
> because the knowledge can lead to a certain devil-may-care attitude to 
> warnings (and such an attitude is entirely wrong!), but here goes.
> 
> An implementation's diagnostic messages have a purpose, which is to draw 
> your attention to possible mistakes in your program. And nowadays, they 
> are very nearly always right.
> 
> But "very nearly always" isn't quite the same thing as "always". 
> Sometimes they will point out something that they think is a problem, 
> but that you *know for sure* is not a problem. In such circumstances, 
> you have a choice: to modify the code anyway so as to shut up the 
> warning, or to leave things alone and put up with the constant nagging 
> (perhaps adding a comment to the code).
> 
> In this case, when faced with a diagnostic message for code that you 
> didn't think should produce one, you did *exactly* the right thing --- 
> you asked.
> 
> The *wrong* things to do would have been:
> 
> * ignore the message completely, without asking;
> * change the code to get rid of the warning, without finding out what 
> the warning meant.
> 
> It's all about understanding.
> 
> If you know exactly why a diagnostic message has been produced and what 
> that message means, then you are in a good position to make an 
> independent judgement based on your experience. But if you don't 
> understand the message, you can't just ignore it. You need to find out 
> what it means.
> 
> Some development teams have a house rule, and generally it's a good one, 
> that the code should compile completely cleanly, without any diagnostic 
> messages of any kind. In such circumstances, you should modify the code, 
> but add a comment explaining what was there before and why you changed 
> it. But, unless and until you join such a team, you are free to make 
> your own decisions.
> 
> My own view is based on the fact that I use several different compilers, 
> and code that clears up a warning in one of them will sometimes 
> /generate/ a warning in another, and one can waste a lot of time trying 
> to find an exact expression that will satisfy every compiler. If I 
> *know* the code is right (and I always take my own knowledge with a 
> pinch of salt, because sometimes what I know isn't actually correct), 
> then I will generally choose to put up with the warning.
> 
> This is about knowing the rules. This is about understanding what a 
> diagnostic message is telling you. And if you can't work it out, then ask.
> 
Thank you very much for this explanation. 
The only thing - my compiler didn't try to warn me, it screamed: "Error" )
parse_csv_2.c:29:37: error: suggest braces around initialization of subobject
      [-Werror,-Wmissing-braces]
        char array_names[50][50] = {0};
Alla
12/16/2016 9:51:12 AM
On 16/12/16 09:19, Alla _ wrote:
> On Thursday, December 15, 2016 at 10:31:28 PM UTC+3, Richard Heathfield wrote:
>> On 15/12/16 17:25, Alla _ wrote:
>> <snip>
>>> Please, take a look at two versions of csv file I have posted in the first message:
>>> the first one does have that trailing comma, and the second doesn't.
>>> The first version has two trailing zeros, the second has one trailing zero;
>>> I think that the zero I see when processing your file comes from the
>>> newline character, which is all right because I don't stop the loop before
>>> the '\n'.
>>
>> I'm going to show you four sets of test data, and four sets of results.
>>
>> The first set of test data is the CSV file I sent you.
>> The second set is your first CSV file (the uppermost one in your article).
>> The third set is your second CSV file (the lowermost one in your article).
>> The fourth set is a CSV file that I have modified to illustrate a point,
>> which we'll come to in its proper place.
>>
>> First, the test data I sent you:
>>
>> ++++ inc_st_test.csv ++++
>> Income Statement,2011,2012,2013,2014,2015
>> Sales,1062,1252,1587,1934,2519
>> Cost of Goods Sold,654,814,1009,1190,1499
>>    Gross Profit,408,438,578,744,1020
>> "Selling, General, and Admin Exp",254,271,364,454,576
>>    Operating Income before Depr,154,167,214,290,444
>> Depreciation and Amortization,25,31,38,52,70
>>    Operating Profit,129,136,176,238,374
>> Interest Expense,4,3,3,1,4
>> Other Gains and Losses,0,7,10,0,-1
>>    Pretax Income,125,126,163,237,371
>> Income Tax Expense,55,52,65,92,141
>>    Net Income,70,74,98,145,230
>> ++++ end of inc_st_test.csv ++++
>>
>> Results (first 5 lines only):
>> (1) Income Statement,2011,2012,2013,2014,2015
>> (1.1) buffer length = 42
>> (2) Income Statement
>> (2.2) buffer length = 25
>> (3) 2011 2012 2013 2014 2015
>>
>> Note the ABSENCE of trailing 0s in those results.
>>
> Great. We are working with exactly the same file,
> and you don't get any trailing zeros, while I get one
> - see in (3).

But I'm using the same code as you, so clearly we are /not/ using 
exactly the same file.

Barry Schwarz has suggested that you may have a trailing space in your 
data, which wouldn't show up on Usenet, and he says (presumably after 
trying it out) that this will produce the symptom you are describing.

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
Richard
12/16/2016 10:01:39 AM
On 16/12/16 09:44, Alla _ wrote:
> On Thursday, December 15, 2016 at 11:36:18 PM UTC+3, Barry Schwarz wrote:
<snip>

>> There is something you are not showing us:
>>     Possibly trailing data in the file.  That is why several have
>> asked for a hex dump.
>>     Possibly different code than what is in your message.
>>
> What has led you yet again to suspect me of some hidden motives?

I can't see anything in Barry's article that justifies such an 
accusation. What makes you think he suspects you of anything? He's 
describing facts, not ascribing motives.

I would imagine that Barry, like me, tends to apply Hanlon's Razor to 
every Usenet article he reads.

If Barry is correct, the thing you are not showing us is trailing data 
in the file. If that trailing data is a space character, it is very 
likely that you didn't know the data was there. That, as Barry rightly 
says, is where a hex dump can be invaluable, because nothing can hide 
from a hex dump.
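
If no hex dump tool is to hand (hexdump -C and xxd are the usual ones), a few lines of C will do the same job for a single line read by fgets. This is a sketch; hex_format is a hypothetical helper, not part of the program under discussion.

```c
#include <stdio.h>
#include <string.h>

/* Writes the bytes of line into out as two-digit hex values separated
   by spaces, stopping early if out would overflow. Returns the number
   of bytes of line that were formatted. */
size_t hex_format(const char *line, char *out, size_t out_size)
{
    size_t i, used = 0, len = strlen(line);

    if (out_size > 0)
        out[0] = '\0';

    /* each byte needs at most 3 characters plus the terminating '\0' */
    for (i = 0; i < len && used + 4 <= out_size; i++)
    {
        used += (size_t)sprintf(out + used, i ? " %02x" : "%02x",
                                (unsigned)(unsigned char)line[i]);
    }
    return i;
}
```

Applied to the tail of a line such as "5 \n" it produces "35 20 0a": the 20 is a space, which is invisible on screen but obvious in hex.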

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
Richard
12/16/2016 10:04:37 AM
On 16/12/16 09:51, Alla _ wrote:
<snip>
> The only thing - my compiler didn't try to warn me, it screamed: "Error" )
> parse_csv_2.c:29:37: error: suggest braces around initialization of subobject
>       [-Werror,-Wmissing-braces]
>         char array_names[50][50] = {0};

Then you have three choices: remove -Werror from your list of compiler 
flags, or remove -Wmissing-braces from your list of compiler flags, or 
change the code.

-Werror turns every warning into an error. That's *good*, if you are 
determined that your code must compile cleanly. Not so good if the 
compiler warns about code that is perfectly good and that you don't wish 
to change.

Your call.
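
For the third choice, a minimal sketch of what "change the code" could look like: give the initializer one more level of braces, so the first row of the two-dimensional array is named explicitly. Both `{0}` and `{{0}}` zero every element in standard C; the extra braces are only there to satisfy -Wmissing-braces. The function name and the ROWS/COLS macros are made up for this illustration.

```c
#include <stddef.h>

#define ROWS 50
#define COLS 50

/* Returns 1 if every element of a fully braced, zero-initialized
   two-dimensional array is 0, and 0 otherwise. */
int all_zero_after_braced_init(void)
{
    /* The inner braces initialize the first row; all remaining
       elements are zero-initialized implicitly. */
    char array_names[ROWS][COLS] = {{0}};
    size_t r, c;

    for (r = 0; r < ROWS; r++)
        for (c = 0; c < COLS; c++)
            if (array_names[r][c] != 0)
                return 0;
    return 1;
}
```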

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
Richard
12/16/2016 10:08:10 AM
On 16/12/16 09:25, Alla _ wrote:
> On Thursday, December 15, 2016 at 10:31:28 PM UTC+3, Richard Heathfield wrote:
>> On 15/12/16 17:25, Alla _ wrote:
>> <snip>
>>> Please, take a look at two versions of csv file I have posted in the first message:
>>> the first one does have that trailing comma, and the second doesn't.
>>> The first version has two trailing zeros, the second has one trailing zero;
>>> I think that the zero I see when processing your file comes from the
>>> newline character, which is all right because I don't stop the loop before
>>> the '\n'.
>>
>> I'm going to show you four sets of test data, and four sets of results.
>>
>> The first set of test data is the CSV file I sent you.
>> The second set is your first CSV file (the uppermost one in your article).
>> The third set is your second CSV file (the lowermost one in your article).
>> The fourth set is a CSV file that I have modified to illustrate a point,
>> which we'll come to in its proper place.
>>
>> First, the test data I sent you:
>>
>> ++++ inc_st_test.csv ++++
>> Income Statement,2011,2012,2013,2014,2015
>> Sales,1062,1252,1587,1934,2519
>> Cost of Goods Sold,654,814,1009,1190,1499
>>    Gross Profit,408,438,578,744,1020
>> "Selling, General, and Admin Exp",254,271,364,454,576
>>    Operating Income before Depr,154,167,214,290,444
>> Depreciation and Amortization,25,31,38,52,70
>>    Operating Profit,129,136,176,238,374
>> Interest Expense,4,3,3,1,4
>> Other Gains and Losses,0,7,10,0,-1
>>    Pretax Income,125,126,163,237,371
>> Income Tax Expense,55,52,65,92,141
>>    Net Income,70,74,98,145,230
>> ++++ end of inc_st_test.csv ++++
>>
>> Results (first 5 lines only):
>> (1) Income Statement,2011,2012,2013,2014,2015
>> (1.1) buffer length = 42
>> (2) Income Statement
>> (2.2) buffer length = 25
>> (3) 2011 2012 2013 2014 2015
>>
>> Note the ABSENCE of trailing 0s in those results.
>>
>> Now for the first file you posted. Note the trailing commas.
>>
>> ++++ test1.csv ++++
>> Years,2011,2012,2013,2014,2015,
>> Sales,1062,1252,1587,1934,2519,
>> Cost of Goods Sold,654,814,1009,1190,1499,
>>    Gross Profit,408,438,578,744,1020,
>> SG&A,254,271,364,454,576,
>>    Operating Income before Depr,154,167,214,290,444,
>> Depreciation and Amortization,25,31,38,52,70,
>>    Operating Profit,129,136,176,238,374,
>> Interest Expense,4,3,3,1,4,
>> Other Gains and Losses,0,7,10,0,-1,
>>    Pretax Income,125,126,163,237,371,
>> Income Tax Expense,55,52,65,92,141,
>>    Net Income,70,74,98,145,230,
>> ++++ end of test1.csv ++++
>>
>> Results (first 5 lines only):
>> (1) Years,2011,2012,2013,2014,2015,
>> (1.1) buffer length = 32
>> (2) Years
>> (2.2) buffer length = 26
>> (3) 2011 2012 2013 2014 2015 0
>>
>> Note the trailing 0.
>>
>> Now for the second CSV file you posted, with NO trailing commas:
>>
>> ++++ test2.csv ++++
>> Years,2011,2012,2013,2014,2015
>> Sales,1062,1252,1587,1934,2519
>> Cost of Goods Sold,654,814,1009,1190,1499
>>    Gross Profit,408,438,578,744,1020
>> "Selling, General, and Admin Exp",254,271,364,454,576
>>    Operating Income before Depr,154,167,214,290,444
>> Depreciation and Amortization,25,31,38,52,70
>>    Operating Profit,129,136,176,238,374
>> Interest Expense,4,3,3,1,4
>> Other Gains and Losses,0,7,10,0,-1
>>    Pretax Income,125,126,163,237,371
>> Income Tax Expense,55,52,65,92,141
>>    Net Income,70,74,98,145,230
>> ++++ end of test2.csv ++++
>>
>> Again, note that there are no trailing commas.
>>
>> Results (first 5 lines only):
>> (1) Years,2011,2012,2013,2014,2015
>> (1.1) buffer length = 31
>> (2) Years
>> (2.2) buffer length = 25
>> (3) 2011 2012 2013 2014 2015
>>
>> Note the ABSENCE of trailing 0s.
>>
>> Now for an experiment.
>>
>> For my fourth test, I will deliberately place some extra commas at the
>> end of each line.
>>
>> ++++ test3.csv ++++
>> Years,2011,2012,2013,2014,2015,,,
>> Sales,1062,1252,1587,1934,2519,,,
>> Cost of Goods Sold,654,814,1009,1190,1499,,,
>>    Gross Profit,408,438,578,744,1020,,,
>> SG&A,254,271,364,454,576,,,
>>    Operating Income before Depr,154,167,214,290,444,,,
>> Depreciation and Amortization,25,31,38,52,70,,,
>>    Operating Profit,129,136,176,238,374,,,
>> Interest Expense,4,3,3,1,4,,,
>> Other Gains and Losses,0,7,10,0,-1,,,
>>    Pretax Income,125,126,163,237,371,,,
>> Income Tax Expense,55,52,65,92,141,,,
>>    Net Income,70,74,98,145,230,,,
>> ++++ end of test3.csv ++++
>>
>> Note that there are THREE trailing commas at the end of each line.
>> Prediction: 3 trailing 0s.
>>
> As we have discussed with Keith previously, it does seem that whenever
> I save a file as csv I get some trailing commas - they appear as the very
> last row, which has a few commas, only commas; and they don't disappear
> even if I manually delete about 500 rows in the excel file;

Then that should make an interesting little program for you to write: 
excelcsv_to_propercsv.c
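
A possible core for such a program, as a rough sketch only. It assumes that "proper" here means no trailing commas, spaces, or CR/LF bytes at the end of a line; that is a guess at the intent, and the function name `strip_line_tail` is invented for the example. Note the cost of the assumption: a genuinely empty last field would be stripped as well.

```c
#include <string.h>

/* Remove trailing '\n', '\r', ',' and ' ' characters from line, in place.
   Returns the new length. A quoted field such as "a, b" is safe because
   the line then ends in '"', which stops the loop. */
size_t strip_line_tail(char *line)
{
    size_t len = strlen(line);

    while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r' ||
                       line[len - 1] == ','  || line[len - 1] == ' '))
        line[--len] = '\0';

    return len;
}
```

The program around it would read each line with fgets, call the function, skip lines that come back with length 0 (the comma-only rows), and write the rest followed by a single '\n'.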

> what's also
> interesting is that I seem to get some hidden commas at the end of each
> row, even in the file you have sent me,

Are you saying your email program is corrupting data that is emailed to 
you? That seems very unlikely. But it's easily resolved. Show me a hex 
dump of the file I sent you, and let's find out whether it really has 
any hidden commas in it.

> i.e. in the file correctly converted
> to csv. This issue has nothing to do with C, therefore I guess I have to
> opt out from posting questions on this matter here, and try to figure out
> how to solve this problem - I wish I knew more about computers and
> software to at least know where to look for the solution; I think I have to
> report to Microsoft and see what they say.

Not much point in doing that. First you need to establish exactly what 
is going on, and the hex dump is the solution to that. Once you know 
that, the chances are that any contact with MS will be a waste of their 
time and yours.

Hex dump, please.

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
Richard
12/16/2016 10:11:50 AM
On Friday, December 16, 2016 at 1:01:46 PM UTC+3, Richard Heathfield wrote:
> On 16/12/16 09:19, Alla _ wrote:
> > On Thursday, December 15, 2016 at 10:31:28 PM UTC+3, Richard Heathfield wrote:
> >> On 15/12/16 17:25, Alla _ wrote:
> >> <snip>
> > Great. We are working with exactly the same file,
> > and you don't get any trailing zeros, while I get one
> > - see in (3).
> 
> But I'm using the same code as you, so clearly we are /not/ using 
> exactly the same file.
> 
Well, I have nothing to add here. One of the files I am using, and the one
I have been referring to in this discussion today, is the one you sent me.
Are you using a different one?

> Barry Schwarz has suggested that you may have a trailing space in your 
> data, which wouldn't show up on Usenet, and he says (presumably after 
> trying it out) that this will produce the symptom you are describing.
I thought so too. I have tried manually to delete rows and columns around
the data I need to preserve. I am out of tools, and definitely don't have 
any knowledge to continue figuring this out. But it is a huge problem for 
me as I can't proceed with working on the code because I can't even have
the loop with a condition I need. I am thinking now how to solve the issue
with the csv file. I will be back. 
> 
Alla
12/16/2016 10:43:26 AM
On Friday, December 16, 2016 at 1:11:58 PM UTC+3, Richard Heathfield wrote:
> On 16/12/16 09:25, Alla _ wrote:
> > On Thursday, December 15, 2016 at 10:31:28 PM UTC+3, Richard Heathfield wrote:
> >> On 15/12/16 17:25, Alla _ wrote:
> >> <snip>
> Hex dump, please.
> 
Yes, Sir.
Alla
12/16/2016 10:45:54 AM
On 16/12/16 10:43, Alla _ wrote:
> On Friday, December 16, 2016 at 1:01:46 PM UTC+3, Richard Heathfield wrote:
>> On 16/12/16 09:19, Alla _ wrote:
<snip>
>>> Great. We are working with exactly the same file,
>>> and you don't get any trailing zeros, while I get one
>>> - see in (3).
>>
>> But I'm using the same code as you, so clearly we are /not/ using
>> exactly the same file.
>>
> Well, I have nothing to add here. One of the files I am using, and the one
> I have been referring to in this discussion today, is the one you sent me.
> Are you using a different one?

No. Here's the hex dump, so you can see exactly what I'm using (and it's 
exactly what I sent you):

49 6E 63 6F 6D 65 20 53 74 61 74 65 6D 65 6E 74  |Income Statement|
2C 32 30 31 31 2C 32 30 31 32 2C 32 30 31 33 2C  |,2011,2012,2013,|
32 30 31 34 2C 32 30 31 35 0A 53 61 6C 65 73 2C  |2014,2015.Sales,|
31 30 36 32 2C 31 32 35 32 2C 31 35 38 37 2C 31  |1062,1252,1587,1|
39 33 34 2C 32 35 31 39 0A 43 6F 73 74 20 6F 66  |934,2519.Cost of|
20 47 6F 6F 64 73 20 53 6F 6C 64 2C 36 35 34 2C  | Goods Sold,654,|
38 31 34 2C 31 30 30 39 2C 31 31 39 30 2C 31 34  |814,1009,1190,14|
39 39 0A 20 20 47 72 6F 73 73 20 50 72 6F 66 69  |99.  Gross Profi|
74 2C 34 30 38 2C 34 33 38 2C 35 37 38 2C 37 34  |t,408,438,578,74|
34 2C 31 30 32 30 0A 22 53 65 6C 6C 69 6E 67 2C  |4,1020."Selling,|
20 47 65 6E 65 72 61 6C 2C 20 61 6E 64 20 41 64  | General, and Ad|
6D 69 6E 20 45 78 70 22 2C 32 35 34 2C 32 37 31  |min Exp",254,271|
2C 33 36 34 2C 34 35 34 2C 35 37 36 0A 20 20 4F  |,364,454,576.  O|
70 65 72 61 74 69 6E 67 20 49 6E 63 6F 6D 65 20  |perating Income |
62 65 66 6F 72 65 20 44 65 70 72 2C 31 35 34 2C  |before Depr,154,|
31 36 37 2C 32 31 34 2C 32 39 30 2C 34 34 34 0A  |167,214,290,444.|
44 65 70 72 65 63 69 61 74 69 6F 6E 20 61 6E 64  |Depreciation and|
20 41 6D 6F 72 74 69 7A 61 74 69 6F 6E 2C 32 35  | Amortization,25|
2C 33 31 2C 33 38 2C 35 32 2C 37 30 0A 20 20 4F  |,31,38,52,70.  O|
70 65 72 61 74 69 6E 67 20 50 72 6F 66 69 74 2C  |perating Profit,|
31 32 39 2C 31 33 36 2C 31 37 36 2C 32 33 38 2C  |129,136,176,238,|
33 37 34 0A 49 6E 74 65 72 65 73 74 20 45 78 70  |374.Interest Exp|
65 6E 73 65 2C 34 2C 33 2C 33 2C 31 2C 34 0A 4F  |ense,4,3,3,1,4.O|
74 68 65 72 20 47 61 69 6E 73 20 61 6E 64 20 4C  |ther Gains and L|
6F 73 73 65 73 2C 30 2C 37 2C 31 30 2C 30 2C 2D  |osses,0,7,10,0,-|
31 0A 20 20 50 72 65 74 61 78 20 49 6E 63 6F 6D  |1.  Pretax Incom|
65 2C 31 32 35 2C 31 32 36 2C 31 36 33 2C 32 33  |e,125,126,163,23|
37 2C 33 37 31 0A 49 6E 63 6F 6D 65 20 54 61 78  |7,371.Income Tax|
20 45 78 70 65 6E 73 65 2C 35 35 2C 35 32 2C 36  | Expense,55,52,6|
35 2C 39 32 2C 31 34 31 0A 20 20 4E 65 74 20 49  |5,92,141.  Net I|
6E 63 6F 6D 65 2C 37 30 2C 37 34 2C 39 38 2C 31  |ncome,70,74,98,1|
34 35 2C 32 33 30 0A                             |45,230.|


>
>> Barry Schwarz has suggested that you may have a trailing space in your
>> data, which wouldn't show up on Usenet, and he says (presumably after
>> trying it out) that this will produce the symptom you are describing.
> I thought so too. I have tried manually to delete rows and columns around
> the data I need to preserve. I am out of tools,

No, you are *not* out of tools. What you have is a reluctance to use a 
hex dump. Overcome that reluctance, and paste a hex dump of the file 
that is causing you problems.

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
Richard
12/16/2016 11:03:41 AM
On Friday, December 16, 2016 at 1:11:58 PM UTC+3, Richard Heathfield wrote:
> On 16/12/16 09:25, Alla _ wrote:
> > On Thursday, December 15, 2016 at 10:31:28 PM UTC+3, Richard Heathfield wrote:
> >> On 15/12/16 17:25, Alla _ wrote:
<snip>
> Hex dump, please.
> 
Below I post:
(1) The corrected version of the program based on comments I have received;
(2) csv inputs
(3) program's outputs
(4) hex dumps for each csv
Interesting: now that I have changed the error checking method, i.e. if (!ferror(fp)),
instead of checking for !feof, I get the error message at the end of the whole output;
this is telling me that there is something wrong with every csv file. 
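
A note on the check described above: ferror returns nonzero only when the stream's error indicator is set, so `!ferror(fp)` is true after a perfectly clean read. A message printed under that condition therefore appears for every file that was read without error, which would account for it showing up after each run. The sketch below shows the test written the other way round; `count_lines` is a made-up helper for the illustration, not part of the program that follows.

```c
#include <stdio.h>
#include <string.h>

/* Count the newline-terminated lines in fp.
   Returns 0 when reading stopped at end-of-file,
   -1 when it stopped because of a read error. */
int count_lines(FILE *fp, size_t *line_count)
{
    char buffer[100];

    *line_count = 0;
    while (fgets(buffer, sizeof buffer, fp) != NULL) {
        if (strchr(buffer, '\n') != NULL)
            (*line_count)++;
    }

    /* fgets returns NULL both at end-of-file and on error;
       ferror(fp) is nonzero only in the error case. */
    if (ferror(fp))
        return -1;
    return 0;
}
```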


(1) ******Program******
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>

#define BUFFER_SIZE 100
#define ARRAY_SIZE(a) (sizeof(a)/sizeof(*(a)))
#define INCOME_STATEMENT "files/income_test.csv"

int main(int argc, char *argv[])
{
    //Check for correct number of arguments
    if(argc > 2)
    {
        fputs("Sorry, the program processes only one file.\n", stderr);
        return EXIT_FAILURE;
    }
    //Determine a file to use
    const char *data_file = (argc == 2)? argv[1] : INCOME_STATEMENT;
    
    FILE *fp = fopen(data_file, "r");
    
    if ( fp )
    {
        char buffer[BUFFER_SIZE] = "";
        char array_names[50][50] = {{0}};
        int  array_values[50][50] = {{0}};
        char *ptr_buffer = NULL;
        size_t line_number = 0;
        size_t names_column_number;
        size_t values_column_number;
       
        //Read each line from the file, store in the buffer
        while(fgets(buffer, sizeof(buffer), fp) != NULL)
        {
            size_t buffer_length = strlen(buffer);
            
            //DEBUG: CHECK IF BUFFER IS FILLED CORRECTLY
            printf("(1) %s", buffer);
            printf("(1.1) buffer length = %zu\n", buffer_length);
            //DEBUG: CHECK IF NAMES ARE FILLED CORRECTLY
            printf("(2) ");
            
            
            //Parse row names from each line into 'array_names'
            for ( names_column_number = 0, ptr_buffer = buffer;
                  ( !isdigit(*ptr_buffer)); names_column_number++,
                   ptr_buffer++ )
            {
                //Check if the line starts with double quotes
                if(*ptr_buffer == '"')
                {
                    //Skip double quotes
                    ptr_buffer++;
                    //Store every character between double quotes
                    while(*ptr_buffer != '"')
                    {
                        array_names[line_number][names_column_number] = *ptr_buffer++;
                    
                        //DEBUG: CHECK IF NAMES ARE FILLED CORRECTLY
                        printf("%c", array_names[line_number][names_column_number]);
                        
                        names_column_number++;
                        //Decrement the length of the buffer
                        buffer_length--;
                    }
                    
                    ptr_buffer++;
                }
                
                /** THE IF STATEMENT THAT CHECKS FOR DOUBLE QUOTES WILL GO INTO A SEPARATE FUNCTION **/
                if(*ptr_buffer == ',' && isdigit(*(ptr_buffer + 1)))
                    break;
                array_names[line_number][names_column_number] = *ptr_buffer;
                
                //DEBUG: CHECK IF NAMES ARE FILLED CORRECTLY
                printf("%c", array_names[line_number][names_column_number]);
                //Decrement the length of the buffer
                buffer_length--;
            }
            array_names[line_number][names_column_number] = '\0';
            putchar('\n');
            
            //Move to the next character in the line beyond comma
            ptr_buffer++;
            buffer_length--;
            
            //DEBUG: CHECK BUFFER LENGTH AFTER FILLING NAMES
            printf("(2.2) buffer length = %zu\n", buffer_length);

            
            //DEBUG: CHECK IF VALUES ARE FILLED CORRECTLY
            printf("(3) ");

            //Parse the comma-separated values from each line into 'array_values'
            for ( values_column_number = 0; *ptr_buffer != '\0'
                 /** && k < buffer_length - 2**/;
                 values_column_number++, ptr_buffer++)
            {
                array_values[line_number][values_column_number] = (int)strtol(ptr_buffer, &ptr_buffer,
                                                 10);
                
                //DEBUG: CHECK IF VALUES ARE FILLED CORRECTLY
                printf("%d ", array_values[line_number][values_column_number]);
            }
            printf("\n\n");
            line_number++;
        }
        if (!ferror(fp))
        {
            puts("Something went wrong with the provided file\n");
            return EXIT_FAILURE;
        }
        fclose(fp);
       
    }    
    //fopen() returned NULL
    else
    {
        perror(data_file);
    }
    return 0;
}


(2) File you have sent me (without changing Income
statement to Years in the first row):

******csv file******
Income Statement,2011,2012,2013,2014,2015
Sales,1062,1252,1587,1934,2519
Cost of Goods Sold,654,814,1009,1190,1499
  Gross Profit,408,438,578,744,1020
"Selling, General, and Admin Exp",254,271,364,454,576
  Operating Income before Depr,154,167,214,290,444
Depreciation and Amortization,25,31,38,52,70
  Operating Profit,129,136,176,238,374
Interest Expense,4,3,3,1,4
Other Gains and Losses,0,7,10,0,-1
  Pretax Income,125,126,163,237,371
Income Tax Expense,55,52,65,92,141
  Net Income,70,74,98,145,230

******code output with one trailing zero******
(1) Income Statement,2011,2012,2013,2014,2015
(1.1) buffer length = 43
(2) Income Statement
(2.2) buffer length = 26
(3) 2011 2012 2013 2014 2015 0 

(1) Sales,1062,1252,1587,1934,2519
(1.1) buffer length = 32
(2) Sales
(2.2) buffer length = 26
(3) 1062 1252 1587 1934 2519 0 

(1) Cost of Goods Sold,654,814,1009,1190,1499
(1.1) buffer length = 43
(2) Cost of Goods Sold
(2.2) buffer length = 24
(3) 654 814 1009 1190 1499 0 

(1)   Gross Profit,408,438,578,744,1020
(1.1) buffer length = 37
(2)   Gross Profit
(2.2) buffer length = 22
(3) 408 438 578 744 1020 0 

(1) "Selling, General, and Admin Exp",254,271,364,454,576
(1.1) buffer length = 55
(2) Selling, General, and Admin Exp
(2.2) buffer length = 23
(3) 254 271 364 454 576 0 

(1)   Operating Income before Depr,154,167,214,290,444
(1.1) buffer length = 52
(2)   Operating Income before Depr
(2.2) buffer length = 21
(3) 154 167 214 290 444 0 

(1) Depreciation and Amortization,25,31,38,52,70
(1.1) buffer length = 46
(2) Depreciation and Amortization
(2.2) buffer length = 16
(3) 25 31 38 52 70 0 

(1)   Operating Profit,129,136,176,238,374
(1.1) buffer length = 40
(2)   Operating Profit
(2.2) buffer length = 21
(3) 129 136 176 238 374 0 

(1) Interest Expense,4,3,3,1,4
(1.1) buffer length = 28
(2) Interest Expense
(2.2) buffer length = 11
(3) 4 3 3 1 4 0 

(1) Other Gains and Losses,0,7,10,0,-1
(1.1) buffer length = 36
(2) Other Gains and Losses
(2.2) buffer length = 13
(3) 0 7 10 0 -1 0 

(1)   Pretax Income,125,126,163,237,371
(1.1) buffer length = 37
(2)   Pretax Income
(2.2) buffer length = 21
(3) 125 126 163 237 371 0 

(1) Income Tax Expense,55,52,65,92,141
(1.1) buffer length = 36
(2) Income Tax Expense
(2.2) buffer length = 17
(3) 55 52 65 92 141 0 

(1)   Net Income,70,74,98,145,230
(1.1) buffer length = 31
(2)   Net Income
(2.2) buffer length = 18
(3) 70 74 98 145 230 0 

Something went wrong with the provided file

******hex******
49 6E 63 6F 6D 65 20 53 74 61 74 65 6D 65 6E 74
2C 32 30 31 31 2C 32 30 31 32 2C 32 30 31 33 2C
32 30 31 34 2C 32 30 31 35 0D 0A 53 61 6C 65 73
2C 31 30 36 32 2C 31 32 35 32 2C 31 35 38 37 2C
31 39 33 34 2C 32 35 31 39 0D 0A 43 6F 73 74 20
6F 66 20 47 6F 6F 64 73 20 53 6F 6C 64 2C 36 35
34 2C 38 31 34 2C 31 30 30 39 2C 31 31 39 30 2C
31 34 39 39 0D 0A 20 20 47 72 6F 73 73 20 50 72
6F 66 69 74 2C 34 30 38 2C 34 33 38 2C 35 37 38
2C 37 34 34 2C 31 30 32 30 0D 0A 22 53 65 6C 6C
69 6E 67 2C 20 47 65 6E 65 72 61 6C 2C 20 61 6E
64 20 41 64 6D 69 6E 20 45 78 70 22 2C 32 35 34
2C 32 37 31 2C 33 36 34 2C 34 35 34 2C 35 37 36
0D 0A 20 20 4F 70 65 72 61 74 69 6E 67 20 49 6E
63 6F 6D 65 20 62 65 66 6F 72 65 20 44 65 70 72
2C 31 35 34 2C 31 36 37 2C 32 31 34 2C 32 39 30
2C 34 34 34 0D 0A 44 65 70 72 65 63 69 61 74 69
6F 6E 20 61 6E 64 20 41 6D 6F 72 74 69 7A 61 74
69 6F 6E 2C 32 35 2C 33 31 2C 33 38 2C 35 32 2C
37 30 0D 0A 20 20 4F 70 65 72 61 74 69 6E 67 20
50 72 6F 66 69 74 2C 31 32 39 2C 31 33 36 2C 31
37 36 2C 32 33 38 2C 33 37 34 0D 0A 49 6E 74 65
72 65 73 74 20 45 78 70 65 6E 73 65 2C 34 2C 33
2C 33 2C 31 2C 34 0D 0A 4F 74 68 65 72 20 47 61
69 6E 73 20 61 6E 64 20 4C 6F 73 73 65 73 2C 30
2C 37 2C 31 30 2C 30 2C 2D 31 0D 0A 20 20 50 72
65 74 61 78 20 49 6E 63 6F 6D 65 2C 31 32 35 2C
31 32 36 2C 31 36 33 2C 32 33 37 2C 33 37 31 0D
0A 49 6E 63 6F 6D 65 20 54 61 78 20 45 78 70 65
6E 73 65 2C 35 35 2C 35 32 2C 36 35 2C 39 32 2C
31 34 31 0D 0A 20 20 4E 65 74 20 49 6E 63 6F 6D
65 2C 37 30 2C 37 34 2C 39 38 2C 31 34 35 2C 32
33 30 0D 0A

(3) My file:
******csv input with a vivid trailing comma******
Years,2011,2012,2013,2014,2015,
Sales,1062,1252,1587,1934,2519,
Cost of Goods Sold,654,814,1009,1190,1499,
  Gross Profit,408,438,578,744,1020,
SG&A,254,271,364,454,576,
  Operating Income before Depr,154,167,214,290,444,
Depreciation and Amortization,25,31,38,52,70,
  Operating Profit,129,136,176,238,374,
Interest Expense,4,3,3,1,4,
Other Gains and Losses,0,7,10,0,-1,
  Pretax Income,125,126,163,237,371,
Income Tax Expense,55,52,65,92,141,
  Net Income,70,74,98,145,230,
******program output******
(1) Years,2011,2012,2013,2014,2015,
(1.1) buffer length = 32
(2) Years
(2.2) buffer length = 26
(3) 2011 2012 2013 2014 2015 0 

(1) Sales,1062,1252,1587,1934,2519,
(1.1) buffer length = 32
(2) Sales
(2.2) buffer length = 26
(3) 1062 1252 1587 1934 2519 0 

(1) Cost of Goods Sold,654,814,1009,1190,1499,
(1.1) buffer length = 44
(2) Cost of Goods Sold
(2.2) buffer length = 25
(3) 654 814 1009 1190 1499 0 0 

(1)   Gross Profit,408,438,578,744,1020,
(1.1) buffer length = 38
(2)   Gross Profit
(2.2) buffer length = 23
(3) 408 438 578 744 1020 0 0 

(1) SG&A,254,271,364,454,576,
(1.1) buffer length = 27
(2) SG&A
(2.2) buffer length = 22
(3) 254 271 364 454 576 0 0 

(1)   Operating Income before Depr,154,167,214,290,444,
(1.1) buffer length = 53
(2)   Operating Income before Depr
(2.2) buffer length = 22
(3) 154 167 214 290 444 0 0 

(1) Depreciation and Amortization,25,31,38,52,70,
(1.1) buffer length = 47
(2) Depreciation and Amortization
(2.2) buffer length = 17
(3) 25 31 38 52 70 0 0 

(1)   Operating Profit,129,136,176,238,374,
(1.1) buffer length = 41
(2)   Operating Profit
(2.2) buffer length = 22
(3) 129 136 176 238 374 0 0 

(1) Interest Expense,4,3,3,1,4,
(1.1) buffer length = 29
(2) Interest Expense
(2.2) buffer length = 12
(3) 4 3 3 1 4 0 0 

(1) Other Gains and Losses,0,7,10,0,-1,
(1.1) buffer length = 37
(2) Other Gains and Losses
(2.2) buffer length = 14
(3) 0 7 10 0 -1 0 0 

(1)   Pretax Income,125,126,163,237,371,
(1.1) buffer length = 38
(2)   Pretax Income
(2.2) buffer length = 22
(3) 125 126 163 237 371 0 0 

(1) Income Tax Expense,55,52,65,92,141,
(1.1) buffer length = 37
(2) Income Tax Expense
(2.2) buffer length = 18
(3) 55 52 65 92 141 0 0 

(1)   Net Income,70,74,98,145,230,
(1.1) buffer length = 32
(2)   Net Income
(2.2) buffer length = 19
(3) 70 74 98 145 230 0 0 

Something went wrong with the provided file

******hex******
59 65 61 72 73 2C 32 30 31 31 2C 32 30 31 32 2C
32 30 31 33 2C 32 30 31 34 2C 32 30 31 35 2C 0A
53 61 6C 65 73 2C 31 30 36 32 2C 31 32 35 32 2C
31 35 38 37 2C 31 39 33 34 2C 32 35 31 39 2C 0A
43 6F 73 74 20 6F 66 20 47 6F 6F 64 73 20 53 6F
6C 64 2C 36 35 34 2C 38 31 34 2C 31 30 30 39 2C
31 31 39 30 2C 31 34 39 39 2C 0D 0A 20 20 47 72
6F 73 73 20 50 72 6F 66 69 74 2C 34 30 38 2C 34
33 38 2C 35 37 38 2C 37 34 34 2C 31 30 32 30 2C
0D 0A 53 47 26 41 2C 32 35 34 2C 32 37 31 2C 33
36 34 2C 34 35 34 2C 35 37 36 2C 0D 0A 20 20 4F
70 65 72 61 74 69 6E 67 20 49 6E 63 6F 6D 65 20
62 65 66 6F 72 65 20 44 65 70 72 2C 31 35 34 2C
31 36 37 2C 32 31 34 2C 32 39 30 2C 34 34 34 2C
0D 0A 44 65 70 72 65 63 69 61 74 69 6F 6E 20 61
6E 64 20 41 6D 6F 72 74 69 7A 61 74 69 6F 6E 2C
32 35 2C 33 31 2C 33 38 2C 35 32 2C 37 30 2C 0D
0A 20 20 4F 70 65 72 61 74 69 6E 67 20 50 72 6F
66 69 74 2C 31 32 39 2C 31 33 36 2C 31 37 36 2C
32 33 38 2C 33 37 34 2C 0D 0A 49 6E 74 65 72 65
73 74 20 45 78 70 65 6E 73 65 2C 34 2C 33 2C 33
2C 31 2C 34 2C 0D 0A 4F 74 68 65 72 20 47 61 69
6E 73 20 61 6E 64 20 4C 6F 73 73 65 73 2C 30 2C
37 2C 31 30 2C 30 2C 2D 31 2C 0D 0A 20 20 50 72
65 74 61 78 20 49 6E 63 6F 6D 65 2C 31 32 35 2C
31 32 36 2C 31 36 33 2C 32 33 37 2C 33 37 31 2C
0D 0A 49 6E 63 6F 6D 65 20 54 61 78 20 45 78 70
65 6E 73 65 2C 35 35 2C 35 32 2C 36 35 2C 39 32
2C 31 34 31 2C 0D 0A 20 20 4E 65 74 20 49 6E 63
6F 6D 65 2C 37 30 2C 37 34 2C 39 38 2C 31 34 35
2C 32 33 30 2C 0D 0A 
0
Alla
12/16/2016 11:06:18 AM
On Friday, December 16, 2016 at 2:03:48 PM UTC+3, Richard Heathfield wrote:
> On 16/12/16 10:43, Alla _ wrote:
> > On Friday, December 16, 2016 at 1:01:46 PM UTC+3, Richard Heathfield wrote:
> >> On 16/12/16 09:19, Alla _ wrote:
> <snip>
> >>> Great. We are working with exactly the same file,
> >>> and you don't get any trailing zeros, while I get one
> >>> - see in (3).
> >>
> >> But I'm using the same code as you, so clearly we are /not/ using
> >> exactly the same file.
> >>
> > Well, I have nothing to add here. One of the files I am using, and the one
> > I have been referring to in this discussion today, is the one you sent me.
> > Are you using a different one?
> 
> No. Here's the hex dump, so you can see exactly what I'm using (and it's 
> exactly what I sent you):
> 
> 49 6E 63 6F 6D 65 20 53 74 61 74 65 6D 65 6E 74  |Income Statement|
> 2C 32 30 31 31 2C 32 30 31 32 2C 32 30 31 33 2C  |,2011,2012,2013,|
> 32 30 31 34 2C 32 30 31 35 0A 53 61 6C 65 73 2C  |2014,2015.Sales,|
> 31 30 36 32 2C 31 32 35 32 2C 31 35 38 37 2C 31  |1062,1252,1587,1|
> 39 33 34 2C 32 35 31 39 0A 43 6F 73 74 20 6F 66  |934,2519.Cost of|
> 20 47 6F 6F 64 73 20 53 6F 6C 64 2C 36 35 34 2C  | Goods Sold,654,|
> 38 31 34 2C 31 30 30 39 2C 31 31 39 30 2C 31 34  |814,1009,1190,14|
> 39 39 0A 20 20 47 72 6F 73 73 20 50 72 6F 66 69  |99.  Gross Profi|
> 74 2C 34 30 38 2C 34 33 38 2C 35 37 38 2C 37 34  |t,408,438,578,74|
> 34 2C 31 30 32 30 0A 22 53 65 6C 6C 69 6E 67 2C  |4,1020."Selling,|
> 20 47 65 6E 65 72 61 6C 2C 20 61 6E 64 20 41 64  | General, and Ad|
> 6D 69 6E 20 45 78 70 22 2C 32 35 34 2C 32 37 31  |min Exp",254,271|
> 2C 33 36 34 2C 34 35 34 2C 35 37 36 0A 20 20 4F  |,364,454,576.  O|
> 70 65 72 61 74 69 6E 67 20 49 6E 63 6F 6D 65 20  |perating Income |
> 62 65 66 6F 72 65 20 44 65 70 72 2C 31 35 34 2C  |before Depr,154,|
> 31 36 37 2C 32 31 34 2C 32 39 30 2C 34 34 34 0A  |167,214,290,444.|
> 44 65 70 72 65 63 69 61 74 69 6F 6E 20 61 6E 64  |Depreciation and|
> 20 41 6D 6F 72 74 69 7A 61 74 69 6F 6E 2C 32 35  | Amortization,25|
> 2C 33 31 2C 33 38 2C 35 32 2C 37 30 0A 20 20 4F  |,31,38,52,70.  O|
> 70 65 72 61 74 69 6E 67 20 50 72 6F 66 69 74 2C  |perating Profit,|
> 31 32 39 2C 31 33 36 2C 31 37 36 2C 32 33 38 2C  |129,136,176,238,|
> 33 37 34 0A 49 6E 74 65 72 65 73 74 20 45 78 70  |374.Interest Exp|
> 65 6E 73 65 2C 34 2C 33 2C 33 2C 31 2C 34 0A 4F  |ense,4,3,3,1,4.O|
> 74 68 65 72 20 47 61 69 6E 73 20 61 6E 64 20 4C  |ther Gains and L|
> 6F 73 73 65 73 2C 30 2C 37 2C 31 30 2C 30 2C 2D  |osses,0,7,10,0,-|
> 31 0A 20 20 50 72 65 74 61 78 20 49 6E 63 6F 6D  |1.  Pretax Incom|
> 65 2C 31 32 35 2C 31 32 36 2C 31 36 33 2C 32 33  |e,125,126,163,23|
> 37 2C 33 37 31 0A 49 6E 63 6F 6D 65 20 54 61 78  |7,371.Income Tax|
> 20 45 78 70 65 6E 73 65 2C 35 35 2C 35 32 2C 36  | Expense,55,52,6|
> 35 2C 39 32 2C 31 34 31 0A 20 20 4E 65 74 20 49  |5,92,141.  Net I|
> 6E 63 6F 6D 65 2C 37 30 2C 37 34 2C 39 38 2C 31  |ncome,70,74,98,1|
> 34 35 2C 32 33 30 0A                             |45,230.|
> 
> 
> >
> >> Barry Schwarz has suggested that you may have a trailing space in your
> >> data, which wouldn't show up on Usenet, and he says (presumably after
> >> trying it out) that this will produce the symptom you are describing.
> > I thought so too. I have tried manually to delete rows and columns around
> > the data I need to preserve. I am out of tools,
> 
> No, you are *not* out of tools. What you have is a reluctance to use a 
> hex dump. Overcome that reluctance, and paste a hex dump of the file 
> that is causing you problems.
> 
Oh no, I am not reluctant, it is not true ) The word "reluctant" is not 
applicable to me )
0
Alla
12/16/2016 11:08:58 AM
Richard Heathfield <rjh@cpax.org.uk> writes:

> On 16/12/16 09:51, Alla _ wrote:
> <snip>
>> The only thing - my compiler didn't try to warn me, it screamed: "Error" )
>> parse_csv_2.c:29:37: error: suggest braces around initialization of subobject
>>       [-Werror,-Wmissing-braces]
>>         char array_names[50][50] = {0};
>
> Then you have three choices: remove -Werror from your list of compiler
> flags, or remove -Wmissing-braces from your list of compiler flags, or
> change the code.

Or, four, change your compiler.  That may mean just an upgrade.  I think
gcc have fixed this or at least that's how I remember the issue, and I
certainly haven't seen that message, for that construct, for quite a
while.
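
For the "change the code" choice, here is a minimal sketch, assuming the complaint is clang's -Wmissing-braces on the two-dimensional array: one brace level per dimension keeps the same all-zero initialization. The macro names and the helper names_all_zero are hypothetical, added only so the result can be checked.

```c
#include <stddef.h>

#define NAME_ROWS 50   /* hypothetical names for the two 50s in the post */
#define NAME_COLS 50

/* Hypothetical helper: returns 1 if the freshly initialized array holds
   only zeros, 0 otherwise. */
int names_all_zero(void)
{
    /* One brace level per array dimension; the rest is still zero-filled. */
    char array_names[NAME_ROWS][NAME_COLS] = {{0}};
    size_t i, j;

    for (i = 0; i < NAME_ROWS; i++) {
        for (j = 0; j < NAME_COLS; j++) {
            if (array_names[i][j] != 0) {
                return 0;
            }
        }
    }
    return 1;
}
```

The plain {0} form is valid C as well; the extra braces only spell out the nesting that the warning asks for.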

<snip>
-- 
Ben.
0
Ben
12/16/2016 11:09:22 AM
On 16/12/16 00:27, Richard Heathfield wrote:
> On 15/12/16 22:20, David Brown wrote:
>> On 15/12/16 16:37, Richard Heathfield wrote:
> 
> <snip>
> 
>>> Module, right? Well, we have to say what we mean by "module". We might
>>> reasonably define it as a source and its associated header, in which
>>> case I think your point is fine.
>>
>> Yes, that is what I was meaning by "module".  Of course you are correct
>> that I should have defined it - especially as it could mean other things.
>>
>>> But we might also define it as a set of
>>> related functions, which might be spread over several sources. If we
>>> define it that way (and, frankly, I do), then it becomes reasonable to
>>> think of a module as having two headers: one public, and one private.
>>> The private header is shared across all the sources that go to make up
>>> that module, and the public header is shared across other modules.
>>
>> Agreed.
>>
>> If you prefer to call this sort of multi-file "lump" a "module", what do
>> you call a header/C file "lump" ?
> 
> A lump. What else?
> 
> The correct term is "preprocessing translation unit". PTU, anyone? But I
> tend to eschew the P and stick to "translation unit" (which is actually
> the correct term for what remains after preprocessing, but I'm not too
> fussed about the difference except when arguing about the difference!).

No, a translation unit (or compilation unit) is something different.

If you have:

widget.h	// Declarations for widget functions
widget.c	// Implementation of widget functions
game.c		// Uses widget functions

Then widget.c + any files it #include's (hopefully including widget.h)
is a translation unit.  So is game.c + any files /it/ includes
(including widget.h).

I am wondering what you call the lump "widget.h + widget.c", if
anything?  Maybe it is just a "module" that happens to contain only two
files.
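
A rough sketch of that three-file layout, squeezed into one listing so it compiles as it stands. The function names widget_area, clamp_nonneg and game_board_area are hypothetical. In a real build each marked section would be its own file, and the static helper would then be invisible to game.c.

```c
/* ---- widget.h: declarations for widget functions ---- */
#ifndef WIDGET_H
#define WIDGET_H
int widget_area(int width, int height);
#endif

/* ---- widget.c: would begin with #include "widget.h" ---- */

/* static gives internal linkage: usable only inside this translation unit. */
static int clamp_nonneg(int v)
{
    return v < 0 ? 0 : v;
}

int widget_area(int width, int height)
{
    return clamp_nonneg(width) * clamp_nonneg(height);
}

/* ---- game.c: would also begin with #include "widget.h" ---- */

/* Uses only what widget.h declares. */
int game_board_area(void)
{
    return widget_area(8, 8);
}
```

Compiled separately, widget.c plus widget.h is one translation unit and game.c plus widget.h is another; the linker resolves the call to widget_area between them.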

> 
>> It is a real shame that C has no good way of dealing with this sort of
>> thing properly - no namespaces, and only two levels of identifier
>> visibility (static to a compilation unit, or global for the whole
>> program).
> 
> Yes. I'd like to see an hierarchy of visibility. But it isn't going to
> happen, and I can live with that.

If you couldn't live with it, you wouldn't be here - you'd be next door
in c.l.c++ :-)

> 
>>> So, for example, we might have an abstract data type that is defined in
>>> mylittlemoduleinternal.h and shared across mylittlemoduleA.c,
>>> mylittlemoduleB.c, and mylittlemoduleC.c. And then we have a bunch of
>>> prototypes listed in mylittlemodule.h, for sharing with other modules.
>>>
>>> Would you agree with that?
>>
>> Yes, that seems reasonable.  It is not exactly how I would organise it,
>> but close enough for now.
> 
> I think that's important, actually. What I mean is, okay, you and I
> would tend to choose to organise stuff in different ways, but those
> differences are likely to be fairly minor, and each of us would
> recognise that the other has made reasonable choices that just happen to
> be different to our own reasonable choice. This is what cultural
> diversity is all about!

Yes.  Since C does not have support for "modules", "units",
"namespaces", "lumps", etc., beyond the humble #include mechanism, we
have the freedom to implement whatever balance of predictable rules and
flexibility suits the job at hand.

> 
>>>> Anything that is used only within the
>>>> implementation C file, should be in that file alone - and not the
>>>> header.  So if you have a function "cisprng_calculate" that is not
>>>> exported, it should not be declared in the header file, and it
>>>> should be declared "static" in the C file.
>>>
>>> But if it is needed by other sources within the same module, that gives
>>> us a problem. One possible solution is to share a function pointer in
>>> the private header --- but it's messy.
>>
>> Yes.
> 
> Yeeesss. It's... it isn't just messy, is it? It's actually icky. There
> is no /good/ solution, as far as I know.
> 
> What I actually do in this situation (when I get to choose, anyway) is
> to declare the function in the private header, and make it extern (not
> explicitly, but by not making it static), and then just not telling the
> user-programmer it's there! That way, even though they can in theory
> call it, first they have to find out about it, and they can only do that
> by going to some serious and deliberate investigative effort.

That would work.

I use "extern" explicitly on function declarations in headers (except
for "static inline" functions, of course).  I know it is unnecessary,
but I feel that it should have been necessary - "extern by default" is a
design mistake in C, IMHO.  It also makes it more consistent with extern
variable declarations.  (In small embedded systems, it is often useful
to have "global variables".  The balance of tradeoffs is different than
on bigger systems.)
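
To illustrate the explicit "extern" style, here is a small sketch with hypothetical names (counter_value, counter_next, step), again shown as one listing rather than as a separate header and source file.

```c
/* ---- counter.h (hypothetical) ---- */
extern int counter_value;        /* variable declaration: extern is required here */
extern int counter_next(void);   /* function declaration: extern is optional,
                                    written out to match the variable above */

/* ---- counter.c (hypothetical) ---- */
int counter_value = 0;           /* the one definition of the variable */

/* Not exported: static keeps it out of every other translation unit. */
static int step(void)
{
    return 1;
}

/* Advances the counter and returns its new value. */
int counter_next(void)
{
    counter_value += step();
    return counter_value;
}
```

Leaving "extern" off the function declaration would change nothing for the compiler; the keyword is there for the reader.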

> 
>> In this particular case, the program is small enough that such
>> multi-file modules are not going to be necessary, so a simpler system is
>> possible.  But the simpler system does not scale to a more hierarchical
>> organisation without changes such as you suggest.
> 
> Namespaces would be pleasant, wouldn't they? But we're not going to get
> them, and that's that.

Indeed.

> 
>>>> Assuming you are using C99 or C11 rather than C90 (I know some people
>>>> still use C90),
>>>
>>> You hear that, folks? David thinks I'm *some people*! Fame at last!
>>
>> I am sure you have good reasons for using C90.  But I think it usually
>> makes sense for someone relatively new to C to use C99 features when
>> they make coding clearer (following Heathfield's rule, naturally)
> 
> My blushes, Watson!
> 
> My reasons for using C90 are as follows:
> 
> 1) it works everywhere;
> 2) C99 doesn't add anything I need that I can't get from C++. So, if I
> need those features, I'll just write that code in C++ instead, which
> means I get lots of toys to play with.

My reasons for using C99 are:

1) It works everywhere I need it to work.
2) It adds a lot of useful things to C without needing to go for C++.

I am seeing more interest in C++ in the sorts of systems I work with -
the combination of better compilers, useful features in C++11/C++14, and
more powerful microcontrollers at the small end mean that C++ is
feasible on projects where previously C was the only option.

But since I can use C99, I don't have quite the same pressure to move to
C++ as C90 programmers do :-)

> 
>> If the usage documentation can be written in the header (without passing
>> "the point"), then that is a good thing.
> 
> A brief summary, yes.
> 
>> External documentation can go
>> into more detail, examples, etc., but there is always the risk of
>> getting out of synchronisation between the code and the document
> 
> But that's true of the internal documentation as well --- it can and
> does get out of synch with the code. The solution to synchronisation
> problems is discipline, and that's required whether the documentation is
> internal or external.

Also true.  It is easier, however, to keep your internal documentation
in sync - you've got the file open and have made changes to the code in
it, so you can change the lines of text there too.  Your external
documentation might mean different tools, different procedures, etc. -
and it takes more discipline to get right.

The best is to write self-documenting code - then it is never out of
sync.  At least, don't write anything in comments that could be
expressed in code (such as by good choice of function and parameter names).

> 
>> or
>> simply not having the document at hand when you are working with the
>> code.  Documentation in the header file is /always/ quickly at hand when
>> you are working with the code!
> 
> As long as you know the code well enough to know where to look, or know
> what you're looking for well enough to execute a grep. External docs can
> cover several headers in one indexed document. They can offer hyperlinks
> to specific function descriptions, they can contain diagrams (and I
> don't mean ASCII art), and they can use all the tricks of the word
> processing trade to make the document more readable than can generally
> be managed in eighty columns of Courier 12.
> 

Some of us have moved to using more capable IDEs for their development
work...

Of course you are correct that external documentation can have more
features and flexibility than internal documentation in comments.

It is also possible to make combinations of the two.  On one customer
project I made extensive use of doxygen.  So each item in the headers
had doxygen comments, and there were comment blocks giving a bit more
general information about the modules (or "lumps").  For larger sections
of documentation, including diagrams, I used separate text files that
were contained within the source code tree and processed by doxygen, but
without any code (except examples and the like).

0
David
12/16/2016 11:10:22 AM
On Friday, December 16, 2016 at 1:04:45 PM UTC+3, Richard Heathfield wrote:
> On 16/12/16 09:44, Alla _ wrote:
> > On Thursday, December 15, 2016 at 11:36:18 PM UTC+3, Barry Schwarz wrote:
> <snip>
> 
> >> There is something you are not showing us:
> >>     Possibly trailing data in the file.  That is why several have
> >> asked for a hex dump.
> >>     Possibly different code than what is in your message.
> >>
> > What has led you yet again to suspect me of some hidden motives?
> 
> I can't see anything in Barry's article that justifies such an 
> accusation. What makes you think he suspects you of anything? He's 
> describing facts, not ascribing motives.
Language barrier? Have I misinterpreted the wording? If so, Barry, I am 
sorry, truly; though I hope I didn't sound impolite; if I ever do, please, 
do me a favor and signal that this or that phrase sounds impolite; I will
be grateful for that. Written dialogues often cause misunderstandings
and misinterpretations, so it is good to clear up if anything. 
> 
> I would imagine that Barry, like me, tends to apply Hanlon's Razor to 
> every Usenet article he reads.
> 
> If Barry is correct, the thing you are not showing us is trailing data 
> in the file. If that trailing data is a space character, it is very 
> likely that you didn't know the data was there. That, as Barry rightly 
> says, is where a hex dump can be invaluable, because nothing can hide 
> from a hex dump.
> 
> -- 
> Richard Heathfield
> Email: rjh at cpax dot org dot uk
> "Usenet is a strange place" - dmr 29 July 1999
> Sig line 4 vacant - apply within
0
Alla
12/16/2016 11:14:12 AM
On Friday, December 16, 2016 at 2:09:30 PM UTC+3, Ben Bacarisse wrote:
> Richard Heathfield <rjh@cpax.org.uk> writes:
> 
> > On 16/12/16 09:51, Alla _ wrote:
> > <snip>
> >> The only thing - my compiler didn't try to warn me, it screamed: "Error" )
> >> parse_csv_2.c:29:37: error: suggest braces around initialization of subobject
> >>       [-Werror,-Wmissing-braces]
> >>         char array_names[50][50] = {0};
> >
> > Then you have three choices: remove -Werror from your list of compiler
> > flags, or remove -Wmissing-braces from your list of compiler flags, or
> > change the code.
> 
> Or, four, change your compiler.  That may mean just an upgrade.  I think
> gcc have fixed this or at least that's how I remember the issue, and I
> certainly haven't seen that message, for that construct, for quite a
> while.
> 
Apple has "quietly" started using clang; I have only recently found out;
it seems I received that switch after installing some recent upgrades 
Apple has sent.  
0
Alla
12/16/2016 11:16:30 AM
On Friday, December 16, 2016 at 2:16:50 PM UTC+3, Alla _ wrote:
> On Friday, December 16, 2016 at 2:09:30 PM UTC+3, Ben Bacarisse wrote:
> > Richard Heathfield <rjh@cpax.org.uk> writes:
> > 
> > > On 16/12/16 09:51, Alla _ wrote:
> > > <snip>
> > >> The only thing - my compiler didn't try to warn me, it screamed: "Error" )
> > >> parse_csv_2.c:29:37: error: suggest braces around initialization of subobject
> > >>       [-Werror,-Wmissing-braces]
> > >>         char array_names[50][50] = {0};
> > >
> > > Then you have three choices: remove -Werror from your list of compiler
> > > flags, or remove -Wmissing-braces from your list of compiler flags, or
> > > change the code.
> > 
> > Or, four, change your compiler.  That may mean just an upgrade.  I think
> > gcc have fixed this or at least that's how I remember the issue, and I
> > certainly haven't seen that message, for that construct, for quite a
> > while.
> > 
Apple has "quietly" started using clang; I have only recently found out;
it seems I received that switch after installing some recent upgrades 
Apple has sent.
But I still use gcc -flags to compile programs. 

0
Alla
12/16/2016 11:20:33 AM
On 16/12/16 11:06, Alla _ wrote:
> On Friday, December 16, 2016 at 1:11:58 PM UTC+3, Richard Heathfield wrote:
>> On 16/12/16 09:25, Alla _ wrote:
>>> On Thursday, December 15, 2016 at 10:31:28 PM UTC+3, Richard Heathfield wrote:
>>>> On 15/12/16 17:25, Alla _ wrote:
> <snip>
>> Hex dump, please.
>>
> Below I post:
> (1) The corrected version of the program based on comments I have received;

I will make only one comment on the program, because I think first we 
need to sort out your data.

> (2) csv inputs
> (3) program's outputs
> (4) hex dumps for each csv
> Interesting: now that I have changed the error checking method, i.e. if (!ferror(fp)),
> instead of checking for !feof, I get the error message at the end of the whole output;
> this is telling me that there is something wrong with every csv file.

Yeah, your logic says:

IF there is NOT an error
   Report an error

Remove the !
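
A minimal sketch of the corrected check, assuming the file is read with fgets in a loop; the function name count_lines_checked and the buffer size are hypothetical. The error message is the one from the posted output.

```c
#include <stdio.h>

/* Reads fp to the end and returns the number of fgets chunks read
   (one per line, as long as no line is longer than the buffer),
   or -1 if the stream reported a read error. */
long count_lines_checked(FILE *fp)
{
    char buffer[256];
    long lines = 0;

    while (fgets(buffer, sizeof buffer, fp) != NULL) {
        lines++;
    }

    /* fgets returns NULL both at end-of-file and on error, so ask which. */
    if (ferror(fp)) {   /* no '!': report only when an error really happened */
        fprintf(stderr, "Something went wrong with the provided file\n");
        return -1;
    }
    return lines;       /* the loop stopped at end-of-file, the normal case */
}
```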


>
> (2) File you have sent me (without changing Income
> statement to Years in the first row):
>
> ******csv file******
> Income Statement,2011,2012,2013,2014,2015

This is the line we care about, because:

> ******code output with one trailing zero******
> (3) 2011 2012 2013 2014 2015 0

...here is our first trailing zero. So let's look at the data up to that 
point, using a hex dump.

> ******hex******
> 49 6E 63 6F 6D 65 20 53 74 61 74 65 6D 65 6E 74
> 2C 32 30 31 31 2C 32 30 31 32 2C 32 30 31 33 2C
> 32 30 31 34 2C 32 30 31 35 0D 0A

There's your problem right there. 0D 0A instead of just 0A.

On Windows, 0D 0A is a correct line-end indicator. On Linux and on 
modern Macs, 0A is the right indicator. So your copy of Excel is 
preparing its CSV file for use on Windows, when it *should* be preparing 
it for use on the host system.

Here is a simple filter for you to use:

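#include <stdio.h>

/* Copy standard input to standard output, dropping every carriage return. */
int main(void)
{
   int ch;   /* int, not char, so that EOF can be told apart from real data */
   while((ch = getchar()) != EOF)
   {
     if(ch != '\r')
     {
       putchar(ch);
     }
   }
   return 0;
}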

Usage:

./dos2unix < input.csv > output.csv

Removes all carriage returns from the file.

You might already have a dos2unix program on your system, in which case 
by all means use that.
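
If converting the file first is inconvenient, the parsing program can instead drop the line ending itself, right after each fgets call. This is only a sketch of that alternative; the function name strip_line_ending is hypothetical.

```c
#include <string.h>

/* Removes a trailing "\n" or "\r\n" (or a stray "\r") from line, in place.
   Returns the length of what is left. */
size_t strip_line_ending(char *line)
{
    size_t len = strlen(line);

    while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r')) {
        line[--len] = '\0';
    }
    return len;
}
```

Called on the buffer before the fields are split, this leaves "2015" rather than "2015\r" as the last field of the first row.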

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
0
Richard
12/16/2016 11:28:55 AM
On 16/12/16 11:09, Ben Bacarisse wrote:
> Richard Heathfield <rjh@cpax.org.uk> writes:
>
>> On 16/12/16 09:51, Alla _ wrote:
>> <snip>
>>> The only thing - my compiler didn't try to warn me, it screamed: "Error" )
>>> parse_csv_2.c:29:37: error: suggest braces around initialization of subobject
>>>       [-Werror,-Wmissing-braces]
>>>         char array_names[50][50] = {0};
>>
>> Then you have three choices: remove -Werror from your list of compiler
>> flags, or remove -Wmissing-braces from your list of compiler flags, or
>> change the code.
>
> Or, four, change your compiler.

Oh yes! I didn't even consider such a drastic option, although of course...

> That may mean just an upgrade.

...as you point out, it need not be /that/ drastic.

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
0
Richard
12/16/2016 11:33:02 AM
On 16/12/16 11:10, David Brown wrote:
> On 16/12/16 00:27, Richard Heathfield wrote:
>> On 15/12/16 22:20, David Brown wrote:
>>> On 15/12/16 16:37, Richard Heathfield wrote:
>>
>> <snip>
>>
>>>> Module, right? Well, we have to say what we mean by "module". We might
>>>> reasonably define it as a source and its associated header, in which
>>>> case I think your point is fine.
>>>
>>> Yes, that is what I was meaning by "module".  Of course you are correct
>>> that I should have defined it - especially as it could mean other things.
>>>
>>>> But we might also define it as a set of
>>>> related functions, which might be spread over several sources. If we
>>>> define it that way (and, frankly, I do), then it becomes reasonable to
>>>> think of a module as having two headers: one public, and one private.
>>>> The private header is shared across all the sources that go to make up
>>>> that module, and the public header is shared across other modules.
>>>
>>> Agreed.
>>>
>>> If you prefer to call this sort of multi-file "lump" a "module", what do
>>> you call a header/C file "lump" ?
>>
>> A lump. What else?
>>
>> The correct term is "preprocessing translation unit". PTU, anyone? But I
>> tend to eschew the P and stick to "translation unit" (which is actually
>> the correct term for what remains after preprocessing, but I'm not too
>> fussed about the difference except when arguing about the difference!).
>
> No, a translation unit (or compilation unit) is something different.

Er, no it isn't.

> If you have:
>
> widget.h	// Declarations for widget functions
> widget.c	// Implementation of widget functions
> game.c		// Uses widget functions
>
> Then widget.c + any files it #include's (hopefully including widget.h)
> is a translation unit.

Yep. Dat what I said.

> So is game.c + any files /it/ includes
> (including widget.h).

Dat what I said, too.

> I am wondering what you call the lump "widget.h + widget.c", if
> anything?

Oh, you mean sans otherwidget.h and stdio.h and so on? Uh, I don't have 
a name for that. I don't /need/ a name for that.

> Maybe it is just a "module" that happens to contain only two
> files.

Oh, sure. Sorry, was that not clear? It's amazing how tangled these 
discussions can become when we don't spell everything out. Yes, I am 
perfectly content to have a module consisting of just one C file and its 
header. But I am /also/ perfectly content for the module to have more 
than one C file.

I'm not terribly interested in linkers so I don't know whether this has 
changed, but they didn't use to be terribly bright, and they couldn't 
pull in just one function from an object file even if only one were 
used; they had to pull in the whole file. So, when space was so terribly 
important (and no doubt it still is for many people), C programmers 
would sometimes put each function in its own source file, even if there 
was a clear relationship existing amongst a bunch of those functions. In 
those circumstances, to call each one a module when, actually, it was 
the whole bunch of them that was really the module, would have seemed a 
bit silly.

<snip>

> The best is to write self-documenting code - then it is never out of
> sync.  At least, don't write anything in comments that could be
> expressed in code (such as by good choice of function and parameter names).

Yes.

>>> or
>>> simply not having the document at hand when you are working with the
>>> code.  Documentation in the header file is /always/ quickly at hand when
>>> you are working with the code!
>>
>> As long as you know the code well enough to know where to look, or know
>> what you're looking for well enough to execute a grep. External docs can
>> cover several headers in one indexed document. They can offer hyperlinks
>> to specific function descriptions, they can contain diagrams (and I
>> don't mean ASCII art), and they can use all the tricks of the word
>> processing trade to make the document more readable than can generally
>> be managed in eighty columns of Courier 12.
>>
>
> Some of us have moved to using more capable IDEs for their development
> work...

You have an IDE that's more capable than Linux? I'm impressed!

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
0
Richard
12/16/2016 11:43:00 AM
On 16/12/16 11:14, Alla _ wrote:
> On Friday, December 16, 2016 at 1:04:45 PM UTC+3, Richard Heathfield wrote:
>> On 16/12/16 09:44, Alla _ wrote:
>>> On Thursday, December 15, 2016 at 11:36:18 PM UTC+3, Barry Schwarz wrote:
>> <snip>
>>
>>>> There is something you are not showing us:
>>>>     Possibly trailing data in the file.  That is why several have
>>>> asked for a hex dump.
>>>>     Possibly different code than what is in your message.
>>>>
>>> What has led you yet again to suspect me of some hidden motives?
>>
>> I can't see anything in Barry's article that justifies such an
>> accusation. What makes you think he suspects you of anything? He's
>> describing facts, not ascribing motives.
> Language barrier? Have I misinterpreted the wording? If so, Barry, I am
> sorry, truly; though I hope I didn't sound impolite;

Don't worry about Barry. He'll survive. :-)

On Usenet, you need a thick skin. Even when people aren't trying to be 
unpleasant, they can sometimes appear that way because they were 
concentrating more on what they wrote than on how they wrote it.

Be very slow to jump to conclusions about people, but also remember that 
someone who has been unfailingly polite in the past is probably still 
being polite, even if it might at first seem otherwise.

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
0
Richard
12/16/2016 11:46:25 AM
Richard Heathfield <rjh@cpax.org.uk> writes:
<snip>

> ...here is our first trailing zero. So let's look at the data up to
> that point, using a hex dump.
>
>> ******hex******
>> 49 6E 63 6F 6D 65 20 53 74 61 74 65 6D 65 6E 74
>> 2C 32 30 31 31 2C 32 30 31 32 2C 32 30 31 33 2C
>> 32 30 31 34 2C 32 30 31 35 0D 0A
>
> There's your problem right there. 0D 0A instead of just 0A.
>
> On Windows, 0D 0A is a correct line-end indicator. On Linux and on
> modern Macs, 0A is the right indicator. So your copy of Excel is
> preparing its CSV file for use on Windows, when it *should* be
> preparing it for use on the host system.

Well, should is a matter of opinion.  Here is at least one opinion about
what a CSV file is:

   1.  Each record is located on a separate line, delimited by a line
       break (CRLF).  For example:

       aaa,bbb,ccc CRLF
       zzz,yyy,xxx CRLF

This is from RFC 4180 (don't worry, it does explain that fields can span
lines later on).  In a court of law you could well argue that MS Excel
is doing the right thing here.  Whilst you may well say it should not do
this, it is certain that if it didn't, someone else will say that they
definitely should.

CSV files have no universally agreed separator (despite the name), no
universally agreed character encoding, no universally agreed line ending
and no universally agreed quoting scheme.  The only universally agreed
thing about them is that they are a pain in the neck.

The sound advice about file formats: "be generous in what your code
accepts and strict in what it generates" is almost impossible to follow
here, but it would certainly require the file to be processed as a
binary stream.

Normally, for a beginner, all this unfortunate mess can be glossed over,
but if the files are being generated with CRLF endings it would appear
that it can't be avoided here.
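
As a rough sketch of the "generous in what it accepts" half, assuming the file is opened in binary mode ("rb"): a reader that treats LF, CRLF and a lone CR all as the end of a record. The name read_record is hypothetical, and quoted fields that span lines are not handled.

```c
#include <stdio.h>

/* Reads one record from fp into buf (capacity cap, including the '\0').
   "\n", "\r\n" and a lone "\r" all end the record; the ending is not stored.
   Characters that do not fit in buf are dropped.
   Returns the number of characters stored, or -1 if nothing was left to read. */
long read_record(FILE *fp, char *buf, size_t cap)
{
    size_t n = 0;
    int ch;
    int got_any = 0;

    if (cap == 0) {
        return -1;
    }

    while ((ch = getc(fp)) != EOF) {
        got_any = 1;
        if (ch == '\n') {
            break;
        }
        if (ch == '\r') {
            /* Swallow the LF of a CRLF pair; put anything else back. */
            int next = getc(fp);
            if (next != '\n' && next != EOF) {
                ungetc(next, fp);
            }
            break;
        }
        if (n + 1 < cap) {
            buf[n++] = (char)ch;
        }
    }
    buf[n] = '\0';
    return got_any ? (long)n : -1;
}
```

An empty line comes back as 0, so a caller can tell it apart from the end of the file.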

<snip>
-- 
Ben.
0
Ben
12/16/2016 12:02:45 PM
On Friday, December 16, 2016 at 2:29:02 PM UTC+3, Richard Heathfield wrote:
> On 16/12/16 11:06, Alla _ wrote:
> > On Friday, December 16, 2016 at 1:11:58 PM UTC+3, Richard Heathfield wrote:
> >> On 16/12/16 09:25, Alla _ wrote:
> >>> On Thursday, December 15, 2016 at 10:31:28 PM UTC+3, Richard Heathfield wrote:
> >>>> On 15/12/16 17:25, Alla _ wrote:
> > <snip>
> >> Hex dump, please.
> >>
> > Below I post:
> > (1) The corrected version of the program based on comments I have received;
> 
> I will make only one comment on the program, because I think first we 
> need to sort out your data.
> 
> > (2) csv inputs
> > (3) program's outputs
> > (4) hex dumps for each csv
> > Interesting: now that I have changed the error checking method, i.e. if (!ferror(fp)),
> > instead of checking for !feof, I get the error message at the end of the whole output;
> > this is telling me that there is something wrong with every csv file.
> 
> Yeah, your logic says:
> 
> IF there is NOT an error
>    Report an error
oops! ))) I have no idea how that ! has appeared there )
> 
<snip>
0
Alla
12/16/2016 12:18:18 PM
On 16/12/16 12:02, Ben Bacarisse wrote:
> Richard Heathfield <rjh@cpax.org.uk> writes:
<snip>
>> On Windows, 0D 0A is a correct line-end indicator. On Linux and on
>> modern Macs, 0A is the right indicator. So your copy of Excel is
>> preparing its CSV file for use on Windows, when it *should* be
>> preparing it for use on the host system.
>
> Well, should is a matter of opinion.  Here is at least one opinion about
> what a CSV file is:
>
>    1.  Each record is located on a separate line, delimited by a line
>        break (CRLF).  For example:
>
>        aaa,bbb,ccc CRLF
>        zzz,yyy,xxx CRLF
>
> This is from RFC 4180

I take your point, but let's look at the context:

    While there are various specifications and implementations for the
    CSV format (for ex. [4], [5], [6] and [7]), there is no formal
    specification in existence, which allows for a wide variety of
    interpretations of CSV files.  This section documents the format that
    seems to be followed by most implementations:

    1.  Each record is located on a separate line, delimited by a line
        break (CRLF).

So it is reporting what the author /believes/ to be the practice 
followed by /most/ implementations. And I doubt very much whether Y 
Shafranovich stopped to consider whether that was the practice followed 
on *nix-like systems. Furthermore, the disclaimer at the top of the 
document makes it clear that this is *not* even an Internet standard, 
let alone a general computing standard.

CRLF is often specified in RFCs, and it can surely only be a sop to 
Microsoft. I can think of no other reason for it, anyway. If MS would 
only do the decent thing and drop this ridiculous convention, a whole 
universe of pain could be avoided.

> (don't worry, it does explain that fields can span
> lines later on).  In a court of law you could well argue that MS Excel
> is doing the right thing here.  Whilst you may well say it should not do
> this, it is certain that if it didn't, someone else will say that they
> definitely should.
>
> CSV files have no universally agreed separator (despite the name), no
> universally agreed character encoding, no universally agreed line ending
> and no universally agreed quoting scheme.  The only universally agreed
> thing about them is that they are a pain in the neck.

Well, that's certainly true. No doubt we all have our own preferences. 
Personally, if I have the choice, I choose tab separators, \n line 
endings, and straight ASCII encoding, high bit always clear. (Obviously 
that isn't going to suit everybody.)

> The sound advice about file formats: "be generous in what your code
> accepts and strict in what it generates" is almost impossible to follow
> here, but it would certainly require the file to be processed as a
> binary stream.
>
> Normally, for a beginner, all this unfortunate mess can be glossed over,
> but if the files are being generated with CRLF endings it would appear
> that it can't be avoided here.

Easy to fix, though - dos2unix will do it.

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
0
Richard
12/16/2016 12:41:11 PM
On 16/12/16 12:43, Richard Heathfield wrote:
> On 16/12/16 11:10, David Brown wrote:
>> On 16/12/16 00:27, Richard Heathfield wrote:
>>> On 15/12/16 22:20, David Brown wrote:
>>>> On 15/12/16 16:37, Richard Heathfield wrote:
>>>
<snip>
> 
> Oh, sure. Sorry, was that not clear? It's amazing how tangled these
> discussions can become when we don't spell everything out. Yes, I am
> perfectly content to have a module consisting of just one C file and its
> header. But I am /also/ perfectly content for the module to have more
> than one C file.

It is untangled now, I think.

> 
> I'm not terribly interested in linkers so I don't know whether this has
> changed, but they didn't used to be terribly bright, and they couldn't
> pull in just one function from an object file even if only one were
> used; they had to pull in the whole file. So, when space was so terribly
> important (and no doubt it still is for many people), C programmers
> would sometimes put each function in its own source file, even if there
> was a clear relationship existing amongst a bunch of those functions. In
> those circumstances, to call each one a module when, actually, it was
> the whole bunch of them that was really the module, would have seemed a
> bit silly.

Linkers are a bit brighter now, especially along with smarter compilers.
 A common usage (in embedded systems, where space often is tight) is to
use "function sections" in the compiler so that each function is put
into its own little code section.  On the linker, you enable "garbage
collection" (a poor choice of name for the feature, I think) which
collects all the code sections which are reachable from critical code
such as main(), the reset handler, interrupt functions, etc.  Sections
are joined up as graphs, and anything left on its own island
disconnected from the main()land get thrown out.
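
As a rough sketch of what that looks like in practice (assuming GCC and GNU ld; the file names and helper names here are made up for illustration):

```c
#include <assert.h>

/* Reachable from the rest of the program, so the linker keeps its section. */
int used_helper(int x)
{
    assert(x >= 0);
    return x * 2;
}

/* Never called anywhere: with the flags below, its section can be discarded. */
int unused_helper(int x)
{
    return x * 3;
}

/*
 * Assumed build commands (GCC with GNU ld; helpers.c and main.o are
 * hypothetical file names):
 *
 *   gcc -ffunction-sections -fdata-sections -c helpers.c
 *   gcc -Wl,--gc-sections main.o helpers.o -o prog
 *
 * Without -ffunction-sections both functions share one .text section,
 * so the linker has to keep or drop them together.
 */
```

If main() only ever calls used_helper(), the section holding unused_helper() is one of the "islands" that gets thrown out.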

Even smarter linkers work with the compiler for link-time optimisation,
but that's another story.

For my own coding, I can usually put all the related files into a single
C file - so two-file modules are the norm for me.  But then, I don't
often have functions defined that are not going to be part of the final
program - and if I do, "function sections" keep the output tidy.

> 
> <snip>
> 
>> The best is to write self-documenting code - then it is never out of
>> sync.  At least, don't write anything in comments that could be
>> expressed in code (such as by good choice of function and parameter
>> names).
> 
> Yes.
> 
>>>> or
>>>> simply not having the document at hand when you are working with the
>>>> code.  Documentation in the header file is /always/ quickly at hand
>>>> when
>>>> you are working with the code!
>>>
>>> As long as you know the code well enough to know where to look, or know
>>> what you're looking for well enough to execute a grep. External docs can
>>> cover several headers in one indexed document. They can offer hyperlinks
>>> to specific function descriptions, they can contain diagrams (and I
>>> don't mean ASCII art), and they can use all the tricks of the word
>>> processing trade to make the document more readable than can generally
>>> be managed in eighty columns of Courier 12.
>>>
>>
>> Some of us have moved to using more capable IDEs for their development
>> work...
> 
> You have an IDE that's more capable than Linux? I'm impressed!
> 

I have an IDE that can display more than eighty columns, open several
files at once, and even choose different fonts (it won't let me pick
Courier 12, however).

Of course, those big files and long lines are a pain when you need to
fix things via a mobile telephone, JuiceSSH, and nano editor - but
fortunately I haven't had to do that very often.

0
David
12/16/2016 1:01:58 PM
On 16/12/2016 12:02, Ben Bacarisse wrote:

> Well, should is a matter of opinion.  Here is at least one opinion about
> what a CSV file is:
>
>    1.  Each record is located on a separate line, delimited by a line
>        break (CRLF).  For example:
>
>        aaa,bbb,ccc CRLF
>        zzz,yyy,xxx CRLF
>
> This is from RFC 4180 (don't worry, it does explain that fields can span
> lines later on).

Big mistake. Apparently leading and trailing spaces are also significant.

This means that a big chunk of a file spanning multiple lines containing 
English prose, or function of a programming language, counts as just one 
CSV field so long as there aren't any commas in there.

(And apparently a quoted field can still contain unadorned newline 
characters. So even fgets can only retrieve part of a field. Such 
characters should be escaped. Newline should be a higher level separator 
than comma.)

> CSV files have no universally agreed separator (despite the name), no
> universally agreed character encoding, no universally agreed line ending
> and no universally agreed quoting scheme.  The only universally agreed
> thing about them is that they are a pain in the neck.

If your program generates CSV then you can do it properly so that you 
can read your own files and make it more likely for them to be 
acceptable to other programs. But you might still have the headache of 
dealing with embedded special characters within a field.

> The sound advice about file formats: "be generous in what your code
> accepts and strict in what it generates"

Oh, that's what I just repeated...

-- 
Bartc
0
BartC
12/16/2016 1:03:42 PM
On 16/12/16 13:01, David Brown wrote:
> On 16/12/16 12:43, Richard Heathfield wrote:
>> On 16/12/16 11:10, David Brown wrote:
>>> On 16/12/16 00:27, Richard Heathfield wrote:

<snip>

>>>> As long as you know the code well enough to know where to look, or know
>>>> what you're looking for well enough to execute a grep. External docs can
>>>> cover several headers in one indexed document. They can offer hyperlinks
>>>> to specific function descriptions, they can contain diagrams (and I
>>>> don't mean ASCII art), and they can use all the tricks of the word
>>>> processing trade to make the document more readable than can generally
>>>> be managed in eighty columns of Courier 12.
>>>>
>>>
>>> Some of us have moved to using more capable IDEs for their development
>>> work...
>>
>> You have an IDE that's more capable than Linux? I'm impressed!
>>
>
> I have an IDE that can display more than eighty columns, open several
> files at once, and even choose different fonts (it won't let me pick
> Courier 12, however).

Well, of course I do too, but I /choose/ to keep my code display quite 
traditional because I find it easier to work that way. Does your text 
editor allow you to put hyperlinks in the code without changing the code 
itself? If so, I'm impressed. And can your text editor also display 
pixel-perfect (i.e. not just ASCII art) diagrams? If so, I'm even more 
impressed.

> Of course, those big files and long lines are a pain when you need to
> fix things via a mobile telephone, JuiceSSH, and nano editor - but
> fortunately I haven't had to do that very often.

That's what cars (and trains and boats and planes) are for.

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
0
Richard
12/16/2016 1:12:45 PM
On 16/12/2016 12:41, Richard Heathfield wrote:

> CRLF is often specified in RFCs, and it can surely only be a sop to
> Microsoft. I can think of no other reason for it, anyway.

I think CRLF was around long before Microsoft. Something to do with 
driving a teletype I understand.

> If MS would
> only do the decent thing and drop this ridiculous convention, a whole
> universe of pain could be avoided.

If the main line-endings are either CRLF or LF (and you disallow 
individual LF or CR within text that were once used for cursor control) 
then it isn't much problem to deal with either convention.

My programs do that and since then I've never had a problem with my 
editor for example dealing with either kind of file (unlike Notepad).
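
For what it's worth, a minimal sketch of that in C, assuming the line has already been read with fgets (strip_eol is just a name I have picked for the example):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Remove one trailing line ending (LF, CRLF, or a lone CR) from s, in place.
   Returns the new length of the string. */
size_t strip_eol(char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);

    if (len > 0 && s[len - 1] == '\n')
        s[--len] = '\0';          /* drop the LF, if any */
    if (len > 0 && s[len - 1] == '\r')
        s[--len] = '\0';          /* drop the CR left over from a CRLF file */
    return len;
}
```

Called on each line straight after fgets, it leaves the same text whether the file came from Windows or from Linux.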

I find dealing properly with tabs is a bigger problem.

> Easy to fix, though - dos2unix will do it.

That's funny, I used to have a program called ADDCR to do the opposite. 
Which I no longer need.


-- 
Bartc
0
BartC
12/16/2016 1:40:12 PM
On 16/12/16 14:12, Richard Heathfield wrote:
> On 16/12/16 13:01, David Brown wrote:
>> On 16/12/16 12:43, Richard Heathfield wrote:
>>> On 16/12/16 11:10, David Brown wrote:
>>>> On 16/12/16 00:27, Richard Heathfield wrote:
> 
> <snip>
> 
>>>>> As long as you know the code well enough to know where to look, or
>>>>> know
>>>>> what you're looking for well enough to execute a grep. External
>>>>> docs can
>>>>> cover several headers in one indexed document. They can offer
>>>>> hyperlinks
>>>>> to specific function descriptions, they can contain diagrams (and I
>>>>> don't mean ASCII art), and they can use all the tricks of the word
>>>>> processing trade to make the document more readable than can generally
>>>>> be managed in eighty columns of Courier 12.
>>>>>
>>>>
>>>> Some of us have moved to using more capable IDEs for their
>>>> development
>>>> work...
>>>
>>> You have an IDE that's more capable than Linux? I'm impressed!
>>>
>>
>> I have an IDE that can display more than eighty columns, open several
>> files at once, and even choose different fonts (it won't let me pick
>> Courier 12, however).
> 
> Well, of course I do too, but I /choose/ to keep my code display quite
> traditional because I find it easier to work that way. Does your text
> editor allow you to put hyperlinks in the code without changing the code
> itself? If so, I'm impressed. And can your text editor also display
> pixel-perfect (i.e. not just ASCII art) diagrams? If so, I'm even more
> impressed.
> 

I think we have had one of those tangles again...  I meant that my IDE
(Eclipse) can make files more readable than you get with 80 columns
Courier 12, not that it can do everything you would want from a full
documentation system.  (To be a smart-ass - yes, I can put hyperlinks in
the code without changing the code itself.  I just type them in
comments.  But they are not clickable links, which I suspect is what you
meant.)

I am sure there are editors that /can/ show pictures and hyperlinks -
especially those geared more towards web development than C coding.  And
Eclipse can show all sorts of graphics stuff too - I only use a tiny
fraction of its features.

>> Of course, those big files and long lines are a pain when you need to
>> fix things via a mobile telephone, JuiceSSH, and nano editor - but
>> fortunately I haven't had to do that very often.
> 
> That's what cars (and trains and boats and planes) are for.
> 

0
David
12/16/2016 1:57:46 PM
On 16/12/16 13:40, BartC wrote:
> On 16/12/2016 12:41, Richard Heathfield wrote:
>
>> CRLF is often specified in RFCs, and it can surely only be a sop to
>> Microsoft. I can think of no other reason for it, anyway.
>
> I think CRLF was around long before Microsoft. Something to do with
> driving a teletype I understand.

Yes, that's right. I didn't say they invented it. I said they should 
stop using it. Either that, or they should invest in a buggy whip factory.

>> If MS would
>> only do the decent thing and drop this ridiculous convention, a whole
>> universe of pain could be avoided.
>
> If the main line-endings are either CRLF or LF (and you disallow
> individual LF or CR within text that were once used for cursor control)
> then it isn't much problem to deal with either convention.

Of course not, if you know about it and know how to deal with it. It's 
still a nuisance, though.

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
0
Richard
12/16/2016 2:03:40 PM
BartC <bc@freeuk.com> writes:

> On 16/12/2016 12:02, Ben Bacarisse wrote:
>
>> Well, should is a matter of opinion.  Here is at least one opinion about
>> what a CSV file is:
>>
>>    1.  Each record is located on a separate line, delimited by a line
>>        break (CRLF).  For example:
>>
>>        aaa,bbb,ccc CRLF
>>        zzz,yyy,xxx CRLF
>>
>> This is from RFC 4180 (don't worry, it does explain that fields can span
>> lines later on).
>
> Big mistake. Apparently leading and trailing spaces are also
> significant.

I'm not sure what the mistake is.  Can you explain?

> This means that a big chunk of a file spanning multiple lines
> containing English prose, or function of a programming language,
> counts as just one CSV field so long as there aren't any commas in
> there.

I know this is meant to explain the "big mistake" but I still don't see
it.

I don't think commas are really an issue since in every CSV format I've
seen fields with multiple lines have to be quoted.  But since there can
be two kinds of "line" in a CSV file, it's possible that commas might
come into it, but I'm not sure that's what you mean.

For example, in a file with CRLF line endings, I can imagine a really
nasty CSV file with multi-line text whose line separator is LF:

  f1,f2,thisLFisLFfourLFlinesCRLF

(I don't know if any software either writes this or reads it, but I can
imagine it being possible in some badly specified formats.)

> (And apparently a quoted field can still contain unadorned newline
> characters. So even fgets can only retrieve part of a field. Such
> characters should be escaped. Newline should be a higher level
> separator than comma.)

Yes.  You should not use fgets to read binary files.  You /can/, of
course, but I can't see the point other than give yourself a lot of
pain.

>> CSV files have no universally agreed separator (despite the name), no
>> universally agreed character encoding, no universally agreed line ending
>> and no universally agreed quoting scheme.  The only universally agreed
>> thing about them is that they are a pain in the neck.
>
> If your program generates CSV then you can do it properly so that you
> can read your own files and make it more likely for them to be
> acceptable to other programs. But you might still have the headache of
> dealing with embedded special characters within a field.
>
>> The sound advice about file formats: "be generous in what your code
>> accepts and strict in what it generates"
>
> Oh, that's what I just repeated...

Yes, though I went on to say that it's almost impossible to do.  Your
"you can do it properly" makes it sound easy.

-- 
Ben.
0
Ben
12/16/2016 3:04:13 PM
On 16/12/2016 15:04, Ben Bacarisse wrote:

>> Big mistake. Apparently leading and trailing spaces are also
>> significant.
>
> I'm not sure what the mistake is.  Can you explain?
>
>> This means that a big chunk of a file spanning multiple lines
>> containing English prose, or function of a programming language,
>> counts as just one CSV field so long as there aren't any commas in
>> there.
>
> I know this is meant to explain the "big mistake" but I still don't see
> it.

The mistake is having 'comma' be more important than 'newline'. Which 
can also make this ambiguous:

   abc,def
   ghi,jkl

Under strict CSV interpretation, that could be one record with three fields:

   abc,def\nghi,jkl

If you know the number of fields, that might help disambiguate. But CSV 
files usually don't state that.

> I don't think commas are really an issue since in every CSV format I've
> seen fields with multiple lines have to be quoted.

That doesn't really help. If the above example really was 3 fields, then 
quoting it would produce this:

   abc,"def
   ghi",jkl

There is still a 'hard' newline character in the middle of the field. 
Generic line-oriented text-handling, including fgets, will read that as 
two separate lines.

Fiddly CSV-aware code, or code that handles this stuff as a low-level 
character-at-a-time stream, might be able to handle that. But you want 
to keep it simple. For example, a routine to copy a CSV file and tack an 
extra field on the end of each line would be trivial if it wasn't for 
the fact that some newlines could be embedded elements of some fields.
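
To show what the character-at-a-time version involves, here is a sketch that only counts records. It assumes RFC 4180 style quoting (fields wrapped in double quotes, "" for a literal quote); csv_count_records is a made-up name, and an empty line is counted as an (empty) record:

```c
#include <assert.h>
#include <stddef.h>

/* Count the records in a CSV buffer. A line ending inside a quoted
   field is treated as data, not as the end of a record. */
size_t csv_count_records(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    size_t records = 0;
    int in_quotes = 0;        /* currently inside a "..." field? */
    int pending = 0;          /* characters seen since the last record end? */

    for (size_t i = 0; i < len; i++) {
        char c = buf[i];
        if (c == '"') {
            in_quotes = !in_quotes;   /* "" toggles twice, so state is unchanged */
            pending = 1;
        } else if (!in_quotes && (c == '\n' || c == '\r')) {
            if (c == '\r' && i + 1 < len && buf[i + 1] == '\n')
                i++;                  /* CRLF counts as one line ending */
            records++;
            pending = 0;
        } else {
            pending = 1;
        }
    }
    if (pending)
        records++;            /* last record had no trailing line ending */
    return records;
}
```

With this, the quoted two-line example above comes out as one record, where a plain fgets loop would see two lines.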

>> (And apparently a quoted field can still contain unadorned newline
>> characters. So even fgets can only retrieve part of a field. Such
>> characters should be escaped. Newline should be a higher level
>> separator than comma.)
>
> Yes.  You should not use fgets to read binary files.

CSV is a text format, not binary. That's the point of CSV. From what I 
can find out, there isn't an escape scheme for embedded control 
characters, as would be needed in string literals in source code for 
example.

>> If your program generates CSV then you can do it properly so that you
>> can read your own files and make it more likely for them to be
>> acceptable to other programs.

> Yes, though I went on to say that it's almost impossible to do.  Your
> "you can do it properly" makes it sound easy.

Easy to write and re-read your own files. Well, it's a little more 
fiddly if you allow that some character fields can contain non-printable 
characters (or separators or whitespace) as you need to invent or import 
a string escape scheme.

-- 
Bartc

0
BartC
12/16/2016 3:26:44 PM
BartC <bc@freeuk.com> writes:

> On 16/12/2016 15:04, Ben Bacarisse wrote:
>
>>> Big mistake. Apparently leading and trailing spaces are also
>>> significant.
>>
>> I'm not sure what the mistake is.  Can you explain?
>>
>>> This means that a big chunk of a file spanning multiple lines
>>> containing English prose, or function of a programming language,
>>> counts as just one CSV field so long as there aren't any commas in
>>> there.
>>
>> I know this is meant to explain the "big mistake" but I still don't see
>> it.
>
> The mistake is having 'comma' be more important than 'newline'. Which
> can also make this ambiguous:
>
>   abc,def
>   ghi,jkl
>
> Under strict CSV interpretation, that could be one record with three fields:
>
>   abc,def\nghi,jkl

I don't know why you think that.  As far as I know, unquoted line endings
are record separators.

Of course, any CSV specification that had this ambiguity would indeed be
a big mistake but I don't know how you got there from what I quoted.

> If you know the number of fields, that might help disambiguate. But
> CSV files usually don't state that.
>
>> I don't think commas are really an issue since in every CSV format I've
>> seen fields with multiple lines have to be quoted.
>
> That doesn't really help. If the above example really was 3 fields,
> then quoting it would produce this:
>
>   abc,"def
>   ghi",jkl
>
> There is still a 'hard' newline character in the middle of the
> field. Generic line-oriented text-handling, including fgets, will read
> that as two separate lines.

If you read a CSV file with fgets you need to handle (or, I suppose,
ignore) the complications that leads to.  But since the most general
form of CSV file should be read as a binary stream (in the sense that
you can't assume your system's line ending will be used) reading it with
fgets is a really bad idea (though even that could be made to work).

> Fiddly CSV-aware code, or code that handles this stuff as a low-level
> character-at-a-time stream, might be able to handle that. But you want
> to keep it simple. For example, a routine to copy a CSV file and tack
> an extra field on the end of each line would be trivial if it wasn't
> for the fact that some newlines could be embedded elements of some
> fields.

If you know the line ending and the quoting method used, that's not
really hard (maybe the other side of trivial, but not hard).  If you
don't know the quoting method or the line ending, it's impossible.  CSV
is a mess.

>>> (And apparently a quoted field can still contain unadorned newline
>>> characters. So even fgets can only retrieve part of a field. Such
>>> characters should be escaped. Newline should be a higher level
>>> separator than comma.)
>>
>> Yes.  You should not use fgets to read binary files.
>
> CSV is a text format, not binary.

I'm using C's terminology here.  It's a binary file in that you don't
want to apply your host system's line-end translation to the IO.  You
/can/ treat a CSV file as a text file, but I've found that to be a bad
idea in general.

> That's the point of CSV. From what I
> can find out, there isn't an escape scheme for embedded control
> characters, as would be needed in string literals in source code for
> example.

Yes, in everyday tech. language it's a text format, but you should read
CSV files (for best results) as binary streams in C.
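
A sketch of the sort of thing I mean, assuming the stream was opened with fopen(name, "rb"). The function name is invented for the example, and it deals only with line endings, not with quoted fields:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read one line from a binary stream into buf (at most size - 1 characters),
   accepting LF, CRLF, or a lone CR as the line ending. Characters that do
   not fit in buf are discarded. Returns 1 if anything was read, 0 at EOF. */
int read_line_any_eol(FILE *fp, char *buf, size_t size)
{
    assert(fp != NULL && buf != NULL && size > 0);
    size_t n = 0;
    int got_any = 0;
    int c;

    while ((c = getc(fp)) != EOF) {
        got_any = 1;
        if (c == '\n')
            break;
        if (c == '\r') {
            int next = getc(fp);      /* swallow the LF of a CRLF pair */
            if (next != '\n' && next != EOF)
                ungetc(next, fp);
            break;
        }
        if (n + 1 < size)
            buf[n++] = (char)c;
    }
    buf[n] = '\0';
    return got_any;
}
```

Because the stream is binary, the host's line-end translation never gets a chance to interfere, so the same code behaves the same way on every system.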

>>> If your program generates CSV then you can do it properly so that you
>>> can read your own files and make it more likely for them to be
>>> acceptable to other programs.
>
>> Yes, though I went on to say that it's almost impossible to do.  Your
>> "you can do it properly" makes it sound easy.
>
> Easy to write and re-read your own files.

Yes, that's certainly easy.

> Well, it's a little more
> fiddly if you allow that some character fields can contain
> non-printable characters (or separators or whitespace) as you need to
> invent or import a string escape scheme.

And it's way more than fiddly if you need to be able to read, correctly,
every CSV file out there as well as write CSV files that can be read by
every CSV reader out there.  That's the "almost impossible" I was
referring to.

-- 
Ben.
0
Ben
12/16/2016 5:01:59 PM
Richard Heathfield <rjh@cpax.org.uk> writes:
> On 16/12/16 09:09, Alla _ wrote:
>> On Friday, December 16, 2016 at 11:54:40 AM UTC+3, Richard Heathfield wrote:
>>> On 16/12/16 08:39, Alla _ wrote:
>>>> On Thursday, December 15, 2016 at 5:12:52 PM UTC+3, Richard Heathfield wrote:
[...]
>>>>> char array_names[50][50] = {0};
>>>>> int  array_values[50][50] = {0};
>>>>>
>>>>> which will take advantage of the default static initialisation rule to
>>>>> set everything to 0.
>>>>>
>>>> Is it a typo, or did you intend to use only {}, instead of {{}}?
>>>
>>> It isn't a typo. {0} always works, for any kind of aggregate type. But
>>> if you would prefer to use {{0}}, you {{{{{{{can}}}}}}}.
>> My compiler complained about {0} )
>
> Yes, so does mine. But it's wrong to do so. {0} is a C idiom, and 
> perfectly correct.
[...]

And the gcc maintainers have realized this.  gcc 4.9.4 prints the
warning (with "-Wall").  gcc 5.4.1 does not.
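
For anyone who wants to convince themselves that {0} really does zero the whole aggregate, here is a small sketch using the array shapes from upthread (zero_init_check is a name invented for the example):

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 50
#define COLS 50

/* Return 1 if every element of two {0}-initialised 2-D arrays is zero. */
int zero_init_check(void)
{
    char array_names[ROWS][COLS] = {0};
    int  array_values[ROWS][COLS] = {0};

    assert(sizeof array_names == (size_t)ROWS * COLS);

    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            if (array_names[r][c] != 0 || array_values[r][c] != 0)
                return 0;     /* would mean some element was left unset */
    return 1;
}
```

The first element is set explicitly by the 0; every remaining element is initialised as if it had static storage duration, which means zero.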

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
0
Keith
12/16/2016 5:46:44 PM
Alla _ <modelling.data@gmail.com> writes:
[...]
> Apple has "quietly" started using clang; I have only recently found out;
> it seems I received that switch after installing some recent upgrades 
> Apple has sent.
> But I still use gcc -flags to compile programs. 

In fact Apple configures the system so that the "gcc" command actually
invokes clang, which can be a bit confusing.

clang is designed to be compatible with gcc, and keeping the same name
avoids breaking scripts that assume "gcc" is *the* C compiler.
(Historically, using the name "cc" would have made more sense, but too
many scripts depend on the name "gcc".)

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
0
Keith
12/16/2016 5:48:37 PM
Alla _ <modelling.data@gmail.com> writes:
> On Friday, December 16, 2016 at 1:11:58 PM UTC+3, Richard Heathfield wrote:
[155 lines deleted]
>> Hex dump, please.
>> 
> Yes, Sir.

Alla, you *really* didn't need to post that.

You posted a followup quoting Richard's entire message, which included
multiple levels of quotations from previous messages, most of them
irrelevant.  And the only thing you added was a statement that you're
*going to* provide a hex dump.

You should have simply posted a followup containing the hex dump --
preferably with most of the previous irrelevant context snipped (but of
course keeping attribution lines for any quoted text you keep).

My newsreader, like many others, does not collapse quoted text, at least
not by default.  I had to scroll through the entire article to find the
one line you added in your response.  When you fail to snip irrelevant
context, you're forcing most of the rest of us to scroll past it.

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
0
Keith
12/16/2016 5:54:14 PM
In article <lnr357kbnu.fsf@kst-u.example.com>,
Keith Thompson  <kst-u@mib.org> wrote:
>Alla _ <modelling.data@gmail.com> writes:
>[...]
>> Apple has "quietly" started using clang; I have only recently found out;
>> it seems I received that switch after installing some recent upgrades 
>> Apple has sent.
>> But I still use gcc -flags to compile programs. 
>
>In fact Apple configures the system so that the "gcc" command actually
>invokes clang, which can be a bit confusing.
>
>clang is designed to be compatible with gcc, and keeping the same name
>avoids breaking scripts that assume "gcc" is *the* C compiler.
>(Historically, using the name "cc" would have made more sense, but too
>many scripts depend on the name "gcc".)

Note that (on my system, at least), 'cc' is a link to 'clang', but 'gcc' is
a standalone executable (i.e., a front end that does some massaging of the
command line before invoking 'clang').

-- 
Alice was something of a handful to her father, Theodore Roosevelt.  He was once
asked by a visiting dignitary about parenting his spitfire of a daughter and he
replied, "I can be President of the United States, or I can control Alice. I
cannot possibly do both."
0
gazelle
12/16/2016 5:55:09 PM
On 16/12/16 17:54, Keith Thompson wrote:
> Alla _ <modelling.data@gmail.com> writes:
>> On Friday, December 16, 2016 at 1:11:58 PM UTC+3, Richard Heathfield wrote:
> [155 lines deleted]
>>> Hex dump, please.
>>>
>> Yes, Sir.
>
> Alla, you *really* didn't need to post that.
>
> You posted a followup quoting Richard's entire message, which included
> multiple levels of quotations from previous messages, most of them
> irrelevant.  And the only thing you added was a statement that you're
> *going to* provide a hex dump.

Partly my fault. I should have snipped more in my own article. Apologies.

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
0
Richard
12/16/2016 5:56:24 PM
In article <lnmvfvkbeh.fsf@kst-u.example.com>,
Keith Thompson  <kst-u@mib.org> wrote:
....
>You should have simply posted a followup containing the hex dump --
>preferably with most of the previous irrelevant context snipped (but of
>course keeping attribution lines for any quoted text you keep).

It is hard enough on poor Keith to be the topicality police, now he also
has been saddled with the responsibility of being the posting etiquette
police.

>My newsreader, like many others, does not collapse quoted text, at least
>not by default.  I had to scroll through the entire article to find the
>one line you added in your response.  When you fail to snip irrelevant
>context, you're forcing most of the rest of us to scroll past it.

Somebody should teach Keith about the "tab" key (at least that's the key in
trn; I'm sure other Unix 'text-y' newsreaders have something similar).

Blaming the inadequacies of one's newsreader (or one's knowledge of how to
use it) on some poor hapless newbie, is really bad form.

-- 
The randomly chosen signature file that would have appeared here is more than 4
lines long.  As such, it violates one or more Usenet RFCs.  In order to remain
in compliance with said RFCs, the actual sig can be found at the following URL:
	http://user.xmission.com/~gazelle/Sigs/ForFoxViewers
0
gazelle
12/16/2016 6:02:53 PM
On Fri, 16 Dec 2016 02:43:26 -0800 (PST), Alla _
<modelling.data@gmail.com> wrote:

>On Friday, December 16, 2016 at 1:01:46 PM UTC+3, Richard Heathfield wrote:

<snip>

>> Barry Schwarz has suggested that you may have a trailing space in your 
>> data, which wouldn't show up on Usenet, and he says (presumably after 
>> trying it out) that this will produce the symptom you are describing.
>I thought so too. I have tried manually to delete rows and columns around
>the data I need to preserve. I am out of tools, and definitely don't have 
>any knowledge to continue figuring this out. But it is a huge problem for 
>me as I can't proceed with working on the code because I can't even have
>the loop with a condition I need. I am thinking now how to solve the issue
>with the csv file. I will be back. 

You certainly do have the tools:
     Read a record with fgets.
     Test for the \n with strchr.
     If \n present, replace it with some character not in your text,
such as #, @, !, or _
     Print the string with printf using a format string similar to
"record = <%s>\n".  This will show you if there is any "invisible"
text, such as blanks, at either end of the record.

If you think there might be unprintable characters in the text, add
the following before the call to printf:
     Loop through characters
          Test the current character with isprint.
          If not printable, replace the character with another
unexpected character such as ? or ~.
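
Put together, the steps above might look something like this sketch. The function names and the choice of '#' and '?' as marker characters are only for illustration:

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Make invisible characters in a record visible, in place: the newline
   becomes '#', and any other non-printable character becomes '?'. */
void make_visible(char *record)
{
    assert(record != NULL);
    char *nl = strchr(record, '\n');
    if (nl != NULL)
        *nl = '#';
    for (char *p = record; *p != '\0'; p++)
        if (!isprint((unsigned char)*p))
            *p = '?';
}

/* Print the record between < and > so blanks at either end stand out. */
void print_record(const char *record)
{
    printf("record = <%s>\n", record);
}
```

Call make_visible and then print_record on each buffer filled by fgets. A trailing blank or a stray carriage return then shows up just before the '#'.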

You could also solve the problem of the extra "0" due to the
superfluous blank by testing the output of strtol.  Using the
parameter names in the standard, if the value of *endptr equals the
value of nptr, then strtol performed no conversion.  Therefore the
resulting 0 returned by strtol is not a value you are interested in.
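
A sketch of that test, wrapped in a helper whose name (parse_long) is made up for the example. The range check via errno is an extra that I have not discussed above:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Convert text to a long in base 10. Returns 1 and stores the value in
   *out on success; returns 0 if strtol converted nothing or the value
   was out of range, leaving *out untouched. */
int parse_long(const char *text, long *out)
{
    assert(text != NULL && out != NULL);
    char *end;

    errno = 0;
    long value = strtol(text, &end, 10);
    if (end == text)
        return 0;             /* no digits: the 0 returned is not real data */
    if (errno == ERANGE)
        return 0;             /* value did not fit in a long */
    *out = value;
    return 1;
}
```

A field holding only a blank makes parse_long return 0, so the spurious 0 never reaches the values array.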

-- 
Remove del for email
0
Barry
12/17/2016 1:09:22 AM
Then what term would you use to describe the fact that it took multiple requests to get you to do this?
0
jameskuyper
12/18/2016 11:32:37 AM
Keith Thompson <kst-u@mib.org> wrote:

> Alla _ <modelling.data@gmail.com> writes:
> [...]
> > Apple has "quietly" started using clang; I have only recently found out;
> > it seems I received that switch after installing some recent upgrades 
> > Apple has sent.
> > But I still use gcc -flags to compile programs. 
> 
> In fact Apple configures the system so that the "gcc" command actually
> invokes clang, which can be a bit confusing.
> 
> clang is designed to be compatible with gcc, and keeping the same name
> avoids breaking scripts that assume "gcc" is *the* C compiler.
> (Historically, using the name "cc" would have made more sense, but too
> many scripts depend on the name "gcc".)

Strictly speaking, that's the fault of those GNUhard scripts, not of
Apple.

Richard
0
raltbos
12/20/2016 11:53:02 AM