Reading LAST line from text file without iterating through the file?



Is it possible to read the last text line from a text file WITHOUT reading the previous (n-1) lines?

Robin

Reply rob 2/23/2011 3:59:36 PM

On Wed, 23 Feb 2011 15:59:36 +0000, Robin Wenger wrote:

> Is it possible to read the last text line from a text file WITHOUT
> reading the previous (n-1) lines?
> 
> Robin

You'd need to use RandomAccessFile, seek to the end of the file, work your 
way back looking for a linefeed/CR, and then slurp forward again into a 
buffer. While seeking backwards you can count characters and thus know 
exactly how big to make the StringBuilder for maximum efficiency.
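
A minimal Java sketch of that approach. It assumes a single-byte encoding (ISO-8859-1 here) and CR, LF or CRLF terminators, and the class name BackwardScan is made up for illustration. It steps back over any terminator at the very end of the file, scans backwards to the previous terminator, then reads the line forward into a buffer of exactly the right size.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

final class BackwardScan {
    /**
     * Returns the last line of the file, assuming one byte per character
     * and CR, LF or CRLF terminators. A terminator at the very end of the
     * file is not treated as an empty last line.
     */
    static String lastLine(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long end = raf.length();              // exclusive end of the last line
            // Step back over trailing CR/LF bytes.
            while (end > 0 && isEol(byteAt(raf, end - 1))) {
                end--;
            }
            long start = end;                     // inclusive start of the last line
            // Work backwards to the previous terminator or the start of the file.
            while (start > 0 && !isEol(byteAt(raf, start - 1))) {
                start--;
            }
            // The length is now known, so size the buffer once and read forward.
            byte[] buf = new byte[(int) (end - start)];
            raf.seek(start);
            raf.readFully(buf);
            return new String(buf, StandardCharsets.ISO_8859_1);
        }
    }

    private static int byteAt(RandomAccessFile raf, long pos) throws IOException {
        raf.seek(pos);
        return raf.read();
    }

    private static boolean isEol(int b) {
        return b == '\r' || b == '\n';
    }
}
```

Seeking and reading one byte at a time costs a call per byte, so for long lines reading the tail in blocks would be faster; the int cast also limits a single line to 2 GB.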

Reply Ken 2/23/2011 4:07:10 PM


On 02/23/2011 07:59 AM, Robin Wenger wrote:
> Is it possible to read the last text line from a text file WITHOUT reading the previous (n-1) lines?
>
> Robin
>

You could use a RandomAccessFile and search backwards from the end for a 
linefeed.  Depending on the size of the line and the size of the file, 
it might not be more efficient than reading the whole file.

-- 

Knute Johnson
s/nospam/knute2011/
Reply nospam2627 (222) 2/23/2011 5:09:12 PM


On 02/23/2011 10:59 AM, Robin Wenger wrote:
> Is it possible to read the last text line from a text file WITHOUT reading the previous (n-1) lines?

Yes, but it's tricky.  You need a random-access file and seek backwards to a 
newline.

-- 
Lew
Honi soit qui mal y pense.
Reply Lew 2/23/2011 5:21:01 PM

On 23/02/2011 18:21, Lew allegedly wrote:
> On 02/23/2011 10:59 AM, Robin Wenger wrote:
>> Is it possible to read the last text line from a text file WITHOUT
>> reading the previous (n-1) lines?
>
> Yes, but it's tricky. You need a random-access file and seek backwards
> to a newline.
>

$ echo RandomAccessFile | hivemind | cut
Reply Daniele 2/23/2011 6:49:56 PM

On Wed, 23 Feb 2011, Robin Wenger wrote:

> Is it possible to read the last text line from a text file WITHOUT reading the previous (n-1) lines?

Yes.

tom

-- 
What? Yeah!
Reply Tom 2/23/2011 6:54:55 PM

On 23-02-11 19:49, Daniele Futtorovic wrote:
> On 23/02/2011 18:21, Lew allegedly wrote:
>> On 02/23/2011 10:59 AM, Robin Wenger wrote:
>>> Is it possible to read the last text line from a text file WITHOUT
>>> reading the previous (n-1) lines?
>>
>> Yes, but it's tricky. You need a random-access file and seek backwards
>> to a newline.
>>
> 
> $ echo RandomAccessFile | hivemind | cut

bash: hivemind: command not found

-- 
Luuk
Reply Luuk 2/23/2011 7:27:52 PM

On Feb 23, 10:59 am, r...@wenger.net (Robin Wenger) wrote:
> Is it possible to read the last text line from a text file WITHOUT reading the previous (n-1) lines?
>
> Robin

I don't see a "read last line" method.  It seems you have to know your
end-of-line character and check for it yourself.
http://download.oracle.com/javase/6/docs/api/java/io/RandomAccessFile.html
Reply Eric 2/23/2011 7:29:05 PM

On 23/02/2011 20:27, Luuk allegedly wrote:
> On 23-02-11 19:49, Daniele Futtorovic wrote:
>> On 23/02/2011 18:21, Lew allegedly wrote:
>>> On 02/23/2011 10:59 AM, Robin Wenger wrote:
>>>> Is it possible to read the last text line from a text file WITHOUT
>>>> reading the previous (n-1) lines?
>>>
>>> Yes, but it's tricky. You need a random-access file and seek backwards
>>> to a newline.
>>>
>>
>> $ echo RandomAccessFile | hivemind | cut
>
> bash: hivemind: command not found

cljp: hivemind: ooh yeah!

;)


Reply Daniele 2/23/2011 9:12:08 PM

rob@wenger.net (Robin Wenger) wrote in
news:4d652ee7$0$6877$9b4e6d93@newsspool2.arcor-online.net: 

> Is it possible to read the last text line from a text file WITHOUT
> reading the previous (n-1) lines? 
> 
> Robin
> 
> 
Yes, under certain circumstances.  For example, if you know "n" and know that 
all of the lines are of some fixed length (also known).  There are other 
situations as well.
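
In the fixed-length case no scanning is needed at all, because the offset of the last line can be computed. A minimal sketch, assuming every line occupies exactly recordLength bytes including a terminator of terminatorLength bytes, in a single-byte encoding; the class and parameter names are hypothetical. Here n is derived from the file length, but a separately known n would be used the same way.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

final class FixedLength {
    /**
     * Returns the last line of a file in which every line is exactly
     * recordLength bytes long, including a terminator of terminatorLength bytes.
     */
    static String lastLine(String path, int recordLength, int terminatorLength)
            throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long n = raf.length() / recordLength;     // number of complete lines
            if (n == 0) {
                return "";
            }
            raf.seek((n - 1) * recordLength);        // start of line n, computed directly
            byte[] buf = new byte[recordLength - terminatorLength];
            raf.readFully(buf);                      // payload only; terminator left unread
            return new String(buf, StandardCharsets.ISO_8859_1);
        }
    }
}
```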



Reply Ian 2/24/2011 12:46:40 AM

On 2/23/2011 10:59 AM, Robin Wenger wrote:
> Is it possible to read the last text line from a text file WITHOUT reading the previous (n-1) lines?

     Others have mentioned using RandomAccessFile to work backward from the
end of the file until you find the penultimate line-ending.  This can
work, but it can also fail.  Consider a file with context-sensitive
encoding, for example, where the meaning of a byte depends on the values
of bytes that precede it.  If you read an isolated byte of value 91 from
such a file, without knowing whether it's a free-standing character or a
part of a multi-byte sequence or possibly preceded by a "shift-out," you
won't know what that byte value means.

     One strategy is to estimate a typical line length of N characters,
seek to 100*N (say) bytes before the end, and start reading from
there.  A nice feature of most multi-byte encoding schemes is that they
tend to self-synchronize: You may get misinterpreted garbage for a
while, but things are likely to get back on track eventually.  If you
want to get fancy you can apply reasonability tests to what you (think
you've) read, and restart at END-1000*N if things seem unreasonable.
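
A rough Java sketch of this windowing strategy. It assumes a single-byte charset or UTF-8; UTF-16 would additionally need the window start kept on an even offset. The class name and the retry test (no line break found, so widen tenfold) are illustrative stand-ins for the more general reasonability tests described above. `new String(byte[], Charset)` replaces undecodable input with U+FFFD rather than throwing, so a multi-byte sequence cut at the window start just becomes a replacement character ahead of the line break and is discarded.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.Charset;

final class TailWindow {
    /**
     * Decodes a window of bytes at the end of the file and returns the text
     * after the last line break in it. If the window holds no line break and
     * does not reach the start of the file, it is widened tenfold and retried.
     */
    static String lastLine(String path, Charset cs, int typicalLineLength) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long len = raf.length();
            long window = 100L * typicalLineLength;
            while (true) {
                long start = Math.max(0, len - window);
                byte[] buf = new byte[(int) (len - start)];
                raf.seek(start);
                raf.readFully(buf);
                // Garbage from a split sequence can only appear at the front of the text.
                String text = stripTrailingEol(new String(buf, cs));
                int cut = Math.max(text.lastIndexOf('\n'), text.lastIndexOf('\r'));
                if (cut >= 0 || start == 0) {
                    return text.substring(cut + 1);  // cut == -1: the whole file is one line
                }
                window *= 10;                        // no line break yet: look further back
            }
        }
    }

    private static String stripTrailingEol(String s) {
        int end = s.length();
        while (end > 0 && (s.charAt(end - 1) == '\n' || s.charAt(end - 1) == '\r')) {
            end--;
        }
        return s.substring(0, end);
    }
}
```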

-- 
Eric Sosman
esosman@ieee-dot-org.invalid
Reply Eric 2/24/2011 2:19:22 AM

On 23-02-2011 10:59, Robin Wenger wrote:
> Is it possible to read the last text line from a text file WITHOUT reading the previous (n-1) lines?

In general no.

All the RandomAccessFile tricks are based on assumptions about lines
being separated by something - they do not work with record formats
that contain a line length instead of a delimiter.

If Unix/Linux/Windows/Mac OS X is all you need to support, then try:

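     import java.io.IOException;
     import java.io.RandomAccessFile;

     // Assumes one byte per character and CR, LF or CRLF line ends.
     public static String readLastLineUnSup(String fnm) throws IOException {
         try (RandomAccessFile raf = new RandomAccessFile(fnm, "r")) {
             StringBuilder res = new StringBuilder();
             // Walk backwards from the last byte; stop at the start of the file.
             for (long ix = raf.length() - 1; ix >= 0; ix--) {
                 raf.seek(ix);
                 int c = raf.read();
                 if (c == '\r' || c == '\n') {
                     // Terminators at the very end of the file are skipped so a
                     // trailing newline does not yield an empty last line.
                     if (res.length() == 0) continue;
                     break;
                 }
                 res.append((char) c);
             }
             // Characters were collected last-to-first, so reverse them.
             return res.reverse().toString();
         }
     }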

Arne
Reply ISO 2/24/2011 2:21:42 AM

On Wed, 23 Feb 2011 21:21:42 -0500, Arne Vajhøj wrote:

> On 23-02-2011 10:59, Robin Wenger wrote:
>> Is it possible to read the last text line from a text file WITHOUT
>> reading the previous (n-1) lines?
> 
> In general no.
> 
> All the RandomAccessFile tricks are based on assumptions about lines
> being separated by something - they do not work with record formats that
> contains a line length instead of a delimiter.

"Record formats" are not relevant here, nor was someone else's concern 
about compressed formats -- the OP clearly said "a text file", by which 
is generally understood flat ASCII with CR, LF, or CRLF as line delimiter.
Reply Ken 2/24/2011 3:23:56 AM

On 2/23/2011 10:23 PM, Ken Wesson wrote:
> On Wed, 23 Feb 2011 21:21:42 -0500, Arne Vajhøj wrote:
>
>> On 23-02-2011 10:59, Robin Wenger wrote:
>>> Is it possible to read the last text line from a text file WITHOUT
>>> reading the previous (n-1) lines?
>>
>> In general no.
>>
>> All the RandomAccessFile tricks are based on assumptions about lines
>> being separated by something - they do not work with record formats that
>> contains a line length instead of a delimiter.
>
> "Record formats" are not relevant here, nor was someone else's concern
> about compressed formats -- the OP clearly said "a text file", by which
> is generally understood flat ASCII with CR, LF, or CRLF as line delimiter.

     OpenVMS supports many record formats, but the "native" one for
text files is VAR: A two-byte binary count, the payload characters,
and if necessary a padding byte to make the total byte count even.
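
This is why a length-prefixed format defeats the backward scan: a count only says where the next record starts, so finding the last record means visiting every count from the front, although the payloads themselves can be skipped. A minimal sketch of reading the last VAR record as laid out above. ASSUMPTION: the two-byte count is an unsigned little-endian value; block-level details of real RMS files are ignored, and the class name is hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

final class VarRecords {
    /**
     * Returns the payload of the last record in a file laid out as
     * [2-byte count][payload][pad byte if count is odd], repeated.
     * ASSUMPTION: the count is unsigned little-endian; block structure is ignored.
     */
    static byte[] lastRecord(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long size = raf.length();
            long pos = 0;
            long lastPayload = -1;
            int lastLen = 0;
            // Walk forward: each count is the only way to find the next record.
            while (pos + 2 <= size) {
                raf.seek(pos);
                int lo = raf.read();
                int hi = raf.read();
                int len = lo | (hi << 8);
                lastPayload = pos + 2;
                lastLen = len;
                pos += 2 + len + (len & 1);   // skip payload and the pad byte, if any
            }
            if (lastPayload < 0) {
                return new byte[0];
            }
            byte[] payload = new byte[lastLen];
            raf.seek(lastPayload);
            raf.readFully(payload);
            return payload;
        }
    }
}
```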

     The "next most native" format is VFC, which is sort of like VAR
except that the first N (fixed) bytes of the payload are metadata
(line numbers, carriage control, ...) instead of line content.

     Then come the easy formats: STREAM, STREAM-LF, STREAM-CR, and
FIXED.  Oh, yes, and UNDEF; let's not forget UNDEF (although, to be
honest, UNDEF is more commonly used for "binary" than "text" files).

     (Strangest text file format I ever ran into used line-*bracketing*
characters: a CR before and an LF after.  The rationale for this format
caused me to shake my head and sigh: It was said that as you printed
such a file on a typewriter-like console, possibly with long pauses
between lines for progress messages and the like, then the LF at end-
of-line would move the paper so the print head wouldn't interfere with
reading it.  As I said, shake the head.)

     In short, all I'm asking is that you delete the word "generally"
because your experience is insufficiently general.

-- 
Eric Sosman
esosman@ieee-dot-org.invalid
Reply Eric 2/24/2011 5:42:41 AM

On Thu, 24 Feb 2011 00:42:41 -0500, Eric Sosman wrote:

> On 2/23/2011 10:23 PM, Ken Wesson wrote:
>> On Wed, 23 Feb 2011 21:21:42 -0500, Arne Vajhøj wrote:
>>
>>> On 23-02-2011 10:59, Robin Wenger wrote:
>>>> Is it possible to read the last text line from a text file WITHOUT
>>>> reading the previous (n-1) lines?
>>>
>>> In general no.
>>>
>>> All the RandomAccessFile tricks are based on assumptions about lines
>>> being separated by something - they do not work with record formats
>>> that contains a line length instead of a delimiter.
>>
>> "Record formats" are not relevant here, nor was someone else's concern
>> about compressed formats -- the OP clearly said "a text file", by which
>> is generally understood flat ASCII with CR, LF, or CRLF as line
>> delimiter.
> 
>      OpenVMS supports many record formats, but the "native" one for
> text files is VAR: A two-byte binary count, the payload characters, and
> if necessary a padding byte to make the total byte count even.
> 
>      The "next most native" format is VFC, which is sort of like VAR
> except that the first N (fixed) bytes of the payload are metadata (line
> numbers, carriage control, ...) instead of line content.
> 
>      Then come the easy formats: STREAM, STREAM-LF, STREAM-CR, and
> FIXED.  Oh, yes, and UNDEF; let's not forget UNDEF (although, to be
> honest, UNDEF is more commonly used for "binary" than "text" files).
> 
>      (Strangest text file format I ever ran into used line-*bracketing*
> characters: a CR before and an LF after.  The rationale for this format
> caused me to shake my head and sigh: It was said that as you printed
> such a file on a typewriter-like console, possibly with long pauses
> between lines for progress messages and the like, then the LF at end-
> of-line would move the paper so the print head wouldn't interfere with
> reading it.  As I said, shake the head.)
> 
>      In short, all I'm asking is that you delete the word "generally"
> because your experience is insufficiently general.

Obsolete systems do not interest me. Since those days, the world has 
standardized on ASCII flat files for text files. I just wish it had 
standardized on one canonical end-of-line character too!
Reply Ken 2/24/2011 1:06:41 PM

On 2/24/11 9:06 PM, Ken Wesson wrote:
> [...]
> Obsolete systems do not interest me.

then…

> Since those days, the world has standardized on ASCII flat files for text files.

LOL!
Reply Peter 2/24/2011 1:23:34 PM

On Thu, 24 Feb 2011 21:23:34 +0800, Peter Duniho wrote:

> On 2/24/11 9:06 PM, Ken Wesson wrote:
>> [...]
>> Obsolete systems do not interest me.
> 
> then…
> 
>> Since those days, the world has standardized on ASCII flat files for
>> text files.
> 
> LOL!

Windows text files are flat ASCII files (with CRLF line ends). Mac text 
files are flat ASCII files (with CR line ends). Unix text files are flat 
ASCII files (with LF line ends). And that exhausts 99.99% of the 
operating system market share right there, if not more, not counting 
smartphones which are all too modern to be using weird legacy formats for 
text files.

I can't remember the last time I had to interoperate with any machine 
that had anything other than standard ASCII as the native format for text 
files. It's gotta be decades.
Reply Ken 2/24/2011 2:00:47 PM

2011-02-24 15:00, Ken Wesson skrev:
> On Thu, 24 Feb 2011 21:23:34 +0800, Peter Duniho wrote:
> 
>> On 2/24/11 9:06 PM, Ken Wesson wrote:
>>> [...]
>>> Obsolete systems do not interest me.
>>
>> then…
>>
>>> Since those days, the world has standardized on ASCII flat files for
>>> text files.
>>
>> LOL!
> 
> Windows text files are flat ASCII files (with CRLF line ends). Mac text 
> files are flat ASCII files (with CR line ends). Unix text files are flat 
> ASCII files (with LF line ends). And that exhausts 99.99% of the 
> operating system market share right there, if not more, not counting 
> smartphones which are all too modern to be using weird legacy formats for 
> text files.
> 
> I can't remember the last time I had to interoperate with any machine 
> that had anything other than standard ASCII as the native format for text 
> files. It's gotta be decades.

ASCII character values are limited to the 0-127 range. That's an
outdated "standard".

Reply Lars 2/24/2011 2:14:44 PM

Ken Wesson wrote:
> On Thu, 24 Feb 2011 00:42:41 -0500, Eric Sosman wrote:
>> On 2/23/2011 10:23 PM, Ken Wesson wrote:
>>> "Record formats" are not relevant here, nor was someone else's concern
>>> about compressed formats -- the OP clearly said "a text file", by which
>>> is generally understood flat ASCII with CR, LF, or CRLF as line
>>> delimiter.

Ah, the warm blanket of provincialism.

>>      OpenVMS supports many record formats, but the "native" one for
>> text files is VAR: A two-byte binary count, the payload characters, and
>> if necessary a padding byte to make the total byte count even.
>> ...
>>      In short, all I'm asking is that you delete the word "generally"
>> because your experience is insufficiently general.

On the IBM i machines (formerly i Series, formerly System i, formerly
AS/400, successor to the System/3x), using the default filesystem, a
text "file" is actually a series of records in a "member" of a
"physical file". The i operating system hides implementation details,
but access to the contents of the "file" is record-oriented, not
byte-oriented.

In the alternate Hierarchical File System supported by the i machines
for POSIX compatibility, text files are byte-oriented, but usually
EBCDIC, not ASCII.

On IBM and other EBCDIC mainframe systems, there are a variety of
formats for text files, but flat byte-oriented ASCII isn't one of
them, unless you're running Linux.

> Obsolete systems do not interest me.

Apparently, neither do prominent ones that you don't happen to know
about. What a surprise.

> Since those days, the world has
> standardized on ASCII flat files for text files.

Only for sufficiently small values of "the world".

-- 
Michael Wojcik
Micro Focus
Rhetoric & Writing, Michigan State University
Reply Michael 2/24/2011 2:18:06 PM

Ken Wesson writes:

> On Thu, 24 Feb 2011 21:23:34 +0800, Peter Duniho wrote:
> 
> > On 2/24/11 9:06 PM, Ken Wesson wrote:
> >> [...]
> >> Obsolete systems do not interest me.
> > 
> > then…
> > 
> >> Since those days, the world has standardized on ASCII flat files
> >> for text files.
> > 
> > LOL!
> 
> Windows text files are flat ASCII files (with CRLF line ends). Mac
> text files are flat ASCII files (with CR line ends). Unix text files
> are flat ASCII files (with LF line ends). And that exhausts 99.99%
> of the operating system market share right there, if not more, not
> counting smartphones which are all too modern to be using weird
> legacy formats for text files.
> 
> I can't remember the last time I had to interoperate with any
> machine that had anything other than standard ASCII as the native
> format for text files. It's gotta be decades.

I remember when we used a seven-bit character code to write my native
language. We could toggle the way we viewed the character codes where
we had put those characters that were not in ASCII. It was either
brackets and braces or those letters, but never both.

V{nkyr{-{{kk|si{. It's not a happy memory.
Reply Jussi 2/24/2011 2:19:09 PM

On 2/24/11 10:14 PM, Lars Enderin wrote:
> ASCII character values are limited to the 0-127 range. That's an
> outdated "standard".

Used by "obsolete systems".  A key point in my amusement.  :)
Reply Peter 2/24/2011 2:26:47 PM

2011-02-24 15:19, Jussi Piitulainen skrev:
> Ken Wesson writes:
> 
>> On Thu, 24 Feb 2011 21:23:34 +0800, Peter Duniho wrote:
>>
>>> On 2/24/11 9:06 PM, Ken Wesson wrote:
>>>> [...]
>>>> Obsolete systems do not interest me.
>>>
>>> then…
>>>
>>>> Since those days, the world has standardized on ASCII flat files
>>>> for text files.
>>>
>>> LOL!
>>
>> Windows text files are flat ASCII files (with CRLF line ends). Mac
>> text files are flat ASCII files (with CR line ends). Unix text files
>> are flat ASCII files (with LF line ends). And that exhausts 99.99%
>> of the operating system market share right there, if not more, not
>> counting smartphones which are all too modern to be using weird
>> legacy formats for text files.
>>
>> I can't remember the last time I had to interoperate with any
>> machine that had anything other than standard ASCII as the native
>> format for text files. It's gotta be decades.
> 
> I remember when we used a seven-bit character code to write my native
> language. We could toggle the way we viewed the character codes where
> we had put those characters that were not in ASCII. It was either
> brackets and braces or those letters, but never both.
> 
> V{nkyr{-{{kk|si{. It's not a happy memory.

I have the same experience. C code wasn't very readable with "Swedish
ASCII". At least Finnish doesn't use "å", except when quoting Swedish words.

Reply Lars 2/24/2011 2:40:37 PM

2011-02-24 15:26, Peter Duniho skrev:
> On 2/24/11 10:14 PM, Lars Enderin wrote:
>> ASCII character values are limited to the 0-127 range. That's an
>> outdated "standard".
> 
> Used by "obsolete systems".  A key point in my amusement.  :)

I thought so, but Ken seemed to need an explanation.

Reply Lars 2/24/2011 2:42:52 PM

On 24/02/2011 14:00, Ken Wesson wrote:
> Windows text files are flat ASCII files (with CRLF line ends).

Actually I find that, nowadays, lots of text files on Windows are 
so-called 'ANSI' (mostly CP-1252) or 'Unicode' (usually meaning UTF-16 
with BOM).

Even on my ancient XP boxes, Notepad offers only ANSI, Unicode, Unicode 
big-endian and UTF-8. Wordpad offers RTF, Text-Document (turns out to be 
CP-1252),  Text-Document DOS format (turns out to be CP-850) and 
Unicode. No ASCII.
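
Since such files may or may not start with a byte-order mark, a backward scan first has to know the encoding: in UTF-16LE, for example, LF is the byte pair 0x0A 0x00. A minimal sketch of sniffing the mark; the fallback for unmarked "ANSI" files has to be supplied by the caller, because nothing in the file says which code page it uses. UTF-32 marks are not considered, and the class name is hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class BomSniffer {
    /**
     * Picks a charset from the file's byte-order mark, or returns the
     * caller's fallback when there is none (e.g. an "ANSI" file).
     */
    static Charset detect(String path, Charset fallback) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            int b0 = raf.read();   // read() returns -1 past end of file,
            int b1 = raf.read();   // so short files simply fall through
            int b2 = raf.read();
            if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) {
                return StandardCharsets.UTF_8;
            }
            if (b0 == 0xFF && b1 == 0xFE) {
                return StandardCharsets.UTF_16LE;  // what Notepad calls "Unicode"
            }
            if (b0 == 0xFE && b1 == 0xFF) {
                return StandardCharsets.UTF_16BE;  // "Unicode big endian"
            }
            return fallback;
        }
    }
}
```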


-- 
RGB
Reply RedGrittyBrick 2/24/2011 2:49:22 PM

On Thu, 24 Feb 2011 09:18:06 -0500, Michael Wojcik wrote:

> Ken Wesson wrote:
>> On Thu, 24 Feb 2011 00:42:41 -0500, Eric Sosman wrote:
>>> On 2/23/2011 10:23 PM, Ken Wesson wrote:
>>>> "Record formats" are not relevant here, nor was someone else's
>>>> concern about compressed formats -- the OP clearly said "a text
>>>> file", by which is generally understood flat ASCII with CR, LF, or
>>>> CRLF as line delimiter.
> 
> Ah, the warm blanket of provincialism.

Who asked you for your opinions of others here?

>>>      OpenVMS supports many record formats, but the "native" one for
>>> text files is VAR: A two-byte binary count, the payload characters,
>>> and if necessary a padding byte to make the total byte count even. ...
>>>      In short, all I'm asking is that you delete the word "generally"
>>> because your experience is insufficiently general.
> 
> On the IBM i machines (formerly i Series, formerly System i, formerly
> AS/400, successor to the System/3x), blah blah blah

You're one to talk about provincialism. Who the hell uses these ancient 
museum pieces any more?

>> Obsolete systems do not interest me.
> 
> Apparently, neither do prominent ones that you don't happen to know
> about.

There is nothing at all prominent about those IBM dinosaurs. They may 
have been prominent 30 years ago, but not now.

>> Since those days, the world has
>> standardized on ASCII flat files for text files.
> 
> Only for sufficiently small values of "the world".

Fine, then -- corporate America and home computers in America then. 
Perhaps you live in a place where they're 30 years behind us, but you're 
the unusual ones in that case.
Reply Ken 2/24/2011 6:42:19 PM

On Thu, 24 Feb 2011 14:49:22 +0000, RedGrittyBrick wrote:

> On 24/02/2011 14:00, Ken Wesson wrote:
>> Windows text files are flat ASCII files (with CRLF line ends).
> 
> Actually I find that, nowadays, lots of text files on Windows are
> so-called 'ANSI' (mostly CP-1252)

Same difference. The files are plain text, with CRLF line ends.

> RTF etc.

Not text files. RTF is more akin to word processor document files than 
text files. Nobody would use RTF to encode source code or a shell script.

Reply Ken 2/24/2011 6:43:51 PM

On Thu, 24 Feb 2011 16:19:09 +0200, Jussi Piitulainen wrote:

> I remember when we used a seven-bit character code to write my native
> language. etc

That's why we now actually use that 8th bit for something useful, if need 
be.
Reply Ken 2/24/2011 6:44:34 PM

On Thu, 24 Feb 2011 15:14:44 +0100, Lars Enderin wrote:

> 2011-02-24 15:00, Ken Wesson skrev:
>> I can't remember the last time I had to interoperate with any machine
>> that had anything other than standard ASCII as the native format for
>> text files. It's gotta be decades.
> 
> ASCII character values are limited to the 0-127 range. That's an
> outdated "standard".

Well, these days we use the 8th bit for accented characters instead of 
just wasting it. Technically it's not your granddaddy's ASCII with that 
in use, but it's close enough for government work, and certainly close 
enough not to mess with using tests for CR/LF to detect line boundaries.
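
The claim can be checked directly. Testing bytes for CR/LF holds for single-byte encodings such as ISO-8859-1 and for UTF-8, where every byte of a multi-byte sequence has its high bit set. It fails for UTF-16, where a 0x0A byte can be half of an unrelated character. A small sketch (hypothetical class name) that counts 0x0A bytes in encoded strings:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class LfByteCheck {
    /** Counts bytes equal to 0x0A (LF) in the encoded form of s. */
    static int lfBytes(String s, Charset cs) {
        int count = 0;
        for (byte b : s.getBytes(cs)) {
            if (b == 0x0A) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String noNewline = "\u010A";  // LATIN CAPITAL LETTER C WITH DOT ABOVE, no line break
        System.out.println(lfBytes(noNewline, StandardCharsets.UTF_8));         // 0: encoded as 0xC4 0x8A
        System.out.println(lfBytes(noNewline, StandardCharsets.UTF_16BE));      // 1: encoded as 0x01 0x0A
        System.out.println(lfBytes("caf\u00E9", StandardCharsets.ISO_8859_1));  // 0: 8th bit used, no LF
    }
}
```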
Reply Ken 2/24/2011 6:46:01 PM

On 2/24/11 10:42 PM, Lars Enderin wrote:
> 2011-02-24 15:26, Peter Duniho skrev:
>> On 2/24/11 10:14 PM, Lars Enderin wrote:
>>> ASCII character values are limited to the 0-127 range. That's an
>>> outdated "standard".
>>
>> Used by "obsolete systems".  A key point in my amusement.  :)
>
> I thought so, but Ken seemed to need an explanation.

Yes, and it was a good explanation.  Unfortunately, I don't think he 
understood the explanation, nor do I think he will understand further 
clarification.  I think it more likely that the harder anyone tries to 
explain to him these points, the more dug in his heels will be.

To do otherwise would necessarily require an admission that there's no 
single "text file" format, and that even if there were, ASCII or any of 
the single-byte derivatives thereof ain't it.  I don't see any way such 
an admission would ever be produced.

Pete
Reply Peter 2/24/2011 6:52:56 PM

On Fri, 25 Feb 2011 02:52:56 +0800, Peter Duniho wrote:

> On 2/24/11 10:42 PM, Lars Enderin wrote:
>> 2011-02-24 15:26, Peter Duniho skrev:
>>> On 2/24/11 10:14 PM, Lars Enderin wrote:
>>>> ASCII character values are limited to the 0-127 range. That's an
>>>> outdated "standard".
>>>
>>> Used by "obsolete systems".  A key point in my amusement.  :)
>>
>> I thought so, but Ken seemed to need an explanation.
> 
> Yes, and it was a good explanation.  Unfortunately, I don't think he
> understood the explanation, nor do I think he will understand further
> clarification.  I think it more likely that the harder anyone tries to
> explain to him these points, the more dug in his heels will be.

You know, that's what you can expect when you are unpleasant, nasty, and 
rude about things -- other people display a curious unwillingness to 
listen to anything you have to say. An old adage comes to mind -- 
something about honey and vinegar?

(It doesn't help when your "counterexamples" are obscure formats used on 
dinosaurian machines of yesteryear; the fact is that text files with 
CR/LF line delimiters are standard on a set of operating systems that have 
the overwhelming majority of the market share for such these days.)
Reply Ken 2/24/2011 7:10:23 PM

On 24/02/2011 19:46, Ken Wesson allegedly wrote:
> it's not (...) ASCII (...).

Spot on.

Reply Daniele 2/24/2011 7:14:49 PM

On 24-02-11 19:46, Ken Wesson wrote:
> but it's close enough for government work,

hopefully you live in another country than i do....

-- 
Luuk
Reply Luuk 2/24/2011 7:22:56 PM

Ken Wesson <kwesson@gmail.com> writes:

> On Thu, 24 Feb 2011 09:18:06 -0500, Michael Wojcik wrote:
>
>> Ken Wesson wrote:
>>> On Thu, 24 Feb 2011 00:42:41 -0500, Eric Sosman wrote:
>>>> On 2/23/2011 10:23 PM, Ken Wesson wrote:
>>>>> "Record formats" are not relevant here, nor was someone else's
>>>>> concern about compressed formats -- the OP clearly said "a text
>>>>> file", by which is generally understood flat ASCII with CR, LF, or
>>>>> CRLF as line delimiter.
>> 
>> Ah, the warm blanket of provincialism.
>
> Who asked you for your opinions of others here?
>
>>>>      OpenVMS supports many record formats, but the "native" one for
>>>> text files is VAR: A two-byte binary count, the payload characters,
>>>> and if necessary a padding byte to make the total byte count even. ...
>>>>      In short, all I'm asking is that you delete the word "generally"
>>>> because your experience is insufficiently general.
>> 
>> On the IBM i machines (formerly i Series, formerly System i, formerly
>> AS/400, successor to the System/3x), blah blah blah
>
> You're one to talk about provincialism. Who the hell uses these ancient 
> museum pieces any more?

Um, that would be me, or rather my employer's customers.

Indirectly, anyone who has an account with a bank or credit union is
likely using an EBCDIC-based machine.  There are some that don't, but
it's not the way to bet.

-- 
Jim Janney
Reply Jim 2/24/2011 8:12:36 PM

On Fri, 25 Feb 2011, Peter Duniho wrote:

> On 2/24/11 10:42 PM, Lars Enderin wrote:
>> 2011-02-24 15:26, Peter Duniho skrev:
>>> On 2/24/11 10:14 PM, Lars Enderin wrote:
>>>> ASCII character values are limited to the 0-127 range. That's an
>>>> outdated "standard".
>>> 
>>> Used by "obsolete systems".  A key point in my amusement.  :)
>> 
>> I thought so, but Ken seemed to need an explanation.
>
> Yes, and it was a good explanation.  Unfortunately, I don't think he 
> understood the explanation, nor do I think he will understand further 
> clarification.  I think it more likely that the harder anyone tries to 
> explain to him these points, the more dug in his heels will be.
>
> To do otherwise would necessarily require an admission that there's no single 
> "text file" format, and that even if there were, ASCII or any of the 
> single-byte derivatives thereof ain't it.  I don't see any way such an 
> admission would ever be produced.

There is a single text file format: lines of characters in some encoding, 
terminated by an end-of-line sequence which is distinguishable from any 
other characters.

It's merely the case that some current mainframes, and some obscure or 
historical systems, do not store text in text files!

tom

-- 
everything from live chats and the Web, to the COOLEST DISGUSTING
PORNOGRAPHY AND RADICAL MADNESS!!
Reply Tom 2/24/2011 8:49:32 PM

On Thu, 24 Feb 2011 19:42:19 +0100, Ken Wesson wrote:

> On Thu, 24 Feb 2011 09:18:06 -0500, Michael Wojcik wrote:
> 
>> Ken Wesson wrote:
>>> Obsolete systems do not interest me.
>> 
>> Apparently, neither do prominent ones that you don't happen to know
>> about.
> 
> There is nothing at all prominent about those IBM dinosaurs. They may
> have been prominent 30 years ago, but not now.
>
You know, you sound exactly like a character who surfaced in a Y2K 
newsgroup back in 1998/99. He refused to believe that any computers apart 
from PCs were in use at the time.


-- 
martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |
Reply Martin 2/24/2011 9:25:05 PM

Ken Wesson wrote:
> On Thu, 24 Feb 2011 09:18:06 -0500, Michael Wojcik wrote:
> 
>> Ken Wesson wrote:
>>> On Thu, 24 Feb 2011 00:42:41 -0500, Eric Sosman wrote:
>>>> On 2/23/2011 10:23 PM, Ken Wesson wrote:
>>>>> "Record formats" are not relevant here, nor was someone else's
>>>>> concern about compressed formats -- the OP clearly said "a text
>>>>> file", by which is generally understood flat ASCII with CR, LF, or
>>>>> CRLF as line delimiter.
>> Ah, the warm blanket of provincialism.
> 
> Who asked you for your opinions of others here?

No one. I offer them out of sheer generosity. No thanks are necessary.
In the twenty years I've been on Usenet, I've found offering my
opinions on the local idiots to be immensely useful. At least to me.

>> On the IBM i machines (formerly i Series, formerly System i, formerly
>> AS/400, successor to the System/3x), blah blah blah
> 
> You're one to talk about provincialism. Who the hell uses these ancient 
> museum pieces any more?

Thousands of organizations, which is why they still enjoy healthy sales.

>>> Obsolete systems do not interest me.
>> Apparently, neither do prominent ones that you don't happen to know
>> about.
> 
> There is nothing at all prominent about those IBM dinosaurs. They may 
> have been prominent 30 years ago, but not now.

Tell that to the many thousands of organizations that still use them.

And the majority of business transactions still runs on IBM mainframe
and midrange systems, and similar offerings from other companies.

IBM had just shy of $100B in sales last year. A good chunk of that was
from mainframes: mainframe sales were up 68% from 2009, to the best
level in six years. MIPS capacity (mainframe processing capacity owned
by customers) rose 58%, and IBM acquired a couple dozen new mainframe
customers - businesses that bought their first mainframes.[1]

As usual, you don't know what the hell you're talking about, and
clearly can't be bothered to do even a moment of research before
posting something else that demonstrates your ignorance. Not that
you'll learn anything from this exchange, either, I suppose.


[1] http://www.theregister.co.uk/2011/01/18/ibm_q4_2010_numbers/

-- 
Michael Wojcik
Micro Focus
Rhetoric & Writing, Michigan State University
Reply Michael 2/24/2011 10:11:02 PM

On 02/24/2011 09:49 AM, RedGrittyBrick wrote:
> On 24/02/2011 14:00, Ken Wesson wrote:
>> Windows text files are flat ASCII files (with CRLF line ends).
>
> Actually I find that, nowadays, lots of text files on Windows are so-called
> 'ANSI' (mostly CP-1252) or 'Unicode' (usually meaning UTF-16 with BOM).
>
> Even on my ancient XP boxes, Notepad offers only ANSI, Unicode, Unicode
> big-endian and UTF-8. Wordpad offers RTF, Text-Document (turns out to be
> CP-1252), Text-Document DOS format (turns out to be CP-850) and Unicode. No
> ASCII.

Windows hasn't used ASCII in decades.

-- 
Lew
Honi soit qui mal y pense.
Reply Lew 2/25/2011 12:12:44 AM

On 23-02-2011 22:23, Ken Wesson wrote:
> On Wed, 23 Feb 2011 21:21:42 -0500, Arne Vajhøj wrote:
>> On 23-02-2011 10:59, Robin Wenger wrote:
>>> Is it possible to read the last text line from a text file WITHOUT
>>> reading the previous (n-1) lines?
>>
>> In general no.
>>
>> All the RandomAccessFile tricks are based on assumptions about lines
>> being separated by something - they do not work with record formats that
>> contains a line length instead of a delimiter.
>
> "Record formats" are not relevant here,

They are - because the record format determines whether RandomAccessFile
has a chance of working or not.

>                                      nor was someone else's concern
> about compressed formats -- the OP clearly said "a text file", by which
> is generally understood flat ASCII with CR, LF, or CRLF as line delimiter.

That is probably true among non-IT pros.

But this group is for IT pros.

They know that there are other character sets and other
record formats.

Arne

Reply UTF 2/25/2011 1:39:31 AM

On 24-02-2011 00:42, Eric Sosman wrote:
> On 2/23/2011 10:23 PM, Ken Wesson wrote:
>> On Wed, 23 Feb 2011 21:21:42 -0500, Arne Vajhøj wrote:
>>> On 23-02-2011 10:59, Robin Wenger wrote:
>>>> Is it possible to read the last text line from a text file WITHOUT
>>>> reading the previous (n-1) lines?
>>>
>>> In general no.
>>>
>>> All the RandomAccessFile tricks are based on assumptions about lines
>>> being separated by something - they do not work with record formats that
>>> contains a line length instead of a delimiter.
>>
>> "Record formats" are not relevant here, nor was someone else's concern
>> about compressed formats -- the OP clearly said "a text file", by which
>> is generally understood flat ASCII with CR, LF, or CRLF as line
>> delimiter.
>
> OpenVMS supports many record formats, but the "native" one for
> text files is VAR: A two-byte binary count, the payload characters,
> and if necessary a padding byte to make the total byte count even.

Yep.
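To make the VAR layout concrete, here is a hypothetical sketch that walks VAR-style records in a byte array. Each record has a 2-byte count, the payload, and one pad byte when the count is odd. The count is assumed to be little-endian, as on VAX/Alpha; treat that byte order as an assumption. Two points follow from the layout. A payload byte of 0x0A is just data, not a line end. And the last record can only be found by walking forward from the first one.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public final class VarRecords {

    /** Splits a buffer of VAR-style records into their payload strings. */
    static List<String> parse(byte[] data) {
        List<String> records = new ArrayList<>();
        int pos = 0;
        while (pos + 2 <= data.length) {
            // Two-byte record length, assumed little-endian.
            int len = (data[pos] & 0xFF) | ((data[pos + 1] & 0xFF) << 8);
            pos += 2;
            if (pos + len > data.length) {
                throw new IllegalArgumentException("truncated record at offset " + (pos - 2));
            }
            records.add(new String(data, pos, len, StandardCharsets.ISO_8859_1));
            pos += len;
            // An odd-length payload is followed by one pad byte so the total stays even.
            if ((len & 1) == 1) {
                pos++;
            }
        }
        return records;
    }

    /** The last record can only be located by walking every record before it. */
    static String last(byte[] data) {
        List<String> records = parse(data);
        return records.isEmpty() ? null : records.get(records.size() - 1);
    }
}
```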

NOS/VE (it may not be relevant here because I don't think there
exists a Java for NOS/VE) used a 6-byte length + data + 6-byte length.

The trailing 6-byte length made it possible to securely read the
file backwards, which the VMS format does not allow.
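The benefit of a trailing length can be sketched as follows. The exact layout is an assumption for illustration, since the thread gives no encoding details: a 6-byte big-endian length that counts only payload bytes, the payload, and the same 6-byte length again. The class name is hypothetical. With the trailing copy, the last record is found from the end of the file without reading anything before it, and the leading copy can be used as a cross-check.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

public final class TrailerLengthRecords {

    private static final int LEN_BYTES = 6;

    /** Reads a 6-byte big-endian unsigned length at the current file position. */
    private static long readLength(RandomAccessFile raf) throws IOException {
        long value = 0;
        for (int i = 0; i < LEN_BYTES; i++) {
            value = (value << 8) | raf.readUnsignedByte();
        }
        return value;
    }

    /** Returns the payload of the last record, reading only near the end of the file. */
    static byte[] readLastRecord(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long end = raf.length();
            if (end < 2L * LEN_BYTES) {
                throw new IOException("file too short for a record");
            }
            // The trailing length says how far back the payload starts.
            raf.seek(end - LEN_BYTES);
            long len = readLength(raf);
            long payloadStart = end - LEN_BYTES - len;
            if (payloadStart < LEN_BYTES) {
                throw new IOException("corrupt trailer length " + len);
            }
            // Cross-check against the leading length; this is what makes the backward read safe.
            raf.seek(payloadStart - LEN_BYTES);
            if (readLength(raf) != len) {
                throw new IOException("leading and trailing lengths disagree");
            }
            // The file pointer is now at payloadStart.
            byte[] payload = new byte[(int) len];
            raf.readFully(payload);
            return payload;
        }
    }
}
```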

Arne
Reply UTF 2/25/2011 1:41:50 AM

On 24-02-2011 08:06, Ken Wesson wrote:
> On Thu, 24 Feb 2011 00:42:41 -0500, Eric Sosman wrote:
>
>> On 2/23/2011 10:23 PM, Ken Wesson wrote:
>>> On Wed, 23 Feb 2011 21:21:42 -0500, Arne Vajhøj wrote:
>>>
>>>> On 23-02-2011 10:59, Robin Wenger wrote:
>>>>> Is it possible to read the last text line from a text file WITHOUT
>>>>> reading the previous (n-1) lines?
>>>>
>>>> In general no.
>>>>
>>>> All the RandomAccessFile tricks are based on assumptions about lines
>>>> being separated by something - they do not work with record formats
>>>> that contains a line length instead of a delimiter.
>>>
>>> "Record formats" are not relevant here, nor was someone else's concern
>>> about compressed formats -- the OP clearly said "a text file", by which
>>> is generally understood flat ASCII with CR, LF, or CRLF as line
>>> delimiter.
>>
>>       OpenVMS supports many record formats, but the "native" one for
>> text files is VAR: A two-byte binary count, the payload characters, and
>> if necessary a padding byte to make the total byte count even.
>>
>>       The "next most native" format is VFC, which is sort of like VAR
>> except that the first N (fixed) bytes of the payload are metadata (line
>> numbers, carriage control, ...) instead of line content.
>>
>>       Then come the easy formats: STREAM, STREAM-LF, STREAM-CR, and
>> FIXED.  Oh, yes, and UNDEF; let's not forget UNDEF (although, to be
>> honest, UNDEF is more commonly used for "binary" than "text" files).
>>
>>       (Strangest text file format I ever ran into used line-*bracketing*
>> characters: a CR before and an LF after.  The rationale for this format
>> caused me to shake my head and sigh: It was said that as you printed
>> such a file on a typewriter-like console, possibly with long pauses
>> between lines for progress messages and the like, then the LF at end-
>> of-line would move the paper so the print head wouldn't interfere with
>> reading it.  As I said, shake the head.)
>>
>>       In short, all I'm asking is that you delete the word "generally"
>> because your experience is insufficiently general.
>
> Obsolete systems do not interest me.

Whether a solution works in general or not depends on whether
it is guaranteed to work on all platforms or not.

Using RandomAccessFile to search for CR and LF does not.

Whether it works on platforms that interest you is completely
irrelevant.

>                                Since those days, the world has
> standardized on ASCII flat files for text files.

Not really.

Windows uses CP-1252, UTF-8 and UTF-16
Unix/Linux/VMS use ISO-8859-1 and UTF-8
IBM mainframe uses EBCDIC

There are really very few systems today that use just ASCII.

Arne
Reply UTF 2/25/2011 1:45:36 AM

On 24-02-2011 09:00, Ken Wesson wrote:
> On Thu, 24 Feb 2011 21:23:34 +0800, Peter Duniho wrote:
>
>> On 2/24/11 9:06 PM, Ken Wesson wrote:
>>> [...]
>>> Obsolete systems do not interest me.
>>
>> then…
>>
>>> Since those days, the world has standardized on ASCII flat files for
>>> text files.
>>
>> LOL!
>
> Windows text files are flat ASCII files (with CRLF line ends).

No.

They are CP-1252, UTF-8 or UTF-16.
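The encoding matters directly for the backward-scan trick. In UTF-16 the byte 0x0A can occur inside a character that is not a line feed, so a byte-level search for LF can stop in the wrong place. In UTF-8 it cannot, because multi-byte sequences never contain bytes below 0x80. A small demonstration (the class name is hypothetical):

```java
import java.nio.charset.StandardCharsets;

public final class Utf16NewlineBytes {

    /** Offset of the last 0x0A byte, or -1: what a naive byte-level LF scan finds. */
    static int lastLfByte(byte[] data) {
        for (int i = data.length - 1; i >= 0; i--) {
            if (data[i] == 0x0A) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // U+010A (C with dot above) encodes as the bytes 0A 01 in UTF-16LE.
        String text = "line one\nline \u010Atwo";
        byte[] utf16 = text.getBytes(StandardCharsets.UTF_16LE);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        int realLf = text.indexOf('\n'); // char index of the only real line feed
        System.out.println("UTF-16LE: last 0x0A byte at " + lastLfByte(utf16)
                + ", real LF at byte " + realLf * 2);
        System.out.println("UTF-8:    last 0x0A byte at " + lastLfByte(utf8)
                + ", real LF at byte " + realLf);
    }
}
```

In UTF-16LE the scan reports offset 28 (inside U+010A) instead of the real LF at 16. A correct UTF-16 scanner has to work in 2-byte code units, which in turn requires knowing the byte order.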

>                                                             Mac text
> files are flat ASCII files (with CR line ends). Unix text files are flat
> ASCII files (with LF line ends).

No.

They are ISO-8859-1 or UTF-8.

>                                 And that exhausts 99.99% of the
> operating system market share right there, if not more,

No.

z/OS, i, OpenVMS and MPE have a lot more market share than 0.01%.

> I can't remember the last time I had to interoperate with any machine
> that had anything other than standard ASCII as the native format for text
> files. It's gotta be decades.

Possible that you only work with 20+ year old Unix and OpenVMS
systems with 7 bit VT100 access.

But that is not very common.

Arne

Reply UTF 2/25/2011 1:48:18 AM

On 24-02-2011 09:26, Peter Duniho wrote:
> On 2/24/11 10:14 PM, Lars Enderin wrote:
>> ASCII character values are limited to the 0-127 range. That's an
>> outdated "standard".
>
> Used by "obsolete systems". A key point in my amusement. :)

I am pretty sure that Ken completely missed your joke.

Arne

Reply UTF 2/25/2011 1:48:50 AM

On 24-02-2011 15:49, Tom Anderson wrote:
> On Fri, 25 Feb 2011, Peter Duniho wrote:
>
>> On 2/24/11 10:42 PM, Lars Enderin wrote:
>>> 2011-02-24 15:26, Peter Duniho skrev:
>>>> On 2/24/11 10:14 PM, Lars Enderin wrote:
>>>>> ASCII character values are limited to the 0-127 range. That's an
>>>>> outdated "standard".
>>>>
>>>> Used by "obsolete systems". A key point in my amusement. :)
>>>
>>> I thought so, but Ken seemed to need an explanation.
>>
>> Yes, and it was a good explanation. Unfortunately, I don't think he
>> understood the explanation, nor do I think he will understand further
>> clarification. I think it more likely that the harder anyone tries to
>> explain to him these points, the more dug in his heels will be.
>>
>> To do otherwise would necessarily require an admission that there's no
>> single "text file" format, and that even if there were, ASCII or any
>> of the single-byte derivatives thereof ain't it. I don't see any way
>> such an admission would ever be produced.
>
> There is a single text file format: lines of characters in some
> encoding, terminated by an end-of-line sequence which is distinguishable
> from any other characters.
>
> It's merely the case that some current mainframes, and some obscure or
> historical systems, do not store text in text files!

No.

There are also count prefix (and sometimes suffix) formats.

They have the advantage of being able to actually have
all possible values in lines.

And the disadvantage that various hacks assuming all records
use delimiters do not work.

Arne
Reply arne6 (9487) 2/25/2011 1:50:55 AM

On 24-02-2011 13:46, Ken Wesson wrote:
> On Thu, 24 Feb 2011 15:14:44 +0100, Lars Enderin wrote:
>
>> 2011-02-24 15:00, Ken Wesson skrev:
>>> I can't remember the last time I had to interoperate with any machine
>>> that had anything other than standard ASCII as the native format for
>>> text files. It's gotta be decades.
>>
>> ASCII character values are limited to the 0-127 range. That's an
>> outdated "standard".
>
> Well, these days we use the 8th bit for accented characters instead of
> just wasting it.

Then it is not ASCII.
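The distinction can be checked mechanically. ASCII only defines byte values 0 to 127, so any byte with the high bit set means the file is not ASCII, whatever else it may be. A minimal "7-bit clean" test (class name hypothetical):

```java
import java.nio.charset.StandardCharsets;

public final class AsciiCheck {

    /** True if every byte is in the ASCII range 0..127 (high bit clear). */
    static boolean isSevenBitClean(byte[] data) {
        for (byte b : data) {
            if ((b & 0x80) != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Plain English text with CRLF: every byte is below 0x80.
        System.out.println(isSevenBitClean("plain text\r\n".getBytes(StandardCharsets.ISO_8859_1)));
        // An accented letter stored in the 8th bit (0xE9 in ISO-8859-1): not ASCII.
        System.out.println(isSevenBitClean("caf\u00E9".getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

This prints `true` and then `false`.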

>           Technically it's not your granddaddy's ASCII with that
> in use, but it's close enough for government work, and certainly close
> enough not to mess with using tests for CR/LF to detect line boundaries.

The character set and the record format are independent of each other.

Arne


Reply arne6 (9487) 2/25/2011 1:52:37 AM

On 24-02-2011 09:40, Lars Enderin wrote:
> 2011-02-24 15:19, Jussi Piitulainen skrev:
>> Ken Wesson writes:
>>
>>> On Thu, 24 Feb 2011 21:23:34 +0800, Peter Duniho wrote:
>>>
>>>> On 2/24/11 9:06 PM, Ken Wesson wrote:
>>>>> [...]
>>>>> Obsolete systems do not interest me.
>>>>
>>>> then…
>>>>
>>>>> Since those days, the world has standardized on ASCII flat files
>>>>> for text files.
>>>>
>>>> LOL!
>>>
>>> Windows text files are flat ASCII files (with CRLF line ends). Mac
>>> text files are flat ASCII files (with CR line ends). Unix text files
>>> are flat ASCII files (with LF line ends). And that exhausts 99.99%
>>> of the operating system market share right there, if not more, not
>>> counting smartphones which are all too modern to be using weird
>>> legacy formats for text files.
>>>
>>> I can't remember the last time I had to interoperate with any
>>> machine that had anything other than standard ASCII as the native
>>> format for text files. It's gotta be decades.
>>
>> I remember when we used a seven-bit character code to write my native
>> language. We could toggle the way we viewed the character codes where
>> we had put those characters that were not in ASCII. It was either
>> brackets and braces or those letters, but never both.
>>
>> V{nkyr{-{{kk|si{. It's not a happy memory.
>
> I have the same experience. C code wasn't very readable with "Swedish
> ASCII". At least Finnish doesn't use "å", except when quoting Swedish words.

Good old ISO 646 NRC.

Horrible by today's standards.

But back then it was what we had.
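Jussi's example can be decoded with a small lookup table. The table covers only the six positions that matter here. It assumes the Swedish/Finnish ISO 646 variant, in which the ASCII positions of [ \ ] { | } hold Ä Ö Å ä ö å. Other national variants reassign these positions differently. This is also why C braces and brackets became unreadable letters. The class name is hypothetical.

```java
import java.util.Map;

public final class Iso646Fi {

    // Assumed Swedish/Finnish ISO 646 reassignments of the ASCII bracket positions.
    private static final Map<Character, Character> NATIONAL = Map.of(
            '[', '\u00C4', '\\', '\u00D6', ']', '\u00C5',
            '{', '\u00E4', '|', '\u00F6', '}', '\u00E5');

    /** Reinterprets 7-bit text as the national variant instead of US ASCII. */
    static String decode(String sevenBit) {
        StringBuilder sb = new StringBuilder(sevenBit.length());
        for (char c : sevenBit.toCharArray()) {
            char mapped = NATIONAL.getOrDefault(c, c);
            sb.append(mapped);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("V{nkyr{-{{kk|si{"));
    }
}
```

Under that assumption, "V{nkyr{-{{kk|si{" reads as "Vänkyrä-ääkkösiä".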

Arne
Reply arne6 (9487) 2/25/2011 1:54:31 AM

On 24-02-2011 13:44, Ken Wesson wrote:
> On Thu, 24 Feb 2011 16:19:09 +0200, Jussi Piitulainen wrote:
>> I remember when we used a seven-bit character code to write my native
>> language. etc
>
> That's why we now actually use that 8th bit for something useful, if need
> be.

Well - you are the one that has been claiming that everybody is using
a 7 bit standard (ASCII) today.

Arne
Reply UTF 2/25/2011 1:55:01 AM

On 24-02-2011 13:43, Ken Wesson wrote:
> On Thu, 24 Feb 2011 14:49:22 +0000, RedGrittyBrick wrote:
>> On 24/02/2011 14:00, Ken Wesson wrote:
>>> Windows text files are flat ASCII files (with CRLF line ends).
>>
>> Actually I find that, nowadays, lots of text files on Windows are
>> so-called 'ANSI' (mostly CP-1252)
>
> Same difference.

Completely different char set.

Arne
Reply UTF 2/25/2011 1:55:44 AM

On 24-02-2011 19:12, Lew wrote:
> On 02/24/2011 09:49 AM, RedGrittyBrick wrote:
>> On 24/02/2011 14:00, Ken Wesson wrote:
>>> Windows text files are flat ASCII files (with CRLF line ends).
>>
>> Actually I find that, nowadays, lots of text files on Windows are
>> so-called
>> 'ANSI' (mostly CP-1252) or 'Unicode' (usually meaning UTF-16 with BOM).
>>
>> Even on my ancient XP boxes, Notepad offers only ANSI, Unicode, Unicode
>> big-endian and UTF-8. Wordpad offers RTF, Text-Document (turns out to be
>> CP-1252), Text-Document DOS format (turns out to be CP-850) and
>> Unicode. No
>> ASCII.
>
> Windows hasn't used ASCII in decades.

I don't think it ever has.

DOS used CP-437, CP-850 etc..

32/64 bit Windows uses CP-1252 (which is practically the
same as ISO-8859-1) and some UTF-16.

.NET added UTF-8.

I don't remember 16 bit Windows, but I am pretty sure
that it did not use ASCII.

Arne

PS: CP-850 and CP-1252 are for western countries - other
     countries use other char sets.
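To illustrate "practically the same as ISO-8859-1": the two charsets agree on bytes 0x00 to 0x7F and 0xA0 to 0xFF. They differ in 0x80 to 0x9F, where windows-1252 has printable characters such as the euro sign and ISO-8859-1 has C1 control codes. This sketch assumes the JRE provides the `windows-1252` charset, which is not among the charsets every Java implementation is required to support. The class name is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public final class CodePageDiff {

    // Assumption: this JRE ships the windows-1252 charset.
    static final Charset CP1252 = Charset.forName("windows-1252");

    /** Counts byte values in [from, to) that decode differently in CP-1252 and ISO-8859-1. */
    static int countDifferences(int from, int to) {
        int diff = 0;
        for (int i = from; i < to; i++) {
            byte[] one = {(byte) i};
            String a = new String(one, CP1252);
            String b = new String(one, StandardCharsets.ISO_8859_1);
            if (!a.equals(b)) {
                diff++;
            }
        }
        return diff;
    }

    public static void main(String[] args) {
        byte[] b80 = {(byte) 0x80};
        System.out.println("0x80 in CP-1252: " + new String(b80, CP1252));
        System.out.println("0x80 in ISO-8859-1: U+"
                + Integer.toHexString(new String(b80, StandardCharsets.ISO_8859_1).charAt(0)));
        System.out.println("differences in 0x00-0x7F: " + countDifferences(0x00, 0x80));
        System.out.println("differences in 0x80-0x9F: " + countDifferences(0x80, 0xA0));
        System.out.println("differences in 0xA0-0xFF: " + countDifferences(0xA0, 0x100));
    }
}
```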

Reply UTF 2/25/2011 1:58:24 AM

On 24-02-2011 09:18, Michael Wojcik wrote:
> Ken Wesson wrote:
>> On Thu, 24 Feb 2011 00:42:41 -0500, Eric Sosman wrote:
>>> On 2/23/2011 10:23 PM, Ken Wesson wrote:
>>>> "Record formats" are not relevant here, nor was someone else's concern
>>>> about compressed formats -- the OP clearly said "a text file", by which
>>>> is generally understood flat ASCII with CR, LF, or CRLF as line
>>>> delimiter.
>
> Ah, the warm blanket of provincialism.

Yep.

>>>       OpenVMS supports many record formats, but the "native" one for
>>> text files is VAR: A two-byte binary count, the payload characters, and
>>> if necessary a padding byte to make the total byte count even.
>>> ...
>>>       In short, all I'm asking is that you delete the word "generally"
>>> because your experience is insufficiently general.
>
> On the IBM i machines (formerly i Series, formerly System i, formerly
> AS/400, successor to the System/3x), using the default filesystem, a
> text "file" is actually a series of records in a "member" of a
> "physical file". The i operating system hides implementation details,
> but access to the contents of the "file" is record-oriented, not
> byte-oriented.

And it is a pretty good guess that the RandomAccessFile searching
for CR and LF will fail on i also then.

> In the alternate Hierarchical File System supported by the i machines
> for POSIX compatibility, text files are byte-oriented, but usually
> EBCDIC, not ASCII.
>
> On IBM and other EBCDIC mainframe systems, there are a variety of
> formats for text files, but flat byte-oriented ASCII isn't one of
> them, unless you're running Linux.

Linux will be either ISO-8859-1 or UTF-8 not ASCII.
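The EBCDIC point matters directly for the RandomAccessFile trick, because in EBCDIC the line-end characters are not byte 0x0A at all. This sketch uses Java's `IBM037` charset (US/Canada EBCDIC). It assumes a JRE that ships the extended charsets. It shows that encoding a line break produces no 0x0A byte for a naive scanner to find. The class name is hypothetical.

```java
import java.nio.charset.Charset;

public final class EbcdicLineEnds {

    // Assumption: this JRE ships the IBM037 (EBCDIC) charset.
    static final Charset EBCDIC = Charset.forName("IBM037");

    /** True if the data contains byte 0x0A, the only byte a naive LF scan looks for. */
    static boolean containsLfByte(byte[] data) {
        for (byte b : data) {
            if (b == 0x0A) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] encoded = "FIRST\nLAST".getBytes(EBCDIC);
        // Letters live in 0xC1..0xE9; EBCDIC has separate LF (0x25) and NL (0x15)
        // code points, and which one '\n' maps to depends on the charset table.
        System.out.printf("'A' -> 0x%02X%n", "A".getBytes(EBCDIC)[0] & 0xFF);
        System.out.printf("line break -> 0x%02X%n", encoded[5] & 0xFF);
        System.out.println("contains 0x0A: " + containsLfByte(encoded));
    }
}
```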

Arne
Reply UTF 2/25/2011 2:00:20 AM

On 24-02-2011 13:42, Ken Wesson wrote:
> On Thu, 24 Feb 2011 09:18:06 -0500, Michael Wojcik wrote:
>> Ken Wesson wrote:
>>> On Thu, 24 Feb 2011 00:42:41 -0500, Eric Sosman wrote:
>>>> On 2/23/2011 10:23 PM, Ken Wesson wrote:
>>>>> "Record formats" are not relevant here, nor was someone else's
>>>>> concern about compressed formats -- the OP clearly said "a text
>>>>> file", by which is generally understood flat ASCII with CR, LF, or
>>>>> CRLF as line delimiter.
>>
>> Ah, the warm blanket of provincialism.
>
> Who asked you for your opinions of others here?

Anyone posting to usenet gives the entire world the
opportunity to comment on them.

The smart people try to post something smart.

>>>>       OpenVMS supports many record formats, but the "native" one for
>>>> text files is VAR: A two-byte binary count, the payload characters,
>>>> and if necessary a padding byte to make the total byte count even. ...
>>>>       In short, all I'm asking is that you delete the word "generally"
>>>> because your experience is insufficiently general.
>>
>> On the IBM i machines (formerly i Series, formerly System i, formerly
>> AS/400, successor to the System/3x), blah blah blah
>
> You're one to talk about provincialism. Who the hell uses these ancient
> museum pieces any more?

Lots of places.

Retail sector, public sector, financial sector

>>> Obsolete systems do not interest me.
>>
>> Apparently, neither do prominent ones that you don't happen to know
>> about.
>
> There is nothing at all prominent about those IBM dinosaurs. They may
> have been prominent 30 years ago, but not now.

Both z/OS and i are widely used today.

>
>>> Since those days, the world has
>>> standardized on ASCII flat files for text files.
>>
>> Only for sufficiently small values of "the world".
>
> Fine, then -- corporate America and home computers in America then.

OK - neither z/OS nor i is common on home computers.

But they are very common in corporate America.

If all z/OS systems disappeared overnight then everything
would break down, because so many critical systems are
running on them.

Arne
Reply UTF 2/25/2011 2:09:44 AM

On 24-02-2011 17:11, Michael Wojcik wrote:
> Ken Wesson wrote:
>> There is nothing at all prominent about those IBM dinosaurs. They may
>> have been prominent 30 years ago, but not now.
>
> Tell that to the many thousands of organizations that still use them.
>
> And the majority of business transactions still runs on IBM mainframe
> and midrange systems, and similar offerings from other companies.
>
> IBM had just shy of $100B in sales last year. A good chunk of that was
> from mainframes: mainframe sales were up 68% from 2009, to the best
> level in six years.

The biggest chunk of IBM's revenue is services.

But they still sell a lot of big iron.

They don't publicize numbers at the OS level, but I would guess that
at least 10 B$ was mainframe HW & SW.

Arne
Reply arne6 (9487) 2/25/2011 2:24:30 AM

On 2/24/2011 2:10 PM, Ken Wesson wrote:
> On Fri, 25 Feb 2011 02:52:56 +0800, Peter Duniho wrote:
>
>> On 2/24/11 10:42 PM, Lars Enderin wrote:
>>> 2011-02-24 15:26, Peter Duniho skrev:
>>>> On 2/24/11 10:14 PM, Lars Enderin wrote:
>>>>> ASCII character values are limited to the 0-127 range. That's an
>>>>> outdated "standard".
>>>>
>>>> Used by "obsolete systems".  A key point in my amusement.  :)
>>>
>>> I thought so, but Ken seemed to need an explanation.
>>
>> Yes, and it was a good explanation.  Unfortunately, I don't think he
>> understood the explanation, nor do I think he will understand further
>> clarification.  I think it more likely that the harder anyone tries to
>> explain to him these points, the more dug in his heels will be.
>
> You know, that's what you can expect when you are unpleasant, nasty, and
> rude about things -- other people display a curious unwillingness to
> listen to anything you have to say. An old adage comes to mind --
> something about honey and vinegar?
>
> (It doesn't help when your "counterexamples" are obscure formats used on
> dinosaurian machines of yesteryear; the fact is that text files with CR/
> LF line delimiters are standard on a set of operating systems that have
> the overwhelming majority of the market share for such these days.)

     Interesting.  On the one hand he retreats from his earlier claim
that "ASCII" encoding is universal, and on the other he advances the
notion that CR/LF is The One True Delimiter.  So, which hand advances
and which retreats?  Is he spinning clockwise or counterclockwise?
Well, maybe his rotation will make him a sort of human eggbeater,
better at mixing the vinegar with the honey.  (Ugh.)

-- 
Eric Sosman
esosman@ieee-dot-org.invalid
Reply Eric 2/25/2011 2:32:49 AM

On 2/24/2011 2:14 PM, Daniele Futtorovic wrote:
> On 24/02/2011 19:46, Ken Wesson allegedly wrote:
>> it's not (...) ASCII (...).
>
> Spot on.

     I think it's amusing that he says "All the world's ASCII," and
posts his assertion in a message whose Content-Type says otherwise.

-- 
Eric Sosman
esosman@ieee-dot-org.invalid
Reply Eric 2/25/2011 2:42:10 AM

On Thu, 24 Feb 2011 21:32:49 -0500, Eric Sosman wrote:

> On 2/24/2011 2:10 PM, Ken Wesson wrote:
>> On Fri, 25 Feb 2011 02:52:56 +0800, Peter Duniho wrote:
>>
>>> On 2/24/11 10:42 PM, Lars Enderin wrote:
>>>> 2011-02-24 15:26, Peter Duniho skrev:
>>>>> On 2/24/11 10:14 PM, Lars Enderin wrote:
>>>>>> ASCII character values are limited to the 0-127 range. That's an
>>>>>> outdated "standard".
>>>>>
>>>>> Used by "obsolete systems".  A key point in my amusement.  :)
>>>>
>>>> I thought so, but Ken seemed to need an explanation.
>>>
>>> Yes, and it was a good explanation.  Unfortunately, I don't think he
>>> understood the explanation, nor do I think he will understand further
>>> clarification.  I think it more likely that the harder anyone tries to
>>> explain to him these points, the more dug in his heels will be.
>>
>> You know, that's what you can expect when you are unpleasant, nasty,
>> and rude about things -- other people display a curious unwillingness
>> to listen to anything you have to say. An old adage comes to mind --
>> something about honey and vinegar?
>>
>> (It doesn't help when your "counterexamples" are obscure formats used
>> on dinosaurian machines of yesteryear; the fact is that text files with
>> CR/LF line delimiters are standard on a set of operating systems that
>> have the overwhelming majority of the market share for such these
>> days.)
> 
>      Interesting.  On the one hand he retreats from

Didn't anyone ever tell you that it was rude to discuss someone in the 
third person right in front of him like that?

If you have something to say about me you can address it directly to me, 
Sosman.

> human eggbeater

Your personal opinions of others are not the topic of this newsgroup. Do 
you have anything Java-related to say?
Reply Ken 2/25/2011 4:35:13 AM

On Thu, 24 Feb 2011 20:50:55 -0500, Arne Vajhøj wrote:

> On 24-02-2011 15:49, Tom Anderson wrote:
>> On Fri, 25 Feb 2011, Peter Duniho wrote:
>>
>>> On 2/24/11 10:42 PM, Lars Enderin wrote:
>>>> 2011-02-24 15:26, Peter Duniho skrev:
>>>>> On 2/24/11 10:14 PM, Lars Enderin wrote:
>>>>>> ASCII character values are limited to the 0-127 range. That's an
>>>>>> outdated "standard".
>>>>>
>>>>> Used by "obsolete systems". A key point in my amusement. :)
>>>>
>>>> I thought so, but Ken seemed to need an explanation.
>>>
>>> Yes, and it was a good explanation. Unfortunately, I don't think he
>>> understood the explanation, nor do I think he will understand further
>>> clarification. I think it more likely that the harder anyone tries to
>>> explain to him these points, the more dug in his heels will be.
>>>
>>> To do otherwise would necessarily require an admission that there's no
>>> single "text file" format, and that even if there were, ASCII or any
>>> of the single-byte derivatives thereof ain't it. I don't see any way
>>> such an admission would ever be produced.
>>
>> There is a single text file format: lines of characters in some
>> encoding, terminated by an end-of-line sequence which is
>> distinguishable from any other characters.
>>
>> It's merely the case that some current mainframes, and some obscure or
>> historical systems, do not store text in text files!
> 
> No.
> 
> There are also count prefix (and sometimes suffix) formats.

Those aren't text files. Text is, notionally, a string of characters, 
including perhaps spaces and line-end characters. A text file is 
therefore a file whose content is a string of characters, including 
perhaps spaces and line-end characters. Such a thing is, logically, the 
only native way to represent raw text. Anything more structured is 
obviously not a plain text file. It may be a text-containing file of some 
kind but it is not a text file.

> They have the advantage of being able to actually have all possible
> values in lines.

That's nonsense. The only character a normal text file cannot have in 
lines is a line break, and in actual fact you cannot have a line break in 
the middle of a line *by definition*. Wherever there is a line break one 
line ENDS and another one BEGINS, *by definition*. If that weren't the 
case then it wouldn't be a line break!

So there is no "advantage" here. What you are actually describing is a 
"list-of-strings" file, not a text file (which is representable as a 
single string). Your "list-of-strings" file is in fact NOT representable 
as a single string, without resort to some escaping mechanism or the use 
of a data structure such as ArrayList<String>. For instance, suppose you 
have a file of two text records, one of which is

foo
bar

and the other of which is

baz

where the first one has a literal linebreak after the second o. To 
represent this whole file in a string requires either a string 
serialization of an in-memory record format (e.g., ArrayList<String> -> 
ObjectOutputStream -> ByteBuffer -> BinHex converter -> String) or a 
string with delimiters. Say you use newlines as the delimiters. Then you 
need to escape the literal newline after that o, say,

foo\nbar
baz

is the string. And you also need to escape the escape, so, here you'd 
have to escape backslashes. Any other delimiter you choose instead will 
have the same effect, so long as you are storing this into a String. (You 
can use an ArrayList<Object> with Characters and a delimiter object that 
is not equal to any Character, but then once again this is not a String.)
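The escaping scheme described above can be sketched like this. Backslashes are escaped first, an embedded newline becomes `\n`, and a real newline is used as the record delimiter. The class and method names are hypothetical. The point is only that a list of records containing newlines needs an escape layer before it can be stored as one delimited String.

```java
import java.util.ArrayList;
import java.util.List;

public final class RecordEscaping {

    /** Joins records into one newline-delimited string, escaping backslashes and embedded newlines. */
    static String join(List<String> records) {
        StringBuilder sb = new StringBuilder();
        for (int r = 0; r < records.size(); r++) {
            if (r > 0) {
                sb.append('\n'); // record delimiter
            }
            for (char c : records.get(r).toCharArray()) {
                if (c == '\\') {
                    sb.append("\\\\"); // escape the escape character
                } else if (c == '\n') {
                    sb.append("\\n"); // newline that belongs inside a record
                } else {
                    sb.append(c);
                }
            }
        }
        return sb.toString();
    }

    /** Inverse of join: splits on real newlines and undoes the escapes. */
    static List<String> split(String joined) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < joined.length(); i++) {
            char c = joined.charAt(i);
            if (c == '\n') {
                records.add(current.toString());
                current.setLength(0);
            } else if (c == '\\' && i + 1 < joined.length()) {
                char next = joined.charAt(++i);
                current.append(next == 'n' ? '\n' : next);
            } else {
                current.append(c);
            }
        }
        records.add(current.toString());
        return records;
    }
}
```

One limitation of this sketch: an empty list and a list holding one empty record both join to the empty string, and `split("")` returns the latter.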

Face it: those record-oriented file formats are not text files. They have 
additional structure that cannot be represented natively in a String, 
therefore represent more than just a String, such as a collection of 
Strings, and therefore are not text files but something else -- archives 
of multiple text files bundled into single files.

The main use for such a thing over plain ordinary text files that I can 
think of is storing a mailbox without resorting to hacks that behave 
oddly when lines in the bodies start with the word "from". And these days 
filesystems work fine with large directories full of tiny files, so 
there's less need for that sort of thing than there once was.

> And the disadvantage that various hacks assuming all records use
> delimiters do not work.

Nobody is assuming records use delimiters. They are assuming text files 
are text files. The lines in text files use delimiters as an inherent 
property. If you have a text in a String, seeking backward from the end 
until a newline character (or the beginning of the String, whichever you 
hit first) will reliably find the start of the last line in the String. 
The same is true of any disk file format that faithfully represents the 
String as a flat string of text, and in particular of the formats 
commonly used to store, e.g., C source files.
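The in-memory version of that claim is short in Java. This sketch assumes LF or CRLF line ends and ignores one trailing line terminator; the class name is hypothetical. It is reliable on a String because decoding has already happened, and decoding is precisely the step the on-disk debate is about.

```java
public final class LastLineOfString {

    /** Returns the text after the last LF, ignoring a single trailing LF or CRLF. */
    static String lastLine(String text) {
        int end = text.length();
        if (end > 0 && text.charAt(end - 1) == '\n') {
            end--;
            if (end > 0 && text.charAt(end - 1) == '\r') {
                end--;
            }
        }
        // lastIndexOf returns -1 when there is no earlier newline, so start becomes 0.
        int start = text.lastIndexOf('\n', end - 1) + 1;
        return text.substring(start, end);
    }
}
```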
Reply Ken 2/25/2011 4:48:38 AM

On Thu, 24 Feb 2011 20:48:50 -0500, Arne Vajhøj wrote:

> I am pretty sure that Ken completely missed your joke.

What did I just tell Sosman about talking about people as if they aren't 
there?

And what does this have to do with Java? This is 
comp.lang.java.programmer, Arne, not rec.humor.did.ken.get.the.joke.

Reply Ken 2/25/2011 4:49:49 AM

On Thu, 24 Feb 2011 20:14:49 +0100, Daniele Futtorovic wrote:

> On 24/02/2011 19:46, Ken Wesson allegedly wrote:
>> it's not (...) ASCII (...).

Alleged by whom? That distorted quote is most certainly not what I wrote.
Reply Ken 2/25/2011 4:50:26 AM

On Thu, 24 Feb 2011 21:42:10 -0500, Eric Sosman wrote:

>      I think it's amusing that he says "All the world's ASCII,"

Who says "all the world's ASCII", Sosman? I can't recall anybody doing so 
in this group recently.

It is true that almost all the world seems to use encodings that contain 
ASCII as a subset. That is not quite the same thing.
Reply Ken 2/25/2011 4:51:47 AM

On Thu, 24 Feb 2011 20:22:56 +0100, Luuk wrote:

> On 24-02-11 19:46, Ken Wesson wrote:
>> but it's close enough for government work,
> 
> hopefully you live in another country than i do....

Your personal opinions of others are not the topic of this newsgroup. Do 
you have anything Java-related to say?
Reply Ken 2/25/2011 4:52:06 AM

On Thu, 24 Feb 2011 20:52:37 -0500, Arne Vajhøj wrote:

> On 24-02-2011 13:46, Ken Wesson wrote:
>> On Thu, 24 Feb 2011 15:14:44 +0100, Lars Enderin wrote:
>>
>>> 2011-02-24 15:00, Ken Wesson skrev:
>>>> I can't remember the last time I had to interoperate with any machine
>>>> that had anything other than standard ASCII as the native format for
>>>> text files. It's gotta be decades.
>>>
>>> ASCII character values are limited to the 0-127 range. That's an
>>> outdated "standard".
>>
>> Well, these days we use the 8th bit for accented characters instead of
>> just wasting it.
> 
> Then it is not ASCII.

It contains ASCII as a subset.

So it is ASCII. And more.

>>           Technically it's not your granddaddy's ASCII with that
>> in use, but it's close enough for government work, and certainly close
>> enough not to mess with using tests for CR/LF to detect line
>> boundaries.
> 
> The character set and the record format are independent of each other.

Record formats are not relevant here, since text files do not have record 
formats; they are raw sequences in some character set more or less by 
definition. Anything with additional structure over and above that is 
something other than a text file. Generically we call such things "binary 
files" though commonly binary files do *contain* text. But all contain 
additional structure that cannot be represented in, say, a 
java.lang.String without resort to some form of escaping or encoding. And 
that makes them not pure text, but text-and-some-other-stuff or some-
other-stuff-that-happens-to-contain-text.
Reply kwesson (107) 2/25/2011 4:54:30 AM

On Thu, 24 Feb 2011 20:55:01 -0500, Arne Vajhøj wrote:

> On 24-02-2011 13:44, Ken Wesson wrote:
>> On Thu, 24 Feb 2011 16:19:09 +0200, Jussi Piitulainen wrote:
>>> I remember when we used a seven-bit character code to write my native
>>> language. etc
>>
>> That's why we now actually use that 8th bit for something useful, if
>> need be.
> 
> Well - you are the one that has been claiming that everybody is using a
> 7 bit standard (ASCII) today.

Technically they are, since the various more recent standards they use 
contain ASCII as a subset and generally reduce to ASCII if you strip the 
high bit off (code pages) or the high byte and highest remaining bit (16-
bit encodings). So they are using ASCII and sometimes some additional 
stuff that encloses and contains ASCII.
Reply kwesson (107) 2/25/2011 4:56:21 AM

On Thu, 24 Feb 2011 20:55:44 -0500, Arne Vajhøj wrote:

> On 24-02-2011 13:43, Ken Wesson wrote:
>> On Thu, 24 Feb 2011 14:49:22 +0000, RedGrittyBrick wrote:
>>> On 24/02/2011 14:00, Ken Wesson wrote:
>>>> Windows text files are flat ASCII files (with CRLF line ends).
>>>
>>> Actually I find that, nowadays, lots of text files on Windows are
>>> so-called 'ANSI' (mostly CP-1252)
>>
>> Same difference.
> 
> Completely different char set.

Funny that something so "completely different" intersects with ASCII in 
the entirety of ASCII's range (0-127). It just specifies what 128-255 
mean instead of leaving those values undefined. Unicode specifies what 
128-65535 mean and still intersects with ASCII on 0-127.
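The overlap on 0 to 127 can be checked directly. This sketch decodes each byte 0x00 to 0x7F with US-ASCII, ISO-8859-1 and UTF-8, all of which Java guarantees through `StandardCharsets`, and compares the results. It says nothing about bytes 128 to 255, where these encodings differ. UTF-16 is left out because it does not use single bytes for these characters. The class name is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;

public final class AsciiSubsetCheck {

    /** True if every byte 0..127 decodes to the same text in all given charsets as in US-ASCII. */
    static boolean agreeOnAsciiRange(List<Charset> charsets) {
        for (int i = 0; i < 128; i++) {
            byte[] one = {(byte) i};
            String expected = new String(one, StandardCharsets.US_ASCII);
            for (Charset cs : charsets) {
                if (!new String(one, cs).equals(expected)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreeOnAsciiRange(
                List.of(StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8)));
    }
}
```

This prints `true`.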

Reply kwesson (107) 2/25/2011 4:57:46 AM

On Thu, 24 Feb 2011 20:58:24 -0500, Arne Vajhøj wrote:

> On 24-02-2011 19:12, Lew wrote:
>> On 02/24/2011 09:49 AM, RedGrittyBrick wrote:
>>> On 24/02/2011 14:00, Ken Wesson wrote:
>>>> Windows text files are flat ASCII files (with CRLF line ends).
>>>
>>> Actually I find that, nowadays, lots of text files on Windows are
>>> so-called
>>> 'ANSI' (mostly CP-1252) or 'Unicode' (usually meaning UTF-16 with
>>> BOM).
>>>
>>> Even on my ancient XP boxes, Notepad offers only ANSI, Unicode,
>>> Unicode big-endian and UTF-8. Wordpad offers RTF, Text-Document (turns
>>> out to be CP-1252), Text-Document DOS format (turns out to be CP-850)
>>> and Unicode. No
>>> ASCII.
>>
>> Windows hasn't used ASCII in decades.
> 
> I don't think it ever has.

Funny then that bog-standard ASCII files seem to read and write just fine 
in Notepad on the occasions that I use Windows computers.

> DOS used CP-437, CP-850 etc..
> 
> 32/64 bit Windows uses CP-1252 (which is practically the same as
> ISO-8859-1) and some UTF-16.

All of those seem to be ASCII plus another up to 128 characters, or in 
the case of UTF-16, another up to 65408 characters.

Saying that a 7-bit-clean file interpreted in one of those is not ASCII 
is like saying that humans are not mammals.
Reply Ken 2/25/2011 5:00:01 AM

On Thu, 24 Feb 2011 20:48:18 -0500, Arne Vajhøj wrote:

> On 24-02-2011 09:00, Ken Wesson wrote:
>> On Thu, 24 Feb 2011 21:23:34 +0800, Peter Duniho wrote:
>>
>>> On 2/24/11 9:06 PM, Ken Wesson wrote:
>>>> [...]
>>>> Obsolete systems do not interest me.
>>>
>>> then…
>>>
>>>> Since those days, the world has standardized on ASCII flat files for
>>>> text files.
>>>
>>> LOL!
>>
>> Windows text files are flat ASCII files (with CRLF line ends).
> 
> No.
> 
> They are CP-1252, UTF-8 or UTF-16.

All of which are ASCII++, for all intents and purposes.

>>                                                             Mac text
>> files are flat ASCII files (with CR line ends). Unix text files are
>> flat ASCII files (with LF line ends).
> 
> No.
> 
> They are ISO-8859-1 or UTF-8.

Which are ASCII++, for all intents and purposes.

>>                                 And that exhausts 99.99% of the
>> operating system market share right there, if not more,
> 
> No.
> 
> z/OS, i, OpenVMS and MPE have a lot more market share than 0.01%.

Nonsense. There are *at least* ten thousand PCs running Windows for every 
one machine running one of those operating systems.

Ten thousand *PCs running Windows*.

If you throw in Unix and MacOS you get a lot more, especially given how 
heavily Unix is used in server racks.

>> I can't remember the last time I had to interoperate with any machine
>> that had anything other than standard ASCII as the native format for
>> text files. It's gotta be decades.
> 
> It is possible that you only work with 20+ year old Unix and OpenVMS systems
> with 7 bit VT100 access.
> 
> But that is not very common.

I work with what nearly everyone in the field works with these days: a 
mix of Unix, MacOS, and Windows, mainly Unix server blades whose services 
are accessed by mainly Windows desktop/netbook users with a smattering of 
Mac users and a small but growing contingent of smartphone users.
Reply Ken 2/25/2011 5:04:04 AM

On Thu, 24 Feb 2011 13:12:36 -0700, Jim Janney wrote:

> Ken Wesson <kwesson@gmail.com> writes:
> 
>> On Thu, 24 Feb 2011 09:18:06 -0500, Michael Wojcik wrote:
>>
>>> On the IBM i machines (formerly i Series, formerly System i, formerly
>>> AS/400, successor to the System/3x), blah blah blah
>>
>> You're one to talk about provincialism. Who the hell uses these ancient
>> museum pieces any more?
> 
> Um, that would be me, or rather my employer's customers.

Your employer may happen to be using such legacy systems, but I very much 
doubt that very many people deal with them in an IT capacity. Far, *far* 
fewer than deal with Unix, Windows, and Mac boxes in such a capacity.

How many end-users interact indirectly with these systems is of course 
irrelevant.
Reply Ken 2/25/2011 5:06:16 AM

On Thu, 24 Feb 2011 21:25:05 +0000, Martin Gregorie wrote:

> On Thu, 24 Feb 2011 19:42:19 +0100, Ken Wesson wrote:
> 
>> On Thu, 24 Feb 2011 09:18:06 -0500, Michael Wojcik wrote:
>> 
>>> Ken Wesson wrote:
>>>> Obsolete systems do not interest me.
>>> 
>>> Apparently, neither do prominent ones that you don't happen to know
>>> about.
>> 
>> There is nothing at all prominent about those IBM dinosaurs. They may
>> have been prominent 30 years ago, but not now.
>>
> You know, you sound exactly like a character who surfaced in a Y2K
> newsgroup back in 1998/99. He refused to believe that any computers
> apart from PCs were in use at the time.

I doubt that. He may have correctly pointed out that the vast *majority* 
of computers were PCs at the time. (Now, laptops and smartphones may have 
the slight edge, or perhaps even server blades, now that typical servers 
are racks full of small computers instead of single big computers.)

If he did claim they *all* were then he was an idiot.
Reply Ken 2/25/2011 5:08:01 AM

On Thu, 24 Feb 2011 17:11:02 -0500, Michael Wojcik wrote:

> Ken Wesson wrote:
>> Who asked you for your opinions of others here?
> 
> No one. I offer them out of sheer generosity.

Calling other people names is hardly what I would call "generosity", nor 
is polluting a newsgroup with off-topic traffic.

> the local idiots

Your personal opinions of others are not the topic of this newsgroup. Do 
you have anything Java-related to say?

>>> On the IBM i machines (formerly i Series, formerly System i, formerly
>>> AS/400, successor to the System/3x), blah blah blah
>> 
>> You're one to talk about provincialism. Who the hell uses these ancient
>> museum pieces any more?
> 
> Thousands of organizations, which is why they still enjoy healthy sales.

Ah, must be vendor lockin. Sucks to be them. Soon they'll be outcompeted 
by newer, nimbler firms that use modern things like the free Unixes on 
commodity hardware. Of course, they might last a while if they can keep 
convincing the government to give them "bailouts" or other protectionist 
help in the face of competitors and their own screwups.

Still, your "thousands" of organizations are outweighed by the *hundreds* 
of thousands that don't use such systems and *they* are outweighed by the 
hundreds of *millions* of individuals who collectively possess *billions* 
of personal computers (often two or three desktop/laptop/netbook 
machines, one current and one or two older units, *plus* a smartphone or 
an iPad or whatever, and that's not even counting routers or other 
gadgets with general-purpose microprocessors but non-general-purpose 
applications).

Fewer than 1 in 10,000 computers, and probably *far* fewer, don't store 
text files in a form that consists of CR, CRLF, or LF delimited lines. A 
very large number of those that do in fact use one or another ASCII-
superset character set and they pretty much all intersect on using 
characters 10 and 13 to represent the potential line-end characters.
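Whether a plain byte search for 10 and 13 is safe depends on more than which code points an encoding assigns to CR and LF: those byte values must also never appear inside the encoding of some other character. UTF-8 has this property (every byte of a multi-byte sequence has its high bit set), while UTF-16 does not. The sketch below counts 0x0A bytes in a string that contains no line feed at all; U+010A (`Ċ`) is just an arbitrary example whose UTF-16 code unit includes a 0x0A byte, and the class name is illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LineFeedByteScan {
    // Counts how many bytes equal 0x0A (LF) in the encoded form of text.
    static int countLfBytes(String text, Charset cs) {
        int count = 0;
        for (byte b : text.getBytes(cs)) {
            if (b == 0x0A) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String noNewline = "\u010A"; // LATIN CAPITAL LETTER C WITH DOT ABOVE; no LF present
        // UTF-8 encodes U+010A as C4 8A: no stray 0x0A byte.
        System.out.println("UTF-8:    " + countLfBytes(noNewline, StandardCharsets.UTF_8));
        // UTF-16LE encodes U+010A as 0A 01: a naive byte scan would see a "line feed".
        System.out.println("UTF-16LE: " + countLfBytes(noNewline, StandardCharsets.UTF_16LE));
    }
}
```

It prints 0 for UTF-8 and 1 for UTF-16LE, which is why a backward byte scan for LF is reasonable for UTF-8 files but not for UTF-16 ones.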

>> There is nothing at all prominent about those IBM dinosaurs. They may
>> have been prominent 30 years ago, but not now.
> 
> Tell that to the many thousands of organizations that still use them.

Perhaps ten thousand aging dinosaurian computers using them. Over one 
billion using some variation on the theme of Windows, Unix, MacOS, iOS, 
or Android. Probably more devices use phone OS also-rans like SymbianOS 
and PalmOS than use those oddball IBM operating systems.

> And the majority of business transactions

have no bearing on this discussion, which has to do with the majority of 
*computers* and, secondarily, what will be encountered routinely by the 
majority of *IT workers*.

> IBM had just shy of $100B in sales last year.

Vendor lockin has been good to them.

> you don't know what the hell

Your personal opinions of others are not the topic of this newsgroup. Do 
you have anything Java-related to say?

> can't be bothered to do even a moment of research

Your personal opinions of others are not the topic of this newsgroup. Do 
you have anything Java-related to say?

> your ignorance

Your personal opinions of others are not the topic of this newsgroup. Do 
you have anything Java-related to say?

> Not that you'll learn

Your personal opinions of others are not the topic of this newsgroup. Do 
you have anything Java-related to say?
Reply Ken 2/25/2011 5:19:37 AM

On Thu, 24 Feb 2011 21:09:44 -0500, Arne Vajhøj wrote:

> On 24-02-2011 13:42, Ken Wesson wrote:
>> Who asked you for your opinions of others here?
> 
> Anyone posting to usenet gives the entire world the opportunity to
> comment on them.

Only people ignorant about etiquette, the newsgroup's topic, or both will 
actually do so.

> The smart people

Your personal opinions of others are not the topic of this newsgroup. Do 
you have anything Java-related to say?

>> You're one to talk about provincialism. Who the hell uses these ancient
>> museum pieces any more?
> 
> Lots of places.
> 
> Retail sector, public sector, financial sector

If you're counting it that way, that's 3 places. Hardly "lots". :)

See other posts. Perhaps a collected few tens of thousands of computers 
using museum-worthy OSes like those versus a collected *billion* or more 
of machines running Windows, MacOS, iOS, Android, and Unix.

>> There is nothing at all prominent about those IBM dinosaurs. They may
>> have been prominent 30 years ago, but not now.
> 
> Both z/OS and i are widely used today.

If by "widely used" you mean on one in ten thousand or fewer computers.

>> Fine, then -- corporate America and home computers in America then.
> 
> OK - neither z/OS nor i is common on home computers.
> 
> But they are very common in corporate America.

If by "very common" you mean used on one in ten thousand or fewer of 
their computers. For every single z/OS machine in corporate America there 
are probably a thousand blade servers and ten thousand office PCs and 
employer-provided laptops and God alone knows how many employee 
smartphones with plans and/or handsets paid for by their company.

> If all z/OS systems disappeared overnight then everything would break
> down, because so many critical systems are running on them.

A somewhat scary thought, but hardly relevant unless you're trying to 
stir up enough public alarm to foment a general movement to replace these 
legacy systems with more modern ones.
Reply Ken 2/25/2011 5:26:43 AM

On Thu, 24 Feb 2011 21:00:20 -0500, Arne Vajhøj wrote:

> On 24-02-2011 09:18, Michael Wojcik wrote:
>> Ah, the warm blanket of provincialism.
> 
> Yep.

Your personal opinions of others are not the topic of this newsgroup. Do 
you have anything Java-related to say?

> And it is a pretty good guess that the RandomAccessFile searching for CR
> and LF will fail on i also then.

How fortunate that i runs on fewer than one in ten thousand machines. 
Does Java even run on i?

> Linux will be either ISO-8859-1 or UTF-8, not ASCII.

Both contain ASCII as a subset -- if you take a pure-ASCII file and 
reencode it in either the result is the identical byte sequence.
Reply Ken 2/25/2011 5:28:27 AM

On Thu, 24 Feb 2011 20:45:36 -0500, Arne Vajhøj wrote:

> On 24-02-2011 08:06, Ken Wesson wrote:
>> Obsolete systems do not interest me.
> 
> Whether a solution works in general or not depends on whether it is
> guaranteed to work on all platforms or not.

Actually, "in general" tends to have some kind of implicit scope that is 
usually less than "all platforms". For instance, when discussing a Java 
solution, we can exclude platforms that Java doesn't run on. The last I 
heard that even includes one prominent one: iOS, the platform of iPhones 
and iPads.

> The RandomAccessFile and search for CR and LF does not.

It probably runs on all platforms Java is normally used on. It certainly 
runs on 99.99% or more of the machines anyone is likely to run Java on, 
AND the remaining less than .01% are ones sufficiently oddball that their 
operators will *know* to expect common crossplatform software to often 
break on them. Typical C code using I/O will probably not work on such 
machines without heavy modification, even C code that compiles and works 
fine on every POSIX-compliant system and every Windows box and most other 
machines. Hell, these machines may not even be able to represent C source 
trees normally, requiring the compiler vendor to jump through hoops and 
requiring unusual tools and IDEs be used to hack C sources and not just 
the system text editor. Hell, I wouldn't be surprised if there were no 
working C implementations on some of these systems -- and I'd be 
surprised if many, if any, of them ran Java at all, let alone had a fully 
compliant JavaSE 5/6 implementation.

> Whether it works on platforms that interest you is completely
> irrelevant.

On the contrary, whether software works on platforms that interest its 
developer and user base is 100% relevant and whether it works on 
platforms that *don't* interest its developer and user base is irrelevant.

>>                                Since those days, the world has
>> standardized on ASCII flat files for text files.
> 
> Not really.
> 
> Windows uses CP-1252, UTF-8 and UTF-16.
> Unix/Linux/VMS uses ISO-8859-1 and UTF-8.

All ASCII supersets. Which means the common denominator among all those 
is ... ta-da! ASCII. :)

> IBM mainframe uses EBCDIC

And hardly anyone uses IBM mainframe (sic). What was that figure again? 
0.01% of all computers? Or fewer. And shrinking. Even if the number of 
IBM mainframes is actually growing (for the love of God, *why*?), the 
number of non-IBM-mainframe computers is growing *exponentially* faster. 
There was a time when IBM mainframes may have been over 50% and were 
surely over 20% of all computers; the trend has been one of exponential 
decay of that percentage ever since, with the knee of the curve 
corresponding quite closely with the beginning of widespread adoption of 
the PC.

> There are really very few systems today that use just ASCII.

But many that use ASCII.
Reply Ken 2/25/2011 5:36:41 AM

On Thu, 24 Feb 2011 20:39:31 -0500, Arne Vajhøj wrote:

> On 23-02-2011 22:23, Ken Wesson wrote:
>> On Wed, 23 Feb 2011 21:21:42 -0500, Arne Vajhøj wrote:
>>> On 23-02-2011 10:59, Robin Wenger wrote:
>>>> Is it possible to read the last text line from a text file WITHOUT
>>>> reading the previous (n-1) lines?
>>>
>>> In general no.
>>>
>>> All the RandomAccessFile tricks are based on assumptions about lines
>>> being separated by something - they do not work with record formats
>>> that contain a line length instead of a delimiter.
>>
>> "Record formats" are not relevant here,
> 
> They are

They are not, since files in record formats are not text files.

>>                                      nor was someone else's concern
>> about compressed formats -- the OP clearly said "a text file", by which
>> is generally understood flat ASCII with CR, LF, or CRLF as line
>> delimiter.
> 
> non IT pros

Your personal opinions of others are not the topic of this newsgroup. Do 
you have anything Java-related to say?

> They know that there are other character sets and other record formats.

Other character sets mostly intersect in ASCII. Nearly all in any kind of 
widespread use intersect in using characters 10 and 13 as the potential-
line-end characters. And "other record formats" are not relevant in a 
discussion of text files, as has been explained already.
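For the case argued about above (lines ending in LF, CRLF or CR, in an encoding such as US-ASCII, ISO-8859-1, CP-1252 or UTF-8 where those are single bytes that cannot occur inside other characters), the RandomAccessFile approach discussed in this thread might be sketched as follows. It drops one trailing terminator, scans backwards in blocks for the previous CR or LF, and decodes only the bytes of the last line. It is not a general solution: it does not handle UTF-16, EBCDIC or record-oriented files, the 4096-byte block size is arbitrary, and `LastLineReader`/`readLastLine` are illustrative names, not a library API.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class LastLineReader {
    private static final int BLOCK = 4096; // arbitrary read-block size

    static boolean isTerminator(int b) {
        return b == '\n' || b == '\r';
    }

    // Returns the last line of the file without its terminator ("" for an empty file).
    // Assumes CR and LF are single bytes that never occur inside other characters.
    static String readLastLine(Path file, Charset cs) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long end = raf.length(); // exclusive end of the last line

            // Drop exactly one trailing terminator (LF, CRLF or CR), mirroring how
            // BufferedReader.readLine treats a final line terminator.
            if (end > 0) {
                raf.seek(end - 1);
                int last = raf.read();
                if (last == '\n') {
                    end--;
                    if (end > 0) {
                        raf.seek(end - 1);
                        if (raf.read() == '\r') {
                            end--; // CRLF
                        }
                    }
                } else if (last == '\r') {
                    end--; // CR-only line end
                }
            }

            // Scan backwards block by block for the previous CR or LF.
            byte[] buf = new byte[BLOCK];
            long start = 0; // inclusive start of the last line
            long pos = end;
            search:
            while (pos > 0) {
                int len = (int) Math.min(BLOCK, pos);
                pos -= len;
                raf.seek(pos);
                raf.readFully(buf, 0, len);
                for (int i = len - 1; i >= 0; i--) {
                    if (isTerminator(buf[i])) {
                        start = pos + i + 1;
                        break search;
                    }
                }
            }

            // Read and decode only the last line (assumes it fits in an int-sized array).
            byte[] line = new byte[(int) (end - start)];
            raf.seek(start);
            raf.readFully(line);
            return new String(line, cs);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("tail", ".txt");
        Files.write(tmp, "first\r\nsecond\r\nlast line\r\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(readLastLine(tmp, StandardCharsets.UTF_8)); // last line
        Files.delete(tmp);
    }
}
```

Reading in blocks rather than calling `read()` once per byte is likely to be noticeably faster when the last line is long; for a short last line either approach touches only the tail of the file.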
Reply Ken 2/25/2011 5:38:32 AM

On Thu, 24 Feb 2011, Arne Vajhøj wrote:

> On 24-02-2011 15:49, Tom Anderson wrote:
>> On Fri, 25 Feb 2011, Peter Duniho wrote:
>> 
>>> On 2/24/11 10:42 PM, Lars Enderin wrote:
>>>> 2011-02-24 15:26, Peter Duniho skrev:
>>>>> On 2/24/11 10:14 PM, Lars Enderin wrote:
>>>>>> ASCII character values are limited to the 0-127 range. That's an
>>>>>> outdated "standard".
>>>>> 
>>>>> Used by "obsolete systems". A key point in my amusement. :)
>>>> 
>>>> I thought so, but Ken seemed to need an explanation.
>>> 
>>> Yes, and it was a good explanation. Unfortunately, I don't think he
>>> understood the explanation, nor do I think he will understand further
>>> clarification. I think it more likely that the harder anyone tries to
>>> explain to him these points, the more dug in his heels will be.
>>> 
>>> To do otherwise would necessarily require an admission that there's no
>>> single "text file" format, and that even if there were, ASCII or any
>>> of the single-byte derivatives thereof ain't it. I don't see any way
>>> such an admission would ever be produced.
>> 
>> There is a single text file format: lines of characters in some
>> encoding, terminated by an end-of-line sequence which is distinguishable
>> from any other characters.
>> 
>> It's merely the case that some current mainframes, and some obscure or
>> historical systems, do not store text in text files!
>
> No.

Yes.

> There are also count prefix (and sometimes suffix) formats.

Which, although they may be used to store text, are not text files.

> They have the advantage of being able to actually have
> all possible values in lines.

True. I wish we used more formats like this.
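The count-suffix variant mentioned above also bears directly on the original question: if each record stores its length after its data, the last record can be found from the end of the file with no scanning, and its contents may include any byte, newlines included. The layout here (data followed by a 4-byte big-endian length, as written by `DataOutputStream.writeInt` and read back by `RandomAccessFile.readInt`) is a hypothetical format made up for illustration, not any particular system's record format; the class and method names are illustrative too.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SuffixCountedRecords {
    // Hypothetical layout: [record bytes][4-byte big-endian length of those bytes], repeated.
    static byte[] encode(String... records) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (String r : records) {
            byte[] data = r.getBytes(StandardCharsets.UTF_8);
            out.write(data);
            out.writeInt(data.length); // the count follows the data
        }
        out.flush();
        return bytes.toByteArray();
    }

    // Reads only the last record: one seek to its count, one seek back to its data.
    static String readLastRecord(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long end = raf.length();
            if (end < 4) {
                return null; // empty (or malformed) file
            }
            raf.seek(end - 4);
            int len = raf.readInt();
            byte[] data = new byte[len];
            raf.seek(end - 4 - len);
            raf.readFully(data);
            return new String(data, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("records", ".bin");
        Files.write(tmp, encode("first", "a record\nwith an embedded newline"));
        System.out.println(readLastRecord(tmp));
        Files.delete(tmp);
    }
}
```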

tom

-- 
If it ain't Alberta, it ain't beef.
Reply Tom 2/25/2011 10:03:07 AM

On Thu, 24 Feb 2011, Arne Vajhøj wrote:

> On 24-02-2011 17:11, Michael Wojcik wrote:
>> Ken Wesson wrote:
>>> There is nothing at all prominent about those IBM dinosaurs. They may
>>> have been prominent 30 years ago, but not now.
>> 
>> Tell that to the many thousands of organizations that still use them.
>> 
>> And the majority of business transactions still run on IBM mainframe
>> and midrange systems, and similar offerings from other companies.
>> 
>> IBM had just shy of $100B in sales last year. A good chunk of that was
>> from mainframes: mainframe sales were up 68% from 2009, to the best
>> level in six years.
>
> The biggest chunk of IBM's revenue is services.
>
> But they still sell a lot of big iron.

Do they actually sell them? What happened to the leasing model?

tom

-- 
If it ain't Alberta, it ain't beef.
Reply Tom 2/25/2011 10:06:44 AM

On 25-02-2011 05:06, Tom Anderson wrote:
> On Thu, 24 Feb 2011, Arne Vajhøj wrote:
>> On 24-02-2011 17:11, Michael Wojcik wrote:
>>> Ken Wesson wrote:
>>>> There is nothing at all prominent about those IBM dinosaurs. They may
>>>> have been prominent 30 years ago, but not now.
>>>
>>> Tell that to the many thousands of organizations that still use them.
>>>
>>> And the majority of business transactions still run on IBM mainframe
>>> and midrange systems, and similar offerings from other companies.
>>>
>>> IBM had just shy of $100B in sales last year. A good chunk of that was
>>> from mainframes: mainframe sales were up 68% from 2009, to the best
>>> level in six years.
>>
>> The biggest chunk of IBM's revenue is services.
>>
>> But they still sell a lot of big iron.
>
> Do they actually sell them? What happened to the leasing model?

Good question.

I don't know if they sell or lease them out.

IBM deliver boxes to customers and get a ton of money
in return.

Arne

Reply arne6 (9487) 2/25/2011 2:15:47 PM

On 25-02-2011 05:03, Tom Anderson wrote:
> On Thu, 24 Feb 2011, Arne Vajhøj wrote:
>
>> On 24-02-2011 15:49, Tom Anderson wrote:
>>> On Fri, 25 Feb 2011, Peter Duniho wrote:
>>>
>>>> On 2/24/11 10:42 PM, Lars Enderin wrote:
>>>>> 2011-02-24 15:26, Peter Duniho skrev:
>>>>>> On 2/24/11 10:14 PM, Lars Enderin wrote:
>>>>>>> ASCII character values are limited to the 0-127 range. That's an
>>>>>>> outdated "standard".
>>>>>>
>>>>>> Used by "obsolete systems". A key point in my amusement. :)
>>>>>
>>>>> I thought so, but Ken seemed to need an explanation.
>>>>
>>>> Yes, and it was a good explanation. Unfortunately, I don't think he
>>>> understood the explanation, nor do I think he will understand further
>>>> clarification. I think it more likely that the harder anyone tries to
>>>> explain to him these points, the more dug in his heels will be.
>>>>
>>>> To do otherwise would necessarily require an admission that there's no
>>>> single "text file" format, and that even if there were, ASCII or any
>>>> of the single-byte derivatives thereof ain't it. I don't see any way
>>>> such an admission would ever be produced.
>>>
>>> There is a single text file format: lines of characters in some
>>> encoding, terminated by an end-of-line sequence which is distinguishable
>>> from any other characters.
>>>
>>> It's merely the case that some current mainframes, and some obscure or
>>> historical systems, do not store text in text files!
>>
>> No.
>
> Yes.
>
>> There are also count prefix (and sometimes suffix) formats.
>
> Which, although they may be used to store text, are not text files.

Of course they are text files.

If I edit Foobar.java in a text editor, write a Java program
and save it, why should it be any less of a text file just because
the record format used on that system is not delimited?

Arne
Reply ISO 2/25/2011 2:28:04 PM

On Fri, 25 Feb 2011 06:08:01 +0100, Ken Wesson wrote:

> On Thu, 24 Feb 2011 21:25:05 +0000, Martin Gregorie wrote:
> 
>> On Thu, 24 Feb 2011 19:42:19 +0100, Ken Wesson wrote:
>> 
>>> On Thu, 24 Feb 2011 09:18:06 -0500, Michael Wojcik wrote:
>>> 
>>>> Ken Wesson wrote:
>>>>> Obsolete systems do not interest me.
>>>> 
>>>> Apparently, neither do prominent ones that you don't happen to know
>>>> about.
>>> 
>>> There is nothing at all prominent about those IBM dinosaurs. They may
>>> have been prominent 30 years ago, but not now.
>>>
>> You know, you sound exactly like a character who surfaced in a Y2K
>> newsgroup back in 1998/99. He refused to believe that any computers
>> apart from PCs were in use at the time.
> 
> I doubt that. He may have correctly pointed out that the vast *majority*
> of computers were PCs at the time. (Now, laptops and smartphones may
> have the slight edge, or perhaps even server blades, now that typical
> servers are racks full of small computers instead of single big
> computers.)
> 
> If he did claim they *all* were then he was an idiot.

He did and he was.


-- 
martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |
Reply Martin 2/25/2011 2:28:32 PM

On 25-02-2011 00:38, Ken Wesson wrote:
> On Thu, 24 Feb 2011 20:39:31 -0500, Arne Vajhøj wrote:
>
>> On 23-02-2011 22:23, Ken Wesson wrote:
>>> On Wed, 23 Feb 2011 21:21:42 -0500, Arne Vajhøj wrote:
>>>> On 23-02-2011 10:59, Robin Wenger wrote:
>>>>> Is it possible to read the last text line from a text file WITHOUT
>>>>> reading the previous (n-1) lines?
>>>>
>>>> In general no.
>>>>
>>>> All the RandomAccessFile tricks are based on assumptions about lines
>>>> being separated by something - they do not work with record formats
>>>> that contain a line length instead of a delimiter.
>>>
>>> "Record formats" are not relevant here,
>>
>> They are
>
> They are not, since files in record formats are not text files.

Given that:
   data + LF
   data + CR + LF
are also record formats, that is nonsense.

>> They know that there are other character sets and other record formats.
>
> Other character sets mostly intersect in ASCII. Nearly all in any kind of
> widespread use intersect in using characters 10 and 13 as the potential-
> line-end characters. And "other record formats" are not relevant in a
> discussion of text files, as has been explained already.

As has been proven not to be the case.

Arne

Reply UTF 2/25/2011 2:29:13 PM

On 25-02-2011 00:04, Ken Wesson wrote:
> On Thu, 24 Feb 2011 20:48:18 -0500, Arne Vajhøj wrote:
>> On 24-02-2011 09:00, Ken Wesson wrote:
>>> On Thu, 24 Feb 2011 21:23:34 +0800, Peter Duniho wrote:
>>>
>>>> On 2/24/11 9:06 PM, Ken Wesson wrote:
>>>>> [...]
>>>>> Obsolete systems do not interest me.
>>>>
>>>> then…
>>>>
>>>>> Since those days, the world has standardized on ASCII flat files for
>>>>> text files.
>>>>
>>>> LOL!
>>>
>>> Windows text files are flat ASCII files (with CRLF line ends).
>>
>> No.
>>
>> They are CP-1252, UTF-8 or UTF-16.
>
> All of which are ASCII++, for all intents and purposes.

This is an IT group.

Not a group for hairdressers or chefs.

This means that we use exact terms.

ASCII is a very well defined standard specified by ANSI and ISO.

There is no such thing as ASCII++.

>>>                                                              Mac text
>>> files are flat ASCII files (with CR line ends). Unix text files are
>>> flat ASCII files (with LF line ends).
>>
>> No.
>>
>> They are ISO-8859-1 or UTF-8.
>
> Which are ASCII++, for all intents and purposes.
>
>>>                                  And that exhausts 99.99% of the
>>> operating system market share right there, if not more,
>>
>> No.
>>
>> z/OS, i, OpenVMS, MPE have a lot more market share than 0.01%.
>
> Nonsense. There are *at least* ten thousand PCs running Windows for every
> one machine running one of those operating systems.
>
> Ten thousand *PCs running Windows*.

The PC/mainframe ratio is probably like 100000:1.

But that is not very relevant, because mainframes happen
to be a lot more expensive than PCs.

>>> I can't remember the last time I had to interoperate with any machine
>>> that had anything other than standard ASCII as the native format for
>>> text files. It's gotta be decades.
>>
>> It is possible that you only work with 20+ year old Unix and OpenVMS systems
>> with 7 bit VT100 access.
>>
>> But that is not very common.
>
> I work with what nearly everyone in the field works with these days: a
> mix of Unix, MacOS, and Windows, mainly Unix server blades whose services
> are accessed by mainly Windows desktop/netbook users with a smattering of
> Mac users and a small but growing contingent of smartphone users.

Then you won't have any users using ASCII.

Arne
Reply arne6 (9487) 2/25/2011 2:36:29 PM

On Fri, 25 Feb 2011 06:26:43 +0100, Ken Wesson wrote:

> If by "very common" you mean used on one in ten thousand or fewer of
> their computers. For every single z/OS machine in corporate America
> there are probably a thousand blade servers and ten thousand office PCs
> and employer-provided laptops and God alone knows how many employee
> smartphones with plans and/or handsets paid for by their company.
>
By that standard PCs (let's include desktops and laptops) are 
also a tiny proportion of all computers once you count phones and 
all the embedded computers in vehicles.

IMO it's a silly argument because very many PCs are used for only a small 
part of the day and do very little apart from using electricity and 
occasionally receiving and sending a few e-mails. A better measure is the 
number of transactions and documents handled by each machine per year.


-- 
martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |
Reply martin1645 (527) 2/25/2011 2:37:07 PM

On 25-02-2011 00:26, Ken Wesson wrote:
> On Thu, 24 Feb 2011 21:09:44 -0500, Arne Vajhøj wrote:
>> On 24-02-2011 13:42, Ken Wesson wrote:
>>> You're one to talk about provincialism. Who the hell uses these ancient
>>> museum pieces any more?
>>
>> Lots of places.
>>
>> Retail sector, public sector, financial sector
>
> If you're counting it that way, that's 3 places. Hardly "lots". :)

I have news for you - the number of business entities in those
3 sectors is a lot higher than 3.

We already understand that you have no knowledge about businesses.

But I assume that you have seen a world map. Are you not aware
that other countries have public sectors?

> See other posts. Perhaps a collected few tens of thousands of computers
> using museum-worthy OSes like those versus a collected *billion* or more
> of machines running Windows, MacOS, iOS, Android, and Unix.

There are also more flies than humans on earth.

That does not make flies more important.

>>> There is nothing at all prominent about those IBM dinosaurs. They may
>>> have been prominent 30 years ago, but not now.
>>
>> Both z/OS and i are widely used today.
>
> If by "widely used" you mean on one in ten thousand or fewer computers.

But a lot more in revenue.

>>> Fine, then -- corporate America and home computers in America then.
>>
>> OK - neither z/OS nor i is common on home computers.
>>
>> But they are very common in corporate America.
>
> If by "very common" you mean used on one in ten thousand or fewer of
> their computers. For every single z/OS machine in corporate America there
> are probably a thousand blade servers and ten thousand office PCs and
> employer-provided laptops and God alone knows how many employee
> smartphones with plans and/or handsets paid for by their company.

And?

If a company buys a mainframe for 20 M$ and 10000 PCs for 10 M$,
then 2/3 of that spending is on the mainframe.

>> If all z/OS systems disappeared overnight then everything would break
>> down, because so many critical systems are running on them.
>
> A somewhat scary thought, but hardly relevant unless you're trying to
> stir up enough public alarm to foment a general movement to replace these
> legacy systems with more modern ones.

It is relevant because the point is that most of the world's
important data is processed by mainframes.

Some claim 80% of all financial data is stored on mainframes.

Sure they can be replaced. 10-20 years and 10-20 trillion dollars.

Arne


Reply arne6 (9487) 2/25/2011 2:45:01 PM

On 25-02-2011 00:28, Ken Wesson wrote:
> On Thu, 24 Feb 2011 21:00:20 -0500, Arne Vajhøj wrote:
>> And it is a pretty good guess that the RandomAccessFile searching for CR
>> and LF will fail on i also then.
>
> How fortunate that i runs on fewer than one in ten thousand machines.
> Does Java even run on i?

Yes.

>> Linux will be either ISO-8859-1 or UTF-8, not ASCII.
>
> Both contain ASCII as a subset -- if you take a pure-ASCII file and
> reencode it in either the result is the identical byte sequence.

Yes, but that does not change that they do not use ASCII. They
use ISO-8859-1 or UTF-8.

Arne
Reply arne6 (9487) 2/25/2011 2:46:30 PM

On Fri, 25 Feb 2011 06:38:32 +0100, Ken Wesson wrote:

> On Thu, 24 Feb 2011 20:39:31 -0500, Arne Vajhøj wrote:
> 
>> On 23-02-2011 22:23, Ken Wesson wrote:
>>> On Wed, 23 Feb 2011 21:21:42 -0500, Arne Vajhøj wrote:
>>>> On 23-02-2011 10:59, Robin Wenger wrote:
>>>>> Is it possible to read the last text line from a text file WITHOUT
>>>>> reading the previous (n-1) lines?
>>>>
>>>> In general no.
>>>>
>>>> All the RandomAccessFile tricks are based on assumptions about lines
>>>> being separated by something - they do not work with record formats
>>>> that contain a line length instead of a delimiter.
>>>
>>> "Record formats" are not relevant here,
>> 
>> They are
> 
> They are not, since files in record formats are not text files.
> 
>>>                                      nor was someone else's concern
>>> about compressed formats -- the OP clearly said "a text file", by
>>> which is generally understood flat ASCII with CR, LF, or CRLF as line
>>> delimiter.
>> 
>> non IT pros
> 
> Your personal opinions of others are not the topic of this newsgroup. Do
> you have anything Java-related to say?
> 
>> They know that there are other character sets and other record formats.
> 
> Other character sets mostly intersect in ASCII. Nearly all in any kind
> of widespread use intersect in using characters 10 and 13 as the
> potential- line-end characters. And "other record formats" are not
> relevant in a discussion of text files, as has been explained already.
>
Bad argument: a text file contains records. They are variable length 
records with a 'newline' encoding as the delimiter.

BTW, you can use C to handle iSeries text files through the usual gets() 
and puts() functions despite the iSeries holding text in what are 
effectively database rows. They have three fields per row - a line 
number, a fixed length text field and an 8 byte ID. The latter is 
equivalent to the way the last few columns of punched cards were often 
used. I don't know why an OS/400 text file would need an ID field, but 
it's there. The reason that C's standard text handling works on these 
files is down to the standard library, which is written to inter-convert 
between C's internal null delimited string representations of lines and 
the external fixed field representation. 
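With fixed-length records like those described above, the last line begins at a computable offset, so reading it needs one seek and no search. The field widths below (a 6-byte line number, an 80-byte text field, an 8-byte ID) are purely hypothetical and are not the real OS/400 layout; US-ASCII stands in for the EBCDIC a real system would use, and the class and method names are illustrative.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class FixedRecordTail {
    // Hypothetical layout, for illustration only: 6-byte line number, 80-byte text, 8-byte ID.
    static final int SEQ_LEN = 6, TEXT_LEN = 80, ID_LEN = 8;
    static final int RECORD_LEN = SEQ_LEN + TEXT_LEN + ID_LEN;

    // Reads the text field of the last record directly: no scanning, one seek.
    static String lastText(Path file, Charset cs) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long records = raf.length() / RECORD_LEN;
            if (records == 0) {
                return null;
            }
            byte[] text = new byte[TEXT_LEN];
            raf.seek((records - 1) * RECORD_LEN + SEQ_LEN);
            raf.readFully(text);
            // Fixed-length fields are space-padded; strip the padding on the right.
            return new String(text, cs).replaceAll(" +$", "");
        }
    }

    // Builds one record in the hypothetical layout (assumes a one-byte-per-char charset).
    static byte[] record(int seq, String text, String id, Charset cs) {
        String s = pad(String.format("%06d", seq), SEQ_LEN) + pad(text, TEXT_LEN) + pad(id, ID_LEN);
        return s.getBytes(cs);
    }

    // Pads with spaces or truncates so the field is exactly width characters.
    private static String pad(String s, int width) {
        StringBuilder sb = new StringBuilder(s.length() > width ? s.substring(0, width) : s);
        while (sb.length() < width) {
            sb.append(' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Charset cs = StandardCharsets.US_ASCII; // a real OS/400 file would be EBCDIC
        Path tmp = Files.createTempFile("fixed", ".dat");
        byte[] a = record(1, "first line", "ID000001", cs);
        byte[] b = record(2, "last line", "ID000002", cs);
        byte[] all = new byte[a.length + b.length];
        System.arraycopy(a, 0, all, 0, a.length);
        System.arraycopy(b, 0, all, a.length, b.length);
        Files.write(tmp, all);
        System.out.println(lastText(tmp, cs)); // last line
        Files.delete(tmp);
    }
}
```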

Getting back on topic, I haven't used Java on OS/400, but it's available 
and will almost certainly work the same way and, in addition, will 
probably manage the mapping between EBCDIC and Unicode. It has to or it 
would break WORA.
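The EBCDIC point is also why a byte-level search for 10 and 13 is not portable, even though Java code that works on decoded `String`s never sees the difference. A small check, assuming the JRE provides the `IBM037` (EBCDIC US/Canada) charset; it belongs to the JDK's extended charsets and is not guaranteed on every platform, so the sketch tests for it first. The class and helper names are illustrative.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class EbcdicNewline {
    // True if any byte equals 0x0A, the byte an ASCII-style line search looks for.
    static boolean hasAsciiLfByte(byte[] bytes) {
        for (byte b : bytes) {
            if (b == 0x0A) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        if (!Charset.isSupported("IBM037")) {
            System.out.println("IBM037 is not available in this JRE");
            return;
        }
        Charset ebcdic = Charset.forName("IBM037");
        byte[] encoded = "A\n".getBytes(ebcdic); // 'A' is 0xC1 in EBCDIC
        System.out.println("bytes: " + Arrays.toString(encoded));
        System.out.println("contains 0x0A: " + hasAsciiLfByte(encoded));
        // Decoding maps the EBCDIC bytes back to ordinary Java chars, line feed included.
        System.out.println("round trip ok: " + new String(encoded, ebcdic).equals("A\n"));
    }
}
```

When the charset is available, the encoded line feed should not be the byte 0x0A that a RandomAccessFile scan would look for, while decoding gives back an ordinary `\n`.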


-- 
martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |
Reply Martin 2/25/2011 2:58:27 PM

Tom Anderson wrote:
> 
>> There are also count prefix (and sometimes suffix) formats.
> 
> Which, although they may be used to store text, are not text files.

And with what do you support your claim for this definition of "text
file"?

I hope it's something more solid than KW's flailing appeals to
"notion" and the like, which are unsupported by contemporary or
historical uses of the term "text", in the computing disciplines or
more broadly. Have you something better to offer?

-- 
Michael Wojcik
Micro Focus
Rhetoric & Writing, Michigan State University
Reply Michael 2/25/2011 3:19:09 PM

Ken Wesson wrote:
> On Thu, 24 Feb 2011 21:25:05 +0000, Martin Gregorie wrote:
>>>
>> You know, you sound exactly like a character who surfaced in a Y2K
>> newsgroup back in 1998/99. He refused to believe that any computers
>> apart from PCs were in use at the time.
> 
> I doubt that. He may have correctly pointed out that the vast *majority* 
> of computers were PCs at the time. (Now, laptops and smartphones may have 
> the slight edge, or perhaps even server blades, now that typical servers 
> are racks full of small computers instead of single big computers.)

Embedded computers have a huge majority over all general-purpose
computers, by orders of magnitude, if we're counting CPUs. The line
between "smartphones" and other mobile phones is fuzzy; but among
computing devices that support at least some general-purpose
applications (as opposed to dedicated controllers), phones are far and
away in the majority, by number of CPUs.

In other words, wrong again, Ken.

-- 
Michael Wojcik
Micro Focus
Rhetoric & Writing, Michigan State University
Michael 2/25/2011 3:26:27 PM

Ken Wesson wrote:
> On Thu, 24 Feb 2011 17:11:02 -0500, Michael Wojcik wrote:
> 
>> Ken Wesson wrote:
>>> Who asked you for your opinions of others here?
>> No one. I offer them out of sheer generosity.
> 
> Calling other people names is hardly what I would call "generosity"

No, you wouldn't.

> nor
> is polluting a newsgroup with off-topic traffic.

Unlike polluting a newsgroup with ignorance and dull repetition, eh?

> Your personal opinions of others are not the topic of this newsgroup.

Actually, they are. Check the charter.

> Do you have anything Java-related to say?

Yes.

>>>> On the IBM i machines (formerly i Series, formerly System i, formerly
>>>> AS/400, successor to the System/3x), blah blah blah
>>> You're one to talk about provincialism. Who the hell uses these ancient
>>> museum pieces any more?
>> Thousands of organizations, which is why they still enjoy healthy sales.
> 
> Ah, must be vendor lockin. Sucks to be them. Soon they'll be outcompeted 
> by newer, nimbler firms that use modern things like the free Unixes on 
> commodity hardware.

Yes, soon, no doubt. O glorious day, when we are ushered into the Age
of Wessonism! Free unicorns for all!

> Of course, they might last a while if they can keep 
> convincing the government to give them "bailouts" or other protectionist 
> help in the face of competitors and their own screwups.

Careful, Ken - you'll short out your keyboard with all that spittle.

> Still, your "thousands" of organizations are outweighed by the *hundreds* 
> of thousands that don't use such systems

No, they aren't. But do let us know when your cognitive abilities pass
beyond counting.

(On second thought - don't.)

>> And the majority of business transactions
> 
> have no bearing on this discussion, which has to do with the majority of 
> *computers*

No, it doesn't. You don't have the power to determine what the
discussion is about; it's about whatever the participants - all the
participants - decide to discuss.

I'm pleased to see that my prediction of your failure to learn was
right on the money.

-- 
Michael Wojcik
Micro Focus
Rhetoric & Writing, Michigan State University
Michael 2/25/2011 3:44:10 PM

On 25-02-2011 00:36, Ken Wesson wrote:
> On Thu, 24 Feb 2011 20:45:36 -0500, Arne Vajhøj wrote:
>> On 24-02-2011 08:06, Ken Wesson wrote:
>>> Obsolete systems do not interest me.
>>
>> Whether a solution works in general or not depends on whether it is
>> guaranteed to work on all platforms or not.
>
> Actually, "in general" tends to have some kind of implicit scope that is
> usually less than "all platforms". For instance, when discussing a Java
> solution, we can exclude platforms that Java doesn't run on.

True.

But Java does run on some of these platforms.

>> The RandomAccessFile and search for CR and LF does not.
>
> It probably runs on all platforms Java is normally used on. It certainly
> runs on 99.99% or more of the machines anyone is likely to run Java on,

If you are counting machines: yes.

If you are counting dollars: no.

> AND the remaining less than .01% are ones sufficiently oddball that their
> operators will *know* to expect common crossplatform software to often
> break on them. Typical C code using I/O will probably not work on such
> machines without heavy modification, even C code that compiles and works
> fine on every POSIX-compliant system and every Windows box and most other
> machines.

C code, just like Java code, works if the code has well-defined
behavior according to the standard.

But this functionality is not guaranteed to work in C either.

fgetpos and fsetpos do not work on offsets but on an opaque type
that can contain more than an offset.

fseek and ftell work on byte offsets for binary files, but for text
files the position value is opaque.

POSIX/SUS then adds lseek, which will either work with
offsets or return an error.
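For comparison, Java's `RandomAccessFile` has no text mode: `getFilePointer()` and `seek(long)` always deal in plain byte offsets, much like `lseek`. This sketch (hypothetical class `LineOffsetIndex`) uses them the way `ftell`/`fseek` are used on a binary stream: one forward pass records where each line starts, after which any line, including the last, can be revisited directly. It still reads the whole file once, and `RandomAccessFile.readLine()` turns each byte into one `char`, so it only suits single-byte, ASCII-compatible encodings.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

final class LineOffsetIndex {
    /** Byte offset of the start of every line, gathered in one forward pass. */
    static List<Long> lineStarts(RandomAccessFile raf) throws IOException {
        List<Long> starts = new ArrayList<>();
        raf.seek(0);
        while (true) {
            long pos = raf.getFilePointer();  // plain byte offset, like ftell on a binary stream
            if (raf.readLine() == null) {     // accepts \n, \r and \r\n as terminators
                break;
            }
            starts.add(pos);
        }
        return starts;
    }

    /** Jumps back to a remembered offset, like fseek, and reads that line. */
    static String lineAt(RandomAccessFile raf, long offset) throws IOException {
        raf.seek(offset);
        return raf.readLine();
    }
}
```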

>           Hell, these machines may not even be able to represent C source
> trees normally, requiring the compiler vendor to jump through hoops and
> requiring unusual tools and IDEs be used to hack C sources and not just
> the system text editor.

Text editors are by definition able to create text files, and source
code is stored in text files.

Try to think logically.

>                         Hell, I wouldn't be surprised if there were no
> working C implementations on some of these systems

They do have C.

>                                                      -- and I'd be
> surprised if many, if any, of them ran Java at all, let alone had a fully
> compliant JavaSE 5/6 implementation.

I am not surprised that you would be surprised - you don't seem to know
much about systems.

z/OS, i and OpenVMS all have certified Java versions.

>> Whether it works on platforms that interest you are completely
>> irrelevant.
>
> On the contrary, whether software works on platforms that interest its
> developer and user base is 100% relevant and whether it works on
> platforms that *don't* interest its developer and user base is irrelevant.

No.

Not if the discussion is about general usage.

And it is bad Java programming to write code that only works
on some Java platforms, even when the expectation is that the
program will only be used on platforms where it does work.

>>>                                 Since those days, the world has
>>> standardized on ASCII flat files for text files.
>>
>> Not really.
>>
>> Windows uses CP-1252, UTF-8 and UTF-16 Unix/Linux/VMS uses ISO-8859-1
>> and UTF-8
>
> All ASCII supersets. Which means the common denominator among all those
> is ... ta-da! ASCII. :)

That does not make them use ASCII.

>> IBM mainframe uses EBCDIC
>
> And hardly anyone uses IBM mainframe (sic). What was that figure again?
> 0.01% of all computers?

I think the number was 80% of financial data.
:-)

>> There are really very few systems today that uses just ASCII.
>
> But many that use ASCII.

Very few.

Most support ASCII because they use something that
is compatible with ASCII.

Arne
Arne 2/25/2011 3:52:08 PM

On 24-02-2011 23:57, Ken Wesson wrote:
> On Thu, 24 Feb 2011 20:55:44 -0500, Arne Vajhøj wrote:
>> On 24-02-2011 13:43, Ken Wesson wrote:
>>> On Thu, 24 Feb 2011 14:49:22 +0000, RedGrittyBrick wrote:
>>>> On 24/02/2011 14:00, Ken Wesson wrote:
>>>>> Windows text files are flat ASCII files (with CRLF line ends).
>>>>
>>>> Actually I find that, nowadays, lots of text files on Windows are
>>>> so-called 'ANSI' (mostly CP-1252)
>>>
>>> Same difference.
>>
>> Completely different char set.
>
> Funny that something so "completely different" intersects with ASCII in
> the entirety of ASCII's range (0-127). It just specifies what 128-255
> mean instead of leaving those values undefined. Unicode specifies what
> 128-65535 mean and still intersects with ASCII on 0-127.

It is still a different char set.

Arne

Arne 2/25/2011 5:18:27 PM

On 24-02-2011 23:56, Ken Wesson wrote:
> On Thu, 24 Feb 2011 20:55:01 -0500, Arne Vajhøj wrote:
>> On 24-02-2011 13:44, Ken Wesson wrote:
>>> On Thu, 24 Feb 2011 16:19:09 +0200, Jussi Piitulainen wrote:
>>>> I remember when we used a seven-bit character code to write my native
>>>> language. etc
>>>
>>> That's why we now actually use that 8th bit for something useful, if
>>> need be.
>>
>> Well - you are the one that has been claiming that everybody is using a
>> 7 bit standard (ASCII) today.
>
> Technically they are, since the various more recent standards they use
> contain ASCII as a subset and generally reduce to ASCII if you strip the
> high bit off (code pages) or the high byte and highest remaining bit (16-
> bit encodings). So they are using ASCII and sometimes some additional
> stuff that encloses and contains ASCII.

They are not using ASCII.

They are using something that is backwards compatible
with ASCII.
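The distinction being argued over can be checked directly in Java. This small sketch (hypothetical class `AsciiCompat`) tests whether a charset encodes each of the 128 ASCII characters as that same single byte, and shows that a non-ASCII character such as é is encoded differently by each charset. It only uses charsets that every Java platform is required to support, which is why windows-1252 is left out.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class AsciiCompat {
    /** True if every 7-bit ASCII character (0-127) encodes to that same single byte. */
    static boolean isAsciiCompatible(Charset cs) {
        for (int c = 0; c < 128; c++) {
            byte[] b = String.valueOf((char) c).getBytes(cs);
            if (b.length != 1 || b[0] != c) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Charset[] charsets = {
            StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_8, StandardCharsets.UTF_16BE
        };
        for (Charset cs : charsets) {
            System.out.println(cs + ": ASCII-compatible=" + isAsciiCompatible(cs)
                + ", U+00E9 -> " + Arrays.toString("\u00e9".getBytes(cs)));
        }
    }
}
```

On a standard JDK this should report US-ASCII, ISO-8859-1 and UTF-8 as ASCII-compatible and UTF-16BE as not. US-ASCII has no byte for é at all and substitutes `?`, while ISO-8859-1 and UTF-8 encode it differently from each other.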

Arne
Arne 2/25/2011 5:19:18 PM

On 24-02-2011 23:54, Ken Wesson wrote:
> On Thu, 24 Feb 2011 20:52:37 -0500, Arne Vajhøj wrote:
>
>> On 24-02-2011 13:46, Ken Wesson wrote:
>>> On Thu, 24 Feb 2011 15:14:44 +0100, Lars Enderin wrote:
>>>
>>>> 2011-02-24 15:00, Ken Wesson skrev:
>>>>> I can't remember the last time I had to interoperate with any machine
>>>>> that had anything other than standard ASCII as the native format for
>>>>> text files. It's gotta be decades.
>>>>
>>>> ASCII character values are limited to the 0-127 range. That's an
>>>> outdated "standard".
>>>
>>> Well, these days we use the 8th bit for accented characters instead of
>>> just wasting it.
>>
>> Then it is not ASCII.
>
> It contains ASCII as a subset.
>
> So it is ASCII. And more.

That makes it not ASCII.

A Java 1.6 app is not a Java 1.2.2 app just because
some of the functionality was present in Java 1.2.2
as well.

>>>            Technically it's not your granddaddy's ASCII with that
>>> in use, but it's close enough for government work, and certainly close
>>> enough not to mess with using tests for CR/LF to detect line
>>> boundaries.
>>
>> The character set and the record format are independent of each other.
>
> Record formats are not relevant here, since text files do not have record
> formats;

Lines are a record format.

>      they are raw sequences in some character set more or less by
> definition. Anything with additional structure over and above that is
> something other than a text file. Generically we call such things "binary
> files" though commonly binary files do *contain* text. But all contain
> additional structure that cannot be represented in, say, a
> java.lang.String without resort to some form of escaping or encoding. And
> that makes them not pure text, but text-and-some-other-stuff or some-
> other-stuff-that-happens-to-contain-text.

Not true.

You can easily verify that by having a Java program read
such a file: the lines are read into Strings just fine.
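A minimal sketch of the portable but not-so-clever approach: let the Java runtime deal with line terminators and keep only the last line read. `BufferedReader.readLine()` accepts `\n`, `\r` and `\r\n`; on platforms with record-oriented files this assumes, as described above, that the runtime presents each record as a line. The catch is that it reads all n-1 preceding lines, which is exactly what the original question hoped to avoid. The class name is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

final class LastLineByReading {
    /** Reads every line and keeps the last one; returns null if there are no lines. */
    static String lastLine(Reader in) throws IOException {
        try (BufferedReader br = new BufferedReader(in)) {
            String last = null;
            String line;
            while ((line = br.readLine()) != null) {  // accepts \n, \r and \r\n
                last = line;
            }
            return last;
        }
    }
}
```

For a file, the reader would typically come from something like `Files.newBufferedReader(path, StandardCharsets.UTF_8)`, with the charset being an assumption about how the file was written.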

Arne


Arne 2/25/2011 5:26:13 PM

On 24-02-2011 23:57, Ken Wesson wrote:
> On Thu, 24 Feb 2011 20:55:44 -0500, Arne Vajhøj wrote:
>> On 24-02-2011 13:43, Ken Wesson wrote:
>>> On Thu, 24 Feb 2011 14:49:22 +0000, RedGrittyBrick wrote:
>>>> On 24/02/2011 14:00, Ken Wesson wrote:
>>>>> Windows text files are flat ASCII files (with CRLF line ends).
>>>>
>>>> Actually I find that, nowadays, lots of text files on Windows are
>>>> so-called 'ANSI' (mostly CP-1252)
>>>
>>> Same difference.
>>
>> Completely different char set.
>
> Funny that something so "completely different" intersects with ASCII in
> the entirety of ASCII's range (0-127). It just specifies what 128-255
> mean instead of leaving those values undefined. Unicode specifies what
> 128-65535 mean and still intersects with ASCII on 0-127.

Occasionally backwards compatibility is a design goal.

If you knew about programming, then you would have
seen that before.

Arne

Arne 2/25/2011 5:27:09 PM

On 25-02-2011 00:00, Ken Wesson wrote:
> On Thu, 24 Feb 2011 20:58:24 -0500, Arne Vajhøj wrote:
>
>> On 24-02-2011 19:12, Lew wrote:
>>> On 02/24/2011 09:49 AM, RedGrittyBrick wrote:
>>>> On 24/02/2011 14:00, Ken Wesson wrote:
>>>>> Windows text files are flat ASCII files (with CRLF line ends).
>>>>
>>>> Actually I find that, nowadays, lots of text files on Windows are
>>>> so-called
>>>> 'ANSI' (mostly CP-1252) or 'Unicode' (usually meaning UTF-16 with
>>>> BOM).
>>>>
>>>> Even on my ancient XP boxes, Notepad offers only ANSI, Unicode,
>>>> Unicode big-endian and UTF-8. Wordpad offers RTF, Text-Document (turns
>>>> out to be CP-1252), Text-Document DOS format (turns out to be CP-850)
>>>> and Unicode. No
>>>> ASCII.
>>>
>>> Windows hasn't used ASCII in decades.
>>
>> I don't think it ever have.
>
> Funny then that bog-standard ASCII files seem to read and write just fine
> in Notepad on the occasions that I use Windows computers.

That just means that it uses something ASCII-compatible, not that
it uses ASCII.

And you can easily verify that it indeed supports characters
not part of ASCII.

>> DOS used CP-437, CP-850 etc..
>>
>> 32/64 bit Windows uses CP-1252 (which is practically the same as
>> ISO-8859-1) and some UTF-16.
>
> All of those seem to be ASCII plus another up to 128 characters, or in
> the case of UTF-16, another up to 65408 characters.
>
> Saying that a 7-bit-clean file interpreted in one of those is not ASCII
> is like saying that humans are not mammals.

And?

No one is saying that such a file is not ASCII.

We are saying that the system is not reading ASCII. It reads
a character set that is backwards compatible with ASCII.

Arne

PS: UTF-16 is *not* ASCII compatible.
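To make the UTF-16 point concrete for this thread: a backward byte scan for 0x0A, as in the RandomAccessFile tricks, is only meaningful in ASCII-compatible encodings. In UTF-16BE the newline is the two bytes 00 0A, and unrelated characters can also contain a 0x0A byte; U+0A05 (GURMUKHI LETTER A) is encoded as 0A 05. This sketch (hypothetical class name) shows a naive scan landing on the wrong byte.

```java
import java.nio.charset.StandardCharsets;

final class Utf16NewlineHazard {
    /** Index of the last 0x0A byte: what a naive backward byte scan would treat as a newline. */
    static int lastLfByte(byte[] data) {
        for (int i = data.length - 1; i >= 0; i--) {
            if (data[i] == 0x0A) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String text = "first\n\u0A05";  // ends with U+0A05, GURMUKHI LETTER A
        System.out.println("UTF-8:    " + lastLfByte(text.getBytes(StandardCharsets.UTF_8)));
        System.out.println("UTF-16BE: " + lastLfByte(text.getBytes(StandardCharsets.UTF_16BE)));
    }
}
```

In UTF-8 the scan finds byte 5, the real newline. In UTF-16BE it finds byte 12, which is the first byte of U+0A05; even the real newline's 0x0A sits at an odd offset (11), so splitting there would misalign every character after it.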
Arne 2/25/2011 5:30:23 PM

On 25-02-2011 00:19, Ken Wesson wrote:
> On Thu, 24 Feb 2011 17:11:02 -0500, Michael Wojcik wrote:
>> And the majority of business transactions
>
> have no bearing on this discussion, which has to do with the majority of
> *computers* and, secondarily, what will be encountered routinely by the
> majority of *IT workers*.

Well, the topic was market share.

Market share is counted in dollars.

And since somebody is willing to pay a lot more for a mainframe
running an entire bank than for somebody to be able to read email,
then counting computers does not really reflect market share.

Arne
Arne 2/25/2011 5:33:16 PM

On 25-02-2011 00:06, Ken Wesson wrote:
> On Thu, 24 Feb 2011 13:12:36 -0700, Jim Janney wrote:
>> Ken Wesson<kwesson@gmail.com>  writes:
>>> On Thu, 24 Feb 2011 09:18:06 -0500, Michael Wojcik wrote:
>>>
>>>> On the IBM i machines (formerly i Series, formerly System i, formerly
>>>> AS/400, successor to the System/3x), blah blah blah
>>>
>>> You're one to talk about provincialism. Who the hell uses these ancient
>>> museum pieces any more?
>>
>> Um, that would be me, or rather my employer's customers.
>
> Your employer may happen to be using such legacy systems, but I very much
> doubt that very many people deal with them in an IT capacity. Far, *far*
> fewer than deal with Unix, Windows, and Mac boxes in such a capacity.
>
> How many end-users interact indirectly with these systems is of course
> irrelevant.

Not really - the high number of end users means that the company
is willing to pay a lot of money for those systems, which impacts
the market share.

Arne
Arne 2/25/2011 5:34:43 PM

On 24-02-2011 23:48, Ken Wesson wrote:
> On Thu, 24 Feb 2011 20:50:55 -0500, Arne Vajhøj wrote:
>> On 24-02-2011 15:49, Tom Anderson wrote:
>>> On Fri, 25 Feb 2011, Peter Duniho wrote:
>>>> On 2/24/11 10:42 PM, Lars Enderin wrote:
>>>>> 2011-02-24 15:26, Peter Duniho skrev:
>>>>>> On 2/24/11 10:14 PM, Lars Enderin wrote:
>>>>>>> ASCII character values are limited to the 0-127 range. That's an
>>>>>>> outdated "standard".
>>>>>>
>>>>>> Used by "obsolete systems". A key point in my amusement. :)
>>>>>
>>>>> I thought so, but Ken seemed to need an explanation.
>>>>
>>>> Yes, and it was a good explanation. Unfortunately, I don't think he
>>>> understood the explanation, nor do I think he will understand further
>>>> clarification. I think it more likely that the harder anyone tries to
>>>> explain to him these points, the more dug in his heels will be.
>>>>
>>>> To do otherwise would necessarily require an admission that there's no
>>>> single "text file" format, and that even if there were, ASCII or any
>>>> of the single-byte derivatives thereof ain't it. I don't see any way
>>>> such an admission would ever be produced.
>>>
>>> There is a single text file format: lines of characters in some
>>> encoding, terminated by an end-of-line sequence which is
>>> distinguishable from any other characters.
>>>
>>> It's merely the case that some current mainframes, and some obscure or
>>> historical systems, do not store text in text files!
>>
>> No.
>>
>> There are also count prefix (and sometimes suffix) formats.
>
> Those aren't text files. Text is, notionally, a string of characters,
> including perhaps spaces and line-end characters. A text file is
> therefore a file whose content is a string of characters, including
> perhaps spaces and line-end characters. Such a thing is, logically, the
> only native way to represent raw text. Anything more structured is
> obviously not a plain text file. It may be a text-containing file of some
> kind but it is not a text file.

A text file is something you read and write as lines of text.

Whether the system uses LF delimiters, CR LF delimiters or
a counted approach does not matter.

>> They have the advantage of begin able to actually have all possible
>> values in lines.
>
> That's nonsense. The only character a normal text file cannot have in
> lines is a line break, and in actual fact you cannot have a line break in
> the middle of a line *by definition*. Wherever there is a line break one
> line ENDS and another one BEGINS, *by definition*. If that weren't the
> case then it wouldn't be a line break!

No.

newline is a code 10 in many char sets.

It is perfectly valid as content in the middle of a line on
older MacOS systems (because they use another line delimiter)
and on all systems using count prefixes (no line delimiter at
all).
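A minimal sketch of a count-prefixed format, to show why a newline byte can sit in the middle of such a line. The layout is hypothetical and does not reproduce any particular OS: each record is a 2-byte big-endian length followed by that many bytes of UTF-8. Finding the last record means hopping from prefix to prefix from the start of the file; the bodies can be skipped with `seek`, but the n-1 prefixes still have to be read. A format that also stores the count after each record (the "suffix" variant mentioned above) could be walked backwards instead.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical format: each record is a 2-byte big-endian length followed by
// that many bytes of UTF-8 text. There are no delimiters, so a 0x0A byte
// inside a record is ordinary data.
final class CountedRecords {
    static byte[] encode(String... records) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (String r : records) {
            byte[] b = r.getBytes(StandardCharsets.UTF_8);
            if (b.length > 0xFFFF) {
                throw new IOException("record too long for a 2-byte count");
            }
            out.writeShort(b.length);  // the count prefix
            out.write(b);
        }
        out.flush();
        return bytes.toByteArray();
    }

    /** Hops from prefix to prefix, skipping bodies, then reads only the last body. */
    static String lastRecord(RandomAccessFile raf) throws IOException {
        long pos = 0;
        long lastStart = -1;
        int lastLen = 0;
        while (pos < raf.length()) {
            raf.seek(pos);
            lastLen = raf.readUnsignedShort();  // every earlier prefix must still be read
            lastStart = pos + 2;
            pos = lastStart + lastLen;          // skip the body without reading it
        }
        if (lastStart < 0) {
            return null;  // empty file
        }
        byte[] body = new byte[lastLen];
        raf.seek(lastStart);
        raf.readFully(body);
        return new String(body, StandardCharsets.UTF_8);
    }
}
```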

> So there is no "advantage" here. What you are actually describing is a
> "list-of-strings" file, not a text file

A text file is a list of strings.

> Face it: those record-oriented file formats are not text files. They have
> additional structure that cannot be represented natively in a String,

Neither can delimited files.

> therefore represent more than just a String, such as a collection of
> Strings, and therefore are not text files but something else -- archives
> of multiple text files bundled into single files.
>
> The main use for such a thing over plain ordinary text files that I can
> think of is storing a mailbox without resorting to hacks that behave
> oddly when lines in the bodies start with the word "from". And these days
> filesystems work fine with large directories full of tiny files, so
> there's less need for that sort of thing than there once was.
>
>> And the disadvantage of various hacks assuming all records use
>> delimiters does not work.
>
> Nobody is assuming records use delimiters. They are assuming text files
> are text files. The lines in text files use delimiters as an inherent
> property.

No.

That is an illusion that you seem to have.

>        If you have a text in a String, seeking backward from the end
> until a newline character (or the beginning of the String, whichever you
> hit first) will reliably find the start of the last line in the String.

No.

It will not work on systems that use CR as the line delimiter or on
systems using count-prefixed lines.

> The same is true of any disk file format that faithfully represents the
> String as a flat string of text rather, and in particular of the formats
> commonly used to store, e.g., C source files.

Wrong.

C source files are stored using a count-prefixed line format on systems
that use such formats.
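For delimited files, the backwards search suggested at the start of the thread might be sketched like this in Java, assuming an ASCII-compatible encoding such as UTF-8 or ISO-8859-1. It treats `\n`, `\r\n` and a lone `\r` as terminators and ignores one trailing terminator; as pointed out above, it cannot cope with count-prefixed records (or with UTF-16). It does one `seek` per byte for clarity, where a real implementation would read larger blocks from the end. The class name `TailLine` is hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.Charset;

final class TailLine {
    /**
     * Last line of a delimited text file, found by scanning backwards from the end.
     * Accepts \n, \r\n and a lone \r as terminators and ignores one trailing terminator.
     * Assumes an ASCII-compatible encoding (e.g. UTF-8, ISO-8859-1); it does not
     * handle UTF-16 or count-prefixed records. Returns null for an empty file.
     */
    static String lastLine(RandomAccessFile raf, Charset cs) throws IOException {
        long end = raf.length();
        if (end == 0) {
            return null;
        }
        // Ignore one trailing terminator: \n, \r\n or a lone \r.
        if (byteAt(raf, end - 1) == '\n') {
            end--;
        }
        if (end > 0 && byteAt(raf, end - 1) == '\r') {
            end--;
        }
        // Walk back to just after the previous terminator, or to the start of the file.
        long start = end;
        while (start > 0) {
            int b = byteAt(raf, start - 1);
            if (b == '\n' || b == '\r') {
                break;
            }
            start--;
        }
        byte[] line = new byte[(int) (end - start)];
        raf.seek(start);
        raf.readFully(line);
        return new String(line, cs);
    }

    // One seek and one read per byte: simple, but slow for long lines.
    private static int byteAt(RandomAccessFile raf, long pos) throws IOException {
        raf.seek(pos);
        return raf.read();
    }
}
```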

Arne
Arne 2/25/2011 5:51:12 PM

On 24-02-2011 23:51, Ken Wesson wrote:
> On Thu, 24 Feb 2011 21:42:10 -0500, Eric Sosman wrote:
>
>>       I think it's amusing that he says "All the world's ASCII,"
>
> Who says "all the world's ASCII", Sosman? I can't recall anybody doing so
> in this group recently.
>
> It is true that almost all the world seems to use encodings that contain
> ASCII as a subset. That is not quite the same thing.

Somebody with the name of Ken Wesson wrote:

# Since those days, the world has standardized on ASCII flat files for
# text files.

# Windows text files are flat ASCII files (with CRLF line ends). Mac text
# files are flat ASCII files (with CR line ends). Unix text files are flat
# ASCII files (with LF line ends).

Arne

Arne 2/25/2011 5:53:49 PM

On 25/02/2011 05:50, Ken Wesson allegedly wrote:
> On Thu, 24 Feb 2011 20:14:49 +0100, Daniele Futtorovic wrote:
>
>> On 24/02/2011 19:46, Ken Wesson allegedly wrote:
>>> it's not (...) ASCII (...).
>
> Alleged by whom? That distorted quote is most certainly not what I wrote.

Alleged by my Usenet provider.

I was trying to extract the wisdom in your postings. Give me some credit
here. That quote is most certainly what you (pertinently) wrote, minus
the fluff.

And please, I beg of you sincerely and benevolently, stop acting like
such a loonie.

-- 
DF.
Daniele 2/25/2011 6:07:45 PM

On 25/02/2011 15:36, Arne Vajhøj allegedly wrote:
>
> This is an IT group.
>
> Not a group for hairdressers or chefs.
>
> This mean that we use exact terms.

Dear Mr. Vajhøj,

We'll see you in court.

Yours frivolously,

D. Futtorovic
Chief Representative of the Local (Pubic) Hairdresser's Union

Daniele 2/25/2011 6:16:50 PM

On Feb 25, 1:07 pm, Daniele Futtorovic <da.futt.n...@laposte-dot-net.invalid> wrote:
> On 25/02/2011 05:50, Ken Wesson allegedly wrote:
>
> > On Thu, 24 Feb 2011 20:14:49 +0100, Daniele Futtorovic wrote:
>
> >> On 24/02/2011 19:46, Ken Wesson allegedly wrote:
> >>> it's not (...) ASCII (...).
>
> > Alleged by whom? That distorted quote is most certainly not what I wrote.
>
> Alleged by my Usenet provider.
>
> I was trying to extract the wisdom in your postings. Give me some credit
> here. That quote is most certainly what you (pertinently) wrote, minus
> the fluff.
>
> And please, I beg of you sincerely and benevolently, stop acting like
> such a loonie.
>

"Your grand-daddy's ASCII" is exactly today's ASCII.  Ergo, "It's not
your grand-daddy's ASCII" is exactly "It's not ASCII".

--
Lew
Lew 2/25/2011 6:38:52 PM

Arne Vajhøj <arne@vajhoej.dk> writes:

> On 25-02-2011 00:06, Ken Wesson wrote:
>> On Thu, 24 Feb 2011 13:12:36 -0700, Jim Janney wrote:
>>> Ken Wesson<kwesson@gmail.com>  writes:
>>>> On Thu, 24 Feb 2011 09:18:06 -0500, Michael Wojcik wrote:
>>>>
>>>>> On the IBM i machines (formerly i Series, formerly System i, formerly
>>>>> AS/400, successor to the System/3x), blah blah blah
>>>>
>>>> You're one to talk about provincialism. Who the hell uses these ancient
>>>> museum pieces any more?
>>>
>>> Um, that would be me, or rather my employer's customers.
>>
>> Your employer may happen to be using such legacy systems, but I very much
>> doubt that very many people deal with them in an IT capacity. Far, *far*
>> fewer than deal with Unix, Windows, and Mac boxes in such a capacity.
>>
>> How many end-users interact indirectly with these systems is of course
>> irrelevant.
>
> Not really - the high number of end users mean that the company
> is willing to pay a lot of money for those systems, which impacts
> the market share.

Indeed.  And let's not forget where a lot of Eclipse funding comes from.

-- 
Jim Janney
Jim 2/25/2011 7:59:10 PM

On Fri, 25 Feb 2011, Daniele Futtorovic wrote:

> On 25/02/2011 15:36, Arne Vajhøj allegedly wrote:
>
>> This is an IT group.
>> 
>> Not a group for hairdressers or chefs.
>> 
>> This mean that we use exact terms.
>
> Dear Mr. Vajhøj,
>
> We'll see you in court.
>
> Yours frivolously,
>
> D. Futtorovic
> Chief Representative of the Local (Pubic) Hairdresser's Union

My members wish to join you in this suit.

t. Anderson
Chair of Delegates, cljp Local (Pubic) Chef's Union

-- 
Formal logical proofs, and therefore programs - formal logical proofs
that particular computations are possible, expressed in a formal system
called a programming language - are utterly meaningless. To write a
computer program you have to come to terms with this, to accept that
whatever you might want the program to mean, the machine will blindly
follow its meaningless rules and come to some meaningless conclusion. --
Dehnadi and Bornat
Tom 2/25/2011 9:03:36 PM

On 25-02-2011 16:03, Tom Anderson wrote:
> On Fri, 25 Feb 2011, Daniele Futtorovic wrote:
>
>> On 25/02/2011 15:36, Arne Vajhøj allegedly wrote:
>>
>>> This is an IT group.
>>>
>>> Not a group for hairdressers or chefs.
>>>
>>> This mean that we use exact terms.
>>
>> Dear Mr. Vajhøj,
>>
>> We'll see you in court.
>>
>> Yours frivolously,
>>
>> D. Futtorovic
>> Chief Representative of the Local (Pubic) Hairdresser's Union
>
> My members wish to join you in this suit.
>
> t. Anderson
> Chair of Delegates, cljp Local (Pubic) Chef's Union

I think I have a problem.

Microwave food and DIY hair cut for the rest of my life ...

:-)

Arne

Arne 2/25/2011 9:35:32 PM

On Fri, 25 Feb 2011, Lew wrote:

> On Feb 25, 1:07 pm, Daniele Futtorovic <da.futt.n...@laposte-dot-net.invalid> wrote:
>> On 25/02/2011 05:50, Ken Wesson allegedly wrote:
>>
>>> On Thu, 24 Feb 2011 20:14:49 +0100, Daniele Futtorovic wrote:
>>
>>>> On 24/02/2011 19:46, Ken Wesson allegedly wrote:
>>>>> it's not (...) ASCII (...).
>>>
>>> Alleged by whom? That distorted quote is most certainly not what I wrote.
>>
>> Alleged by my Usenet provider.
>>
>> I was trying to extract the wisdom in your postings. Give me some credit
>> here. That quote is most certainly what you (pertinently) wrote, minus
>> the fluff.
>>
>> And please, I beg of you sincerely and benevolently, stop acting like 
>> such a loonie.
>
> "Your grand-daddy's ASCII" is exactly today's ASCII.

My grandfather's ASCII would probably have been ASCII-1963, which is not 
today's ASCII.

Actually, my grandfather's ASCII would probably have been ITA2, IYSWIM.

tom

-- 
Formal logical proofs, and therefore programs - formal logical proofs
that particular computations are possible, expressed in a formal system
called a programming language - are utterly meaningless. To write a
computer program you have to come to terms with this, to accept that
whatever you might want the program to mean, the machine will blindly
follow its meaningless rules and come to some meaningless conclusion. --
Dehnadi and Bornat
Tom 2/25/2011 9:43:12 PM

On Fri, 25 Feb 2011, Arne Vajhøj wrote:

> On 25-02-2011 05:06, Tom Anderson wrote:
>> On Thu, 24 Feb 2011, Arne Vajhøj wrote:
>>> On 24-02-2011 17:11, Michael Wojcik wrote:
>>>> Ken Wesson wrote:
>>>>> There is nothing at all prominent about those IBM dinosaurs. They may
>>>>> have been prominent 30 years ago, but not now.
>>>> 
>>>> Tell that to the many thousands of organizations that still use them.
>>>> 
>>>> And the majority of business transactions still runs on IBM mainframe
>>>> and midrange systems, and similar offerings from other companies.
>>>> 
>>>> IBM had just shy of $100B in sales last year. A good chunk of that was
>>>> from mainframes: mainframe sales were up 68% from 2009, to the best
>>>> level in six years.
>>> 
>>> The biggest chunk of IBM's revenue is services.
>>> 
>>> But they still sell a lot of big iron.
>> 
>> Do they actually sell them? What happened to the leasing model?
>
> Good question.
>
> I don't know if they sell or lease them out.
>
> IBM deliver boxes to customers and get a ton of money in return.

Well, that's them, FedEx, and Emperors Club then.

tom

-- 
Formal logical proofs, and therefore programs - formal logical proofs
that particular computations are possible, expressed in a formal system
called a programming language - are utterly meaningless. To write a
computer program you have to come to terms with this, to accept that
whatever you might want the program to mean, the machine will blindly
follow its meaningless rules and come to some meaningless conclusion. --
Dehnadi and Bornat
Tom 2/25/2011 9:45:12 PM

On 25/02/2011 19:38, Lew allegedly wrote:
> On Feb 25, 1:07 pm, Daniele Futtorovic <da.futt.n...@laposte-dot-net.invalid> wrote:
>> On 25/02/2011 05:50, Ken Wesson allegedly wrote:
>>
>>> On Thu, 24 Feb 2011 20:14:49 +0100, Daniele Futtorovic wrote:
>>
>>>> On 24/02/2011 19:46, Ken Wesson allegedly wrote:
>>>>> it's not (...) ASCII (...).
>>
>>> Alleged by whom? That distorted quote is most certainly not what I wrote.
>>
>> Alleged by my Usenet provider.
>>
>> I was trying to extract the wisdom in your postings. Give me some credit
>> here. That quote is most certainly what you (pertinently) wrote, minus
>> the fluff.
>>
>> And please, I beg of you sincerely and benevolently, stop acting like
>> such a loonie.
>>
>
> "Your grand-daddy's ASCII" is exactly today's ASCII.  Ergo, "It's not
> your grand-daddy's ASCII" is exactly "It's not ASCII".

Precisely. ASCII is not a /technique/, it's a _standard_.

It's /also/ a technique, of course, but only secondarily so: a technique 
that derives from a standard.

-- 
DF.
Daniele 2/25/2011 9:47:32 PM

On 25/02/2011 22:35, Arne Vajhøj allegedly wrote:
> On 25-02-2011 16:03, Tom Anderson wrote:
>> On Fri, 25 Feb 2011, Daniele Futtorovic wrote:
>>
>>> On 25/02/2011 15:36, Arne Vajhøj allegedly wrote:
>>>
>>>> This is an IT group.
>>>>
>>>> Not a group for hairdressers or chefs.
>>>>
>>>> This mean that we use exact terms.
>>>
>>> Dear Mr. Vajhøj,
>>>
>>> We'll see you in court.
>>>
>>> Yours frivolously,
>>>
>>> D. Futtorovic
>>> Chief Representative of the Local (Pubic) Hairdresser's Union
>>
>> My members wish to join you in this suit.
>>
>> t. Anderson
>> Chair of Delegates, cljp Local (Pubic) Chef's Union
>
> I think I have a problem.
>
> Microwave food and DIY hair cut for the rest of my life ...

Geek.

> :-)

Daniele 2/25/2011 9:51:17 PM

On Fri, 25 Feb 2011, Arne Vajhøj wrote:

> On 25-02-2011 00:38, Ken Wesson wrote:
>> On Thu, 24 Feb 2011 20:39:31 -0500, Arne Vajhøj wrote:
>> 
>>> On 23-02-2011 22:23, Ken Wesson wrote:
>>>> On Wed, 23 Feb 2011 21:21:42 -0500, Arne Vajhøj wrote:
>>>>> On 23-02-2011 10:59, Robin Wenger wrote:
>>>>>> Is it possible to read the last text line from a text file WITHOUT
>>>>>> reading the previous (n-1) lines?
>>>>> 
>>>>> In general no.
>>>>> 
>>>>> All the RandomAccessFile tricks are based on assumptions about lines
>>>>> being separated by something - they do not work with record formats
>>>>> that contains a line length instead of a delimiter.
>>>> 
>>>> "Record formats" are not relevant here,
>>> 
>>> They are
>> 
>> They are not, since files in record formats are not text files.
>
> Given that:
>  data + LF
>  data + CR + LF
> are alo record formats then that is nonsense.

The thing about CR and LF is that lineprinters, and things which are 
pretending to be lineprinters, like terminal emulators and text editors, 
know how to deal with them; they write the next character lower down 
and/or at the start of the line. They aren't record separators, they're 
format effectors (ASCII does have record separators - an impressive range 
of them, in fact - but i don't know of anybody using them).
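
For reference, the ASCII separators Tom mentions are FS (0x1C), GS (0x1D), RS (0x1E) and US (0x1F). A minimal sketch of how data delimited by RS and US could be split in Java; the class and method names are hypothetical, and it assumes the separator characters never occur inside the data itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class AsciiSeparators {
    static final char RS = '\u001E'; // ASCII record separator
    static final char US = '\u001F'; // ASCII unit separator

    // Splits data into records on RS, then each record into units on US.
    // The limit of -1 keeps trailing empty strings, so empty units survive.
    static List<List<String>> parse(String data) {
        List<List<String>> records = new ArrayList<>();
        for (String record : data.split(String.valueOf(RS), -1)) {
            records.add(Arrays.asList(record.split(String.valueOf(US), -1)));
        }
        return records;
    }

    public static void main(String[] args) {
        String data = "name" + US + "age" + RS + "Ada" + US + "36";
        System.out.println(parse(data)); // [[name, age], [Ada, 36]]
    }
}
```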

What happens if you send one of these alleged text files from a mainframe 
to a printer or a shell? Do the printers and shells in mainframe land 
handle those formats, or does there have to be a program that reads the 
format and then talks to the printer? Or does that all happen down in the 
OS? How does the lineprinter know to move the golf ball across the paper 
when it gets to the end of a record?

tom

-- 
Formal logical proofs, and therefore programs - formal logical proofs
that particular computations are possible, expressed in a formal system
called a programming language - are utterly meaningless. To write a
computer program you have to come to terms with this, to accept that
whatever you might want the program to mean, the machine will blindly
follow its meaningless rules and come to some meaningless conclusion. --
Dehnadi and Bornat
0
Reply Tom 2/25/2011 10:41:18 PM

On Fri, 25 Feb 2011, Arne Vajhøj wrote:

> On 25-02-2011 05:03, Tom Anderson wrote:
>> On Thu, 24 Feb 2011, Arne Vajhøj wrote:
>> 
>>> On 24-02-2011 15:49, Tom Anderson wrote:
>>>> On Fri, 25 Feb 2011, Peter Duniho wrote:
>>>> 
>>>>> On 2/24/11 10:42 PM, Lars Enderin wrote:
>>>>>> 2011-02-24 15:26, Peter Duniho skrev:
>>>>>>> On 2/24/11 10:14 PM, Lars Enderin wrote:
>>>>>>>> ASCII character values are limited to the 0-127 range. That's an
>>>>>>>> outdated "standard".
>>>>>>> 
>>>>>>> Used by "obsolete systems". A key point in my amusement. :)
>>>>>> 
>>>>>> I thought so, but Ken seemed to need an explanation.
>>>>> 
>>>>> Yes, and it was a good explanation. Unfortunately, I don't think he
>>>>> understood the explanation, nor do I think he will understand further
>>>>> clarification. I think it more likely that the harder anyone tries to
>>>>> explain to him these points, the more dug in his heels will be.
>>>>> 
>>>>> To do otherwise would necessarily require an admission that there's no
>>>>> single "text file" format, and that even if there were, ASCII or any
>>>>> of the single-byte derivatives thereof ain't it. I don't see any way
>>>>> such an admission would ever be produced.
>>>> 
>>>> There is a single text file format: lines of characters in some
>>>> encoding, terminated by an end-of-line sequence which is distinguishable
>>>> from any other characters.
>>>> 
>>>> It's merely the case that some current mainframes, and some obscure or
>>>> historical systems, do not store text in text files!
>>> 
>>> No.
>> 
>> Yes.
>> 
>>> There are also count prefix (and sometimes suffix) formats.
>> 
>> Which, although they may be used to store text, are not text files.
>
> Of course they are text files.
>
> If I edit Foobar.java in a text editor and write a Java program and 
> save it, then why should it be less of a text file, because the record 
> format used on that system is not delimited?

If i edit Foobar.java in Google Docs and write a Java program and save it, 
then why should it be less of a text file, because it's stored in some 
mysterious cloud database?

Or how about:

$ dbm Foobar.java init
$ dbm Foobar.java set 1 "public class Foobar"
$ dbm Foobar.java set 2 "{"
$ dbm Foobar.java set 3 "}"

?

That a file has text somewhere in it does not make it a text file.

tom

-- 
The girlfriend of my friend is my enemy. -- old Arabic proverb
0
Reply Tom 2/25/2011 10:46:10 PM

On Fri, 25 Feb 2011, Michael Wojcik wrote:

> Tom Anderson wrote:
>>
>>> There are also count prefix (and sometimes suffix) formats.
>>
>> Which, although they may be used to store text, are not text files.
>
> And with what do you support your claim for this definition of "text
> file"?
>
> I hope it's something more solid than KW's flailing appeals to "notion" 
> and the like, which are unsupported by contemporary or historical uses 
> of the term "text", in the computing disciplines or more broadly. Have 
> you something better to offer?

Merely my observations of the usage of the term by people.

tom

-- 
The girlfriend of my friend is my enemy. -- old Arabic proverb
0
Reply Tom 2/25/2011 10:51:50 PM

On 25-02-2011 17:46, Tom Anderson wrote:
> On Fri, 25 Feb 2011, Arne Vajhøj wrote:
>
>> On 25-02-2011 05:03, Tom Anderson wrote:
>>> On Thu, 24 Feb 2011, Arne Vajhøj wrote:
>>>
>>>> On 24-02-2011 15:49, Tom Anderson wrote:
>>>>> On Fri, 25 Feb 2011, Peter Duniho wrote:
>>>>>
>>>>>> On 2/24/11 10:42 PM, Lars Enderin wrote:
>>>>>>> 2011-02-24 15:26, Peter Duniho skrev:
>>>>>>>> On 2/24/11 10:14 PM, Lars Enderin wrote:
>>>>>>>>> ASCII character values are limited to the 0-127 range. That's an
>>>>>>>>> outdated "standard".
>>>>>>>>
>>>>>>>> Used by "obsolete systems". A key point in my amusement. :)
>>>>>>>
>>>>>>> I thought so, but Ken seemed to need an explanation.
>>>>>>
>>>>>> Yes, and it was a good explanation. Unfortunately, I don't think he
>>>>>> understood the explanation, nor do I think he will understand further
>>>>>> clarification. I think it more likely that the harder anyone tries to
>>>>>> explain to him these points, the more dug in his heels will be.
>>>>>>
>>>>>> To do otherwise would necessarily require an admission that there's
>>>>>> no single "text file" format, and that even if there were, ASCII or
>>>>>> any of the single-byte derivatives thereof ain't it. I don't see any
>>>>>> way such an admission would ever be produced.
>>>>>
>>>>> There is a single text file format: lines of characters in some
>>>>> encoding, terminated by an end-of-line sequence which is
>>>>> distinguishable from any other characters.
>>>>>
>>>>> It's merely the case that some current mainframes, and some obscure or
>>>>> historical systems, do not store text in text files!
>>>>
>>>> No.
>>>
>>> Yes.
>>>
>>>> There are also count prefix (and sometimes suffix) formats.
>>>
>>> Which, although they may be used to store text, are not text files.
>>
>> Of course they are text files.
>>
>> If I edit Foobar.java in a text editor and write a Java program and
>> save it, then why should it be less of a text file, because the
>> record format used on that system is not delimited?
>
> If i edit Foobar.java in Google Docs and write a Java program and save
> it, then why should it be less of a text file, because it's stored in
> some mysterious cloud database?
>
> Or how about:
>
> $ dbm Foobar.java init
> $ dbm Foobar.java set 1 "public class Foobar"
> $ dbm Foobar.java set 2 "{"
> $ dbm Foobar.java set 3 "}"
>
> ?
>
> That a file has text somewhere in it does not make it a text file.

Well, the facts that:
- the Java compiler reads Java source in that format
- the C compiler reads C source in that format
- Java BufferedReader/FileReader readLine can read the files
- C fopen in text mode and fgets can read the files

seem to distinguish it a lot from what you mention.

Arne

0
Reply Arne 2/25/2011 11:08:11 PM

On 25-02-2011 17:41, Tom Anderson wrote:
> On Fri, 25 Feb 2011, Arne Vajhøj wrote:
>
>> On 25-02-2011 00:38, Ken Wesson wrote:
>>> On Thu, 24 Feb 2011 20:39:31 -0500, Arne Vajhøj wrote:
>>>
>>>> On 23-02-2011 22:23, Ken Wesson wrote:
>>>>> On Wed, 23 Feb 2011 21:21:42 -0500, Arne Vajhøj wrote:
>>>>>> On 23-02-2011 10:59, Robin Wenger wrote:
>>>>>>> Is it possible to read the last text line from a text file WITHOUT
>>>>>>> reading the previous (n-1) lines?
>>>>>>
>>>>>> In general no.
>>>>>>
>>>>>> All the RandomAccessFile tricks are based on assumptions about lines
>>>>>> being separated by something - they do not work with record formats
>>>>>> that contain a line length instead of a delimiter.
>>>>>
>>>>> "Record formats" are not relevant here,
>>>>
>>>> They are
>>>
>>> They are not, since files in record formats are not text files.
>>
>> Given that:
>> data + LF
>> data + CR + LF
>> are also record formats, then that is nonsense.
>
> The thing about CR and LF is that lineprinters, and things which are
> pretending to be lineprinters, like terminal emulators and text editors,
> know how to deal with them; they write the next character lower down
> and/or at the start of the line. They aren't record separators, they're
> format effectors (ASCII does have record separators - an impressive
> range of them, in fact - but i don't know of anybody using them).
>
> What happens if you send one of these alleged text files from a
> mainframe to a printer or a shell? Do the printers and shells in
> mainframe land handle those formats, or does there have to be a program
> that reads the format and then talks to the printer? Or does that all
> happen down in the OS? How does the lineprinter know to move the golf
> ball across the paper when it gets to the end of a record?

It is mostly transparent.

The program that needs to read the file reads the file
just as it would on any other platform.

Java readLine / C fgets / Fortran READ or whatever returns a line.

The underlying system calls handle the record format used for
physical storage of the line.
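
A minimal sketch of the portable approach Arne describes: read through BufferedReader.readLine and keep only the most recent line. It works regardless of whether lines end in LF, CR or CR LF, but it does read all n-1 preceding lines, which is exactly what the original question hoped to avoid. The class name and the UTF-8 charset are assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class LastLineByScan {
    // Returns the last line of the file, or null if the file is empty.
    // readLine() recognises LF, CR or CR LF as terminators and strips them.
    static String lastLine(Path file) throws IOException {
        String last = null;
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            for (String line = in.readLine(); line != null; line = in.readLine()) {
                last = line; // keep only the most recent line
            }
        }
        return last;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java LastLineByScan <file>");
            return;
        }
        System.out.println(lastLine(Paths.get(args[0])));
    }
}
```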

Arne


0
Reply Arne 2/25/2011 11:12:35 PM

On 25-02-2011 16:43, Tom Anderson wrote:
> On Fri, 25 Feb 2011, Lew wrote:
>
>> On Feb 25, 1:07 pm, Daniele Futtorovic <da.futt.n...@laposte-dot-
>> net.invalid> wrote:
>>> On 25/02/2011 05:50, Ken Wesson allegedly wrote:
>>>
>>>> On Thu, 24 Feb 2011 20:14:49 +0100, Daniele Futtorovic wrote:
>>>
>>>>> On 24/02/2011 19:46, Ken Wesson allegedly wrote:
>>>>>> it's not (...) ASCII (...).
>>>>
>>>> Alleged by whom? That distorted quote is most certainly not what I
>>>> wrote.
>>>
>>> Alleged by my Usenet provider.
>>>
>>> I was trying to extract the wisdom in your postings. Give me some credit
>>> here. That quote is most certainly what you (pertinently) wrote, minus
>>> the fluff.
>>>
>>> And please, I beg of you sincerely and benevolently, stop acting like
>>> such a loonie.
>>
>> "Your grand-daddy's ASCII" is exactly today's ASCII.
>
> My grandfather's ASCII would probably have been ASCII-1963, which is not
> today's ASCII.

Are there really such big differences between ASCII-1963, ASCII-1967 and ASCII-1986?

Arne
0
Reply Arne 2/25/2011 11:15:39 PM

On 11-02-25 06:46 PM, Tom Anderson wrote:
[ SNIP ]
>
> That a file has text somewhere in it does not make it a text file.
>
> tom
>
Without getting completely anal over it all, I'm reasonably content with 
what Wikipedia has to say about text files:

1. Structured as a sequence of lines, so either fixed-length lines or EOL 
characters, and often an EOF marker;

2. No additional metadata required to interpret a text file; they stand 
by themselves. This I think is more important than point 1;

As far as I am concerned, point 2 is central. To me "rich text" formats 
are an oxymoron.

AHS

-- 
We must recognize the chief characteristic of the modern era - a 
permanent state of what I call violent peace.
-- James D. Watkins
0
Reply Arved 2/25/2011 11:50:28 PM

Ken Wesson wrote :
> Record formats are not relevant here, since text files do not have record
> formats; they are raw sequences in some character set more or less by
> definition. Anything with additional structure over and above that is
> something other than a text file. Generically we call such things "binary
> files" though commonly binary files do *contain* text. But all contain
> additional structure that cannot be represented in, say, a
> java.lang.String without resort to some form of escaping or encoding. And
> that makes them not pure text, but text-and-some-other-stuff or some-
> other-stuff-that-happens-to-contain-text.

Text files which are pure text can still be structured, in that they 
have either:
- comma-delimited fields
- fixed-length field columns

java.lang.String quite happily holds this information in that a single 
line is a "record" and can be manipulated as pure text.

Breaking out "fields" is fairly trivial using standard character 
routines. Building them up and writing them out is also a purely text 
operation.
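
Wojtek's point that fields can be broken out with ordinary string routines might look like the sketch below. The class name and column widths are hypothetical, and the comma split is deliberately naive: it assumes no quoted fields containing commas, which real CSV permits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FieldSplitter {
    // Naive comma split; the limit of -1 keeps trailing empty fields.
    static List<String> splitCsv(String line) {
        return Arrays.asList(line.split(",", -1));
    }

    // Fixed-width columns of the given widths, surrounding blanks trimmed.
    // A short line yields shorter (possibly empty) fields instead of failing.
    static List<String> splitFixed(String line, int... widths) {
        List<String> fields = new ArrayList<>();
        int pos = 0;
        for (int w : widths) {
            int start = Math.min(pos, line.length());
            int end = Math.min(pos + w, line.length());
            fields.add(line.substring(start, end).trim());
            pos += w;
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(splitCsv("42,Smith,,NY"));
        System.out.println(splitFixed("42   Smith     NY", 5, 10, 2));
    }
}
```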

I will grant you that comma-delimited files have "coding" in that the 
comma has special meaning, but it still falls within the ASCII set.

I have also used HL7 (a medical information file format) which is also 
pure text yet has many information delimiters and needs to be 
extensively parsed to extract the information. If I remember right, a 
line feed is one of the delimiters and a carriage return is a record 
delimiter.

Let's see, CSS, *ML, log files, property files, all are pure ASCII text 
which nevertheless hold "field" information.

Emails (newsgroup postings) are also 7-bit ASCII and are pure text yet 
the headers hold record information. To get binary information you need 
to have an encoding standard, which is itself 7-bit.

I used to say that you could not get a virus from an email just by 
reading it because 7-bits could not hold a program. Then some innocent 
in Microsoft decided to implement auto-running scripts...

-- 
Wojtek :-)


0
Reply Wojtek 2/26/2011 1:40:46 AM

On 26/02/2011 00:50, Arved Sandstrom allegedly wrote:
> Without getting completely anal over it all, I'm reasonably content with
> what Wikipedia has to say about text files:
>
> 1. Structured as a sequence of lines, so either fixed-length lines or EOL
> characters, and often an EOF marker;
>
> 2. No additional metadata required to interpret a text file; they stand
> by themselves. This I think is more important than point 1;
>
> As far as I am concerned, point 2 is central. To me "rich text" formats
> are an oxymoron.

Good one. So how about we define "text file" as a sequence of binary
data that:
1. Contains no metadata;
2. The whole content of which can be transformed into character data
using a single character encoding scheme;
3. Adheres to some convention as to what constitutes a line or separates two
lines.

This raises, as far as I can see, two questions:
a) How do we define "character"?
b) What about BOMs? Aren't they metadata? Wouldn't they violate
constraint number two? Does this mean we must exclude from being
candidates for text file encodings such character encodings as provide
for variable endianness?
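
Criterion 2 can at least be tested mechanically with a strict decoder, provided the candidate encoding is known in advance, since the bytes alone do not name it. A sketch (class and method names hypothetical); note that every byte sequence is valid ISO-8859-1, so the check only discriminates for encodings such as UTF-8 that reject some sequences.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class DecodeCheck {
    // True if all of data decodes in cs; with REPORT, malformed or
    // unmappable input throws instead of being silently replaced.
    static boolean decodesCleanly(byte[] data, Charset cs) {
        try {
            cs.newDecoder()
              .onMalformedInput(CodingErrorAction.REPORT)
              .onUnmappableCharacter(CodingErrorAction.REPORT)
              .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = {(byte) 'V', (byte) 0xF8}; // "Vø" encoded as ISO-8859-1
        System.out.println(decodesCleanly(latin1, StandardCharsets.ISO_8859_1)); // true
        System.out.println(decodesCleanly(latin1, StandardCharsets.UTF_8));      // false
    }
}
```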

-- 
DF.
0
Reply Daniele 2/26/2011 2:11:00 AM

On 25-02-2011 21:11, Daniele Futtorovic wrote:
> On 26/02/2011 00:50, Arved Sandstrom allegedly wrote:
>> Without getting completely anal over it all, I'm reasonably content with
>> what Wikipedia has to say about text files:
>>
>> 1. Structured as a sequence of lines, so either fixed-length lines or EOL
>> characters, and often an EOF marker;
>>
>> 2. No additional metadata required to interpret a text file; they stand
>> by themselves. This I think is more important than point 1;
>>
>> As far as I am concerned, point 2 is central. To me "rich text" formats
>> are an oxymoron.
>
> Good one. So how about we define "text file" as a sequence of binary
> data that:
> 1. Contains no metadata;
> 2. The whole content of which can be transformed into character data
> using a single character encoding scheme;
> 3. Adheres to some convention as to what constitutes a line or separates two
> lines.
>
> This raises, as far as I can see, two questions:
> a) How do we define "character"?
> b) What about BOMs? Aren't they metadata? Wouldn't they violate
> constraint number two? Does this mean we must exclude from being
> candidates for text file encodings such character encodings as provide
> for variable endianness?

re a)

Unicode codepoints or Java char.

re b)

The BOM is not metadata about the way to read the bytes.

It is metadata for the app on how to convert the bytes to
chars (codepoints).

The file content does not say ISO-8859-1 or UTF-8.

The app logic/programming language/IO libraries provide
that.

The BOM is part of line content. Seen from the file/record
format perspective it is data, not metadata.

Only at the higher level does it become metadata.
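
Java makes Arne's distinction visible: its UTF-8 decoder does not strip a leading BOM, so the BOM reaches the application as the character U+FEFF at the start of the first line, and the application decides whether to treat it as metadata. A minimal sketch (class and helper names hypothetical):

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

final class BomAware {
    // Removes a leading U+FEFF, if present; other strings pass through.
    static String stripBom(String firstLine) {
        return (firstLine != null && firstLine.startsWith("\uFEFF"))
                ? firstLine.substring(1)
                : firstLine;
    }

    public static void main(String[] args) throws IOException {
        // EF BB BF is the UTF-8 encoding of U+FEFF.
        byte[] bytes = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i', '\n'};
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8))) {
            String first = in.readLine();
            System.out.println(first.length());           // 3: U+FEFF, 'h', 'i'
            System.out.println(stripBom(first).length()); // 2
        }
    }
}
```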

Arne

0
Reply Arne 2/26/2011 2:39:20 AM

On 2/26/11 10:11 AM, Daniele Futtorovic wrote:
> Good one. So how about we define "text file" as a sequence of binary
> data that:
> 1. Contains no metadata;
> 2. The whole content of which can be transformed into character data
> using a single character encoding scheme;
> 3. Adheres to some convention as to what constitutes a line or separates two
> lines.

I don't see the need for the third requirement.  While many "text file" 
formats do define such a convention, why should we expect _every_ "text 
file" format to do so?  Programs can consume text without having 
line-breaks, even going so far as to insert line breaks where they feel 
they are needed, so why should a "text file" format necessarily include 
that information?

> This raises, as far as I can see, two questions:
> a) How do we define "character"?

IMHO, a "character" is simply the smallest unit of data that a 
"character encoding" encodes.  Granted, this is a bit self-referential, 
but it seems to me that's what "character encodings" are all about.  If 
the "encoding" defines a unit of data that it calls a "character", then 
by definition that _is_ a character.

We don't necessarily need any other definition.  An encoding can make a 
"character" be whatever that encoding needs it to be.

If you're concerned someone might come along with a file format in which 
they define "character" to include some large, structured binary data 
embedded in the file, then I suppose you could impose such limitations 
as requiring a character to be:

   • either the smallest recognizable unit of a human language (i.e. in 
an alphabet, characters are made up of smaller components, such as 
curved or straight lines, but those alone are not recognizable as a part 
of human language), or a similarly-sized unit of data that controls the 
presentation or interpretation of such recognizable units of human language.

Or something like that.

> b) What about BOMs? Aren't they metadata? Wouldn't they violate
> constraint number two? Does this mean we must exclude from being
> candidates for text file encodings such character encodings as provision
> for variable endianness?

Impossible to say until you define "metadata".  That said, if you are 
going to be concerned about BOMs, there are lots of other characters in 
various encodings that could be considered metadata as well.  Even in 
Unicode alone, you also have things such as "zero-width space", 
"left-to-right-" and "right-to-left-" marks/embedding/override, 
"invisible operators", etc.  And ASCII (and other encodings that overlap 
ASCII) has a couple dozen or so control characters in the first 32 code 
points.

Is a file that contains any of these control characters not to be 
considered a "text file", even if every single unit of data in the file 
is still a valid character in the given encoding?

Personally, I would be happy with all manner of non-character data in a 
"text file", as long as that data does not itself require the use of 
some data that can't be represented in the character encoding used by 
the file.  Simply define "metadata" as above (i.e. "anything that's not 
itself a character in the encoding used by the file"), and then BOMs and 
other kinds of control characters are fine.
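
Under a definition like Pete's, the characters in question can at least be listed. The sketch below (names hypothetical) reports ISO control characters other than tab, LF and CR, plus Unicode format characters such as U+200B ZERO WIDTH SPACE and U+200E LEFT-TO-RIGHT MARK. Which characters to exempt is an assumption, not part of anyone's proposed definition.

```java
import java.util.ArrayList;
import java.util.List;

final class ControlScan {
    // Code points that are ISO controls (except TAB, LF, CR)
    // or belong to the Unicode "format" category (Cf).
    static List<Integer> suspicious(String text) {
        List<Integer> found = new ArrayList<>();
        text.codePoints().forEach(cp -> {
            boolean layout = cp == '\t' || cp == '\n' || cp == '\r';
            if ((Character.isISOControl(cp) && !layout)
                    || Character.getType(cp) == Character.FORMAT) {
                found.add(cp);
            }
        });
        return found;
    }

    public static void main(String[] args) {
        System.out.println(suspicious("a\tb\u200Bc\u0007\r\n")); // [8203, 7]
    }
}
```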

To me, the real question is: if we do define "text file" in a rigorous 
way, how in the world does that help any of us write better Java 
programs?  What's the point?

Pete
0
Reply Peter 2/26/2011 3:15:42 AM

On 2/26/11 11:15 AM, Peter Duniho wrote:
> [...]
> Personally, I would be happy with all manner of non-character data in a
                                                   ^^^^^^^^^^^^^^^^^^
> "text file", as long as that data does not itself require the use of
> some data that can't be represented in the character encoding used by
> the file. Simply define "metadata" as above (i.e. "anything that's not
> itself a character in the encoding used by the file"), and then BOMs and
> other kinds of control characters are fine.

See the trouble you can get into?  Even in the process of trying to 
discuss what "character" is and is not, I accidentally used the word 
"character" in the colloquial sense, rather than sticking to a rigorous, 
well-defined usage.
0
Reply Peter 2/26/2011 3:20:01 AM

Wojtek wrote:
> Emails (newsgroup postings) are also 7-bit ASCII and are pure text yet the

That is quite the outré claim, and completely false.  This newsgroup posting 
is certainly not in 7-bit ASCII.  It has a soupçon of 8-bit characters.  I 
don't think I've ever seen a 7-bit ASCII post from Arne Vajhøj, but he sure 
has a lot of posts in this newsgroup.

As for emails, I embed JPGs in email all the time.  Is that 7-bit ASCII?  Or 
even pure text?

-- 
Lew
Honi soit qui mal y pense.
0
Reply Lew 2/26/2011 3:52:29 AM

On Fri, 25 Feb 2011 12:51:12 -0500, Arne Vajhøj wrote:

> On 24-02-2011 23:48, Ken Wesson wrote:
>> On Thu, 24 Feb 2011 20:50:55 -0500, Arne Vajhøj wrote:
>>> On 24-02-2011 15:49, Tom Anderson wrote:
>>>> On Fri, 25 Feb 2011, Peter Duniho wrote:
>>>>> On 2/24/11 10:42 PM, Lars Enderin wrote:
>>>>>> 2011-02-24 15:26, Peter Duniho skrev:
>>>>>>> On 2/24/11 10:14 PM, Lars Enderin wrote:
>>>>>>>> ASCII character values are limited to the 0-127 range. That's an
>>>>>>>> outdated "standard".
>>>>>>>
>>>>>>> Used by "obsolete systems". A key point in my amusement. :)
>>>>>>
>>>>>> I thought so, but Ken seemed to need an explanation.
>>>>>
>>>>> Yes, and it was a good explanation. Unfortunately, I don't think he
>>>>> understood the explanation, nor do I think he will understand
>>>>> further clarification. I think it more likely that the harder anyone
>>>>> tries to explain to him these points, the more dug in his heels will
>>>>> be.
>>>>>
>>>>> To do otherwise would necessarily require an admission that there's
>>>>> no single "text file" format, and that even if there were, ASCII or
>>>>> any of the single-byte derivatives thereof ain't it. I don't see any
>>>>> way such an admission would ever be produced.
>>>>
>>>> There is a single text file format: lines of characters in some
>>>> encoding, terminated by an end-of-line sequence which is
>>>> distinguishable from any other characters.
>>>>
>>>> It's merely the case that some current mainframes, and some obscure
>>>> or historical systems, do not store text in text files!
>>>
>>> No.
>>>
>>> There are also count prefix (and sometimes suffix) formats.
>>
>> Those aren't text files. Text is, notionally, a string of characters,
>> including perhaps spaces and line-end characters. A text file is
>> therefore a file whose content is a string of characters, including
>> perhaps spaces and line-end characters. Such a thing is, logically, the
>> only native way to represent raw text. Anything more structured is
>> obviously not a plain text file. It may be a text-containing file of
>> some kind but it is not a text file.
> 
> A text file is something you read and write as lines of text.
> 
> Whether the system used LF delimiters or CR LF delimiters or a counted
> approach does not matter.

Except, of course, that you don't read "a counted approach" as lines of 
text; you read it as binary integers mixed with text strings. It is not, 
physically, a text file.

Both Tom Anderson and I have explained this to you repeatedly already.

>> That's nonsense. The only character a normal text file cannot have in
>> lines is a line break, and in actual fact you cannot have a line break
>> in the middle of a line *by definition*. Wherever there is a line break
>> one line ENDS and another one BEGINS, *by definition*. If that weren't
>> the case then it wouldn't be a line break!
> 
> No.

Yes. A line break in the middle of a line is utter nonsense, the logical 
equivalent of an odd even number or an endpoint of a circle or a corner 
of a disc.


> [ASCII 10] is perfectly valid as content in the middle of a line on
> older MacOS systems

Sophistry. Those just use ASCII 13 to mean the same thing.

>> So there is no "advantage" here. What you are actually describing is a
>> "list-of-strings" file, not a text file
> 
> A text file is a list of strings.

No, a text file is a single string. A list of strings is a more complex 
beast, because it has an additional out-of-band division that is not a 
part of the text. Line breaks are part of a text; your weird record-
delimiters are not.

>> Face it: those record-oriented file formats are not text files. They
>> have additional structure that cannot be represented natively in a
>> String,
> 
> Neither can delimited files

Nonsense! Delimited files, if by that you mean normal text files with LF, 
CR, or CR/LF line ends, can be and frequently are represented natively in 
a String. Only your weird record formats can't be, because the record 
boundaries can't be expressed in Stringese. You cannot represent them in 
a String as length numbers somewhere without turning it from a raw text 
String into rich text of some kind, and you can't use any character value 
to delimit them likewise -- that value might occur in the text itself and 
any such occurrences have to be escaped, and your escape character then 
also has to be escaped, etc.

I've explained all of this to you before, Arne. I don't know why it 
didn't penetrate. (Though I have my suspicions.)

Basically, your claim, Arne, boils down to "you can convert between an 
ArrayList<String> and a single String without the latter having to 
contain special formatting of some kind", and this is *easily* disproven.

>>> And the disadvantage of various hacks assuming all records use
>>> delimiters does not work.
>>
>> Nobody is assuming records use delimiters. They are assuming text files
>> are text files. The lines in text files use delimiters as an inherent
>> property.
> 
> No.

Yes.

> an illusion that you seem to have

Your personal opinions of others are not the topic of this newsgroup. Do 
you have anything Java-related to say?

>>        If you have a text in a String, seeking backward from the end
>> until a newline character (or the beginning of the String, whichever
>> you hit first) will reliably find the start of the last line in the
>> String.
> 
> No.

Yes.

> It will not work on systems that use CR as line delimiter

Sure it will, if you test for both CR and LF characters.

i = 1 + Math.max(str.lastIndexOf((char)10), str.lastIndexOf((char)13)); ought 
to do ya to find the first character of the last line of a String. Even 
if it's a single line -- both lastIndexOfs will be -1 so i will be 0 
exactly as it should be since in that instance the first character of the 
entire String is the first character of the last (in fact the only) line.
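
A self-contained Java version of Ken's one-liner, for illustration (class name hypothetical). One edge case it inherits: if the String ends with a line terminator, the last line found is the empty string after that terminator.

```java
final class LastLineInString {
    // Index of the first character of the last line: one past the last
    // LF or CR, or 0 when the String contains neither.
    static int lastLineStart(String s) {
        return 1 + Math.max(s.lastIndexOf('\n'), s.lastIndexOf('\r'));
    }

    static String lastLine(String s) {
        return s.substring(lastLineStart(s));
    }

    public static void main(String[] args) {
        System.out.println(lastLine("one\r\ntwo\nthree")); // three
    }
}
```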

> or systems using count prefixed lines.

If a Java String behaves differently with regard to seeking for \u000A 
and \u000D characters on those systems than on others, then the Java 
implementations on those systems are not compliant with the JLS.

>> The same is true of any disk file format that faithfully represents the
>> String as a flat string of text, and in particular of the
>> formats commonly used to store, e.g., C source files.
> 
> Wrong.

No, not wrong. See above.

> C source files are stored using count prefix line format on systems
> that use such.

I said "the formats commonly used to store, e.g., C source files". No 
"count prefix line format" is *commonly* used to store C source files -- 
99.99% or more of C source files residing on hard disks in this world are 
undoubtedly in fact LF-delimited, and most of the rest CRLF delimited 
(Windows wackiness strikes again).
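
Applied to a file rather than a String, the backward scan discussed at the start of the thread might be sketched as follows. Assumptions: LF and/or CR delimiters, an ASCII-compatible encoding such as UTF-8 or ISO-8859-1 (in UTF-8 the bytes 0x0A and 0x0D never occur inside a multi-byte sequence; in UTF-16 they can), a last line shorter than 2 GB, and, as Arne points out, no count-prefixed record format. The class name is hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class TailLine {
    // Last non-empty line of the file, or "" if there is none.
    static String lastLine(String path, Charset cs) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long end = raf.length();
            // Skip trailing LF/CR bytes so "abc\n" yields "abc", not "".
            while (end > 0 && isEol(byteAt(raf, end - 1))) {
                end--;
            }
            long start = end;
            // Walk backwards until the preceding byte is a line terminator.
            while (start > 0 && !isEol(byteAt(raf, start - 1))) {
                start--;
            }
            byte[] buf = new byte[(int) (end - start)];
            raf.seek(start);
            raf.readFully(buf);
            return new String(buf, cs);
        }
    }

    private static int byteAt(RandomAccessFile raf, long pos) throws IOException {
        raf.seek(pos);
        return raf.read();
    }

    private static boolean isEol(int b) {
        return b == '\n' || b == '\r';
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java TailLine <file>");
            return;
        }
        System.out.println(lastLine(args[0], StandardCharsets.UTF_8));
    }
}
```

Reading one byte per seek keeps the sketch short but is slow for long lines; reading backwards in blocks would be the obvious refinement.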
0
Reply Ken 2/26/2011 9:43:51 AM

On Fri, 25 Feb 2011 10:19:09 -0500, Michael Wojcik wrote:

> KW's flailing

Your personal opinions of others are not the topic of this newsgroup. Do 
you have anything Java-related to say?

> unsupported by contemporary or historical uses of the term "text"

Not relevant. It is contemporary use of the term "text FILE" that 
concerns me.
0
Reply Ken 2/26/2011 9:44:52 AM

On Fri, 25 Feb 2011 12:53:49 -0500, Arne Vajhøj wrote:

> On 24-02-2011 23:51, Ken Wesson wrote:
>> On Thu, 24 Feb 2011 21:42:10 -0500, Eric Sosman wrote:
>>
>>>       I think it's amusing that he says "All the world's ASCII,"
>>
>> Who says "all the world's ASCII", Sosman? I can't recall anybody doing
>> so in this group recently.
>>
>> It is true that almost all the world seems to use encodings that
>> contain ASCII as a subset. That is not quite the same thing.
> 
> Somebody with the name of Ken Wesson wrote:
 
[quotation]

> Arne

I don't see an argument of any kind in your post. Forget to include one?
0
Reply Ken 2/26/2011 9:45:50 AM

On Fri, 25 Feb 2011 19:07:45 +0100, Daniele Futtorovic wrote:

> On 25/02/2011 05:50, Ken Wesson allegedly wrote:
>> On Thu, 24 Feb 2011 20:14:49 +0100, Daniele Futtorovic wrote:
>>
>>> On 24/02/2011 19:46, Ken Wesson allegedly wrote:
>>>> it's not (...) ASCII (...).
>>
>> Alleged by whom? That distorted quote is most certainly not what I
>> wrote.
> 
> Alleged by my Usenet provider.

No, I'm sure your Usenet provider correctly indicated that I said "it's 
not your grandfather's ASCII" and you then altered the quotation.

> minus the fluff

Your personal opinions of others are not the topic of this newsgroup. Do 
you have anything Java-related to say?

> such a loonie

Your personal opinions of others are not the topic of this newsgroup. Do 
you have anything Java-related to say?
0
Reply Ken 2/26/2011 9:47:11 AM

On Fri, 25 Feb 2011 12:26:13 -0500, Arne Vajhøj wrote:

> On 24-02-2011 23:54, Ken Wesson wrote:
>> On Thu, 24 Feb 2011 20:52:37 -0500, Arne Vajhøj wrote:
>>
>>> On 24-02-2011 13:46, Ken Wesson wrote:
>>>> On Thu, 24 Feb 2011 15:14:44 +0100, Lars Enderin wrote:
>>>>
>>>>> 2011-02-24 15:00, Ken Wesson skrev:
>>>>>> I can't remember the last time I had to interoperate with any
>>>>>> machine that had anything other than standard ASCII as the native
>>>>>> format for text files. It's gotta be decades.
>>>>>
>>>>> ASCII character values are limited to the 0-127 range. That's an
>>>>> outdated "standard".
>>>>
>>>> Well, these days we use the 8th bit for accented characters instead
>>>> of just wasting it.
>>>
>>> Then it is not ASCII.
>>
>> It contains ASCII as a subset.
>>
>> So it is ASCII. And more.
> 
> That makes it not ASCII.

By that reasoning, anyone who has undergone hip replacement surgery, 
gotten a pacemaker, or even had so much as a dental filling is not human.

>> Record formats are not relevant here, since text files do not have
>> record formats;
> 
> Lines are a record format.

No, lines are data while record boundaries are metadata. Line breaks are 
a part of a text; record boundaries (absent binhexing a record-oriented 
binary file or similar situations) are not a part of a text.

>> they are raw sequences in some character set more or less by
>> definition. Anything with additional structure over and above that is
>> something other than a text file. Generically we call such things
>> "binary files" though commonly binary files do *contain* text. But all
>> contain additional structure that cannot be represented in, say, a
>> java.lang.String without resort to some form of escaping or encoding.
>> And that makes them not pure text, but text-and-some-other-stuff or
>> some-other-stuff-that-happens-to-contain-text.
> 
> Not true.

It was indeed true. Once again you seem to be claiming that you know of a 
way to represent an ArrayList<String> losslessly in a single String 
without any encoding or other formatting that would make the latter no 
longer plain text.

I am quite sure that there is none, and I have explained my reasoning at 
least twice. One of those times is even quoted with the bottom of that 
quotation being just a few lines above this line.

> Which you can easily verify by having a Java program read such a file,
> those lines are read fine into a String.

Lossily, if there's no extra formatting encoded into that String. If the 
file is

foo<newline character>bar<record boundary>baz

(which is seven characters in one record and three in a second) and is 
read into a String as foo\nbar\nbaz (11 characters) and written back out, 
it probably ends up as either

foo<newline character>bar<newline character>baz

or

foo<record boundary>bar<record boundary>baz

and it *definitely* loses the information about which newlines were 
physical newline characters in the file and which were actually record 
boundaries instead. For a colorful example, suppose I'd used record 
boundaries vs. physical newline characters as a hidden extra layer of 
binary code, interpreting one kind of line end as 0 and the other as 1, 
to steganographically hide a sekrit message in a seemingly ordinary text. 
Anyone reading the file normally would just see a copy of "Hamlet", but 
appending a 0 bit to a bit string for every literal newline and a 1 for 
every record boundary, then reading that bit string as an ASCII file 
itself, would produce "the attack will commence at midnight". Now suppose 
my field commander received the file and ran it through a Java program 
that read it into a String and wrote it back out. He'd get back "Hamlet" 
and nothing else. The sekrit message would be lost, and perhaps with it 
the war. Attempting to decode it would probably yield a bit string of 
either all zeros or all ones, and it certainly wouldn't recreate the 
intended message.

Ergo, the transformation is not lossless and the file contains additional 
information (specifically, "the attack will commence at midnight") over 
and above its "text file content".

Ergo, the file cannot be a text file, because treating it as one results 
in information loss!

That's pretty much the *definition* of a file type error: treating the 
file as that type results in lost or misinterpreted information. In this 
case treating one of your "record-oriented text files" as an actual text 
file has such a result, and it consequently must not be a text file, no 
matter what you keep wanting to call it, Arne.
Ken 2/26/2011 10:00:31 AM

On Fri, 25 Feb 2011 17:40:46 -0800, Wojtek wrote:

> Ken Wesson wrote :
>> Record formats are not relevant here, since text files do not have
>> record formats; they are raw sequences in some character set more or
>> less by definition. Anything with additional structure over and above
>> that is something other than a text file. Generically we call such
>> things "binary files" though commonly binary files do *contain* text.
>> But all contain additional structure that cannot be represented in,
>> say, a java.lang.String without resort to some form of escaping or
>> encoding. And that makes them not pure text, but
>> text-and-some-other-stuff or some-other-stuff-that-happens-to-contain-text.
> 
> Text files which are pure text yet are structured in that they either
> have:
> - comma delimited
> - fixed length field columns

The ability to use text files to represent other data (e.g. CSV files) 
does not change things any. The structure is preserved if it is processed 
by a tool that treats the file as plain text. This is not the case with 
Arne's "record-oriented text files": suppose I use newline characters and 
record boundaries to record secret information in the choice of which one 
of those characters is used to mark each line end. Then I process the 
file through a tool that treats the file as plain text (in particular, 
cannot "see" which line ends are physical newline characters and which 
are record boundaries, seeing both of those as the same thing). Then the 
hidden information will be stripped in the output.

CSV files cannot lose information when processed (nondestructively) as 
plain text files. (Say, simply read as text into a java.lang.String and 
then written out again.)

Arne's weird files, by contrast, *can* lose information when processed in 
such a way.

That makes them not text files.
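For contrast, here is a sketch of processing a CSV file purely as text: decode the bytes into one String, then encode that String back to bytes, with no line splitting. Every delimiter survives, CR and LF included. It assumes the input is well-formed UTF-8 (malformed bytes would be replaced during decoding and would not round-trip); the class name TextRoundTrip is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical demo: a whole-file text round trip keeps every in-band delimiter. */
class TextRoundTrip {

    /** Decodes the bytes as UTF-8 into one String and encodes it back; no line splitting. */
    static byte[] roundTrip(byte[] fileBytes) {
        String text = new String(fileBytes, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // CSV with mixed CRLF and LF line ends and a quoted comma: all structure is characters.
        byte[] csv = "id,name\r\n1,Ann\n2,\"Bob, Jr.\"\r\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(csv, roundTrip(csv))); // true
    }
}
```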

> I have also used HL7 (a medical information file format) which is also
> pure text yet has many information delimiters and needs to be
> extensively parsed to extract the information. If I remember right, a
> line feed is one of the delimiters and a carriage return is a record
> delimiter.

Can it be read into a java.lang.String and written back to disk without 
losing information?
kwesson (107) 2/26/2011 10:07:17 AM

On Fri, 25 Feb 2011 22:52:29 -0500, Lew wrote:

> Wojtek wrote:
>> Emails (newsgroup postings) are also 7-bit ASCII and are pure text yet
>> the
> 
> That is quite the outré claim, and completely false.  This newsgroup
> posting is certainly not in 7-bit ASCII.  It has a soupçon of 8-bit
> characters.  I don't think I've ever seen a 7-bit ASCII post from Arne
> Vajhøj, but he sure has a lot of posts in this newsgroup.

Yeah -- too many.
kwesson (107) 2/26/2011 10:07:46 AM

On Fri, 25 Feb 2011 12:19:18 -0500, Arne Vajhøj wrote:

> On 24-02-2011 23:56, Ken Wesson wrote:
>> On Thu, 24 Feb 2011 20:55:01 -0500, Arne Vajhøj wrote:
>>> On 24-02-2011 13:44, Ken Wesson wrote:
>>>> On Thu, 24 Feb 2011 16:19:09 +0200, Jussi Piitulainen wrote:
>>>>> I remember when we used a seven-bit character code to write my
>>>>> native language. etc
>>>>
>>>> That's why we now actually use that 8th bit for something useful, if
>>>> need be.
>>>
>>> Well - you are the one that has been claiming that everybody is using
>>> a 7 bit standard (ASCII) today.
>>
>> Technically they are, since the various more recent standards they use
>> contain ASCII as a subset and generally reduce to ASCII if you strip
>> the high bit off (code pages) or the high byte and highest remaining
>> bit (16-bit encodings). So they are using ASCII and sometimes some
>> additional stuff that encloses and contains ASCII.
> 
> They are not using ASCII.

So, using ASCII + something else means you're not using ASCII?

Interesting insight into what it's like to live in Arne's World.

But perhaps you'd like to cut short your vacation there and join us back 
here on Earth.
kwesson (107) 2/26/2011 10:09:05 AM

On Fri, 25 Feb 2011 12:18:27 -0500, Arne Vajhøj wrote:

> On 24-02-2011 23:57, Ken Wesson wrote:
>> On Thu, 24 Feb 2011 20:55:44 -0500, Arne Vajhøj wrote:
>>> On 24-02-2011 13:43, Ken Wesson wrote:
>>>> On Thu, 24 Feb 2011 14:49:22 +0000, RedGrittyBrick wrote:
>>>>> On 24/02/2011 14:00, Ken Wesson wrote:
>>>>>> Windows text files are flat ASCII files (with CRLF line ends).
>>>>>
>>>>> Actually I find that, nowadays, lots of text files on Windows are
>>>>> so-called 'ANSI' (mostly CP-1252)
>>>>
>>>> Same difference.
>>>
>>> Completely different char set.
>>
>> Funny that something so "completely different" intersects with ASCII in
>> the entirety of ASCII's range (0-127). It just specifies what 128-255
>> mean instead of leaving those values undefined. Unicode specifies what
>> 128-65535 mean and still intersects with ASCII on 0-127.
> 
> It is still a different char set.

It's a "different char set" in the manner that adding a suit of clothes 
to a naked person results in a "different person", Arne.
Ken 2/26/2011 10:09:55 AM

On Fri, 25 Feb 2011 12:27:09 -0500, Arne Vajhøj wrote:

> On 24-02-2011 23:57, Ken Wesson wrote:
>> On Thu, 24 Feb 2011 20:55:44 -0500, Arne Vajhøj wrote:
>>> On 24-02-2011 13:43, Ken Wesson wrote:
>>>> On Thu, 24 Feb 2011 14:49:22 +0000, RedGrittyBrick wrote:
>>>>> On 24/02/2011 14:00, Ken Wesson wrote:
>>>>>> Windows text files are flat ASCII files (with CRLF line ends).
>>>>>
>>>>> Actually I find that, nowadays, lots of text files on Windows are
>>>>> so-called 'ANSI' (mostly CP-1252)
>>>>
>>>> Same difference.
>>>
>>> Completely different char set.
>>
>> Funny that something so "completely different" intersects with ASCII in
>> the entirety of ASCII's range (0-127). It just specifies what 128-255
>> mean instead of leaving those values undefined. Unicode specifies what
>> 128-65535 mean and still intersects with ASCII on 0-127.
> 
> If you knew about programming

Your personal opinions of others are not the topic of this newsgroup. Do 
you have anything Java-related to say?
Ken 2/26/2011 10:10:24 AM

On Fri, 25 Feb 2011 12:30:23 -0500, Arne Vajhøj wrote:

> On 25-02-2011 00:00, Ken Wesson wrote:
>> On Thu, 24 Feb 2011 20:58:24 -0500, Arne Vajhøj wrote:
>>
>>> On 24-02-2011 19:12, Lew wrote:
>>>> On 02/24/2011 09:49 AM, RedGrittyBrick wrote:
>>>>> On 24/02/2011 14:00, Ken Wesson wrote:
>>>>>> Windows text files are flat ASCII files (with CRLF line ends).
>>>>>
>>>>> Actually I find that, nowadays, lots of text files on Windows are
>>>>> so-called 'ANSI' (mostly CP-1252) or 'Unicode' (usually meaning
>>>>> UTF-16 with BOM).
>>>>>
>>>>> Even on my ancient XP boxes, Notepad offers only ANSI, Unicode,
>>>>> Unicode big-endian and UTF-8. Wordpad offers RTF, Text-Document
>>>>> (turns out to be CP-1252), Text-Document DOS format (turns out to be
>>>>> CP-850) and Unicode. No ASCII.
>>>>
>>>> Windows hasn't used ASCII in decades.
>>>
>>> I don't think it ever has.
>>
>> Funny then that bog-standard ASCII files seem to read and write just
>> fine in Notepad on the occasions that I use Windows computers.
> 
> That just means that it uses something ASCII-compatible, not that it
> uses ASCII.

Sophistry.

> And you can easily verify that it indeed supports characters not part of
> ASCII.

Never said it didn't.

>> All of those seem to be ASCII plus another up to 128 characters, or in
>> the case of UTF-16, another up to 65408 characters.
>>
>> Saying that a 7-bit-clean file interpreted in one of those is not ASCII
>> is like saying that humans are not mammals.
> 
> And?
> 
> No one is saying that such a file is not ASCII.

You were.

> PS: UTF-16 is *not* ASCII compatible.

It is if you strip the high bytes and not just the 7th bits.

Otherwise you get NULs interleaved with the actual characters (if the 
characters were all on code page zero, say because it was typical English 
prose; otherwise the interleaved bytes are sometimes something other than 
NUL), which is annoying but not insurmountable.
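A sketch of the high-byte stripping described above, assuming big-endian UTF-16 without a byte order mark (StandardCharsets.UTF_16BE) and text drawn only from U+0000 to U+007F. For any character above U+00FF the discarded high byte is nonzero, so information is lost. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical demo: recovering ASCII bytes from UTF-16BE by dropping high bytes. */
class Utf16Strip {

    /**
     * Keeps the low byte of each big-endian UTF-16 code unit. This is lossless
     * only when every code unit is in U+0000..U+00FF, and yields ASCII only for
     * U+0000..U+007F.
     */
    static byte[] lowBytes(byte[] utf16be) {
        byte[] out = new byte[utf16be.length / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = utf16be[2 * i + 1]; // big-endian: the second byte of each pair is the low byte
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] wide = "Hi\n".getBytes(StandardCharsets.UTF_16BE); // 00 48 00 69 00 0A, no BOM
        byte[] narrow = lowBytes(wide);
        System.out.println(Arrays.toString(narrow)); // [72, 105, 10]
    }
}
```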
Ken 2/26/2011 10:13:21 AM

On Fri, 25 Feb 2011 09:36:29 -0500, Arne Vajhøj wrote:

> On 25-02-2011 00:04, Ken Wesson wrote:
>> On Thu, 24 Feb 2011 20:48:18 -0500, Arne Vajhøj wrote:
>>> On 24-02-2011 09:00, Ken Wesson wrote:
>>>> On Thu, 24 Feb 2011 21:23:34 +0800, Peter Duniho wrote:
>>>>
>>>>> On 2/24/11 9:06 PM, Ken Wesson wrote:
>>>>>> [...]
>>>>>> Obsolete systems do not interest me.
>>>>>
>>>>> then…
>>>>>
>>>>>> Since those days, the world has standardized on ASCII flat files
>>>>>> for text files.
>>>>>
>>>>> LOL!
>>>>
>>>> Windows text files are flat ASCII files (with CRLF line ends).
>>>
>>> No.
>>>
>>> They are CP-1252, UTF-8 or UTF-16.
>>
>> All of which are ASCII++, for all intents and purposes.
> 
> This is an IT group.
> 
> Not a group for hairdressers or chefs.

Evasive change of subject noted.

>>>> And that exhausts 99.99% of the
>>>> operating system market share right there, if not more,
>>>
>>> No.
>>>
>>> z/OS, i, OpenVMS, MPE has a lot more market share than 0.01%.
>>
>> Nonsense. There are *at least* ten thousand PCs running Windows for
>> every one machine running one of those operating systems.
>>
>> Ten thousand *PCs running Windows*.
> 
> The PC/mainframe ratio is probably like 100000:1.

Hence why I said *at least*. I was being conservative in my estimates -- 
as generous to *your* case as possible. And still I was demolishing it.

> But the relevance is not that big, because mainframes happen to be a lot
> more expensive than PCs.

One computer is still one computer, no matter how expensive it is. It's 
the price tag whose relevance is not that big.

>>>> I can't remember the last time I had to interoperate with any machine
>>>> that had anything other than standard ASCII as the native format for
>>>> text files. It's gotta be decades.
>>>
>>> Possible that you only work with 20+ year old Unix and OpenVMS systems
>>> with 7 bit VT100 access.
>>>
>>> But that is not very common.
>>
>> I work with what nearly everyone in the field works with these days: a
>> mix of Unix, MacOS, and Windows, mainly Unix server blades whose
>> services are accessed by mainly Windows desktop/netbook users with a
>> smattering of Mac users and a small but growing contingent of
>> smartphone users.
> 
> Then you won't have any users using ASCII.

Funnily enough, all of them can cope just fine with ASCII text files. I 
wonder how that can be, Arne, unless of course you're wrong yet again.
Ken 2/26/2011 10:17:00 AM

On Fri, 25 Feb 2011 12:34:43 -0500, Arne Vajhøj wrote:

> On 25-02-2011 00:06, Ken Wesson wrote:
>> On Thu, 24 Feb 2011 13:12:36 -0700, Jim Janney wrote:
>>> Ken Wesson<kwesson@gmail.com>  writes:
>>>> On Thu, 24 Feb 2011 09:18:06 -0500, Michael Wojcik wrote:
>>>>
>>>>> On the IBM i machines (formerly i Series, formerly System i,
>>>>> formerly AS/400, successor to the System/3x), blah blah blah
>>>>
>>>> You're one to talk about provincialism. Who the hell uses these
>>>> ancient museum pieces any more?
>>>
>>> Um, that would be me, or rather my employer's customers.
>>
>> Your employer may happen to be using such legacy systems, but I very
>> much doubt that very many people deal with them in an IT capacity. Far,
>> *far* fewer than deal with Unix, Windows, and Mac boxes in such a
>> capacity.
>>
>> How many end-users interact indirectly with these systems is of course
>> irrelevant.
> 
> Not really

Yes, really.

Let us recall the context in which this silly argument blew up, shall we?

Someone asked for an efficient way to get at the last line of a text 
file. *Several* people suggested seeking backwards from the end to find a 
newline character. (Interestingly, only one of those people has since 
been subjected to flamage for this suggestion. I wonder why?)

You and some others pooh-poohed that suggestion because it might not work 
properly on fewer than 0.01% (you've since admitted fewer than 0.001%) of 
computers, with the implied grounds that any number at all above zero is 
unacceptable.

I've got news for you. That suggestion also won't work on any system 
without a JVM.

It won't work on a 2MB RAM 286 too cramped for the program to run on.

It won't work on an Xbox 360 or any other system that won't run unsigned 
binaries and whose signing authority won't sign this program.

And so on.

NO program will work on *all computers*, Arne. So that goal is 
unattainable.

Hardly any of the OP's users, if any at all, would be running his program 
on a weird machine from that 0.001%, of which, you say, only *a subset* 
don't store text normally. (Apparently having no true text files at all!)

I'd expect those few users to anticipate that their quirky computers 
won't run software that works nearly everywhere else, and to accept that; 
otherwise they would have gotten a less quirky computer instead.

And in actual fact the OP's user base almost certainly consists of Unix 
sysadmins who want to view the last entry or last few entries of a 
ginormous log file without difficulty, in which case the OP could 
probably get away even with hardcoding \u000A as the line-end character 
(though I wouldn't recommend they actually do so).

*That* is what's relevant here.
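The seek-backwards approach under discussion can be sketched with RandomAccessFile. This sketch assumes an ASCII-compatible encoding in which byte 0x0A only ever means LF (US-ASCII, ISO-8859-1 or UTF-8, where 0x0A never occurs inside a multi-byte sequence) and \n or \r\n line ends; UTF-16 or EBCDIC files would need different handling. It ignores one trailing line terminator and reads a single byte per seek for clarity, where a production version would scan backwards in blocks. The names LastLineReader and readLastLine are hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: read the last line of a file by seeking backwards from the end. */
class LastLineReader {

    /**
     * Returns the last line without reading the lines before it. Assumes an
     * ASCII-compatible charset and "\n" or "\r\n" line ends. One trailing line
     * terminator is ignored, so a file ending in "third\n" yields "third".
     */
    static String readLastLine(String path, Charset charset) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            long end = file.length();
            // Drop one trailing "\n", plus the "\r" of a "\r\n" pair.
            if (end > 0 && byteAt(file, end - 1) == '\n') {
                end--;
                if (end > 0 && byteAt(file, end - 1) == '\r') {
                    end--;
                }
            }
            // Walk back to just after the previous '\n', or to the start of the file.
            long start = end;
            while (start > 0 && byteAt(file, start - 1) != '\n') {
                start--;
            }
            // The distance walked is the exact buffer size needed.
            byte[] buf = new byte[Math.toIntExact(end - start)];
            file.seek(start);
            file.readFully(buf);
            return new String(buf, charset);
        }
    }

    /** Reads the single byte at pos (one seek per byte: fine for a sketch, slow for long lines). */
    private static int byteAt(RandomAccessFile file, long pos) throws IOException {
        file.seek(pos);
        return file.read();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readLastLine(args[0], StandardCharsets.UTF_8));
    }
}
```

Only the bytes of the last line, plus one read per byte walked, are touched, so the cost depends on the length of the last line rather than on the size of the file.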
Ken 2/26/2011 10:24:19 AM

On Fri, 25 Feb 2011 10:26:27 -0500, Michael Wojcik wrote:

> Ken Wesson wrote:
>> On Thu, 24 Feb 2011 21:25:05 +0000, Martin Gregorie wrote:
>>>>
>>> You know, you sound exactly like a character who surfaced in a Y2K
>>> newsgroup back in 1998/99. He refused to believe that any computers
>>> apart from PCs were in use at the time.
>> 
>> I doubt that. He may have correctly pointed out that the vast
>> *majority* of computers were PCs at the time. (Now, laptops and
>> smartphones may have the slight edge, or perhaps even server blades,
>> now that typical servers are racks full of small computers instead of
>> single big computers.)
> 
> Embedded computers have a huge majority over all general-purpose
> computers, by orders of magnitude

I wasn't counting non-general-purpose computers. Only computers you can 
get an open-ended set of vari-purposed apps for.

> The line between "smartphones" and other mobile phones is fuzzy

It's pretty sharp if you use the criterion I just articulated above.

> but among computing devices that support at least some general-purpose
> applications (as opposed to dedicated controllers), phones are far and
> away in the majority, by number of CPUs.
> 
> In other words, wrong again, Ken.

Not at all. I said desktop PCs would have been in the majority over a 
decade ago, and that phones might have an edge now. I doubt they are "far 
and away" in the majority, though.

Googling suggests that 41 million iPhones and over 8 million Android 
phones have been sold. Let's round this up to an even 50 million 
smartphones in total, absorbing the small number of true smartphones with 
open-ended sets of downloadable apps that are neither iPhones nor 
Androids.

The same methods indicate the number of PCs in use (not just sold ever, 
but in use now) at over 1 billion.

So it is likely that PCs are still outnumbering phones, perhaps by as 
much as 20 to 1.

For now.
Ken 2/26/2011 10:29:35 AM

On Fri, 25 Feb 2011 10:44:10 -0500, Michael Wojcik wrote:

> Ken Wesson wrote:
>> On Thu, 24 Feb 2011 17:11:02 -0500, Michael Wojcik wrote:
>> 
>>> Ken Wesson wrote:
>>>> Who asked you for your opinions of others here?
>>> No one. I offer them out of sheer generosity.
>> 
>> Calling other people names is hardly what I would call "generosity"
> 
> No, you wouldn't.

Nor I imagine would anyone sane and honest.

>> nor
>> is polluting a newsgroup with off-topic traffic.
> 
> ignorance and dull repetition

Your personal opinions of others are not the topic of this newsgroup. Do 
you have anything Java-related to say?

My posts have addressed *technical* arguments with cogent, technical 
counterarguments.

I have only been "repetitive" in my responses to personal comments that 
do not belong in a technical discussion.

I think though that I shall change my policy with regard to such comments 
from now on, to even more clearly distinguish them from legitimate 
argumentation.

>> Your personal opinions of others are not the topic of this newsgroup.
> 
> Actually, they are.

Of course they're not, because your personal opinions of others have 
nothing whatsoever to do with the Java programming language.

>> Do you have anything Java-related to say?
> 
> Yes.

Then by all means, do go ahead, but please stop posting non-Java-related 
material, particularly personal attacks, in this group.

>>>>> On the IBM i machines (formerly i Series, formerly System i,
>>>>> formerly AS/400, successor to the System/3x), blah blah blah
>>>> You're one to talk about provincialism. Who the hell uses these
>>>> ancient museum pieces any more?
>>> Thousands of organizations, which is why they still enjoy healthy
>>> sales.
>> 
>> Ah, must be vendor lockin. Sucks to be them. Soon they'll be
>> outcompeted by newer, nimbler firms that use modern things like the
>> free Unixes on commodity hardware.
> 
> Yes, soon, no doubt.

Glad you agree.

> O glorious day, when we are ushered into the Age of Wessonism!

There is no such thing, so far as I am aware, as "Wessonism".

> Free unicorns for all!

What does that have to do with Java, Michael?

>> Of course, they might last a while if they can keep convincing the
>> government to give them "bailouts" or other protectionist help in the
>> face of competitors and their own screwups.
> 
> Careful, Ken - you'll short out your keyboard with all that spittle.

I see no spittle here.

>> Still, your "thousands" of organizations are outweighed by the
>> *hundreds* of thousands that don't use such systems
> 
> No, they aren't.

Yes, they are. (And if you really think otherwise, I gotta ask -- what is 
someone who flunked elementary-school arithmetic doing in a computer 
programming newsgroup? Nobody who honestly believes that 100,000 < 1000 
should be posting here.)

> your cognitive abilities

Your personal opinions of others are not the topic of this newsgroup. Do 
you have anything Java-related to say?

>>> And the majority of business transactions
>> 
>> have no bearing on this discussion, which has to do with the majority
>> of *computers*
> 
> No, it doesn't.

Yes, it does.

> You don't have the power to determine what the discussion is about

I have as much power to do so as you do, in general, so your statement is 
inherently rather ironic.

Moreover, this *particular* discussion grew out of a statement that *I* 
made a while back. I *certainly* am the sole arbiter of what precisely I 
meant by that statement, and any other that I made.

> it's about whatever the participants - all the participants - decide to
> discuss.

Well, put that way, then since this newsgroup is named 
comp.lang.java.programmer anyone posting here is implying via their 
Newsgroups: line that they have decided to discuss Java. Since Ken 
Wesson's various alleged flaws and merits are not relevant to Java, then, 
the participants have at the very least decided not to discuss *that*, so 
let's cut out all of the anti-Wesson blatherings from you, Arne, and 
others *right* now.

As for what computers and users are relevant to the matter at actual 
debate, perhaps we should consider the likely user base of whatever the 
thread's OP was coding. Since he wants to grab the final lines of text 
files very efficiently, it's likely that he's coding something for 
viewing, extracting to a new file, or otherwise operating upon long log 
files. That means the likely user base consists of Unix sysadmins, and 
therefore that non-Unix operating systems with wacky data storage formats 
that don't ever store text in actual text files are indeed quite 
irrelevant.

> your failure to learn

Your personal opinions of others are not the topic of this newsgroup. Do 
you have anything Java-related to say?
Ken 2/26/2011 10:40:48 AM

On Fri, 25 Feb 2011 12:33:16 -0500, Arne Vajhøj wrote:

> On 25-02-2011 00:19, Ken Wesson wrote:
>> On Thu, 24 Feb 2011 17:11:02 -0500, Michael Wojcik wrote:
>>> And the majority of business transactions
>>
>> have no bearing on this discussion, which has to do with the majority
>> of *computers* and, secondarily, what will be encountered routinely by
>> the majority of *IT workers*.
> 
> Well the topic

The topic was as I stated above.

> And since somebody is willing to pay a lot more for a mainframe running
> an entire bank than for somebody to be able to read email, then counting
> computers does not really reflect

The original debate arose from your worry that if the thread's OP counted 
backward from the end of a file to the final newline character in it, 
this would break on a handful of oddball systems.

So what really matters is: what percentage of the people who would 
potentially be downloading and installing the OP's software are likely to 
actually be affected, if this were the case?

I'd argue that percentage to be very low. In fact, given the likelihood 
that he's writing a tool for dealing specifically with Unix logfiles, the 
percentage may well be exactly zero.
Ken 2/26/2011 10:43:21 AM

On Fri, 25 Feb 2011 14:37:07 +0000, Martin Gregorie wrote:

> On Fri, 25 Feb 2011 06:26:43 +0100, Ken Wesson wrote:
> 
>> If by "very common" you mean used on one in ten thousand or fewer of
>> their computers. For every single z/OS machine in corporate America
>> there are probably a thousand blade servers and ten thousand office PCs
>> and employer-provided laptops and God alone knows how many employee
>> smartphones with plans and/or handsets paid for by their company.
>
> By that standard PCs, in which let's include desktops and laptops, are
> also a tiny proportion of all computers once you count phones and
> all the embedded computers in vehicles.

I'm only counting machines you can add onto with an open-ended set of 
software applications.

> IMO it's a silly argument because very many PCs are used for only a small
> part of the day and do very little apart from using electricity and
> occasionally receiving and sending a few e-mails. A better measure is
> the number of transactions and documents handled by each machine per
> year.

In the context in which this debate arose, the best measure is based on 
the users of the software the thread's OP is developing. How many of 
those users will want to run it on the oddball systems you're so worried 
about? My guess is very few and quite possibly zero.
Ken 2/26/2011 10:45:19 AM

On Fri, 25 Feb 2011 09:45:01 -0500, Arne Vajhøj wrote:

> On 25-02-2011 00:26, Ken Wesson wrote:
>> On Thu, 24 Feb 2011 21:09:44 -0500, Arne Vajhøj wrote:
>>> On 24-02-2011 13:42, Ken Wesson wrote:
>>>> You're one to talk about provincialism. Who the hell uses these
>>>> ancient museum pieces any more?
>>>
>>> Lots of places.
>>>
>>> Retail sector, public sector, financial sector
>>
>> If you're counting it that way, that's 3 places. Hardly "lots". :)
> 
> I have news for you - the number of business entities in those 3 sectors
> is a lot higher than 3.

But he wasn't counting business entities; he was counting the sectors 
themselves.

> you have no knowledge about businesses

Your personal opinions of others are not the topic of this newsgroup. Do 
you have anything Java-related to say?

> You are not aware

Your personal opinions of others are not the topic of this newsgroup. Do 
you have anything Java-related to say?

>> See other posts. Perhaps a collected few tens of thousands of computers
>> using museum-worthy OSes like those versus a collected *billion* or
>> more of machines running Windows, MacOS, iOS, Android, and Unix.
> 
> There are also more flies than humans on earth.
> 
> That does not make flies more important.

Ah, so all of the thread-OP's users don't matter. They're mere flies, 
because they aren't filthy stinking rich. It's much more important that 
the software he's developing run on some oddball computer type he's quite 
possibly never even heard of and that none of his users probably want to 
run it on, even at the expense of making it inefficient or even 
nonfunctional on the Unix servers that are probably the actual target 
system, because Unix sysadmins don't make as much money as bankers. 
Gotcha.

I hope to God you're not involved in making important decisions at any 
computer company, especially in marketing and/or somewhere in the process 
that converts marketing data into sets of requirements.

>>>> There is nothing at all prominent about those IBM dinosaurs. They may
>>>> have been prominent 30 years ago, but not now.
>>>
>>> Both z/OS and i are widely used today.
>>
>> If by "widely used" you mean on one in ten thousand or fewer computers.
> 
> But a lot more in revenue.

That is not what "widely used" means. By your definition, Ferraris are 
more widely used than ordinary four-door sedans, for Christ's sake.

>>>> Fine, then -- corporate America and home computers in America then.
>>>
>>> OK - neither z/OS or i are common on home computers.
>>>
>>> But they are very common in corporate America.
>>
>> If by "very common" you mean used on one in ten thousand or fewer of
>> their computers. For every single z/OS machine in corporate America
>> there are probably a thousand blade servers and ten thousand office PCs
>> and employer-provided laptops and God alone knows how many employee
>> smartphones with plans and/or handsets paid for by their company.
> 
> And?
> 
> If a company buys a mainframe for 20 M$ and 10000 PC's for 10 M$, then
> it is 2/3 mainframe.

That's not a useful way of looking at it when the topic is software 
compatibility. How large a fraction of machines the OP's software will 
run correctly on, out of the set people might try to run it on, is the 
metric that matters there.

>>> If all z/OS systems disappeared over night then everything would break
>>> down, because so many critical systems are running on them.
>>
>> A somewhat scary thought, but hardly relevant unless you're trying to
>> stir up enough public alarm to foment a general movement to replace
>> these legacy systems with more modern ones.
> 
> It is relevant because the point is that most of the world's important
> data is processed by mainframes.

That's clearly not true. A lot of financial data might be, but very 
little else, and an awful lot of that "else" is also "important data".

> Sure they can be replaced. 10-20 years and 10-20 trillion dollars.

You're joking, right? It might cost that much to replace them with more 
of the same, but to replace them with commodity hardware and operating 
systems will certainly cost a lot less, modulo the cost of porting 
software. (In practice it probably makes more sense to phase out their 
use by just not getting new ones, or even just by having new companies 
that enter those fields use modern systems and waiting for the older 
companies in the space to die off over time, because of that porting 
cost.)
Ken 2/26/2011 10:55:13 AM

On Fri, 25 Feb 2011 09:46:30 -0500, Arne Vajhøj wrote:

> On 25-02-2011 00:28, Ken Wesson wrote:
>> On Thu, 24 Feb 2011 21:00:20 -0500, Arne Vajhøj wrote:
>>> And it is a pretty good guess that the RandomAccessFile searching for
>>> CR and LF will fail on i also then.
>>
>> How fortunate that i runs on fewer than one in ten thousand machines.
>> Does Java even run on i?
> 
> Yes.

And what the hell is it used for on i?

>>> Linux will be either ISO-8859-1 or UTF-8 not ASCII.
>>
>> Both contain ASCII as a subset -- if you take a pure-ASCII file and
>> reencode it in either the result is the identical byte sequence.
> 
> Yes

There you go.
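The byte-identity claim is easy to check from Java. This sketch encodes a pure-ASCII string with several of the standard charsets and compares the bytes with its US-ASCII encoding: ISO-8859-1 and UTF-8 match, UTF-16BE does not. It is only meaningful for pure-ASCII input, because String.getBytes substitutes a replacement byte for characters US-ASCII cannot encode. The class name AsciiSubsetCheck is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical check: does a charset encode pure-ASCII text to exactly the ASCII bytes? */
class AsciiSubsetCheck {

    /** Only meaningful for pure-ASCII text: US-ASCII turns other characters into '?'. */
    static boolean sameBytesAsAscii(String asciiText, Charset charset) {
        return Arrays.equals(asciiText.getBytes(StandardCharsets.US_ASCII),
                             asciiText.getBytes(charset));
    }

    public static void main(String[] args) {
        String text = "last line\n";
        System.out.println(sameBytesAsAscii(text, StandardCharsets.ISO_8859_1)); // true
        System.out.println(sameBytesAsAscii(text, StandardCharsets.UTF_8));      // true
        System.out.println(sameBytesAsAscii(text, StandardCharsets.UTF_16BE));   // false
    }
}
```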
Ken 2/26/2011 10:56:05 AM

On Fri, 25 Feb 2011 10:52:08 -0500, Arne Vajhøj wrote:

> On 25-02-2011 00:36, Ken Wesson wrote:
>> On Thu, 24 Feb 2011 20:45:36 -0500, Arne Vajhøj wrote:
>>> On 24-02-2011 08:06, Ken Wesson wrote:
>>>> Obsolete systems do not interest me.
>>>
>>> Whether a solution works in general or not depends on whether it is
>>> guaranteed to work on all platforms or not.
>>
>> Actually, "in general" tends to have some kind of implicit scope that
>> is usually less than "all platforms". For instance, when discussing a
>> Java solution, we can exclude platforms that Java doesn't run on.
> 
> True.

There you go.

>>> The RandomAccessFile and search for CR and LF does not.
>>
>> It probably runs on all platforms Java is normally used on. It
>> certainly runs on 99.99% or more of the machines anyone is likely to
>> run Java on,
> 
> If you are counting machines: yes.

And I am counting machines. I think I even just said so.

>> Hell, these machines may not even be able to represent C source
>> trees normally, requiring the compiler vendor to jump through hoops and
>> requiring unusual tools and IDEs be used to hack C sources and not just
>> the system text editor.
> 
> Text editors are by definition able to create text files and source code
> is text files.

Yes, but you've just explained that many of these machines don't 
apparently *have* text files.

> Try to think logically.

Your personal opinions of others are not the topic of this newsgroup. Do 
you have anything Java-related to say?

>> Hell, I wouldn't be surprised if there were no
>> working C implementations on some of these systems
> 
> They do have C.

Some of them probably do, yes.

> you don't seem to know much

Your personal opinions of others are not the topic of this newsgroup. Do 
you have anything Java-related to say?

>> On the contrary, whether software works on platforms that interest its
>> developer and user base is 100% relevant and whether it works on
>> platforms that *don't* interest its developer and user base is
>> irrelevant.
> 
> No.

Yes.

*Please* tell me you do not work in requirements or marketing at any 
major tech company!

> And it is bad Java programming to write code that only works on some
> Java platforms even though the expectation is that the program will only
> be used on platforms where it do work.

In your opinion.

I happen to think it's perfectly alright if a method that is supposed to 
return the last line of a text file has undefined behavior if called on a 
binary file.

>>>> Since those days, the world has
>>>> standardized on ASCII flat files for text files.
>>>
>>> Not really.
>>>
>>> Windows uses CP-1252, UTF-8 and UTF-16
>>> Unix/Linux/VMS uses ISO-8859-1 and UTF-8
>>
>> All ASCII supersets. Which means the common denominator among all those
>> is ... ta-da! ASCII. :)
> 
> That does not make them use ASCII.

In a pig's eye it doesn't.

>>> IBM mainframe uses EBCDIC
>>
>> And hardly anyone uses IBM mainframe (sic). What was that figure again?
>> 0.01% of all computers?
> 
> I think the number was

The relevant number here was 0.01% of computers. You later admitted it's 
closer to 0.001%.

>>> There are really very few systems today that uses just ASCII.
>>
>> But many that use ASCII.
> 
> Very few.
> 
> Most support ASCII

Wow, a contradiction in just two sentences, five words!
Ken 2/26/2011 11:01:46 AM
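The ASCII-versus-superset argument matters for the RandomAccessFile approach because a backwards search for bytes 0x0A and 0x0D only works if those bytes really mean LF and CR. A minimal sketch of a check, assuming "ASCII-compatible" means "every ASCII character encodes to the single byte with the same value"; the class name is hypothetical, and the check says nothing about how non-ASCII characters are encoded.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

final class AsciiCompat {
    /**
     * True if the charset encodes each of the 128 ASCII characters as the
     * single byte with the same value, so 0x0A and 0x0D mean LF and CR.
     */
    static boolean isAsciiCompatible(Charset cs) {
        if (!cs.canEncode()) {
            return false;
        }
        char[] chars = new char[128];
        byte[] expected = new byte[128];
        for (int i = 0; i < 128; i++) {
            chars[i] = (char) i;
            expected[i] = (byte) i;
        }
        byte[] actual = new String(chars).getBytes(cs);
        return Arrays.equals(actual, expected);
    }

    /** Same check by charset name; names the running JVM lacks report false. */
    static boolean isAsciiCompatible(String charsetName) {
        return Charset.isSupported(charsetName)
                && isAsciiCompatible(Charset.forName(charsetName));
    }
}
```

Only US-ASCII, ISO-8859-1, UTF-8 and the three UTF-16 variants are guaranteed on every Java platform. On a typical JDK one would expect windows-1252 to pass as well, and UTF-16 and EBCDIC code pages such as IBM037 (where installed) to fail.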

On Fri, 25 Feb 2011 14:58:27 +0000, Martin Gregorie wrote:

> On Fri, 25 Feb 2011 06:38:32 +0100, Ken Wesson wrote:
>> Other character sets mostly intersect in ASCII. Nearly all in any kind
>> of widespread use intersect in using characters 10 and 13 as the
>> potential line-end characters. And "other record formats" are not
>> relevant in a discussion of text files, as has been explained already.
>
> Bad argument

Not at all.

> a text file contains records. They are variable length
> records with a 'newline' encoding as the delimiter.

By that definition the concept of "record-based" vs. "not-record-based" 
becomes completely meaningless.

But most of us use "records" to mean a structure that involves out-of-
band boundaries of some sort. Linear text with inline line break etc. 
characters has only in-band boundaries and is much less structured than 
what a "record" typically implies.

> BTW, you can use C to handle iSeries text files through the usual gets()
> and puts() functions despite the iSeries holding text in what are
> effectively database rows. They have three fields per row - a line
> number, a fixed length text field and an 8 byte ID.

That's text plus file metadata. It's *almost* a legitimate text file (and 
a Windows .txt file is a legitimate text file, though last-modified date 
and other information is stored with the file by Windows).

What makes it not *quite* a legitimate text file is that the file's 
actual content contains a line break that is distinct from 0x0A, 0x0D, 
and every single other character value, resulting in it being impossible 
to represent faithfully as a character string. In particular, assuming 
that reading the file with Reader or gets converts the database row 
boundaries into 0x0A characters in the stream and writing it converts 
0x0A characters to row boundaries, a file that contained both 0x0A 
characters *and* row boundaries could not be copied via gets/puts or 
Reader/Writer without loss of information, specifically if you encoded 
information in a pattern of row boundaries and 0x0A characters that 
pattern would be erased in the copy. If copying the file as a text file 
destroys information, then the file was obviously not a real text file. 
It was a text-plus-some-other-stuff file.

> I don't know why an OS/400 text file would need an ID field, but
> it's there.

Database rows need an ID field so there's something you can uniquely key 
on, and you said the system stores text in database rows, so there's your 
explanation. The thing that makes no sense is it storing text in database 
rows instead of as native text.

> The reason that C's standard text handling works on these
> files is down to the standard library, which is written to inter-convert
> between C's internal null delimited string representations of lines and
> the external fixed field representation.

Actually C is already broken here even on "normal" systems, because C 
strings can't properly represent text containing NUL characters. Java's 
String class does not have this problem, and Java can read and write true 
text files without truncating them or otherwise screwing them up even if 
they do contain NUL characters. But even Java will strip information from 
your OS/400 "text files", despite doing a supposedly faithful character 
by character copy, which means these files store content (and not just 
metadata) outside the entirety of the Unicode character set, and 
therefore are not true text files.

> Getting back on topic, I haven't used Java on an OS/400 but it's
> available and will almost certainly work the same way

Nope; see above. If everything you've told me is accurate then it is 
possible to write an OS/400 "text file" that encodes some information 
that will be destroyed in a copy made by simply reading it character by 
character through a java.io.Reader and outputting it character by 
character, unaltered, through a java.io.Writer. The information thus 
destroyed cannot therefore have been text, and since the file contained 
it the file cannot therefore have been a text file -- it was a binary 
file that happened to contain some text. Run "strings" on an executable 
file on your system and you'll probably find that that contains some 
text, too, so you can't say "if it contains some text then it's a text 
file"; no, the only sensible definition of "text file" must be "it 
doesn't contain non-text". And the OS/400 "text file" (potentially) 
contains non-text, as determined by the pass-it-through-a-full-Unicode-
character-by-character-copy-and-see-if-you-can-ever-lose-information test.
Ken 2/26/2011 11:15:21 AM
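The "copy it character by character through a Reader and a Writer" test can be sketched in Java. This is an illustration under assumptions: the class name is hypothetical, and when the Reader and Writer wrap real files the outcome also depends on the charset used to decode and re-encode, which must be able to represent every character in the file.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;

final class CharByCharCopy {
    /**
     * Copies every char from in to out, one at a time and unaltered.
     * No line-based API is involved, so CR, LF, CRLF and NUL characters
     * pass through exactly as they were read.
     */
    static void copy(Reader in, Writer out) throws IOException {
        int c;
        while ((c = in.read()) != -1) {
            out.write(c);
        }
        out.flush();
    }
}
```

Copying `"a\0b\r\nc\rd\n"` from a `StringReader` into a `StringWriter` this way yields an identical string, NUL included. A copy that went through `BufferedReader.readLine()` and wrote each line back with one fixed separator would not pass the same test, because `readLine()` drops the terminator and treats LF, CR and CRLF alike.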

On 11-02-25 11:15 PM, Peter Duniho wrote:
[ SNIP ]
>
> To me, the real question is: if we do define "text file" in a rigorous
> way, how in the world does that help any of us write better Java
> programs? What's the point?
>
> Pete

The usefulness of the term "text file" for me is that it describes a 
file that can be opened, viewed and used by every application, tool and 
utility, on every OS and platform, that purports to be a "text editor". 
The absolute baseline of "text editor" programs can deal with a text 
file. Tools like cat, grep, awk, more etc and their equivalents on 
non-UNIX/Linux systems can handle them with ease. vi and Notepad and 
gedit and pico and every other self-proclaimed "text editor" faithfully 
show all the content.

To me the term "text file" indicates that I have the widest possible 
options on any system to view it, and to process it as _generic_ text. 
Clearly I may not be able to *process* any given text file in the manner 
that it's _intended to be processed_, but I can certainly *view* it 
without a specific application.

It's with this meaning that I think it's a useful term. But I don't 
think any of this helps us write better Java programs, no. :-)

AHS
-- 
We must recognize the chief characteristic of the modern era - a 
permanent state of what I call violent peace.
-- James D. Watkins
Arved 2/26/2011 1:27:12 PM

On Sat, 26 Feb 2011 12:15:21 +0100, Ken Wesson wrote:

> On Fri, 25 Feb 2011 14:58:27 +0000, Martin Gregorie wrote:
> 
>> On Fri, 25 Feb 2011 06:38:32 +0100, Ken Wesson wrote:
>>> Other character sets mostly intersect in ASCII. Nearly all in any kind
>>> of widespread use intersect in using characters 10 and 13 as the
>>> potential line-end characters. And "other record formats" are not
>>> relevant in a discussion of text files, as has been explained already.
>>
>> Bad argument
> 
> Not at all.
> 
>> a text file contains records. They are variable length records with a
>> 'newline' encoding as the delimiter.
> 
> By that definition the concept of "record-based" vs. "not-record-based"
> becomes completely meaningless.
>
It is pretty much meaningless unless you're referring to the way a 
program handles data. Consider a file containing nothing but printable 
characters:

- if a C or Java program reads the file byte by byte or parses it
  by reading words separated by whitespace then line delimiters are
  utterly meaningless and the program doesn't care whether the file
  contains records or not.

- OTOH if a different program reads the same file a line at a time, e.g.
  C using fgets(), Java using BufferedReader.readLine(), then this is
  pure record-level access.
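To make the two access styles concrete, here is a small sketch (class and method names are hypothetical) that reads the same text once as whitespace-separated words with `java.util.Scanner` and once line by line with `BufferedReader.readLine()`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

final class TwoReadings {
    /** Whitespace-separated words: line breaks are just more whitespace. */
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(text)) {
            while (sc.hasNext()) {
                out.add(sc.next());
            }
        }
        return out;
    }

    /** Lines via readLine(): each LF, CR or CRLF ends one record. */
    static List<String> lines(String text) throws IOException {
        List<String> out = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = br.readLine()) != null) {
                out.add(line);
            }
        }
        return out;
    }
}
```

For `"one two\nthree\r\nfour"` the first method returns four words and the second three lines; only the second result depends on where the line breaks are.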

> But most of us use "records" to mean a structure that involves out-of-
> band boundaries of some sort.
>
Not necessarily. A CSV file is generally treated as containing a fixed 
number of variable length fields with the last field terminated by a 
newline. In this case, both commas and newlines are out-of-band (and so 
are some quote marks if the implementation allows fields to contain 
commas).
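A minimal sketch of splitting one CSV record, assuming RFC 4180-style quoting: a field may be enclosed in double quotes, inside which commas are literal and a doubled quote stands for one quote character. It handles a single line only; CSV also allows quoted fields to contain line breaks, which a line-at-a-time reader cannot see. The class name is hypothetical, and a dedicated CSV library is probably the safer choice in real code.

```java
import java.util.ArrayList;
import java.util.List;

final class CsvLine {
    /** Splits one CSV record (without its line terminator) into fields. */
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        cur.append('"'); // doubled quote: one literal quote
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    cur.append(c); // commas are literal inside quotes
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(cur.toString()); // delimiter ends the field
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString()); // last field ends at end of line
        return fields;
    }
}
```

For example, `a,"b,c","say ""hi""",` splits into four fields: `a`, `b,c`, `say "hi"` and an empty string.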

However, fixed length records made up of fixed length fields contain no 
out-of-band structure. You want an example? How about the two magnetic 
stripe tracks on a credit card - 40 bytes and containing fields whose 
content and meaning are defined by their position.
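Position-defined fields look like this in code: the layout (the list of widths) lives in the program, not in the record. A minimal sketch; the class name is hypothetical and the widths in the example are made up, not the real magnetic-stripe layout.

```java
import java.util.ArrayList;
import java.util.List;

final class FixedWidth {
    /**
     * Splits a fixed-length record into fields of the given widths.
     * Nothing in the data marks the field boundaries; they exist only
     * in the widths the caller supplies.
     */
    static List<String> split(String record, int... widths) {
        int total = 0;
        for (int w : widths) {
            total += w;
        }
        if (record.length() != total) {
            throw new IllegalArgumentException(
                    "expected " + total + " chars, got " + record.length());
        }
        List<String> fields = new ArrayList<>();
        int pos = 0;
        for (int w : widths) {
            fields.add(record.substring(pos, pos + w));
            pos += w;
        }
        return fields;
    }
}
```

`FixedWidth.split("ABCDE12345XY", 5, 5, 2)` returns `[ABCDE, 12345, XY]`.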

>> BTW, you can use C to handle iSeries text files through the usual
>> gets() and puts() functions despite the iSeries holding text in what
>> are effectively database rows. They have three fields per row - a line
>> number, a fixed length text field and an 8 byte ID.
> 
> That's text plus file metadata.
>
Indeed it is. Technically it is made up of fixed length fields with no 
delimiters. Apart from the record description that forms part of every 
file and the member separators the only metadata is similar to a UNIX 
directory entry plus the i-node. OS/400 and z/OS text files are closer to 
a tar or zip file than what a Unix or Windows user considers to be a text 
file because you can store many separate chunks of text in a single text 
file.

> What makes it not *quite* a legitimate text file is that the file's
> actual content contains a line break that is distinct from 0x0A, 0x0D,
>
No, it doesn't. The editor won't let you put newlines into an OS/400 text 
file - it automatically starts another text line record and assigns a 
line number to it.

> Database rows need an ID field so there's something you can uniquely key
> on, and you said the system stores text in database rows, so there's
> your explanation. The thing that makes no sense is it storing text in
> database rows instead of as native text.
>
Nice guess, but that's not how it works. That role is taken by the line 
number, which can be a decimal value: when you add lines between lines 
0002 and 0003 they'll be numbered 0002.01, 0002.02 etc. until you ask the 
editor to renumber the member. Unlike on Unix and Windows systems, the line 
numbers in compilation errors aren't screwed up by editing the source.

The ID is a complete mystery - most people and programs don't use it and 
IIRC it's not accessible via the editor so you can't change it, though it 
may be possible to ask the editor to maintain it.
  
> Actually C is already broken here even on "normal" systems, because C
> strings can't properly represent text containing NUL characters.
>
By definition they can't be included in 'text files' - they can be 
handled perfectly well in files via the read() and write() functions.

> Nope; see above. If everything you've told me is accurate then it is
> possible to write an OS/400 "text file" that encodes some information
> that will be destroyed in a copy made by simply reading it character by
> character through a java.io.Reader and outputting it character by
> character, unaltered, through a java.io.Writer.
>
Incorrect assumption because you can't put non-printable characters in an 
OS/400 source file member - the editor and other programs won't let you.

The OS/400 is a database machine. There are no files that aren't 
databases. Every file has defining metadata which is automatically 
generated for standard file types, e.g. source files and compiled 
binaries. The field types control what byte values can appear in every 
field, so you might limit a text field to upper case. Violating these 
rules generally causes an exception which, of course, can be caught and 
acted on.


-- 
martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |
Martin 2/26/2011 1:29:48 PM

On 2/26/11 9:27 PM, Arved Sandstrom wrote:
> The usefulness of the term "text file" for me is that it describes a
> file that can be opened, viewed and used by every application, tool and
> utility, on every OS and platform, that purports to be a "text editor".

Then I think you need to define "text file" more narrowly than what is 
actually out there.  In this thread alone, there have been mentioned a 
number of true text file formats that are simply not readable in your 
average or even above-average text editor found on mainstream OSs.

> [...]
> It's with this meaning that I think it's a useful term. But I don't
> think any of this helps us write better Java programs, no. :-)

Okay.  Just checking.  :)

Pete
Peter 2/26/2011 1:36:54 PM

On Fri, 25 Feb 2011 22:41:18 +0000, Tom Anderson wrote:

> What happens if you send one of these alleged text files from a
> mainframe to a printer or a shell?
>
You'd need to convert it before it could be handled just like you use 
unix2dos and dos2unix to convert newlines when you move files between 
*nixen and DOS/Windows, though the translation is more thorough since 
you'd be converting the file to or from EBCDIC character encoding. As you 
might expect, EBCDIC, like ASCII and Unicode, has a similar collection of 
format effectors as well as field and record separators, though they 
occupy 0x00 to 0x3f rather than the ASCII/Unicode 0x00 to 0x1f.
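The newline half of that conversion is easy to sketch in Java for text already decoded into a `String`. The class and method names are hypothetical, the real unix2dos/dos2unix tools have more options, and moving to or from EBCDIC would additionally need a character-set conversion.

```java
final class LineEndings {
    /** Converts CRLF and lone CR terminators to LF (dos2unix/mac2unix style). */
    static String toUnix(String text) {
        // Replace CRLF first so its CR is not turned into a second LF.
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    /** Converts any LF, CR or CRLF terminator to CRLF (unix2dos style). */
    static String toDos(String text) {
        // Normalise to LF first so existing CRLFs are not doubled.
        return toUnix(text).replace("\n", "\r\n");
    }
}
```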

Conversion to/from EBCDIC can only be done with a lookup table because it 
is a bit of a mess - A-Z and a-z are not contiguous (their encoding is 
related to the way that letters were encoded on punched cards with gaps 
between A-I, J-R, S-Z) and the numbers are 0xf0 - 0xf9. 


-- 
martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |
Martin 2/26/2011 1:52:15 PM
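Because the EBCDIC letters are split into the separate ranges Martin describes, conversion is naturally table-driven. A minimal sketch, assuming EBCDIC code page 037 (one common variant) and covering only space, letters and digits; every other byte decodes to '?'. The class name is hypothetical. Many JDKs also ship complete EBCDIC charsets (for example `Charset.forName("IBM037")`), but the Java specification does not require them, so `Charset.isSupported` should be checked first.

```java
import java.util.Arrays;

final class EbcdicTable {
    // Partial table for EBCDIC code page 037: space, letters and digits only.
    private static final char[] TABLE = new char[256];

    static {
        Arrays.fill(TABLE, '?');
        TABLE[0x40] = ' ';
        put(0x81, "abcdefghi"); // a-i
        put(0x91, "jklmnopqr"); // j-r
        put(0xA2, "stuvwxyz");  // s-z
        put(0xC1, "ABCDEFGHI"); // A-I
        put(0xD1, "JKLMNOPQR"); // J-R
        put(0xE2, "STUVWXYZ");  // S-Z
        put(0xF0, "0123456789");
    }

    /** Stores consecutive characters starting at the given byte value. */
    private static void put(int start, String chars) {
        for (int i = 0; i < chars.length(); i++) {
            TABLE[start + i] = chars.charAt(i);
        }
    }

    static String decode(byte[] ebcdic) {
        StringBuilder sb = new StringBuilder(ebcdic.length);
        for (byte b : ebcdic) {
            sb.append(TABLE[b & 0xFF]); // mask because Java bytes are signed
        }
        return sb.toString();
    }
}
```

Decoding the bytes `C8 C5 D3 D3 D6 40 F4 F2` gives `"HELLO 42"`.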

On 11-02-26 09:36 AM, Peter Duniho wrote:
> On 2/26/11 9:27 PM, Arved Sandstrom wrote:
>> The usefulness of the term "text file" for me is that it describes a
>> file that can be opened, viewed and used by every application, tool and
>> utility, on every OS and platform, that purports to be a "text editor".
>
> Then I think you need to define "text file" more narrowly than what is
> actually out there. In this thread alone, there have been mentioned a
> number of true text file formats that are simply not readable in your
> average or even above-average text editor found on mainstream OSs.

I'm a bit hard-nosed about this one I guess. A text file for me is a 
stream of variable-length lines with a reasonably common line separator.

Take VMS for example (I've used VMS off and on since about 1980). Of 
the 4 record formats in VMS RMS - fixed-length records, variable-length 
records with a count byte per record, variable-length records with a 
fixed-length control block, and stream (variable-length records with line 
delimiters) - my personal view is that only the stream format with a 
common delimiter choice (LF or CR) qualifies as a "text file".

You may also have gathered that I am referring to plain text, let's say 
Unicode these days, as opposed to any type of formatted text.

>> [...]
>> It's with this meaning that I think it's a useful term. But I don't
>> think any of this helps us write better Java programs, no. :-)
>
> Okay. Just checking. :)
>
> Pete

AHS
-- 
We must recognize the chief characteristic of the modern era - a 
permanent state of what I call violent peace.
-- James D. Watkins
Arved 2/26/2011 3:25:24 PM

On Sat, 26 Feb 2011, Peter Duniho wrote:

> On 2/26/11 9:27 PM, Arved Sandstrom wrote:
>
>> The usefulness of the term "text file" for me is that it describes a 
>> file that can be opened, viewed and used by every application, tool and 
>> utility, on every OS and platform, that purports to be a "text editor".
>
> Then I think you need to define "text file" more narrowly than what is 
> actually out there.  In this thread alone, there have been mentioned a 
> number of true text file formats that are simply not readable in your 
> average or even above-average text editor found on mainstream OSs.

Either (a) according to Arved's definition, which is highly appealing to 
me, they are not true text file formats, or (b) they *are* readable with 
the standard text editors *on the OSs on which they are found*, in which 
case, perhaps they are.

Perhaps talking about plain text is a bit like talking about plain 
speaking. I might deny that saying "alea iacta est" is plain speaking, and 
i would be correct where i'm standing, because i'm standing in an 
English-speaking country, but in the Vatican or ancient Rome, it would 
have been perfectly plain. The term 'plain' need not mean the same thing 
in all times and places.

Nonetheless, in general discourse on a newsgroup like this, there is a 
presumption that we're standing in the lands of the tribe of Ken Thompson, 
which has come to occupy the greater part of the world, and that plain 
text means ASCII or one of its successors, with lines terminated by CR 
and/or LF, and no funny business. This is not a universal truth, but it is 
a truth where we are right now.

tom

-- 
One chants out between two worlds Fire - Walk With Me.
Tom 2/26/2011 3:30:42 PM

On Fri, 25 Feb 2011, Arne Vajhøj wrote:

> On 25-02-2011 16:43, Tom Anderson wrote:
>> On Fri, 25 Feb 2011, Lew wrote:
>> 
>>> On Feb 25, 1:07 pm, Daniele Futtorovic <da.futt.n...@laposte-dot-
>>> net.invalid> wrote:
>>>> On 25/02/2011 05:50, Ken Wesson allegedly wrote:
>>>> 
>>>>> On Thu, 24 Feb 2011 20:14:49 +0100, Daniele Futtorovic wrote:
>>>> 
>>>>>> On 24/02/2011 19:46, Ken Wesson allegedly wrote:
>>>>>>> it's not (...) ASCII (...).
>>>>> 
>>>>> Alleged by whom? That distorted quote is most certainly not what I
>>>>> wrote.
>>>> 
>>>> Alleged by my Usenet provider.
>>>> 
>>>> I was trying to extract the wisdom in your postings. Give me some credit
>>>> here. That quote is most certainly what you (pertinently) wrote, minus
>>>> the fluff.
>>>> 
>>>> And please, I beg of you sincerely and benevolently, stop acting like
>>>> such a loonie.
>>> 
>>> "Your grand-daddy's ASCII" is exactly today's ASCII.
>> 
>> My grandfather's ASCII would probably have been ASCII-1963, which is not
>> today's ASCII.
>
> Are there that big differences between ASCII 63, 67 and 86?

The '67 version added lowercase letters! See:

http://www.wps.com/projects/codes/#ASCII-1967

I'm not aware of any actual differences between 86 and 67. IANA considers 
them synonyms (assuming that ANSI_X3.4-1968 means the 1967 version):

http://www.iana.org/assignments/character-sets

So i suspect it was just a revision of the spec to clean it up or 
something.

tom

-- 
One chants out between two worlds Fire - Walk With Me.
Tom 2/26/2011 3:41:46 PM

On 11-02-26 09:29 AM, Martin Gregorie wrote:
> On Sat, 26 Feb 2011 12:15:21 +0100, Ken Wesson wrote:
[ SNIP ]
>>
> Not necessarily. A CSV file is generally treated as containing a fixed
> number of variable length fields with the last field terminated by a
> newline. In this case, both commas and newlines are out-of-band (and so
> are some quote marks if the implementation allows fields to contain
> commas).
>
> However, fixed length records made up of fixed length fields contain no
> out-of-band structure. You want an example? How about the two magnetic
> stripe tracks on a credit card - 40 bytes and containing fields whose
> content and meaning are defined by their position.
[ SNIP ]

I think the point is that every file scheme has out-of-band structure, 
explicit or implied. There's the structure information and there's the 
data. The spectrum runs from files that are almost fully 
self-describing (a category that itself subsumes everything from 
fixed-length record/fixed-length field files to stream-oriented plain 
text files that are variable length records delimited by EOL markers) to 
files that are totally non-self-describing. In every case, including 
that of the magnetic stripes example, there is out-of-band structure; 
it's just that it may be implied.

So if the records don't contain that structure, something else (either 
the file or the processor) does.

AHS
-- 
We must recognize the chief characteristic of the modern era - a 
permanent state of what I call violent peace.
-- James D. Watkins
Arved 2/26/2011 3:43:05 PM

On Sat, 26 Feb 2011 11:43:05 -0400, Arved Sandstrom wrote:

> On 11-02-26 09:29 AM, Martin Gregorie wrote:
>> On Sat, 26 Feb 2011 12:15:21 +0100, Ken Wesson wrote:
> [ SNIP ]
>>>
>> Not necessarily. A CSV file is generally treated as containing a fixed
>> number of variable length fields with the last field terminated by a
>> newline. In this case, both commas and newlines are out-of-band (and so
>> are some quote marks if the implementation allows fields to contain
>> commas).
>>
>> However, fixed length records made up of fixed length fields contain no
>> out-of-band structure. You want an example? How about the two magnetic
>> stripe tracks on a credit card - 40 bytes and containing fields whose
>> content and meaning are defined by their position.
> [ SNIP ]
> 
> I think the point is that every file scheme has out-of-band structure,
> explicit or implied. There's the structure information and there's the
> data. At one end of the spectrum you've got files that are almost fully
> self-describing (which category itself subsumes everything from
> fixed-length record/fixed-length field files, to stream-oriented plain
> text files that are variable length records delimited by EOL markers) to
> files that are totally non-self-describing. In every case, including
> that of the magnetic stripes example, there is out-of-band structure;
> it's just that it may be implied.
> 
> So if the records don't contain that structure, something else (either
> the file or the processor) does.
>
Well put. 

KW seemed to be saying that a record had to have structure, which is 
true, and that this had to be in the form of metadata included within the 
record, which isn't true. I was attempting to say that, though not as 
succinctly as yourself.
 

-- 
martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |
Martin 2/26/2011 5:18:50 PM

On 2/23/2011 9:21 AM, Lew wrote:
> On 02/23/2011 10:59 AM, Robin Wenger wrote:
>> Is it possible to read the last text line from a text file WITHOUT
>> reading the previous (n-1) lines?
>
> Yes, but it's tricky. You need a random-access file and seek backwards
> to a newline.
>
You can do something a little better than seeking backwards one byte at 
a time. You can make some guesses about line length. If it is a typical 
text file, you can guess that the final line is shorter than 1024 bytes 
(for instance). Seek to 1024 bytes before the end of the file and then 
perform the typical "tail" operation, looking for the last EOL in that 
block.

If you don't find the EOL as expected, you would then do the same thing, 
but start further back.

Also, beware of the special case that the final line may or may not end 
with EOL. Many text files have an end of line before end of file, but 
not always. So really you want to match "(start-of-file or EOL)(final 
line)(optional EOL)(EOF)".

-- 
Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>
Daniel 2/26/2011 6:42:03 PM
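A minimal sketch of the RandomAccessFile approach with Daniel's refinement, assuming the file's encoding is ASCII-compatible and never uses bytes 0x0A or 0x0D inside a multi-byte character (true of UTF-8 and the ISO-8859 family, false for UTF-16 and EBCDIC). It treats LF, CR and CRLF as line ends, ignores one optional terminator at end of file, reads a 1024-byte tail first and doubles the window whenever no line end is found. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.Charset;
import java.nio.file.Path;

final class LastLine {
    private static final long INITIAL_GUESS = 1024; // first guess at the last line's length, in bytes

    /**
     * Returns the last line without its terminator, or "" for an empty file.
     * LF, CR and CRLF all count as line ends.
     */
    static String readLastLine(Path path, Charset cs) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "r")) {
            long end = raf.length();
            // The final line may or may not be terminated: skip one LF, CR or CRLF at EOF.
            if (end > 0 && byteAt(raf, end - 1) == '\n') {
                end--;
            }
            if (end > 0 && byteAt(raf, end - 1) == '\r') {
                end--;
            }
            long window = INITIAL_GUESS;
            while (true) {
                long from = Math.max(0, end - window);
                // Sketch assumption: the window fits in a byte array (< 2 GB).
                byte[] buf = new byte[(int) (end - from)];
                raf.seek(from);
                raf.readFully(buf);
                // Scan backwards for the terminator that ends the previous line.
                for (int i = buf.length - 1; i >= 0; i--) {
                    if (buf[i] == '\n' || buf[i] == '\r') {
                        return new String(buf, i + 1, buf.length - i - 1, cs);
                    }
                }
                if (from == 0) {
                    return new String(buf, cs); // no terminator: the file is a single line
                }
                window *= 2; // guess was too short: start further back
            }
        }
    }

    private static int byteAt(RandomAccessFile raf, long pos) throws IOException {
        raf.seek(pos);
        return raf.read();
    }
}
```

Re-reading the enlarged window from scratch keeps the code simple at the cost of some repeated I/O; because the window doubles, the extra reading stays modest. Decoding always starts just after a line terminator or at offset 0, so a window edge that happens to split a multi-byte UTF-8 character does no harm. A file that is empty, or contains only a terminator, yields an empty string.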

On 26-02-2011 04:43, Ken Wesson wrote:
> On Fri, 25 Feb 2011 12:51:12 -0500, Arne Vajhøj wrote:
>> On 24-02-2011 23:48, Ken Wesson wrote:
>>> On Thu, 24 Feb 2011 20:50:55 -0500, Arne Vajhøj wrote:
>>>> On 24-02-2011 15:49, Tom Anderson wrote:
>>>>> On Fri, 25 Feb 2011, Peter Duniho wrote:
>>>>>> On 2/24/11 10:42 PM, Lars Enderin wrote:
>>>>>>> 2011-02-24 15:26, Peter Duniho skrev:
>>>>>>>> On 2/24/11 10:14 PM, Lars Enderin wrote:
>>>>>>>>> ASCII character values are limited to the 0-127 range. That's an
>>>>>>>>> outdated "standard".
>>>>>>>>
>>>>>>>> Used by "obsolete systems". A key point in my amusement. :)
>>>>>>>
>>>>>>> I thought so, but Ken seemed to need an explanation.
>>>>>>
>>>>>> Yes, and it was a good explanation. Unfortunately, I don't think he
>>>>>> understood the explanation, nor do I think he will understand
>>>>>> further clarification. I think it more likely that the harder anyone
>>>>>> tries to explain to him these points, the more dug in his heels will
>>>>>> be.
>>>>>>
>>>>>> To do otherwise would necessarily require an admission that there's
>>>>>> no single "text file" format, and that even if there were, ASCII or
>>>>>> any of the single-byte derivatives thereof ain't it. I don't see any
>>>>>> way such an admission would ever be produced.
>>>>>
>>>>> There is a single text file format: lines of characters in some
>>>>> encoding, terminated by an end-of-line sequence which is
>>>>> distinguishable from any other characters.
>>>>>
>>>>> It's merely the case that some current mainframes, and some obscure
>>>>> or historical systems, do not store text in text files!
>>>>
>>>> No.
>>>>
>>>> There are also count prefix (and sometimes suffix) formats.
>>>
>>> Those aren't text files. Text is, notionally, a string of characters,
>>> including perhaps spaces and line-end characters. A text file is
>>> therefore a file whose content is a string of characters, including
>>> perhaps spaces and line-end characters. Such a thing is, logically, the
>>> only native way to represent raw text. Anything more structured is
>>> obviously not a plain text file. It may be a text-containing file of
>>> some kind but it is not a text file.
>>
>> A text file is something you read and write as lines of text.
>>
>> Whether the system used LF delimiters or CR LF delimiters or a counted
>> approach does not matter.
>
> Except, of course, that you don't read "a counted approach" as lines of
> text; you read it as binary integers mixed with text strings. It is not,
> physically, a text file.

It is a text file and can be used as a text file.

>>> That's nonsense. The only character a normal text file cannot have in
>>> lines is a line break, and in actual fact you cannot have a line break
>>> in the middle of a line *by definition*. Wherever there is a line break
>>> one line ENDS and another one BEGINS, *by definition*. If that weren't
>>> the case then it wouldn't be a line break!
>>
>> No.
>
> Yes. A line break in the middle of a line is utter nonsense, the logical
> equivalent of an odd even number or an endpoint of a circle or a corner
> of a disc.

It is actually very logical that something that is not considered
a line break in that record format can be in the middle of a line.

>> [ASCII 10] is perfectly valid as content in the middle of a line on
>> older MacOS systems
>
> Sophistry. Those just use ASCII 13 to mean the same thing.

Yes.

And in a counted prefix format both LF and CR are valid in lines.
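The point about counted formats can be illustrated with a toy layout. This sketch uses a made-up format (a 2-byte big-endian length, then the line's UTF-8 bytes); it is not the on-disk layout of VMS RMS or any other real system, and the class name is hypothetical. Because record boundaries come from the counts, LF and CR inside a line survive a round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class CountedLines {
    /** Writes each line as a 2-byte big-endian length followed by its UTF-8 bytes. */
    static byte[] write(List<String> lines) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (String line : lines) {
            byte[] b = line.getBytes(StandardCharsets.UTF_8);
            if (b.length > 0xFFFF) {
                throw new IllegalArgumentException("line too long for a 2-byte count");
            }
            out.writeShort(b.length);
            out.write(b);
        }
        out.flush();
        return bytes.toByteArray();
    }

    /** Reads the lines back; CR and LF inside a line are ordinary content. */
    static List<String> read(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        List<String> lines = new ArrayList<>();
        while (in.available() > 0) {
            byte[] b = new byte[in.readUnsignedShort()];
            in.readFully(b);
            lines.add(new String(b, StandardCharsets.UTF_8));
        }
        return lines;
    }
}
```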

>>> So there is no "advantage" here. What you are actually describing is a
>>> "list-of-strings" file, not a text file
>>
>> A text file is a list of strings.
>
> No, a text file is a single string.

No. That is why it is called a file of lines.

>> C source files are stored using count prefix line formats on systems
>> that uses such.
>
> I said "the formats commonly used to store, e.g., C source files". No
> "count prefix line format" is *commonly* used to store C source files --
> 99.99% or more of C source files residing on hard disks in this world are
> undoubtedly in fact LF-delimited, and most of the rest CRLF delimited
> (Windows wackiness strikes again).

Your assumption that the amount of C code on non-*nix platforms
is less than 0.01% is rather amusing.

How can anyone be so out of touch with the real world?

Arne
Arne 2/26/2011 7:07:49 PM

On 26-02-2011 08:36, Peter Duniho wrote:
> On 2/26/11 9:27 PM, Arved Sandstrom wrote:
>> The usefulness of the term "text file" for me is that it describes a
>> file that can be opened, viewed and used by every application, tool and
>> utility, on every OS and platform, that purports to be a "text editor".
>
> Then I think you need to define "text file" more narrowly than what is
> actually out there. In this thread alone, there have been mentioned a
> number of true text file formats that are simply not readable in your
> average or even above-average text editor found on mainstream OSs.

They are edited fine by any text editor on those systems.

This includes cross platform editors that are also available
on *nix and Windows.

If the files are FTP'ed to a Unix box in text mode they can be
edited with any Unix text editor.

If their location is mounted as a Samba drive, then they can be
edited from Windows with any Windows text editor.

For obvious reasons notepad.exe can not be run on the systems.

Arne
Arne 2/26/2011 7:12:44 PM

On 26-02-2011 10:30, Tom Anderson wrote:
> On Sat, 26 Feb 2011, Peter Duniho wrote:
>
>> On 2/26/11 9:27 PM, Arved Sandstrom wrote:
>>
>>> The usefulness of the term "text file" for me is that it describes a
>>> file that can be opened, viewed and used by every application, tool
>>> and utility, on every OS and platform, that purports to be a "text
>>> editor".
>>
>> Then I think you need to define "text file" more narrowly than what is
>> actually out there. In this thread alone, there have been mentioned a
>> number of true text file formats that are simply not readable in your
>> average or even above-average text editor found on mainstream OSs.
>
> Either (a) according to Arved's definition, which is highly appealing to
> me, they are not true text file formats, or (b) they *are* readable with
> the standard text editors *on the OSs on which they are found*, in which
> case, perhaps they are.

If Java, C, Fortran etc. reads them as text files, then it seems weird
not to consider them text files.

> Nonetheless, in general discourse on a newsgroup like this, there is a
> presumption that we're standing in the lands of the tribe of Ken
> Thompson, which has come to occupy the greater part of the world, and
> that plain text means ASCII or one of its successors, with lines
> terminated by CR and/or LF, and no funny business. This is not a
> universal truth, but it is a truth where we are right now.

I really don't see any reason to redefine the concept of
lines due to many people having very limited experience
with other OS'es than Windows or Unix.

Java has not been standardized and made as well defined as it has
just to have "if it happens to work on *nix and Windows then it is
portable".

Arne
Arne 2/26/2011 7:17:32 PM

On 26-02-2011 04:45, Ken Wesson wrote:
> On Fri, 25 Feb 2011 12:53:49 -0500, Arne Vajhøj wrote:
>
>> On 24-02-2011 23:51, Ken Wesson wrote:
>>> On Thu, 24 Feb 2011 21:42:10 -0500, Eric Sosman wrote:
>>>
>>>>        I think it's amusing that he says "All the world's ASCII,"
>>>
>>> Who says "all the world's ASCII", Sosman? I can't recall anybody doing
>>> so in this group recently.
>>>
>>> It is true that almost all the world seems to use encodings that
>>> contain ASCII as a subset. That is not quite the same thing.
>>
>> Somebody with the name of Ken Wesson wrote:
>
># Since those days, the world has standardized on ASCII flat files for text files.
>
># Windows text files are flat ASCII files (with CRLF line ends). Mac text
># files are flat ASCII files (with CR line ends). Unix text files are flat
># ASCII files (with LF line ends).
>
> I don't see an argument of any kind in your post. Forget to include one?

The quotation answers your question to Sosman.

You said that.

Arne


Reply Arne 2/26/2011 7:19:38 PM

On 26-02-2011 10:41, Tom Anderson wrote:
> On Fri, 25 Feb 2011, Arne Vajhøj wrote:
>
>> On 25-02-2011 16:43, Tom Anderson wrote:
>>> On Fri, 25 Feb 2011, Lew wrote:
>>>
>>>> On Feb 25, 1:07 pm, Daniele Futtorovic <da.futt.n...@laposte-dot-
>>>> net.invalid> wrote:
>>>>> On 25/02/2011 05:50, Ken Wesson allegedly wrote:
>>>>>
>>>>>> On Thu, 24 Feb 2011 20:14:49 +0100, Daniele Futtorovic wrote:
>>>>>
>>>>>>> On 24/02/2011 19:46, Ken Wesson allegedly wrote:
>>>>>>>> it's not (...) ASCII (...).
>>>>>>
>>>>>> Alleged by whom? That distorted quote is most certainly not what I
>>>>>> wrote.
>>>>>
>>>>> Alleged by my Usenet provider.
>>>>>
>>>>> I was trying to extract the wisdom in your postings. Give me some
>>>>> credit here. That quote is most certainly what you (pertinently)
>>>>> wrote, minus the fluff.
>>>>>
>>>>> And please, I beg of you sincerely and benevolently, stop acting like
>>>>> such a loonie.
>>>>
>>>> "Your grand-daddy's ASCII" is exactly today's ASCII.
>>>
>>> My grandfather's ASCII would probably have been ASCII-1963, which is not
>>> today's ASCII.
>>
>> Are there really such big differences between ASCII-1963, -1967 and -1986?
>
> The '67 version added lowercase letters!

Well - that is a significant difference!

Arne
Reply Arne 2/26/2011 7:22:20 PM

On 26-02-2011 05:56, Ken Wesson wrote:
> On Fri, 25 Feb 2011 09:46:30 -0500, Arne Vajhøj wrote:
>> On 25-02-2011 00:28, Ken Wesson wrote:
>>> On Thu, 24 Feb 2011 21:00:20 -0500, Arne Vajhøj wrote:
>>>> And it is a pretty good guess that a RandomAccessFile search for
>>>> CR and LF will then also fail on i.
>>>
>>> How fortunate that i runs on fewer than one in ten thousand machines.
>>> Does Java even run on i?
>>
>> Yes.
>
> And what the hell is it used for on i?

The same that Java is used for on other platforms.

>>>> Linux will be either ISO-8859-1 or UTF-8, not ASCII.
>>>
>>> Both contain ASCII as a subset -- if you take a pure-ASCII file and
>>> reencode it in either the result is the identical byte sequence.
>>
>> Yes, but that does not change that they do not use ASCII. They
>> use ISO-8859-1 or UTF-8.
>
> There you go.

Exactly.

You finally got it?

Arne

Reply Arne 2/26/2011 7:26:18 PM

On 26-02-2011 05:17, Ken Wesson wrote:
> On Fri, 25 Feb 2011 09:36:29 -0500, Arne Vajhøj wrote:
>> On 25-02-2011 00:04, Ken Wesson wrote:
>>> On Thu, 24 Feb 2011 20:48:18 -0500, Arne Vajhøj wrote:
>>>> On 24-02-2011 09:00, Ken Wesson wrote:
>>>>> On Thu, 24 Feb 2011 21:23:34 +0800, Peter Duniho wrote:
>>>>>
>>>>>> On 2/24/11 9:06 PM, Ken Wesson wrote:
>>>>>>> [...]
>>>>>>> Obsolete systems do not interest me.
>>>>>>
>>>>>> then…
>>>>>>
>>>>>>> Since those days, the world has standardized on ASCII flat files
>>>>>>> for text files.
>>>>>>
>>>>>> LOL!
>>>>>
>>>>> Windows text files are flat ASCII files (with CRLF line ends).
>>>>
>>>> No.
>>>>
>>>> They are CP-1252, UTF-8 or UTF-16.
>>>
>>> All of which are ASCII++, for all intents and purposes.
>>
>> This is an IT group.
>>
>> Not a group for hairdressers or chefs.
>
> Evasive change of subject noted.
>
>>>>>                                   And that exhausts 99.99% of the
>>>>> operating system market share right there, if not more,
>>>>
>>>> No.
>>>>
>>>> z/OS, i, OpenVMS and MPE have a lot more market share than 0.01%.
>>>
>>> Nonsense. There are *at least* ten thousand PCs running Windows for
>>> every one machine running one of those operating systems.
>>>
>>> Ten thousand *PCs running Windows*.
>>
>> The PC/mainframe ratio is probably like 100000:1.
>
> Hence why I said *at least*. I was being conservative in my estimates --
> as generous to *your* case as possible. And still I was demolishing it.

Not at all.

Because market share is counted in dollars.

>> But the relevance is not that big, because mainframes happen to be a lot
>> more expensive than PCs.
>
> One computer is still one computer, no matter how expensive it is. It's
> the price tag whose relevance is not that big.

I have news for you: money is always important.

>>>>> I can't remember the last time I had to interoperate with any machine
>>>>> that had anything other than standard ASCII as the native format for
>>>>> text files. It's gotta be decades.
>>>>
>>>> It is possible that you only work with 20+ year old Unix and OpenVMS
>>>> systems with 7-bit VT100 access.
>>>>
>>>> But that is not very common.
>>>
>>> I work with what nearly everyone in the field works with these days: a
>>> mix of Unix, MacOS, and Windows, mainly Unix server blades whose
>>> services are accessed by mainly Windows desktop/netbook users with a
>>> smattering of Mac users and a small but growing contingent of
>>> smartphone users.
>>
>> Then you won't have any users using ASCII.
>
> Funnily enough, all of them can cope just fine with ASCII text files. I
> wonder how that can be, Arne, unless of course you're wrong yet again.

It is called "backwards compatibility".

Try googling the term.

Oops - sorry - I know you are too lazy to do that.

You will have to trust me that such a term exists.

Arne
Reply Arne 2/26/2011 7:29:08 PM

On 26-02-2011 05:43, Ken Wesson wrote:
> On Fri, 25 Feb 2011 12:33:16 -0500, Arne Vajhøj wrote:
>> On 25-02-2011 00:19, Ken Wesson wrote:
>>> On Thu, 24 Feb 2011 17:11:02 -0500, Michael Wojcik wrote:
>>>> And the majority of business transactions
>>>
>>> have no bearing on this discussion, which has to do with the majority
>>> of *computers* and, secondarily, what will be encountered routinely by
>>> the majority of *IT workers*.
>>
>> Well the topic was market share.
>>
>> Market share is counted in dollars.
>>
>> And since somebody is willing to pay a lot more for a mainframe
>> running an entire bank than for somebody to be able to read email,
>> then counting computers does not really reflect market share.
>
> The topic was as I stated above.

No.

You wrote:

#            And that exhausts 99.99% of the
# operating system market share right there

Note the word "market share".

>> And since somebody is willing to pay a lot more for a mainframe running
>> an entire bank than for somebody to be able to read email, then counting
>> computers does not really reflect
>
> The original debate arose from your worry that if the thread's OP counted
> backward from the end of a file to the final newline character in it,
> this would break on a handful of oddball systems.

What does your hand look like?

There are probably something like 100000 of these systems
in production.

Arne
Reply Arne 2/26/2011 7:33:40 PM

On 26-02-2011 05:13, Ken Wesson wrote:
> On Fri, 25 Feb 2011 12:30:23 -0500, Arne Vajhøj wrote:
>> On 25-02-2011 00:00, Ken Wesson wrote:
>>> On Thu, 24 Feb 2011 20:58:24 -0500, Arne Vajhøj wrote:
>>>> On 24-02-2011 19:12, Lew wrote:
>>>>> On 02/24/2011 09:49 AM, RedGrittyBrick wrote:
>>>>>> On 24/02/2011 14:00, Ken Wesson wrote:
>>>>>>> Windows text files are flat ASCII files (with CRLF line ends).
>>>>>>
>>>>>> Actually I find that, nowadays, lots of text files on Windows are
>>>>>> so-called
>>>>>> 'ANSI' (mostly CP-1252) or 'Unicode' (usually meaning UTF-16 with
>>>>>> BOM).
>>>>>>
>>>>>> Even on my ancient XP boxes, Notepad offers only ANSI, Unicode,
>>>>>> Unicode big-endian and UTF-8. Wordpad offers RTF, Text-Document
>>>>>> (turns out to be CP-1252), Text-Document DOS format (turns out to be
>>>>>> CP-850) and Unicode. No
>>>>>> ASCII.
>>>>>
>>>>> Windows hasn't used ASCII in decades.
>>>>
>>>> I don't think it ever has.
>>>
>>> Funny then that bog-standard ASCII files seem to read and write just
>>> fine in Notepad on the occasions that I use Windows computers.
>>
>> That just means that it uses something ASCII-compatible, not that it
>> uses ASCII.
>
> Sophistry.

Simple fact.

>> And you can easily verify that it indeed supports characters not part of
>> ASCII.
>
> Never said it didn't.

Yes - you did.

You said they used ASCII.

If they did that then they would not support characters outside ASCII.

>>> All of those seem to be ASCII plus another up to 128 characters, or in
>>> the case of UTF-16, another up to 65408 characters.
>>>
>>> Saying that a 7-bit-clean file interpreted in one of those is not ASCII
>>> is like saying that humans are not mammals.
>>
>> And?
>>
>> No one is saying that such a file is not ASCII.
>
> You were.

No.

I have very specifically been talking about what the
systems use.

>> PS: UTF-16 is *not* ASCII compatible.
>
> It is if you strip the high bytes and not just the 7th bits.

Which means not compatible.
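A small check of the encoding point, sketched in Java (the class and method names are illustrative, not from the thread): encoding pure-ASCII text as ISO-8859-1 or UTF-8 yields exactly the US-ASCII bytes, while UTF-16 does not. This matters for the backward-scan trick: in UTF-8 every byte below 0x80 is a complete character, so a 0x0A byte is always a real LF, but in UTF-16BE an LF is the byte pair 0x00 0x0A, and 0x0A can also show up as half of an unrelated character such as U+010A.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AsciiCompat {
    // True if encoding the (pure-ASCII) text with cs gives exactly the US-ASCII bytes.
    static boolean sameBytesAsAscii(String asciiText, Charset cs) {
        return Arrays.equals(asciiText.getBytes(StandardCharsets.US_ASCII),
                             asciiText.getBytes(cs));
    }

    public static void main(String[] args) {
        String text = "last line\n";
        System.out.println("ISO-8859-1: " + sameBytesAsAscii(text, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8:      " + sameBytesAsAscii(text, StandardCharsets.UTF_8));
        System.out.println("UTF-16BE:   " + sameBytesAsAscii(text, StandardCharsets.UTF_16BE));
        // LF in UTF-16BE is two bytes: [0, 10]
        System.out.println(Arrays.toString("\n".getBytes(StandardCharsets.UTF_16BE)));
        // U+010A (LATIN CAPITAL LETTER C WITH DOT ABOVE) also contains a 0x0A byte: [1, 10]
        System.out.println(Arrays.toString("\u010A".getBytes(StandardCharsets.UTF_16BE)));
    }
}
```

Running it prints true, true, false, then the two byte arrays, which is why a byte-level search for 0x0A is only safe on ASCII-compatible encodings.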

Arne

Reply Arne 2/26/2011 7:36:03 PM

On 26-02-2011 05:24, Ken Wesson wrote:
> On Fri, 25 Feb 2011 12:34:43 -0500, Arne Vajhøj wrote:

>> On 25-02-2011 00:06, Ken Wesson wrote:
>>> On Thu, 24 Feb 2011 13:12:36 -0700, Jim Janney wrote:
>>>> Ken Wesson<kwesson@gmail.com>   writes:
>>>>> On Thu, 24 Feb 2011 09:18:06 -0500, Michael Wojcik wrote:
>>>>>
>>>>>> On the IBM i machines (formerly i Series, formerly System i,
>>>>>> formerly AS/400, successor to the System/3x), blah blah blah
>>>>>
>>>>> You're one to talk about provincialism. Who the hell uses these
>>>>> ancient museum pieces any more?
>>>>
>>>> Um, that would be me, or rather my employer's customers.
>>>
>>> Your employer may happen to be using such legacy systems, but I very
>>> much doubt that very many people deal with them in an IT capacity. Far,
>>> *far* fewer than deal with Unix, Windows, and Mac boxes in such a
>>> capacity.
>>>
>>> How many end-users interact indirectly with these systems is of course
>>> irrelevant.
>>
>> Not really
>
> Yes, really.
>
> Let us recall the context in which this silly argument blew up, shall we?
>
> Someone asked for an efficient way to get at the last line of a text
> file. *Several* people suggested seeking backwards from the end to find a
> newline character. (Interestingly, only one of those people has since
> been subjected to flamage for this suggestion. I wonder why?)

Actually no one was flamed for suggesting using that trick.

You got flamed because:
- you do not understand the concept of portability
- you do not understand the IT market
which resulted in some weird statements from you.

> You and some others pooh-poohed that suggestion because it might not work
> properly on fewer than 0.01% (you've since admitted fewer than 0.001%) of
> computers, with the implied grounds that any number at all above zero is
> unacceptable.

I don't even think anyone claimed it was unacceptable.

The point was that it was a solution that was not guaranteed
to work on all platforms.

> I've got news for you. That suggestion also won't work on any system
> without a JVM.
>
> It won't work on a 2MB RAM 286 too cramped for the program to run on.
>
> It won't work on an Xbox 360 or any other system that won't run unsigned
> binaries and whose signing authority won't sign this program.
>
> And so on.
>
> NO program will work on *all computers*, Arne. So that goal is
> unattainable.

That is not so relevant.

A Java program will not work on a system with no Java.

Everyone should know that.

That is not an excuse for writing non-portable Java code
without even noting the portability problem.

> Hardly any users, if any at all, of the OP's program would be using it on
> a weird machine like those 0.001% you say *a subset* of which don't store
> text normally. (Apparently having no true text files at all!)

They have text files.

Just different physical storage than those computers you have
experience with.

> I'd expect those few users will expect their quirky computers to not
> accept software that works nearly everywhere else, and accept that, or
> else they would have gotten a less quirky computer instead.

They will expect that Java code is written in a portable way.

Portability was one of the main design goals of Java.

> And in actual fact the OP's user base almost certainly consists of Unix
> sysadmins who want to view the last entry or last few entries of a
> ginormous log file without difficulty, in which case the OP could
> probably get away even with hardcoding \u000A as the line-end character
> (though I wouldn't recommend they actually do so).
>
> *That* is what's relevant here.

No.

What is relevant is that suggesting such a thing without any note
about portability, based on guesses about the users' context,
is pretty bad programming.

Arne
Reply Arne 2/26/2011 7:43:50 PM

Daniel Pitts wrote:
> Lew wrote:
>> Robin Wenger wrote:
>>> Is it possible to read the last text line from a text file WITHOUT
>>> reading the previous (n-1) lines?
>>
>> Yes, but it's tricky. You need a random-access file and seek backwards
>> to a newline.
>>
> You can do something a little better than seeking backwards. You can make some
> guesses about line length. If it is a typical text file, you can guess that
> the length of that line is < 1024 (for instance). Seek to that location before
> the end of the file and then perform the typical "tail" operation.
>
> If you don't find the EOL as expected, you would then do the same thing, but
> start further back.

That's a form of seeking backwards.

> Also, beware of the special case that the final line may or may not end with
> EOL. Many text files have an end of line before end of file, but not always.
> So really you want to match "(start-of-file or EOL)(final line)(optional
> EOL)(EOF)".
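The guess-and-extend approach described above can be sketched in Java with RandomAccessFile. This is a minimal sketch under stated assumptions, not code from the thread: the file uses an ASCII-compatible encoding (decoded here as UTF-8), lines end in LF, CR or CRLF, one optional trailing terminator is ignored, and the 1024-byte starting window is just the example figure from the post. The window doubles until a line end or the start of the file is found. It will not work for UTF-16 files, or for record-oriented files whose line ends are not stored as CR/LF bytes.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class LastLine {
    // Assumption: ASCII-compatible encoding (UTF-8 here) with LF, CR or CRLF line ends.
    // Returns the last line without its terminator, or "" for an empty file.
    static String lastLine(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long end = raf.length();
            if (end == 0) return "";

            // Skip one optional trailing terminator: LF, CR, or CRLF.
            raf.seek(end - 1);
            int last = raf.read();
            if (last == '\n') {
                end--;
                if (end > 0) {
                    raf.seek(end - 1);
                    if (raf.read() == '\r') end--;
                }
            } else if (last == '\r') {
                end--;
            }

            // Guess the line fits in 1024 bytes; double the window until an EOL is found.
            int window = 1024;
            while (true) {
                long start = Math.max(0, end - window);
                byte[] buf = new byte[(int) (end - start)];
                raf.seek(start);
                raf.readFully(buf);
                for (int i = buf.length - 1; i >= 0; i--) {
                    if (buf[i] == '\n' || buf[i] == '\r') {
                        // UTF-8 multibyte sequences never contain 0x0A or 0x0D, so this cut is safe.
                        return new String(buf, i + 1, buf.length - i - 1, StandardCharsets.UTF_8);
                    }
                }
                if (start == 0) {
                    // No EOL anywhere: the whole (trimmed) file is a single line.
                    return new String(buf, StandardCharsets.UTF_8);
                }
                if (window >= (1 << 30)) throw new IOException("last line too long for this sketch");
                window *= 2; // each retry re-reads the tail; simpler than stitching buffers
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java LastLine <file>");
            return;
        }
        System.out.println(lastLine(args[0]));
    }
}
```

Re-reading the tail on each retry wastes a little I/O, but for typical line lengths the first 1024-byte window already succeeds.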

-- 
Lew
Honi soit qui mal y pense.
Reply Lew 2/26/2011 8:36:40 PM

On 26-02-2011 05:09, Ken Wesson wrote:
> On Fri, 25 Feb 2011 12:18:27 -0500, Arne Vajhøj wrote:
>
>> On 24-02-2011 23:57, Ken Wesson wrote:
>>> On Thu, 24 Feb 2011 20:55:44 -0500, Arne Vajhøj wrote:
>>>> On 24-02-2011 13:43, Ken Wesson wrote:
>>>>> On Thu, 24 Feb 2011 14:49:22 +0000, RedGrittyBrick wrote:
>>>>>> On 24/02/2011 14:00, Ken Wesson wrote:
>>>>>>> Windows text files are flat ASCII files (with CRLF line ends).
>>>>>>
>>>>>> Actually I find that, nowadays, lots of text files on Windows are
>>>>>> so-called 'ANSI' (mostly CP-1252)
>>>>>
>>>>> Same difference.
>>>>
>>>> Completely different char set.
>>>
>>> Funny that something so "completely different" intersects with ASCII in
>>> the entirety of ASCII's range (0-127). It just specifies what 128-255
>>> mean instead of leaving those values undefined. Unicode specifies what
>>> 128-65535 mean and still intersects with ASCII on 0-127.
>>
>> It is still a different char set.
>
> It's a "different char set" in the manner that adding a suit of clothes
> to a naked person results in a "different person", Arne.

It is a different char set in the manner that adding a
suit of clothes to a naked person results in it no
longer being a naked person.

Arne
Reply Arne 2/26/2011 9:24:20 PM

On 26-02-2011 05:29, Ken Wesson wrote:
> On Fri, 25 Feb 2011 10:26:27 -0500, Michael Wojcik wrote:
>> Ken Wesson wrote:
>>> On Thu, 24 Feb 2011 21:25:05 +0000, Martin Gregorie wrote:
>>>>>
>>>> You know, you sound exactly like a character who surfaced in a Y2K
>>>> newsgroup back in 1998/99. He refused to believe that any computers
>>>> apart from PCs were in use at the time.
>>>
>>> I doubt that. He may have correctly pointed out that the vast
>>> *majority* of computers were PCs at the time. (Now, laptops and
>>> smartphones may have the slight edge, or perhaps even server blades,
>>> now that typical servers are racks full of small computers instead of
>>> single big computers.)
>>
>> Embedded computers have a huge majority over all general-purpose
>> computers, by orders of magnitude
>
> I wasn't counting non-general-purpose computers. Only computers you can
> get an open-ended set of vari-purposed apps for.
>
>> The line between "smartphones" and other mobile phones is fuzzy
>
> It's pretty sharp if you use the criterion I just articulated above.

No.

You can get Java ME apps for a BlackBerry and for a cheap Nokia
phone.

One is considered a smartphone, the other is not.

>> but among computing devices that support at least some general-purpose
>> applications (as opposed to dedicated controllers), phones are far and
>> away in the majority, by number of CPUs.
>>
>> In other words, wrong again, Ken.
>
> Not at all. I said desktop PCs would have been in the majority over a
> decade ago, and that phones might have an edge now. I doubt they are "far
> and away" in the majority, though.
>
> Googling suggests 41 million iPhones and over 8 million more Android
> phones have been sold. Let's round this up to an even 50 million
> smartphones, total, absorbing the small numbers of true smartphones with
> open-ended sets of downloadable apps that are neither iPhones nor
> Androids.
>
> The same methods indicate the number of PCs in use (not just sold ever,
> but in use now) at over 1 billion.
>
> So it is likely that PCs are still outnumbering phones, perhaps by as
> much as 20 to 1.

Better googling skills would have led you to:

http://en.wikipedia.org/wiki/Mobile_phones

<quote>
In the twenty years from 1990 to 2010, worldwide mobile phone 
subscriptions grew from 12.4 million to over 4.6 billion
</quote>

Arne
Reply Arne 2/26/2011 9:27:20 PM

On 26-02-2011 05:45, Ken Wesson wrote:
> On Fri, 25 Feb 2011 14:37:07 +0000, Martin Gregorie wrote:
>> On Fri, 25 Feb 2011 06:26:43 +0100, Ken Wesson wrote:
>>> If by "very common" you mean used on one in ten thousand or fewer of
>>> their computers. For every single z/OS machine in corporate America
>>> there are probably a thousand blade servers and ten thousand office PCs
>>> and employer-provided laptops and God alone knows how many employee
>>> smartphones with plans and/or handsets paid for by their company.
>>
>> By that standard PCs, in which let's include desktops and laptops, are
>> also a tiny proportion of all computers once you count phones and
>> all the embedded computers in vehicles.
>
> I'm only counting machines you can add onto with an open-ended set of
> software applications.

Practically all mobile phones support Java ME.

Arne
Reply Arne 2/26/2011 9:28:13 PM

On 26-02-2011 05:55, Ken Wesson wrote:
> On Fri, 25 Feb 2011 09:45:01 -0500, Arne Vajhøj wrote:
>> On 25-02-2011 00:26, Ken Wesson wrote:
>>> On Thu, 24 Feb 2011 21:09:44 -0500, Arne Vajhøj wrote:
>>>> On 24-02-2011 13:42, Ken Wesson wrote:
>>>>> You're one to talk about provincialism. Who the hell uses these
>>>>> ancient museum pieces any more?
>>>>
>>>> Lots of places.
>>>>
>>>> Retail sector, public sector, financial sector
>>>
>>> If you're counting it that way, that's 3 places. Hardly "lots". :)
>>
>> I have news for you - the number of business entities in those 3 sectors
>> is a lot higher than 3.
>
> But he wasn't counting business entities; he was counting the sectors
> themselves.

There is not much point in counting sectors at arbitrary
granularity.

You could also argue that PCs are only used in 2 places
(work and private).

BTW, rather unusual to refer to yourself as "he".

>>> See other posts. Perhaps a collected few tens of thousands of computers
>>> using museum-worthy OSes like those versus a collected *billion* or
>>> more of machines running Windows, MacOS, iOS, Android, and Unix.
>>
>> There are also more flies than humans on earth.
>>
>> That does not make flies more important.
>
> Ah, so all of the thread-OP's users don't matter. They're mere flies,
> because they aren't filthy stinking rich.

But that is your claim.

You are claiming that the people using large shared computers
do not count.

We want to count people - you want to count computers.

>>>>> There is nothing at all prominent about those IBM dinosaurs. They may
>>>>> have been prominent 30 years ago, but not now.
>>>>
>>>> Both z/OS and i are widely used today.
>>>
>>> If by "widely used" you mean on one in ten thousand or fewer computers.
>>
>> But a lot more in revenue.
>
> That is not what "widely used" means. By your definition, Ferraris are
> more widely used than ordinary four-door sedans, for Christ's sake.

No.

Ferrari revenue is small. Mainframe revenue is big.

>>>>> Fine, then -- corporate America and home computers in America then.
>>>>
>>>> OK - neither z/OS or i are common on home computers.
>>>>
>>>> But they are very common in corporate America.
>>>
>>> If by "very common" you mean used on one in ten thousand or fewer of
>>> their computers. For every single z/OS machine in corporate America
>>> there are probably a thousand blade servers and ten thousand office PCs
>>> and employer-provided laptops and God alone knows how many employee
>>> smartphones with plans and/or handsets paid for by their company.
>>
>> And?
>>
>> If a company buys a mainframe for 20 M$ and 10000 PCs for 10 M$, then
>> it is 2/3 mainframe.
>
> That's not a useful way of looking at it when the topic is software
> compatibility. How large a fraction of machines the OP's software will
> run correctly on, out of the set people might try to run it on, is the
> metric that matters there.

If the programmer is developing software for money, then he cares
about how much money he can make not how many computers it will
run on.

>>>> If all z/OS systems disappeared over night then everything would break
>>>> down, because so many critical systems are running on them.
>>>
>>> A somewhat scary thought, but hardly relevant unless you're trying to
>>> stir up enough public alarm to foment a general movement to replace
>>> these legacy systems with more modern ones.
>>
>> It is relevant because the point is that most of the world's important
>> data is processed by mainframes.
>
> That's clearly not true.

It is what the various analyses say.

That is what

>> Sure they can be replaced. 10-20 years and 10-20 trillion dollars.
>
> You're joking, right? It might cost that much to replace them with more
> of the same, but to replace them with commodity hardware and operating
> systems will certainly cost a lot less, modulo the cost of porting
> software. (In practice it probably makes more sense to phase out their
> use by just not getting new ones, or even just by having new companies
> that enter those fields use modern systems and waiting for the older
> companies in the space to die off over time, because of that porting
> cost.)

Not joking.

Why do you think it has not happened?

Arne

Reply Arne 2/26/2011 9:37:09 PM

On 26-02-2011 06:15, Ken Wesson wrote:
> On Fri, 25 Feb 2011 14:58:27 +0000, Martin Gregorie wrote:
>> On Fri, 25 Feb 2011 06:38:32 +0100, Ken Wesson wrote:
>>> Other character sets mostly intersect in ASCII. Nearly all in any kind
>>> of widespread use intersect in using characters 10 and 13 as the
>>> potential- line-end characters. And "other record formats" are not
>>> relevant in a discussion of text files, as has been explained already.
>>
>> Bad argument
>
> Not at all.
>
>> a text file contains records. They are variable length
>> records with a 'newline' encoding as the delimiter.
>
> By that definition the concept of "record-based" vs. "not-record-based"
> becomes completely meaningless.
>
> But most of us use "records" to mean a structure that involves out-of-
> band boundaries of some sort. Linear text with inline line break etc.
> characters has only in-band boundaries and is much less structured than
> what a "record" typically implies.

A line is by definition a structure because there is something
that determines where it starts and where it ends.

Neither a count prefix nor the line delimiter is part of the
line itself.

Arne
Reply Arne 2/26/2011 9:44:43 PM

Ken Wesson <kwesson@gmail.com> writes:

> Your personal opinions of others are not the topic of this newsgroup. Do 
> you have anything Java-related to say?

Most of your recent rambling here has had nothing to do with Java, just
attacking other people personally and repeating yourself about your
weird ideas of what ASCII is or is not.
Look in the mirror, PLEASE!

-- 
Jukka Lahtinen
Reply Jukka 2/26/2011 10:13:16 PM

Ken Wesson <kwesson@gmail.com> writes:

> On Fri, 25 Feb 2011 12:34:43 -0500, Arne Vajhøj wrote:
>
>> On 25-02-2011 00:06, Ken Wesson wrote:
>>> On Thu, 24 Feb 2011 13:12:36 -0700, Jim Janney wrote:
>>>> Ken Wesson<kwesson@gmail.com>  writes:
>>>>> On Thu, 24 Feb 2011 09:18:06 -0500, Michael Wojcik wrote:
>>>>>
>>>>>> On the IBM i machines (formerly i Series, formerly System i,
>>>>>> formerly AS/400, successor to the System/3x), blah blah blah
>>>>>
>>>>> You're one to talk about provincialism. Who the hell uses these
>>>>> ancient museum pieces any more?
>>>>
>>>> Um, that would be me, or rather my employer's customers.
>>>
>>> Your employer may happen to be using such legacy systems, but I very
>>> much doubt that very many people deal with them in an IT capacity. Far,
>>> *far* fewer than deal with Unix, Windows, and Mac boxes in such a
>>> capacity.
>>>
>>> How many end-users interact indirectly with these systems is of course
>>> irrelevant.
>> 
>> Not really
>
> Yes, really.
>
> Let us recall the context in which this silly argument blew up, shall we?
>
> Someone asked for an efficient way to get at the last line of a text 
> file. *Several* people suggested seeking backwards from the end to find a 
> newline character. (Interestingly, only one of those people has since 
> been subjected to flamage for this suggestion. I wonder why?)
>
> You and some others pooh-poohed that suggestion because it might not work 
> properly on fewer than 0.01% (you've since admitted fewer than 0.001%) of 
> computers, with the implied grounds that any number at all above zero is 
> unacceptable.
>
> I've got news for you. That suggestion also won't work on any system 
> without a JVM.
>
> It won't work on a 2MB RAM 286 too cramped for the program to run on.
>
> It won't work on an Xbox 360 or any other system that won't run unsigned 
> binaries and whose signing authority won't sign this program.
>
> And so on.

Your argument works for Java programs that only work with files stored
on the system that they are running on, but this is another bad
assumption.  The programs I work on run under Windows JVMs but still
read and write files on the AS/400 (or whatever IBM is calling it this
month) file system.

Even dinosaurs know about networked file systems.

-- 
Jim Janney
Reply Jim 2/26/2011 11:35:09 PM

On Sat, 26 Feb 2011 14:07:49 -0500, Arne Vajhøj spammed:
> On 26-02-2011 04:43, Ken Wesson wrote:
>> On Fri, 25 Feb 2011 12:51:12 -0500, Arne Vajhøj wrote:
>>> Whether the system used LF delimiters or CR LF delimiters or a counted
>>> approach does not matter.
>>
>> Except, of course, that you don't read "a counted approach" as lines of
>> text; you read it as binary integers mixed with text strings. It is
>> not, physically, a text file.
> 
> It is a text file and can be used as a text file.

No, it isn't and it can't. There is additional information in the file 
that is lost if it is treated as text.

>> Yes. A line break in the middle of a line is utter nonsense, the
>> logical equivalent of an odd even number or an endpoint of a circle or
>> a corner of a disc.
> 
> It is actually very logical that something that is not considered a line
> break can be in the middle of a line.

That's irrelevant. We were discussing line breaks.

>>> [ASCII 10] is perfectly valid as content in the middle of a line on
>>> older MacOS systems
>>
>> Sophistry. Those just use ASCII 13 to mean the same thing.
> 
> Yes.
> 
> And in a counted prefix format both LF and CR are valid in lines.

You run aground on the fact that a counted prefix format uses a line end 
that is not representable in any character set, not even full Unicode. 
There is therefore no way to faithfully represent the file's contents in 
a String, say, and then copy it back out again, losslessly. You have to 
change the out-of-band, non-character line ends to *some* character, 
which may as well be ASCII 10, and then you have no way of distinguishing 
true ASCII 10 characters from these record boundaries. That data is lost 
in translation. The very fact that there *is* a translation for something 
to be lost *in* when you load the file as text *proves* that the file *is 
not a text file*.
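To make the count-prefix idea concrete, here is a Java sketch of a hypothetical record format: each record is an unsigned 16-bit big-endian length followed by that many bytes of UTF-8. This is a simplified illustration, not the actual on-disk layout of OpenVMS, IBM i or any other system, and the class and method names are made up. It shows both points being argued: an LF byte inside a record is ordinary content, and the record boundary itself is never stored as a character.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CountedRecords {
    // Hypothetical format: [u16 big-endian length][length bytes of UTF-8], repeated.
    static byte[] write(List<String> records) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (String r : records) {
            byte[] b = r.getBytes(StandardCharsets.UTF_8);
            if (b.length > 0xFFFF) throw new IllegalArgumentException("record longer than 65535 bytes");
            out.writeShort(b.length); // the boundary is this count; no delimiter byte is written
            out.write(b);
        }
        out.flush();
        return bytes.toByteArray();
    }

    static List<String> read(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        List<String> records = new ArrayList<>();
        while (true) {
            int len;
            try {
                len = in.readUnsignedShort();
            } catch (EOFException e) {
                break; // end of data (a truncated prefix is also treated as the end here)
            }
            byte[] b = new byte[len];
            in.readFully(b);
            records.add(new String(b, StandardCharsets.UTF_8));
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        List<String> records = Arrays.asList("foo\nbar", "baz");
        List<String> back = read(write(records));
        System.out.println(back.size() + " records");   // 2 records
        System.out.println(back.get(0).contains("\n"));  // true: LF is just data here
    }
}
```

Reading records this way is lossless; the trouble only starts when the records are flattened into a single stream of characters.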

>>>> So there is no "advantage" here. What you are actually describing is
>>>> a "list-of-strings" file, not a text file
>>>
>>> A text file is a list of strings.
>>
>> No, a text file is a single string.
> 
> No.

Yes.

Try to read one of your weird files into vi, say, and then output it 
again, and it will not be lossless. Most likely 0x0A characters in the 
original would be replaced with record boundaries in the output if you 
read the file in vi and then saved out a copy. But ultimately the problem 
is that you have N+1 kinds of things there -- N characters plus record 
boundaries -- and even a full Unicode editor is representing them 
internally as a buffer each element of which is one of only N characters. 
Something has to give.

>> I said "the formats commonly used to store, e.g., C source files". No
>> "count prefix line format" is *commonly* used to store C source files
>> -- 99.99% or more of C source files residing on hard disks in this
>> world are undoubtedly in fact LF-delimited, and most of the rest CRLF
>> delimited (Windows wackiness strikes again).
> 
> Your assumption that the amount of C code on non-*nix platforms is less
> than 0.01% is rather amusing.

Why? It's certainly true; Windows systems, smartphones, and embedded 
systems are more numerous but very few of them tend to contain C source 
files (even if most have some compiled C binaries on them), and non-Unix 
machines that aren't any of those either are far less numerous. Whereas 
virtually every Unix machine has C source files in great profusion stored 
on it somewhere.

> Bark bark bark bark bark bark bark bark bark bark bark bark!

Wow, that's one hell of a sore throat, Arne; you should probably get that 
looked at.
Reply Ken 2/27/2011 1:24:08 PM

On Sat, 26 Feb 2011 14:12:44 -0500, Arne Vajhøj wrote:

> On 26-02-2011 08:36, Peter Duniho wrote:
>> On 2/26/11 9:27 PM, Arved Sandstrom wrote:
>>> The usefulness of the term "text file" for me is that it describes a
>>> file that can be opened, viewed and used by every application, tool
>>> and utility, on every OS and platform, that purports to be a "text
>>> editor".
>>
>> Then I think you need to define "text file" more narrowly than what is
>> actually out there. In this thread alone, there have been mentioned a
>> number of true text file formats that are simply not readable in your
>> average or even above-average text editor found on mainstream OSs.
> 
> They are edited fine by any text editor on those systems.
> 
> This includes cross platform editors that are also available on *nix and
> Windows.

No. Those editors, at minimum, will strip some information from the file, 
even if you just open it and then save out a copy. If the original had my 
hypothetical hidden message "the attack begins at midnight" encoded in a 
pattern of actual 0x0A characters and record boundaries, the output will 
not, and will generally have converted all 0x0A characters in the 
original into record boundaries specifically, in that

foo<0x0A>bar<boundary>baz

would probably have ended up in the vi buffer (or emacs buffer, or 
whatever) as

foo<0x0A>bar<0x0A>baz

and then output would be converted to

foo<boundary>bar<boundary>baz.
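The load/save round trip just described can be modeled in a few lines of Java. The model is an assumption for illustration only: a hypothetical line-oriented editor that turns every record boundary into LF on load and every LF back into a record boundary on save, with records represented simply as a List<String>. Any record that already contained an LF comes back split, so two records become three; records without embedded LFs survive unchanged.

```java
import java.util.Arrays;
import java.util.List;

public class RoundTrip {
    // Hypothetical load step: each record boundary becomes an LF in the editor buffer.
    static String load(List<String> records) {
        return String.join("\n", records);
    }

    // Hypothetical save step: every LF in the buffer becomes a record boundary.
    static List<String> save(String buffer) {
        return Arrays.asList(buffer.split("\n", -1));
    }

    public static void main(String[] args) {
        List<String> original = Arrays.asList("foo\nbar", "baz");
        List<String> saved = save(load(original));
        System.out.println(original.size() + " records before, " + saved.size() + " after");
        System.out.println(saved); // [foo, bar, baz]
    }
}
```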

If a text editor like vi cannot losslessly load and re-save the file then 
it is not a text file by any sane definition, including particularly 
Arved's pretty darn good definition.

> If the files are FTP'ed to a Unix box in text mode they can be edited
> with any Unix text editor.
> 
> If their location is mounted as a Samba drive, then they can be edited
> from Windows with any Windows text editor.

And "the attack begins at midnight" will not survive this editing, even 
if all you do is load and immediately re-save the file without making any 
intentional changes.
Reply Ken 2/27/2011 1:28:34 PM

On Sat, 26 Feb 2011 14:19:38 -0500, Arne Vajhøj wrote:

> On 26-02-2011 04:45, Ken Wesson wrote:
>> I don't see an argument of any kind in your post. Forget to include
>> one?
> 
> The quotation

was all that there was, besides attributions and your signoff "Arne". 
Your post contained no meaningful original text.
0
Reply Ken 2/27/2011 1:29:29 PM

On Sat, 26 Feb 2011 16:24:20 -0500, Arne Vajhøj wrote:

> On 26-02-2011 05:09, Ken Wesson wrote:
>> It's a "different char set" in the manner that adding a suit of clothes
>> to a naked person results in a "different person", Arne.
> 
> It is a different char set in the manner that adding a suit of clothes
> to a naked person results in it no longer being a naked person.

Wrong.
0
Reply Ken 2/27/2011 1:30:03 PM

On Sat, 26 Feb 2011 14:36:03 -0500, Arne Vajhøj wrote:

> On 26-02-2011 05:13, Ken Wesson wrote:
>> On Fri, 25 Feb 2011 12:30:23 -0500, Arne Vajhøj wrote:
>>> On 25-02-2011 00:00, Ken Wesson wrote:
>>>> On Thu, 24 Feb 2011 20:58:24 -0500, Arne Vajhøj wrote:
>>>>> On 24-02-2011 19:12, Lew wrote:
>>>>>> On 02/24/2011 09:49 AM, RedGrittyBrick wrote:
>>>>>>> On 24/02/2011 14:00, Ken Wesson wrote:
>>>>>>>> Windows text files are flat ASCII files (with CRLF line ends).
>>>>>>>
>>>>>>> Actually I find that, nowadays, lots of text files on Windows are
>>>>>>> so-called
>>>>>>> 'ANSI' (mostly CP-1252) or 'Unicode' (usually meaning UTF-16 with
>>>>>>> BOM).
>>>>>>>
>>>>>>> Even on my ancient XP boxes, Notepad offers only ANSI, Unicode,
>>>>>>> Unicode big-endian and UTF-8. Wordpad offers RTF, Text-Document
>>>>>>> (turns out to be CP-1252), Text-Document DOS format (turns out to
>>>>>>> be CP-850) and Unicode. No
>>>>>>> ASCII.
>>>>>>
>>>>>> Windows hasn't used ASCII in decades.
>>>>>
>>>>> I don't think it ever has.
>>>>
>>>> Funny then that bog-standard ASCII files seem to read and write just
>>>> fine in Notepad on the occasions that I use Windows computers.
>>>
>>> That just means that it uses something ASCII compatible - not that it
>>> uses ASCII.
>>
>> Sophistry.
> 
> Simple fact.

No, sophistry. You can't use a superset of ASCII without using ASCII, any 
more than you can take a bath in soapy water without using water.

>>> And you can easily verify that it indeed supports characters not part
>>> of ASCII.
>>
>> Never said it didn't.
> 
> Yes - you did.

No - I didn't.

> You said they used ASCII.

I didn't say they used *only* ASCII.

> If they did that then they would not support characters not ASCII.

Don't be silly. If I write software that uses TCP/IP does that mean it 
does not support any communication protocols other than TCP/IP? What if 
it directly sends commands to the parallel printer port /dev/lpt0 as well 
as using TCP/IP? Uh-oh! According to Arne such software is impossible! And yet I note 
that there are web browsers that can both surf and print. Some can even 
do them at the same time. :)

>>>> All of those seem to be ASCII plus another up to 128 characters, or
>>>> in the case of UTF-16, another up to 65408 characters.
>>>>
>>>> Saying that a 7-bit-clean file interpreted in one of those is not
>>>> ASCII is like saying that humans are not mammals.
>>>
>>> And?
>>>
>>> No one is saying that such a file is not ASCII.
>>
>> You were.
> 
> No.

Yes, you were! You even did it again, above.

I think I'm getting close to the point of plonking you, Arne. When you're 
not barking at me like some rabid wolf you're mainly engaging in shallow 
forms of intellectual dishonesty such as the above.

>>> PS: UTF-16 is *not* ASCII compatible.
>>
>> It is if you strip the high bytes and not just the 7th bits.
> 
> Which means not compatible.

Nonsense.
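A small Java illustration of the UTF-16 point (class and method names are made up for the example). The same pure-ASCII string gives different bytes in US-ASCII and in UTF-16BE, so a byte-oriented ASCII reader cannot consume UTF-16 as-is. Dropping the high (zero) byte of each 16-bit unit does recover the ASCII bytes, but only because every character here is below U+0080. UTF-16BE is used so that no byte-order mark is emitted.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf16VsAscii {
    /** Keeps only the low byte of each big-endian UTF-16 code unit. */
    static byte[] stripHighBytes(byte[] utf16be) {
        byte[] out = new byte[utf16be.length / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = utf16be[2 * i + 1]; // odd index = low byte in big-endian order
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "Hi";
        byte[] ascii = s.getBytes(StandardCharsets.US_ASCII);
        byte[] utf16 = s.getBytes(StandardCharsets.UTF_16BE);
        System.out.println(Arrays.toString(ascii));                        // [72, 105]
        System.out.println(Arrays.toString(utf16));                        // [0, 72, 0, 105]
        System.out.println(Arrays.equals(ascii, utf16));                   // false
        System.out.println(Arrays.equals(ascii, stripHighBytes(utf16)));   // true
    }
}
```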

0
Reply Ken 2/27/2011 1:35:11 PM

On Sat, 26 Feb 2011 14:29:08 -0500, Arne Vajhøj yipped:

> On 26-02-2011 05:17, Ken Wesson wrote:
>> On Fri, 25 Feb 2011 09:36:29 -0500, Arne Vajhøj wrote:
>>> The PC/mainframe ratio is probably like 100000:1.
>>
>> Hence why I said *at least*. I was being conservative in my estimates
>> -- as generous to *your* case as possible. And still I was demolishing
>> it.
> 
> Not at all.

I was demolishing it. You can scream "not at all" until you are blue in 
the face Arne but the universe will not magically change the facts to 
oblige you in that wish.

>>> But the relevance is not that big. Because mainframes happen to be a
>>> lot more expensive than PCs.
>>
>> One computer is still one computer, no matter how expensive it is. It's
>> the price tag whose relevance is not that big.
> 
> I have news for you

We've been over this time and again, Arne. Suppose I wrote an iPhone app. 
Suppose I followed your theory of who to design it for. So I made it run 
perfectly on supercomputers, mainframes, and other big, expensive, mostly 
obsolete behemoths at the expense of it working on iPhones. Would I still 
be in business a month later?

Now suppose instead that I followed my theory. So I decide I don't care 
HOW much IBM big iron costs, I'm going to worry only about making the app 
work as well as possible on iPhones, the platform the users will actually 
be running it on. And a month later I have oodles of happy customers and 
I'm laughing all the way to the bank.

Starting to figure it out yet, Arne, where you went wrong?

>>> Then you won't have any users using ASCII.
>>
>> Funnily enough, all of them can cope just fine with ASCII text files. I
>> wonder how that can be, Arne, unless of course you're wrong yet again.
> 
> It is called "backwards compatibility".

That's very fascinating, Arne, but it does not alter the fact that they 
can cope just fine with ASCII text files. :)

> Bark bark bark bark!
> 
> Bark bark bark bark bark bark bark bark bark!
> 
> Bark bark bark bark bark bark bark bark bark bark!

You know, cljp wouldn't be half bad as a place to hang out if it weren't 
for that annoying yapping terrier. I don't suppose you guys could be 
convinced to dump it at the pound and spring for a new mascot? Maybe a 
little Persian cat that will mostly sit quietly in a corner looking cute, 
say, or even a well-behaved standard poodle or something.
0
Reply Ken 2/27/2011 1:40:59 PM

On Sat, 26 Feb 2011 14:43:50 -0500, Arne Vajhøj wrote:

> On 26-02-2011 05:24, Ken Wesson wrote:
>> On Fri, 25 Feb 2011 12:34:43 -0500, Arne Vajhøj wrote:
>>> On 25-02-2011 00:06, Ken Wesson wrote:
>>>> How many end-users interact indirectly with these systems is of
>>>> course irrelevant.
>>>
>>> Not really
>>
>> Yes, really.
>>
>> Let us recall the context in which this silly argument blew up, shall
>> we?
>>
>> Someone asked for an efficient way to get at the last line of a text
>> file. *Several* people suggested seeking backwards from the end to find
>> a newline character. (Interestingly, only one of those people has since
>> been subjected to flamage for this suggestion. I wonder why?)
> 
> Actually no one was flamed

Wow. This is one for the history files, folks. First "mistakes were made" 
and later "we had to destroy the village to save it". And now we have "no 
one was flamed" right in the middle of a raging inferno. Fucking 
unbelievable.

> You bark bark bark bark!
> - you bark bark bark bark bark bark bark bark bark bark bark
> bark bark bark bark bark bark bark bark bark bark bark bark bark bark
> bark!

Someone take that goddamn mutt for a walk already before I really lose 
it. :P

>> You and some others pooh-poohed that suggestion because it might not
>> work properly on fewer than 0.01% (you've since admitted fewer than
>> 0.001%) of computers, with the implied grounds that any number at all
>> above zero is unacceptable.
> 
> I don't even think anyone claimed it was unacceptable.

Claimed outright? Perhaps not. But you sure did imply it with your 
repeated attacks on me for daring to suggest a method for efficiently 
reading the last line of a Unix log file that might break on somewhere 
around 0.001% of all general-purpose computers while working perfectly on 
every single POSIX-compliant one in existence.

> The point was that it was a solution that was not guaranteed to work on
> all platforms.

Actually, it is, if one construes "text file" correctly. It's just that 
there are apparently some platforms that have no text files, and 
therefore where code that is specifically designed for operating on text 
files is useless (not even broken, just inapplicable, useless for much 
the reason a Win32 PE executable debugging tool is useless on a Linux 
box).

And it's extraordinarily unlikely any of the OP's intended user base 
would ever have any problem with it, in particular.
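For reference, a minimal sketch of the seek-backwards technique several posters suggested, using java.io.RandomAccessFile. It assumes lines end in LF (0x0A), optionally preceded by CR, and that the file uses an ASCII-compatible encoding such as UTF-8, where the byte 0x0A can only mean LF. It does not handle CR-only line ends, UTF-16, EBCDIC, or record-oriented file systems, which is exactly the portability caveat being argued over here. The class name LastLine is hypothetical, and reading one byte per seek is slow for a long final line; a production version would probably scan backwards in blocks.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class LastLine {
    /**
     * Returns the last line of a file without reading the lines before it,
     * or "" for an empty file. One trailing LF or CRLF is ignored, so a file
     * ending "a\nb\n" yields "b". Assumes LF-terminated UTF-8 (or ASCII) text.
     */
    public static String readLastLine(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long end = raf.length();
            // Drop a trailing LF, then a trailing CR (handles LF and CRLF endings).
            if (end > 0 && byteAt(raf, end - 1) == '\n') end--;
            if (end > 0 && byteAt(raf, end - 1) == '\r') end--;
            // Walk backwards to just after the LF that ends the previous line.
            long start = end;
            while (start > 0 && byteAt(raf, start - 1) != '\n') {
                start--;
            }
            // Assumes the final line fits in an int-sized byte array.
            byte[] buf = new byte[(int) (end - start)];
            raf.seek(start);
            raf.readFully(buf);
            return new String(buf, StandardCharsets.UTF_8);
        }
    }

    /** Reads the single byte at pos, as a value 0..255. */
    private static int byteAt(RandomAccessFile raf, long pos) throws IOException {
        raf.seek(pos);
        return raf.read();
    }
}
```

For example, LastLine.readLastLine("/var/log/messages") (path purely illustrative) would return the final log entry while touching only the bytes of that entry plus its terminator.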

>> NO program will work on *all computers*, Arne. So that goal is
>> unattainable.
> 
> That is not so relevant.

Sure it is, unless you're saying that this is also not so relevant:

> The point was that it was a solution that was not guaranteed to work on
> all platforms.

So, which is it, Arne? Is it relevant, or isn't it?

> That is not an excuse for writing non portable Java code without even
> noting the portability problem.

How is it non portable? Just because it's for working with text files and 
therefore there simply isn't anything for it to do on systems that don't 
have any text files doesn't make it non-portable.
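For comparison, a sketch of the "portable" alternative Arne is arguing for: read every line with BufferedReader.readLine(), which accepts LF, CR, or CRLF as terminators, and keep the last one. This costs a full pass over the file, which is exactly what the OP wanted to avoid. Whether it gives the right answer on record-oriented file systems depends on how that platform's JVM presents such files to a Reader; that is an assumption here, not something established in this thread. The class name is hypothetical, and Files.newBufferedReader throws on bytes that are malformed in the given charset.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;

public class LastLineByScanning {
    /**
     * Returns the last line of the file, or null if it has no lines.
     * Reads the whole file; readLine() strips LF, CR, or CRLF terminators.
     */
    public static String readLastLine(String path, Charset cs) throws IOException {
        String last = null;
        try (BufferedReader reader = Files.newBufferedReader(Paths.get(path), cs)) {
            String line;
            while ((line = reader.readLine()) != null) {
                last = line;
            }
        }
        return last;
    }
}
```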

>> Hardly any users, if any at all, of the OP's program would be using it
>> on a weird machine like those 0.001% you say *a subset* of which don't
>> store text normally. (Apparently having no true text files at all!)
> 
> They have text files.

Apparently they don't. They have "text files" that store additional data 
and may lose information if loaded into vi and saved back out again, so 
are not actually text files but a type of binary file being used largely 
as a repository for text.

It is not a portability error if a tool designed for working on text 
files does goofy things when you use it on a binary file. The Java 
program would be as portable as vi is, if not more so -- but neither of 
them could be used on these particular files losslessly, that's all.

> Just different physical storage than those computers you have experience
> with.

No, as has been proven by the "the attack begins at midnight" thought-
experiment.

>> I'd expect those few users will expect their quirky computers to not
>> accept software that works nearly everywhere else, and accept that, or
>> else they would have gotten a less quirky computer instead.
> 
> They will expect that Java code is written in a portable way.
> 
> Portability was one of the main design goals of Java.

They will not expect that a program explicitly documented as being for 
working only with text files will work on binary files.

>> And in actual fact the OP's user base almost certainly consists of Unix
>> sysadmins who want to view the last entry or last few entries of a
>> ginormous log file without difficulty, in which case the OP could
>> probably get away even with hardcoding \u000A as the line-end character
>> (though I wouldn't recommend they actually do so).
>>
>> *That* is what's relevant here.
> 
> No.

Yes.

> What is relevant is that suggesting such without any notes about
> portability, based on guesses about the user's context, is pretty bad
> programming.

According to Arne here posting code for working with text files that may 
break when used on certain binary files is "pretty bad programming". 
Fine, I accept the charge -- but then I'm in distinguished company, 
including all of the developers who worked on vi. You might be familiar 
with one of their names: Bill Joy...
0
Reply kwesson (107) 2/27/2011 1:53:08 PM

On Sat, 26 Feb 2011 16:35:09 -0700, Jim Janney wrote:

> Ken Wesson <kwesson@gmail.com> writes:
> 
>> Let us recall the context in which this silly argument blew up, shall
>> we?
>>
>> Someone asked for an efficient way to get at the last line of a text
>> file. *Several* people suggested seeking backwards from the end to find
>> a newline character. (Interestingly, only one of those people has since
>> been subjected to flamage for this suggestion. I wonder why?)
>>
>> You and some others pooh-poohed that suggestion because it might not
>> work properly on fewer than 0.01% (you've since admitted fewer than
>> 0.001%) of computers, with the implied grounds that any number at all
>> above zero is unacceptable.
>>
>> I've got news for you. That suggestion also won't work on any system
>> without a JVM.
>>
>> It won't work on a 2MB RAM 286 too cramped for the program to run on.
>>
>> It won't work on an Xbox 360 or any other system that won't run
>> unsigned binaries and whose signing authority won't sign this program.
>>
>> And so on.
> 
> Your argument works for Java programs that only work with files stored
> on the system that they are running on, but this is another bad
> assumption.

I don't think so.

> The programs I work on run under Windows JVMs but still read and write
> files on the AS/400 (or whatever IBM is calling it this month) file
> system.

Perhaps, but if the OP's code is for working with Unix log files -- well, 
who the heck is likely to store Unix log files on an AS/400 box? The Unix 
machines networked with the hypothetical AS/400 box will no doubt store 
their log files on their own hard drives, not the AS/400 box's.
0
Reply kwesson (107) 2/27/2011 1:55:22 PM

On Sat, 26 Feb 2011 16:27:20 -0500, Arne Vajhøj wrote:

> On 26-02-2011 05:29, Ken Wesson wrote:
>> On Fri, 25 Feb 2011 10:26:27 -0500, Michael Wojcik wrote:
>>> The line between "smartphones" and other mobile phones is fuzzy
>>
>> It's pretty sharp if you use the criterion I just articulated above.
> 
> No.

Yes. The criterion I articulated was exact. A phone either satisfies it 
or does not, and there are no ambiguous cases; there is no "gray area". 
Therefore it's pretty sharp if you use that criterion.

>> Not at all. I said desktop PCs would have been in the majority over a
>> decade ago, and that phones might have an edge now. I doubt they are
>> "far and away" in the majority, though.
>>
>> Googling suggests 41 million iPhones and over 8 million Android
>> phones have been sold. Let's round this up to an even 50 million
>> smartphones, total, absorbing the small numbers of true smartphones
>> with open-ended sets of downloadable apps that are neither iPhones nor
>> Androids.
>>
>> The same methods indicate the number of PCs in use (not just sold ever,
>> but in use now) at over 1 billion.
>>
>> So it is likely that PCs are still outnumbering phones, perhaps by as
>> much as 20 to 1.
> 
> Better googling skills bark bark bark bark bark!
> 
> Bark bark bark bark bark bark bark bark bark!

If you jump up at me, I will take action to defend myself, and I outmass 
all terriers by *at least* a factor of 20 to 1, so you *will* get the raw 
end of it!

> <quote>
> In the twenty years from 1990 to 2010, worldwide mobile phone
> subscriptions grew from 12.4 million to over 4.6 billion </quote>

Fascinating. But we weren't counting all cell phones. We were only 
counting phones with app stores and the like.
0
Reply kwesson (107) 2/27/2011 1:59:21 PM

On Sun, 27 Feb 2011 00:13:16 +0200, Jukka Lahtinen wrote:

> Ken Wesson <kwesson@gmail.com> writes:
> 
>> Your personal opinions of others are not the topic of this newsgroup.
>> Do you have anything Java-related to say?
> 
> Most of your recent rambling here has had nothing to do with Java

It's had as much to do with Java as what it's been replying to. When 
attacked unfairly, I respond in my own defense and debunk the claims 
about me made by my attacker. If people don't like such off-topic posts 
by me appearing here then they should not make posts that personally 
attack me; it is that simple. In fact, each such post *not* made reduces 
the off-topic posts here by two -- the attacker's post *and* my followup. 
More, given the tendency people seem to have to attack that followup.

> just attacking other people personally

What?! No, I am just calling them as I see them. Arne is the principal 
villain here with respect to "attacking other people personally". Isn't 
that obvious even from a quick skim over some of this thread?

> and repeating yourself about your weird ideas of what ASCII is or is
> not.

So, "ASCII is a subset of ISO-8859-1" is a "weird idea"? And "a file 
format with the property that a file in that format may lose information 
if read into a java.lang.String and written back out again, or read into 
vi or emacs and written back out again, is not a text file format" is a 
"weird idea about ASCII"??

> Look in the mirror, PLEASE!

Tell that to Arne. And yourself.
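The criterion stated earlier in this post (a file loses information when read into a java.lang.String and written back out) can be checked mechanically for a given byte sequence and charset. A minimal sketch, assuming strict decoding: malformed or unmappable input is reported rather than silently replaced, and the re-encoded bytes are compared with the original. The class and method names are illustrative only. It tests charset round-tripping alone and says nothing about record boundaries stored outside the byte stream.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RoundTripCheck {
    /** True if data decodes strictly in cs and re-encodes to identical bytes. */
    static boolean survivesStringRoundTrip(byte[] data, Charset cs) {
        try {
            String text = cs.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data))
                    .toString();
            return Arrays.equals(data, text.getBytes(cs));
        } catch (CharacterCodingException e) {
            return false; // not valid text in this charset at all
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00e9\r\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(survivesStringRoundTrip(utf8, StandardCharsets.UTF_8));    // true
        System.out.println(survivesStringRoundTrip(utf8, StandardCharsets.US_ASCII)); // false
    }
}
```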
0
Reply kwesson (107) 2/27/2011 2:05:07 PM

On Sat, 26 Feb 2011 14:33:40 -0500, Arne Vajhøj growled:

> On 26-02-2011 05:43, Ken Wesson wrote:
>> On Fri, 25 Feb 2011 12:33:16 -0500, Arne Vajhøj wrote:
>>> On 25-02-2011 00:19, Ken Wesson wrote:
>>>> On Thu, 24 Feb 2011 17:11:02 -0500, Michael Wojcik wrote:
>>>>> And the majority of business transactions
>>>>
>>>> have no bearing on this discussion, which has to do with the majority
>>>> of *computers* and, secondarily, what will be encountered routinely
>>>> by the majority of *IT workers*.
>>>
>>> Well the topic was
>>
>> The topic was as I stated above.
> 
> No.

Yes.

> You bark bark!
>
> Bark bark bark bark bark bark bark bark
> bark bark bark bark bark bark bark bark!
>
> Bark bark bark bark bark!

Perhaps some lozenges?

>>> And since somebody is willing to pay a lot more for a mainframe
>>> running an entire bank than for somebody to be able to read email,
>>> then counting computers does not really reflect
>>
>> The original debate arose from your worry that if the thread's OP
>> counted backward from the end of a file to the final newline character
>> in it, this would break on a handful of oddball systems.
> 
> How does your bark bark bark!

OK, I think we're done here.

0
Reply Ken 2/27/2011 2:07:36 PM

On 27-02-2011 08:59, Ken Wesson wrote:
> On Sat, 26 Feb 2011 16:27:20 -0500, Arne Vajhøj wrote:
>> <quote>
>> In the twenty years from 1990 to 2010, worldwide mobile phone
>> subscriptions grew from 12.4 million to over 4.6 billion</quote>
>
> Fascinating. But we weren't counting all cell phones. We were only
> counting phones with app stores and the like.

Practically all of these phones can run Java ME apps.

Arne

0
Reply UTF 2/27/2011 2:08:04 PM

On Sat, 26 Feb 2011 16:28:13 -0500, Arne Vajhøj wrote:

> On 26-02-2011 05:45, Ken Wesson wrote:
>> On Fri, 25 Feb 2011 14:37:07 +0000, Martin Gregorie wrote:
>>> On Fri, 25 Feb 2011 06:26:43 +0100, Ken Wesson wrote:
>>>> If by "very common" you mean used on one in ten thousand or fewer of
>>>> their computers. For every single z/OS machine in corporate America
>>>> there are probably a thousand blade servers and ten thousand office
>>>> PCs and employer-provided laptops and God alone knows how many
>>>> employee smartphones with plans and/or handsets paid for by their
>>>> company.
>>>
>>> By that standard PCs, in which let's include desktops and laptops, are
>>> also a tiny proportion of all computers once you count phones
>>> and all the embedded computers in vehicles.
>>
>> I'm only counting machines you can add onto with an open-ended set of
>> software applications.
> 
> Practically all mobile phones

Don't be ridiculous. There are exactly two app stores out there so far 
for phones: the Android Market and the Apple App Store.
0
Reply Ken 2/27/2011 2:08:55 PM

On 2/27/11 9:59 PM, Ken Wesson wrote:
> [...]
>> Better googling skills bark bark bark bark bark!
>>
>> Bark bark bark bark bark bark bark bark bark!
>
> If you jump up at me, I will take action to defend myself, and I outmass
> all terriers by *at least* a factor of 20 to 1, so you *will* get the raw
> end of it!

20 to 1?  _All_ terriers?  Well, let's see…a small Airedale runs around 
50 lbs, which puts you at around half a ton.

So, is being too fat to get out of your bed the reason why you act the 
way you do?

I see from your re-writing of Arne's posts (quite ironic coming from 
someone whining about someone else doing nothing more than selectively 
removing words from quotes of your own posts) that while your weight may 
exceed even the largest terrier by a substantial amount (no, 
seriously…you really should see a doctor about that), your maturity 
exceeds not even the most child-like one.

Pete
0
Reply Peter 2/27/2011 2:10:04 PM

On 27-02-2011 08:55, Ken Wesson wrote:
> On Sat, 26 Feb 2011 16:35:09 -0700, Jim Janney wrote:
>> The programs I work on run under Windows JVMs but still read and write
>> files on the AS/400 (or whatever IBM is calling it this month) file
>> system.
>
> Perhaps, but if the OP's code is for working with Unix log files -- well,
> who the heck is likely to store Unix log files on an AS/400 box? The Unix
> machines networked with the hypothetical AS/400 box will no doubt store
> their log files on their own hard drives, not the AS/400 box's.

I don't think that it is good to give advice on how
to read the last line from a text file in Java based
on an assumption that it must be a Unix log file when the
OP did not indicate so.

Arne
0
Reply UTF 2/27/2011 2:13:05 PM

On 27-02-2011 09:08, Ken Wesson wrote:
> On Sat, 26 Feb 2011 16:28:13 -0500, Arne Vajhøj wrote:
>
>> On 26-02-2011 05:45, Ken Wesson wrote:
>>> On Fri, 25 Feb 2011 14:37:07 +0000, Martin Gregorie wrote:
>>>> On Fri, 25 Feb 2011 06:26:43 +0100, Ken Wesson wrote:
>>>>> If by "very common" you mean used on one in ten thousand or fewer of
>>>>> their computers. For every single z/OS machine in corporate America
>>>>> there are probably a thousand blade servers and ten thousand office
>>>>> PCs and employer-provided laptops and God alone knows how many
>>>>> employee smartphones with plans and/or handsets paid for by their
>>>>> company.
>>>>
>>>> By that standard PCs, in which let's include desktops and laptops, are
>>>> also a tiny proportion of all computers once you count phones
>>>> and all the embedded computers in vehicles.
>>>
>>> I'm only counting machines you can add onto with an open-ended set of
>>> software applications.
>>
>> Practically all mobile phones
>
> Don't be ridiculous. There are exactly two app stores out there so far
> for phones: the Android Market and the Apple App Store.

Java ME apps could be put on mobile phones a long time
before app stores were invented.

And two app stores? Where have you been living????

Check out:
 
http://en.wikipedia.org/wiki/List_of_digital_distribution_platforms_for_mobile_devices

Arne

0
Reply arne6 (9487) 2/27/2011 2:17:31 PM

On 27-02-2011 09:10, Peter Duniho wrote:
> On 2/27/11 9:59 PM, Ken Wesson wrote:
>> [...]
>>> Better googling skills bark bark bark bark bark!
>>>
>>> Bark bark bark bark bark bark bark bark bark!
>>
>> If you jump up at me, I will take action to defend myself, and I outmass
>> all terriers by *at least* a factor of 20 to 1, so you *will* get the raw
>> end of it!
>
> 20 to 1? _All_ terriers? Well, let's see…a small Airedale runs around 50
> lbs, which puts you at around half a ton.

Even a small terrier at 20 pounds puts him at 400 pounds, which would
cause some health concerns.

Arne

0
Reply arne6 (9487) 2/27/2011 2:19:37 PM

On Sat, 26 Feb 2011 16:37:09 -0500, Arne Vajhøj wrote:

> On 26-02-2011 05:55, Ken Wesson wrote:
>> On Fri, 25 Feb 2011 09:45:01 -0500, Arne Vajhøj wrote:
>>> On 25-02-2011 00:26, Ken Wesson wrote:
>>>> On Thu, 24 Feb 2011 21:09:44 -0500, Arne Vajhøj wrote:
>>>>> On 24-02-2011 13:42, Ken Wesson wrote:
>>>>>> You're one to talk about provincialism. Who the hell uses these
>>>>>> ancient museum pieces any more?
>>>>>
>>>>> Lots of places.
>>>>>
>>>>> Retail sector, public sector, financial sector
>>>>
>>>> If you're counting it that way, that's 3 places. Hardly "lots". :)
>>>
>>> I have news for you - the number of business entities in those 3
>>> sectors are a lot higher than 3.
>>
>> But he wasn't counting business entities; he was counting the sectors
>> themselves.
> 
> There is not much point in counting sectors at arbitrary granularity.

Tell that to whoever wrote

>>>>> Lots of places.
>>>>>
>>>>> Retail sector, public sector, financial sector

(Hey, according to the attribution lines, that person was you!)

>>>> See other posts. Perhaps a collected few tens of thousands of
>>>> computers using museum-worthy OSes like those versus a collected
>>>> *billion* or more of machines running Windows, MacOS, iOS, Android,
>>>> and Unix.
>>>
>>> There are also more flies than humans on earth.
>>>
>>> That does not make flies more important.
>>
>> Ah, so all of the thread-OP's users don't matter. They're mere flies,
>> because they aren't filthy stinking rich.
> 
> But that is your claim.

No, it is yours. You were the one who said only rich people are important 
and the rest are as flies to their humans.

> You are claiming that the people using large shared computers do not
> count.

Actually, I am claiming that the people deploying the OP's app are the 
ones that count.

>>>>>> There is nothing at all prominent about those IBM dinosaurs. They
>>>>>> may have been prominent 30 years ago, but not now.
>>>>>
>>>>> Both z/OS and i are widely used today.
>>>>
>>>> If by "widely used" you mean on one in ten thousand or fewer
>>>> computers.
>>>
>>> But a lot more in revenue.
>>
>> That is not what "widely used" means. By your definition, Ferraris are
>> more widely used than ordinary four-door sedans, for Christ's sake.
> 
> No.

Yes.

But here in the real world widely used refers purely to frequency, 
relative to something. A model is not "widely used" if another is 10,000 
times more frequent, for instance.

>>> If a company buys a mainframe for 20 M$ and 10000 PCs for 10 M$, then
>>> it is 2/3 mainframe.
>>
>> That's not a useful way of looking at it when the topic is software
>> compatibility. How large a fraction of machines the OP's software will
>> run correctly on, out of the set people might try to run it on, is the
>> metric that matters there.
> 
> If the programmer is developing software for money, then he cares about
> how much money he can make not how many computers it will run on.

The price tags of the computers are irrelevant to that, too, Arne.

>>> It is relevant because the point is that most of the world's important
>>> data are processed by mainframes.
>>
>> That's clearly not true.
> 
> It is what the various analyses say.

Flawed ones, then. First of all, how exactly are they defining 
"important" with respect to data? And are they measuring this data by the 
megabyte or in some wacky way instead, such as trying to count discrete 
user-level objects? (That way lies madness of course, since objects may 
nest. Is a word processing file with embedded graphs one item or several?)

I'm sure using any reasonably sane definition of the terms you'd find 
that a relatively small proportion of "important data" -- perhaps 
gigabytes to a few terabytes -- flows through mainframes on any given 
day, while petabytes of "important data" flow through Unix servers but 
not mainframes.

Actually if there's any single class of machine that carries "the 
greatest" amount of "important data" it is surely the humble TCP/IP 
router and not a general purpose computer at all!

> That is what

What?

>>> Sure they can be replaced. 10-20 years and 10-20 trillion dollars.
>>
>> You're joking, right? It might cost that much to replace them with more
>> of the same, but to replace them with commodity hardware and operating
>> systems will certainly cost a lot less, modulo the cost of porting
>> software. (In practice it probably makes more sense to phase out their
>> use by just not getting new ones, or even just by having new companies
>> that enter those fields use modern systems and waiting for the older
>> companies in the space to die off over time, because of that porting
>> cost.)
> 
> Not joking.
> 
> Why do you think it has not happened?

The usual reasons: sysadmins too busy putting out fires to do a massive 
upgrade/overhaul, budgetary constraints, nobody feels like trying to port 
all that awful COBOL crap today, procrastination, stupidity, and general 
lack of will.
0
Reply kwesson (107) 2/27/2011 2:25:16 PM

On 02/27/2011 09:13 AM, Arne Vajhøj wrote:
> On 27-02-2011 08:55, Ken Wesson wrote:
>> On Sat, 26 Feb 2011 16:35:09 -0700, Jim Janney wrote:
>>> The programs I work on run under Windows JVMs but still read and write
>>> files on the AS/400 (or whatever IBM is calling it this month) file
>>> system.
>>
>> Perhaps, but if the OP's code is for working with Unix log files -- well,
>> who the heck is likely to store Unix log files on an AS/400 box? The Unix
>> machines networked with the hypothetical AS/400 box will no doubt store
>> their log files on their own hard drives, not the AS/400 box's.
>
> I don't think that it is good to give advice on how
> to read the last line from a text file in Java based
> on an assumption that it must be a Unix log file when the
> OP did not indicate so.

In this forum people try all the time to claim that an OP meant 
something not in the original post.  For example, the other thread where the 
OP asked how to produce a 'List <String1>' and everyone (except me) assumed 
and argued that they *must* have meant 'List <String>', even though they took 
great pains not to say so.

So I would expand your advice to add that one eschew assuming anything outside 
the problem statement.  Beyond that, because this is a discussion forum and 
not a help desk, it is entirely appropriate to discuss the general 
applicability of principles elicited from a specific problem.  Thus, even if 
the OP did want to speak only of log files, it is important and highly 
relevant to point out that "text files" (about which they actually did ask) 
have a wider and fuzzier meaning than certain ignorant trolls would believe.

-- 
Lew
Honi soit qui mal y pense.
0
Reply Lew 2/27/2011 2:27:11 PM

On Sat, 26 Feb 2011 14:26:18 -0500, Arne Vajhøj wrote:

> On 26-02-2011 05:56, Ken Wesson wrote:
>> On Fri, 25 Feb 2011 09:46:30 -0500, Arne Vajhøj wrote:
>>> On 25-02-2011 00:28, Ken Wesson wrote:
>>>> On Thu, 24 Feb 2011 21:00:20 -0500, Arne Vajhøj wrote:
>>>>> And it is a pretty good guess that the RandomAccessFile searching
>>>>> for CR and LF will fail on i also then.
>>>>
>>>> How fortunate that i runs on fewer than one in ten thousand machines.
>>>> Does Java even run on i?
>>>
>>> Yes.
>>
>> And what the hell is it used for on i?
> 
> The same that Java is used for on other platforms.

Evasion noted. But people don't use i to play games or fileshare or run 
web services, which are the most usual places I see Java being used. They 
play Java games on Java phones and Windows, fileshare with Limewire on 
(mainly) Windows, and run Java web services on Unix.

What do people do on i boxes?

>>>>> Linux will be either ISO-8859-1 or UTF-8 not ASCII.
>>>>
>>>> Both contain ASCII as a subset -- if you take a pure-ASCII file and
>>>> reencode it in either the result is the identical byte sequence.
>>>
>>> Yes
>>
>> There you go.
> 
> Exactly.

Then I do hope you will drop this particular part of the argument now, 
Arne.
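The subset claim is easy to check from Java for any particular string: a pure-ASCII string encodes to identical bytes in US-ASCII, ISO-8859-1, and UTF-8. A minimal sketch (names hypothetical). Note that getBytes(US_ASCII) silently replaces non-ASCII characters with '?', so the comparison is only meaningful for strings that are pure ASCII to begin with; beyond that range the two supersets diverge.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AsciiSubsetCheck {
    /** True if s encodes to exactly the same bytes in US-ASCII and in cs. */
    static boolean sameBytesAsAscii(String s, Charset cs) {
        return Arrays.equals(s.getBytes(StandardCharsets.US_ASCII), s.getBytes(cs));
    }

    public static void main(String[] args) {
        String pureAscii = "Last line of the log\r\n";
        System.out.println(sameBytesAsAscii(pureAscii, StandardCharsets.ISO_8859_1)); // true
        System.out.println(sameBytesAsAscii(pureAscii, StandardCharsets.UTF_8));      // true
        // Outside ASCII the supersets differ: U+00E9 is 1 byte in ISO-8859-1, 2 in UTF-8.
        String accented = "caf\u00e9";
        System.out.println(Arrays.equals(accented.getBytes(StandardCharsets.ISO_8859_1),
                accented.getBytes(StandardCharsets.UTF_8)));                           // false
    }
}
```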

> You bark bark bark?

Sorry, Arne, but you *are* going to the vet today and that's final!
0
Reply Ken 2/27/2011 2:27:51 PM

On 02/27/2011 09:10 AM, Peter Duniho wrote:
> On 2/27/11 9:59 PM, Ken Wesson wrote:
>> [...]
>>> Better googling skills bark bark bark bark bark!
>>>
>>> Bark bark bark bark bark bark bark bark bark!
>>
>> If you jump up at me, I will take action to defend myself, and I outmass
>> all terriers by *at least* a factor of 20 to 1, so you *will* get the raw
>> end of it!

Now we know why he finds fear of violence to be an acceptable workplace 
parameter.  Clearly Paul here, ahem, sorry, I mean "Ken Wesson" here is a 
violent psychopath.

> 20 to 1? _All_ terriers? Well, let's see…a small Airedale runs around 50 lbs,
> which puts you at around half a ton.

> So, is being too fat to get out of your bed the reason why you act the way you
> do?
>
> I see from your re-writing of Arne's posts (quite ironic coming from someone
> whining about someone else doing nothing more than selectively removing words
> from quotes of your own posts) that while your weight may exceed even the
> largest terrier by a substantial amount (no, seriously…you really should see a
> doctor about that), your maturity exceeds not even the most child-like one.

-- 
Lew
Honi soit qui mal y pense.
Reply Lew 2/27/2011 2:31:05 PM

On Sat, 26 Feb 2011 13:29:48 +0000, Martin Gregorie wrote:

> On Sat, 26 Feb 2011 12:15:21 +0100, Ken Wesson wrote:
> 
>> On Fri, 25 Feb 2011 14:58:27 +0000, Martin Gregorie wrote:
>> 
>>> a text file contains records. They are variable length records with a
>>> 'newline' encoding as the delimiter.
>> 
>> By that definition the concept of "record-based" vs. "not-record-based"
>> becomes completely meaningless.
>
> It is pretty much meaningless unless you're referring to the way a
> program handles data. Consider a file containing nothing but printable
> characters:
> 
> - if a C or Java program reads the file byte by byte or parses it
>   by reading words separated by whitespace then line delimiters are
>   utterly meaningless and the program doesn't care whether the file
>   contains records or not.
> 
> - OTOH if a different program reads the same file a line at a time, e.g.
>   C using fgets(), Java using BufferedReader.readLine(), then this is
>   pure record-level access.
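Martin's two cases can be sketched with an in-memory string standing in for the file (an assumption for brevity). A whitespace tokenizer ignores where the lines break, while readLine() treats each line as a record. The class name is made up.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Scanner;

public class TokensVersusLines {
    // Whitespace-delimited reading: a newline is just more whitespace.
    static int countWords(String text) {
        int n = 0;
        try (Scanner sc = new Scanner(text)) {
            while (sc.hasNext()) { sc.next(); n++; }
        }
        return n;
    }

    // Line-at-a-time reading: each line terminator ends one "record".
    static int countLines(String text) throws IOException {
        int n = 0;
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            while (br.readLine() != null) n++;
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        String text = "alpha beta\ngamma\n";
        System.out.println(countWords(text)); // 3
        System.out.println(countLines(text)); // 2
    }
}
```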

But the text file itself is not "record-based". You can implement a 
record-based format *on top of text* -- CSV goes further that way -- but 
the resulting file, crucially, can still be manipulated with tools 
designed for generic operations on arbitrary text files properly. In 
particular, this should be lossless on it:

import java.io.*;

public class TextFileCopier {
    public static void main (String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("Please specify source and " +
                "destination file.");
            return;
        }
        File f = new File(args[0]);
        File g = new File(args[1]);
        // Both sides use the platform default charset; try-with-resources
        // closes the Writer, which flushes any buffered output.
        try (Reader rdr = new InputStreamReader(new FileInputStream(f));
             Writer wtr = new OutputStreamWriter(new FileOutputStream(g))) {
            int c;
            // Copy one character at a time, unaltered.
            while ((c = rdr.read()) != -1) wtr.write(c);
        }
    }
}

But this won't be lossless on the strange file formats Arne has become 
obsessed with. At the reading stage, the record boundaries in those file 
formats will be translated into some newline character or another, likely 
\u000A. When that happens, the distinction between those and literal 
\u000A characters in the source file will be lost and can never be 
regained.

Surely you agree that a file format cannot be regarded as a true text 
file format unless the above TextFileCopier can copy all files in that 
format faithfully?
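A sketch of the information loss described above, assuming a hypothetical record format in which each record is a separate String (its boundaries held out of band). Flattening the records to newline-delimited text and reading them back with readLine() gives a different record count as soon as one record contains a literal LF. The class and method names are illustrative only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RecordBoundaryLoss {
    // Flatten records into plain text, using '\n' as the record boundary.
    static String toText(List<String> records) {
        StringBuilder sb = new StringBuilder();
        for (String r : records) sb.append(r).append('\n');
        return sb.toString();
    }

    // Read the text back, treating every line as one record.
    static List<String> fromText(String text) throws IOException {
        List<String> out = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            for (String line; (line = br.readLine()) != null; ) out.add(line);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // The second record contains a literal LF inside its data.
        List<String> records = Arrays.asList("first", "sec\nond", "third");
        List<String> roundTrip = fromText(toText(records));
        System.out.println(records.size() + " -> " + roundTrip.size()); // 3 -> 4
    }
}
```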

>> But most of us use "records" to mean a structure that involves out-of-
>> band boundaries of some sort.
>
> Not necessarily.

Yes necessarily, where "out of band" is taken with respect to whatever is 
in the record fields.

> However, fixed length records made up of fixed length fields contain no
> out-of-band structure. You want an example? How about the two magnetic
> stripe tracks on a credit card - 40 bytes and containing fields whose
> content and meaning are defined by their position.

The boundaries are certainly out of band here -- they aren't represented 
in the data itself at all, but rather in the reader and writer software!
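Martin's fixed-length example can be sketched like this. The 40-character layout below is hypothetical, chosen only to show positional fields; it is not the real magnetic-stripe track format. Note that the field boundaries exist only in the parsing code.

```java
public class FixedLengthRecord {
    // Hypothetical 40-character layout (NOT the real card-track spec):
    //   [0,16) account number, [16,20) expiry YYMM,
    //   [20,23) service code,  [23,40) discretionary data.
    static final int RECORD_LENGTH = 40;

    static String[] parse(String record) {
        if (record.length() != RECORD_LENGTH) {
            throw new IllegalArgumentException("expected " + RECORD_LENGTH
                + " characters, got " + record.length());
        }
        // The boundaries live here, in the reader, not in the data.
        return new String[] {
            record.substring(0, 16),
            record.substring(16, 20),
            record.substring(20, 23),
            record.substring(23, 40)
        };
    }

    public static void main(String[] args) {
        String rec = "1234567890123456" + "1312" + "101"
            + "0000000000" + "0000000";
        for (String field : parse(rec)) System.out.println(field);
    }
}
```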

>>> BTW, you can use C to handle iSeries text files through the usual
>>> gets() and puts() functions despite the iSeries holding text in what
>>> are effectively database rows. They have three fields per row - a line
>>> number, a fixed length text field and an 8 byte ID.
>> 
>> That's text plus file metadata.
>
> Indeed it is. Technically it is made up of fixed length fields with no
> delimiters. Apart from the record description that forms part of every
> file and the member separators the only metadata is similar to a UNIX
> directory entry plus the i-node. OS/400 and z/OS text files are closer
> to a tar or zip file than what a Unix or Windows user considers to be a
> text file because you can store many separate chunks of text in a single
> text file.

So which is it -- a tar-like archive of multiple text files, each with 
internal newlines, or a single text file with a funny representation of 
newlines? Arne indicated the latter while you seem to be indicating the 
former.

Of course, neither is a true text file -- both fail the TextFileCopier 
test, in particular, and yours doesn't even pass the most elementary 
sniff test -- calling that a text file would be like expecting

List<String> x = whatever;
String y = x;

to compile and work ... somehow.

It's a clear type error.

>> What makes it not *quite* a legitimate text file is that the file's
>> actual content contains a line break that is distinct from 0x0A, 0x0D,
>
> No it doesn't. The editor won't let you put newlines into an OS/400 
> text file

If so, then that editor is broken. And if you edit it on a working editor 
(say, by mounting the file system over the network and using vi on it 
from the comfort of your nice, sane Unix work station) you can certainly 
put newlines in it.

>> Database rows need an ID field so there's something you can uniquely
>> key on, and you said the system stores text in database rows, so
>> there's your explanation. The thing that makes no sense is it storing
>> text in database rows instead of as native text.
>
> Nice guess, but that's not how it works.

Sure it is.

> That role is taken by the line number (which can be a decimal value -
> when you add lines between lines 0002 and 0003 they'll be numbered
> 0002.01, 0002.02 etc. until you ask the editor to renumber the member -
> unlike Unix and Windows systems the line numbers in compilation errors
> aren't screwed up by editing the source.

Well, that's a silly design, then. They already had a field perfectly 
usable as the row ID field and added another separate one? What the hell 
course in DB design did they take? Maybe one that dogmatically told them 
never to make any meaningful data the key field even if it is guaranteed 
unique?
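A rough model of the fractional sequence numbers Martin describes, assuming (purely for illustration) that lines are kept in a map sorted by a BigDecimal key. This is not how OS/400 actually stores source members, and the class name is hypothetical.

```java
import java.math.BigDecimal;
import java.util.Map;
import java.util.TreeMap;

public class SequencedSource {
    // Lines ordered by their (possibly fractional) sequence number.
    TreeMap<BigDecimal, String> lines = new TreeMap<>();

    void put(String seq, String text) {
        lines.put(new BigDecimal(seq), text);
    }

    // Reassign whole-number sequence numbers 1, 2, 3, ... in order.
    void renumber() {
        TreeMap<BigDecimal, String> renumbered = new TreeMap<>();
        int n = 1;
        for (String text : lines.values()) {
            renumbered.put(BigDecimal.valueOf(n++), text);
        }
        lines = renumbered;
    }

    public static void main(String[] args) {
        SequencedSource src = new SequencedSource();
        src.put("2", "line two");
        src.put("3", "line three");
        src.put("2.01", "inserted A");  // sorts between 2 and 3
        src.put("2.02", "inserted B");
        for (Map.Entry<BigDecimal, String> e : src.lines.entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
        src.renumber();
        System.out.println(src.lines.firstKey() + ".." + src.lines.lastKey()); // 1..4
    }
}
```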

>> Actually C is already broken here even on "normal" systems, because C
>> strings can't properly represent text containing NUL characters.
>
> By definition they can't be included in 'text files'

They belong to the Unicode (and indeed even the base ASCII) character 
set, so by definition they *can* be included in text files.
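To illustrate the NUL point: a Java String carries its own length, so U+0000 is just another character. In this sketch the helper cStringView is hypothetical; it only mimics what a NUL-terminated C routine such as strlen() would see.

```java
public class NulInJavaString {
    // Mimics a C string routine: everything after the first NUL is invisible.
    static String cStringView(String s) {
        int nul = s.indexOf('\0');
        return nul < 0 ? s : s.substring(0, nul);
    }

    public static void main(String[] args) {
        String s = "before\0after";
        System.out.println(s.length());     // 12
        System.out.println(cStringView(s)); // before
    }
}
```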

>> Nope; see above. If everything you've told me is accurate then it is
>> possible to write an OS/400 "text file" that encodes some information
>> that will be destroyed in a copy made by simply reading it character by
>> character through a java.io.Reader and outputting it character by
>> character, unaltered, through a java.io.Writer.
>
> Incorrect assumption

No, it is not.

> because you can't put non-printable characters in an OS/400 source file
> member - the editor and other programs won't let you.

The programs that ship with the system won't? But then you can't properly 
edit text files. No spaces? No tabs? No newlines? There goes making 
anything long readable. "Missing separator. Stop." on all your 
makefiles. Etc.

Unless of course you use a more normal editor. Say, mount the machine's 
file system over the network and use vi. Or copy a file to it that was 
written with normal editors. Or, of course, emit a file containing 
newlines from Java with Writer.

> The OS/400 is a database machine. There are no files that aren't
> databases. Every file has defining metadata which is automatically
> generated for standard file types, e.g. source files and compiled
> binaries. The field types control what byte values can appear in every
> field, so you might limit a text field to upper case. Violating these
> rules generally causes an exception which, of course, can be caught and
> acted on.

If so, then you're indicating that the operating system *itself* will 
throw an exception if you try to write a text file containing a newline 
to the system.

So much for using it for text files, then. And so much for the claims 
others made that you could read and write text files on such a system 
normally over the network with vi and similar tools, store and compile C 
sources, etc. So much also for the claim that you can use Java normally 
on such a system -- an awful lot of Java programs will break if the 
filesystem throws exceptions for writing perfectly normal things like 09 
and 0A out to a text file. The OP's seek-back-to-start-of-last-line issue 
is gonna be the *least* of his problems if he tries to run his Java code 
on a wacky box like the one that you've just described!
Reply kwesson (107) 2/27/2011 2:55:18 PM

On Sat, 26 Feb 2011 17:18:50 +0000, Martin Gregorie wrote:

> KW seemed to be saying that a record had to have structure, which is
> true, and that this had to be in the form of metadata included within
> the record, which isn't true.

I didn't say metadata had to be "included within the record". I said a 
record had out-of-band boundaries of some sort. Later I said that a 
specific format that was described was "text plus file metadata". Nowhere 
was there a claim that records in general inherently *contain metadata*.
Reply kwesson (107) 2/27/2011 2:57:29 PM

On Sat, 26 Feb 2011 16:44:43 -0500, Arne Vajhøj wrote:

> On 26-02-2011 06:15, Ken Wesson wrote:
>> On Fri, 25 Feb 2011 14:58:27 +0000, Martin Gregorie wrote:
>>> a text file contains records. They are variable length records with a
>>> 'newline' encoding as the delimiter.
>>
>> By that definition the concept of "record-based" vs. "not-record-based"
>> becomes completely meaningless.
>>
>> But most of us use "records" to mean a structure that involves out-of-
>> band boundaries of some sort. Linear text with inline line break etc.
>> characters has only in-band boundaries and is much less structured than
>> what a "record" typically implies.
> 
> A line is by definition a structure because there is something that
> determines where it starts and where it ends.

But it's entirely in-band structure. Line breaks are a natural part of 
texts.

> Neither a count prefix nor the line delimiter is part of the line
> itself.

But you're looking at the wrong unit of granularity here. A line 
delimiter is part of the *text* itself. But a count prefix is not. Read a 
page of a novel. You will notice many line breaks, but no count prefixes, 
if your selection was at all typical.
Reply kwesson (107) 2/27/2011 2:59:15 PM

On Sun, 27 Feb 2011 09:13:05 -0500, Arne Vajhøj wrote:

> On 27-02-2011 08:55, Ken Wesson wrote:
>> On Sat, 26 Feb 2011 16:35:09 -0700, Jim Janney wrote:
>>> The programs I work on run under Windows JVMs but still read and write
>>> files on the AS/400 (or whatever IBM is calling it this month) file
>>> system.
>>
>> Perhaps, but if the OP's code is for working with Unix log files --
>> well, who the heck is likely to store Unix log files on an AS/400 box?
>> The Unix machines networked with the hypothetical AS/400 box will no
>> doubt store their log files on their own hard drives, not the AS/400
>> box's.
> 
> Bark bark bark bark bark bark bark bark bark bark bark
> bark bark bark bark bark bark bark bark bark bark bark
> bark bark bark bark bark bark bark bark bark bark bark bark
> bark bark bark bark!

Please shut up.
Reply Ken 2/27/2011 3:09:26 PM

On Sun, 27 Feb 2011 09:27:11 -0500, Lew wrote:

> On 02/27/2011 09:13 AM, Arne Vajhøj wrote:
>> On 27-02-2011 08:55, Ken Wesson wrote:
>>> On Sat, 26 Feb 2011 16:35:09 -0700, Jim Janney wrote:
>>>> The programs I work on run under Windows JVMs but still read and
>>>> write files on the AS/400 (or whatever IBM is calling it this month)
>>>> file system.
>>>
>>> Perhaps, but if the OP's code is for working with Unix log files --
>>> well, who the heck is likely to store Unix log files on an AS/400 box?
>>> The Unix machines networked with the hypothetical AS/400 box will no
>>> doubt store their log files on their own hard drives, not the AS/400
>>> box's.
>>
>> Bark bark bark bark bark bark bark bark bark bark bark bark bark bark
>> bark bark bark bark bark bark bark bark bark bark bark bark bark bark
>> bark bark bark bark bark bark bark bark bark!
> 
> Bark bark bark bark bark bark bark bark bark bark bark bark bark bark
> bark bark bark bark bark bark bark bark bark bark bark bark bark bark
> bark bark bark bark bark bark bark bark bark bark bark bark bark bark
> bark bark bark bark bark bark bark bark bark bark bark bark bark bark
> bark bark bark bark bark bark bark bark!
> 
> Bark bark bark bark bark bark bark bark bark bark bark bark bark bark
> bark bark bark bark bark bark bark bark bark bark bark bark bark bark
> bark bark bark bark bark bark bark bark bark bark bark bark bark bark
> bark bark bark bark bark bark bark bark bark bark bark bark bark bark
> bark bark bark bark bark bark bark bark bark bark bark bark bark bark
> bark bark bark bark bark bark bark bark bark bark bark bark bark bark
> bark bark bark bark bark bark bark bark bark bark bark bark bark bark
> bark bark bark bark bark bark bark bark!

Is anyone planning to actually discuss Java any time soon? Because this 
constant pointless mudslinging is getting somewhat tiresome to listen to. 
Calling people "ignorant troll" and the like can accomplish nothing 
constructive, Lew, nor can barking incessantly like a badly-trained dog, 
Arne.

If you don't have something useful, relevant, and NOT laced with 
innuendos about people you happen to personally dislike (or even merely 
disagree with) then perhaps you should refrain from clicking "Send".

Have a nice day.
Reply Ken 2/27/2011 3:12:46 PM

On Sun, 27 Feb 2011 09:08:04 -0500, Arne Vajhøj wrote:

> On 27-02-2011 08:59, Ken Wesson wrote:
>> On Sat, 26 Feb 2011 16:27:20 -0500, Arne Vajhøj wrote:
>>> <quote>
>>> In the twenty years from 1990 to 2010, worldwide mobile phone
>>> subscriptions grew from 12.4 million to over 4.6 billion</quote>
>>
>> Fascinating. But we weren't counting all cell phones. We were only
>> counting phones with app stores and the like.
> 
> Practically all of these phones can run Java ME apps.

So you claim. But even if there was some truth to that claim, where would 
your typical user *get* these apps? It's not really running an open-ended 
set of apps unless those apps are actually out there, more are being 
written, anyone (not just the phone company) can write them and some 
people other than the phone company are doing so, and the typical user 
can and sometimes will go and download some.
Reply Ken 2/27/2011 3:14:29 PM

On Sun, 27 Feb 2011 22:10:04 +0800, Peter Duniho wrote:

> On 2/27/11 9:59 PM, Ken Wesson wrote:
>> [...]
>>> Better googling skills bark bark bark bark bark!
>>>
>>> Bark bark bark bark bark bark bark bark bark!
>>
>> If you jump up at me, I will take action to defend myself, and I
>> outmass all terriers by *at least* a factor of 20 to 1, so you *will*
>> get the raw end of it!
> 
> Bark bark bark bark bark bark bark bark bark bark bark bark bark bark
> bark bark bark bark bark bark bark bark bark!
> 
> Bark bark bark bark bark bark bark bark bark bark bark bark bark bark
> bark bark!
> 
> Bark bark bark bark bark bark bark bark bark bark bark bark bark bark
> bark bark bark bark bark bark bark bark bark bark bark bark bark bark
> bark bark bark bark bark bark bark bark bark bark bark bark bark bark
> bark bark bark bark bark bark bark bark bark bark bark bark
> bark bark bark bark bark bark bark bark bark bark bark bark bark bark
> bark bark bark bark bark bark bark bark!
> 
> Bark!

Just how many fucking yappers ARE there around here anyway? Are there 
actually any *human beings*, aside from myself, that are actually capable 
of engaging in civil discourse and even sometimes disagreeing with what 
somebody says *without* turning it into a personal fight full of barks, 
growls, name-calling, and general histrionics of a most unseemly nature?
Reply Ken 2/27/2011 3:16:54 PM

On Sun, 27 Feb 2011 09:19:37 -0500, Arne Vajhøj wrote:

> On 27-02-2011 09:10, Peter Duniho wrote:
>> On 2/27/11 9:59 PM, Ken Wesson wrote:
>>> [...]
>>>> Better googling skills bark bark bark bark bark!
>>>>
>>>> Bark bark bark bark bark bark bark bark bark!
>>>
>>> If you jump up at me, I will take action to defend myself, and I
>>> outmass all terriers by *at least* a factor of 20 to 1, so you *will*
>>> get the raw end of it!
>>
>> Bark bark bark bark bark bark bark bark bark bark bark bark bark bark
>> bark bark bark bark bark bark bark bark bark!
> 
> Bark bark bark bark bark bark bark bark bark bark bark bark bark bark
> bark bark bark bark bark!
> 
> Bark!

.... *sigh* ...

I remember when this newsgroup used to actually be about Java programming.
Reply Ken 2/27/2011 3:18:17 PM

On Sun, 27 Feb 2011 09:31:05 -0500, Lew wrote:

> Bark bark bark bark bark bark bark bark bark bark bark bark bark bark
> bark bark bark bark bark bark bark bark bark bark bark bark bark bark
> a violent psychopath!

Your personal opinions of others are not the topic of this newsgroup. Do 
you have anything Java-related to say?

>> Bark bark bark bark bark bark bark bark bark bark bark bark bark bark
>> bark bark bark bark bark bark bark bark bark!
>>
>> Bark bark bark bark bark bark bark bark bark bark bark bark bark bark
>> bark bark!
>>
>> Bark bark bark bark bark bark bark bark bark bark bark bark bark bark
>> bark bark bark bark bark bark bark bark bark bark bark bark bark bark
>> bark bark bark bark bark bark bark bark bark bark bark bark bark bark
>> bark bark bark bark bark bark bark bark bark bark bark bark bark
>> bark bark bark bark bark bark bark bark bark bark bark bark bark bark
>> bark bark bark bark bark bark bark bark!

Why quote all that useless, off-topic crud if you're not going to even 
respond to it, Lew?
Reply Ken 2/27/2011 3:20:06 PM

On Sun, 27 Feb 2011 09:17:31 -0500, Arne Vajhøj wrote:

> On 27-02-2011 09:08, Ken Wesson wrote:
>> On Sat, 26 Feb 2011 16:28:13 -0500, Arne Vajhøj wrote:
>>
>>> On 26-02-2011 05:45, Ken Wesson wrote:
>>>> On Fri, 25 Feb 2011 14:37:07 +0000, Martin Gregorie wrote:
>>>>> On Fri, 25 Feb 2011 06:26:43 +0100, Ken Wesson wrote:
>>>>>> If by "very common" you mean used on one in ten thousand or fewer
>>>>>> of their computers. For every single z/OS machine in corporate
>>>>>> America there are probably a thousand blade servers and ten
>>>>>> thousand office PCs and employer-provided laptops and God alone
>>>>>> knows how many employee smartphones with plans and/or handsets paid
>>>>>> for by their company.
>>>>>
>>>>> By that standard PCs, in which lets include desktops and laptops,
>>>>> are also a tiny small proportion of all computers once you count
>>>>> phones and all the embedded computers in vehicles.
>>>>
>>>> I'm only counting machines you can add onto with an open-ended set of
>>>> software applications.
>>>
>>> Practically all mobile phones
>>
>> Don't be ridiculous. There are exactly two app stores out there so far
>> for phones: the Android Market and the Apple App Store.
> 
> Bark bark bark bark bark bark bark bark bark bark bark
> bark bark bark bark bark bark!
> 
> Bark bark bark bark bark bark you been living????
> 
> Bark bark!
>  
> Bark bark bark bark bark bark bark bark bark bark bark bark bark bark!
> 
> Bark!

Your personal opinions of others are not the topic of this newsgroup. Do 
you have anything Java-related to say?
Reply Ken 2/27/2011 3:21:30 PM

Ken Wesson <kwesson@gmail.com> writes:

> On Sun, 27 Feb 2011 09:13:05 -0500, Arne Vajhøj wrote:
>
>> On 27-02-2011 08:55, Ken Wesson wrote:
>>> On Sat, 26 Feb 2011 16:35:09 -0700, Jim Janney wrote:
>>>> The programs I work on run under Windows JVMs but still read and write
>>>> files on the AS/400 (or whatever IBM is calling it this month) file
>>>> system.
>>>
>>> Perhaps, but if the OP's code is for working with Unix log files --
>>> well, who the heck is likely to store Unix log files on an AS/400 box?
>>> The Unix machines networked with the hypothetical AS/400 box will no
>>> doubt store their log files on their own hard drives, not the AS/400
>>> box's.
>> 
>> Bark bark bark bark bark bark bark bark bark bark bark
>> bark bark bark bark bark bark bark bark bark bark bark
>> bark bark bark bark bark bark bark bark bark bark bark bark
>> bark bark bark bark!
>
> Please shut up.

http://www.catb.org/~esr/faqs/smart-questions.html#not_losing

-- 
Jim Janney
Reply Jim 2/27/2011 3:34:51 PM

On 26/02/2011 10:47, Ken Wesson allegedly wrote:
> On Fri, 25 Feb 2011 19:07:45 +0100, Daniele Futtorovic wrote:
>
>> On 25/02/2011 05:50, Ken Wesson allegedly wrote:
>>> On Thu, 24 Feb 2011 20:14:49 +0100, Daniele Futtorovic wrote:
>>>
>>>> On 24/02/2011 19:46, Ken Wesson allegedly wrote:
>>>>> it's not (...) ASCII (...).
>>>
>>> Alleged by whom? That distorted quote is most certainly not what I
>>> wrote.
>>
>> Alleged by my Usenet provider.
>
> No, I'm sure your Usenet provider correctly indicated that I said "it's
> not your grandfather's ASCII" and you then altered the quotation.
>
>> minus the fluff
>
> Your personal opinions of others are not the topic of this newsgroup. Do
> you have anything Java-related to say?
>
>> such a loonie
>
> Your personal opinions of others are not the topic of this newsgroup. Do
> you have anything Java-related to say?

Sigh.

*PLONK*
Reply Daniele 2/27/2011 3:41:01 PM

On Sun, 27 Feb 2011 16:41:01 +0100, Daniele Futtorovic wrote:

> On 26/02/2011 10:47, Ken Wesson allegedly wrote:
>> On Fri, 25 Feb 2011 19:07:45 +0100, Daniele Futtorovic wrote:
>>> such a loonie
>>
>> Your personal opinions of others are not the topic of this newsgroup.
>> Do you have anything Java-related to say?
> 
> Bark!
> 
> Bark bark!

*sigh*

Will you people give it a FUCKING REST ALREADY?

Sheesh!

You've made your "point" (such as it is) already; there is no need for 
endless carping repetitions of your *opinions*.
Reply Ken 2/27/2011 3:46:13 PM

On Sun, 27 Feb 2011 08:34:51 -0700, Jim Janney wrote:

> Ken Wesson <kwesson@gmail.com> writes:
>> On Sun, 27 Feb 2011 09:13:05 -0500, Arne Vajhøj wrote:
>>> Bark bark bark bark bark bark bark bark bark bark bark bark bark bark
>>> bark bark bark bark bark bark bark bark bark bark bark bark bark bark
>>> bark bark bark bark bark bark bark bark bark bark!
>>
>> Please shut up.
> 
> Bark bark bark bark bark bark bark bark bark bark bark!

I'm sure Arne is quite capable of barking at me himself, Jim.

Reply Ken 2/27/2011 3:47:06 PM

On 27-02-2011 10:46, Ken Wesson wrote:
> On Sun, 27 Feb 2011 16:41:01 +0100, Daniele Futtorovic wrote:
>> On 26/02/2011 10:47, Ken Wesson allegedly wrote:
>>> On Fri, 25 Feb 2011 19:07:45 +0100, Daniele Futtorovic wrote:
>>>> such a loonie
>>>
>>> Your personal opinions of others are not the topic of this newsgroup.
>>> Do you have anything Java-related to say?
>>
>> Bark!
>>
>> Bark bark!
>
> *sigh*
>
> Will you people give it a FUCKING REST ALREADY?
>
> Sheesh!
>
> You've made your "point" (such as it is) already; there is no need for
> endless carping repetitions of your *opinions*.

Have you considered taking your own advice??

Arne
Reply Arne 2/27/2011 3:57:22 PM

On 27-02-2011 10:14, Ken Wesson wrote:
> On Sun, 27 Feb 2011 09:08:04 -0500, Arne Vajhøj wrote:
>> On 27-02-2011 08:59, Ken Wesson wrote:
>>> On Sat, 26 Feb 2011 16:27:20 -0500, Arne Vajhøj wrote:
>>>> <quote>
>>>> In the twenty years from 1990 to 2010, worldwide mobile phone
>>>> subscriptions grew from 12.4 million to over 4.6 billion</quote>
>>>
>>> Fascinating. But we weren't counting all cell phones. We were only
>>> counting phones with app stores and the like.
>>
>> Practically all of these phones can run Java ME apps.
>
> So you claim.

You can easily verify by checking phones from the big
producers like Nokia.

>                But even if there was some truth to that claim, where would
> your typical user *get* these apps?

Via the phone's browser and HTTP from any web server.

(or cable or Bluetooth if that should be preferred)

Arne


Reply Arne 2/27/2011 4:01:35 PM

Lew <noone@lewscanon.com> wrote:
> In this forum people all the time try to make the claim that an OP meant 
> something not in the original post.  For example, the other thread where the 
> OP asked how to produce a 'List <String1>' and everyone (except me) assumed 
> and argued that they *must* have meant 'List <String>', even though they took 
> great pains not to say so.

In this forum, people assume that questions asked here do not necessarily
have to be of reviewed-problem-specification quality to deserve an answer.

So, if a question asked here seems to imply really odd points and does not
explicitly indicate that it is really meant to do so, then it's just more
likely that it wasn't meant so.

If the answer depends on further clarification, there is no problem making
assumptions about those.  If it turns out to be a wrong interpretation of the
question or a wrong assumption...  So what? 10 minutes wasted, but no harm done.
The typical intention is to help, not to answer questions literally.

If you prefer to answer questions literally, then by all means do so, for the
chance that they were actually reviewed-problem-specifications isn't 0, after
all.  Just don't be overly exasperated by others following different strategies
meant to help the questioner.  

Reply avl1 (2656) 2/27/2011 4:16:02 PM

On 27-02-2011 10:16, Ken Wesson wrote:
> On Sun, 27 Feb 2011 22:10:04 +0800, Peter Duniho wrote:
>
>> On 2/27/11 9:59 PM, Ken Wesson wrote:
>>> [...]
>>>> Better googling skills bark bark bark bark bark!
>>>>
>>>> Bark bark bark bark bark bark bark bark bark!
>>>
>>> If you jump up at me, I will take action to defend myself, and I
>>> outmass all terriers by *at least* a factor of 20 to 1, so you *will*
>>> get the raw end of it!
>>
>> Bark bark bark bark bark bark bark bark bark bark bark bark bark bark
>> bark bark bark bark bark bark bark bark bark!
>>
>> Bark bark bark bark bark bark bark bark bark bark bark bark bark bark
>> bark bark!
>>
>> Bark bark bark bark bark bark bark bark bark bark bark bark bark bark
>> bark bark bark bark bark bark bark bark bark bark bark bark bark bark
>> bark bark bark bark bark bark bark bark bark bark bark bark bark bark
>> bark bark bark bark bark bark bark bark bark bark bark bark
>> bark bark bark bark bark bark bark bark bark bark bark bark bark bark
>> bark bark bark bark bark bark bark bark!
>>
>> Bark!
>
> Just how many fucking yappers ARE there around here anyway? Are there
> actually any *human beings*, aside from myself, that are actually capable
> of engaging in civil discourse and even sometimes disagreeing with what
> somebody says *without* turning it into a personal fight full of barks,
> growls, name-calling, and general histrionics of a most unseemly nature?

I am pretty sure that you are the only one writing "bark".

Arne

Reply Arne 2/27/2011 4:25:58 PM

On 27-02-2011 09:27, Ken Wesson wrote:
> On Sat, 26 Feb 2011 14:26:18 -0500, Arne Vajhøj wrote:
>> On 26-02-2011 05:56, Ken Wesson wrote:
>>> On Fri, 25 Feb 2011 09:46:30 -0500, Arne Vajhøj wrote:
>>>> On 25-02-2011 00:28, Ken Wesson wrote:
>>>>> On Thu, 24 Feb 2011 21:00:20 -0500, Arne Vajhøj wrote:
>>>>>> And it is a pretty good guess that the RandomAccessFile searching
>>>>>> for CR and LF will fail on i also then.
>>>>>
>>>>> How fortunate that i runs on fewer than one in ten thousand machines.
>>>>> Does Java even run on i?
>>>>
>>>> Yes.
>>>
>>> And what the hell is it used for on i?
>>
>> The same that Java is used for on other platforms.
>
> Evasion noted. But people don't use i to play games or fileshare or run
> web services, which are the most usual places I see Java being used. They
> play Java games on Java phones and Windows, fileshare with Limewire on
> (mainly) Windows, and run Java web services on Unix.
>
> What do people do on i boxes?

As I wrote - same as for other platforms.

Phones, file shares and web services are a rather small
part of Java (phones are growing, though).

The majority of Java work is business apps (some UI - often
web based, some business logic, some persistence in database
etc.).

>>>>>> Linux will be either ISO-8859-1 or UTF-8 not ASCII.
>>>>>
>>>>> Both contain ASCII as a subset -- if you take a pure-ASCII file and
>>>>> reencode it in either the result is the identical byte sequence.
>>>>
>>>> Yes, but that does not change that they do not use ASCII. They
>>>> use ISO-8859-1 or UTF-8.
>>>
>>> There you go.
>>
>> Exactly.
>
> Then I do hope you will drop this particular part of the argument now.

Sure.

Arne



Reply Arne 2/27/2011 4:30:31 PM

On 27-02-2011 09:59, Ken Wesson wrote:
> On Sat, 26 Feb 2011 16:44:43 -0500, Arne Vajhøj wrote:
>> On 26-02-2011 06:15, Ken Wesson wrote:
>>> On Fri, 25 Feb 2011 14:58:27 +0000, Martin Gregorie wrote:
>>>> a text file contains records. They are variable length records with a
>>>> 'newline' encoding as the delimiter.
>>>
>>> By that definition the concept of "record-based" vs. "not-record-based"
>>> becomes completely meaningless.
>>>
>>> But most of us use "records" to mean a structure that involves out-of-
>>> band boundaries of some sort. Linear text with inline line break etc.
>>> characters has only in-band boundaries and is much less structured than
>>> what a "record" typically implies.
>>
>> A line is by definition a structure because there is something that
>> determines where it starts and where it ends.
>
> But it's entirely in-band structure. Line breaks are a natural part of
> texts.
>
>> Neither a count prefix nor the line delimiter is part of the line
>> itself.
>
> But you're looking at the wrong unit of granularity here. A line
> delimiter is part of the *text* itself. But a count prefix is not. Read a
> page of a novel. You will notice many line breaks, but no count prefixes,
> if your selection was at all typical.

I suggest you look at Java BufferedReader readLine, Pascal readln etc. -
they do not return the line break as part of the line.
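A quick check of this, using an in-memory string (the class name is made up): BufferedReader.readLine() treats LF, CR LF, and a lone CR as terminators and strips them from the returned line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class ReadLineTerminators {
    static List<String> lines(String text) throws IOException {
        List<String> out = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            for (String line; (line = br.readLine()) != null; ) out.add(line);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // LF, CR+LF and a lone CR all end a line; none appear in the result.
        System.out.println(lines("unix\nwindows\r\nold mac\rlast"));
        // prints [unix, windows, old mac, last]
    }
}
```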

Arne


Reply Arne 2/27/2011 4:33:30 PM

On 2/26/2011 12:36 PM, Lew wrote:
> Daniel Pitts wrote:
>> Lew wrote:
>>> Robin Wenger wrote:
>>>> Is it possible to read the last text line from a text file WITHOUT
>>>> reading the previous (n-1) lines?
>>>
>>> Yes, but it's tricky. You need a random-access file and seek backwards
>>> to a newline.
>>>
>> You can do something a little better than seeking backwards. You can make
>> some guesses about line length. If it is a typical text file, you can guess
>> that the length of that line is < 1024 (for instance). Seek to that location
>> before the end of the file and then perform the typical "tail" operation.
>>
>> If you don't find the EOL as expected, you would then do the same thing,
>> but start further back.
>
> That's a form of seeking backwards.
I suppose you're right, and my idea was basically a buffered backwards 
seek.
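A minimal sketch of that buffered backwards seek, assuming the file uses an ASCII-compatible encoding such as UTF-8 (so CR and LF bytes never occur inside a multi-byte character) and treating LF, CR LF, or a lone CR as line terminators. The class and method names are made up for illustration, and, as Arne notes in the thread, scanning for CR/LF bytes like this will not work on UTF-16 files or record-oriented file systems.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class LastLine {
    // Returns the last line of the file without its terminator ("" for an
    // empty file), reading backwards from the end in chunks of chunkSize.
    // Assumes an ASCII-compatible encoding such as UTF-8.
    static String lastLine(String path, int chunkSize) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long end = raf.length();
            // Skip one trailing terminator: LF, CR LF, or a lone CR.
            if (end > 0 && byteAt(raf, end - 1) == '\n') end--;
            if (end > 0 && byteAt(raf, end - 1) == '\r') end--;

            long start = end;              // moves back to the line's first byte
            byte[] buf = new byte[chunkSize];
            search:
            while (start > 0) {
                int n = (int) Math.min(chunkSize, start);
                raf.seek(start - n);       // the "guess": one chunk further back
                raf.readFully(buf, 0, n);
                for (int i = n - 1; i >= 0; i--) {
                    if (buf[i] == '\n' || buf[i] == '\r') break search;
                    start--;
                }
            }

            // Slurp forward again, exactly end - start bytes.
            byte[] line = new byte[(int) (end - start)];
            raf.seek(start);
            raf.readFully(line);
            return new String(line, StandardCharsets.UTF_8);
        }
    }

    private static int byteAt(RandomAccessFile raf, long pos) throws IOException {
        raf.seek(pos);
        return raf.read();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java LastLine <file>, with the 1024-byte guess suggested above.
        System.out.println(lastLine(args[0], 1024));
    }
}
```

The chunk size plays the role of the line-length guess: if the last line is longer than one chunk, the loop simply steps back another chunk, which is the "start further back" case.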


-- 
Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>
Reply Daniel 2/27/2011 6:12:51 PM

On 02/27/2011 01:12 PM, Daniel Pitts wrote:
> On 2/26/2011 12:36 PM, Lew wrote:
>> Daniel Pitts wrote:
>>> Lew wrote:
>>>> Robin Wenger wrote:
>>>>> Is it possible to read the last text line from a text file WITHOUT
>>>>> reading the previous (n-1) lines?
>>>>
>>>> Yes, but it's tricky. You need a random-access file and seek backwards
>>>> to a newline.
>>>>
>>> You can do something a little better than seeking backwards. You can make
>>> some guesses about line length. If it is a typical text file, you can guess
>>> that the length of that line is < 1024 (for instance). Seek to that location
>>> before the end of the file and then perform the typical "tail" operation.
>>>
>>> If you don't find the EOL as expected, you would then do the same thing,
>>> but start further back.
>>
>> That's a form of seeking backwards.
> I suppose you're right, and my idea was basically a buffered backwards seek.

Don't get me wrong - your idea is quite good.

-- 
Lew
Honi soit qui mal y pense.
Reply Lew 2/27/2011 6:27:12 PM

On Feb 27, 9:55 am, Ken Wesson <kwes...@gmail.com> wrote:
> If so, then that editor is broken. And if you edit it on a working editor
> (say, by mounting the file system over the network and using vi on it
> from the comfort of your nice, sane Unix work station)

There is nothing sane about a Unix workstation running vi.
Reply Jerry 2/27/2011 7:24:14 PM

On Sat, 26 Feb 2011, Arne Vajhøj wrote:

> On 26-02-2011 05:17, Ken Wesson wrote:
>> On Fri, 25 Feb 2011 09:36:29 -0500, Arne Vajhøj wrote:
>>> On 25-02-2011 00:04, Ken Wesson wrote:
>>>> On Thu, 24 Feb 2011 20:48:18 -0500, Arne Vajhøj wrote:
>>>>> On 24-02-2011 09:00, Ken Wesson wrote:
>>>>>
>>>>>> And that exhausts 99.99% of the operating system market share right 
>>>>>> there, if not more,
>>>>> 
>>>>> No.
>>>>> 
>>>>> z/OS, i, OpenVMS, MPE has a lot more market share than 0.01%.
>>>> 
>>>> Nonsense. There are *at least* ten thousand PCs running Windows for
>>>> every one machine running one of those operating systems.
>>>> 
>>>> Ten thousand *PCs running Windows*.
>>> 
>>> The PC/mainframe ratio is probably like 100000:1.
>> 
>> Hence why I said *at least*. I was being conservative in my estimates --
>> as generous to *your* case as possible. And still I was demolishing it.
>
> Not at all.
>
> Because market share is counted in dollars.

It bloody well is not!

Or are you saying that non-commercial distributions of Linux have zero 
market share *by definition*?

tom

-- 
We can only see a short distance ahead, but we can see plenty there that
needs to be done. -- Alan Turing
Reply Tom 2/27/2011 7:57:49 PM

On 27-02-2011 14:57, Tom Anderson wrote:
> On Sat, 26 Feb 2011, Arne Vajhøj wrote:
>> On 26-02-2011 05:17, Ken Wesson wrote:
>>> On Fri, 25 Feb 2011 09:36:29 -0500, Arne Vajhøj wrote:
>>>> On 25-02-2011 00:04, Ken Wesson wrote:
>>>>> On Thu, 24 Feb 2011 20:48:18 -0500, Arne Vajhøj wrote:
>>>>>> On 24-02-2011 09:00, Ken Wesson wrote:
>>>>>>
>>>>>>> And that exhausts 99.99% of the operating system market share
>>>>>>> right there, if not more,
>>>>>>
>>>>>> No.
>>>>>>
>>>>>> z/OS, i, OpenVMS, MPE has a lot more market share than 0.01%.
>>>>>
>>>>> Nonsense. There are *at least* ten thousand PCs running Windows for
>>>>> every one machine running one of those operating systems.
>>>>>
>>>>> Ten thousand *PCs running Windows*.
>>>>
>>>> The PC/mainframe ratio is probably like 100000:1.
>>>
>>> Hence why I said *at least*. I was being conservative in my estimates --
>>> as generous to *your* case as possible. And still I was demolishing it.
>>
>> Not at all.
>>
>> Because market share is counted in dollars.
>
> It bloody well is not!
>
> Or are you saying that non-commercial distributions of Linux have zero
> market share *by definition*?

No.

The HW is not free.

When you talk about server market share, it is the cost of the HW
and the OS that counts.

For good reasons - in the case of a proprietary OS, the division
of cost between HW and OS is arbitrary.

The cost of a free OS (like Debian or FreeBSD) does not contribute
in itself.

It is a bit unfair, but it is very difficult to assign an
amount other than the one actually paid.

And the unfairness of counting a 10 M$ system as equivalent to
a 1 K$ system is a bigger problem.

Arne

Reply Arne 2/27/2011 8:09:47 PM

On Sun, 27 Feb 2011, Lew wrote:

> In this forum people all the time try to make the claim that an OP meant 
> something not in the original post.  For example, the other thread where 
> the OP asked how to produce a 'List <String1>' and everyone (except me) 
> assumed and argued that they *must* have meant 'List <String>', even 
> though they took great pains not to say so.

Did we ever get an indication from that guy whether he did or did not mean
String? I still assume that he *did* mean String, and that, quite contrary
to taking great pains not to say so, he'd simply expressed himself poorly.

It's virtually a default presumption for me that new posters with 
questions mean something other than or more than what they say. People 
come here with problems they can't solve themselves, and we get two kinds 
of them (caricaturing somewhat): dumb people with easy problems, and smart 
people with difficult problems; it is not reasonable to assume that the 
former have expressed themselves completely and correctly, and sadly, they 
outnumber the latter.

> So I would expand your advice to add that one eschew assuming anything 
> outside the problem statement.

I hope you don't mind if i decide not to travel with your Logical 
Positivism Bus Company here; i would suggest that while we must not assume 
anything outside the problem statement, we can suggest and hypothesise it.

> Beyond that, because this is a discussion forum and not a help desk, it 
> is entirely appropriate to discuss the general applicability of 
> principles elicited from a specific problem.  Thus, even if the OP did 
> want to speak only of log files, it is important and highly relevant to 
> point out that "text files" (about which they actually did ask) have a 
> wider and fuzzer meaning that certain ignorant trolls would believe.

Well, if not to point it out, to discuss the idea, at least :).

tom

-- 
Everything looks kind of OK
Reply Tom 2/27/2011 8:14:27 PM

On 27-02-2011 09:55, Ken Wesson wrote:
> On Sat, 26 Feb 2011 13:29:48 +0000, Martin Gregorie wrote:
>> On Sat, 26 Feb 2011 12:15:21 +0100, Ken Wesson wrote:
>>> On Fri, 25 Feb 2011 14:58:27 +0000, Martin Gregorie wrote:
>>>> a text file contains records. They are variable length records with a
>>>> 'newline' encoding as the delimiter.
>>>
>>> By that definition the concept of "record-based" vs. "not-record-based"
>>> becomes completely meaningless.
>>
>> It is pretty much meaningless unless you're referring to the way a
>> programs handles data. Consider a file containing nothing but printable
>> characters:
>>
>> - if a C or Java program reads the file byte by byte or parses it
>>    by reading words separated by whitespace then line delimiters are
>>    utterly meaningless and the program doesn't care whether the file
>>    contains records or not.
>>
>> - OTOH if a different program reads the same file a line at a time, e.g
>>    C using fgets(), Java using BufferedReader.readLine(), then this is
>>    pure record-level access.
>
> But the text file itself is not "record-based". You can implement a
> record-based format *on top of text* -- CSV goes further that way -- but
> the resulting file, crucially, can still be manipulated with tools
> designed for generic operations on arbitrary text files properly. In
> particular, this should be lossless on it:
>
> import java.io.*;
>
> public class TextFileCopier {
>      public static void main (String[] args) throws IOException {
>          if (args.length<  3) {
>              System.out.println("Please specify source and" +
>                  "destination file.");
>              return;
>          }
>          File f = new File(args[1]);
>          InputStream is = new FileInputStream(f);
>          Reader rdr = new InputStreamReader(is);
>          File g = new File(args[2]);
>          OutputStream os = new FileOutputStream(g);
>          Writer wtr = new OutputStreamWriter(os);
>          int c;
>          while ((c = rdr.read()) != -1) wtr.write(c);
>      }
> }
>
> But this won't be lossless on the strange file formats Arne has become
> obsessed with. At the reading stage, the record boundaries in those file
> formats will be translated into some newline character or another, likely
> \u000A. When that happens, the distinction between those and literal
> \u000A characters in the source file will be lost and can never be
> regained.
>
> Surely you agree that a file format cannot be regarded as a true text
> file format unless the above TextFileCopier can copy all files in that
> format faithfully?

Actually I think it is a bit weird to test if a file consists
of text lines without the program being line aware.

And the code is rather bad:
- you are not using args[0] (in Java args[0] does not contain the
   name of the program)
- you are not calling close on rdr and wtr
but those are easy to fix.
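For reference, a sketch of the copier with the two problems listed above fixed: it reads args[0] and args[1], and try-with-resources closes (and therefore flushes) both streams. This needs Java 7 or later, which postdates this thread. Making the charset explicit is an assumption added here so that reading and writing are guaranteed to use the same one; the copy is byte-for-byte identical only if that charset round-trips the file's bytes.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;

public class TextFileCopier {
    public static void main(String[] args) throws IOException {
        // args[0] is the first argument after the class name, not the program name.
        if (args.length < 2) {
            System.out.println("Please specify source and destination file.");
            return;
        }
        // Same charset on both sides (assumption: the platform default, as in the original).
        Charset cs = Charset.defaultCharset();
        // try-with-resources closes both streams; closing the writer flushes it.
        try (Reader rdr = new BufferedReader(new InputStreamReader(new FileInputStream(args[0]), cs));
             Writer wtr = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(args[1]), cs))) {
            int c;
            while ((c = rdr.read()) != -1) {
                wtr.write(c); // every decoded char, including '\r' and '\n', is written back unchanged
            }
        }
    }
}
```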

And I am happy to inform you that the above program
actually works with VMS variable length files.

Dump of input:

         Record type:                      Variable
         File organization:                Sequential
         Record attributes:                Implied carriage control
         End of file block:                1
         End of file byte:                 16

  007A6162 00030072 61620A6F 6F660007 ..foo.bar...baz. 000000

Dump of output:

     VAX-11 RMS attributes
         Record type:                      Variable
         File organization:                Sequential
         Record attributes:                Implied carriage control
         End of file block:                1
         End of file byte:                 16

  007A6162 00030072 61620A6F 6F660007 ..foo.bar...baz. 000000

So QED.

Arne

PS: for those with a VMS system who want to test this themselves,
     remember to set the logical that tells Java to use
     variable length files in stream mode.

PPS: I am actually somewhat surprised that it works. It is
      not that easy to get something as stream oriented as this
      to work in a record world. HP's Java and C engineers
      must have been rather smart.
Reply Arne 2/27/2011 8:19:15 PM

On Sun, 27 Feb 2011, Arne Vajhøj wrote:

> On 27-02-2011 09:59, Ken Wesson wrote:
>> On Sat, 26 Feb 2011 16:44:43 -0500, Arne Vajhøj wrote:
>>> On 26-02-2011 06:15, Ken Wesson wrote:
>>>> On Fri, 25 Feb 2011 14:58:27 +0000, Martin Gregorie wrote:
>>>>> a text file contains records. They are variable length records with a
>>>>> 'newline' encoding as the delimiter.
>>>> 
>>>> By that definition the concept of "record-based" vs. "not-record-based"
>>>> becomes completely meaningless.
>>>> 
>>>> But most of us use "records" to mean a structure that involves out-of-
>>>> band boundaries of some sort. Linear text with inline line break etc.
>>>> characters has only in-band boundaries and is much less structured than
>>>> what a "record" typically implies.
>>> 
>>> A line is by definition a structure because there is something that
>>> determines where it starts and where it ends.
>> 
>> But it's entirely in-band structure. Line breaks are a natural part of
>> texts.
>> 
>>> Neither a count prefix or the the line delimiter are part of the line
>>> itself.
>> 
>> But you're looking at the wrong unit of granularity here. A line
>> delimiter is part of the *text* itself. But a count prefix is not. Read a
>> page of a novel. You will notice many line breaks, but no count prefixes,
>> if your selection was at all typical.
>
> I suggest you look at Java BufferedReader readLine, Pascal readln etc. -
> they do not return the line break as part of the line.

Oddly, Python's file.readline() does. I believe it's so that readline() 
is the inverse of write(), which does not add a line terminator. You might 
think that it would be more sensible that readline() should strip the 
terminator, and that there should be a writeline() that adds one, but 
that's not how it is.
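For comparison on the Java side, a small sketch (class name hypothetical) of what BufferedReader.readLine() hands back: it accepts "\n", "\r\n" or a lone "\r" as a terminator and strips it. A program that writes the lines out again therefore has to choose a terminator itself, and whether the last line had one at all is not reported.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class ReadLineDemo {
    // Collects the lines returned by BufferedReader.readLine(); none of them
    // contains the terminator that ended it.
    public static List<String> lines(String text) throws IOException {
        List<String> result = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = br.readLine()) != null) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(lines("a\nb\r\nc\r")); // [a, b, c]
        // A trailing terminator is invisible: both print [x]
        System.out.println(lines("x\n"));
        System.out.println(lines("x"));
    }
}
```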

Now, who can point me at this atypical novel with count prefixes?

tom

-- 
Everything looks kind of OK
Reply Tom 2/27/2011 8:20:07 PM

On 27-02-2011 09:55, Ken Wesson wrote:
> On Sat, 26 Feb 2011 13:29:48 +0000, Martin Gregorie wrote:
>> On Sat, 26 Feb 2011 12:15:21 +0100, Ken Wesson wrote:
>>> On Fri, 25 Feb 2011 14:58:27 +0000, Martin Gregorie wrote:
>>>> BTW, you can use C to handle iSeries text files through the usual
>>>> gets() and puts() functions despite the iSeries holding text in what
>>>> are effectively database rows. They have three fields per row - a line
>>>> number, a fixed length text field and an 8 byte ID.
>>>
>>> That's text plus file metadata.
>>
>> Indeed it is. Technically it is made up of fixed length fields with no
>> delimiters. Apart from the record description that forms part of every
>> file and the member separators the only metadata is similar to a UNIX
>> directory entry plus the i-node. OS/400 and Z/OS text files are closer
>> to a tar or zip file than what a Unix or Windows user considers to be a
>> text file because you can store many separate chunks of text in a single
>> text file.
>
> So which is it -- a tar-like archive of multiple text files, each with
> internal newlines, or a single text file with a funny representation of
> newlines? Arne indicated the latter while you seem to be indicating the
> former.
>
> Of course, neither are true text files -- both fail the TextFileCopier
> test, in particular, and yours doesn't even pass the most elementary
> sniff test -- calling that a text file would be like expecting

If the files can be read and written as text files by the shell,
Fortran, Cobol, C, C++, Java etc., then the people who created those
languages evidently consider them text.

i and OpenVMS are quite different, so it is no surprise that Martin's
description of i and mine of OpenVMS differ somewhat.

And your copier program actually works on OpenVMS, so ...

>>> Actually C is already broken here even on "normal" systems, because C
>>> strings can't properly represent text containing NUL characters.
>>
>> By definition they can't be included in 'text files'
>
> They belong to the Unicode (and indeed even the base ASCII) character
> set, so by definition they *can* be included in text files.

Your weird definition.

The rest of us expect text files to contain printable
characters, which NUL is not.

Arne
Reply Arne 2/27/2011 8:25:06 PM

On 27-02-2011 15:20, Tom Anderson wrote:
> On Sun, 27 Feb 2011, Arne Vajhøj wrote:
>
>> On 27-02-2011 09:59, Ken Wesson wrote:
>>> On Sat, 26 Feb 2011 16:44:43 -0500, Arne Vajhøj wrote:
>>>> On 26-02-2011 06:15, Ken Wesson wrote:
>>>>> On Fri, 25 Feb 2011 14:58:27 +0000, Martin Gregorie wrote:
>>>>>> a text file contains records. They are variable length records with a
>>>>>> 'newline' encoding as the delimiter.
>>>>>
>>>>> By that definition the concept of "record-based" vs. "not-record-based"
>>>>> becomes completely meaningless.
>>>>>
>>>>> But most of us use "records" to mean a structure that involves out-of-
>>>>> band boundaries of some sort. Linear text with inline line break etc.
>>>>> characters has only in-band boundaries and is much less structured than
>>>>> what a "record" typically implies.
>>>>
>>>> A line is by definition a structure because there is something that
>>>> determines where it starts and where it ends.
>>>
>>> But it's entirely in-band structure. Line breaks are a natural part of
>>> texts.
>>>
>>>> Neither a count prefix or the the line delimiter are part of the line
>>>> itself.
>>>
>>> But you're looking at the wrong unit of granularity here. A line
>>> delimiter is part of the *text* itself. But a count prefix is not. Read a
>>> page of a novel. You will notice many line breaks, but no count prefixes,
>>> if your selection was at all typical.
>>
>> I suggest you look at Java BufferedReader readLine, Pascal readln etc. -
>> they do not return the line break as part of the line.
>
> Oddly, Pythons's file.readline() does. I believe it's so that readline()
> is the inverse of write(), which does not add a line terminator. You
> might think that it would be more sensible that readline() should strip
> the terminator, and that there should be a writeline() that adds one,
> but that's not how it is.

Note that it looks like readline does not include the actual line
delimiter but explicitly includes a newline.

Point being that on Windows it is still \n, not \r\n.

> Now, who can point me at this atypical novel with count prefixes?

Docs or system or ... ?

Arne

Reply Arne 2/27/2011 8:39:37 PM

On 27-02-2011 13:27, Lew wrote:
> On 02/27/2011 01:12 PM, Daniel Pitts wrote:
>> On 2/26/2011 12:36 PM, Lew wrote:
>>> Daniel Pitts wrote:
>>>> Lew wrote:
>>>>> Robin Wenger wrote:
>>>>>> Is it possible to read the last text line from a text file WITHOUT
>>>>>> reading the previous (n-1) lines?
>>>>>
>>>>> Yes, but it's tricky. You need a random-access file and seek backwards
>>>>> to a newline.
>>>>>
>>>> You can do something a little better than seeking backwards. You can
>>>> make some guesses about line length. If it is a typical text file, you
>>>> can guess that the length of that line is < 1024 (for instance). Seek to
>>>> that location before the end of the file and then perform the typical
>>>> "tail" operation.
>>>>
>>>> If you don't find the EOL as expected, you would then do the same thing,
>>>> but start further back.
>>>
>>> That's a form of seeking backwards.
>> I suppose you're right, and my idea was basically a buffered backwards
>> seek.
>
> Don't get me wrong - your idea is quite good.

For any serious (volume) usage it would be one of the first things
to get right.

Arne

Reply Arne 2/27/2011 9:25:06 PM

On 27-02-2011 08:28, Ken Wesson wrote:
> On Sat, 26 Feb 2011 14:12:44 -0500, Arne Vajhøj wrote:
>> On 26-02-2011 08:36, Peter Duniho wrote:
>>> On 2/26/11 9:27 PM, Arved Sandstrom wrote:
>>>> The usefulness of the term "text file" for me is that it describes a
>>>> file that can be opened, viewed and used by every application, tool
>>>> and utility, on every OS and platform, that purports to be a "text
>>>> editor".
>>>
>>> Then I think you need to define "text file" more narrowly than what is
>>> actually out there. In this thread alone, there have been mentioned a
>>> number of true text file formats that are simply not readable in your
>>> average or even above-average text editor found on mainstream OSs.
>>
>> They are edited fine by any text editor on those systems.
>>
>> This includes cross platform editors that are also available on *nix and
>> Windows.
>
> No. Those editors, at minimum, will strip some information from the file,
> even if you just open it and then save out a copy. If the original had my
> hypothetical hidden message "the attack begins at midnight" encoded in a
> pattern of actual 0x0A characters and record boundaries, the output will
> not, and will generally have converted all 0x0A characters in the
> original into record boundaries specifically, in that
>
> foo<0x0A>bar<boundary>baz
>
> would probably have ended up in the vi buffer (or emacs buffer, or
> whatever) as
>
> foo<0x0A>bar<0x0A>baz

Which is where your logic goes wrong.

You assume that the editor will store the data in a single
string separated by \n.

Not only does it not have to be done that way.

It would most likely not be done that way.

It is a horribly inefficient way to work with text.

ArrayList<String> in Java syntax is much more efficient.
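To illustrate with the foo/bar/baz example quoted above: a minimal sketch of an in-memory buffer that keeps one String per record. The class RecordBuffer is hypothetical and not taken from any real editor; the thread does not establish how any particular VMS editor is implemented. Because the boundaries are carried by the list structure, a 0x0A stored inside a record stays an ordinary character and is never confused with a boundary.

```java
import java.util.ArrayList;
import java.util.List;

public class RecordBuffer {
    // One list element per record; the boundaries live in the list structure,
    // so a '\n' inside an element is just another character.
    private final List<String> records = new ArrayList<>();

    public void append(String record) {
        records.add(record);
    }

    public int recordCount() {
        return records.size();
    }

    public String record(int index) {
        return records.get(index);
    }

    // Editing one record replaces one element instead of rebuilding a single huge string.
    public void replace(int index, String newText) {
        records.set(index, newText);
    }

    public static void main(String[] args) {
        RecordBuffer buf = new RecordBuffer();
        buf.append("foo\nbar"); // a record that contains a literal 0x0A
        buf.append("baz");
        System.out.println(buf.recordCount()); // 2: the embedded LF did not create a new record
    }
}
```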

> and then output would be converted to
>
> foo<boundary>bar<boundary>baz.
>
> If a text editor like vi cannot losslessly load and re-save the file then
> it is not a text file by any sane definition, including particularly
> Arved's pretty darn good definition.

Not only is your logic flawed as described above.

It is also very easy to open such a file in one of the text editors
that come with the system, save it, and verify that the file
did not change.

Arne

Reply Arne 2/27/2011 9:33:20 PM

On 27-02-2011 08:29, Ken Wesson wrote:
> On Sat, 26 Feb 2011 14:19:38 -0500, Arne Vajhøj wrote:
>> On 26-02-2011 04:45, Ken Wesson wrote:
>>> I don't see an argument of any kind in your post. Forget to include
>>> one?
>>
>> The quotation
>
> was all that there was, besides attributions and your signoff "Arne".
> Your post contained no meaningful original text.

It seems pretty meaningful to show that you yourself wrote
the text whose author you asked about!

Arne
Reply Arne 2/27/2011 9:35:13 PM

On Sat, 26 Feb 2011, Arne Vajhøj wrote:

> On 26-02-2011 05:45, Ken Wesson wrote:
>> On Fri, 25 Feb 2011 14:37:07 +0000, Martin Gregorie wrote:
>>> On Fri, 25 Feb 2011 06:26:43 +0100, Ken Wesson wrote:
>>>> If by "very common" you mean used on one in ten thousand or fewer of
>>>> their computers. For every single z/OS machine in corporate America
>>>> there are probably a thousand blade servers and ten thousand office PCs
>>>> and employer-provided laptops and God alone knows how many employee
>>>> smartphones with plans and/or handsets paid for by their company.
>>> 
>>> By that standard PCs, in which lets include desktops and laptops, are
>>> also a tiny small proportion of all computers once you count phones and
>>> all the embedded computers in vehicles.
>> 
>> I'm only counting machines you can add onto with an open-ended set of
>> software applications.
>
> Practically all mobile phones support Java ME.

Practically all non-smartphones do, i think. My understanding (gained from 
looking at the wikipedia article on smartphones) is that about 20% of 
phones in first-world countries are smartphones, and that very roughly 
speaking, a third of smartphones are Android, a third Symbian, a sixth 
iOS, and a sixth BlackBerry. Symbian and BlackBerry support J2ME; iOS and 
Android don't. So, about 10% of mobile phones in the first world don't 
support J2ME. I think that's enough that it's not 'practically all'.

Particularly since market share is apparently calculated by dollars, not 
units, in which case the market share of non-J2ME phones is very 
significantly greater than 10%.

tom

-- 
Let's roll on the floor!
Reply Tom 2/27/2011 9:37:17 PM

On 11-02-27 04:19 PM, Arne Vajhøj wrote:
[ SNIP ]

> And I am happy to inform you that the above program
> actually works with VMS variable length files.
> 
> Dump of input:
> 
>         Record type:                      Variable
>         File organization:                Sequential
>         Record attributes:                Implied carriage control
>         End of file block:                1
>         End of file byte:                 16
> 
>  007A6162 00030072 61620A6F 6F660007 ..foo.bar...baz. 000000
> 
> Dump of output:
> 
>     VAX-11 RMS attributes
>         Record type:                      Variable
>         File organization:                Sequential
>         Record attributes:                Implied carriage control
>         End of file block:                1
>         End of file byte:                 16
> 
>  007A6162 00030072 61620A6F 6F660007 ..foo.bar...baz. 000000
> 
> So QED.
> 
> Arne
> 
> PS: for those with a VMS system that want to test themselves,
>     then remember to set the logical that tells Java to use
>     variable length files in stream mode.
> 
> PPS: I am actually somewhat surprised that it works. It is
>      not that easy to get something as stream oriented as this
>      to work in a record world. HP's Java and C engineers
>      must have been rather smart.

I had to read that last part two or three times before I got it. :-) The
surprise isn't so much over the fact that an OpenVMS Java can deal with
a stream of lines (which after all _is_ a text file), but rather over
the fact that the VMS developers managed to slot this record format in
with the other record format types?

AHS
-- 
We must recognize the chief characteristic of the modern era - a
permanent state of what I call violent peace.
-- James D. Watkins
Reply Arved 2/27/2011 10:05:52 PM

On 27-02-2011 08:40, Ken Wesson wrote:
> On Sat, 26 Feb 2011 14:29:08 -0500, Arne Vajhøj yipped:
>> On 26-02-2011 05:17, Ken Wesson wrote:
>>> On Fri, 25 Feb 2011 09:36:29 -0500, Arne Vajhøj wrote:
>>>> But the relevance is not that big. Because mainframes happen to be a
>>>> lot more expensive than PC's.
>>>
>>> One computer is still one computer, no matter how expensive it is. It's
>>> the price tag whose relevance is not that big.
>>
>> I have news for you
>
> We've been over this time and again, Arne. Suppose I wrote an iPhone app.
> Suppose I followed your theory of who to design it for. So I made it run
> perfectly on supercomputers, mainframes, and other big, expensive, mostly
> obsolete behemoths at the expense if it working on iPhones. Would I still
> be in business a month later?
>
> Now suppose instead that I followed my theory. So I decide I don't care
> HOW much IBM big iron costs, I'm going to worry only about making the app
> work as well as possible on iPhones, the platform the users will actually
> be running it on. And a month later I have oodles of happy customers and
> I'm laughing all the way to the bank.
>
> Starting to figure it out yet, Arne, where you went wrong?

You have not figured out that your iPhone app market, being 100% iPhone
and 0% anything else, may not be representative of all markets?

>>>> The you won't have any users using ASCII.
>>>
>>> Funnily enough, all of them can cope just fine with ASCII text files. I
>>> wonder how that can be, Arne, unless of course you're wrong yet again.
>>
>> It is called "backwards compatibility".
>
> That's very fascinating, Arne, but it does not alter the fact that they
> can cope just fine with ASCII text files. :)

No. That is "backwards compatibility".

Arne
Reply Arne 2/27/2011 11:06:12 PM

Lew wrote :
> Wojtek wrote:
> As for emails, I embed JPGs in email all the time.  Is that 7-bit ASCII?  Or 
> even pure text?

You must have missed reading the line:

"To get binary information you need to have an encoding standard, which 
is itself 7-bit."

For instance UU-ENCODE
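Java's standard library has no public uuencode API, but java.util.Base64 (Java 8 and later) implements MIME's Base64, which works on the same principle: arbitrary bytes in, 7-bit-safe ASCII out. A short sketch (class name hypothetical) encoding bytes that have the high bit set, as JPEG data does:

```java
import java.util.Arrays;
import java.util.Base64;

public class SevenBitDemo {
    // Encodes arbitrary bytes as MIME Base64 text.
    public static String encode(byte[] data) {
        return Base64.getMimeEncoder().encodeToString(data);
    }

    public static byte[] decode(String text) {
        return Base64.getMimeDecoder().decode(text);
    }

    // True if every character fits in 7 bits (0..127).
    public static boolean isSevenBit(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 127) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // 0xFF 0xD8 0xFF is how a JPEG file starts; the other bytes are arbitrary.
        byte[] binary = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, 0x00, 0x7F, (byte) 0x80};
        String encoded = encode(binary);
        System.out.println(encoded);                                // /9j/AH+A
        System.out.println(isSevenBit(encoded));                    // true
        System.out.println(Arrays.equals(binary, decode(encoded))); // true
    }
}
```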

-- 
Wojtek :-)


Reply Wojtek 2/27/2011 11:18:11 PM

On 27-02-2011 16:37, Tom Anderson wrote:
> On Sat, 26 Feb 2011, Arne Vajhøj wrote:
>> On 26-02-2011 05:45, Ken Wesson wrote:
>>> On Fri, 25 Feb 2011 14:37:07 +0000, Martin Gregorie wrote:
>>>> On Fri, 25 Feb 2011 06:26:43 +0100, Ken Wesson wrote:
>>>>> If by "very common" you mean used on one in ten thousand or fewer of
>>>>> their computers. For every single z/OS machine in corporate America
>>>>> there are probably a thousand blade servers and ten thousand office PCs
>>>>> and employer-provided laptops and God alone knows how many employee
>>>>> smartphones with plans and/or handsets paid for by their company.
>>>>
>>>> By that standard PCs, in which lets include desktops and laptops, are
>>>> also a tiny small proportion of all computers once you count phones and
>>>> all the embedded computers in vehicles.
>>>
>>> I'm only counting machines you can add onto with an open-ended set of
>>> software applications.
>>
>> Practically all mobile phones support Java ME.
>
> Practically all non-smartphones do, i think. My understanding (gained
> from looking at the wikipedia article on smartphones) is that about 20%
> of phones in first-world countries are smartphones, and that very
> roughly speaking, a third of smartphones are Android, a third Symbian, a
> sixth iOS, and a sixth BlackBerry. Symbian and BlackBerry support J2ME;
> iOS and Android don't. So, about 10% of mobile phones in the first world
> don't support J2ME. I think that's enough that it's not 'practically all'.
>
> Particularly since market share is apparently calculated by dollars, not
> units, in which case the market share of non-J2ME phones is very
> significantly greater than 10%.

OK - that is true.

But none of the mobile phones you mention without Java ME fall
into the category where you cannot get apps.

Arne

PS: You can run Java ME apps on Android by adding a ME
     toolkit, but it does not really count, because it is
     not provided or supported by Google.

Reply Arne 2/27/2011 11:20:07 PM

On 27-02-2011 17:05, Arved Sandstrom wrote:
> On 11-02-27 04:19 PM, Arne Vajhøj wrote:
> [ SNIP ]
>
>> And I am happy to inform you that the above program
>> actually works with VMS variable length files.
>>
>> Dump of input:
>>
>>          Record type:                      Variable
>>          File organization:                Sequential
>>          Record attributes:                Implied carriage control
>>          End of file block:                1
>>          End of file byte:                 16
>>
>>   007A6162 00030072 61620A6F 6F660007 ..foo.bar...baz. 000000
>>
>> Dump of output:
>>
>>      VAX-11 RMS attributes
>>          Record type:                      Variable
>>          File organization:                Sequential
>>          Record attributes:                Implied carriage control
>>          End of file block:                1
>>          End of file byte:                 16
>>
>>   007A6162 00030072 61620A6F 6F660007 ..foo.bar...baz. 000000
>>
>> So QED.
>>
>> Arne
>>
>> PS: for those with a VMS system that want to test themselves,
>>      then remember to set the logical that tells Java to use
>>      variable length files in stream mode.
>>
>> PPS: I am actually somewhat surprised that it works. It is
>>       not that easy to get something as stream oriented as this
>>       to work in a record world. HP's Java and C engineers
>>       must have been rather smart.
>
> I had to read that last two or three times before I got it. :-) The
> surprise isn't so much over the fact that an OpenVMS Java can deal with
> a stream of lines (which after all _is_ a text file), but rather over
> the fact that the VMS developers managed to slot this record format in
> which the other record format types?

Yes.

The program Ken provided is very stream oriented and not that
easy to get to fit with something record oriented.

And Java is a hard problem for this stuff, because:
1) the Java API itself is stream oriented
2) the API is completely standardized (in Fortran, Pascal,
    C etc. they just added options to OPEN, open, fopen to
    support all the possibilities)

Arne
Reply Arne 2/27/2011 11:25:27 PM


"Ken Wesson" <kwesson@gmail.com> wrote in message 
news:4d6a530f$1@news.x-privat.org...
> On Sat, 26 Feb 2011 14:36:03 -0500, Arne Vajhøj wrote:
>
>> On 26-02-2011 05:13, Ken Wesson wrote:
>>> On Fri, 25 Feb 2011 12:30:23 -0500, Arne Vajhøj wrote:
>>>> On 25-02-2011 00:00, Ken Wesson wrote:
>>>>> On Thu, 24 Feb 2011 20:58:24 -0500, Arne Vajhøj wrote:
>>>>>> On 24-02-2011 19:12, Lew wrote:
>>>>>>> On 02/24/2011 09:49 AM, RedGrittyBrick wrote:
>>>>>>>> On 24/02/2011 14:00, Ken Wesson wrote:
>>>>>>>>> Windows text files are flat ASCII files (with CRLF line ends).
>>>>>>>>
>>>>>>>> Actually I find that, nowadays, lots of text files on Windows are
>>>>>>>> so-called 'ANSI' (mostly CP-1252) or 'Unicode' (usually meaning
>>>>>>>> UTF-16 with BOM).
>>>>>>>>
>>>>>>>> Even on my ancient XP boxes, Notepad offers only ANSI, Unicode,
>>>>>>>> Unicode big-endian and UTF-8. Wordpad offers RTF, Text-Document
>>>>>>>> (turns out to be CP-1252), Text-Document DOS format (turns out to
>>>>>>>> be CP-850) and Unicode. No ASCII.
>>>>>>>
>>>>>>> Windows hasn't used ASCII in decades.
>>>>>>
>>>>>> I don't think it ever have.
>>>>>
>>>>> Funny then that bog-standard ASCII files seem to read and write just
>>>>> fine in Notepad on the occasions that I use Windows computers.
>>>>
>>>> That just mean that it use something ASCII compatible - not that it
>>>> uses ASCII.
>>>
>>> Sophistry.
>>
>> Simple fact.
>
> No, sophistry. You can't use a superset of ASCII without using ASCII, any
> more than you can take a bath in soapy water without using water.
>
>>>> And you can easily verify that it indeed supports characters not part
>>>> of ASCII.
>>>
>>> Never said it didn't.
>>
>> Yes - you did.
>
> No - I didn't.
>
>> You said they used ASCII.
>
> I didn't say they used *only* ASCII.
>
>> If they did that then they would not support characters not ASCII.
>
> Don't be silly. If I write software that uses TCP/IP does that mean it
> does not support any communication protocols other than TCP/IP? What if
> it directly sends serial port commands to /dev/lpt0 as well as using TCP/
> IP? Uh-oh! According to Arne such software is impossible! And yet I note
> that there are web browsers that can both surf and print. Some can even
> do them at the same time. :)
>
>>>>> All of those seem to be ASCII plus another up to 128 characters, or
>>>>> in the case of UTF-16, another up to 65408 characters.
>>>>>
>>>>> Saying that a 7-bit-clean file interpreted in one of those is not
>>>>> ASCII is like saying that humans are not mammals.
>>>>
>>>> And?
>>>>
>>>> No one is saying that such a file is not ASCII.
>>>
>>> You were.
>>
>> No.
>
> Yes, you were! You even did it again, above.
>
> I think I'm getting close to the point of plonking you, Arne. When you're
> not barking at me like some rabid wolf you're mainly engaging in shallow
> forms of intellectual dishonesty such as the above.
>
>>>> PS: UTF-16 is *not* ASCII compatible.
>>>
>>> It is if you strip the high bytes and not just the 7th bits.
>>
>> Which means non compatible.
>
> Nonsense.

If you try to read an ASCII file as if it were UTF-8, you'll see the right
characters: compatible.
If you try to read an ASCII file as if it were UTF-16, you'll see garbage:
incompatible.
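The experiment is easy to reproduce in Java. This sketch (the class and method names are illustrative, not from the thread) encodes a pure-ASCII string as US-ASCII bytes and then decodes those same bytes as UTF-8 and as UTF-16LE. UTF-8 gives the text back unchanged; UTF-16LE pairs adjacent bytes into unrelated 16-bit code units.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class AsciiCompat {
    // Encode pure-ASCII text as US-ASCII bytes, then decode those bytes as if
    // they were in another charset and check whether the text survives.
    static boolean readsBackUnchanged(String ascii, Charset readAs) {
        byte[] bytes = ascii.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, readAs).equals(ascii);
    }

    public static void main(String[] args) {
        String text = "foo bar";
        System.out.println("as UTF-8:    " + readsBackUnchanged(text, StandardCharsets.UTF_8));
        System.out.println("as UTF-16LE: " + readsBackUnchanged(text, StandardCharsets.UTF_16LE));

        // Read as UTF-16LE, the bytes 'f' (0x66) and 'o' (0x6F) become one char, U+6F66.
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        char first = new String(bytes, StandardCharsets.UTF_16LE).charAt(0);
        System.out.println("first UTF-16LE char: U+" + Integer.toHexString(first));
    }
}
```

The first check prints true and the second false; the last line shows U+6f66, a CJK character rather than "f".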

Reply Mike 2/28/2011 12:18:43 AM


"Arne Vajhøj" <arne@vajhoej.dk> wrote in message 
news:4d6ab1be$0$23759$14726298@news.sunsite.dk...
>
> And I am happy to inform you that the above program
> actually works with VMS variable length files.
>
> Dump of input:
>
>         Record type:                      Variable
>         File organization:                Sequential
>         Record attributes:                Implied carriage control
>         End of file block:                1
>         End of file byte:                 16
>
>  007A6162 00030072 61620A6F 6F660007 ..foo.bar...baz. 000000
>
> Dump of output:
>
>     VAX-11 RMS attributes
>         Record type:                      Variable
>         File organization:                Sequential
>         Record attributes:                Implied carriage control
>         End of file block:                1
>         End of file byte:                 16
>
>  007A6162 00030072 61620A6F 6F660007 ..foo.bar...baz. 000000
>
> So QED.
>
> Arne
>
> PS: for those with a VMS system that want to test themselves,
>     then remember to set the logical that tells Java to use
>     variable length files in stream mode.
>
> PPS: I am actually somewhat surprised that it works. It is
>      not that easy to get something as stream oriented as this
>      to work in a record world. HP's Java and C engineers
>      must have been rather smart.

It's not that hard, really.  You open and manipulate the file with RMS, 
which takes care of the record format.  On write, you translate every LF to 
"end current record", while on read you translate the end of every record to 
an LF in what's returned.  I've done similar things in C or C++ code that 
has to be portable among different OS's.  (The tricky bit is (or used to be) 
writing binary bag-of-bytes files on VMS.  For Unix compatibility, they need 
to know exactly how long they are, which may be an odd number of bytes, but 
RMS can't write files that aren't an even number of bytes, so you've got to 
use the XQP to update the file's length after RMS has closed it.  At least, 
that was true back in the early 90s.) 
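A minimal Java sketch of the translation described above, for illustration only. It assumes the on-disk layout that Arne's dump appears to show when its longwords are read right to left: each variable-length record is a 2-byte little-endian byte count, the data, and one pad byte when the count is odd. It ignores everything else RMS does (block boundaries, other record formats, the XQP length fix-up), and recordsToStream / streamToRecords are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class VarRecords {
    // Read side: copy each record's data out and turn its end into an LF.
    // Assumed layout: 2-byte little-endian count, data, pad byte if count is odd.
    static byte[] recordsToStream(byte[] file) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int pos = 0;
        while (pos + 2 <= file.length) {
            int len = (file[pos] & 0xFF) | ((file[pos + 1] & 0xFF) << 8);
            pos += 2;
            if (pos + len > file.length) {
                throw new IllegalArgumentException("truncated record at offset " + (pos - 2));
            }
            out.write(file, pos, len);
            out.write('\n');
            pos += len + (len & 1); // skip the pad byte after an odd-length record
        }
        return out.toByteArray();
    }

    // Write side: every LF ends the current record; bytes after the last LF
    // become a final record.
    static byte[] streamToRecords(byte[] stream) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int start = 0;
        for (int i = 0; i <= stream.length; i++) {
            boolean atEnd = (i == stream.length);
            if (atEnd && start == stream.length) {
                break; // no trailing data without an LF
            }
            if (atEnd || stream[i] == '\n') {
                int len = i - start;
                out.write(len & 0xFF);          // count, low byte first
                out.write((len >>> 8) & 0xFF);
                out.write(stream, start, len);
                if ((len & 1) == 1) {
                    out.write(0);               // pad to an even length
                }
                start = i + 1;
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // The 16 bytes of Arne's dump, in file order (assumed reading).
        byte[] dump = {
            0x07, 0x00, 'f', 'o', 'o', '\n', 'b', 'a', 'r', 0x00,
            0x03, 0x00, 'b', 'a', 'z', 0x00
        };
        byte[] stream = recordsToStream(dump);
        System.out.print(new String(stream, StandardCharsets.US_ASCII));
        System.out.println(streamToRecords(stream).length + " bytes after writing back");
    }
}
```

Fed those 16 bytes, recordsToStream yields "foo\nbar\nbaz\n". Applying only the two rules as stated to that stream produces three 3-byte records (18 bytes) instead of the original two, because the first record already contains an LF.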

Reply Mike 2/28/2011 12:30:21 AM

On Sun, 27 Feb 2011 15:39:37 -0500, Arne Vajhøj wrote:

> Note that it looks like readline does not include the line delimiter but
> explicitly includes a new line.
> 
> Point being that on Windows it is still \n not \r\n.
>
But does write() put the \r back on Windows? I don't have a suitable
Windows box to check it on. If it does, it's merely implementing good
cross-platform portability, which I'd expect from any current language:
even C translates the external representation of \n to suit the platform
it's running on.
 

-- 
martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |
Reply Martin 2/28/2011 12:44:38 AM

On 27-02-2011 08:35, Ken Wesson wrote:
> On Sat, 26 Feb 2011 14:36:03 -0500, Arne Vajhøj wrote:
>> On 26-02-2011 05:13, Ken Wesson wrote:
>>> On Fri, 25 Feb 2011 12:30:23 -0500, Arne Vajhøj wrote:
>>>> And you can easily verify that it indeed supports characters not part
>>>> of ASCII.
>>>
>>> Never said it didn't.
>>
>> Yes - you did.
>
> No - I didn't.
>
>> You said they used ASCII.
>
> I didn't say they used *only* ASCII.

No.

But by convention, ASCII means only ASCII.

If you set the charset to ASCII in a Content-Type
and the content is actually UTF-8, then you will
get errors.
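The same mismatch shows up in Java when the decoder is strict. In this sketch (names are illustrative) a decoder obtained from Charset.newDecoder(), which reports malformed and unmappable input by default, decodes UTF-8 bytes as US-ASCII and throws a CharacterCodingException. By contrast, new String(bytes, charset) would silently substitute replacement characters instead of failing.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StrictDecode {
    // True if the bytes are valid in the given charset. A decoder from
    // newDecoder() uses CodingErrorAction.REPORT, so bad input throws.
    static boolean decodesCleanly(byte[] bytes, Charset cs) {
        try {
            cs.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "Vajhoej" with an o-slash, written as an escape so the source
        // file encoding does not matter.
        byte[] utf8 = "Vajh\u00f8j".getBytes(StandardCharsets.UTF_8);
        System.out.println("as UTF-8:    " + decodesCleanly(utf8, StandardCharsets.UTF_8));
        System.out.println("as US-ASCII: " + decodesCleanly(utf8, StandardCharsets.US_ASCII));
    }
}
```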

>> If they did that then they would not support characters not ASCII.
>
> Don't be silly. If I write software that uses TCP/IP does that mean it
> does not support any communication protocols other than TCP/IP? What if
> it directly sends serial port commands to /dev/lpt0 as well as using TCP/
> IP? Uh-oh! According to Arne such software is impossible! And yet I note
> that there are web browsers that can both surf and print. Some can even
> do them at the same time. :)

That is not particularly relevant.

No one has implied that because a computer uses UTF-8, it
cannot have a browser or cannot print or anything similar.

>>>>> All of those seem to be ASCII plus another up to 128 characters, or
>>>>> in the case of UTF-16, another up to 65408 characters.
>>>>>
>>>>> Saying that a 7-bit-clean file interpreted in one of those is not
>>>>> ASCII is like saying that humans are not mammals.
>>>>
>>>> And?
>>>>
>>>> No one is saying that such a file is not ASCII.
>>>
>>> You were.
>>
>> No.
>>
>> I have very specifically been talking about what the
>> systems use.
>
> Yes, you were! You even did it again, above.

Read again.

> I think I'm getting close to the point of plonking you, Arne.

Do you need technical assistance on how to add me to the
killfile in your newsreader?

>>>> PS: UTF-16 is *not* ASCII compatible.
>>>
>>> It is if you strip the high bytes and not just the 7th bits.
>>
>> Which means non compatible.
>
> Nonsense.

Try it.

If you read UTF-16 as ASCII you get lots of extra nuls.

(and if the program is a C program then that is a big problem)
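The NUL problem can be checked directly. This sketch (hypothetical names) encodes a short ASCII string as UTF-16LE and UTF-16BE, counts the zero bytes, and computes what a C strlen() would report, i.e. the number of bytes before the first NUL.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf16Nuls {
    // Number of zero bytes in the array.
    static int countNuls(byte[] bytes) {
        int n = 0;
        for (byte b : bytes) {
            if (b == 0) n++;
        }
        return n;
    }

    // Mimics C strlen(): bytes before the first NUL (or the whole array).
    static int cStrlen(byte[] bytes) {
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] == 0) return i;
        }
        return bytes.length;
    }

    public static void main(String[] args) {
        Charset[] charsets = {
            StandardCharsets.US_ASCII, StandardCharsets.UTF_16LE, StandardCharsets.UTF_16BE
        };
        for (Charset cs : charsets) {
            byte[] b = "foo".getBytes(cs);
            System.out.println(cs.name() + ": " + b.length + " bytes, "
                + countNuls(b) + " NULs, strlen = " + cStrlen(b));
        }
    }
}
```

For "foo" a C program would see a 1-byte string in the UTF-16LE case and an empty string in the UTF-16BE case.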

Arne

Reply UTF 2/28/2011 12:48:22 AM

On 27-02-2011 19:30, Mike Schilling wrote:
> "Arne Vajhøj" <arne@vajhoej.dk> wrote in message
> news:4d6ab1be$0$23759$14726298@news.sunsite.dk...
>> And I am happy to inform you that the above program
>> actually works with VMS variable length files.

>> PPS: I am actually somewhat surprised that it works. It is
>> not that easy to get something as stream oriented as this
>> to work in a record world. HP's Java and C engineers
>> must have been rather smart.
>
> It's not that hard, really. You open and manipulate the file with RMS,
> which takes care of the record format. On write, you translate every LF
> to "end current record", while on read you translate the end of every
> record to an LF in what's returned.

It is a bit more complex than that, because the line contained an LF.

It works naturally in languages like Fortran, Pascal, Cobol.

In languages like C or Java, which are stream oriented and may even
attach special meaning to LF, it becomes a bit messy.

For Java on VMS the recommendation is to use stream_lf files
instead of variable length files.

It is simpler.

But they have added some support for the other types to
be able to work with other stuff.

>                              I've done similar things in C or C++
> code that has to be portable among different OS's. (The tricky bit is
> (or used to be) writing binary bag-of-bytes files on VMS. For Unix
> compatibility, they need to know exactly how long they are, which may be
> an odd number of bytes, but RMS can't write files that aren't an even
> number of bytes, so you've got to use the XQP to update the file's
> length after RMS has closed it. At least, that was true back in the
> early 90s.)

Little has changed with RMS and SYS$QIO(W) since then.

Arne
Reply UTF 2/28/2011 12:54:48 AM

On 27-02-2011 19:44, Martin Gregorie wrote:
> On Sun, 27 Feb 2011 15:39:37 -0500, Arne Vajhøj wrote:
>
>> Note that it looks like readline does not include the line delimiter but
>> explicitly includes a new line.
>>
>> Point being that on Windows it is still \n not \r\n.
>>
> But does write() put the \r back in Windows? I don't have a suitable
> windows box to check it on.

It does.

>                         If it does this it's merely implementing good
> cross-platform portability, which I'd expect from any current language:
> even C translates the external representation of \n to suit the platform
> it's running on.

Yes.

And Python probably just inherited it from C.
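Java does not follow C here: its Writers have no text mode, so a '\n' inside a string is written as a single LF on every platform. Only println() and BufferedWriter.newLine() emit the platform separator, System.lineSeparator(). A small sketch (class and method names are illustrative):

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class NewlineDemo {
    // Writes "line" plus a terminator and returns exactly what was written.
    static String emit(boolean usePrintln) {
        StringWriter sw = new StringWriter();
        PrintWriter pw = new PrintWriter(sw);
        if (usePrintln) {
            pw.println("line");   // appends the platform line separator
        } else {
            pw.print("line\n");   // '\n' is written as-is, no translation
        }
        pw.flush();
        return sw.toString();
    }

    // Make CR and LF visible when printing.
    static String visible(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        System.out.println("println:  " + visible(emit(true)));
        System.out.println("print \\n: " + visible(emit(false)));
    }
}
```

On Windows the first line should show line\r\n and elsewhere line\n; the second shows line\n on every platform.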

Arne

Reply UTF 2/28/2011 12:58:02 AM

Ken Wesson <kwesson@gmail.com> wrote:
> 
> I remember when this newsgroup used to actually be about Java programming.

Oh, so you lurked first, then?

-- 
Leif Roar Moldskred
Reply Leif 2/28/2011 7:38:40 AM

Ken Wesson <kwesson@gmail.com> wrote:
> 
> Don't be ridiculous. There are exactly two app stores out there so far 
> for phones: the Android Market and the Apple App Store.

And Nokia's Ovi Store. And Sony-Ericsson's eStore. And an almost
fanatical devotion to the pope.

Maybe you should come in again?

-- 
Leif Roar Moldskred
Reply Leif 2/28/2011 7:45:10 AM

Leif Roar Moldskred wrote:
> Ken Wesson<kwesson@gmail.com>  wrote:
>>
>> I remember when this newsgroup used to actually be about Java programming.

Up until "Ken Wesson" started trolling.

> Oh, so you lurked first, then?

No, he just pulled the same stuff under different aliases, no doubt, or else 
he was busy being an idiot somewhere else.

Given all the fantasies and flat-out baloney that "Ken Wesson" claims to 
remember, I wouldn't believe this claim either, at least not just because he
claims it.  I've been using cljp long enough to know what the truth is, anyway.

Personally I think he just lies and makes up stuff in order to argue.  Of 
course, I'm only reading the snippets people quote, so it's remotely possible 
I need a larger sample.  I only have a 99.99% confidence level.

-- 
Lew
Honi soit qui mal y pense.
Reply Lew 2/28/2011 1:04:31 PM

On 2011-02-28, Lew <noone@lewscanon.com> wrote:
> Leif Roar Moldskred wrote:
>> Ken Wesson<kwesson@gmail.com>  wrote:
>>>
>>> I remember when this newsgroup used to actually be about Java programming.
>
> Up until "Ken Wesson" started trolling.

I still think people should have let me keep containing him in the
Great SWT Program thread. It may have been ugly, but at least it was
only ugly in one single thread which could then be killfiled.

Cheers,
	Bent D
-- 
Bent Dalager - bcd@pvv.org - http://www.pvv.org/~bcd
                                    powered by emacs
Reply Bent 2/28/2011 2:19:08 PM

Whoa, whoa, TIME OUT.

This is getting out of hand. How did a simple debate about text file 
formats turn into this mess?!

Ken, Tom -- Obviously, reasonable people may disagree on where the 
boundary between text files and everything else lies.

Everyone that's flaming -- There's no need for making things personal 
and disparaging one another just because you disagree on where to draw 
the line between text and binary files.

I note that most of the flaming is directed at Ken and contains some 
outlandish assertions.

* "Ken is not keeping his cool". Ken seems to be keeping his cool more 
than at least two or three of the other participants, though that last 
round of replies replacing quoted material with "bark bark bark!" is a 
little bit dubious in that regard.

* "Ken is not a programmer". I think 
http://groups.google.com/group/clojure/msg/912f8570aa350e99 suffices to 
disprove that claim, unless you're going to start throwing around 
conspiracy theories that his posts to that group are all ghost-written 
by somebody else.

* "Ken is a troll". I checked Ken's history in this group and until 
recently he was making the occasional, mostly unnoticed constructive 
contribution. He only seems to have started generating dozens of
argumentative posts when people started being unpleasant to him for 
whatever reason. The flashpoints being things like that "lmgtfy.com" 
link and similar instances where Ken said something that seems perfectly 
reasonable to me and someone's response was less than 100% technical, 
on-point, and devoid of personal insinuations regarding Ken.

* The more outlandish conspiracy theories raised in the past 24 hours 
aren't even worth consideration. The ONLY thing Ken has in common with a 
certain past cljp poster seems to be a refusal to let someone else have 
the last word if that word is interpreted as a personal attack or as 
otherwise wrong. And he clearly has that in common with Arne, among 
other people here, as well.

Also, his post headers look nothing like that other poster's; he
apparently sometimes uses vi on unix workstations and that other poster 
hates vi; he apparently does a lot of coding in Lisp and that other 
poster hates Lisp; and so on, and so forth.

So, it seems like further trouble can be avoided if:

1. Ken tries not to make that big a deal out of minor insinuations such
    as lmgtfy links.

2. Everyone else avoids making personal comments about Ken, or implying
    same. In fact, everyone should be avoiding making or insinuating
    personal comments about everyone else anyway, because as Ken has
    pointed out that is not the topic of this newsgroup.

3. With regard to the three threads currently ablaze, everyone just
    shut up. Let Ken have the last word -- that seems fairer than the
    other options, as it's likely Ken will just make a few parting
    remarks about his definition of text vs. binary files, replace a
    few more quoted non-Java-related passages with "bark bark bark!",
    and say a couple more times that this is a Java newsgroup and
    not a calling-people-names newsgroup and then drop it.
    Whereas just about anyone else getting the last words in will
    probably make them "Ken is a troll" or something similarly less
    desirable from the standpoint of topicality.

    Also, since the first (albeit weak) jabs were thrown at Ken, it
    seems only fair to let Ken get an equal number of responses.

This may sound an awful lot like it boils down to "let Ken win", and 
actually it basically is, but in my opinion with good reason: Ken has 
tried to stick to the technical matters and has mostly avoided 
namecalling and similar behaviors, though he might have been better 
served just trimming the namecalling of others silently than calling 
more attention to it and replacing it with "bark bark bark!", whereas 
almost everyone else (except Tom and maybe Arved) has been caught 
flaming. I'd say they clearly ceded Ken the moral high ground by doing so.

On the technical merits, call it a draw. By Ken's fairly sensible 
definition of text files, the record-oriented files on the mainframes 
aren't text files. By other definitions (say, "a file format whose 
primary purpose is to store text"), they are. Ken makes a valid point
that these files can be used to store information that will not be 
representable in common in-memory string formats, including 
java.lang.String; a valid counterpoint is that this would be a quite 
unusual use of the files, and in non-pathological cases the files can be 
processed by normal text-file-handling tools without loss. One could 
even make the argument that Ken's argument is analogous to "because JPEG 
compression would clobber information steganographically encoded in the 
low bits of the color channels of the pixels of a raster, JPEG and other 
lossy non-raster image formats aren't true picture files".
Reply A 2/28/2011 4:34:34 PM

On Feb 28, 4:34 pm, A Lurker <not.gonna.t...@you.com> wrote:
> Whoa, whoa, TIME OUT.
.....
> argument is analogous to "because JPEG
> compression would clobber information steganographically encoded in the
> low bits of the color channels of the pixels of a raster, JPEG and other
> lossy non-raster image formats aren't true picture files".

Well, they aren't. The One True image format is PNG. Obviously.
Reply Paul 2/28/2011 5:05:02 PM

"A Lurker" <not.gonna.tell@you.com> wrote in message 
news:ikgiqp$sk6$1@speranza.aioe.org...
>
> On the technical merits, call it a draw. By Ken's fairly sensible 
> definition of text files, the record-oriented files on the mainframes 
> aren't text files.

Sure they are.  In fact, they were the traditional way to represent text 
before the Unix "a file is nothing but a sequence of bytes, with any further 
structure imposed on it by the applications that use it" philosophy became 
so prevalent.  Explicit records are isomorphic with records delimited by 
characters (CR, LF, or CR/LF being the popular choices).
Reply Mike 2/28/2011 6:09:48 PM

On 02/28/2011 12:05 PM, Paul Cager wrote:
> On Feb 28, 4:34 pm, A Lurker<not.gonna.t...@you.com>  wrote:
>> Whoa, whoa, TIME OUT.
> ....
>> argument is analogous to "because JPEG
>> compression would clobber information steganographically encoded in the
>> low bits of the color channels of the pixels of a raster, JPEG and other
>> lossy non-raster image formats aren't true picture files".
>
> Well, they aren't. The One True image format is PNG. Obviously.

No, it's PBM.

-- 
Beware of bugs in the above code; I have only proved it correct, not 
tried it. -- Donald E. Knuth
Reply Joshua 2/28/2011 6:14:09 PM

On Feb 28, 6:14 pm, Joshua Cranmer <Pidgeo...@verizon.invalid> wrote:
> On 02/28/2011 12:05 PM, Paul Cager wrote:
>
> > On Feb 28, 4:34 pm, A Lurker<not.gonna.t...@you.com>  wrote:
> >> Whoa, whoa, TIME OUT.
> > ....
> >> argument is analogous to "because JPEG
> >> compression would clobber information steganographically encoded in the
> >> low bits of the color channels of the pixels of a raster, JPEG and other
> >> lossy non-raster image formats aren't true picture files".
>
> > Well, they aren't. The One True image format is PNG. Obviously.
>
> No, it's PBM.

I'm not so sure the situation is that black and white.
Reply Paul 2/28/2011 6:55:52 PM

Paul Cager wrote:
> Joshua Cranmer wrote:
>> Paul Cager wrote:
>>> Well, they aren't. The One True image format is PNG. Obviously.
>
>> No, it's PBM.
>
> I'm not so sure the situation is that black and white.
>

To make that the alpha choice your picks'll be clear.

--
Lew
Reply Lew 2/28/2011 7:19:51 PM

On 11-02-28 02:55 PM, Paul Cager wrote:
> On Feb 28, 6:14 pm, Joshua Cranmer <Pidgeo...@verizon.invalid> wrote:
>> On 02/28/2011 12:05 PM, Paul Cager wrote:
>>
>>> On Feb 28, 4:34 pm, A Lurker<not.gonna.t...@you.com>  wrote:
>>>> Whoa, whoa, TIME OUT.
>>> ....
>>>> argument is analogous to "because JPEG
>>>> compression would clobber information steganographically encoded in the
>>>> low bits of the color channels of the pixels of a raster, JPEG and other
>>>> lossy non-raster image formats aren't true picture files".
>>
>>> Well, they aren't. The One True image format is PNG. Obviously.
>>
>> No, it's PBM.
> 
> I'm not so sure the situation is that black and white.

So you're admitting that your opinion is coloured?

AHS

-- 
We must recognize the chief characteristic of the modern era - a
permanent state of what I call violent peace.
-- James D. Watkins
Reply Arved 2/28/2011 8:55:41 PM

On Feb 28, 7:19 pm, Lew <l...@lewscanon.com> wrote:
> Paul Cager wrote:
> > Joshua Cranmer wrote:
> >> Paul Cager wrote:
> >>> Well, they aren't. The One True image format is PNG. Obviously.
>
> >> No, it's PBM.
>
> > I'm not so sure the situation is that black and white.
>
> To make that the alpha choice your picks'll be clear.

This is a serious matter. I find your flippancy unpalettable.
Reply Paul 2/28/2011 11:34:39 PM

On 02/28/2011 06:34 PM, Paul Cager wrote:
> On Feb 28, 7:19 pm, Lew<l...@lewscanon.com>  wrote:
>> Paul Cager wrote:
>>> Joshua Cranmer wrote:
>>>> Paul Cager wrote:
>>>>> Well, they aren't. The One True image format is PNG. Obviously.
>>
>>>> No, it's PBM.
>>
>>> I'm not so sure the situation is that black and white.
>>
>> To make that the alpha choice your picks'll be clear.
>
> This is a serious matter. I find your flippancy unpalettable.

Wait a bit, flipping out makes you a XOR loser.

-- 
Lew
Honi soit qui mal y pense.
Reply Lew 3/1/2011 12:04:35 AM

On Feb 28, 8:55 pm, Arved Sandstrom <asandstrom3min...@eastlink.ca>
wrote:
> On 11-02-28 02:55 PM, Paul Cager wrote:
>
> > On Feb 28, 6:14 pm, Joshua Cranmer <Pidgeo...@verizon.invalid> wrote:
> >> On 02/28/2011 12:05 PM, Paul Cager wrote:
>
> >>> On Feb 28, 4:34 pm, A Lurker<not.gonna.t...@you.com>  wrote:
> >>>> Whoa, whoa, TIME OUT.
> >>> ....
> >>>> argument is analogous to "because JPEG
> >>>> compression would clobber information steganographically encoded in the
> >>>> low bits of the color channels of the pixels of a raster, JPEG and other
> >>>> lossy non-raster image formats aren't true picture files".
>
> >>> Well, they aren't. The One True image format is PNG. Obviously.
>
> >> No, it's PBM.
>
> > I'm not so sure the situation is that black and white.
>
> So you're admitting that your opinion is coloured?

True. It seems I must surrender the moral high ground.
Reply Paul 3/1/2011 10:36:00 AM

On Mon, 28 Feb 2011, Lew wrote:

> On 02/28/2011 06:34 PM, Paul Cager wrote:
>> On Feb 28, 7:19 pm, Lew<l...@lewscanon.com>  wrote:
>>> Paul Cager wrote:
>>>> Joshua Cranmer wrote:
>>>>> Paul Cager wrote:
>>>>>> Well, they aren't. The One True image format is PNG. Obviously.
>>>>> 
>>>>> No, it's PBM.
>>>> 
>>>> I'm not so sure the situation is that black and white.
>>> 
>>> To make that the alpha choice your picks'll be clear.
>> 
>> This is a serious matter. I find your flippancy unpalettable.
>
> Wait a bit, flipping out makes you a XOR loser.

Brethren, we can work this out if we just sit down peacefully together, 
put on some reggae, and smoke a few pipes of weed. I'm sure we'll soon 
come to some agreement about rasta image formats.

tom

-- 
I think the Vengaboys compliment his dark visions splendidly well. -- Mark
Watson, on 'Do you listen to particular music when reading lovecraft?'
Reply Tom 3/1/2011 9:49:14 PM

On Sun, 27 Feb 2011, Arne Vajhøj wrote:

> On 27-02-2011 15:20, Tom Anderson wrote:
>> On Sun, 27 Feb 2011, Arne Vajhøj wrote:
>> 
>>> On 27-02-2011 09:59, Ken Wesson wrote:
>>>> On Sat, 26 Feb 2011 16:44:43 -0500, Arne Vajhøj wrote:
>>>>> On 26-02-2011 06:15, Ken Wesson wrote:
>>>>>> On Fri, 25 Feb 2011 14:58:27 +0000, Martin Gregorie wrote:
>>>>>>> a text file contains records. They are variable length records with a
>>>>>>> 'newline' encoding as the delimiter.
>>>>>> 
>>>>>> By that definition the concept of "record-based" vs. "not-record-based"
>>>>>> becomes completely meaningless.
>>>>>>
>>>>>> But most of us use "records" to mean a structure that involves
>>>>>> out-of-band boundaries of some sort. Linear text with inline line break
>>>>>> etc. characters has only in-band boundaries and is much less structured
>>>>>> than what a "record" typically implies.
>>>>> 
>>>>> A line is by definition a structure because there is something that
>>>>> determines where it starts and where it ends.
>>>> 
>>>> But it's entirely in-band structure. Line breaks are a natural part of
>>>> texts.
>>>> 
>>>>> Neither a count prefix nor the line delimiter are part of the line
>>>>> itself.
>>>>
>>>> But you're looking at the wrong unit of granularity here. A line
>>>> delimiter is part of the *text* itself. But a count prefix is not.
>>>> Read a page of a novel. You will notice many line breaks, but no count
>>>> prefixes, if your selection was at all typical.
>>> 
>>> I suggest you look at Java BufferedReader readLine, Pascal readln etc. -
>>> they do not return the line break as part of the line.
>> 
>> Oddly, Python's file.readline() does. I believe it's so that readline()
>> is the inverse of write(), which does not add a line terminator. You
>> might think that it would be more sensible that readline() should strip
>> the terminator, and that there should be a writeline() that adds one,
>> but that's not how it is.
>
> Note that it looks like readline does not include the line delimiter
> but explicitly includes a new line.
>
> Point being that on Windows it is still \n not \r\n.

I think it's more that there's a low-level layer that converts 
platform-specific line endings to NL on the way in, and NL back to 
platform-specific line endings on the way out. On top of that, readline() 
includes the NL.
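For comparison, Java has no such translation layer, but BufferedReader.readLine() itself treats LF, CR, or CRLF as a terminator and returns the line without it, so Java code sees the same strings whichever convention the file uses. A short sketch (class name illustrative):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ReadLineDemo {
    // Collects the lines readLine() returns; terminators are never included.
    static List<String> lines(String text) {
        List<String> result = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = r.readLine()) != null) {
                result.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        // LF, CRLF and a bare CR, plus a final line with no terminator.
        System.out.println(lines("unix\nwindows\r\nold mac\rlast"));
    }
}
```

This prints [unix, windows, old mac, last].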

>> Now, who can point me at this atypical novel with count prefixes?
>
> Docs or system or ... ?

A bookshop or ISBN number would do. Or is it actually Count Prefixes, 
Dracula's Greek cousin?

tom

-- 
Sometimes it takes a madman like Iggy Pop before you can SEE the logic
really working.
Reply Tom 3/1/2011 10:05:01 PM

On 01-03-2011 17:05, Tom Anderson wrote:
> On Sun, 27 Feb 2011, Arne Vajhøj wrote:
>
>> On 27-02-2011 15:20, Tom Anderson wrote:
>>> On Sun, 27 Feb 2011, Arne Vajhøj wrote:
>>>
>>>> On 27-02-2011 09:59, Ken Wesson wrote:
>>>>> On Sat, 26 Feb 2011 16:44:43 -0500, Arne Vajhøj wrote:
>>>>>> On 26-02-2011 06:15, Ken Wesson wrote:
>>>>>>> On Fri, 25 Feb 2011 14:58:27 +0000, Martin Gregorie wrote:
>>>>>>>> a text file contains records. They are variable length records with a
>>>>>>>> 'newline' encoding as the delimiter.
>>>>>>>
>>>>>>> By that definition the concept of "record-based" vs. "not-record-based"
>>>>>>> becomes completely meaningless.
>>>>>>>
>>>>>>> But most of us use "records" to mean a structure that involves
>>>>>>> out-of-band boundaries of some sort. Linear text with inline line break
>>>>>>> etc. characters has only in-band boundaries and is much less structured
>>>>>>> than what a "record" typically implies.
>>>>>>
>>>>>> A line is by definition a structure because there is something that
>>>>>> determines where it starts and where it ends.
>>>>>
>>>>> But it's entirely in-band structure. Line breaks are a natural part of
>>>>> texts.
>>>>>
>>>>>> Neither a count prefix nor the line delimiter are part of the line
>>>>>> itself.
>>>>>
>>>>> But you're looking at the wrong unit of granularity here. A line
>>>>> delimiter is part of the *text* itself. But a count prefix is not.
>>>>> Read a page of a novel. You will notice many line breaks, but no count
>>>>> prefixes, if your selection was at all typical.
>>>>
>>>> I suggest you look at Java BufferedReader readLine, Pascal readln etc. -
>>>> they do not return the line break as part of the line.
>>>
>>> Oddly, Python's file.readline() does. I believe it's so that readline()
>>> is the inverse of write(), which does not add a line terminator. You
>>> might think that it would be more sensible that readline() should strip
>>> the terminator, and that there should be a writeline() that adds one,
>>> but that's not how it is.
>>
>> Note that it looks like readline does not include the line delimiter
>> but explicitly includes a new line.
>>
>> Point being that on Windows it is still \n not \r\n.
>
> I think it's more that there's a low-level layer that converts
> platform-specific line endings to NL on the way in, and NL back to
> platform-specific line endings on the way out. On top of that,
> readline() includes the NL.
>
>>> Now, who can point me at this atypical novel with count prefixes?
>>
>> Docs or system or ... ?
>
> A bookshop or ISBN number would do. Or is it actually Count Prefixes,
> Dracula's Greek cousin?

There is a description in the manual:

http://h71000.www7.hp.com/doc/731final/4506/4506pro.html#99_record_format_tab
http://h71000.www7.hp.com/doc/731final/4506/4506pro_005.html

Arne
Reply UTF 3/2/2011 12:20:51 AM

On Tue, 1 Mar 2011, Arne Vajhøj wrote:

> On 01-03-2011 17:05, Tom Anderson wrote:
>> On Sun, 27 Feb 2011, Arne Vajhøj wrote:
>>> On 27-02-2011 15:20, Tom Anderson wrote:
>>>>> On 27-02-2011 09:59, Ken Wesson wrote:
>>>>>> On Sat, 26 Feb 2011 16:44:43 -0500, Arne Vajhøj wrote:
>>>>>>
>>>>>>> Neither a count prefix nor the line delimiter are part of the line
>>>>>>> itself.
>>>>>> 
>>>>>> But you're looking at the wrong unit of granularity here. A line 
>>>>>> delimiter is part of the *text* itself. But a count prefix is not. 
>>>>>> Read a page of a novel. You will notice many line breaks, but no 
>>>>>> count prefixes, if your selection was at all typical.
>>>>
>>>> Now, who can point me at this atypical novel with count prefixes?
>>> 
>>> Docs or system or ... ?
>> 
>> A bookshop or ISBN number would do. Or is it actually Count Prefixes,
>> Dracula's Greek cousin?
>
> There is a description in the manual:
>
> http://h71000.www7.hp.com/doc/731final/4506/4506pro.html#99_record_format_tab
>  http://h71000.www7.hp.com/doc/731final/4506/4506pro_005.html

Ah. Ken Wesson had implied that there was a novel with count prefixes. 
That's what i was after.

That document is interesting, though, thanks.

I was interested to read that for relative files, "The preferred method of 
tracking relative record numbers is to assign them based on some numeric 
field within the record, for example, the account number". I would imagine 
that for typical data patterns, that would result in a very sparse file. 
Typical unix filesystems deal with sparse files efficiently, by not 
storing all-zero blocks, and i imagine VMS's filesystem does the same, but 
that only works when the areas of zeroes are at least as large as a block 
(or perhaps even a cluster - i don't know). If the record size in a 
relative file is smaller than a block, i imagine there will be a lot of 
empty cells actually stored on disk. Unless there's an indirection layer 
between the file and the disk that the manual doesn't mention.
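The concern can be put in numbers. This sketch assumes a simplified relative-file layout, not the real RMS one: fixed-size cells, record n stored at byte offset (n - 1) * cellSize, no file header or per-cell control bytes, and a filesystem that can leave out only whole 512-byte blocks that are entirely empty. The account numbers are made up and the figures are illustrative only. Under that layout, cellOffset is the value one would pass to RandomAccessFile.seek.

```java
import java.util.HashSet;
import java.util.Set;

public class RelativeCells {
    static final int BLOCK = 512; // VMS disk block size in bytes

    // Byte offset of record n (1-based) under the assumed fixed-cell layout.
    static long cellOffset(long recordNumber, int cellSize) {
        if (recordNumber < 1) {
            throw new IllegalArgumentException("record numbers start at 1");
        }
        return (recordNumber - 1) * cellSize;
    }

    // Bytes that must be allocated if only wholly empty blocks can be left out.
    static long storedBytes(long[] recordNumbers, int cellSize) {
        Set<Long> blocks = new HashSet<>();
        for (long n : recordNumbers) {
            long start = cellOffset(n, cellSize);
            long first = start / BLOCK;
            long last = (start + cellSize - 1) / BLOCK; // a cell may straddle two blocks
            for (long b = first; b <= last; b++) {
                blocks.add(b);
            }
        }
        return blocks.size() * (long) BLOCK;
    }

    public static void main(String[] args) {
        int cellSize = 64;
        long[] accounts = {1001, 52000, 98765}; // hypothetical account numbers
        long logical = cellOffset(98765, cellSize) + cellSize;
        System.out.println("logical file size: " + logical + " bytes");
        System.out.println("allocated blocks:  " + storedBytes(accounts, cellSize) + " bytes");
        System.out.println("record data:       " + accounts.length * cellSize + " bytes");
    }
}
```

With 64-byte cells, each of the three records lands in its own block, so 1536 bytes are allocated for 192 bytes of data in a file that is logically about 6 MB long.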

tom

-- 
Tubes are the foul subterranean entrails of the London beast, stuffed
with the day's foetid offerings. -- Tokugawa
Reply Tom 3/2/2011 10:19:32 PM

On 02-03-2011 17:19, Tom Anderson wrote:
> On Tue, 1 Mar 2011, Arne Vajhøj wrote:
>> On 01-03-2011 17:05, Tom Anderson wrote:
>>> On Sun, 27 Feb 2011, Arne Vajhøj wrote:
>>>> On 27-02-2011 15:20, Tom Anderson wrote:
>>>>>> On 27-02-2011 09:59, Ken Wesson wrote:
>>>>>>> On Sat, 26 Feb 2011 16:44:43 -0500, Arne Vajhøj wrote:
>>>>>>>
>>>>>>>> Neither a count prefix nor the line delimiter are part of the
>>>>>>>> line itself.
>>>>>>>
>>>>>>> But you're looking at the wrong unit of granularity here. A line
>>>>>>> delimiter is part of the *text* itself. But a count prefix is
>>>>>>> not. Read a page of a novel. You will notice many line breaks,
>>>>>>> but no count prefixes, if your selection was at all typical.
>>>>>
>>>>> Now, who can point me at this atypical novel with count prefixes?
>>>>
>>>> Docs or system or ... ?
>>>
>>> A bookshop or ISBN number would do. Or is it actually Count Prefixes,
>>> Dracula's Greek cousin?
>>
>> There is a description in the manual:
>>
>> http://h71000.www7.hp.com/doc/731final/4506/4506pro.html#99_record_format_tab
>>
>> http://h71000.www7.hp.com/doc/731final/4506/4506pro_005.html
>
> Ah. Ken Wesson had implied that there was a novel with count prefixes.
> That's what i was after.

I think he was talking about a book having lines terminated with
line breaks, not count-prefixed lines.

> That document is interesting, though, thanks.
>
> I was interested to read that for relative files, "The preferred method
> of tracking relative record numbers is to assign them based on some
> numeric field within the record, for example, the account number". I
> would imagine that for typical data patterns, that would result in a
> very sparse file. Typical unix filesystems deal with sparse files
> efficiently, by not storing all-zero blocks, and i imagine VMS's
> filesystem does the same, but that only works when the areas of zeroes
> are at least as large as a block (or perhaps even a cluster - i don't
> know). If the record size in a relative file is smaller than a block, i
> imagine there will be a lot of empty cells actually stored on disk.
> Unless there's an indirection layer between the file and the disk that
> the manual doesn't mention.

I have used VMS for 25 years and I have never found any
use for relative files.

It is always sequential files (flat files) or index-sequential
files (ISAM files).

Arne

Reply UTF 3/3/2011 12:42:20 AM

On Sun, 27 Feb 2011 16:33:20 -0500, Arne Vajhøj wrote:

> On 27-02-2011 08:28, Ken Wesson wrote:
>> On Sat, 26 Feb 2011 14:12:44 -0500, Arne Vajhøj wrote:
>>> On 26-02-2011 08:36, Peter Duniho wrote:
>>>> On 2/26/11 9:27 PM, Arved Sandstrom wrote:
>>>>> The usefulness of the term "text file" for me is that it describes a
>>>>> file that can be opened, viewed and used by every application, tool
>>>>> and utility, on every OS and platform, that purports to be a "text
>>>>> editor".
>>>>
>>>> Then I think you need to define "text file" more narrowly than what
>>>> is actually out there. In this thread alone, there have been
>>>> mentioned a number of true text file formats that are simply not
>>>> readable in your average or even above-average text editor found on
>>>> mainstream OSs.
>>>
>>> They are edited fine by any text editor on those systems.
>>>
>>> This includes cross platform editors that are also available on *nix
>>> and Windows.
>>
>> No. Those editors, at minimum, will strip some information from the
>> file, even if you just open it and then save out a copy. If the
>> original had my hypothetical hidden message "the attack begins at
>> midnight" encoded in a pattern of actual 0x0A characters and record
>> boundaries, the output will not, and will generally have converted all
>> 0x0A characters in the original into record boundaries specifically, in
>> that
>>
>> foo<0x0A>bar<boundary>baz
>>
>> would probably have ended up in the vi buffer (or emacs buffer, or
>> whatever) as
>>
>> foo<0x0A>bar<0x0A>baz
> 
> Which is where your logic goes wrong.

No, it doesn't.

> You assume that the editor will store the data in a single string
> separated by \n.

If a text editor doesn't store its data in strings, then it's a broken 
text editor.

> ArrayList<String> in Java syntax is much more efficient.

Perhaps, but regardless of the in-memory representation, the data is, 
conceptually, just a string. It has line breaks. Wherever the input file 
has line breaks it will have line breaks. Perhaps it will store things as 
a list of individual lines, but it will still conflate some line-end 
character such as 0x0A or 0x0D with those record boundaries you're so 
fond of.

>> If a text editor like vi cannot losslessly load and re-save the file
>> then it is not a text file by any sane definition, including
>> particularly Arved's pretty darn good definition.
> 
> Not only is your logic flawed as described above.

It is not, as described above.

> But is very easy to open such a file in one of the text editors that
> comes with the system and save it and verify that the file did not
> change.

I'm not talking about opening it in one of the "text editors" that comes 
with the system, though, I'm talking about opening it in a generic, 
portable text editor. I expect a system that doesn't have any true native 
text files also won't have any true native text editors, but may have 
things called "text editors" that aren't quite in the same way it has 
things called "text files" that aren't quite.
Reply kwesson (107) 3/12/2011 2:50:08 AM

On Sun, 27 Feb 2011 16:35:13 -0500, Arne Vajhøj wrote:

> On 27-02-2011 08:29, Ken Wesson wrote:
>> On Sat, 26 Feb 2011 14:19:38 -0500, Arne Vajhøj wrote:
>>> On 26-02-2011 04:45, Ken Wesson wrote:
>>>> I don't see an argument of any kind in your post. Forget to include
>>>> one?
>>>
>>> The quotation
>>
>> was all that there was, besides attributions and your signoff "Arne".
>> Your post contained no meaningful original text.
> 
> It seems pretty meaningful

A post with zero non-machine-generated original text?! You HAVE to be 
joking, Arne.
Reply kwesson (107) 3/12/2011 2:58:50 AM

On Sun, 27 Feb 2011 10:57:22 -0500, Arne Vajhøj wrote:

> On 27-02-2011 10:46, Ken Wesson wrote:
>> Will you people give it a FUCKING REST ALREADY?
>>
>> Sheesh!
>>
>> You've made your "point" (such as it is) already; there is no need for
>> endless carping repetitions of your *opinions*.
> 
> Have you considered taking your own advice??

You first.
Reply kwesson (107) 3/12/2011 2:59:30 AM

On Sun, 27 Feb 2011 16:18:43 -0800, Mike Schilling wrote:

> "Ken Wesson" <kwesson@gmail.com> wrote in message
> news:4d6a530f$1@news.x-privat.org...
>> On Sat, 26 Feb 2011 14:36:03 -0500, Arne Vajhøj wrote:
>>
>>> On 26-02-2011 05:13, Ken Wesson wrote:
>>>> On Fri, 25 Feb 2011 12:30:23 -0500, Arne Vajhøj wrote:
>>>>> PS: UTF-16 is *not* ASCII compatible.
>>>>
>>>> It is if you strip the high bytes and not just the 7th bits.
>>>
>>> Which means non compatible.
>>
>> Nonsense.
> 
> If you try to read an ASCII file as if it were UTF-8, you'll see the
> right characters: compatible.
> If you try to read an ASCII file as if it were UTF-16, you'll see
> garbage: incompatible.

If you try to read a UTF-8 file with no non-ASCII characters as ASCII, 
you'll see the right characters: compatible.
If you try to read a UTF-16 file with no non-ASCII characters as ASCII, 
you'll see the right characters, though many viewers will render the 
alternating NULs as spaces and it will look a bit ugly but still be 
readable: somewhat compatible.
Reply kwesson (107) 3/12/2011 3:01:27 AM
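The byte-level facts behind this exchange are easy to check from Java. The sketch below is minimal, and its class and method names are hypothetical. A pure-ASCII string encodes to the same bytes in US-ASCII, UTF-8 and ISO-8859-1. UTF-16 instead interleaves 0x00 bytes, which is what a byte-oriented ASCII reader would see. "Stripping the high bytes" recovers the ASCII, but only as an explicit conversion step. The sketch assumes little-endian UTF-16 without a byte-order mark; Java's plain `UTF_16` encoder would also prepend a BOM.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AsciiCompat {
    /** Drops the high byte of each UTF-16LE code unit; only meaningful for text up to U+00FF. */
    static byte[] stripHighBytes(byte[] utf16le) {
        byte[] low = new byte[utf16le.length / 2];
        for (int i = 0; i < low.length; i++) {
            low[i] = utf16le[2 * i];                  // little-endian: low byte comes first
        }
        return low;
    }

    public static void main(String[] args) {
        String text = "Hello\n";                       // pure 7-bit ASCII content
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf16 = text.getBytes(StandardCharsets.UTF_16LE);

        System.out.println(Arrays.equals(ascii, utf8));   // true: identical bytes
        System.out.println(Arrays.equals(ascii, latin1)); // true: identical bytes
        System.out.println(utf16.length);                 // 12: a 0x00 after every ASCII byte
        System.out.println(Arrays.equals(ascii, stripHighBytes(utf16))); // true, after conversion
    }
}
```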

On Sun, 27 Feb 2011 19:48:22 -0500, Arne Vajhøj wrote:

> On 27-02-2011 08:35, Ken Wesson wrote:
>> On Sat, 26 Feb 2011 14:36:03 -0500, Arne Vajhøj wrote:
>>> You said they used ASCII.
>>
>> I didn't say they used *only* ASCII.
> 
> No.
> 
> But

No. I didn't say they used *only* ASCII. Thanks for admitting that.

>>> If they did that then they would not support characters not ASCII.
>>
>> Don't be silly. If I write software that uses TCP/IP does that mean it
>> does not support any communication protocols other than TCP/IP? What if
>> it directly sends serial port commands to /dev/lpt0 as well as using
>> TCP/IP? Uh-oh! According to Arne such software is impossible! And yet
>> I note that there are web browsers that can both surf and print. Some
>> can even do them at the same time. :)
> 
> That is not particularly relevant.

Of course it's relevant. It's exactly analogous to your reasoning process.

>>>>>> Saying that a 7-bit-clean file interpreted in one of those is not
>>>>>> ASCII is like saying that humans are not mammals.
>>>>>
>>>>> And?
>>>>>
>>>>> No one is saying that such a file is not ASCII.
>>>>
>>>> You were.
>>>
>>> No.
>>
>> Yes, you were! You even did it again, above.
> 
> Bark bark.

No meaningful response from Arne here.

>> I think I'm getting close to the point of plonking you, Arne.
> 
> Do you need technical assistance on how to add me to the killfile in
> your newsreader?

No.

>>>>> PS: UTF-16 is *not* ASCII compatible.
>>>>
>>>> It is if you strip the high bytes and not just the 7th bits.
>>>
>>> Which means non compatible.
>>
>> Nonsense.
> 
> Try it.
> 
> If you read UTF-16 as ASCII you get lots of extra nuls.
> 
> (and if the program is a C program then that is a big problem)

Who said it was a C program? But again, you don't seem to be grasping 
what "strip the high bytes" means.
Reply kwesson (107) 3/12/2011 3:04:10 AM

On Sun, 27 Feb 2011 18:06:12 -0500, Arne Vajhøj wrote:

> On 27-02-2011 08:40, Ken Wesson wrote:
>> Starting to figure it out yet, Arne, where you went wrong?
> 
> You have not figured out that your iPhone apps market if 100% iPhone and
> 0% anything else may not be representive for all markets?

Please go away, and get back to me once you have mastered English well 
enough to make well-formed, non-ambiguous, grammatical sentences that I 
can actually parse and rebut. :)

>>>>> Then you won't have any users using ASCII.
>>>>
>>>> Funnily enough, all of them can cope just fine with ASCII text files.
>>>> I wonder how that can be, Arne, unless of course you're wrong yet
>>>> again.
>>>
>>> It is called "backwards compatibility".
>>
>> That's very fascinating, Arne, but it does not alter the fact that they
>> can cope just fine with ASCII text files. :)
> 
> No.

They can, indeed, cope just fine with ASCII text files.

Reply kwesson (107) 3/12/2011 3:05:46 AM

On Sun, 27 Feb 2011 15:09:47 -0500, Arne Vajhøj wrote:

> On 27-02-2011 14:57, Tom Anderson wrote:
>> On Sat, 26 Feb 2011, Arne Vajhøj wrote:
>>> On 26-02-2011 05:17, Ken Wesson wrote:
>>>> On Fri, 25 Feb 2011 09:36:29 -0500, Arne Vajhøj wrote:
>>>>> On 25-02-2011 00:04, Ken Wesson wrote:
>>>>>> On Thu, 24 Feb 2011 20:48:18 -0500, Arne Vajhøj wrote:
>>>>>>> On 24-02-2011 09:00, Ken Wesson wrote:
>>>>>>>
>>>>>>>> And that exhausts 99.99% of the operating system market share
>>>>>>>> right there, if not more,
>>>>>>>
>>>>>>> No.
>>>>>>>
>>>>>>> z/OS, i, OpenVMS, MPE have a lot more market share than 0.01%.
>>>>>>
>>>>>> Nonsense. There are *at least* ten thousand PCs running Windows for
>>>>>> every one machine running one of those operating systems.
>>>>>>
>>>>>> Ten thousand *PCs running Windows*.
>>>>>
>>>>> The PC/mainframe ratio is probably like 100000:1.
>>>>
>>>> Hence why I said *at least*. I was being conservative in my estimates
>>>> -- as generous to *your* case as possible. And still I was
>>>> demolishing it.
>>>
>>> Not at all.
>>>
>>> Because market share is counted in dollars.
>>
>> It bloody well is not!
>>
>> Or are you saying that non-commercial distributions of Linux have zero
>> market share *by definition*?
> 
> No.
> 
> The HW is not free.

Don't be ridiculous, Arne. What is with you and never, ever, ever, ever 
admitting that you might be wrong?

How about the famous browser wars. And the antitrust suit. Market shares 
are discussed for Firefox, IE, Opera, Netscape, Chrome, etc. and have 
been (for a varying mix of browsers) for over a decade now. Most/all of 
those browsers are also freeware and several of them are open source. 
Their market share seems pretty clearly to be being given as a percentage 
of browsing instances (sites receiving hits), not in dollars.
Reply kwesson (107) 3/12/2011 3:08:58 AM

On Sun, 27 Feb 2011 20:14:27 +0000, Tom Anderson wrote:

> It's virtually a default presumption for me that new posters with
> questions mean something other than or more than what they say. People
> come here with problems they can't solve themselves, and we get two
> kinds of them (caricaturing somewhat): dumb people with easy problems,
> and smart people with difficult problems; it is not reasonable to assume
> that the former have expressed themselves completely and correctly, and
> sadly, they outnumber the latter.

None of this is relevant in this instance, since I didn't post any 
problems, "dumb" or "smart". I joined a few months ago, mostly lurked, 
and posted the occasional *answer*. And then suddenly, starting about a 
month ago, this pointless dogpile.

>> So I would expand your advice to add that one eschew assuming anything
>> outside the problem statement.
> 
> I hope you don't mind if i decide not to travel with your Logical
> Positivism Bus Company here; i would suggest that while we must not
> assume anything outside the problem statement, we can suggest and
> hypothesise it.

Doing so might very well offend some people who may (rightly!) feel they 
are being condescended to.

>> Beyond that, because this is a discussion forum and not a help desk, it
>> is entirely appropriate to discuss the general applicability of
>> principles elicited from a specific problem.  Thus, even if the OP did
>> want to speak only of log files, it is important and highly relevant to
>> point out that "text files" (about which they actually did ask) have a
>> wider and fuzzier meaning than certain ignorant trolls would believe.
> 
> Well, if not to point it out, to discuss the idea, at least :).

There is exactly one ignorant troll apparent in this thread: Arne. 
Ignorant: he refuses to listen to quite reasonable arguments that his 
so-called "text" files can encode out-of-band data that cannot be 
represented natively in a String or anything similar; and he is fixated 
on his own dogmatic interpretation of the phrase "market share", which 
is clearly much narrower than the actual general usage. That's not even 
just ignorant, it's *wilfully* ignorant; ignorance can usually be forgiven, 
wilful ignorance, not so much.

As for "troll", have you noticed the sheer amount of traffic he's 
generating? Most of it just relentless browbeating and nearly all of it 
clearly intended to provoke responses via deliberate rudeness, 
outrageously nonsensical statements, and I've even caught him in one or 
two instances of quote editing, for God's sake.
Reply kwesson (107) 3/12/2011 3:14:48 AM

On Sun, 27 Feb 2011 11:01:35 -0500, Arne Vajhøj wrote:

> You can easily verify by checking phones from the big producers like
> Nokia.

What?

>> But even if there was some truth to that claim, where would
>> your typical user *get* these apps?
> 
> Via the phone's browser and HTTP from any web server.

That's web site applets, not client side applications. What are you doing 
presuming to be an expert (rather than a clueless newbie) in 
comp.lang.java.programmer if you can't even grasp the difference between 
Java applets and Java applications?
Reply kwesson (107) 3/12/2011 3:16:38 AM

On Mon, 28 Feb 2011 08:04:31 -0500, Lew wrote:

> Leif Roar Moldskred wrote:
>> Ken Wesson<kwesson@gmail.com>  wrote:
>>>
>>> I remember when this newsgroup used to actually be about Java
>>> programming.
> 
> Up until "Ken Wesson" started trolling.

That would have been around half past never, then.

>> Oh, so you lurked first, then?
> 
> No, he just pulled the same stuff under different aliases, no doubt, or
> else he was busy being an idiot somewhere else.

Wrong. But this is a fascinating glimpse into the mind of, apparently, a 
paranoid schizophrenic.

> Given all the fantasies and flat-out baloney that "Ken Wesson" claims to
> remember, I wouldn't believe this claim either.

I'm not the one with the delusional fantasies here, psycho boy.

> at least not just because he claims it.  I've been using cljp long
> enough to know what the truth is, anyway.

You couldn't find the truth with both hands and a map, Lew; that's been 
apparent for quite some time. You, like Arne, have a complete inability 
to admit you're wrong and a strong compulsion to have the last word in 
any debate, combined with a lack of compunctions about stooping to 
vicious namecalling and invective directed at anyone who won't back down. 
You're like alpha male gorillas (or two-year-old humans) beating their 
chests and throwing tantrums when things don't go their way (e.g., 
someone on the internet proves something you said wrong).

> Personally I think he just lies and makes up stuff in order to argue. 

I think that that is precisely what you and Arne do, yes. That, or it's 
delusional psychosis of some stripe or another.

> Of course, I'm only reading the snippets people quote, so it's remotely
> possible I need a larger sample.  I only have a 99.99% confidence level.

In what, your request for a reservation at the nuthouse being accepted? I 
think you can safely bump that estimate up to 100%, Lew. You're gonzo.
Reply kwesson (107) 3/12/2011 3:21:07 AM

On Mon, 28 Feb 2011 14:19:08 +0000, Bent C Dalager wrote:

> On 2011-02-28, Lew <noone@lewscanon.com> wrote:
>> Leif Roar Moldskred wrote:
>>> Ken Wesson<kwesson@gmail.com>  wrote:
>>>>
>>>> I remember when this newsgroup used to actually be about Java
>>>> programming.
>>
>> Up until "Ken Wesson" started trolling.
> 
> I still think people should have let me keep containing him in the Great
> SWT Program thread.

What are you blathering about? I don't recall any such thread, nor can I 
recall reading, let alone responding to, any post by you in the two or 
three months I've been posting here or the four or five before that when 
I was purely lurking.

Oh, don't tell me -- *three* delusional paranoiacs in this newsgroup?
Reply kwesson (107) 3/12/2011 3:23:02 AM

On Sun, 27 Feb 2011 11:25:58 -0500, Arne Vajhøj wrote:

> On 27-02-2011 10:16, Ken Wesson wrote:
>> Just how many fucking yappers ARE there around here anyway? Are there
>> actually any *human beings*, aside from myself, that are actually
>> capable of engaging in civil discourse and even sometimes disagreeing
>> with what somebody says *without* turning it into a personal fight full
>> of barks, growls, name-calling, and general histrionics of a most
>> unseemly nature?
> 
> I am pretty sure that you are the only one writing "bark".

The guy who yells "fire" isn't, himself, the fire, so I don't think you 
have a point here, Arne.
Reply kwesson (107) 3/12/2011 3:24:26 AM

On Mon, 28 Feb 2011 01:45:10 -0600, Leif Roar Moldskred wrote:

> Ken Wesson <kwesson@gmail.com> wrote:
>> 
>> Don't be ridiculous. There are exactly two app stores out there so far
>> for phones: the Android Market and the Apple App Store.
> 
> And an almost fanatical devotion to the pope.
> 
> Maybe you should come in again?

Non sequitur.

That makes *four* apparent delusional psychotics in this newsgroup. And 
counting.
Reply kwesson (107) 3/12/2011 3:25:46 AM

On Sun, 27 Feb 2011 18:20:07 -0500, Arne Vajhøj wrote:

> On 27-02-2011 16:37, Tom Anderson wrote:
>> Particularly since market share is apparently calculated by dollars,
> 
> OK - that is true.

No, it's not, as discussed elsewhere in this thread. Your definition is 
too narrow.
Reply kwesson (107) 3/12/2011 3:26:38 AM

On Sun, 27 Feb 2011 11:30:31 -0500, Arne Vajhøj wrote:

> On 27-02-2011 09:27, Ken Wesson wrote:
>> On Sat, 26 Feb 2011 14:26:18 -0500, Arne Vajhøj wrote:
>>> On 26-02-2011 05:56, Ken Wesson wrote:
>>>> On Fri, 25 Feb 2011 09:46:30 -0500, Arne Vajhøj wrote:
>>>>> On 25-02-2011 00:28, Ken Wesson wrote:
>>>>>> On Thu, 24 Feb 2011 21:00:20 -0500, Arne Vajhøj wrote:
>>>>>>> And it is a pretty good guess that the RandomAccessFile searching
>>>>>>> for CR and LF will fail on i also then.
>>>>>>
>>>>>> How fortunate that i runs on fewer than one in ten thousand
>>>>>> machines. Does Java even run on i?
>>>>>
>>>>> Yes.
>>>>
>>>> And what the hell is it used for on i?
>>>
>>> The same that Java is used for on other platforms.
>>
>> Evasion noted. But people don't use i to play games or fileshare or run
>> web services, which are the most usual places I see Java being used.
>> They play Java games on Java phones and Windows, fileshare with
>> Limewire on (mainly) Windows, and run Java web services on Unix.
>>
>> What do people do on i boxes?
> 
> As I wrote - same as for other platforms.
> 
> Phones, file shares and web services are a rather small part of Java
> (phones are growing, though).
> 
> The majority of Java work is business apps (some UI - often web based,
> some business logic, some persistence in database etc.).
> 
>>>>>>> Linux will be either ISO-8859-1 or UTF-8 not ASCII.
>>>>>>
>>>>>> Both contain ASCII as a subset -- if you take a pure-ASCII file and
>>>>>> reencode it in either the result is the identical byte sequence.
>>>>>
>>>>> Yes
>>>>
>>>> There you go.
>>>
>>> Exactly.
>>
>> Then I do hope you will drop this particular part of the argument now.
> 
> Sure.

Please stop quote-editing, Arne. It is quite rude.
Reply kwesson (107) 3/12/2011 3:27:36 AM
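For reference, here is one way to write the `RandomAccessFile` technique discussed earlier in the thread: seek to the end, scan backwards for a line feed, then read forward. It is a minimal sketch. It assumes a stream-format file whose lines end in LF or CRLF, in an ASCII-compatible encoding such as UTF-8, where byte 0x0A never occurs inside a multi-byte character. As Arne notes above for i, it will not find line boundaries in record-oriented files that do not store them as LF bytes. The class and method names are hypothetical. For clarity rather than speed, it reads one byte per seek.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class LastLine {
    /** Returns the last line without its terminator, or "" for an empty file. */
    static String lastLine(String fileName) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(fileName, "r")) {
            long end = file.length();
            // Skip a trailing LF or CRLF so "a\nb\n" yields "b", not "".
            if (end > 0 && readByteAt(file, end - 1) == '\n') end--;
            if (end > 0 && readByteAt(file, end - 1) == '\r') end--;
            long start = end;
            // Walk backwards until the byte just before 'start' is an LF, or we hit offset 0.
            while (start > 0 && readByteAt(file, start - 1) != '\n') {
                start--;
            }
            byte[] bytes = new byte[(int) (end - start)];
            file.seek(start);
            file.readFully(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    private static int readByteAt(RandomAccessFile file, long pos) throws IOException {
        file.seek(pos);
        return file.read();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(lastLine(args[0]));
    }
}
```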

On Sun, 27 Feb 2011 15:19:15 -0500, Arne Vajhøj wrote:

> Actually I think it is a bit weird to test if a file consists of text
> lines without the program being line aware.

It's not testing if a file consists of anything. It's just copying it.

> And the code is rather bad:
> - you are not calling close on rdr and wtr but those are easy to fix.

Your *opinion* has been noted. The operating system will close the file 
handles promptly anyway since the program terminates at that point; and 
it was a quick toss-off for a newspost, not intended for actual 
production use.
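Ken's copier program is not reproduced in this part of the thread, so what follows is only a hypothetical reconstruction of a line-unaware text copier. The class name `TextFileCopierSketch`, the `copy` method and the UTF-8 charset are assumptions. The sketch shows how try-with-resources (Java 7 or later) addresses Arne's point about closing `rdr` and `wtr`. Both streams are closed, writer first, even if an exception is thrown.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TextFileCopierSketch {
    /** Copies characters verbatim; line terminators pass through untouched. */
    static void copy(Path source, Path target) throws IOException {
        try (Reader rdr = Files.newBufferedReader(source, StandardCharsets.UTF_8);
             Writer wtr = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            char[] buffer = new char[8192];
            int n;
            while ((n = rdr.read(buffer)) != -1) {
                wtr.write(buffer, 0, n);
            }
        } // wtr is flushed and closed, then rdr is closed, even on exceptions
    }

    public static void main(String[] args) throws IOException {
        copy(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```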

> And I am happy to inform you that the above program actually works with
> VMS variable length files.

For values of "works" that allows for copying "normal" text files but not 
ones with hidden data as I described previously.

The fact is, it cannot stuff 257 objects (8-bit characters plus record 
boundaries) or 65537 (16-bit characters plus boundaries) into 256 or 
65536 "char" values, respectively -- that's the pigeonhole principle at 
work, an ironclad law of mathematics, unless someone somewhere is 
cheating by converting into an escape-sequence-using format or binhexing 
or something similar.

>   007A6162 00030072 61620A6F 6F660007 ..foo.bar...baz. 000000

Does not compute.

007A6162 00030072 61620A6F 6F660007 -- four sets of four octets = 16 bytes
...foo.bar...baz.                    -- sixteen text characters = 16 bytes

but

007A6162 00030072 61620A6F 6F660007
.. . f o  o . b a  r . . .  b a z.

00 is unprintable, then lowercase o, then lowercase b. Lowercase b is 
later 6F. Etc. Obviously this is not any straight mapping from octets to 
character values, whether ASCII or EBCDIC or any similar system.

If your claims were correct it would be two probably-unprintable bytes to 
encode a length, followed by foo\nbar, followed by another encoded 
length, then baz. That ought to be 14 bytes. There's an extra byte 
between bar and baz and an extra byte at the end (the latter could be an 
explicit EOF marker, though) and there's the more serious matter that it 
is not a simple letter-substitution code like ASCII or EBCDIC. The extra 
byte between bar and baz and the non-one-to-one character->byte nature of 
the encoding seem pretty clearly to prove that some sort of escaping or 
other conversion is taking place -- i.e., that you cheated.

I proved you can't map those VMS thingys one-to-one onto any character 
set, and rather than simply admit it you tried to sustain your untenable 
claim by attempting what is known in the vernacular as "a fast one", it 
would seem. But either you faked the output (and unconvincingly!) or you 
didn't, and it betrays the very fact you were arguing against: that it 
can't be mapped one-to-one onto normal strings.

I also find it interesting that the first two bytes are 00 7A. That's 
either 122 or 31232, depending on endianness. The actual length of the 
first record, if the first record is "foo" 0x0A character "bar", is 7. So 
either the first two bytes are not a correct character count of the 
field, or the system does not use 1s or 2s-complement integer storage at 
all(!), or you're just plain lying.
Reply kwesson (107) 3/12/2011 3:42:58 AM
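The quoted dump is easier to check if you assume the usual OpenVMS DUMP layout. In that layout, each hex longword is printed most-significant byte first, and the longwords run right to left (the offset `000000` is on the far right). The ASCII column reads left to right. Under that assumption the file bytes are `07 00 66 6F 6F 0A 62 61 72 00 03 00 62 61 7A 00`. Read as records, that is:

- a little-endian 2-byte count of 7, followed by `foo`, LF, `bar` and one pad byte to an even boundary;
- a count of 3, followed by `baz` and a pad byte.

That accounts for all 16 bytes and matches the `..foo.bar...baz.` column. The sketch below undoes the display order and decodes such count-prefixed records. The class and method names are hypothetical. It illustrates the record layout only; it is not RMS, and it ignores anything beyond this single example, such as block boundaries.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class VarRecordDump {
    /** Converts one DUMP hex line (longwords shown right to left) back into file byte order. */
    static byte[] dumpLineToBytes(String hexColumns) {
        String[] longwords = hexColumns.trim().split("\\s+");
        byte[] out = new byte[longwords.length * 4];
        int pos = 0;
        for (int w = longwords.length - 1; w >= 0; w--) {   // rightmost longword is the lowest address
            long value = Long.parseLong(longwords[w], 16);
            for (int b = 0; b < 4; b++) {                     // little-endian: low byte first
                out[pos++] = (byte) (value >>> (8 * b));
            }
        }
        return out;
    }

    /** Decodes records laid out as: 2-byte little-endian count, data, pad byte if count is odd. */
    static List<String> decodeVariableRecords(byte[] data) {
        List<String> records = new ArrayList<>();
        int pos = 0;
        while (pos + 2 <= data.length) {
            int count = (data[pos] & 0xFF) | ((data[pos + 1] & 0xFF) << 8);
            pos += 2;
            records.add(new String(data, pos, count, StandardCharsets.US_ASCII));
            pos += count + (count % 2);                       // skip the pad byte after odd lengths
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] bytes = dumpLineToBytes("007A6162 00030072 61620A6F 6F660007");
        for (String record : decodeVariableRecords(bytes)) {
            System.out.println(record.replace("\n", "<LF>")); // prints foo<LF>bar, then baz
        }
    }
}
```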

On Sun, 27 Feb 2011 18:05:52 -0400, Arved Sandstrom wrote:

> a stream of lines (which after all _is_ a text file)

No, a text file is a string. A stream of "lines" that can contain *any* 
character from a character set Q is not a string in the character set Q, 
because it has Q+1 distinguishable "characters" rather than Q.

Reply kwesson (107) 3/12/2011 3:45:02 AM

On Sun, 27 Feb 2011 18:25:27 -0500, Arne Vajhøj wrote:

> The program Ken provided is very stream oriented and not that easy to
> get to fit with something record oriented.

That's a very roundabout way of admitting that you were wrong, Arne.

My stab at a translation: "Because Ken's program embarrassingly proved me 
wrong, I had to cheat to 'get it to fit' with my preconceptions by 
massaging/escaping/otherwise transforming the input data, which 
unfortunately shows up in the hex dump quite clearly as characters not 
being in simple one-to-one correspondence with byte values in the data -- 
e.g. 6F corresponds to two different characters in two successive 
positions in the file and lowercase 'b' has two different byte 
representations at two different positions in the file."
Reply kwesson (107) 3/12/2011 3:48:38 AM

On Sun, 27 Feb 2011 16:30:21 -0800, Mike Schilling wrote:

> It's not that hard, really.  You open and manipulate the file with RMS,
> which takes care of the record format.  On write, you translate every LF
> to "end current record", while on read you translate the end of every
> record to an LF in what's returned.

And in so doing you will lose any information encoded by using both LF 
character values and record boundaries in the file.
Reply kwesson (107) 3/12/2011 3:49:47 AM
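You can see both Mike's translation rule and Ken's objection in a few lines. The sketch models a record file as a `List<String>`, an assumption standing in for RMS; the class and method names are hypothetical. On read, it joins the records with LF, as a stream reader would see them. On write, it splits the stream at each LF. The round trip is exact unless a record itself contains an LF, which is the case Ken's hidden-data example relies on.

```java
import java.util.Arrays;
import java.util.List;

public class RecordTranslation {
    /** Read side: every record boundary becomes an LF. */
    static String toStream(List<String> records) {
        return String.join("\n", records);
    }

    /** Write side: every LF ends the current record. */
    static List<String> toRecords(String stream) {
        return Arrays.asList(stream.split("\n", -1));   // -1 keeps trailing empty records
    }

    public static void main(String[] args) {
        List<String> plain = Arrays.asList("foo", "bar", "baz");
        List<String> embedded = Arrays.asList("foo\nbar", "baz");   // LF inside a record

        System.out.println(toRecords(toStream(plain)).equals(plain));       // true
        System.out.println(toRecords(toStream(embedded)).equals(embedded)); // false: 2 records came back as 3
    }
}
```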

On Sun, 27 Feb 2011 19:54:48 -0500, Arne Vajhøj wrote:

> On 27-02-2011 19:30, Mike Schilling wrote:
>> It's not that hard, really. You open and manipulate the file with RMS,
>> which takes care of the record format. On write, you translate every LF
>> to "end current record", while on read you translate the end of every
>> record to an LF in what's returned.
> 
> It is a bit more complex than that. Because the line contained a LF.

Wow, that's two sleazy not-quite-admissions that you were wrong in a row, 
Arne.

> In languages like C or Java that are stream oriented and even may have
> special meaning for LF, then it becomes a bit messy.

In simpler language, you need to cheat to make it seem to treat your 
"text files" as it does real text files.

> For Java on VMS the recommendation is to use stream_lf files instead of
> variable length files.

In other words, to use real text files? :)

>> I've done similar things in C or C++
>> code that has to be portable among different OS's. (The tricky bit is
>> (or used to be) writing binary bag-of-bytes files on VMS. For Unix
>> compatibility, they need to know exactly how long they are, which may
>> be an odd number of bytes, but RMS can't write files that aren't an
>> even number of bytes, so you've got to use the XQP to update the file's
>> length after RMS has closed it. At least, that was true back in the
>> early 90s.)
> 
> Little has changed with RMS and SYS$QIO(W) since then.

How ugly. How much did you claim it would cost to replace all of these 
dinosaurs with something reasonably sane and modern?
Reply kwesson (107) 3/12/2011 3:51:50 AM

On Sun, 27 Feb 2011 15:25:06 -0500, Arne Vajhøj wrote:

> On 27-02-2011 09:55, Ken Wesson wrote:
>> Of course, neither are true text files -- both fail the TextFileCopier
>> test, in particular, and yours doesn't even pass the most elementary
>> sniff test -- calling that a text file would be like expecting
> 
> If the files can be read and written as text files by the shell,
> Fortran, Cobol, C, C++, Java etc. then they seem to be text to those
> who created those languages.

My entire point was that they cannot.

> And your copier program actually works on OpenVMS

Clarification: my copier program works if you do something to encode its 
input, such that it is no longer a straight ASCII (or EBCDIC or any other 
simple number-substitution cipher on the character set) sequence that is 
fed to the Java program, but one with escape sequences or other trickery.

>>>> Actually C is already broken here even on "normal" systems, because C
>>>> strings can't properly represent text containing NUL characters.
>>>
>>> By definition they can't be included in 'text files'
>>
>> They belong to the Unicode (and indeed even the base ASCII) character
>> set, so by definition they *can* be included in text files.
> 
> Your weird definition.

No, my normal definition.

> The rest of us expect text files to contain printable characters, which
> NUL is not.

A file consisting of 0x00 0x41 0x42 contains printable characters, namely 
AB (if interpreted as ASCII).

So does a file consisting of 0x0A 0x41 0x42 (newline A B). In fact both 
have the same number of printable characters (= characters with visible 
glyphs), and the same exact ones, preceded by one control character.

You apparently would consider the former not a text file and the latter 
to be a text file, and you claim the distinction is based on the presence 
of printable characters. The latter claim does not wash. Nor does the 
claim you likely intended instead, which would base the distinction on 
the *absence* of *un*printable characters.

Indeed, the distinction you are making seems to be purely arbitrary: 
"linefeeds are allowed but NULs are not".

Or maybe it's a bit simpler than that: "such characters are excluded 
that, if they were allowed, would prove Ken right and me wrong". ;)
Reply kwesson (107) 3/12/2011 3:57:32 AM
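Whatever one's definition of a text file, the narrower technical point about NUL is easy to demonstrate. Java strings carry an explicit length, so, unlike C's NUL-terminated strings, they hold U+0000 like any other character. The minimal sketch below decodes the two three-byte files from Ken's example as US-ASCII. The class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class NulInText {
    public static void main(String[] args) {
        String withNul = new String(new byte[] {0x00, 0x41, 0x42}, StandardCharsets.US_ASCII);
        String withLf  = new String(new byte[] {0x0A, 0x41, 0x42}, StandardCharsets.US_ASCII);

        System.out.println(withNul.length());              // 3: the NUL is kept, not a terminator
        System.out.println(withLf.length());               // 3
        System.out.println(withNul.substring(1).equals(withLf.substring(1))); // true: both end in "AB"
        System.out.println((int) withNul.charAt(0));       // 0
    }
}
```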

On Sun, 27 Feb 2011 11:33:30 -0500, Arne Vajhøj wrote:

>>> Neither a count prefix nor the line delimiter are part of the line
>>> itself.
>>
>> But you're looking at the wrong unit of granularity here. A line
>> delimiter is part of the *text* itself. But a count prefix is not. Read
>> a page of a novel. You will notice many line breaks, but no count
>> prefixes, if your selection was at all typical.
> 
> I suggest you look at Java BufferedReader readLine, Pascal readln etc. -
> they do not return the line break as part of the line.

Not relevant. It is quite common to split text at line boundaries for 
processing, and to make the line boundaries temporarily implicit. That 
doesn't make them not part of the text. Unless you're claiming that these 
are the same:

MADAM,
  Your Ladiships most faithful Servant,
      and passionate Friend,
          Orinda.

MADAM, Your Ladiships most faithful Servant, and passionate Friend, 
Orinda.

What are you going to claim next, that 
spacesarenotreallyapartofthetexteither?Mybvnprhpsvwls2?!Fckythn.
Reply kwesson (107) 3/12/2011 4:02:29 AM
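Arne's factual point about `BufferedReader.readLine` is easy to confirm. It returns the line content without any terminator, and it treats LF, CR and CRLF alike. As a result, you cannot recover the original terminators, or whether the text ended with one, from its return values. The minimal sketch below uses a hypothetical class name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class ReadLineDemo {
    static List<String> lines(String text) throws IOException {
        List<String> result = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                result.add(line);                          // terminator already removed
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(lines("foo\nbar\r\nbaz\r"));   // [foo, bar, baz]
        System.out.println(lines("foo\r\nbar\nbaz"));     // [foo, bar, baz]
    }
}
```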

On Mon, 28 Feb 2011 11:34:34 -0500, A Lurker wrote:

> Whoa, whoa, TIME OUT.
> 
> This is getting out of hand. How did a simple debate about text file
> formats turn into this mess?!

I believe "dogpile" is the technical term.

> Ken, Tom -- Obviously, reasonable people may disagree on where the
> boundary between text files and everything else lies.

Just so long as they agree that files that can't be represented as a 
simple string are not text files.

> Everyone that's flaming -- There's no need for making things personal
> and disparaging one another just because you disagree on where to draw
> the line between text and binary files.

That much I agree with.

> * "Ken is not keeping his cool". Ken seems to be keeping his cool more
> than at least two or three of the other participants, though that last
> round of replies replacing quoted material with "bark bark bark!" is a
> little bit dubious in that regard.

I thought it might be instructive to make the distinction between actual 
technical arguments and pointless invective very visible in my followups, 
as well as to make it clear that I was not treating the latter as 
containing any meaningful information content whatsoever. Unfortunately, 
that point seems to have been lost on my various interlocutors.

> * "Ken is not a programmer". I think
> http://groups.google.com/group/clojure/msg/912f8570aa350e99 suffices to
> disprove that claim, unless you're going to start throwing around
> conspiracy theories that his posts to that group are all ghost-written
> by somebody else.

Thanks for that. The archives for that group should furnish plenty of 
additional counterexamples to Arne's nasty and false claims in that vein.

> * "Ken is a troll". I checked Ken's history in this group and until
> recently he was making the occasional, mostly unnoticed constructive
> contribution. He only seems to have started generating dozens or
> argumentative posts when people started being unpleasant to him for
> whatever reason.

You're leaving out the fact that those people started making dozens of 
argumentative posts themselves. My argumentative replies are no more 
numerous than their argumentative posts that I'm replying to.

Arne alone has been making dozens of argumentative posts.

> The flashpoints being things like that "lmgtfy.com" link and similar
> instances where Ken said something that seems perfectly reasonable to
> me and someone's response was less than 100% technical, on-point, and
> devoid of personal insinuations regarding Ken.

True enough. Some people here seem to be unable to disagree with someone 
without including various subtle (and, often, not-so-subtle) put-downs in 
their posts. This is a sad thing and it makes them less effective 
communicators. In the instances where they're actually right, the person 
on the receiving end of their post might learn something, except that 
lacing that post with put-downs or even outright invective will make the 
recipient rather disinclined to accept anything it has to say. Instead of 
being a simple, bipartisan quest to find the truth, as soon as someone 
puts someone else down a debate becomes a battle of egos where whoever 
backs down first loses face. And then both sides will entrench and yield 
not an inch, and you get lots of unproductive barking at one another 
instead of meaningful debate.

I have tried though (with limited success) to avoid lowering myself to 
the level of barking invective, trying instead to steer things back to 
whatever technical points might remain.

> * The more outlandish conspiracy theories raised in the past 24 hours
> aren't even worth consideration.

Conspiracy theories rarely, if ever, are.

> So, it seems like further trouble can be avoided if:
> 
> 1. Ken tries not to make that big a deal out of minor insinuations such
>     as lmgtfy links.

I think it's a little late for that, now. I intend to ignore this entire 
newsgroup just as soon as I've disentangled myself from the present 
debate. Clearly I cannot expect fair treatment from several very vocal 
regular posters here, for whatever reason, and something tells me that at 
least some of them are the type to bear grudges for very long timespans.

> 2. Everyone else avoids making personal comments about Ken, or implying
>     same. In fact, everyone should be avoiding making or insinuating
>     personal comments about everyone else anyway, because as Ken has
>     pointed out that is not the topic of this newsgroup.

Agreed, for the most part. I reserve the right to make the occasional 
such wisecrack in reply to personal attacks or sufficiently egregious and 
off-topic nonsense (e.g., the afore-mentioned conspiracy theories) from 
others.

> 3. With regard to the three threads currently ablaze, everyone just
>     shut up. Let Ken have the last word -- that seems fairer than the
>     other options, as it's likely Ken will just make a few parting
>     remarks about his definition of text vs. binary files, replace a few
>     more quoted non-Java-related passages with "bark bark bark!", and
>     say a couple of more times that this is a Java newsgroup and not a
>     calling-people-names newsgroup and then drop it. Whereas just about
>     anyone else getting the last words in will probably make them "Ken
>     is a troll" or something similarly less desirable from the
>     standpoint of topicality.

I have no objection to this last suggestion. :)

>     Also, since the first (albeit weak) jabs were thrown at Ken, it
>     seems only fair to let Ken get an equal number of responses.

That, too, I find myself agreeing with.

> This may sound an awful lot like it boils down to "let Ken win", and
> actually it basically is, but in my opinion with good reason: Ken has
> tried to stick to the technical matters and has mostly avoided
> namecalling and similar behaviors, though he might have been better
> served just trimming the namecalling of others silently than calling
> more attention to it and replacing it with "bark bark bark!"

See above -- my intent was to draw attention to the fact that the posts 
in question contained unproductive, information-free invective. I had 
hoped (in vain, it is now clear) that those responsible would, on 
realizing how their words were being received, mend their ways and make 
their own efforts to stick to the technical subject matter while leaving 
their personal opinions of other people out of it.

It's similar to the technique used on problem children of taping their 
temper tantrums and then playing them back later to embarrass them into 
being better behaved in the future. I have used that, a time or two, on 
my own little ones. It's usually somewhat effective. But it wasn't very 
effective when used on Arne. :)

> whereas almost everyone else (except Tom and maybe Arved) has been
> caught flaming.

At this point I think it's safe to say that *everyone* in this thread has 
been caught flaming, including me. Though not all to the same extent, or 
anywhere close.

> I'd say they clearly ceded Ken the moral high ground by doing
> so.

I'd agree that they did so by flaming *first*, by intentionally 
escalating, and by flaming more, and more viciously, than I ever did.

> On the technical merits, call it a draw.

No freaking way. Arne is dead wrong.

> Ken makes a valid point that these files can be used to store
> information that will not be representable in common in-memory string
> formats, including java.lang.String;

Good, good.

> a valid counterpoint is

nonexistent, so far as I am aware.

> that this would be a quite unusual use of the files, and in non-
> pathological cases

Arne was concerned that my suggestion for the OP's log-peeking code would 
fail on obscure pathological corner cases like obsolete mainframe 
operating systems that nobody's ever likely to run his program on. It's 
only fair that I be concerned that his own claims about what constitutes 
a "text file" fail in obscure pathological corner cases.

If he's right to be concerned that a wacky OS nobody much uses will break 
the OP's code, then he's wrong to consider those weird file formats to be 
"text files" because certain obscure use cases for those files will be 
broken by normal text processing tools. He can't have it both ways, and 
that he was trying to marks him as a hypocrite.

> One could even make the argument that Ken's argument is analogous to
> "because JPEG compression would clobber information steganographically
> encoded in the low bits of the color channels of the pixels of a
> raster, JPEG and other lossy non-raster image formats aren't true
> picture files".

Lossy compression is not expected to be exact. Reading a file into a text 
editor and saving a copy back out again *is*.
Reply kwesson (107) 3/12/2011 4:22:07 AM
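For readers arriving from the original question: the "log-peeking" approach defended above is the one suggested at the top of the thread, which is to seek to the end with RandomAccessFile and scan backwards for a linefeed. The following is a minimal Java sketch of that idea. It is not the code anyone posted, and the class and helper names are hypothetical. It assumes a plain byte-stream file with LF or CRLF terminators in an ASCII-compatible encoding such as UTF-8, where byte 0x0A never occurs inside a multi-byte character. A record-oriented filesystem would break that byte-stream assumption, and so would a UTF-16 file.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class LastLine {
    // Returns the last line of the file without its terminator ("" if the file is empty).
    // Assumes LF or CRLF terminators and an ASCII-compatible encoding such as UTF-8.
    static String lastLine(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long end = raf.length();
            // Ignore one trailing terminator (LF or CRLF), so "a\nb\n" yields "b".
            if (end > 0 && byteAt(raf, end - 1) == '\n') {
                end--;
                if (end > 0 && byteAt(raf, end - 1) == '\r') {
                    end--;
                }
            }
            // Scan backwards for the preceding LF; the last line starts just after it.
            long start = end;
            while (start > 0 && byteAt(raf, start - 1) != '\n') {
                start--;
            }
            byte[] buf = new byte[(int) (end - start)];
            raf.seek(start);
            raf.readFully(buf);
            return new String(buf, StandardCharsets.UTF_8);
        }
    }

    // Reads the single byte (0-255) at the given file offset.
    private static int byteAt(RandomAccessFile raf, long pos) throws IOException {
        raf.seek(pos);
        return raf.read();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("log", ".txt");
        Files.write(tmp, "first\r\nsecond\r\nthird\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(lastLine(tmp.toString())); // third
        Files.delete(tmp);
    }
}
```

For clarity, this sketch reads one byte per seek. A practical version would read backwards in blocks.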

On Mon, 28 Feb 2011 10:09:48 -0800, Mike Schilling wrote:

> "A Lurker" <not.gonna.tell@you.com> wrote in message
> news:ikgiqp$sk6$1@speranza.aioe.org...
>>
>> On the technical merits, call it a draw. By Ken's fairly sensible
>> definition of text files, the record-oriented files on the mainframes
>> aren't text files.
> 
> Sure they are.

Bullshit.

> In fact, they were the traditional way to represent text before the
> Unix "a file is nothing but a sequence of bytes, with any further
> structure imposed on it by the applications that use it" philosophy
> became so prevalent.

In other words, we struggled for years with no true text file formats 
before someone finally sparked a revolution in filesystem design every 
bit as important as the notion that code and data are all the same stuff 
of information, and, before that, the very concept of the stored program 
itself.

The fact is, this "Unix" way is clearly vastly superior to what came 
before. For one thing, it allows us to actually have proper, true text 
files.

> Explicit records are isomorphic with records delimited by characters
> (CR, LF, or CR/LF being the popular choices).

That's mathematical balderdash. Explicit ASCII records, for example, are 
representable as strings on the set {0-127} + {<record boundary>} but not 
as strings on the set {0-127} whereas ASCII "records delimited by 
characters (CR, LF, or CR/LF)" are representable as strings on the set 
{0-127}.

The former set is strictly larger than the latter, so not isomorphic as 
to string algebra. (Strictly speaking, as simple mathematical sets they 
are isomorphic, both being isomorphic furthermore to the set of positive 
integers, as these all have the same cardinal, but that's the ability to 
encode via escape sequences or binhexing or the like sneaking in through 
the backdoor there.)
Reply kwesson (107) 3/12/2011 4:27:36 AM
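The non-isomorphism argument in the post above can be illustrated with a short Java sketch. It assumes LF as the only delimiter and no escaping, and the class and method names are hypothetical. Records that contain no LF survive an encode/decode round trip. A single record that contains an LF, however, encodes to exactly the same text as two separate records, so the mapping from records to delimited text is not one-to-one. Adding an escape scheme would restore that property, which is the "escape sequences" back door the post mentions.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class RecordEncodingDemo {
    // Encodes records as LF-delimited text, writing an LF after every record.
    static String encode(List<String> records) {
        StringBuilder sb = new StringBuilder();
        for (String record : records) {
            sb.append(record).append('\n');
        }
        return sb.toString();
    }

    // Decodes LF-delimited text produced by encode() back into records.
    static List<String> decode(String text) {
        if (text.isEmpty()) {
            return Collections.emptyList();
        }
        // Drop the final LF, then split; the -1 limit keeps empty records.
        String body = text.substring(0, text.length() - 1);
        return Arrays.asList(body.split("\n", -1));
    }

    public static void main(String[] args) {
        List<String> twoRecords = Arrays.asList("MADAM,", "Orinda.");
        List<String> oneRecord = Arrays.asList("MADAM,\nOrinda."); // one record containing LF

        System.out.println(decode(encode(twoRecords)).equals(twoRecords)); // true
        System.out.println(encode(oneRecord).equals(encode(twoRecords)));  // true: same text
        System.out.println(decode(encode(oneRecord)).equals(oneRecord));   // false
    }
}
```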

Ken Wesson <kwesson@gmail.com> wrote:
> On Mon, 28 Feb 2011 01:45:10 -0600, Leif Roar Moldskred wrote:
>> 
>> And an almost fanatical devotion to the pope.
>> 
>> Maybe you should come in again?
> 
> Non sequitur.

Well, it is _now_ after you cut out the relevant part of my reply and only
left the Monty Python reference.

How's this for relevance? Some people have delusions of grandeur. Others,
of competence.

(Mind, I don't mean that as an ad homien. I'm not saying that you're wrong 
because  you're a nincompoop. I'm saying that you're wrong _and_ you're a
nincompoop.)

-- 
Leif Roar Moldskred
Reply leifm1143 (162) 3/12/2011 5:51:37 AM

On Fri, 11 Mar 2011 23:51:37 -0600, Leif Roar Moldskred wrote:

> Ken Wesson <kwesson@gmail.com> wrote:
>> On Mon, 28 Feb 2011 01:45:10 -0600, Leif Roar Moldskred wrote:
>>> 
>>> And an almost fanatical devotion to the pope.
>>> 
>>> Maybe you should come in again?
>> 
>> Non sequitur.
> 
> Well, it is

Thought so.

> How's this for relevance? Some people have delusions of grandeur.
> Others, of competence.

You appear to have delusions of relevance.

> (Mind, I don't mean that as an ad homien.

Delusions of spelling ability too, perhaps.

> I'm not saying that you're wrong because  you're a nincompoop. I'm
> saying that you're wrong _and_ you're a nincompoop.)

Bark bark bark bark.

Delusions of having a marked territory to defend as well. But that one 
you seem to share with some others here.

Oh, by the way, your last sentence also means you're wrong. The jury is 
still out on whether you're also a nincompoop.
Reply kwesson (107) 3/12/2011 6:53:02 AM

Ken Wesson <kwesson@gmail.com> wrote:
> 
> Thought so.
> 

Peer, du lyver! [Norwegian: "Peer, you're lying!", the opening line of Ibsen's Peer Gynt]

-- 
Leif Roar 
Reply leifm1143 (162) 3/12/2011 7:43:26 AM

On Sat, 12 Mar 2011 01:43:26 -0600, Leif Roar Moldskred wrote:

> Peer, du lyver!

Error: undefined symbols "du" and "lyver".

Syntax error, line 1.

BUILD FAILED.

Reply kwesson (107) 3/12/2011 7:06:37 PM

On Mar 12, 3:16 am, Ken Wesson <kwes...@gmail.com> wrote:
> On Sun, 27 Feb 2011 11:01:35 -0500, Arne Vajhøj wrote:
> > You can easily verify by checking phones from the big producers like
> > Nokia.
>
> What?
>
> >> But even if there was some truth to that claim, where would
> >> your typical user *get* these apps?
>
> > Via the phone's browser and HTTP from any web server.
>
> That's web site applets, not client side applications. What are you doing
> presuming to be an expert (rather than a clueless newbie) in
> comp.lang.java.programmer if you can't even grasp the difference between
> Java applets and Java applications?

Ken - I've got to ask.  Do you really believe what you write, or is it
some subtle experiment? In a few weeks will you be saying "Ha! Gotcha!
You all thought I was serious, didn't you?"

Oh, and please, reply to my post, replacing all of my words with
"bark". I'm just itching to find out what _that's_ all about. Some
carefully thought out psychological experiment, I would guess.
Reply paul.cager (63) 3/14/2011 1:50:58 AM

On Sun, 13 Mar 2011 18:50:58 -0700, Paul Cager wrote:

> Ken - I've got to ask.  Do you really believe what you write, or is it
> some subtle experiment? In a few weeks will you be saying "Ha! Gotcha!
> You all thought I was serious, didn't you?"
> 
> Oh, and please, reply to my post, replacing all of my words with "bark".
> I'm just itching to find out what _that's_ all about. Some carefully
> thought out psychological experiment, I would guess.

And another paranoiac outs himself.
Reply kwesson (107) 3/14/2011 3:45:06 AM


"Ken Wesson" <kwesson@gmail.com> wrote in message 
news:4d7d8f42$1@news.x-privat.org...
> On Sun, 13 Mar 2011 18:50:58 -0700, Paul Cager wrote:
>
>> Ken - I've got to ask.  Do you really believe what you write, or is it
>> some subtle experiment? In a few weeks will you be saying "Ha! Gotcha!
>> You all thought I was serious, didn't you?"
>>
>> Oh, and please, reply to my post, replacing all of my words with "bark".
>> I'm just itching to find out what _that's_ all about. Some carefully
>> thought out psychological experiment, I would guess.
>
> And another paranoiac outs himself.

The other day, I saw a couple dozen posts from you, all the ones I looked at 
(admittedly, not nearly all) were you defending yourself against attacks 
from a variety of sources.  Why, if everyone here is so crazy and vicious, 
do you not simply find another place to discuss Java and at most send one 
last "Screw yall!"? 

Reply mscottschilling (1976) 3/14/2011 4:20:55 AM

On Sun, 13 Mar 2011 21:20:55 -0700, Mike Schilling wrote:

> The other day, I saw a couple dozen posts from you, all the ones I
> looked at (admittedly, not nearly all) were you defending yourself
> against attacks from a variety of sources.  Why, if everyone here is so
> crazy and vicious, do you not simply find another place to discuss Java
> and at most send one last "Screw yall!"?

Perhaps I actually hold out hope that some people will see reason. Also, 
a parting shot and then a departure seems like an awfully childish way to 
respond. I'd much rather this ... disagreement could be resolved in a 
more adult fashion. Failing that, I'd much rather that any childish exits 
be by other people first. :)
Reply kwesson (107) 3/14/2011 4:31:17 AM

"Ken Wesson" <kwesson@gmail.com> wrote in message 
news:4d7d9a15$1@news.x-privat.org...
> On Sun, 13 Mar 2011 21:20:55 -0700, Mike Schilling wrote:
>
>> The other day, I saw a couple dozen posts from you, all the ones I
>> looked at (admittedly, not nearly all) were you defending yourself
>> against attacks from a variety of sources.  Why, if everyone here is so
>> crazy and vicious, do you not simply find another place to discuss Java
>> and at most send one last "Screw yall!"?
>
> Perhaps I actually hold out hope that some people will see reason.

Usenet flame wars eventually die out from lack of interest, but never, in my 
decades of observing them, have they resulted in victory for either side.  If 
that's what you're waiting for, give up now.  It saves time. And there's no 
better way to get a permanent reputation as a loon than to continue them 
long after everyone else has lost interest.

Just a bit of free advice, probably worth every penny. And I won't be 
responding to you again until the discussion is something on-topic. 

Reply mscottschilling (1976) 3/14/2011 4:45:54 AM

On Sun, 13 Mar 2011 21:45:54 -0700, Mike Schilling wrote:

>> Perhaps I actually hold out hope that some people will see reason.
> 
> Usenet flame wars eventually die out from lack of interest, but never, in
> my decades of observing them, have they resulted in victory for either side. 
> If that's what you're waiting for, give up now.  It saves time. And
> there's no better way to get a permanent reputation as a loon than to
> continue them long after everyone else has lost interest.

Mental note to self: once Arne and the others have given up, do not spend 
the next N months perpetually following up to my own postings. :)
Reply kwesson (107) 3/14/2011 8:26:48 AM

In article <slrnimnbms.f41.bcd@decibel.pvv.ntnu.no>,
Bent C Dalager  <bcd@pvv.ntnu.no> wrote:
> On 2011-02-28, Lew <noone@lewscanon.com> wrote:
> > Leif Roar Moldskred wrote:
> >> Ken Wesson<kwesson@gmail.com>  wrote:
> >>>
> >>> I remember when this newsgroup used to actually be about Java programming.
> >
> > Up until "Ken Wesson" started trolling.
> 
> I still think people should have let me keep containing him in the
> Great SWT Program thread. It may have been ugly, but at least it was
> only ugly in one single thread which could then be killfiled.

Or you could have done whatever was necessary to follow the
discussion into the other group into which that thread ended up
being redirected.  I did what I could, as did a couple of other
semi-regulars, but I guess eventually we all just ran out of steam.
If we'd had one more ally ....  Eh, probably not.  

-- 
B. L. Massingill
ObDisclaimer:  I don't speak for my employers; they return the favor.
Reply blmblm (1187) 3/16/2011 8:26:51 PM

In article <4d7b184e$1@news.x-privat.org>,
Ken Wesson  <kwesson@gmail.com> wrote:
> On Fri, 11 Mar 2011 23:51:37 -0600, Leif Roar Moldskred wrote:
> 
> > Ken Wesson <kwesson@gmail.com> wrote:
> >> On Mon, 28 Feb 2011 01:45:10 -0600, Leif Roar Moldskred wrote:
> >>> 
> >>> And an almost fanatical devotion to the pope.
> >>> 
> >>> Maybe you should come in again?
> >> 
> >> Non sequitur.
> > 
> > Well, it is
> 
> Thought so.

Heh.  Quoting only some of what you're replying to [*], in a way that
misrepresents it, at least to some extent?  Now *that* brings back
some unhappy memories.

[*] Full sentence:

>> Well, it is _now_ after you cut out the relevant part of my reply and
>> only left the Monty Python reference.

[ snip ]

-- 
B. L. Massingill
ObDisclaimer:  I don't speak for my employers; they return the favor.
Reply blmblm (1187) 3/16/2011 8:30:23 PM

On Wed, 16 Mar 2011 20:26:51 +0000, blmblm@myrealbox.com wrote:

> In article <slrnimnbms.f41.bcd@decibel.pvv.ntnu.no>, Bent C Dalager 
> <bcd@pvv.ntnu.no> wrote:
>> On 2011-02-28, Lew <noone@lewscanon.com> wrote:
>> > Leif Roar Moldskred wrote:
>> >> Ken Wesson<kwesson@gmail.com>  wrote:
>> >>>
>> >>> I remember when this newsgroup used to actually be about Java
>> >>> programming.
>> >
>> > Up until "Ken Wesson" started trolling.

Which never happened.

>> I still think people should have let me keep containing him in the
>> Great SWT Program thread. It may have been ugly, but at least it was
>> only ugly in one single thread which could then be killfiled.
> 
> Or you could have done whatever was necessary to follow the discussion
> into the other group into which that thread ended up being redirected. 
> I did what I could, as did a couple of other semi-regulars, but I guess
> eventually we all just ran out of steam. If we'd had one more ally .... 
> Eh, probably not.

I'd still like to know what the heck you're talking about here. What 
discussion? A Google search finds a thread with the name "Great SWT 
Program" in this group, but it's several years old and won't load for me
[1].

[1] http://groups.google.com/group/comp.lang.java.programmer/browse_thread/thread/ce27f65ea7256d97/bf270adda3877a0f?#bf270adda3877a0f

Topic not found

We're sorry, but we were unable to find the topic you were looking for. 
Perhaps the URL you clicked on is out of date or broken?
Reply kwesson (107) 3/17/2011 10:24:00 PM

On Wed, 16 Mar 2011 20:30:23 +0000, blmblm@myrealbox.com wrote:

> In article <4d7b184e$1@news.x-privat.org>, Ken Wesson 
> <kwesson@gmail.com> wrote:
>> On Fri, 11 Mar 2011 23:51:37 -0600, Leif Roar Moldskred wrote:
>> 
>> > Ken Wesson <kwesson@gmail.com> wrote:
>> >> On Mon, 28 Feb 2011 01:45:10 -0600, Leif Roar Moldskred wrote:
>> >>> 
>> >>> And an almost fanatical devotion to the pope.
>> >>> 
>> >>> Maybe you should come in again?
>> >> 
>> >> Non sequitur.
>> > 
>> > Well, it is
>> 
>> Thought so.
> 
> Heh.  Quoting only some of what you're replying to [*], in a way that
> misrepresents it, at least to some extent?  Now *that* brings back some
> unhappy memories.
> 
> [*] Full sentence:
> 
>>> Well, it is _now_ after you cut out the relevant part of my reply and
>>> only left the Monty Python reference.

As with the other time you implied this sort of dishonesty on my part, I 
was merely trimming what I wasn't responding to. The out-of-left-field 
reference to fanatical Catholics in the middle of a discussion of text 
file formats was a non sequitur and, as such, cast the entire rest of the 
poster's reasoning process into significant question, to my mind at 
least. So, I quoted the apparent non sequitur and called it out as such.
Reply kwesson (107) 3/17/2011 10:27:35 PM

In article <4d828a00$1@news.x-privat.org>,
Ken Wesson  <kwesson@gmail.com> wrote:
> On Wed, 16 Mar 2011 20:26:51 +0000, blmblm@myrealbox.com wrote:
> 
> > In article <slrnimnbms.f41.bcd@decibel.pvv.ntnu.no>, Bent C Dalager 
> > <bcd@pvv.ntnu.no> wrote:
> >> On 2011-02-28, Lew <noone@lewscanon.com> wrote:
> >> > Leif Roar Moldskred wrote:
> >> >> Ken Wesson<kwesson@gmail.com>  wrote:
> >> >>>
> >> >>> I remember when this newsgroup used to actually be about Java
> >> >>> programming.
> >> >
> >> > Up until "Ken Wesson" started trolling.
> 
> Which never happened.
> 
> >> I still think people should have let me keep containing him in the
> >> Great SWT Program thread. It may have been ugly, but at least it was
> >> only ugly in one single thread which could then be killfiled.
> > 
> > Or you could have done whatever was necessary to follow the discussion
> > into the other group into which that thread ended up being redirected. 
> > I did what I could, as did a couple of other semi-regulars, but I guess
> > eventually we all just ran out of steam. If we'd had one more ally .... 
> > Eh, probably not.
> 
> I'd still like to know what the heck you're talking about here. What 
> discussion? A Google search finds a thread with the name "Great SWT 
> Program" in this group, but it's several years old and won't load for me
> [1].
> 
> [1] http://groups.google.com/group/comp.lang.java.programmer/browse_thread/thread/ce27f65ea7256d97/bf270adda3877a0f?#bf270adda3877a0f
> 
> Topic not found
> 
> We're sorry, but we were unable to find the topic you were looking for. 
> Perhaps the URL you clicked on is out of date or broken?

Yes, the discussion (thread) in question occurred several years ago.
Some people have long memories, I guess, and it *was* rather
memorable; I think by the time the regulars finally convinced
everyone to take the discussion elsewhere the total number of posts
was in the thousands.

The URL you posted doesn't work for me either (though I got a
different error message, which I was too lazy to record), and
an "advanced search" of Google's Usenet archives for posts in
comp.lang.java.programmer with subject "great swt program" returns
three hits, one in a different newsgroup and two that must represent
subsets of the whole thread, since between them they total only
80-something posts.  One would like to hope that the whole thread
exists in Google's archives, but whether it's possible to retrieve
it, who knows.

Cue usual complaints about GG, I guess.  I'm grateful that someone
took over when DejaNews folded, but -- well, maybe if they had
focused more on the archival aspects than on providing a posting
interface?  which is a whole other target for griping, but -- eh.
"Worth at least what I paid for it", maybe.

-- 
B. L. Massingill
ObDisclaimer:  I don't speak for my employers; they return the favor.
Reply blmblm 3/23/2011 12:05:24 PM

In article <4d828ad7@news.x-privat.org>, Ken Wesson  <kwesson@gmail.com> wrote:
> On Wed, 16 Mar 2011 20:30:23 +0000, blmblm@myrealbox.com wrote:
> 
> > In article <4d7b184e$1@news.x-privat.org>, Ken Wesson 
> > <kwesson@gmail.com> wrote:
> >> On Fri, 11 Mar 2011 23:51:37 -0600, Leif Roar Moldskred wrote:
> >> 
> >> > Ken Wesson <kwesson@gmail.com> wrote:
> >> >> On Mon, 28 Feb 2011 01:45:10 -0600, Leif Roar Moldskred wrote:
> >> >>> 
> >> >>> And an almost fanatical devotion to the pope.
> >> >>> 
> >> >>> Maybe you should come in again?
> >> >> 
> >> >> Non sequitur.
> >> > 
> >> > Well, it is
> >> 
> >> Thought so.
> > 
> > Heh.  Quoting only some of what you're replying to [*], in a way that
> > misrepresents it, at least to some extent?  Now *that* brings back some
> > unhappy memories.
> > 
> > [*] Full sentence:
> > 
> >>> Well, it is _now_ after you cut out the relevant part of my reply and
> >>> only left the Monty Python reference.
> 
> As with the other time you implied this sort of dishonesty on my part, I 
> was merely trimming what I wasn't responding to. 

Without any indication [*] that you had trimmed anything.  An 
indication that you had trimmed something would go a long way
toward convincing me that your motives were pure.  Absent that --
well, perhaps I'm overly suspicious as a result of a long and
contentious discussion some years ago with someone you rather
remind me of.

[*] "[ .... ]" or "[ snip ]" or something equivalent.

> The out-of-left-field 
> reference to fanatical Catholics in the middle of a discussion of text 
> file formats was a non sequitur and, as such, cast the entire rest of the 
> poster's reasoning process into significant question, to my mind at 
> least. So, I quoted the apparent non sequitur and called it out as such.

In a way that, in my opinion, made it appear that Leif was agreeing
more than he really was.  And indeed, his complaint ("it is _now_")
seems to have been about your quoting only part of his previous post.
I'll admit that in the context of what he actually said:

>>>> Don't be ridiculous. There are exactly two app stores out there so far        
>>>> for phones: the Android Market and the Apple App Store.

>> And Nokia's Ovi Store. And Sony-Ericsson's eStore. And an almost
>> fanatical devotion to the pope.

it's not entirely clear to me how the last sentence (fragment) fits,
but I'm guessing that's because I don't have a clear enough memory of
some Monty Python movie or skit.

Well, whatever.  This is almost surely not a useful tangent and I'll
try not to pursue it past this post.

-- 
B. L. Massingill
ObDisclaimer:  I don't speak for my employers; they return the favor.
Reply blmblm 3/23/2011 12:06:34 PM

On Wed, 23 Mar 2011 12:05:24 +0000, blmblm@myrealbox.com wrote:

....

> The URL you posted doesn't work for me either (though I got a different
> error message, which I was too lazy to record), and an "advanced search"
> of Google's Usenet archives for posts in comp.lang.java.programmer with
> subject "great swt program" returns three hits, one in a different
> newsgroup and two that must represent subsets of the whole thread, since
> between them they total only 80-something posts.  One would like to hope
> that the whole thread exists in Google's archives, but whether it's
> possible to retrieve it, who knows.

One would hope? Why? From the sounds of it, it wasn't a very useful 
thread. Whatever it was. :)
Reply Ken 3/23/2011 12:15:55 PM

On Wed, 23 Mar 2011 12:06:34 +0000, blmblm@myrealbox.com wrote:

> In article <4d828ad7@news.x-privat.org>, Ken Wesson  <kwesson@gmail.com>
> wrote:
>> As with the other time you implied this sort of dishonesty on my part,
>> I was merely trimming what I wasn't responding to.
> 
> Without any indication [*] that you had trimmed anything.  An indication
> that you had trimmed something would go a long way toward convincing me
> that your motives were pure.  Absent that -- well, perhaps I'm overly
> suspicious as a result of a long and contentious discussion some years
> ago with someone you rather remind me of.
> 
> [*] "[ .... ]" or "[ snip ]" or something equivalent.

I've never tended to bother much with that. Seemed like a waste of 
bandwidth.

>> The out-of-left-field
>> reference to fanatical Catholics in the middle of a discussion of text
>> file formats was a non sequitur and, as such, cast the entire rest of
>> the poster's reasoning process into significant question, to my mind at
>> least. So, I quoted the apparent non sequitur and called it out as
>> such.
> 
> In a way that, in my opinion, made it appear that Leif was agreeing more
> than he really was.

I don't think so. "Agreeing" isn't how I would describe "and a fanatical 
devotion to the pope", given that the topic had had nothing to do with 
religion. Nor "disagreeing".

>>>>> Don't be ridiculous. There are exactly two app stores out there so
>>>>> far for phones: the Android Market and the Apple App Store.
> 
>>> And Nokia's Ovi Store. And Sony-Ericsson's eStore. And an almost
>>> fanatical devotion to the pope.
> 
> it's not entirely clear to me how the last sentence (fragment) fits, but
> I'm guessing that's because I don't have a clear enough memory of some
> Monty Python movie or skit.

Well, they were given to absurdism, and mentioning two alleged "app 
stores" that nobody's ever heard of before seems to fit in with absurdism 
I suppose -- either they don't exist or they're small, obscure, and 
little-used and don't really count anyway. Only the Android Market and 
Apple Store come close to qualifying for household-name status.
Reply Ken 3/23/2011 12:19:37 PM

In article <4d89e47b$1@news.x-privat.org>,
Ken Wesson  <kwesson@gmail.com> wrote:
> On Wed, 23 Mar 2011 12:05:24 +0000, blmblm@myrealbox.com wrote:
> 
> ...
> 
> > The URL you posted doesn't work for me either (though I got a different
> > error message, which I was too lazy to record), and an "advanced search"
> > of Google's Usenet archives for posts in comp.lang.java.programmer with
> > subject "great swt program" returns three hits, one in a different
> > newsgroup and two that must represent subsets of the whole thread, since
> > between them they total only 80-something posts.  One would like to hope
> > that the whole thread exists in Google's archives, but whether it's
> > possible to retrieve it, who knows.
> 
> One would hope? Why? From the sounds of it, it wasn't a very useful 
> thread. Whatever it was. :)

Eh.  I'll pass on the implicit question in "whatever it was".

As for why I hope it's archived -- well, I'm of the opinion that
if Usenet posts are to be archived at all, the goal should probably
be to archive all of them, since I doubt one could get universal
agreement on what to include and what to leave out.  This particular
thread was for the most part wildly off-topic in a Java newsgroup,
but it had its interesting aspects.

-- 
B. L. Massingill
ObDisclaimer:  I don't speak for my employers; they return the favor.
Reply blmblm 3/23/2011 1:00:04 PM

In article <4d89e559$1@news.x-privat.org>,
Ken Wesson  <kwesson@gmail.com> wrote:
> On Wed, 23 Mar 2011 12:06:34 +0000, blmblm@myrealbox.com wrote:
> 
> > In article <4d828ad7@news.x-privat.org>, Ken Wesson  <kwesson@gmail.com>
> > wrote:
> >> As with the other time you implied this sort of dishonesty on my part,
> >> I was merely trimming what I wasn't responding to.
> > 
> > Without any indication [*] that you had trimmed anything.  An indication
> > that you had trimmed something would go a long way toward convincing me
> > that your motives were pure.  Absent that -- well, perhaps I'm overly
> > suspicious as a result of a long and contentious discussion some years
> > ago with someone you rather remind me of.
> > 
> > [*] "[ .... ]" or "[ snip ]" or something equivalent.
> 
> I've never tended to bother much with that. Seemed like a waste of 
> bandwidth.

YMMV, maybe.  I'd rather put in the extra characters (only,
what, eight of them including spaces before and after?) than have
someone think I'm misrepresenting what I'm responding to by taking
something out of context.

> >> The out-of-left-field
> >> reference to fanatical Catholics in the middle of a discussion of text
> >> file formats was a non sequitur and, as such, cast the entire rest of
> >> the poster's reasoning process into significant question, to my mind at
> >> least. So, I quoted the apparent non sequitur and called it out as
> >> such.
> > 
> > In a way that, in my opinion, made it appear that Leif was agreeing more
> > than he really was.
> 
> I don't think so. "Agreeing" isn't how I would describe "and a fanatical 
> devotion to the pope", given that the topic had had nothing to do with 
> religion. Nor "disagreeing".

Say what?

Leif wrote "It is _now_ .... ", and you quoted only "It is", which to
me is a stronger statement of agreement.

> >>>>> Don't be ridiculous. There are exactly two app stores out there so
> >>>>> far for phones: the Android Market and the Apple App Store.
> > 
> >>> And Nokia's Ovi Store. And Sony-Ericsson's eStore. And an almost
> >>> fanatical devotion to the pope.
> > 
> > it's not entirely clear to me how the last sentence (fragment) fits, but
> > I'm guessing that's because I don't have a clear enough memory of some
> > Monty Python movie or skit.

I was curious enough to Google "Monty Python" and "fanatical
devotion to the pope".  It's from the Spanish Inquisition skit,
and the phrase ("fanatical devotion ....") appears as part of a
list that keeps growing.  So I'm guessing the relevance here is
the list that keeps growing.  Still a bit of a ....  Nope, not
going to give you something you can quote out of context.  

> Well, they were given to absurdism, and mentioning two alleged "app 
> stores" that nobody's ever heard of before seems to fit in with absurdism 
> I suppose -- either they don't exist or they're small, obscure, and 
> little-used and don't really count anyway. Only the Android Market and 
> Apple Store come close to qualifying for household-name status.

-- 
B. L. Massingill
ObDisclaimer:  I don't speak for my employers; they return the favor.
Reply blmblm 3/23/2011 1:01:09 PM