How do you code a string literal which spans lines?


C allows adjacent string literals such as

    "This is "
    "a long "
    "string"

to be seen as one string. This wouldn't fit my syntax so I need another 
way to allow long string literals. How do you guys allow this in your 
languages? Some options I have in mind follow. Is there a better - 
clearer - way? What would you recommend or prefer to use as a 
programmer?

    "This is \
    a long \
    string"

    "This is " +
    "a long " +
    "string"

    """This is
a long
string""" 


Reply James 6/25/2005 12:46:28 PM

"James Harris" <no.email.please> writes:

> C allows adjacent string literals such as
>
>     "This is "
>     "a long "
>     "string"
>
> to be seen as one string. This wouldn't fit my syntax so I need another 
> way to allow long string literals. How do you guys allow this in your 
> languages?

SML and Haskell do it thus:
   "This is \
   \a long \
   \string"
Whitespace between a pair of \'s is skipped.
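
A minimal sketch of that gap rule, written in Python rather than SML or Haskell. The name remove_gaps is made up for illustration, and the sketch assumes the body contains no escaped backslash (\\) directly before whitespace, which a real lexer would have to handle:

```python
import re

# A "gap" is a backslash, one or more whitespace characters (which may
# include the newline), and a closing backslash.
GAP = re.compile(r"\\\s+\\")

def remove_gaps(body: str) -> str:
    """Delete every backslash-whitespace-backslash gap from a literal's body."""
    return GAP.sub("", body)

source = 'This is \\\n   \\a long \\\n   \\string'
print(remove_gaps(source))  # This is a long string
```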

My language Kogut does it thus:
   "This is \
   a long \
   string"
Leading indentation after \-newline is skipped. For rare cases when
you insist on starting a continuation line with a space, \s is space,
but usually it's enough to break after a space.

In Kogut when you write thus:
"   This is
   a long
   string
"
then newlines become a part of the string, and this time leading
spaces are not removed if present, so large chunks of text can be
embedded almost directly in the program, and only require escaping of
", \, TAB etc.

In SML and Haskell literal newlines in strings are invalid; you
usually write \n\ at the end of the previous line and \ at the
beginning of the next one.

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/
Reply Marcin 6/25/2005 1:03:03 PM

"Marcin 'Qrczak' Kowalczyk" <qrczak@knm.org.pl> wrote in message 
news:874qbm612w.fsf@qrnik.zagroda...
> "James Harris" <no.email.please> writes:
<snip>
> SML and Haskell do it thus:
>   "This is \
>   \a long \
>   \string"
> Whitespace between a pair of \'s is skipped.
>
> My language Kogut does it thus:
>   "This is \
>   a long \
>   string"
> Leading indentation after \-newline is skipped. For rare cases when
> you insist on starting a continuation line with a space, \s is space,
> but usually it's enough to break after a space.
>
> In Kogut when you write thus:
> "   This is
>   a long
>   string
> "
> then newlines become a part of the string, and this time leading
> spaces are not removed if present, so large chunks of text can be
> embedded almost directly in the program, and only require escaping of
> ", \, TAB etc.).
>
> In SML and Haskell literal newlines in strings are invalid; you
> usually write \n\ at the end of the previous line and \ at the
> beginning of the next one.

Thanks Marcin. I thought you might respond. My favourite so far is how 
you allow continuations in Kogut. Unlike Unix, which says that the pair 
\<newline> gets stripped, my thought is to say that if \ is the last 
nonblank of a line then anything between it and the first nonblank of 
the next line gets stripped. So

    "Dopple\
    ganger"

would be one word where

    "Dopple \
    ganger"

would be two words. I can't think of a reason for needing to begin the 
following line with a space. When do you need that?
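
A small Python sketch of the rule proposed above, for illustration only. The function name join_continuations is hypothetical, and escaped backslashes (\\) at the end of a line are not handled:

```python
import re

# A backslash that is the last nonblank on its line, any blanks after it,
# the newline, and the leading blanks of the next line.
CONTINUATION = re.compile(r"\\[ \t]*\n[ \t]*")

def join_continuations(body: str) -> str:
    """Strip everything from a trailing backslash to the next line's first nonblank."""
    return CONTINUATION.sub("", body)

print(join_continuations("Dopple\\\n    ganger"))   # Doppleganger
print(join_continuations("Dopple \\\n    ganger"))  # Dopple ganger
```

The only difference between the two calls is the space before the backslash, which is what decides whether the result is one word or two.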

The difficulty I can see with using a trailing backslash is in 
compilation - specifically the lexical phase. Do these long strings 
become one token or one-per-line? If one token then their line number 
and character offset would have to apply to the beginning of the token I 
guess. Then if the string is not closed properly where does the error 
get reported?



Reply James 6/25/2005 1:23:02 PM

"James Harris" <no.email.please> writes:

> Unlike Unix which says that the pair \<newline> gets stripped my
> thought is to say that if \ is the last nonblank of a line then
> anything between it and the first nonblank of the next line gets
> stripped.

Yes, I didn't mention that but this is actually how it behaves in
Kogut.

> I can't think of a reason for needing to begin the following line
> with a space. When do you need that?

A drawback of these rules is that all ways to split "foo\n bar"
(assuming that foo and bar are long texts) have a glitch:

      "foo
 bar"   // has a fixed indentation on the second line

      "foo\n\
      \sbar"   // requires escaping the space

      "foo\n \
      bar"   // the split is almost at the newline, but not quite

      "foo\
      \n bar"   // same again; a newline should belong to the previous line
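
To check that the four spellings above really denote the same string, here is a rough Python decoder for the rules as described in this thread. It is a sketch, not Kogut's implementation: the function name decode and the exact escape table are assumptions.

```python
def decode(body: str) -> str:
    """Decode the text between the quotes of a literal (assumed rules)."""
    simple = {"n": "\n", "s": " ", "t": "\t", "\\": "\\", '"': '"'}
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)              # ordinary character, raw newline included
            i += 1
            continue
        # Look past blanks after the backslash to see whether the line ends here.
        j = i + 1
        while j < len(body) and body[j] in " \t":
            j += 1
        if j < len(body) and body[j] == "\n":
            j += 1
            while j < len(body) and body[j] in " \t":
                j += 1                  # skip the next line's indentation
            i = j
        elif i + 1 < len(body) and body[i + 1] in simple:
            out.append(simple[body[i + 1]])
            i += 2
        else:
            raise ValueError(f"bad escape at offset {i}")
    return "".join(out)

variants = [
    "foo\n bar",                  # raw newline, fixed indentation
    "foo\\n\\\n      \\sbar",     # escaped space after the continuation
    "foo\\n \\\n      bar",       # split just after the space
    "foo\\\n      \\n bar",       # split just before the newline escape
]
print(all(decode(v) == "foo\n bar" for v in variants))  # True
```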

> The difficulty I can see with using a trailing backslash is in
> compilation - specifically the lexical phase. Do these long strings
> become one token or one-per-line?

One token.

> If one token then their line number and character offset would have
> to apply to the beginning of the token I guess.

Yes.

> Then if the string is not closed properly where does the error get
> reported?

At the beginning of the string. Even if the fragments were somehow
lexed separately, it would not help with this: the actual mistake may
be well before the place where the compiler detects it, so showing
a position which is closer to the point of detection would not
necessarily be better.
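
A hedged Python sketch of that behaviour: the whole literal is one token positioned at its opening quote, and an unterminated literal is reported at that same position. Token, LexError and lex_string are names invented for this example, not part of any real lexer.

```python
from dataclasses import dataclass

@dataclass
class Token:
    text: str   # raw source text of the literal, quotes included
    line: int   # 1-based line where the literal starts
    col: int    # 1-based column where the literal starts

class LexError(Exception):
    def __init__(self, message, line, col):
        super().__init__(f"{line}:{col}: {message}")
        self.line, self.col = line, col

def lex_string(src: str, start: int) -> Token:
    """Lex one literal whose opening quote is at src[start]."""
    assert src[start] == '"'
    line = src.count("\n", 0, start) + 1
    col = start - (src.rfind("\n", 0, start) + 1) + 1
    i = start + 1
    while i < len(src):
        if src[i] == "\\":
            i += 2                      # skip the escaped character (may be a newline)
        elif src[i] == '"':
            return Token(src[start:i + 1], line, col)
        else:
            i += 1
    # Ran off the end of the input: report where the literal began.
    raise LexError("unterminated string literal", line, col)

src = 'x = "Dopple \\\n    ganger"\ny = "oops'
tok = lex_string(src, 4)
print(tok.line, tok.col)                # 1 5
try:
    lex_string(src, src.index('"oops'))
except LexError as err:
    print(err)                          # 3:5: unterminated string literal
```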

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/
Reply Marcin 6/25/2005 1:49:18 PM

"Marcin 'Qrczak' Kowalczyk" <qrczak@knm.org.pl> wrote in message 
news:87zmtepmw1.fsf@qrnik.zagroda...
> "James Harris" <no.email.please> writes:
>
<snip>
>> I can't think of a reason for needing to begin the following line
>> with a space. When do you need that?
>
> A drawback of these rules is that all ways to split "foo\n bar"
> (assuming that foo and bar are long texts) have a glitch:
>
>      "foo
> bar"   // has a fixed indentation on the second line
>
>      "foo\n\
>      \sbar"   // requires escaping the space
>
>      "foo\n \
>      bar"   // the split is almost at the newline, but not quite
>
>      "foo\
>      \n bar"   // same again; a newline should belong to the previous line

That makes a lot of sense. Option 3 is clear but maybe not as 'bonny' as 
it could be. I can see why you allow \s.


Reply James 6/25/2005 2:13:23 PM

On Sat, 25 Jun 2005 12:46:28 -0000, "James Harris" <no.email.please>
wrote:

>
>C allows adjacent string literals such as
>
>    "This is "
>    "a long "
>    "string"
>
>to be seen as one string. This wouldn't fit my syntax so I need another 
>way to allow long string literals. How do you guys allow this in your 
>languages? Some options I have in mind follow. Is there a better - 
>clearer - way? What would you recommend or prefer to use as a 
>programmer?
>
>    "This is \
>    a long \
>    string"
>
>    "This is " +
>    "a long " +
>    "string"
>
>    """This is
>a long
>string""" 

An alternative is to have a multi-line text construct.  Here is an
example of what it might look like:

	text s =
	{
		This is
		a long
		string
	}

There are sundry issues to be dealt with in this scheme, e.g. trailing
white space, leading white space, special characters, etc.  The scheme
I am using in San looks much like this:

	begin text s
		|This is
		| a long
		| string.
		end

The first character in the text body is a formatting control
character.  There are two such characters at present, : and |.  The
vertical bar says that function evaluation and string substitution is
turned on.  The colon says that it is turned off.  The string text
within a line begins immediately after the control character and ends
with the last non-white space character.  Splicing is done without
intervening characters, e.g., no EOL characters are inserted.

The main features of the scheme are (a) it creates an assignment and
(b) everything within the text body is prefix delimited.
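
For readers who want to experiment, the splicing rule can be approximated in Python as below. This is only a sketch of the description above, not San itself: the function name splice_text_body is invented, indentation before the control character is assumed to be ignored, and the function evaluation and substitution that | enables are not modelled.

```python
def splice_text_body(lines):
    """Join the body lines of a San-like text block (assumed rules)."""
    parts = []
    for raw in lines:
        stripped = raw.lstrip()         # assumed: indentation before the control char is ignored
        if not stripped:
            continue                    # blank lines are skipped in this sketch
        control, rest = stripped[0], stripped[1:]
        if control not in "|:":
            raise ValueError(f"missing control character: {raw!r}")
        # Text runs from just after the control character to the last
        # non-whitespace character of the line.
        parts.append(rest.rstrip())
    return "".join(parts)               # spliced with no EOL characters inserted

body = ["\t\t|This is", "\t\t| a long", "\t\t| string."]
print(splice_text_body(body))  # This is a long string.
```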



Richard Harter, cri@tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
Save the Earth now!!
It's the only planet with chocolate.
Reply cri 6/26/2005 3:26:32 PM

James Harris wrote:
> C allows adjacent string literals such as
> 
>     "This is "
>     "a long "
>     "string"
> 
> to be seen as one string. This wouldn't fit my syntax so I need another 
> way to allow long string literals. How do you guys allow this in your 
> languages? Some options I have in mind follow. Is there a better - 
> clearer - way? What would you recommend or prefer to use as a 
> programmer?
> 
>     "This is \
>     a long \
>     string"
> 
>     "This is " +
>     "a long " +
>     "string"
> 
>     """This is
> a long
> string""" 

The Curl language offers a number of different ways to do this:

       "This is " &
       "a long " &
       "string"

simply concatenates the pieces into "This is a long string"

        {stringify
            This is
            a long
            string
        }

trims whitespace on the left but keeps newlines, giving "This is\na long\nstring".
In Curl this is used primarily when the contents contain a snippet of code.

        {message
            This is
            a long
            string
        }

compresses all adjacent whitespace into a single space: "This is a long 
string".  This form is intended for text.  I am leaving details out,
but it gives you the idea.

The latter two are macros.  If you give your language good macro support
then people can write their own macros to suit their purposes.
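
As a rough illustration of what such user-written helpers might do, here are two Python functions that mimic the results described above. They are not Curl and not Curl's implementation; the names stringify_like and message_like are made up, and removing only the common indentation is an assumption about what "trims whitespace on the left" means.

```python
import textwrap

def stringify_like(block: str) -> str:
    """Remove the common left indentation but keep the newlines."""
    return textwrap.dedent(block).strip()

def message_like(block: str) -> str:
    """Collapse every run of whitespace, newlines included, to one space."""
    return " ".join(block.split())

block = """
    This is
    a long
    string
"""
print(repr(stringify_like(block)))  # 'This is\na long\nstring'
print(repr(message_like(block)))    # 'This is a long string'
```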

- Christopher
Reply Christopher 6/27/2005 1:56:04 PM

"James Harris" <no.email.please> wrote in message 
news:42bd5219$0$24082$db0fefd9@news.zen.co.uk...
>
> C allows adjacent string literals such as
>
>    "This is "
>    "a long "
>    "string"
>
my language uses this technique, primarily because in many ways it resembles 
c.
my language has a fixed token-length limit, and this technique allows lexing 
as multiple tokens and merging them in the parser.
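
a sketch of that merging step in Python, assuming tokens arrive as (kind, value) pairs. the token kinds and the function name merge_adjacent_strings are hypothetical, chosen only for this example:

```python
def merge_adjacent_strings(tokens):
    """Fold runs of adjacent ("STRING", value) tokens into a single token."""
    merged = []
    for kind, value in tokens:
        if kind == "STRING" and merged and merged[-1][0] == "STRING":
            # Extend the previous string token instead of emitting a new one.
            merged[-1] = ("STRING", merged[-1][1] + value)
        else:
            merged.append((kind, value))
    return merged

tokens = [
    ("IDENT", "s"), ("OP", "="),
    ("STRING", "This is "), ("STRING", "a long "), ("STRING", "string"),
    ("OP", ";"),
]
print(merge_adjacent_strings(tokens))
```

each piece stays under the lexer's token-length limit; only the merged value grows.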


> to be seen as one string. This wouldn't fit my syntax so I need another 
> way to allow long string literals. How do you guys allow this in your 
> languages? Some options I have in mind follow. Is there a better - 
> clearer - way? What would you recommend or prefer to use as a programmer?
>
>    "This is \
>    a long \
>    string"
>
>    "This is " +
>    "a long " +
>    "string"
>
>    """This is
> a long
> string"""
I like the first of the three for literals.
one does have to be careful though about how their lexer/parser works, eg, 
there is the potential for buffer overflow.


the second option makes more sense if they are not a literal (or at least 
don't appear as one), e.g. the language has string concatenation.
this approach is also possible in my language via 2 operators: + and &, both 
with equivalent behavior in the string+string case, but with differing 
behavior in the string+non-string case. string+int is defined for offset 
operations, other uses of + are undefined.
string&whatever is defined to always stringify 'whatever'.

"foobar"+3 => "bar"
"foobar"&3 => "foobar3"

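the two operators can be imitated in Python with operator overloading. this is only a model of the behavior described above; the class name Str is invented, and negative offsets (which Python slicing would accept) are left out of scope:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Str:
    text: str

    def __add__(self, other):
        if isinstance(other, Str):
            return Str(self.text + other.text)    # string + string: concatenate
        if isinstance(other, int):
            return Str(self.text[other:])         # string + int: offset into the string
        return NotImplemented                     # other uses of + stay undefined

    def __and__(self, other):
        # string & whatever: always stringify the right-hand side
        other_text = other.text if isinstance(other, Str) else str(other)
        return Str(self.text + other_text)

print((Str("foobar") + 3).text)   # bar
print((Str("foobar") & 3).text)   # foobar3
```
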
as for the third option, it is, imo, ugly.


Reply cr88192 6/29/2005 1:07:20 AM
