Why Is Escaping Data Considered So Magical?

Just been reading this article
<http://www.theregister.co.uk/2010/06/23/xxs_sql_injection_attacks_testing_remedy/>
which says that a lot of security holes are arising these days because
everybody is concentrating on unit testing of their own particular
components, with less attention being devoted to overall integration
testing.

Fair enough. But it’s disconcerting to see some of the advice being offered
in the reader comments, like “force everyone to use stored procedures”, or
“force everyone to use prepared/parametrized statements”, “never construct
ad-hoc SQL queries” and the like.

I construct ad-hoc queries all the time. It really isn’t that hard to do
safely. All you have to do is read the documentation—for example,
<http://dev.mysql.com/doc/refman/5.0/en/string-syntax.html>—and then write a
routine that takes arbitrary data and turns it into a valid string literal,
like this <http://www.codecodex.com/wiki/Useful_MySQL_Routines#Quoting>.

I’ve done this sort of thing for MySQL, for HTML and JavaScript (in both
Python and JavaScript itself), and for Bash. It’s not hard to verify you’ve
done it correctly. It lets you easily create table-updating code like the
following, which makes it so easy to update the code to track changes in the
database structure:

     sql.cursor.execute \
      (
            "update items set "
        +
            ", ".join
                (
                    tuple
                        (
                            "%(name)s = %(value)s"
                        %
                            {
                                "name" : field[0],
                                "value" : SQLString(Params.getvalue
                                  (
                                    "%s[%s]" % (field[1], urllib.quote(modify_id))
                                  ))
                            }
                        for field in
                            (
                                ("class_name", "modify_class"),
                                ("make", "modify_make"),
                                ("model", "modify_model"),
                                ("details", "modify_details"),
                                ("serial_nr", "modify_serial"),
                                ("inventory_nr", "modify_invent"),
                                ("when_purchased", "modify_when_purchased"),
                                ... you get the idea ...
                                ("location_name", "modify_location"),
                                ("comment", "modify_comment"),
                            )
                        )
                +
                    (
                        "last_modified = %d" % int(time.time()),
                    )
                )
        +
            " where inventory_nr = %s" % SQLString(modify_id)
      )

Reply Lawrence 6/25/2010 12:25:56 AM

In article <i00t2k$l07$1@lust.ihug.co.nz>,
 Lawrence D'Oliveiro <ldo@geek-central.gen.new_zealand> wrote:

> I construct ad-hoc queries all the time. It really isn’t that hard to do
> safely. All you have to do is read the documentation

I get worried when people talk about how easy it is to do something 
safely.  Let me suggest a couple of things you might not have considered:

1) Somebody is running your application (or the database server) with 
the locale set to something unexpected.  This might change how numbers, 
dates, currency, etc, get formatted, which could change the meaning of 
your constructed SQL statement.

2) Somebody runs your application with a different PYTHONPATH, which 
causes a different (i.e. malicious) urllib module to get loaded, which 
makes urllib.quote() do something you didn't expect.
 
> I’ve done this sort of thing for MySQL, for HTML and JavaScript (in both
> Python and JavaScript itself), and for Bash. It’s not hard to verify you’ve
> done it correctly. It lets you easily create table-updating code like the
> following, which makes it so easy to update the code to track changes in the
> database structure:
> 
>      sql.cursor.execute \
>       (
>             "update items set "
>         +
>             ", ".join
>                 (
>                     tuple
>                         (
>                             "%(name)s = %(value)s"
>                         %
>                             {
>                                 "name" : field[0],
>                                 "value" : SQLString(Params.getvalue
>                                   (
>                                     "%s[%s]" % (field[1], urllib.quote(modify_id))
>                                   ))
>                             }
>                         for field in
>                             (
>                                 ("class_name", "modify_class"),
>                                 ("make", "modify_make"),
>                                 ("model", "modify_model"),
>                                 ("details", "modify_details"),
>                                 ("serial_nr", "modify_serial"),
>                                 ("inventory_nr", "modify_invent"),
>                                 ("when_purchased", "modify_when_purchased"),
>                                 ... you get the idea ...
>                                 ("location_name", "modify_location"),
>                                 ("comment", "modify_comment"),
>                             )
>                         )
>                 +
>                     (
>                         "last_modified = %d" % int(time.time()),
>                     )
>                 )
>         +
>             " where inventory_nr = %s" % SQLString(modify_id)
>       )
Reply Roy 6/25/2010 1:02:48 AM


On 2010-06-24 21:02:48 -0400, Roy Smith said:

> In article <i00t2k$l07$1@lust.ihug.co.nz>,
>  Lawrence D'Oliveiro <ldo@geek-central.gen.new_zealand> wrote:
> 
>> I construct ad-hoc queries all the time. It really isn’t that hard to do
>> safely. All you have to do is read the documentation
> 
> I get worried when people talk about how easy it is to do something
> safely.

First: I agree with this. While it's definitely possible to correctly 
escape a given SQL dialect under controlled conditions, it's not at all 
easy to get it right, and the real world is even more unfriendly than 
most people expect. Furthermore there's no reason to do it that way: 
Python's DB API spec effectively requires that placeholder parameters 
of *some* kind exist. Even if you feel the need to construct SQL, you 
can construct it with parameters almost as easily as you can construct 
it with the values baked in.
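
A minimal sketch of that idea, assuming the standard-library sqlite3 module as a stand-in for a MySQL driver (sqlite3 uses `?` as its placeholder, MySQLdb uses `%s`). The table layout, the FIELDS mapping, and the build_update helper are hypothetical names made up for illustration. Column names come from a fixed list inside the program; only the values travel as parameters.

```python
import sqlite3
import time

# Hypothetical mapping: column name -> form field that supplies its value.
FIELDS = (
    ("make", "modify_make"),
    ("model", "modify_model"),
    ("comment", "modify_comment"),
)

def build_update(form, modify_id):
    """Return (sql, params) for an UPDATE on the items table."""
    assignments = ["%s = ?" % column for column, _ in FIELDS]
    params = [form[field] for _, field in FIELDS]
    assignments.append("last_modified = ?")
    params.append(int(time.time()))
    sql = "update items set " + ", ".join(assignments) + " where inventory_nr = ?"
    params.append(modify_id)
    return sql, tuple(params)

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table items (inventory_nr text primary key, make text,"
    " model text, comment text, last_modified integer)"
)
conn.execute("insert into items (inventory_nr) values (?)", ("A-1",))

# Values full of characters that would need escaping in a string literal.
form = {
    "modify_make": "O'Reilly & Sons",
    "modify_model": 'Model "X"; drop table items; --',
    "modify_comment": "back\\slash",
}
sql, params = build_update(form, "A-1")
conn.execute(sql, params)
row = conn.execute(
    "select make, model, comment from items where inventory_nr = ?", ("A-1",)
).fetchone()
```

The SQL text is still assembled on the fly, so the flexibility of the ad-hoc approach is kept; the difference is that no value is ever spliced into that text.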

With that said...

> 2) Somebody runs your application with a different PYTHONPATH, which
> causes a different (i.e. malicious) urllib module to get loaded, which
> makes urllib.quote() do something you didn't expect.

Someone who can manipulate PYTHONPATH or otherwise add code to the 
runtime environment is already in a position to hose your database, 
independently of escaping-related issues. It's up to the sysadmin or 
user to ensure that their environment is sane, and it's on their head 
if they add broken code to a program's runtime environment.

Lawrence D'Oliveiro wrote:

> I’ve done this sort of thing for MySQL, for HTML and JavaScript (in both
> Python and JavaScript itself), and for Bash. It’s not hard to verify you’ve
> done it correctly. It lets you easily create table-updating code like the
> following, which makes it so easy to update the code to track changes in the
> database structure:
> 
>      sql.cursor.execute \
>       (
>             "update items set "
>         +
>             ", ".join
>                 (
>                     tuple
>                         (
>                             "%(name)s = %(value)s"
>                         %
>                             {
>                                 "name" : field[0],
>                                 "value" : SQLString(Params.getvalue
>                                   (
>                                     "%s[%s]" % (field[1], urllib.quote(modify_id))
>                                   ))
>                             }
>                         for field in
>                             (
>                                 ("class_name", "modify_class"),
>                                 ("make", "modify_make"),
>                                 ("model", "modify_model"),
>                                 ("details", "modify_details"),
>                                 ("serial_nr", "modify_serial"),
>                                 ("inventory_nr", "modify_invent"),
>                                 ("when_purchased", "modify_when_purchased"),
>                                 ... you get the idea ...
>                                 ("location_name", "modify_location"),
>                                 ("comment", "modify_comment"),
>                             )
>                         )
>                 +
>                     (
>                         "last_modified = %d" % int(time.time()),
>                     )
>                 )
>         +
>             " where inventory_nr = %s" % SQLString(modify_id)
>       )

Why would I write this when SQLAlchemy, even without using its ORM 
features, can do it for me? It even uses the placeholder-generating 
strategy I mentioned above, where possible.

Finally, it's worth noting that MySQL is (almost) the only mainstream 
database that uses escaping for parameterization. PostgreSQL, SQL 
Server, Oracle, DB2, and most other databases support parameters 
natively in their communication protocols: parameters aren't injected 
into the query string, but are sent separately and processed separately 
within the DBMS. This neatly avoids encoding-related and 
quoting-related problems entirely, and it means the type of the 
parameter can be preserved if it's useful.
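
A small illustration of the type-preservation point, assuming the standard-library sqlite3 module. SQLite is an in-process library rather than a client/server database, so this shows binding through its API, not a wire protocol, but the effect is analogous: the value reaches the engine with its type intact instead of being flattened into query text. The stored_type helper is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def stored_type(value):
    """Ask the database which type it received for a bound parameter."""
    return conn.execute("select typeof(?)", (value,)).fetchone()[0]

types = {
    "int": stored_type(42),
    "float": stored_type(1.5),
    "text": stored_type("O'Neil"),
    "bytes": stored_type(b"\x00\x01"),
    "none": stored_type(None),
}
```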

-o

Reply angrybaldguy (338) 6/25/2010 2:43:26 AM

In message <roy-30B881.21024824062010@news.panix.com>, Roy Smith wrote:

> 1) Somebody is running your application (or the database server) with
> the locale set to something unexpected.

Locales are under program control, so that won’t happen.

This is why I use UTF-8 encoding for everything.
Reply Lawrence 6/25/2010 3:34:38 AM

In message <2010062422432660794-angrybaldguy@gmailcom>, Owen Jacobson wrote:

> Why would I write this when SQLAlchemy, even without using its ORM
> features, can do it for me?

SQLAlchemy doesn’t seem very flexible. Looking at the code examples 
<http://www.sqlalchemy.org/docs/examples.html>, they’re very procedural: 
build object, then do a string of separate method calls to add data to it. I 
prefer the functional approach, as in my table-update example.
Reply Lawrence 6/25/2010 3:38:50 AM

On Fri, 25 Jun 2010 12:25:56 +1200, Lawrence D'Oliveiro wrote:

> Just been reading this article
> ...
> which says that a lot of security holes are arising these days because
> everybody is concentrating on unit testing of their own particular
> components, with less attention being devoted to overall integration
> testing.
> 
> Fair enough. But it’s disconcerting to see some of the advice being
> offered in the reader comments, like “force everyone to use stored
> procedures”, or “force everyone to use prepared/parametrized
> statements”, “never construct ad-hoc SQL queries” and the like.
> 
> I construct ad-hoc queries all the time. It really isn’t that hard to
> do safely.

Wrong.

Even if you get the quoting absolutely correct (which is a very big "if"),
you have to remember to perform it every time, without exception. And you
need to perform it exactly once. As the program gets more complex,
ensuring that it's done in the correct place, and only there, gets harder.

More generally, as a program gets more complex, "this will work so long as
we do X every time without fail" approaches "this won't work".

> All you have to do is read the documentation—for example,
> <http://dev.mysql.com/doc/refman/5.0/en/string-syntax.html>—and then
> write a routine that takes arbitrary data and turns it into a valid
> string literal, like this
> <http://www.codecodex.com/wiki/Useful_MySQL_Routines#Quoting>.

That's okay. Provided the documentation is accurate. And provided that you
update the escaping algorithm whenever the SQL dialect gets extended, or
you switch to a different back-end, or modify the program. IOW, it's not
even remotely okay.

"Unparsing" data so that you get the correct answer out of a subsequent
parsing step is objectively and obviously the wrong approach. The
correct approach is to skip both the unparsing and parsing steps
entirely.

Formal grammars are a useful way to represent graph-like data structures
in a human-readable and human-editable form. But for creation,
modification and use by a computer, it is invariably preferable to operate
upon the graph directly. Textual formats inherit all of the "issues" which
apply to the underlying data structure, then add a few of their own for
good measure.

> I've done this sort of thing for MySQL, for HTML and JavaScript (in both
> Python and JavaScript itself), and for Bash.

And, of course, you're convinced that you got it right every time. That
attitude alone should set alarm bells ringing for anyone who's worked in
this industry for more than five minutes.

Reply nobody (4831) 6/25/2010 6:47:47 AM

Nobody <nobody@nowhere.com> writes:
> More generally, as a program gets more complex, "this will work so long as
> we do X every time without fail" approaches "this won't work".

QOTW
Reply Paul 6/25/2010 7:09:44 AM

On Fri, 2010-06-25, Lawrence D'Oliveiro wrote:
> Just been reading this article
> <http://www.theregister.co.uk/2010/06/23/xxs_sql_injection_attacks_testing_remedy/>
> which says that a lot of security holes are arising these days because
> everybody is concentrating on unit testing of their own particular
> components, with less attention being devoted to overall integration
> testing.

I don't do SQL and I don't even understand the terminology properly
... but the discussion around it bothers me.

Do those people really do this?
- accept untrusted user data
- try to sanitize the data (escaping certain characters etc)
- turn this data into executable code (SQL)
- executing it

Like the example in the article

  SELECT * FROM hotels WHERE city = '<untrusted>';

If so, it's isomorphic with doing os.popen('zcat -f %s' % untrusted)
in Python (at least on Unix, where 'zcat ...' is executed as a shell
script).

I thought it was well-known that the solution is *not* to try to
sanitize the input -- it's to switch to an interface which doesn't
involve generating an intermediate executable.  In the Python example,
that would be something like os.popen2(['zcat', '-f', '--', untrusted]).
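
A sketch of the same idea with the subprocess module, which replaced os.popen2 (removed in Python 3). To stay portable it does not call zcat; the child here is a stand-in Python one-liner that prints the argument it received. The hostile filename is made up for illustration.

```python
import subprocess
import sys

# A hostile "filename" that would run a second command if a shell parsed it.
untrusted = "notes.txt; echo INJECTED"

# Stand-in child program: prints the first argument it was given.
child = [sys.executable, "-c", "import sys; print(sys.argv[1])"]

# Argument-list form: no shell is involved, so the whole string
# arrives in the child as a single argument.
result = subprocess.run(
    child + [untrusted], capture_output=True, text=True, check=True
)
received = result.stdout.rstrip("\n")
```

Because no shell ever sees the string, there is nothing to escape and nothing to forget to escape.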

Am I missing something?  If not, I can go back to sleep -- and keep
avoiding SQL and web programming like the plague until that community
has entered the 21st century.

/Jorgen

-- 
  // Jorgen Grahn <grahn@  Oo  o.   .  .
\X/     snipabacken.se>   O  o   .
Reply Jorgen 6/25/2010 12:15:08 PM

On 6/25/2010 12:09 AM, Paul Rubin wrote:
> Nobody<nobody@nowhere.com>  writes:
>> More generally, as a program gets more complex, "this will work so long as
>> we do X every time without fail" approaches "this won't work".

    Yes.  I was just looking at some of my own code.  Out of about 100
SQL statements, I'd used manual escaping once, in code where the WHERE
clause is built up depending on what information is available for the
search.  It's done properly, using "MySQLdb.escape_string(s)", which
is what's used inside "cursor.execute".  Looking at the code, I
now realize that it would have been better to
add sections to the SQL string with standard escapes, and at the same
time, append the key items to a list.  Then the list can be
converted to a tuple for submission to "cursor.execute".
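
That approach might look roughly like the following sketch. It assumes the standard-library sqlite3 module in place of MySQLdb (so the placeholder is `?` rather than `%s`), and the table, columns, and build_search helper are hypothetical.

```python
import sqlite3

def build_search(make=None, model=None, location=None):
    """Build a SELECT whose WHERE clause depends on which criteria were given."""
    clauses = []
    params = []
    criteria = (("make", make), ("model", model), ("location_name", location))
    for column, value in criteria:
        if value is not None:
            clauses.append("%s = ?" % column)  # SQL fragment with a placeholder
            params.append(value)               # matching value, kept separately
    sql = "select inventory_nr from items"
    if clauses:
        sql += " where " + " and ".join(clauses)
    return sql + " order by inventory_nr", tuple(params)

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table items (inventory_nr text, make text, model text, location_name text)"
)
conn.executemany(
    "insert into items values (?, ?, ?, ?)",
    [
        ("A-1", "Acme", "X1", "lab"),
        ("A-2", "Acme", "X2", "store"),
        ("B-1", "O'Brien", "X1", "lab"),
    ],
)

sql, params = build_search(make="Acme", location="lab")
matches = [row[0] for row in conn.execute(sql, params)]
```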

				John Nagle

Reply John 6/25/2010 6:58:51 PM

On Fri, 25 Jun 2010 12:15:08 +0000, Jorgen Grahn wrote:

> I don't do SQL and I don't even understand the terminology properly
> ... but the discussion around it bothers me.
> 
> Do those people really do this?

Yes. And then some.

Among web developers, the median level of programming knowledge amounts to
the first 3 chapters of "Learn PHP in 7 Days".

It doesn't help that the guy who wrote PHP itself wasn't much better.

> - accept untrusted user data
> - try to sanitize the data (escaping certain characters etc)
> - turn this data into executable code (SQL)
> - executing it
> 
> Like the example in the article
> 
>   SELECT * FROM hotels WHERE city = '<untrusted>';

Yep. Search the BugTraq archives for "SQL injection". And most of those
are for widely-deployed middleware; the zillions of bespoke site-specific
scripts are likely to be worse.

Also: http://xkcd.com/327/

> I thought it was well-known that the solution is *not* to try to
> sanitize the input

Well known by anyone with a reasonable understanding of the principles of
programming, but somewhat less well known by the other 98% of web
developers.

> Am I missing something?

There's a world of difference between a skilled chef and the people
flipping burgers for a minimum wage. And between a chartered civil
engineer and the people laying the asphalt. And between what you
probably consider a programmer and the people doing most web development.

> If not, I can go back to sleep -- and keep
> avoiding SQL and web programming like the plague until that community
> has entered the 21st century.

Don't hold your breath.

Of course, there's no fundamental reason why you can't apply sound
practices to web development. Well, other than the fact that you're
competing against an infinite number of (code-) monkeys for lowest-bidder
contracts.

To be fair, it isn't actually limited to web developers. I've seen the
following in scientific code written in C (or, more likely, ported to C
from Fortran) for Unix:

	sprintf(buff, "rm -f %s", filename);
	system(buff);

Why bother learning the Unix API when you already know system()?

Reply Nobody 6/25/2010 11:17:47 PM

On Fri, Jun 25, 2010 at 5:17 PM, Nobody <nobody@nowhere.com> wrote:
> To be fair, it isn't actually limited to web developers. I've seen the
> following in scientific code written in C (or, more likely, ported to C
> from Fortran) for Unix:
>
>         sprintf(buff, "rm -f %s", filename);
>         system(buff);

Tsk, tsk.  And it's so easy to fix, too:

    #define BUFSIZE 1000000
    char buff[BUFSIZE];
    if (snprintf(buff, BUFSIZE, "rm -f %s", filename) >= BUFSIZE) {
        printf("No buffer overflow for you!\n");
    } else {
        system(buff);
    }

There, that's much more secure.
Reply ian.g.kelly (1155) 6/26/2010 12:25:04 AM

In message <pan.2010.06.25.06.47.34.297000@nowhere.com>, Nobody wrote:

> On Fri, 25 Jun 2010 12:25:56 +1200, Lawrence D'Oliveiro wrote:
> 
>> I construct ad-hoc queries all the time. It really isn’t that hard to
>> do safely.
> 
> Wrong.
> 
> Even if you get the quoting absolutely correct (which is a very big "if"),
> you have to remember to perform it every time, without exception.
> 
> More generally, as a program gets more complex, "this will work so long as
> we do X every time without fail" approaches "this won't work".

That’s a content-free claim. Why? Because it applies equally to everything. 
Replace “quoting” with something like “arithmetic”, and you’ll see what I 
mean:

    Even if you get the arithmetic absolutely correct (which is a very big
    "if"), you have to remember to perform it every time, without exception.

    More generally, as a program gets more complex, "this will work so long
    as we do X every time without fail" approaches "this won't work".

From which we can conclude, according to your logic, that one shouldn’t be 
doing arithmetic.

Next time, try to avoid fallacious arguments.

> And you need to perform it exactly once. As the program gets more complex,
> ensuring that it's done in the correct place, and only there, gets harder.

Nonsense. It only needs to be done at the boundary to the appropriate 
component (MySQL, HTML, JavaScript, whatever). That’s the only place which 
needs to have knowledge of what’s on the other side. Everything else can 
work with arbitrary data without having to worry about such things.

Go back to my example, and you’ll see this: the original updates two dozen 
different fields in a database table, yet it only needs two calls to 
SQLString: one deals with all the fields requiring updating, while the other 
one deals with the key-matching. That’s it. Instead of two dozen different 
places needing checking, you only have two.

That’s what “maintainability” is all about.
Reply ldo (2144) 6/26/2010 12:40:41 AM

In article <mailman.2117.1277511935.32709.python-list@python.org>,
 Ian Kelly <ian.g.kelly@gmail.com> wrote:

> On Fri, Jun 25, 2010 at 5:17 PM, Nobody <nobody@nowhere.com> wrote:
> > To be fair, it isn't actually limited to web developers. I've seen the
> > following in scientific code written in C (or, more likely, ported to C
> > from Fortran) for Unix:
> >
> >         sprintf(buff, "rm -f %s", filename);
> >         system(buff);
> 
> Tsk, tsk.  And it's so easy to fix, too:
> 
>     #define BUFSIZE 1000000
>     char buff[BUFSIZE];
>     if (snprintf(buff, BUFSIZE, "rm -f %s", filename) >= BUFSIZE) {
>         printf("No buffer overflow for you!\n");
>     } else {
>         system(buff);
>     }
> 
> There, that's much more secure.

I recently fixed a bug in some production code.  The programmer was 
careful to use snprintf() to avoid buffer overflows.  The only problem 
is, he wrote something along the lines of:

snprintf(buf, strlen(foo), foo);

I'm sure the code got reviewed originally, and probably looked at dozens 
of times over the years.  Nobody caught the problem until we ran a 
static code analysis tool (Coverity) over it.

To bring this back to something remotely Python related, the point of 
all this is that security is hard.  A lot of the security best practices 
(such as "don't compose SQL queries on the fly with externally tainted 
strings") exist because they address ways that people have gotten burned 
in the past.  It is foolish to think that you're smarter than everybody 
else and have thought of every possibility to avoid getting burned by 
doing the things that have gotten other people in trouble.
Reply Roy 6/26/2010 12:43:51 AM

In message <slrni297ec.1m5.grahn+nntp@frailea.sa.invalid>, Jorgen Grahn 
wrote:

> I thought it was well-known that the solution is *not* to try to
> sanitize the input -- it's to switch to an interface which doesn't
> involve generating an intermediate executable.  In the Python example,
> that would be something like os.popen2(['zcat', '-f', '--', untrusted]).

That’s what I mean. Why do people consider input sanitization so hard?
Reply Lawrence 6/26/2010 12:49:09 AM

On 2010-06-25 20:49:09 -0400, Lawrence D'Oliveiro said:

> In message <slrni297ec.1m5.grahn+nntp@frailea.sa.invalid>, Jorgen Grahn
> wrote:
> 
>> I thought it was well-known that the solution is *not* to try to
>> sanitize the input -- it's to switch to an interface which doesn't
>> involve generating an intermediate executable.  In the Python example,
>> that would be something like os.popen2(['zcat', '-f', '--', untrusted]).
> 
> That’s what I mean. Why do people consider input sanitization so hard?

It's not hard. It's just begging for a visit from the fuckup fairy.

-o

Reply Owen 6/26/2010 2:56:02 AM

On 06/25/2010 07:49 PM, Lawrence D'Oliveiro wrote:
>> In the Python example, that would be something like
>> os.popen2(['zcat', '-f', '--', untrusted]).
>
> That’s what I mean. Why do people consider input sanitization
> so hard?

It's hard because it requires thinking.  Sadly, many of the 
people I know who call themselves programmers couldn't code their 
way out of a paper bag, let alone think logically about the 
security implications of their code.[1]

-tkc


[1] much of which ends up being cargo-cult programming, 
cut-n-paste'd from Google search-results.





Reply Tim 6/26/2010 3:29:23 AM

On Thu, Jun 24, 2010 at 9:38 PM, Lawrence D'Oliveiro
<ldo@geek-central.gen.new_zealand> wrote:
> In message <2010062422432660794-angrybaldguy@gmailcom>, Owen Jacobson wrote:
>
>> Why would I write this when SQLAlchemy, even without using its ORM
>> features, can do it for me?
>
> SQLAlchemy doesn’t seem very flexible. Looking at the code examples
> <http://www.sqlalchemy.org/docs/examples.html>, they’re very procedural:
> build object, then do a string of separate method calls to add data to it. I
> prefer the functional approach, as in my table-update example.

Your example from the first post of the thread rewritten using sqlalchemy:

conn.execute(
    items.update()
         .where(items.c.inventory_nr == modify_id)
         .values(
             dict(
                  (field[0], Params.getvalue("%s[%s]" % (field[1], urllib.quote(modify_id))))
                  for field in [
                      (items.c.class_name, "modify_class"),
                      (items.c.make, "modify_make"),
                      (items.c.model, "modify_model"),
                      (items.c.details, "modify_details"),
                      (items.c.serial_nr, "modify_serial"),
                      (items.c.inventory_nr, "modify_invent"),
                      (items.c.when_purchased, "modify_when_purchased"),
                      ... you get the idea ...
                      (items.c.location_name, "modify_location"),
                      (items.c.comment, "modify_comment"),
                  ]
                 )
                )
         .values(last_modified = time.time())
)

Doesn't seem any less flexible to me, plus you don't have to worry
about calling your SQLString function at all.

Cheers,
Ian
0
Reply Ian 6/26/2010 6:33:16 AM

On 2010-06-25 19:49 , Lawrence D'Oliveiro wrote:
> In message<slrni297ec.1m5.grahn+nntp@frailea.sa.invalid>, Jorgen Grahn
> wrote:
>
>> I thought it was well-known that the solution is *not* to try to
>> sanitize the input -- it's to switch to an interface which doesn't
>> involve generating an intermediate executable.  In the Python example,
>> that would be something like os.popen2(['zcat', '-f', '--', untrusted]).
>
> That’s what I mean. Why do people consider input sanitization so hard?

It's not hard per se; it's just repetitive, prone to the occasional mistake, 
and, frankly, really boring. When faced with things like that, we do what we do 
everywhere else in programming: wrap up the repetitive bits into a simpler 
library API and use that everywhere. Wrapping up the escaping code into 
SQLString is a step in that direction. However, the standard SQL 
parameterization in most of the DB protocols or SQLAlchemy's query construction 
removes even more repetition and unnecessary typing. There's just no point in 
not using it.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco

Reply Robert 6/26/2010 7:39:03 AM

On Sat, 26 Jun 2010 12:40:41 +1200, Lawrence D'Oliveiro wrote:

>>> I construct ad-hoc queries all the time. It really isn’t that hard to
>>> do safely.
>> 
>> Wrong.
>> 
>> Even if you get the quoting absolutely correct (which is a very big "if"),
>> you have to remember to perform it every time, without exception.
>> 
>> More generally, as a program gets more complex, "this will work so long as
>> we do X every time without fail" approaches "this won't work".
> 
> That’s a content-free claim. Why? Because it applies equally to everything. 
> Replace “quoting” with something like “arithmetic”, and you’ll
> see what I mean:

If you omit the arithmetic, the program is likely to fail in very
obvious ways. Escaping is "almost" an identity function, which makes it
far more likely that omission or repetition will go unnoticed.
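
To make the "almost an identity function" point concrete, here is a small sketch. It uses HTML escaping from the standard library rather than SQL escaping, purely because html.escape and html.unescape are readily available; the sample strings are made up.

```python
import html

plain = "hello world"
tricky = "Tom & Jerry <cartoon>"

once = html.escape(tricky)   # escaped exactly once, as intended
twice = html.escape(once)    # escaped a second time by mistake

# On ordinary input, escaping changes nothing, so a missing or
# repeated call is invisible in casual testing.
unchanged_on_plain_input = html.escape(plain) == plain

# Only input containing special characters reveals the mistake.
round_trip_once = html.unescape(once) == tricky
round_trip_twice = html.unescape(twice) == tricky
```

Here unchanged_on_plain_input and round_trip_once are true, while round_trip_twice is false: the double escaping only shows up once someone supplies an ampersand or an angle bracket.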

>> And you need to perform it exactly once. As the program gets more complex,
>> ensuring that it's done in the correct place, and only there, gets harder.
> 
> Nonsense. It only needs to be done at the boundary to the appropriate 
> component (MySQL, HTML, JavaScript, whatever).

That assumes that you have a well-defined "boundary", which isn't
necessarily the case.

In any case, you're still trying to make arguments about whether it's easy
or hard to get it right, which completely misses the point. Eliminating
the escaping entirely makes it impossible to get it wrong.

Reply Nobody 6/26/2010 10:49:18 AM

On Fri, 25 Jun 2010 20:43:51 -0400, Roy Smith wrote:

> To bring this back to something remotely Python related, the point of 
> all this is that security is hard.

Oh, this isn't solely a security issue.

Ask anyone with a surname like O'Neil, O'Connor, O'Leary, etc; they've
probably broken a lot of web apps *without even trying*.
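
A quick sketch of how that happens, assuming the standard-library sqlite3 module and a hypothetical guests table. No attack is involved; an ordinary surname is enough to break the interpolated query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table guests (surname text)")

surname = "O'Neil"

# Naive string interpolation: the apostrophe ends the literal early,
# leaving invalid SQL behind it.
try:
    conn.execute("insert into guests (surname) values ('%s')" % surname)
    naive_failed = False
except sqlite3.OperationalError:
    naive_failed = True

# Placeholder form: the value is bound separately and stored unchanged.
conn.execute("insert into guests (surname) values (?)", (surname,))
stored = [row[0] for row in conn.execute("select surname from guests")]
```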

Reply Nobody 6/26/2010 11:04:38 AM

In article <2010062522560231540-angrybaldguy@gmailcom>,
 Owen Jacobson <angrybaldguy@gmail.com> wrote:

> It's not hard. It's just begging for a visit from the fuckup fairy.

QOTD?
Reply Roy 6/26/2010 11:59:23 AM

On Sat, 26 Jun 2010 12:04:38 +0100
Nobody <nobody@nowhere.com> wrote:
> Ask anyone with a surname like O'Neil, O'Connor, O'Leary, etc; they've
> probably broken a lot of web apps *without even trying*.

At least it isn't a problem with the first name field.  Oh, wait...

-- 
D'Arcy J.M. Cain <darcy@druid.net>         |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.
Reply D 6/26/2010 12:07:11 PM

In message <mailman.2123.1277522976.32709.python-list@python.org>, Tim Chase 
wrote:

> On 06/25/2010 07:49 PM, Lawrence D'Oliveiro wrote:
> ...

I see that you published my unobfuscated e-mail address on USENET for all to
see. I obfuscated it for a reason, to keep the spammers away. I'm assuming
this was a momentary lapse of judgement, for which I expect an apology.
Otherwise, it becomes grounds for an abuse complaint to your ISP.

Reply Lawrence 6/27/2010 2:21:50 AM

In message <mailman.2126.1277534032.32709.python-list@python.org>, Ian Kelly 
wrote:

> Your example from the first post of the thread rewritten using sqlalchemy:
> 
> conn.execute(
>     items.update()
>          .where(items.c.inventory_nr == modify_id)
>          .values(
>              dict(
>                   (field[0], Params.getvalue("%s[%s]" % (field[1], urllib.quote(modify_id))))
>                   for field in [
>                       (items.c.class_name, "modify_class"),
>                       (items.c.make, "modify_make"),
>                       (items.c.model, "modify_model"),
>                       (items.c.details, "modify_details"),
>                       (items.c.serial_nr, "modify_serial"),
>                       (items.c.inventory_nr, "modify_invent"),
>                       (items.c.when_purchased, "modify_when_purchased"),
>                       ... you get the idea ...
>                       (items.c.location_name, "modify_location"),
>                       (items.c.comment, "modify_comment"),
>                   ]
>                  )
>                 )
>          .values(last_modified = time.time())
> )
> 
> Doesn't seem any less flexible to me, plus you don't have to worry
> about calling your SQLString function at all.

Except I only needed two calls to SQLString, while you need two dozen 
instances of that repetitive items.c boilerplate.

As a human, being repetitive is not my job. That’s what the computer is for.
Reply Lawrence 6/27/2010 2:31:59 AM

In message <2010062522560231540-angrybaldguy@gmailcom>, Owen Jacobson wrote:

> It's not hard. It's just begging for a visit from the fuckup fairy.

That’s the same fallacious argument I pointed out earlier.
0
Reply Lawrence 6/27/2010 2:33:57 AM

In message <pan.2010.06.26.10.49.02.156000@nowhere.com>, Nobody wrote:

> On Sat, 26 Jun 2010 12:40:41 +1200, Lawrence D'Oliveiro wrote:
> 
>>>> I construct ad-hoc queries all the time. It really isn’t that hard to
>>>> do safely.
>>> 
>>> Wrong.
>>> 
>>> Even if you get the quoting absolutely correct (which is a very big
>>> "if"), you have to remember to perform it every time, without exception.
>>> 
>>> More generally, as a program gets more complex, "this will work so long
>>> as we do X every time without fail" approaches "this won't work".
>> 
>> That’s a content-free claim. Why? Because it applies equally to
>> everything. Replace “quoting” with something like “arithmetic”, and
>> you’ll see what I mean:
> 
> If you omit the arithmetic, the program is likely to fail in very
> obvious ways. Escaping is "almost" an identity function, which makes it
> far more likely that omission or repetition will go unnoticed.

Maybe you need to go back and reread my original posting. The SQLString 
routine doesn’t just escape special characters, it generates a full MySQL 
string literal, complete with quotation marks. That makes it rather more 
likely for a syntax error to occur if I forget to use it, don’t you think?

>>> And you need to perform it exactly once. As the program gets more
>>> complex, ensuring that it's done in the correct place, and only there,
>>> gets harder.
>> 
>> Nonsense. It only needs to be done at the boundary to the appropriate
>> component (MySQL, HTML, JavaScript, whatever).
> 
> That assumes that you have a well-defined "boundary", which isn't
> necessarily the case.

It’s ALWAYS the case.

> In any case, you're still trying to make arguments about whether it's easy
> or hard to get it right, which completely misses the point. Eliminating
> the escaping entirely makes it impossible to get it wrong.

Except nobody has yet shown an alternative which is easier to get right.
0
Reply Lawrence 6/27/2010 2:36:10 AM

In article <i06cju$qqa$2@lust.ihug.co.nz>,
Lawrence D'Oliveiro  <ldo@geek-central.gen.new_zealand> wrote:
>In message <mailman.2123.1277522976.32709.python-list@python.org>, Tim Chase 
>wrote:
>>
>> On 06/25/2010 07:49 PM, Lawrence D'Oliveiro wrote:
>> ...
>
>I see that you published my unobfuscated e-mail address on USENET for all to
>see. I obfuscated it for a reason, to keep the spammers away. I'm assuming
>this was a momentary lapse of judgement, for which I expect an apology.
>Otherwise, it becomes grounds for an abuse complaint to your ISP.

You are double daft.  First, I completely disagree with you about it
being abuse; from my POV anyone posting to Usenet should do so with an
unobfuscated address.  Secondly, you are wrong about Tim publishing your
address unless you intended to follow up to a completely different post,
and you owe *him* an apology for a false accusation.
-- 
Aahz (aahz@pythoncraft.com)           <*>         http://www.pythoncraft.com/

"If you don't know what your program is supposed to do, you'd better not
start writing it."  --Dijkstra
0
Reply aahz 6/27/2010 2:53:04 AM

On Sat, Jun 26, 2010 at 7:21 PM, Lawrence D'Oliveiro <> wrote:
> In message <mailman.2123.1277522976.32709.python-list@python.org>, Tim Chase
> wrote:
>
>> On 06/25/2010 07:49 PM, Lawrence D'Oliveiro wrote:
>> ...
>
> I see that you published my unobfuscated e-mail address on USENET for all to
> see. I obfuscated it for a reason, to keep the spammers away. I'm assuming
> this was a momentary lapse of judgement, for which I expect an apology.
> Otherwise, it becomes grounds for an abuse complaint to your ISP.

Will you give it a rest already with these threatening messages? Why
are you still using this only-partially-obfuscated address with USENET
anyway? This has happened twice before, it will doubtless happen yet
again. Just use an /entirely invalid/ From address like some other
posters do.

I can't believe you have a form letter for this...

Regards,
Chris
--
Public addresses eventually going bad is a *fact of life*; plan ahead
accordingly.
0
Reply Chris 6/27/2010 2:55:49 AM

On 06/26/2010 09:21 PM, Lawrence D'Oliveiro wrote:
> In message<mailman.2123.1277522976.32709.python-list@python.org>, Tim Chase
> wrote:
>
>> On 06/25/2010 07:49 PM, Lawrence D'Oliveiro wrote:
>> ...
>
> I see that you published my unobfuscated e-mail address on USENET for all to
> see. I obfuscated it for a reason, to keep the spammers away. I'm assuming
> this was a momentary lapse of judgement, for which I expect an apology.
> Otherwise, it becomes grounds for an abuse complaint to your ISP.

I'm sorry...you've got your knickers in a knot?  That your spam 
filters seem to be insufficient?  That you don't have a custom 
throwaway address for such public dialogs?  For preventing an 
"undeliverable" bounce message that your bogus address would have 
caused (if your mail provider is RFC-compliant; though your mail 
provider may kindly be breaking RFC by disabling "undeliverable" 
responses to prevent back-scatter spam)?

Is the abuse charge "waah, he replied to my actual email rather 
than the false one I spoofed"?

I'm not sure an abuse complaint to my ISP would net you anything 
since the exact out-bound headers show nothing abusive, only the 
correcting of an invalid TLD to prevent a bounce (and a distinct 
lack of USENET references in the original message that went to 
you and CC'ed python-list@python.org).

Having regularly used python.list@tim.thechases.com unobfuscated 
for easily over 5 years, the spam to this address has been almost 
negligible (or so effectively dealt with by Thunderbird's spam 
filters that I've never noticed it).

-tkc



0
Reply Tim 6/27/2010 3:23:53 AM

On 6/26/10 7:21 PM, Lawrence D'Oliveiro wrote:
> In message<mailman.2123.1277522976.32709.python-list@python.org>, Tim Chase
> wrote:
>
>> On 06/25/2010 07:49 PM, Lawrence D'Oliveiro wrote:
>> ...
>
> I see that you published my unobfuscated e-mail address on USENET for all to
> see. I obfuscated it for a reason, to keep the spammers away. I'm assuming
> this was a momentary lapse of judgement, for which I expect an apology.
> Otherwise, it becomes grounds for an abuse complaint to your ISP.

Wow.

Way to be a douchebag.

I was going to say something about the realities of this forum and its 
dual-nature and conflicting netiquette and on. But I decided it really 
just had no point.

So, I'm left with: wow. You kinda suck*, man.

-- 

    ... Stephen Hansen
    ... Also: Ixokai
    ... Mail: me+list/python (AT) ixokai (DOT) io
    ... Blog: http://meh.ixokai.io/

P.S. *Then again, I'm fairly sure anytime someone has a form letter 
which contains the words, "I expect an apology", there's some personal 
suck going on.

0
Reply Stephen 6/27/2010 3:27:52 AM

Lawrence D'Oliveiro <ldo@geek-central.gen.new_zealand> writes:

> I see that you published my unobfuscated e-mail address on USENET for
> all to see. I obfuscated it for a reason, to keep the spammers away.
> I'm assuming this was a momentary lapse of judgement, for which I
> expect an apology. Otherwise, it becomes grounds for an abuse
> complaint to your ISP.

Er? On what grounds would you complain to their ISP? You might consider
the person rude, but that's not grounds for an abuse complaint. What
part of their ISP's terms of service do you think they have abused by
de-obfuscating information you freely posted to the internet?

-- 
 \           “If you do not trust the source do not use this program.” |
  `\                                —Microsoft Vista security dialogue |
_o__)                                                                  |
Ben Finney
0
Reply Ben 6/27/2010 3:50:41 AM

On Sat, Jun 26, 2010 at 8:50 PM, Ben Finney <ben+python@benfinney.id.au> wrote:
> Lawrence D'Oliveiro <ldo@geek-central.gen.new_zealand> writes:
>
>> I see that you published my unobfuscated e-mail address on USENET for
>> all to see. I obfuscated it for a reason, to keep the spammers away.
>> I'm assuming this was a momentary lapse of judgement, for which I
>> expect an apology. Otherwise, it becomes grounds for an abuse
>> complaint to your ISP.
>
> Er? On what grounds would you complain to their ISP? You might consider
> the person rude, but that's not grounds for an abuse complaint. What
> part of their ISP's terms of service do you think they have abused by
> de-obfuscating information you freely posted to the internet?

I routinely post my email on this and other mailing lists and have yet
to get a piece of spam in my inbox as a result. I suggest you get a
better spam filter rather than expecting the rest of the universe to
annoy itself for your benefit.

Geremy Condra
0
Reply geremy 6/27/2010 4:04:45 AM

In message <pan.2010.06.26.11.04.22.328000@nowhere.com>, Nobody wrote:

> Ask anyone with a surname like O'Neil, O'Connor, O'Leary, etc; they've
> probably broken a lot of web apps *without even trying*.

Last I checked, I couldn’t post comments on freedom-to-tinker.com.
0
Reply Lawrence 6/27/2010 4:15:18 AM

In message <roy-854954.20435125062010@news.panix.com>, Roy Smith wrote:

> I recently fixed a bug in some production code.  The programmer was
> careful to use snprintf() to avoid buffer overflows.  The only problem
> is, he wrote something along the lines of:
> 
> snprintf(buf, strlen(foo), foo);

A long while ago I came up with this macro:

    #define Descr(v) &v, sizeof v

making the correct version of the above become

    snprintf(Descr(buf), foo);

0
Reply Lawrence 6/27/2010 4:17:39 AM

On Sat, Jun 26, 2010 at 8:31 PM, Lawrence D'Oliveiro
<ldo@geek-central.gen.new_zealand> wrote:
> Except I only needed two calls to SQLString, while you need two dozen
> instances of that repetitive items.c boilerplate.
>
> As a human, being repetitive is not my job. That’s what the computer is
> for.

Then why do you have every parameter prefixed with "modify_"? 8-)

But seriously, if that bothers you, then fold the "items.c." portion
into the generator expression with a getattr call.  Or just change
them back to the same strings you had originally, and sqlalchemy will
be just as happy to accept them as-is.
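
A minimal sketch of the getattr fold, assuming a stand-in object in place of sqlalchemy's `items.c` so that it runs without a database. The names `columns`, `FIELDS`, `column_values` and the sample form data are hypothetical; with a real table the getattr call would be made on `items.c` instead.

```python
from types import SimpleNamespace

# Stand-in for sqlalchemy's `items.c` column collection (an assumption made so
# the sketch runs on its own); real Column objects would live here.
columns = SimpleNamespace(
    class_name="items.class_name",
    make="items.make",
    model="items.model",
)

# (column name, form-field prefix) pairs, written once as plain strings.
FIELDS = [
    ("class_name", "modify_class"),
    ("make", "modify_make"),
    ("model", "modify_model"),
]


def column_values(form, record_id):
    """Map each column object to the value submitted for that record."""
    return dict(
        (getattr(columns, name), form["%s[%s]" % (prefix, record_id)])
        for name, prefix in FIELDS
    )


# Hypothetical form submission for record 42.
form = {
    "modify_class[42]": "laptop",
    "modify_make[42]": "Acme",
    "modify_model[42]": "X1",
}
values = column_values(form, "42")
```

The column prefix now appears once, inside the generator expression, instead of on every line of the field list.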

Cheers,
Ian
0
Reply Ian 6/27/2010 7:31:17 AM

On Sun, Jun 27, 2010 at 9:47 AM, Lawrence D'Oliveiro
<ldo@geek-central.gen.new_zealand> wrote:
> In message <roy-854954.20435125062010@news.panix.com>, Roy Smith wrote:
>
>> I recently fixed a bug in some production code.  The programmer was
>> careful to use snprintf() to avoid buffer overflows.  The only problem
>> is, he wrote something along the lines of:
>>
>> snprintf(buf, strlen(foo), foo);
>
> A long while ago I came up with this macro:
>
>     #define Descr(v) &v, sizeof v
>
> making the correct version of the above become
>
>     snprintf(Descr(buf), foo);
>

Not quite right.  If buf is a char array, as suggested by the use of
sizeof, then you're not passing a char* to snprintf.  You need to lose
the & in your macro.

--
regards,
kushal
0
Reply python2058 (92) 6/27/2010 8:15:40 AM

In message <mailman.2184.1277626565.32709.python-list@python.org>, Kushal 
Kumaran wrote:

> On Sun, Jun 27, 2010 at 9:47 AM, Lawrence D'Oliveiro
> <ldo@geek-central.gen.new_zealand> wrote:
>
>> In message <roy-854954.20435125062010@news.panix.com>, Roy Smith wrote:
>>
>>> I recently fixed a bug in some production code.  The programmer was
>>> careful to use snprintf() to avoid buffer overflows.  The only problem
>>> is, he wrote something along the lines of:
>>>
>>> snprintf(buf, strlen(foo), foo);
>>
>> A long while ago I came up with this macro:
>>
>> #define Descr(v) &v, sizeof v
>>
>> making the correct version of the above become
>>
>> snprintf(Descr(buf), foo);
> 
> Not quite right.  If buf is a char array, as suggested by the use of
> sizeof, then you're not passing a char* to snprintf.

What am I passing, then?
0
Reply Lawrence 6/27/2010 11:46:36 AM

In message <mailman.2183.1277623909.32709.python-list@python.org>, Ian Kelly 
wrote:

> On Sat, Jun 26, 2010 at 8:31 PM, Lawrence D'Oliveiro
> <ldo@geek-central.gen.new_zealand> wrote:
>
>> Except I only needed two calls to SQLString, while you need two dozen
>> instances of that repetitive items.c boilerplate.
>>
>> As a human, being repetitive is not my job. That’s what the computer is
>> for.
> 
> Then why do you have every parameter prefixed with "modify_"? 8-)

Touché :). Actually it’s because the same form can be used to add a new 
record to the table, so there’s a separate set of input fields for that.

> But seriously, if that bothers you, then fold the "items.c." portion
> into the generator expression with a getattr call.  Or just change
> them back to the same strings you had originally, and sqlalchemy will
> be just as happy to accept them as-is.

All this trouble, and it only gets rid of 2 of the 3 instances of data-
escaping in the example.
0
Reply Lawrence 6/27/2010 11:51:16 AM

On Sun, 27 Jun 2010 14:36:10 +1200, Lawrence D'Oliveiro wrote:

>> In any case, you're still trying to make arguments about whether it's easy
>> or hard to get it right, which completely misses the point. Eliminating
>> the escaping entirely makes it impossible to get it wrong.
> 
> Except nobody has yet shown an alternative which is easier to get right.

For SQL, use stored procedures or prepared statements. For HTML/XML, use a
DOM (or similar) interface.
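
A minimal sketch of both suggestions using only the Python standard library. It assumes sqlite3 as a stand-in for whatever database is in use (its placeholder is `?`; other DB-API drivers use different placeholder styles) and xml.etree.ElementTree as the DOM-like interface. The table name and the sample value are made up for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

untrusted = "O'Neil <b>& sons</b>"

# SQL: the statement text is fixed; the value travels separately as a parameter.
conn = sqlite3.connect(":memory:")
conn.execute("create table hotels (city text)")
conn.execute("insert into hotels (city) values (?)", (untrusted,))
stored = conn.execute(
    "select city from hotels where city = ?", (untrusted,)
).fetchone()[0]

# XML/HTML: build a tree and let the serializer take care of the markup.
cell = ET.Element("td")
cell.text = untrusted
markup = ET.tostring(cell, encoding="unicode")
```

In neither half does the calling code quote or escape anything itself; the value is handed over as data and the library decides how to represent it.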
0
Reply Nobody 6/27/2010 1:55:23 PM

On Sat, 2010-06-26, Lawrence D'Oliveiro wrote:
> In message <slrni297ec.1m5.grahn+nntp@frailea.sa.invalid>, Jorgen Grahn 
> wrote:
>
>> I thought it was well-known that the solution is *not* to try to
>> sanitize the input -- it's to switch to an interface which doesn't
>> involve generating an intermediate executable.  In the Python example,
>> that would be something like os.popen2(['zcat', '-f', '--', untrusted]).
>
> That’s what I mean. Why do people consider input sanitization so hard?

I'm not sure you understood me correctly, because I advocate
*not* doing input sanitization. Hard or not -- I don't want to know,
because I don't want to do it.
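
A minimal sketch of that approach with the subprocess module, which supersedes os.popen2 in current Python. The child process here is the Python interpreter echoing its argument, an assumption made so the example runs anywhere; a real program would name the tool it needs (zcat in the earlier example) in the argument list.

```python
import subprocess
import sys

untrusted = "report.gz; echo pwned"

# The arguments are passed as a list, so no shell ever parses the untrusted
# text; the child receives it as exactly one argument.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", untrusted],
    capture_output=True,
    text=True,
    check=True,
)
echoed = result.stdout.strip()
```

The semicolon never reaches a shell, so there is nothing to sanitize.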

/Jorgen

-- 
  // Jorgen Grahn <grahn@  Oo  o.   .  .
\X/     snipabacken.se>   O  o   .
0
Reply Jorgen 6/27/2010 7:17:54 PM

On Fri, 2010-06-25, Nobody wrote:
> On Fri, 25 Jun 2010 12:15:08 +0000, Jorgen Grahn wrote:
>
>> I don't do SQL and I don't even understand the terminology properly
>> ... but the discussion around it bothers me.
>> 
>> Do those people really do this?
>
> Yes. And then some.
>
> Among web developers, the median level of programming knowledge amounts to
> the first 3 chapters of "Learn PHP in 7 Days".
>
> It doesn't help that the guy who wrote PHP itself wasn't much better.
>
>> - accept untrusted user data
>> - try to sanitize the data (escaping certain characters etc)
>> - turn this data into executable code (SQL)
>> - executing it
>> 
>> Like the example in the article
>> 
>>   SELECT * FROM hotels WHERE city = '<untrusted>';
>
> Yep. Search the BugTraq archives for "SQL injection". And most of those
> are for widely-deployed middleware; the zillions of bespoke site-specific
> scripts are likely to be worse.
>
> Also: http://xkcd.com/327/

Priceless!

As is often the case with xkcd, I learned something, too: there's a
widely used web application/portal/database thingy which silently
strips some characters from my input.  I thought it had to do with
HTML, but it's in fact exactly the sequences "'", ')', ';' and '--'
from the comic, and a few more like '>' and undoubtedly some I haven't
noticed yet.

That is surely "input sanitization" gone horribly wrong: I enter "6--8
slices of bread", but the system stores "68 slices of bread".
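
A small sketch of the difference, assuming sqlite3 as a stand-in database. The `strip_suspicious` function is a hypothetical reconstruction of that kind of filter, not the actual code of any real application.

```python
import sqlite3


def strip_suspicious(text):
    """Hypothetical lossy filter: silently delete 'dangerous' sequences."""
    for token in ("--", "'", ")", ";", ">"):
        text = text.replace(token, "")
    return text


note = "6--8 slices of bread"

# Lossy approach: the stored text no longer matches what was typed.
mangled = strip_suspicious(note)

# Parameter binding: the text is passed as data and comes back unchanged.
conn = sqlite3.connect(":memory:")
conn.execute("create table notes (body text)")
conn.execute("insert into notes (body) values (?)", (note,))
stored = conn.execute("select body from notes").fetchone()[0]
```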

>> I thought it was well-known that the solution is *not* to try to
>> sanitize the input
>
> Well known by anyone with a reasonable understanding of the principles of
> programming, but somewhat less well known by the other 98% of web
> developers.
>
>> Am I missing something?
>
> There's a world of difference between a skilled chef and the people
> flipping burgers for a minimum wage. And between a chartered civil
> engineer and the people laying the asphalt. And between what you
> probably consider a programmer and the people doing most web development.

I don't know them, so I wouldn't know ... What I would *expect* is
that safe tools are provided for them, not just workarounds so they
can keep using the unsafe tools. That's what Python did, with its
multitude of alternatives to os.system and os.popen.

Anyway, thanks. It's always nice to be able to map foreign terminology
like "SQL injection" to something you already know.

/Jorgen

-- 
  // Jorgen Grahn <grahn@  Oo  o.   .  .
\X/     snipabacken.se>   O  o   .
0
Reply Jorgen 6/27/2010 8:15:11 PM

On Sun, 2010-06-27, Lawrence D'Oliveiro wrote:
> In message <roy-854954.20435125062010@news.panix.com>, Roy Smith wrote:
>
>> I recently fixed a bug in some production code.  The programmer was
>> careful to use snprintf() to avoid buffer overflows.  The only problem
>> is, he wrote something along the lines of:
>> 
>> snprintf(buf, strlen(foo), foo);
>
> A long while ago I came up with this macro:
>
>     #define Descr(v) &v, sizeof v
>
> making the correct version of the above become
>
>     snprintf(Descr(buf), foo);

This is off-topic, but I believe snprintf() in C can *never* safely be
the only thing you do to the buffer: you also have to NUL-terminate it
manually in some corner cases. See the documentation.

/Jorgen

-- 
  // Jorgen Grahn <grahn@  Oo  o.   .  .
\X/     snipabacken.se>   O  o   .
0
Reply Jorgen 6/27/2010 8:30:48 PM

On Jun 24, 6:02 pm, Roy Smith <r...@panix.com> wrote:
> In article <i00t2k$l0...@lust.ihug.co.nz>,
>  Lawrence D'Oliveiro <l...@geek-central.gen.new_zealand> wrote:
>
> > I construct ad-hoc queries all the time. It really isn’t that hard to do
> > safely. All you have to do is read the documentation
>
> I get worried when people talk about how easy it is to do something
> safely.  Let me suggest a couple of things you might not have considered:
>
> 1) Somebody is running your application (or the database server) with
> the locale set to something unexpected.  This might change how numbers,
> dates, currency, etc, get formatted, which could change the meaning of
> your constructed SQL statement.
>
> 2) Somebody runs your application with a different PYTHONPATH, which
> causes a different (i.e. malicious) urllib module to get loaded, which
> makes urllib.quote() do something you didn't expect.

Seriously, almost every other kind of library uses a binary API. What
makes databases so special that they need a string-command based API?
How about this instead (where this is a direct binary interface to the
library):

results = rdb_query(table = model,
                    columns = [model.name, model.number])

results = rdb_inner_join(tables = [records, tags],
                         joins = [(records.id, tags.record_id)],
                         columns = [records.name, tags.name])

Well, we know the real reason is that C, Java, and friends lack
expressiveness and so constructing a binary query is an ASCII
nightmare.  Still, it hasn't stopped binary APIs in other kinds of
libraries.


Carl Banks
0
Reply Carl 6/27/2010 10:07:28 PM

In article 
<14e44c9c-04d9-452d-b544-498adfaf7d40@d8g2000yqf.googlegroups.com>,
 Carl Banks <pavlovevidence@gmail.com> wrote:

> Seriously, almost every other kind of library uses a binary API. What
> makes databases so special that they need a string-command based API?
> How about this instead (where this is a direct binary interface to the
> library):
> 
> results = rdb_query(table = model,
>                     columns = [model.name, model.number])
> 
> results = rdb_inner_join(tables = [records, tags],
>                          joins = [(records.id, tags.record_id)],
>                          columns = [records.name, tags.name])
> 
> Well, we know the real reason is that C, Java, and friends lack
> expressiveness and so constructing a binary query is an ASCII
> nightmare.  Still, it hasn't stopped binary APIs in other kinds of
> libraries.

Well, the answer to that one is simple.  SQL, in the hands of somebody 
like me, can be used to express a few pathetic joins and what I do with 
it could probably be handled with the kind of API you're describing.  
But, the language has far more expressivity than that, and a 
domain-specific language is really a good fit for what it can do.

The problem is not so much that SQL queries are described as text 
strings, but that the distinction between program and data gets lost if 
you build the query as one big string.  What you need (and which the 
Python API supplies) is the ability to clearly distinguish between "this 
text is my program" and "this text is a value which my program uses".

Python has the same problem.  If I had a text string, s, which I read 
from some external source, and wanted to interpret that string as an 
integer, I could do (at least) two different things.

# Thing 1
myInteger = int(s)

# Thing 2
myInteger = eval(s)

for properly formed input, either one works, but thing 2 loses the 
distinction between program and data and is thus dangerous.  Exactly 
like building a SQL query by smashing a bunch of strings together.
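
A runnable sketch of the two approaches side by side. The `parse_count` helper and the sample strings are made up for illustration, and the "hostile" input is deliberately harmless.

```python
def parse_count(text):
    """Treat external text strictly as data: an integer, or rejected."""
    try:
        return int(text)
    except ValueError:
        return None


well_formed = "42"
hostile = "__import__('os').getcwd()"  # harmless here, but it is code

parsed_ok = parse_count(well_formed)
parsed_hostile = parse_count(hostile)  # rejected: it is not a number

# eval() runs the same text as a program and returns whatever it computes.
evaluated = eval(hostile)
```

With int() the worst outcome for bad input is a rejected value; with eval() the input decides what the program does.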
0
Reply roy (2043) 6/27/2010 10:20:02 PM

Carl Banks <pavlovevidence@gmail.com> writes:

> Seriously, almost every other kind of library uses a binary API.

Except for the huge number that deal with text protocols or languages.

> What makes databases so special that they need a string-command based
> API?

Because SQL is a text language.

-- 
 \              “In the long run, the utility of all non-Free software |
  `\      approaches zero. All non-Free software is a dead end.” —Mark |
_o__)                                                    Pilgrim, 2006 |
Ben Finney
0
Reply Ben 6/27/2010 11:35:53 PM

On 2010-06-26 22:33:57 -0400, Lawrence D'Oliveiro said:

> In message <2010062522560231540-angrybaldguy@gmailcom>, Owen Jacobson wrote:
> 
>> It's not hard. It's just begging for a visit from the fuckup fairy.
> 
> That’s the same fallacious argument I pointed out earlier.

In the sense that "using correct manual escaping leads to SQL injection 
vulnerabilities", yes, that's a fallacious argument on its own. 
However, as sites like BUGTRAQ amply demonstrate, generating SQL 
through string manipulation is a risky development practice[0]. You can 
continue to justify your choice to do so however you want, and you may 
even be the One True Developer capable of getting it absolutely right 
under all circumstances, but I'd still reject patches that introduced a 
SQLString-like function and ask that you resubmit them using the 
database API's parameterization tools instead.

Assuming for the sake of discussion that your SQLString function 
perfectly captures the transformation required to turn an arbitrary str 
into a MySQL string literal. How do you address the following issues?

1. Other (possibly inexperienced) developers reading your source, who 
may not have the skills to implement the same transform correctly, 
learn from your programs that writing your own query munger is okay.
1a. Other (possibly inexperienced) developers decide to copy and paste 
your function without fully understanding how it works, in tandem with 
any of the other issues below. (If you think this is rare, I invite you 
to visit stackoverflow or roseindia some time.)

2. MySQL changes the quoting and escaping rules to address a 
bug/feature request/developer whim, introducing a new set of corner 
cases into your function and forcing you to re-learn the escaping and 
quoting rules. (For people using DB API parameters, this is a matter of 
upgrading the DB adapter module to a version that supports the modified 
rules.)

3. You decide to switch from MySQL to a more fully-featured RDBMS, 
which may have different quoting and escaping rules around string 
literals.
3a. *Someone else* decides to port your program to a different RDBMS, 
and may not understand that SQLString implements MySQL's quoting and 
escaping rules only.

4. MySQL AB finally gets off its collective duff, adds real 
parameter separation to the MySQL wire protocol, and implements real 
prepared statements, with massive speed gains in scenarios that are 
relevant to your interests; string-based query construction gets left 
out in the cold.
4a. As with case 3, except that instead of the rules changing when you 
move to a new RDBMS, it's the relative performance of submitting new 
queries versus reusing a parameterized query that changes.

On top of the obvious issue of completely avoiding quoting bugs, using 
query parameters rather than escaping and string manipulation neatly 
saves you from having to address any of these problems (and a multitude 
of others) -- the DB API implementation will handle things for you, and 
you are propagating good practice in an easy-to-understand form.
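
As a rough illustration, a table-driven update like the one that started this thread can be written with query parameters and still be generated from a list of fields. This sketch assumes sqlite3 as a stand-in for the real database (its placeholder is `?`; MySQL drivers use a different placeholder style), and the names `UPDATABLE` and `update_item` are hypothetical.

```python
import sqlite3

# Column names come from a fixed list in the program, never from user input.
UPDATABLE = ("class_name", "make", "model")


def update_item(conn, inventory_nr, changes):
    """Build the SET clause from known column names; bind every value."""
    names = [name for name in UPDATABLE if name in changes]
    if not names:
        return 0
    assignments = ", ".join("%s = ?" % name for name in names)
    params = [changes[name] for name in names] + [inventory_nr]
    cursor = conn.execute(
        "update items set %s where inventory_nr = ?" % assignments, params
    )
    return cursor.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table items (inventory_nr text, class_name text, make text, model text)"
)
conn.execute("insert into items values ('A1', 'old', 'old', 'old')")
changed = update_item(
    conn, "A1", {"make": "O'Brien & Co", "model": "x'); drop table items; --"}
)
row = conn.execute("select class_name, make, model from items").fetchone()
```

Only the fixed column names are formatted into the statement text; every value, however hostile, is bound as a parameter and stored verbatim.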

I am honestly at a loss trying to understand your position. There is a 
huge body of documentation out there about the weaknesses of 
string-manipulation-based approaches to query construction, and the use 
of query parameters is so compellingly the Right Thing that I have a 
very hard time comprehending why anyone would opt not to use it except 
out of pure ignorance of their existence. Generating executable code -- 
including SQL -- from untrusted user input introduces a large 
vulnerability surface for very little benefit.

You don't handle function parameters by building up python-language 
strs containing the values as literals and eval'ing them, do you?

-o

[0] If you want to be *really* pedantic, string-manipulation-based 
query construction is strongly correlated with the occurrence of SQL 
injection vulnerabilities and bugs, which is in turn not strongly 
correlated with very many other practices. Happy?

0
Reply angrybaldguy (338) 6/28/2010 1:49:11 AM

On Jun 27, 4:35 pm, Ben Finney <ben+pyt...@benfinney.id.au> wrote:
> Carl Banks <pavlovevide...@gmail.com> writes:
> > Seriously, almost every other kind of library uses a binary API.
>
> Except for the huge number that deal with text protocols or languages.

No, not really.  Almost all types of libraries have binary APIs,
including those that deal with text protocols or language.  Any
control with string commands is something that's built on top of the
binary API.  And culturally, programmers interfacing those libraries
expect to and are expected to use the binary API for low-level
programming.

RDBs, as a whole, either don't have binary APIs or they have them but
no one really uses them.


> > What makes databases so special that they need a string-command based
> > API?
>
> Because SQL is a text language.

Circular logic.  I'm disappointed; usually when you sit on your
reinforced soapbox and affect an air of infinite expertise you at
least use reasonable logic.

Also, I was asking about databases.  "SQL is a text language" is not
the answer to the question "Why do RDBs use string commands instead of
binary APIs"?


Carl Banks
0
Reply Carl 6/28/2010 2:35:05 AM

On Jun 27, 3:20 pm, Roy Smith <r...@panix.com> wrote:
> In article
> <14e44c9c-04d9-452d-b544-498adfaf7...@d8g2000yqf.googlegroups.com>,
>  Carl Banks <pavlovevide...@gmail.com> wrote:
>
> > Seriously, almost every other kind of library uses a binary API. What
> > makes databases so special that they need a string-command based API?
> > How about this instead (where this is a direct binary interface to the
> > library):
>
> > results = rdb_query(table = model,
> >                     columns = [model.name, model.number])
>
> > results = rdb_inner_join(tables = [records, tags],
> >                          joins = [(records.id, tags.record_id)],
> >                          columns = [records.name, tags.name])
>
> > Well, we know the real reason is that C, Java, and friends lack
> > expressiveness and so constructing a binary query is an ASCII
> > nightmare.  Still, it hasn't stopped binary APIs in other kinds of
> > libraries.
>
> Well, the answer to that one is simple.  SQL, in the hands of somebody
> like me, can be used to express a few pathetic joins and what I do with
> it could probably be handled with the kind of API you're describing.
> But, the language has far more expressivity than that, and a
> domain-specific language is really a good fit for what it can do.

I'm not the biggest expert on SQL ever, but the only thing I can think
of is expressions.  Statements don't express anything very complex,
and could straightforwardly be represented by function calls.  But
it's a fair point.


> The problem is not so much that SQL queries are described as text
> strings,

No, it is the problem, or part of it.  String commands are inherently
prone to injection attacks, that's the main problem with them.


> but that the distinction between program and data gets lost if
> you build the query as one big string.

That too.


Carl Banks
0
Reply pavlovevidence (1338) 6/28/2010 2:51:59 AM

On 2010-06-27 22:51:59 -0400, Carl Banks said:

> On Jun 27, 3:20 pm, Roy Smith <r...@panix.com> wrote:
>> In article
>> <14e44c9c-04d9-452d-b544-498adfaf7...@d8g2000yqf.googlegroups.com>,
>>  Carl Banks <pavlovevide...@gmail.com> wrote:
>> 
>>> Seriously, almost every other kind of library uses a binary API. What
>>> makes databases so special that they need a string-command based API?
>>> How about this instead (where this is a direct binary interface to the
>>> library):
>> 
>>> results = rdb_query(table = model,
>>>                     columns = [model.name, model.number])
>> 
>>> results = rdb_inner_join(tables = [records, tags],
>>>                          joins = [(records.id, tags.record_id)],
>>>                          columns = [records.name, tags.name])
>> 
>>> Well, we know the real reason is that C, Java, and friends lack
>>> expressiveness and so constructing a binary query is an ASCII
>>> nightmare.  Still, it hasn't stopped binary APIs in other kinds of
>>> libraries.
>> 
>> Well, the answer to that one is simple.  SQL, in the hands of somebody
>> like me, can be used to express a few pathetic joins and what I do with
>> it could probably be handled with the kind of API you're describing.
>> But, the language has far more expressivity than that, and a
>> domain-specific language is really a good fit for what it can do.
> 
> I'm not the biggest expert on SQL ever, but the only thing I can think
> of is expressions.  Statements don't express anything very complex,
> and could straightforwardly be represented by function calls.  But
> it's a fair point.

Off the top of my head, I can think of a few things that would be 
tricky to turn into an API:

 * Aggregation (GROUP BY, aggregate functions over arbitrary 
expressions, HAVING clauses).
 * CASE expressions.
 * Subqueries.
 * Recursive queries (in DBMSes that support them).
 * Window clauses (likewise).
 * Set operations between queries (UNION, EXCEPT, INTERSECT).
 * A surprisingly rich set of JOIN clauses beyond the obvious inner 
natural joins.
 * Various DBMS-specific locking hints.
 * Computed inserts and updates.
 * Updates and deletes that include joins.
 * RETURNING lists on modification queries.
 * Explicit (DBMS-side) cursors.

This is by no means an exhaustive list.

Of course, it's possible to represent all of this via an API rather 
than a language, and libraries like SQLAlchemy make a reasonable 
attempt at doing just that. However, not every programming language has 
the kind of structural flexibility to do that well: a library similar 
to SQLAlchemy would be incredibly clunky (if it worked at all) in, say, 
Java or C#, and it'd be nearly impossible to pull off in C. Even LDAP, 
which is defined more in terms of APIs than languages, forgoes trying 
to define a predicate API and uses a domain-specific filtering language 
instead.

There's certainly a useful subset of SQL that could be trivially 
replaced with an API. Simple by-the-numbers CRUD queries don't exercise 
much of SQL's power. In fact, we can do that already: any ORM can 
handle that level just fine.

-o

0
Reply Owen 6/28/2010 3:18:25 AM

Carl Banks <pavlovevidence@gmail.com> writes:

> On Jun 27, 4:35 pm, Ben Finney <ben+pyt...@benfinney.id.au> wrote:
> > Carl Banks <pavlovevide...@gmail.com> writes:
> > > Seriously, almost every other kind of library uses a binary API.
> >
> > Except for the huge number that deal with text protocols or languages.
>
> No, not really.  Almost all types of libraries have binary APIs,
> including those that deal with text protocols or language.  Any
> control with string commands is something that's built on top of the
> binary API.

I don't know what you mean by this.

Are you referring to the operating system's function call API? It's
trivially true that the OS function call API is “binary”, but that
doesn't seem useful for distinguishing; by that definition, SQL isn't a
“library API” at all. So I assumed you didn't mean that.

Rather, I was taking you to mean the network API used for communicating
with the server; and it's in that context that I'm saying there are a
huge number of text-based network APIs.

If that's not what you mean either, then I need you to explain.

> I'm disappointed, usually when you sit on your reinforced soapbox and
> pretense the air of infinite expertise you at least use reasonable
> logic.

Kindly stop inventing straw men to attack; I deny the position you're
painting for me.

> Also, I was asking about databases. "SQL is a text language" is not
> the answer to the question "Why do RDBs use string commands instead of
> binary APIs"?

To that question, I'd say that SQL isn't a library API, but rather a
network API and a command API, and is thus well implemented with textual
commands.

-- 
 \     “[W]e are still the first generation of users, and for all that |
  `\      we may have invented the net, we still don't really get it.” |
_o__)                                                   —Douglas Adams |
Ben Finney
0
Reply Ben 6/28/2010 3:33:23 AM

On Jun 27, 8:19 pm, Owen Jacobson <angrybald...@gmail.com> wrote:
> On 2010-06-27 22:51:59 -0400, Carl Banks said:
> > On Jun 27, 3:20 pm, Roy Smith <r...@panix.com> wrote:
> >> In article
> >> <14e44c9c-04d9-452d-b544-498adfaf7...@d8g2000yqf.googlegroups.com>,
> >> Carl Banks <pavlovevide...@gmail.com> wrote:
>
> >>> Seriously, almost every other kind of library uses a binary API. What
> >>> makes databases so special that they need a string-command based API?
> >>> How about this instead (where this a direct binary interface to the
> >>> library):
>
> >>> results = rdb_query(table = model,
> >>>                     columns = [model.name, model.number])
>
> >>> results = rdb_inner_join(tables = [records,tags],
> >>>                          joins = [(records.id,tags.record_id)],
> >>>                          columns = [records.name, tags.name])
>
> >>> Well, we know the real reason is that C, Java, and friends lack
> >>> expressiveness and so constructing a binary query is an ASCII
> >>> nightmare. Still, it hasn't stopped binary APIs in other kinds of
> >>> libraries.
>
> >> Well, the answer to that one is simple. SQL, in the hands of somebody
> >> like me, can be used to express a few pathetic joins and what I do with
> >> it could probably be handled with the kind of API you're describing.
> >> But, the language has far more expressivity than that, and a
> >> domain-specific language is really a good fit for what it can do.
>
> > I'm not the biggest expert on SQL ever, but the only thing I can think
> > of is expressions.  Statements don't express anything very complex,
> > and could straightforwardly be represented by function calls.  But
> > it's a fair point.
>
> Off the top of my head, I can think of a few things that would be
> tricky to turn into an API:
>
>  * Aggregation (GROUP BY, aggregate functions over arbitrary
> expressions, HAVING clauses).
>  * CASE expressions.
>  * Subqueries.
>  * Recursive queries (in DBMSes that support them).
>  * Window clauses (likewise).
>  * Set operations between queries (UNION, DIFFERENCE, INTERSECT).
>  * A surprisingly rich set of JOIN clauses beyond the obvious inner
> natural joins.
>  * Various DBMS-specific locking hints.
>  * Computed inserts and updates.
>  * Updates and deletes that include joins.
>  * RETURNING lists on modification queries.
>  * Explicit (DBMS-side) cursors.
>
> This is by no means an exhaustive list.

I don't know the exact details of all of these, but I'm going to opine
that at least some of these are easily expressible with a function
call API.  Perhaps more naturally than with string queries.  For
instance, set operations:

query1 = rdb_query(...)
query2 = rdb_query(...)

final_query = rdb_union(query1,query2)

or

final_query = query1 & query2

I'm not sure why GROUP BY couldn't be expressed by a keyword
argument.  The complexity of aggregate functions and computed inserts
comes mainly from expressions (which Roy Smith already mentioned), the
actual statements are simple.


> Of course, it's possible to represent all of this via an API rather
> than a language, and libraries like SQLAlchemy make a reasonable
> attempt at doing just that. However, not every programming language has
> the kind of structural flexibility to do that well: a library similar
> to SQLAlchemy would be incredibly clunky (if it worked at all) in, say,
> Java or C#, and it'd be nearly impossible to pull off in C.

Yeah, which was kind of my original theory.


Carl Banks
0
Reply pavlovevidence (1338) 6/28/2010 3:48:07 AM

On 6/27/10 7:51 PM, Carl Banks wrote:
> I'm not the biggest expert on SQL ever, but the only thing I can think
> of is expressions.  Statements don't express anything very complex,
> and could straightforwardly be represented by function calls.

See, there are really two kinds of SQL out there.

There's the layman's SQL, which is pretty straightforward. Sure, it can 
start looking a little complicated if you get multiple clauses in the 
WHERE line (and maybe you're ambitious and do a simple inner join), but 
it's probably still not bad. That can get translated into an API pretty 
easily.

Then there's the type of SQL that results in DBAs having jobs-- and 
deservedly so. It's *really* a very flexible and powerful language 
capable of doing quite a lot to bend, flex, twist, and interleave that 
data in the server while building up a result set for you.

I'm honestly only really in the former camp with a toe into the latter 
(I use aggregation and windowing functions over some interesting joins 
on occasion, but it takes effort). So I can't give a lot of serious 
examples to *prove* I'm right.

So I just have to say: based on my experience and admittedly limited 
imagination, converting the full expressive power of SQL into a regular 
sort of API would be a very, very, very hairy sort of mess. SQLAlchemy 
can do the layman's SQL, and can *kind of* do a *little bit* of the 
advanced stuff-- but usually, it does the advanced stuff by just making 
it very easy for you to shove it out of the way and do SQL directly.

But still: that's the structured part of SQL which belongs in a string. 
The data does not. It should be obvious that when a database provides 
you a mechanism to pass data in such that it doesn't need sanitization* 
at all, that's preferable to actually doing sanitization, even if you're 
divinely capable of perfect sanitization and even if sanitization is a 
trivial task that a monkey should be able to handle.
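
A minimal sketch of that point, using Python's standard sqlite3 module instead of MySQL (an assumption made only so the example is self-contained; the table and the hostile-looking value are made up). The structure of the statement stays in the string, and the data travels separately as a parameter:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table items (id integer primary key, class_name text)")

# Hostile-looking data; it is harmless here because it is never
# spliced into the SQL text.
value = "x'; drop table items; --"

# The "?" placeholder marks where the data goes; the driver binds the
# value separately from the statement.
conn.execute("insert into items (class_name) values (?)", (value,))

stored = conn.execute(
    "select class_name from items where id = ?", (1,)
).fetchone()[0]
print(stored == value)
```

No quoting routine is involved at any point, so there is nothing to get wrong in one.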


-- 

    ... Stephen Hansen
    ... Also: Ixokai
    ... Mail: me+list/python (AT) ixokai (DOT) io
    ... Blog: http://meh.ixokai.io/

P.S. *My computer /swears/ sanitization is spelled wrong. Either I'm 
high or it's high. Stupid old school mac mini.
0
Reply Stephen 6/28/2010 3:52:01 AM

On 6/27/10 8:48 PM, Carl Banks wrote:
> I don't know the exact details of all of these, but I'm going to opine
> that at least some of these are easily expressible with a function
> call API.  Perhaps more naturally than with string queries.  For
> instance, set operations:
>
> query1 = rdb_query(...)
> query2 = rdb_query(...)
>
> final_query = rdb_union(query1,query2)
>
> or
>
> final_query = query1 & query2

But, see, that's not actually what's going on behind the scenes in the 
database. Unless your "query1" and "query2" objects are opaque 
pseudo-objects which do not actually represent results -- the query 
planners do a *lot* of stuff by looking at the whole query and computing 
just how to go about executing all of the instructions.

The engine of a SQL database is a pretty sophisticated little piece of 
code. Because SQL is declarative, the engine is able to optimize just 
how to do everything when it looks at the full query, and even try out a 
few different ideas at first before deciding on just which path to take. 
(This is an area where parametrized queries are even more important, but 
I'm not sure if MySQL does proper prepared queries and caching of 
execution plans.)

If you go and API it, then you're actually imposing an order on how it 
processes the query... unless your API is just a sort of opaque wrapper 
for some underlying declarative structure. (Like ORMs try to be.)

-- 

    ... Stephen Hansen
    ... Also: Ixokai
    ... Mail: me+list/python (AT) ixokai (DOT) io
    ... Blog: http://meh.ixokai.io/

0
Reply Stephen 6/28/2010 4:02:57 AM

On Jun 27, 8:52 pm, Stephen Hansen <me+list/pyt...@ixokai.io> wrote:
> Then there's the type of SQL that results in DBAs having jobs-- and
> deservedly so. It's *really* a very flexible and powerful language
> capable of doing quite a lot to bend, flex, twist, and interleave that
> data in the server while building up a result set for you.

All right, I get it.

I'm not talking about SQL, I'm talking about RDBs.  But I guess it is
important for serious RDBs to support queries complex enough that a
language like SQL is really needed to express them, even when called
from an expressive language like Python.  Not everything is a simple
inner join.  I defer to the community then, as my knowledge of
advanced SQL is minimal.

We'll just have to accept the risk of injection attacks as a trade-off,
and try to educate people to use placeholders when writing SQL.


Carl Banks
0
Reply Carl 6/28/2010 4:12:30 AM

On 2010-06-28 00:02:57 -0400, Stephen Hansen said:

> On 6/27/10 8:48 PM, Carl Banks wrote:
>> I don't know the exact details of all of these, but I'm going to opine
>> that at least some of these are easily expressible with a function
>> call API.  Perhaps more naturally than with string queries.  For
>> instance, set operations:
>> 
>> query1 = rdb_query(...)
>> query2 = rdb_query(...)
>> 
>> final_query = rdb_union(query1,query2)
>> 
>> or
>> 
>> final_query = query1 & query2
> 
> But, see, that's not actually what's going on behind the scenes in the 
> database. Unless your "query1" and "query2" objects are opaque 
> pseudo-objects which do not actually represent results -- the query 
> planners do a *lot* of stuff by looking at the whole query and 
> computing just how to go about executing all of the instructions.

I believe that *is* his point: that we can replace the SQL language 
with a "query object model" that lets us specify what we want without 
resorting to string-whacking when our needs are dynamic, without 
changing the rest of the workflow. This is obviously true: each RDBMS 
does something very much like what Carl is proposing, internally. 
However, implementing such an API usefully (never mind comfortably) in 
a cross-language way is... difficult, and an RDBMS that can only be 
used from Python (or even from Python and other Smalltalk-like 
languages) is not terribly useful at all.

-o

0
Reply Owen 6/28/2010 4:25:59 AM

On Jun 27, 8:33 pm, Ben Finney <ben+pyt...@benfinney.id.au> wrote:
> Carl Banks <pavlovevide...@gmail.com> writes:
> > I'm disappointed, usually when you sit on your reinforced soapbox and
> > pretense the air of infinite expertise you at least use reasonable
> > logic.
>
> Kindly stop inventing straw men to attack; I deny the position you're
> painting for me.

No, this is not a straw man; you are 100 percent guilty of circular
logic, as I accused you of.

Plus, I will not kindly do anything for you unless you kindly stop
being condescending and self-righteous when answering questions and
start treating people with respect.  It's not just me, you do it to
newbies who have reasonable questions and you end up making them feel
like assholes just for asking.  You don't just act that way to newbies
that deserve it.  You are part of the reason people are here accusing
the Python community of being unfriendly and unhelpful.

And that's not a strawman, either.


Carl Banks
0
Reply Carl 6/28/2010 4:28:33 AM

On Jun 27, 9:02 pm, Stephen Hansen <me+list/pyt...@ixokai.io> wrote:
> On 6/27/10 8:48 PM, Carl Banks wrote:
>
> > I don't know the exact details of all of these, but I'm going to opine
> > that at least some of these are easily expressible with a function
> > call API.  Perhaps more naturally than with string queries.  For
> > instance, set operations:
>
> > query1 = rdb_query(...)
> > query2 = rdb_query(...)
>
> > final_query = rdb_union(query1,query2)
>
> > or
>
> > final_query = query1 & query2
>
> But, see, that's not actually what's going on behind the scenes in the
> database. Unless your "query1" and "query2" objects are opaque
> pseudo-objects which do not actually represent results

That's exactly what they are.  Nothing is actually sent to the
database until the user starts retrieving results.  This is a fairly
common thing for some interfaces to do.
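
A rough sketch of what such an opaque query object could look like, assuming a made-up Query class (not any real library's API) and using sqlite3 only so that it runs as-is. The sketch uses | for union, which is my choice of operator; combining two queries builds a bigger description, and the database sees the whole statement at once when results are fetched:

```python
import sqlite3


class Query:
    """Hypothetical lazy query: holds SQL text and parameters, nothing more."""

    def __init__(self, sql, params=()):
        self.sql = sql
        self.params = tuple(params)

    def __or__(self, other):
        # Combining queries only builds a larger description; no I/O happens.
        return Query(f"{self.sql} union {other.sql}", self.params + other.params)

    def fetch(self, conn):
        # The complete statement reaches the database in one piece, only here.
        return conn.execute(self.sql, self.params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("create table records (id integer, name text)")
conn.executemany("insert into records values (?, ?)", [(1, "a"), (2, "b"), (3, "c")])

query1 = Query("select name from records where id = ?", (1,))
query2 = Query("select name from records where id = ?", (3,))
final_query = query1 | query2   # still nothing sent to the database

rows = sorted(final_query.fetch(conn))
print(rows)
```

Because the planner still receives one complete statement, no execution order is imposed on it by the way the pieces were assembled.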

For instance, OpenGL almost always returns immediately after a command
is posted without doing anything.  The driver will queue the command
in memory until some event happens to trigger it (maybe a signal from
the graphics hardware that it is done processing commands, or the queue
being full, or an explicit flush request from the user).

Incidentally, OpenGL has its own DSL for per-vertex and per-pixel
operations (known as vertex and fragment shaders) that replaces an
older binary API.  I daresay it's a little less at risk for an
injection attack, seeing that the shaders run on the GPU and only run
simple math operations.  But you never know.


Carl Banks
0
Reply Carl 6/28/2010 4:43:14 AM

On Sun, Jun 27, 2010 at 5:16 PM, Lawrence D'Oliveiro
<ldo@geek-central.gen.new_zealand> wrote:
> In message <mailman.2184.1277626565.32709.python-list@python.org>, Kushal
> Kumaran wrote:
>
>> On Sun, Jun 27, 2010 at 9:47 AM, Lawrence D'Oliveiro
>> <ldo@geek-central.gen.new_zealand> wrote:
>>
>>> In message <roy-854954.20435125062010@news.panix.com>, Roy Smith wrote:
>>>
>>>> I recently fixed a bug in some production code.  The programmer was
>>>> careful to use snprintf() to avoid buffer overflows.  The only problem
>>>> is, he wrote something along the lines of:
>>>>
>>>> snprintf(buf, strlen(foo), foo);
>>>
>>> A long while ago I came up with this macro:
>>>
>>> #define Descr(v) &v, sizeof v
>>>
>>> making the correct version of the above become
>>>
>>> snprintf(Descr(buf), foo);
>>
>> Not quite right.  If buf is a char array, as suggested by the use of
>> sizeof, then you're not passing a char* to snprintf.
>
> What am I passing, then?

Here's what gcc tells me (I declared buf as char buf[512]):
sprintf.c:8: warning: passing argument 1 of ‘snprintf’ from
incompatible pointer type
/usr/include/stdio.h:363: note: expected ‘char * __restrict__’ but
argument is of type ‘char (*)[512]’

You just need to lose the & from the macro.

-- 
regards,
kushal
0
Reply python2058 (92) 6/28/2010 4:47:56 AM

Carl Banks <pavlovevidence@gmail.com> writes:

> On Jun 27, 8:33 pm, Ben Finney <ben+pyt...@benfinney.id.au> wrote:
> > Carl Banks <pavlovevide...@gmail.com> writes:
> > > I'm disappointed, usually when you sit on your reinforced soapbox and
> > > pretense the air of infinite expertise you at least use reasonable
> > > logic.
> >
> > Kindly stop inventing straw men to attack; I deny the position you're
> > painting for me.
>
> No, this is not a straw man, you are 100% percent guilty of circular
> logic as I accused you of.

The straw man you attacked is as I quoted above.

The claim of circular logic is a separate point, and I addressed it in
the rest of the message. Like you, I stripped the part of the message
that I was not responding to specifically.

> Plus, I will not kindly do anything for you unless you kindly stop
> being condescending and self-righteous when answering questions and
> start treating people with respect.

I always endeavour to treat people with respect, and I leave it to the
independent reader to decide how successful I am in that endeavour.

Respect for a person, though, entails subjecting that person's
statements to criticism where appropriate. Don't mistake exposure of
flaws for self-righteousness, nor criticism for condescension.

This isn't a forum for discussing my style, so I'll limit this message
to merely addressing these slurs.

-- 
 \           “The long-term solution to mountains of waste is not more |
  `\      landfill sites but fewer shopping centres.” —Clive Hamilton, |
_o__)                                                _Affluenza_, 2005 |
Ben Finney
0
Reply Ben 6/28/2010 4:53:27 AM

On Mon, Jun 28, 2010 at 2:00 AM, Jorgen Grahn <grahn+nntp@snipabacken.se> wrote:
> On Sun, 2010-06-27, Lawrence D'Oliveiro wrote:
>> In message <roy-854954.20435125062010@news.panix.com>, Roy Smith wrote:
>>
>>> I recently fixed a bug in some production code.  The programmer was
>>> careful to use snprintf() to avoid buffer overflows.  The only problem
>>> is, he wrote something along the lines of:
>>>
>>> snprintf(buf, strlen(foo), foo);
>>
>> A long while ago I came up with this macro:
>>
>>     #define Descr(v) &v, sizeof v
>>
>> making the correct version of the above become
>>
>>     snprintf(Descr(buf), foo);
>
> This is off-topic, but I believe snprintf() in C can *never* safely be
> the only thing you do to the buffer: you also have to NUL-terminate it
> manually in some corner cases. See the documentation.
>

snprintf goes to great lengths to be safe, in fact.  You might be
thinking of strncpy.

-- 
regards,
kushal
0
Reply python2058 (92) 6/28/2010 4:54:23 AM

On Jun 27, 9:54 pm, Kushal Kumaran <kushal.kumaran+pyt...@gmail.com>
wrote:
> On Mon, Jun 28, 2010 at 2:00 AM, Jorgen Grahn <grahn+n...@snipabacken.se> wrote:
> > On Sun, 2010-06-27, Lawrence D'Oliveiro wrote:
> >> In message <roy-854954.20435125062...@news.panix.com>, Roy Smith wrote:
>
> >>> I recently fixed a bug in some production code.  The programmer was
> >>> careful to use snprintf() to avoid buffer overflows.  The only problem
> >>> is, he wrote something along the lines of:
>
> >>> snprintf(buf, strlen(foo), foo);
>
> >> A long while ago I came up with this macro:
>
> >>     #define Descr(v) &v, sizeof v
>
> >> making the correct version of the above become
>
> >>     snprintf(Descr(buf), foo);
>
> > This is off-topic, but I believe snprintf() in C can *never* safely be
> > the only thing you do to the buffer: you also have to NUL-terminate it
> > manually in some corner cases. See the documentation.
>
> snprintf goes to great lengths to be safe, in fact.  You might be
> thinking of strncpy.

Indeed, strncpy does not copy that final NUL if it's at or beyond the
nth element.  Probably the most mind-bogglingly stupid thing about the
standard C library, which has lots of mind-boggling stupidity.

Whenever I do an audit of someone's C code the first thing I do is
search for strncpy and see if they set the nth character to 0.  (They
usually didn't.)

Carl Banks
0
Reply Carl 6/28/2010 5:07:10 AM

On Mon, 2010-06-28, Kushal Kumaran wrote:
> On Mon, Jun 28, 2010 at 2:00 AM, Jorgen Grahn <grahn+nntp@snipabacken.se> wrote:
>> On Sun, 2010-06-27, Lawrence D'Oliveiro wrote:
>>> In message <roy-854954.20435125062010@news.panix.com>, Roy Smith wrote:
>>>
>>>> I recently fixed a bug in some production code.  The programmer was
>>>> careful to use snprintf() to avoid buffer overflows.  The only problem
>>>> is, he wrote something along the lines of:
>>>>
>>>> snprintf(buf, strlen(foo), foo);
>>>
>>> A long while ago I came up with this macro:
>>>
>>>     #define Descr(v) &v, sizeof v
>>>
>>> making the correct version of the above become
>>>
>>>     snprintf(Descr(buf), foo);
>>
>> This is off-topic, but I believe snprintf() in C can *never* safely be
>> the only thing you do to the buffer: you also have to NUL-terminate it
>> manually in some corner cases. See the documentation.
>
> snprintf goes to great lengths to be safe, in fact.  You might be
> thinking of strncpy.

Yes, it was indeed strncpy I was thinking of. Thanks.

But actually, the snprintf(3) man page I have is not 100% clear on
this issue, so last time I used it, I added a manual NUL-termination
plus a comment saying I wasn't sure it was needed.  I normally use C++
or Python, so I am a bit rusty on these things.

/Jorgen

-- 
  // Jorgen Grahn <grahn@  Oo  o.   .  .
\X/     snipabacken.se>   O  o   .
0
Reply Jorgen 6/28/2010 7:58:34 AM

Carl Banks wrote:

> Indeed, strncpy does not copy that final NUL if it's at or beyond the
> nth element.  Probably the most mind-bogglingly stupid thing about the
> standard C library, which has lots of mind-boggling stupidity.

I don't think it was as stupid as that back when C was
designed. Every byte of memory was precious in those days,
and if you had, say, 10 bytes allocated for a string, you
wanted to be able to use all 10 of them for useful data.

So the convention was that a NUL byte was used to mark
the end of the string *if it didn't fill all the available
space*. Functions such as strncpy and snprintf are designed
for use with strings that follow this convention. Proper
usage requires being cognizant of the maximum length and
using appropriate length-limited functions for all operations
on such strings.

-- 
Greg
0
Reply Gregory 6/28/2010 9:44:34 AM

Gregory Ewing <greg.ewing@canterbury.ac.nz> writes:
> I don't think it was as stupid as that back when C was
> designed. Every byte of memory was precious in those days,
> and if you had, say, 10 bytes allocated for a string, you
> wanted to be able to use all 10 of them for useful data.

No, I don't think so.  Traditional C strings simply didn't carry length
info except for the nul byte at the end.  Most string functions expected
the nul to be there.  The nul byte convention (instead of having a
header word with a length) arguably saved some space both by eliminating
a multi-byte header and by allowing trailing substrings to be
represented as pointers into a larger string.  In retrospect it seems
like a big error.
0
Reply Paul 6/28/2010 9:48:34 AM

On Sun, 27 Jun 2010 21:02:57 -0700, Stephen Hansen
<me+list/python@ixokai.io> declaimed the following in
gmane.comp.python.general:


> (This is an area where parametrized queries are even more important: but 
> I'm not sure if MySQL does proper prepared queries and caching of 
> execution plans).

	MySQL version 5 finally added prepared statements and a discrete
parameter passing mechanism...

	However, since there likely are many MySQL v4.x installations out
there, which only work with complete string SQL, MySQLdb still formats
full SQL statements (and it uses the Python % string interpolation to do
that, after converting/escaping parameters -- which is why %s is the
only allowed placeholder; even a numeric parameter has been converted to
a quoted string before being inserted in the SQL).
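
A simplified sketch of that client-side scheme. This is not MySQLdb's actual code: quote_literal and format_query are made-up names, and the escaping is deliberately incomplete, so treat it as an illustration only. Each parameter is first turned into SQL literal text, and the finished pieces are then dropped into the statement with Python's % operator:

```python
def quote_literal(value):
    """Hypothetical stand-in for a driver's escaping step (incomplete on purpose)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return "'" + text + "'"


def format_query(sql, params):
    # Every parameter is already a string by this point, which is why
    # %s is the only placeholder that can work.
    return sql % tuple(quote_literal(p) for p in params)


print(format_query(
    "select * from items where name = %s and qty > %s",
    ("O'Brien", 5),
))
```

The server only ever sees one complete SQL string, so it has no way to tell which parts started out as data.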

	It would be nice if MySQLdb could become version aware in a future
release, and use prepared statements on v5 engines... I doubt it can
drop the existing string based queries any time soon... Consider the
arguments about how long Python 2.x will be in use (I'm still on 2.5)...
Imagine the sluggishness in having database engines converted
(especially in a shared provider environment, where the language
specific adapters also need updating -- ODBC drivers, etc.)
-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
        wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/

0
Reply Dennis 6/28/2010 10:07:29 AM

On Sun, 27 Jun 2010 21:12:30 -0700 (PDT), Carl Banks
<pavlovevidence@gmail.com> declaimed the following in
gmane.comp.python.general:

> I'm not talking about SQL, I'm talking about RDBs.  But I guess it is
> important for serious RDBs to support queries complex enough that a
> language like SQL is really needed to express it--even if being called
> from an expressive language like Python.  Not everything is a simple
> inner joins.  I defer to the community then, as my knowledge of
> advanced SQL is minimal.
>
	SQL is almost a hybrid of relational algebra and relational
calculus, though typically considered more in the latter category. (The
simplistic distinction between the two is that in RA one specifies
/how to/ obtain a result, whereas in RC one specifies /what/ the result
should look like and lets the engine figure out how to generate it.)

"Select field, field, ..., field from ..." is algebra "project"
operation... In RA you'd have to specify the steps...

x1 = join(t1, t2)
x2 = restrict(x1, t1.fld1 = t2.fld3)
result = project(x2, field, ..., field)

SQL:
select field, ..., field from t1, t2 where t1.fld1 = t2.fld3

(implicit join, just as the algebra is a full cross product)
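
The algebra steps can be tried out on plain Python lists of dicts. This is only a toy: the two tables and their contents are invented, the final step is named project here, and the sketch assumes the tables share no column names.

```python
from itertools import product

# Two made-up tables, each a list of rows.
t1 = [{"fld1": 1, "name": "a"}, {"fld1": 2, "name": "b"}]
t2 = [{"fld3": 2, "tag": "x"}, {"fld3": 3, "tag": "y"}]


def join(left, right):
    # Full cross product: every row of left paired with every row of right.
    return [{**l, **r} for l, r in product(left, right)]


def restrict(rows, predicate):
    # Keep only the rows that satisfy the predicate.
    return [row for row in rows if predicate(row)]


def project(rows, *fields):
    # Keep only the named columns.
    return [{f: row[f] for f in fields} for row in rows]


x1 = join(t1, t2)
x2 = restrict(x1, lambda row: row["fld1"] == row["fld3"])
result = project(x2, "name", "tag")
print(result)
```

Here the caller fixes the order of the steps; with the SQL form, the engine is free to pick its own.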

	The classical example of RC is IBM's QBE (query by example) -- which
drew single record tables on the screen, and one filled in a result
table with references to fields in the sources, and included (somehow)
the join criteria...

	Somewhere in storage I should have a 400 page text on relational
database theory, which covers relational algebra and calculus, but
predates SQL.
-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
        wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/

0
Reply Dennis 6/28/2010 10:07:29 AM

On Sun, 27 Jun 2010 21:49:11 -0400, Owen Jacobson
<angrybaldguy@gmail.com> declaimed the following in
gmane.comp.python.general:


> 4. MySQL AB finally get off their collective duffs and adds real 
> parameter separation to the MySQL wire protocol, and implements real 
> prepared statements to massive speed gains in scenarios that are 

	They did with version 5 of MySQL... Also added triggers and stored
procedures as I recall (though possibly limited functionality). But
MySQLdb is still compatible with versions 3.x and 4.x (with some
difficulties in the connection string password handling). It is MySQLdb
that is not version aware and still uses the old established "complete
SQL string" query system.

	Does MySQL AB still exist? I thought Sun absorbed MySQL, and Oracle
has absorbed Sun...
-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
        wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/

0
Reply wlfraed (4456) 6/28/2010 10:07:29 AM

On Sun, 27 Jun 2010 19:35:05 -0700 (PDT), Carl Banks
<pavlovevidence@gmail.com> declaimed the following in
gmane.comp.python.general:


> 
> Also, I was asking about databases.  "SQL is a text language" is not
> the answer to the question "Why do RDBs use string commands instead of
> binary APIs"?
>
	Try this: Why do RDBMSs use SQL?

	Before SQL (and relational databases) became common, one had to
learn an interface that was specific to each database engine (and had
quite different look&feel if the underlying engine was hierarchical or
DBTG network [relational was mostly a theoretical view for manipulating
databases stored under hierarchical or network engines]). If one was
lucky, there was even an interactive query language processor.

	Coding for something like a DBTG network database did not allow for
easy changes in queries... What would be a simple join in SQL was
traversing a circular linked list in the DBTG database my college
taught. EG: loop get next "master" record; loop get next sub-record
[etc. until all needed data retrieved] until back to master; until back
to top of database. 

	SQL started as an interactive query language, meant to be typed by
(knowledgeable) users at a command prompt. But since it melded with
relational databases so well it became a de facto standard query
language not only for interactive queries but as a common semi-portable
API for embedding into code -- no DBMS specific procedural function
library needed, just one interface to send a query, and one to retrieve
result records.

(Ever notice the cyclic history -- 50s lots of mixed flat files, 60s
hierarchical databases [in which some master record type has links to
related records -- but the data is stored in a tree so finding data fast
really needed careful database design to avoid having to traverse too
much of the tree; imagine needing to read department information records
to access personnel records to access promotion/pay-raise records to
find the current pay rate to produce the weekly paycheck for an
employee; and if the employee changes department you have to move all
their personnel data [promotion history, etc] from one link to another;
or duplicate the personnel record saving an "end date" in the first
department record and an effective start date on the new department
copy], 70s with network [easier to traverse as each record type could
link to any other record type -- doing payroll did not require reading
department records], 80s relational wherein nothing is linked via
pointers but only by logical comparisons of fields [and so easily
implemented as sets of flat files again <G>])

-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
        wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/

0
Reply wlfraed (4456) 6/28/2010 10:07:29 AM

In message <mailman.2231.1277700501.32709.python-list@python.org>, Kushal 
Kumaran wrote:

> On Sun, Jun 27, 2010 at 5:16 PM, Lawrence D'Oliveiro
> <ldo@geek-central.gen.new_zealand> wrote:
>
>>In message <mailman.2184.1277626565.32709.python-list@python.org>, Kushal
>> Kumaran wrote:
>>
>>> On Sun, Jun 27, 2010 at 9:47 AM, Lawrence D'Oliveiro
>>> <ldo@geek-central.gen.new_zealand> wrote:
>>>
>>>> A long while ago I came up with this macro:
>>>>
>>>> #define Descr(v) &v, sizeof v
>>>>
>>>> making the correct version of the above become
>>>>
>>>> snprintf(Descr(buf), foo);
>>>
>>> Not quite right.  If buf is a char array, as suggested by the use of
>>> sizeof, then you're not passing a char* to snprintf.
>>
>> What am I passing, then?
> 
> Here's what gcc tells me (I declared buf as char buf[512]):
> sprintf.c:8: warning: passing argument 1 of ‘snprintf’ from
> incompatible pointer type
> /usr/include/stdio.h:363: note: expected ‘char * __restrict__’ but
> argument is of type ‘char (*)[512]’
> 
> You just need to lose the & from the macro.

Why does this work, then:

ldo@theon:hack> cat test.c
#include <stdio.h>

int main(int argc, char ** argv)
  {
    char buf[512];
    const int a = 2, b = 3;
    snprintf(&buf, sizeof buf, "%d + %d = %d\n", a, b, a + b);
    fprintf(stdout, buf);
    return
        0;
  } /*main*/
ldo@theon:hack> ./test
2 + 3 = 5

0
Reply Lawrence 6/29/2010 12:26:43 AM

In message
<14e44c9c-04d9-452d-b544-498adfaf7d40@d8g2000yqf.googlegroups.com>, Carl 
Banks wrote:

> Seriously, almost every other kind of library uses a binary API. What
> makes databases so special that they need a string-command based API?

HTML is also effectively a string-based API. And what about regular 
expressions? And all the functionality available through the subprocess 
module and its predecessors?

The reality is, embedding one language within another is a fact of life. I 
think it’s important for programmers to be able to deal correctly with it.
0
Reply Lawrence 6/29/2010 12:30:36 AM

In message <pan.2010.06.27.13.55.04.500000@nowhere.com>, Nobody wrote:

> On Sun, 27 Jun 2010 14:36:10 +1200, Lawrence D'Oliveiro wrote:
> 
>> Except nobody has yet shown an alternative which is easier to get right.
> 
> For SQL, use stored procedures or prepared statements.

So feel free to rewrite my example using either stored procedures or 
prepared statements, to prove how much easier it is.
0
Reply Lawrence 6/29/2010 12:32:19 AM

In article <7xmxuffpxp.fsf@ruckus.brouhaha.com>,
 Paul Rubin <no.email@nospam.invalid> wrote:

> Gregory Ewing <greg.ewing@canterbury.ac.nz> writes:
> > I don't think it was as stupid as that back when C was
> > designed. Every byte of memory was precious in those days,
> > and if you had, say, 10 bytes allocated for a string, you
> > wanted to be able to use all 10 of them for useful data.
> 
> No I don't think so.  Traditional C strings simply didn't carry length
> info except for the nul byte at the end.  Most string functions expected
> the nul to be there.  The nul byte convention (instead of having a
> header word with a length) arguably saved some space both by eliminating
> a multi-byte header and by allowing trailing substrings to be
> represented as pointers into a larger string.  In retrospect it seems
> like a big error.

Null-terminated strings predate C.  Various assembler languages had 
ASCIIZ (or similar) directives long before that.

The nice thing about null-terminated strings is how portable they have 
been over various word lengths.  Life would have been truly inconvenient 
if K&R had picked, say, a 16-bit length field, and then we needed to 
bump that up to 32 bits in the 80's, and again to 64 bits in the 90's.
0
Reply Roy 6/29/2010 12:55:53 AM

On Mon, 28 Jun 2010 20:55:53 -0400, Roy Smith wrote:

> The nice thing about null-terminated strings is how portable they have
> been over various word lengths.  Life would have been truly inconvenient
> if K&R had picked, say, a 16-bit length field, and then we needed to
> bump that up to 32 bits in the 80's, and again to 64 bits in the 90's.

Or a Pascal 8 bit length field.

However the cost of null-terminated strings is that they can't store 
binary data, and worse, they're slow. In fact, according to some, null-
terminated strings are the *worst* way to implement a string type.

http://www.joelonsoftware.com/articles/fog0000000319.html



-- 
Steven
0
Reply Steven 6/29/2010 2:07:50 AM

On Mon, 28 Jun 2010 03:07:29 -0700, Dennis Lee Bieber wrote:
> 	Coding for something like a DBTG network database did not allow for
> easy changes in queries... What would be a simple join in SQL was
> traversing a circular linked list in the DBTG database my college
> taught. EG: loop get next "master" record; loop get next sub-record
> [etc. until all needed data retrieved] until back to master; until back
> to top of database. 

We'll also note that most of these you'd have to map out where each
field in a record was by hand, any time you wanted to open the file.
Often several times, because there would be multiple record layouts per
file.

-- 
67. No matter how many shorts we have in the system, my guards will be 
    instructed to treat every surveillance camera malfunction as a 
    full-scale emergency.
	--Peter Anspach's list of things to do as an Evil Overlord
0
Reply Peter 6/29/2010 3:25:02 AM

On Tue, Jun 29, 2010 at 5:56 AM, Lawrence D'Oliveiro
<ldo@geek-central.gen.new_zealand> wrote:
> In message <mailman.2231.1277700501.32709.python-list@python.org>, Kushal
> Kumaran wrote:
>
>> On Sun, Jun 27, 2010 at 5:16 PM, Lawrence D'Oliveiro
>> <ldo@geek-central.gen.new_zealand> wrote:
>>
>>>In message <mailman.2184.1277626565.32709.python-list@python.org>, Kushal
>>> Kumaran wrote:
>>>
>>>> On Sun, Jun 27, 2010 at 9:47 AM, Lawrence D'Oliveiro
>>>> <ldo@geek-central.gen.new_zealand> wrote:
>>>>
>>>>> A long while ago I came up with this macro:
>>>>>
>>>>> #define Descr(v) &v, sizeof v
>>>>>
>>>>> making the correct version of the above become
>>>>>
>>>>> snprintf(Descr(buf), foo);
>>>>
>>>> Not quite right.  If buf is a char array, as suggested by the use of
>>>> sizeof, then you're not passing a char* to snprintf.
>>>
>>> What am I passing, then?
>>
>> Here's what gcc tells me (I declared buf as char buf[512]):
>> sprintf.c:8: warning: passing argument 1 of ‘snprintf’ from
>> incompatible pointer type
>> /usr/include/stdio.h:363: note: expected ‘char * __restrict__’ but
>> argument is of type ‘char (*)[512]’
>>
>> You just need to lose the & from the macro.
>
> Why does this work, then:
>
> ldo@theon:hack> cat test.c
> #include <stdio.h>
>
> int main(int argc, char ** argv)
>  {
>    char buf[512];
>    const int a = 2, b = 3;
>    snprintf(&buf, sizeof buf, "%d + %d = %d\n", a, b, a + b);
>    fprintf(stdout, buf);
>    return
>        0;
>  } /*main*/
> ldo@theon:hack> ./test
> 2 + 3 = 5
>

By accident.  I hope your compiler warned you about your snprintf call.

Reading these threads might help you understand how char* and char
(*)[512] are different:

http://groups.google.com/group/comp.lang.c++/browse_thread/thread/24708a9204061ce/848ceaf5ec774d81

http://groups.google.com/group/comp.lang.c.moderated/browse_thread/thread/fe264c550947a2e5/32b330cdf8aba3d6

-- 
regards,
kushal
0
Reply python2058 (92) 6/29/2010 4:19:13 AM

On Tue, 29 Jun 2010 03:25:02 GMT, "Peter H. Coffin"
<hellsop@ninehells.com> declaimed the following in
gmane.comp.python.general:


> We'll also note that most of these you'd have to map out where each
> field in a record was by hand, any time you wanted to open the file.
> Often several times, because there would be multiple record layouts per
> file.

	Ah yes -- you did have to know the entire record structure
beforehand to create the "image" for processing...

	And the database engine on the Xerox Sigma running CP/V really got
nasty -- you had to preallocate the expected disk space for the entire
database ahead of time. The engine used what CP/V* called a "random" file
-- you asked for a CONTIGUOUS chunk of disk space, and the OS maintained
NO information about the contents (not even an equivalent of EOF).


* CP/V had some interesting features: file types of "consecutive",
"keyed", and "random". As mentioned, "random" also implied contiguous
disk space allocation; "consecutive" and "keyed" could be disjoint disk
sectors. "Consecutive" is closest to the UNIX "stream"; start from the
beginning and just read... "Keyed" were ISAM files -- and were the most
common file type! The line editor (mid-70s here, editor was line
oriented) used "line numbers" as ISAM keys, so even source code was
being stored in an ISAM file (and the FORTRAN direct access I/O  "record
number", as in
				read(unit, rec=#) buffer
was not the more common record_length * (#-1) offset; it was an ISAM
key!)

-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
        wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/

0
Reply Dennis 6/29/2010 7:14:32 AM

On Tue, 29 Jun 2010 12:30:36 +1200, Lawrence D'Oliveiro wrote:

>> Seriously, almost every other kind of library uses a binary API. What
>> makes databases so special that they need a string-command based API?
> 
> HTML is also effectively a string-based API.

HTML is a data format. The sane way to construct or manipulate HTML is via
the DOM, not string operations.

> And what about regular expressions?

What about them? As the saying goes:

	Some people, when confronted with a problem, think
	"I know, I'll use regular expressions."
	Now they have two problems.

They have some uses, e.g. defining tokens[1]. Using them to match more
complex constructs is error-prone and should generally be avoided unless
you're going to manually verify the result. Oh, and you should never
generate regexps dynamically; that way madness lies.

[1] Assuming that the language's tokens can be described by a regular
grammar. This isn't always the case, e.g. you can't tokenise PostScript
using regexps, as string literals can contain nested parentheses.

> And all the functionality available through the subprocess 
> module and its predecessors?

The main reason why everyone recommends subprocess over its predecessors
is that it allows you to bypass the shell, which is one of the most
common sources of the type of error being discussed in this thread.

IOW, rather than having to construct a shell command which (hopefully)
will pass the desired arguments to the child, you just pass the desired
arguments to the child directly, without involving the shell.

> The reality is, embedding one language within another is a fact of life. I 
> think it’s important for programmers to be able to deal correctly with it.

That depends upon what you mean by "embedding". The correct way to use
code written in one language from code written in another is to make the
first accept parameters and make the second pass them, not to have the
second (try to) generate the former dynamically.

Sometimes dynamic code generation is inevitable (e.g. if you're writing a
compiler, you probably need to generate assembler or C code), but it's not
to be done lightly, and it's unwise to take shortcuts (e.g. ad-hoc string
substitutions).
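
As a minimal sketch of the "pass the arguments directly" approach with the subprocess module: the child below is just the Python interpreter echoing its first argument, and the hostile-looking value is a made-up example. Because an argument list is given and no shell is involved, the value should reach the child as one argument, unmodified.

```python
import subprocess
import sys

# Hypothetical untrusted value containing shell metacharacters.
untrusted = "report.txt; echo INJECTED"

# Child program that simply prints its first argument.
child = [sys.executable, "-c", "import sys; print(sys.argv[1])"]

# Argument-list form: no shell parses the value, so there is nothing to escape.
result = subprocess.run(
    child + [untrusted], capture_output=True, text=True, check=True
)
received = result.stdout.rstrip("\r\n")

print(received == untrusted)
```

Had the same value been pasted into a command string and run through a shell, the semicolon would have been interpreted as a command separator unless it was quoted correctly.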

0
Reply Nobody 6/29/2010 9:35:43 AM

Owen Jacobson <angrybaldguy@gmail.com> wrote:

> However, not every programming language has 
> the kind of structural flexibility to do that well: a library similar 
> to SQLalchemy would be incredibly clunky (if it worked at all) in, 
> say, Java or C#, and it'd be nearly impossible to pull off in C.

I guess you've never used LINQ in C# then?

Microsoft did a pretty impressive job with LINQ: they provided a set of 
methods that may be used to query SQL databases and the same methods 
also work on any other sequence-like types. They also produced a DSL 
that compiles into the LINQ method calls which means that those who 
prefer SQL syntax can use it to process non-SQL data.

A LINQ expression produces a generator that allows you to iterate over 
the result set (and you can re-use the generator so that if it depends 
on the values of other variables or attributes each time you iterate you 
get a different set of results).

When you use LINQ on a SQL database internally it generates the correct 
SQL to produce the result set on the SQL server, when you use it on an 
array or other such sequence it uses generic functions compiled for the 
appropriate data types. In order to be able to do this they changed the 
language to allow expressions to compile either to executable code or to 
a parse tree. For example:

var participants =
    Competition.GetParticipants()
    .Where(participant=> participant.Score > 80)
    .OrderByDescending(participant => participant.Score)
    .Select(participant => new { participant.Id,
                                 Name=participant.Name });

If this is operating on a database table the Where method is overloaded 
to accept a parse tree as its argument and it can then use that to 
generate SQL, but for .Net objects the Where method simply uses the 
lambda expression as a callable delegate.

(example cribbed from 
http://geekswithblogs.net/shahed/archive/2008/01/28/118992.aspx)
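
For comparison, the "same expression, either executed or inspected" idea can be roughly imitated in Python with operator overloading. This is only a sketch: Column and Comparison are hypothetical names invented here, not part of LINQ, SQLAlchemy or any other library, and only two operators are handled.

```python
import operator

_OPS = {">": operator.gt, "<": operator.lt}


class Column:
    """A named column; comparing it builds a tree node instead of a bool."""

    def __init__(self, name):
        self.name = name

    def __gt__(self, value):
        return Comparison(self.name, ">", value)

    def __lt__(self, value):
        return Comparison(self.name, "<", value)


class Comparison:
    """One node of a tiny parse tree: column, operator, value."""

    def __init__(self, column, op, value):
        self.column, self.op, self.value = column, op, value

    def to_sql(self):
        # Render SQL text with a placeholder; the value travels separately.
        return "{} {} ?".format(self.column, self.op), (self.value,)

    def matches(self, row):
        # Evaluate the same node directly against an in-memory record.
        return _OPS[self.op](row[self.column], self.value)


condition = Column("score") > 80
sql, params = condition.to_sql()

rows = [{"name": "ann", "score": 91}, {"name": "bob", "score": 72}]
kept = [row["name"] for row in rows if condition.matches(row)]

print(sql, params, kept)
```

The column name comes from program code, not from user input, which is why it is formatted into the SQL text while the value is left as a parameter.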

-- 
Duncan Booth http://kupuguy.blogspot.com
0
Reply Duncan 6/29/2010 10:32:49 AM

Nobody <nobody@nowhere.com> wrote:

> > And what about regular expressions?
> 
> What about them? As the saying goes:
> 
> 	Some people, when confronted with a problem, think
> 	"I know, I'll use regular expressions."
> 	Now they have two problems.

That's silly.  RE is a good tool.  Like all good tools, it is the right 
tool for some jobs and the wrong tool for others.

I've noticed over the years a significant anti-RE sentiment in the 
Python community.  One reason, I suppose, is because Python gives you 
some good string manipulation tools, i.e. split(), startswith(), 
endswith(), and the 'in' operator, which cover many of the common RE use 
cases.  But there are still plenty of times when a RE is the best tool 
and it's worth investing the effort to learn how to use them effectively.

One tool that Python gives you which makes RE a pleasure is raw strings.  
Getting rid of all those extra backslashes really helps improve 
readability.

Another great feature is VERBOSE.  I've written some truly complicated 
REs using that, and still been able to figure out what they meant the 
next day :-)
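
A small sketch of both features together, assuming an ISO-style date as the thing to match (the pattern and sample text are illustrative only). The raw string avoids doubling every backslash, and re.VERBOSE allows whitespace and comments inside the pattern.

```python
import re

# Without raw strings, every backslash has to be doubled.
compact = re.compile("\\d{4}-\\d{2}-\\d{2}")

# Raw string plus VERBOSE: same pattern, with named groups and comments.
DATE = re.compile(r"""
    (?P<year>\d{4})    # four-digit year
    -
    (?P<month>\d{2})   # two-digit month
    -
    (?P<day>\d{2})     # two-digit day
    """, re.VERBOSE)

text = "posted on 2010-06-29 in the afternoon"
match = DATE.search(text)
parts = (match.group("year"), match.group("month"), match.group("day"))

print(parts)
print(compact.search(text).group(0) == match.group(0))
```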
0
Reply Roy 6/29/2010 12:41:03 PM

On 6/29/10 5:41 AM, Roy Smith wrote:
> Nobody<nobody@nowhere.com>  wrote:
>
>>> And what about regular expressions?
>>
>> What about them? As the saying goes:
>>
>> 	Some people, when confronted with a problem, think
>> 	"I know, I'll use regular expressions."
>> 	Now they have two problems.
>
> That's silly.  RE is a good tool.  Like all good tools, it is the right
> tool for some jobs and the wrong tool for others.

There's nothing silly about it.

It is an exaggeration though: but it does represent a good thing to keep 
in mind.

Yes, re is a tool -- and a useful one at that. But it's also a tool which 
/seems/ like an omnitool capable of tackling everything.

Regular expressions are a complicated mini-language well suited towards 
extensive use in a unix type environment where you want to embed certain 
logic of 'what to operate on' into many different commands that aren't 
languages at all -- and perl embraced it to make it perl's answer to 
text problems. Which is fine.

In Python, certainly it has its uses: many of them in fact, and in many 
it really is the best solution.

It's not just that it's the right tool for some jobs and the wrong tool 
for others, or that -- as you also said -- Python provides a rather 
rich string type which can do many common tasks natively and better, but 
that regular expressions live in the front of the mind for so many 
people coming to the language that it's the first thing they even think 
of, and what should be simple becomes difficult.

So people quote that proverb. It's a good proverb. Like all proverbs, it's 
not perfectly applicable to all situations. But it does have an important 
lesson to it: you should generally not consider re to be the solution 
you're looking for until you are quite sure there's nothing else to 
solve the same task.

It obviously applies less to the gurus who know all about regular 
expressions and their subtleties including potential pathological behavior.

-- 

    ... Stephen Hansen
    ... Also: Ixokai
    ... Mail: me+list/python (AT) ixokai (DOT) io
    ... Blog: http://meh.ixokai.io/

0
Reply Stephen 6/29/2010 2:11:48 PM

On 29/06/2010 01:55, Roy Smith wrote:

[snips]

> The nice thing about null-terminated strings is how portable they have
> been over various word lengths.

The bad thing about null-terminated strings is the number of off-by-one 
errors they've helped to create.  I obviously have never created an 
off-by-one error myself. :)

Kindest regards.

Mark Lawrence.


0
Reply Mark 6/29/2010 2:31:30 PM

In message <mailman.2332.1277785175.32709.python-list@python.org>, Kushal 
Kumaran wrote:

> On Tue, Jun 29, 2010 at 5:56 AM, Lawrence D'Oliveiro
> <ldo@geek-central.gen.new_zealand> wrote:
>
>> Why does this work, then:
>>
>> ldo@theon:hack> cat test.c
>> #include <stdio.h>
>>
>> int main(int argc, char ** argv)
>>  {
>>    char buf[512];
>>    const int a = 2, b = 3;
>>    snprintf(&buf, sizeof buf, "%d + %d = %d\n", a, b, a + b);
>>    fprintf(stdout, buf);
>>    return
>>        0;
>>  } /*main*/
>> ldo@theon:hack> ./test
>> 2 + 3 = 5
> 
> By accident.

I have yet to find an architecture or C compiler where it DOESN’T work.

Feel free to try and prove me wrong.
0
Reply Lawrence 6/30/2010 12:25:11 AM

In message <slrni2f8v2.j19.grahn+nntp@frailea.sa.invalid>, Jorgen Grahn 
wrote:

> On Sat, 2010-06-26, Lawrence D'Oliveiro wrote:
>
>> In message <slrni297ec.1m5.grahn+nntp@frailea.sa.invalid>, Jorgen Grahn
>> wrote:
>>
>>> I thought it was well-known that the solution is *not* to try to
>>> sanitize the input -- it's to switch to an interface which doesn't
>>> involve generating an intermediate executable.  In the Python example,
>>> that would be something like os.popen2(['zcat', '-f', '--', untrusted]).
>>
>> That’s what I mean. Why do people consider input sanitization so hard?
> 
> I'm not sure you understood me correctly, because I advocate
> *not* doing input sanitization. Hard or not -- I don't want to know,
> because I don't want to do it.

But no-one has yet managed to come up with an alternative that involves less 
work.
0
Reply Lawrence 6/30/2010 12:26:11 AM

On Jun 28, 3:07 am, Dennis Lee Bieber <wlfr...@ix.netcom.com> wrote:
> On Sun, 27 Jun 2010 21:02:57 -0700, Stephen Hansen
> <me+list/pyt...@ixokai.io> declaimed the following in
> gmane.comp.python.general:
>
> > (This is an area where parametrized queries is even more important: but
> > I'm not sure if MySQL does proper prepared queries and caching of
> > execution plans).
>
> 	MySQL version 5 finally added prepared statements and a discrete
> parameter passing mechanism...
>
> 	However, since there likely are many MySQL v4.x installations out
> there, which only work with complete string SQL, MySQLdb still formats
> full SQL statements (and it uses the Python % string interpolation to do
> that, after converting/escaping parameters -- which is why %s is the
> only allowed placeholder; even a numeric parameter has been converted to
> a quoted string before being inserted in the SQL).
>
> 	It would be nice if MySQLdb could become version aware in a future
> release, and use prepared statements on v5 engines... I doubt it can
> drop the existing string based queries any time soon... Consider the
> arguments about how long Python 2.x will be in use (I'm still on 2.5)...
> Imagine the sluggishness in having database engines converted
> (especially in a shared provider environment, where the language
> specific adapters also need updating -- ODBC drivers, etc.)

Thanks, your replies to this subthread have been most enlightening.

Carl Banks
0
Reply pavlovevidence (1338) 6/30/2010 3:24:36 AM

On 06/29/2010 06:25 PM, Lawrence D'Oliveiro wrote:
> I have yet to find an architecture or C compiler where it DOESN’T work.
> 
> Feel free to try and prove me wrong.

Okay, I will. Your code passes a char (*)[512], a pointer to the whole
array, when a char* is expected.  Every compiler I know of will give you
a *warning*.  Confusing char*, char**, and char[] is a common mistake
that almost every C programmer makes in the beginning.  Now for the proof:

Consider this variation where I use a dynamically allocated buffer
instead of static:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char ** argv)
{
	char *buf = malloc(512 * sizeof(char));
	const int a = 2, b = 3;
	snprintf(&buf, sizeof buf, "%d + %d = %d\n", a, b, a + b);
	fprintf(stdout, buf);
	free(buf);
	return 0;
} /*main*/

On my machine, an immediate segfault (stack overrun).  Your code only
works because your buf is an array, so &buf and buf hold the same address
(even though their types differ).  But this equivalence does not hold for
any other situation.  If your buffer was dynamically allocated on the
heap, instead of passing a pointer to the buffer (which *is* what buf
itself is), you are passing a pointer to the pointer, which is where buf
is stored on the stack, but not the buffer itself.  Instant stack
corruption.
0
Reply Michael 6/30/2010 4:05:17 AM

On 06/29/2010 06:26 PM, Lawrence D'Oliveiro wrote:
>> I'm not sure you understood me correctly, because I advocate
>> *not* doing input sanitization. Hard or not -- I don't want to know,
>> because I don't want to do it.
> 
> But no-one has yet managed to come up with an alternative that involves less 
> work.

Your case is still not persuasive.

How is using the DB API's placeholders and parameterization more work?
It's the same amount of keystrokes, perhaps even less.  You would just
be substituting the API's parameter placeholders for Python's.  In fact
with Psycopg2 and the mysql python db apis, it's almost a matter of
simply removing the "%" and putting in a comma, turning python's string
substitution into a method call.  And you can leave out the quotes
around where the variables go.  If I have to sanitize every input, I
have to do it on each and every field on each and every form action.
With the DB API doing the work I just do it once, in one place.  Is this
not easier than manually escaping everything and then embedding it in
the query string?

I've not used sqlalchemy, but it looks similarly easy.
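
To make that concrete, here is a minimal sketch using the sqlite3 module from the standard library, chosen only because it needs no server. Note the assumption: sqlite3 uses "?" as its placeholder, whereas Psycopg2 and MySQLdb use "%s". The items table and class_name column echo the example at the top of the thread; the hostile value is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table items (id integer primary key, class_name text)")
conn.execute("insert into items (id, class_name) values (?, ?)", (1, "widget"))

# Hypothetical hostile input: quotes and a would-be second statement.
hostile = "x'; drop table items; --"

# The value is passed alongside the SQL text, never pasted into it.
conn.execute("update items set class_name = ? where id = ?", (hostile, 1))

stored = conn.execute(
    "select class_name from items where id = ?", (1,)
).fetchone()[0]
remaining = conn.execute("select count(*) from items").fetchone()[0]

print(stored == hostile, remaining)
```

No quoting routine appears anywhere in the calling code; the value comes back byte for byte and the table is still there.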

0
Reply Michael 6/30/2010 4:11:16 AM

On 06/29/2010 10:05 PM, Michael Torrie wrote:
> #include <stdio.h>
> 
> int main(int argc, char ** argv)
> {
> 	char *buf = malloc(512 * sizeof(char));
> 	const int a = 2, b = 3;
> 	snprintf(&buf, sizeof buf, "%d + %d = %d\n", a, b, a + b);
                       ^^^^^^^^^^
Make that 512*sizeof(buf)

Still segfaults though.

> 	fprintf(stdout, buf);
> 	free(buf);
> 	return 0;
> } /*main*/
0
Reply Michael 6/30/2010 4:17:17 AM

On 06/29/2010 10:17 PM, Michael Torrie wrote:
> On 06/29/2010 10:05 PM, Michael Torrie wrote:
>> #include <stdio.h>
>>
>> int main(int argc, char ** argv)
>> {
>> 	char *buf = malloc(512 * sizeof(char));
>> 	const int a = 2, b = 3;
>> 	snprintf(&buf, sizeof buf, "%d + %d = %d\n", a, b, a + b);
>                        ^^^^^^^^^^
> Make that 512*sizeof(buf)

Sigh.  Try again.  How about "512 * sizeof(char)" ?  Still doesn't make
a difference.  The code still crashes because the &buf is incorrect.

Another reason python programming is just so much funner and easier!

This little diversion is fun though.  C is pretty powerful and I enjoy
it, but it sure keeps one on one's toes.  I made a similar mistake to
the &buf thing years ago when I thought I could return strings (char *)
from functions on the stack the way Pascal and BASIC could.  It was only
by pure luck that my code worked as the part of the stack being accessed
was invalid and could have been overwritten.

>> 	fprintf(stdout, buf);
>> 	free(buf);
>> 	return 0;
>> } /*main*/

0
Reply Michael 6/30/2010 4:28:25 AM

On Jun 28, 2:44 am, Gregory Ewing <greg.ew...@canterbury.ac.nz> wrote:
> Carl Banks wrote:
> > Indeed, strncpy does not copy that final NUL if it's at or beyond the
> > nth element.  Probably the most mind-bogglingly stupid thing about the
> > standard C library, which has lots of mind-boggling stupidity.
>
> I don't think it was as stupid as that back when C was
> designed. Every byte of memory was precious in those days,
> and if you had, say, 10 bytes allocated for a string, you
> wanted to be able to use all 10 of them for useful data.
>
> So the convention was that a NUL byte was used to mark
> the end of the string *if it didn't fill all the available
> space*.

I can't think of any function in the standard library that observes
that convention, which inclines me to disbelieve this convention ever
really existed.  If it did, there would be functions to support it.

For that matter, I'm not really inclined to believe bytes were *that*
precious in those days.

> Functions such as strncpy and snprintf are designed
> for use with strings that follow this convention. Proper
> usage requires being cognizant of the maximum length and
> using appropriate length-limited functions for all operations
> on such strings.

Well, no.  Being cognizant of the string's maximum length doesn't make
you able to pass it to printf, or system, or any other C function.

The obvious rationale behind strncpy's stupid behavior is that it's
not a string function at all, but a memory block function, that stops
at a NUL in case you don't care what's after the NUL in a block.  But
it leads you to believe it's a string function by its name.
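
The behaviour in question can be modelled in a few lines of Python. This is an illustrative model of strncpy's documented semantics on a bytearray, not a binding to the C function; the name strncpy_model is made up here.

```python
def strncpy_model(dest, src, n):
    """Mimic C's strncpy(dest, src, n) on a bytearray (illustrative model only)."""
    if n > len(dest):
        raise ValueError("n exceeds the destination buffer")
    end = src.find(b"\0")
    text = src if end < 0 else src[:end]  # a C string stops at its first NUL
    copied = text[:n]                     # at most n bytes are copied
    dest[:len(copied)] = copied
    # A source shorter than n is padded with NULs up to n bytes.
    dest[len(copied):n] = b"\0" * (n - len(copied))


short = bytearray(b"#" * 8)
strncpy_model(short, b"hello", 8)

full = bytearray(b"#" * 8)
strncpy_model(full, b"helloworld", 8)

print(bytes(short))
print(bytes(full))
print(b"\0" in full)
```

The second call is the troublesome case: all eight bytes are filled and no terminator is written, so the result cannot safely be handed to anything that expects a NUL-terminated string.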


Carl Banks
0
Reply Carl 6/30/2010 4:49:20 AM

On Wed, 2010-06-30, Michael Torrie wrote:
> On 06/29/2010 10:17 PM, Michael Torrie wrote:
>> On 06/29/2010 10:05 PM, Michael Torrie wrote:
>>> #include <stdio.h>
>>>
>>> int main(int argc, char ** argv)
>>> {
>>> 	char *buf = malloc(512 * sizeof(char));
>>> 	const int a = 2, b = 3;
>>> 	snprintf(&buf, sizeof buf, "%d + %d = %d\n", a, b, a + b);
>>                        ^^^^^^^^^^
>> Make that 512*sizeof(buf)
>
> Sigh.  Try again.  How about "512 * sizeof(char)" ?  Still doesn't make
> a difference.  The code still crashes because the &buf is incorrect.

I haven't tried to understand the rest ... but never write
'sizeof(char)' unless you might change the type later. 'sizeof(char)'
is by definition 1 -- even on odd-ball architectures where a char is
e.g. 16 bits.

/Jorgen

-- 
  // Jorgen Grahn <grahn@  Oo  o.   .  .
\X/     snipabacken.se>   O  o   .
0
Reply Jorgen 6/30/2010 9:00:17 AM

On Wed, 2010-06-30, Carl Banks wrote:
> On Jun 28, 2:44 am, Gregory Ewing <greg.ew...@canterbury.ac.nz> wrote:
>> Carl Banks wrote:
>> > Indeed, strncpy does not copy that final NUL if it's at or beyond the
>> > nth element.  Probably the most mind-bogglingly stupid thing about the
>> > standard C library, which has lots of mind-boggling stupidity.
>>
>> I don't think it was as stupid as that back when C was
>> designed. Every byte of memory was precious in those days,
>> and if you had, say, 10 bytes allocated for a string, you
>> wanted to be able to use all 10 of them for useful data.
>>
>> So the convention was that a NUL byte was used to mark
>> the end of the string *if it didn't fill all the available
>> space*.
>
> I can't think of any function in the standard library that observes
> that convention,

Me neither, except strncpy(), according to above.

> which inclines me to disbelieve this convention ever
> really existed.  If it did, there would be functions to support it.

Maybe others existed, but got killed off early. That would make
strncpy() a living fossil, like the Coelacanth ...

> For that matter, I'm not really inclined to believe bytes were *that*
> precious in those days.

It's somewhat believable. If I handled thousands of student names in a
big C array char[30][], I would resent the fact that 1/30 of the
memory was wasted on NUL bytes.  I'm sure plenty of people have done what
Gregory suggests ... but it's not clear that strncpy() was designed to
support those people.

I suppose it's all lost in history.

/Jorgen

-- 
  // Jorgen Grahn <grahn@  Oo  o.   .  .
\X/     snipabacken.se>   O  o   .
0
Reply Jorgen 6/30/2010 9:18:28 AM

On Tue, 29 Jun 2010 08:41:03 -0400, Roy Smith wrote:

>> > And what about regular expressions?
>> 
>> What about them? As the saying goes:
>> 
>> 	Some people, when confronted with a problem, think
>> 	"I know, I'll use regular expressions."
>> 	Now they have two problems.
> 
> That's silly.  RE is a good tool.  Like all good tools, it is the right 
> tool for some jobs and the wrong tool for others.

"When all you have is a hammer, everything looks like a nail" ;)

Except, REs are more like a turbocharged angle grinder: bloody
dangerous in the hands of a novice.

[I was going to say "hole hawg", but then realised that most of my post
would be a quotation explaining it. The reference is to Neal Stephenson's
essay "In the Beginning was the Command Line":
<http://www.cryptonomicon.com/beginning.html>]

> I've noticed over the years a significant anti-RE sentiment in the 
> Python community.  

IMHO, the sentiment isn't so much against REs per se, but against
excessive or inappropriate use. Apart from making it easy to write
illegible code, they also make it easy to write code that "mostly sort-of
works" but somewhat harder to write code which is actually correct.

It doesn't help that questions on REs often start out by stating a problem
for which REs are inappropriate, e.g. parsing a context-free (or higher)
language, and in the same sentence indicate that the poster is already
predisposed to using REs.

0
Reply Nobody 6/30/2010 12:22:15 PM

On 06/30/2010 03:00 AM, Jorgen Grahn wrote:
> On Wed, 2010-06-30, Michael Torrie wrote:
>> On 06/29/2010 10:17 PM, Michael Torrie wrote:
>>> On 06/29/2010 10:05 PM, Michael Torrie wrote:
>>>> #include <stdio.h>
>>>>
>>>> int main(int argc, char ** argv)
>>>> {
>>>> 	char *buf = malloc(512 * sizeof(char));
>>>> 	const int a = 2, b = 3;
>>>> 	snprintf(&buf, sizeof buf, "%d + %d = %d\n", a, b, a + b);
>>>                        ^^^^^^^^^^
>>> Make that 512*sizeof(buf)
>>
>> Sigh.  Try again.  How about "512 * sizeof(char)" ?  Still doesn't make
>> a difference.  The code still crashes because the &buf is incorrect.
> 
> I haven't tried to understand the rest ... but never write
> 'sizeof(char)' unless you might change the type later. 'sizeof(char)'
> is by definition 1 -- even on odd-ball architectures where a char is
> e.g. 16 bits.

You're right.  I normally don't use sizeof(char).  This is obviously a
contrived example; I just wanted to make the example such that there's
no way the original poster could argue that the crash is caused by
something other than &buf.

Then again, it's always a bad idea in C to make assumptions about
anything.  If you're on Windows and want to use the Unicode versions of
everything, you'd need to do sizeof().  So using it here would remind
you that when you move to the 16-bit Microsoft Unicode versions of
snprintf, you need to change the sizeof(char) lines to sizeof(wchar_t) as well.
Reply Michael 6/30/2010 2:02:47 PM

On Tue, 2010-06-29, Stephen Hansen wrote:
> On 6/29/10 5:41 AM, Roy Smith wrote:
>> Nobody<nobody@nowhere.com>  wrote:
>>
>>>> And what about regular expressions?
>>>
>>> What about them? As the saying goes:
>>>
>>> 	Some people, when confronted with a problem, think
>>> 	"I know, I'll use regular expressions."
>>> 	Now they have two problems.
>>
>> That's silly.  RE is a good tool.  Like all good tools, it is the right
>> tool for some jobs and the wrong tool for others.
>
> There's nothing silly about it.
>
> It is an exaggeration though: but it does represent a good thing to keep 
> in mind.

Not an exaggeration: it's an absolute. It literally says that any time
you try to solve a problem with a regex, (A) it won't solve the problem
and (B) it will in itself become a problem.  And it doesn't tell you
why: you're supposed to accept or reject this without thinking.

How can that be a good thing to keep in mind?

I wouldn't normally be annoyed by the quote, but it is thrown around a
lot in various places, not just here.

> Yes, re is a tool -- and a useful one at that. But it's also a tool which 
> /seems/ like an omnitool capable of tackling everything.

That's more like my attitude towards them.

/Jorgen

-- 
  // Jorgen Grahn <grahn@  Oo  o.   .  .
\X/     snipabacken.se>   O  o   .
Reply Jorgen 6/30/2010 2:14:38 PM

On 6/30/10 7:14 AM, Jorgen Grahn wrote:
> On Tue, 2010-06-29, Stephen Hansen wrote:
>> On 6/29/10 5:41 AM, Roy Smith wrote:
>>> Nobody<nobody@nowhere.com>   wrote:
>>>
>>>>> And what about regular expressions?
>>>>
>>>> What about them? As the saying goes:
>>>>
>>>> 	Some people, when confronted with a problem, think
>>>> 	"I know, I'll use regular expressions."
>>>> 	Now they have two problems.
>>>
>>> That's silly.  RE is a good tool.  Like all good tools, it is the right
>>> tool for some jobs and the wrong tool for others.
>>
>> There's nothing silly about it.
>>
>> It is an exaggeration though: but it does represent a good thing to keep
>> in mind.
>
> Not an exaggeration: it's an absolute. It literally says that any time
> you try to solve a problem with a regex, (A) it won't solve the problem
> and (B) it will in itself become a problem.  And it doesn't tell you
> why: you're supposed to accept or reject this without thinking.
>
> How can that be a good thing to keep in mind?

That it speaks in absolutes is what makes it an exaggeration. Yes, it 
literally says something kind of like that (Your 'a' is a 
mischaracterization).

It's still a very good thing to keep in mind.

It's a "saying" -- a proverb, an expression. Since when are the wise 
remarks of our ancient forefathers literal? Not last I checked.

Reading into a saying as not a guide or suggestion or cautionary tale 
but instead a doctrinal absolute is where we run into problems, not in 
the repeating of them.

-- 

    ... Stephen Hansen
    ... Also: Ixokai
    ... Mail: me+list/python (AT) ixokai (DOT) io
    ... Blog: http://meh.ixokai.io/

Reply Stephen 6/30/2010 3:23:24 PM

On 6/30/2010 8:22 AM, Nobody wrote:

>> I've noticed over the years a significant anti-RE sentiment in the
>> Python community.
>
> IMHO, the sentiment isn't so much against REs per se, but against
> excessive or inappropriate use. Apart from making it easy to write
> illegible code, they also make it easy to write code that "mostly sort-of
> works" but somewhat harder to write code which is actually correct.
>
> It doesn't help that questions on REs often start out by stating a problem
> for which REs are inappropriate, e.g. parsing a context-free (or higher)
> language, and in the same sentence indicate that the poster is already
> predisposed to using REs.

They also often start with a problem that is 'sub-relational-grammar' 
and easily solved with string methods, and again the OP proposes to use 
the overkill of REs. In other words, people ask "How do I do this with 
an RE" rather than "What tool should I use for this, and how".
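
As a rough illustration of that point (a sketch, not code from the thread): splitting a "key = value" line needs nothing beyond str.partition and str.strip. The names parse_setting and parse_setting_re and the sample input are made up for this example; the regex version is included only for comparison.

```python
import re

def parse_setting(line):
    """Split 'key = value' using only string methods."""
    key, sep, value = line.partition("=")
    if not sep:
        raise ValueError("no '=' in line: %r" % line)
    return key.strip(), value.strip()

def parse_setting_re(line):
    """The same job with a regular expression, for comparison."""
    match = re.match(r"\s*([^=]*?)\s*=\s*(.*?)\s*$", line)
    if match is None:
        raise ValueError("no '=' in line: %r" % line)
    return match.group(1), match.group(2)

print(parse_setting("  colour = dark blue  "))
```

Both functions give the same answer on this input; the string-method version is the one most readers can check at a glance.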

If people asked "How do I push a pin into a corkboard with a (standard) 
hammer" or "How do I break up a concrete sidewalk with a (standard) 
hammer", it would not be 'anti-hammer sentiment' to suggest another 
tool, like pliers or a jackhammer.

-- 
Terry Jan Reedy

Reply Terry 6/30/2010 4:56:48 PM

Terry Reedy wrote:
> On 6/30/2010 8:22 AM, Nobody wrote:
> 
>>> I've noticed over the years a significant anti-RE sentiment in the
>>> Python community.
>>
>> IMHO, the sentiment isn't so much against REs per se, but against
>> excessive or inappropriate use. Apart from making it easy to write
>> illegible code, they also make it easy to write code that "mostly sort-of
>> works" but somewhat harder to write code which is actually correct.
>>
>> It doesn't help that questions on REs often start out by stating a 
>> problem
>> for which REs are inappropriate, e.g. parsing a context-free (or higher)
>> language, and in the same sentence indicate that the poster is already
>> predisposed to using REs.
> 
> They also often start with a problem that is 'sub-relational-grammar' 
> and easily solved with string methods, and again the OP proposes to use 
> the overkill of REs. In other words, people ask "How do I do this with 
> an RE" rather than "What tool should I use for this, and how".
> 
> If people asked "How do I push a pin into a corkboard with a (standard) 
> hammer" or "How do I break up a concrete sidewalk with a (standard) 
> hammer", it would not be 'anti-hammer sentiment' to suggest another 
> tool, like pliers or a jackhammer.

I took the time to learn REs about a year ago.  It was well worth it, 
even though I've only used REs a handful of times since, because when 
you need them there is no good substitute.  But when you don't, there 
are plenty.  ;)

~Ethan~

Reply Ethan 6/30/2010 5:38:11 PM

Jorgen Grahn <grahn+nntp@snipabacken.se> writes:
> It's somewhat believable. If I handled thousands of student names in a
> big C array char[30][], I would resent the fact that 1/30 of the
> memory was wasted on NUL bytes. 

But you'd be wasting even more of the memory on bytes left unused when
the student's name is less than 30 chars.  If memory is that scarce you
need a different representation.
Reply Paul 6/30/2010 7:17:40 PM

On Wed, 30 Jun 2010 14:14:38 +0000, Jorgen Grahn wrote:

> On Tue, 2010-06-29, Stephen Hansen wrote:
>> On 6/29/10 5:41 AM, Roy Smith wrote:
>>> Nobody<nobody@nowhere.com>  wrote:
>>>
>>>>> And what about regular expressions?
>>>>
>>>> What about them? As the saying goes:
>>>>
>>>> 	Some people, when confronted with a problem, think "I know, I'll 
>>>>    use regular expressions." Now they have two problems.
>>>
>>> That's silly.  RE is a good tool.  Like all good tools, it is the
>>> right tool for some jobs and the wrong tool for others.
>>
>> There's nothing silly about it.
>>
>> It is an exaggeration though: but it does represent a good thing to
>> keep in mind.
> 
> Not an exaggeration: it's an absolute. It literally says that any time
> you try to solve a problem with a regex, (A) it won't solve the problem
> and (B) it will in itself become a problem.  And it doesn't tell you
> why: you're supposed to accept or reject this without thinking.

It's a *two sentence* summary, not a reasoned and nuanced essay on the 
pros and cons for REs.

Sheesh, I can just imagine you as a child, arguing with your teacher on 
being told not to run with scissors -- "but teacher, there may be 
circumstances where running with scissors is the right thing to do, you 
are guilty of over-simplifying a complex topic into a single simplified 
sound bite, instead of providing a detailed, rich heuristic for analysing 
each and every situation in full before making the decision whether or 
not to run with scissors".

If you look at the quote carefully, instead of making a knee-jerk 
reaction, you will see that it is *literally* correct. Given some 
problem, having decided to solve it with a regex, you DO have two 
problems:

(1) Merely making the decision "use REs" doesn't actually solve the 
original problem, any more than "use a hammer" solves the problem of "how 
do I build a table?". You've decided on an approach and a tool, but your 
original problem still applies.

(2) AND you now have the additional problem of dealing with regular 
expressions, which are notoriously hard to write, harder to debug, 
difficult to maintain, often slow, and incapable of solving certain common 
problems (such as parsing nested parentheses).

So it might be a short, simplified quip, but it *is* literally correct.
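
To make the nested-parentheses case concrete: a classical regular expression cannot track arbitrarily deep nesting, whereas a plain counter does it in a few lines. This is a minimal sketch; the function name parens_balanced is invented for the example.

```python
def parens_balanced(text):
    """Return True if every '(' in text has a matching ')'."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:      # a ')' with no '(' before it
                return False
    return depth == 0          # a leftover '(' means unbalanced

print(parens_balanced("f(a, (b + c) * (d))"))
```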



> How can that be a good thing to keep in mind?

Because many people consider REs to be some sort of panacea for solving 
every text-based problem, and it's a good thing to open their eyes.



-- 
Steven
Reply steve9679 (1985) 6/30/2010 8:30:38 PM

In message <mailman.2369.1277870727.32709.python-list@python.org>, Michael 
Torrie wrote:

> Okay, I will. Your code passes a char** when a char* is expected.

No it doesn’t.

> Consider this variation where I use a dynamically allocated buffer
> instead of static:

And so you misunderstand the difference between a C array and a pointer.
Reply Lawrence 7/1/2010 12:36:45 AM

On 06/30/2010 06:36 PM, Lawrence D'Oliveiro wrote:
> In message <mailman.2369.1277870727.32709.python-list@python.org>,
> Michael Torrie wrote:
> 
>> Okay, I will. Your code passes a char** when a char* is expected.
> 
> No it doesn’t.

You're right; it doesn't.  Your code passes char (*)[512].

warning: passing argument 1 of ‘snprintf’ from incompatible pointer type
/usr/include/stdio.h:385: note: expected ‘char * __restrict__’ but
argument is of type ‘char (*)[512]’

> And so you misunderstand the difference between a C array and a
> pointer.

You make a pretty big assumption.

Given "char buf[512]", buf's type is char * according to the compiler
and every C textbook I know of.  With a static char array, there's no
need to take its address since it *is* the address of the first
element.  Taking the address can lead to problems if you ever substitute
a dynamically-allocated buffer for the statically-allocated one.  For
one-dimensional arrays at least, static arrays and pointers are
interchangeable when calling snprintf.  You do not agree?

Anyway, this is far enough away from Python.
Reply Michael 7/1/2010 5:40:06 AM

On Wed, 2010-06-30, Steven D'Aprano wrote:
> On Wed, 30 Jun 2010 14:14:38 +0000, Jorgen Grahn wrote:
>
>> On Tue, 2010-06-29, Stephen Hansen wrote:
>>> On 6/29/10 5:41 AM, Roy Smith wrote:
>>>> Nobody<nobody@nowhere.com>  wrote:
>>>>
>>>>>> And what about regular expressions?
>>>>>
>>>>> What about them? As the saying goes:
>>>>>
>>>>> 	Some people, when confronted with a problem, think "I know, I'll 
>>>>>    use regular expressions." Now they have two problems.
>>>>
>>>> That's silly.  RE is a good tool.  Like all good tools, it is the
>>>> right tool for some jobs and the wrong tool for others.
>>>
>>> There's nothing silly about it.
>>>
>>> It is an exaggeration though: but it does represent a good thing to
>>> keep in mind.
>> 
>> Not an exaggeration: it's an absolute. It literally says that any time
>> you try to solve a problem with a regex, (A) it won't solve the problem
>> and (B) it will in itself become a problem.  And it doesn't tell you
>> why: you're supposed to accept or reject this without thinking.
>
> It's a *two sentence* summary, not a reasoned and nuanced essay on the 
> pros and cons for REs.

Well, perhaps you cannot say anything useful about REs in general in
two sentences, and should use either more words, or not say anything
at all.

The way it was used in the quoted text above is one example of what I
mean. (Unless other details have been trimmed -- I can't check right
now.) If he meant to say "REs aren't really a good solution for this
kind of problem, even though they look tempting", then he should have
said that.

> Sheesh, I can just imagine you as a child, arguing with your teacher on 
> being told not to run with scissors -- "but teacher, there may be 
> circumstances where running with scissors is the right thing to do, you 
> are guilty of over-simplifying a complex topic into a single simplified 
> sound bite, instead of providing a detailed, rich heuristic for analysing 
> each and every situation in full before making the decision whether or 
> not to run with scissors".

When I was a child I expected that kind of argumentation from adults.
I expect something more as an adult.

/Jorgen

-- 
  // Jorgen Grahn <grahn@  Oo  o.   .  .
\X/     snipabacken.se>   O  o   .
Reply Jorgen 7/1/2010 6:58:38 AM

On Wed, 2010-06-30, Michael Torrie wrote:
> On 06/30/2010 03:00 AM, Jorgen Grahn wrote:
>> On Wed, 2010-06-30, Michael Torrie wrote:
>>> On 06/29/2010 10:17 PM, Michael Torrie wrote:
>>>> On 06/29/2010 10:05 PM, Michael Torrie wrote:
>>>>> #include <stdio.h>
>>>>>
>>>>> int main(int argc, char ** argv)
>>>>> {
>>>>> 	char *buf = malloc(512 * sizeof(char));
>>>>> 	const int a = 2, b = 3;
>>>>> 	snprintf(&buf, sizeof buf, "%d + %d = %d\n", a, b, a + b);
>>>>                        ^^^^^^^^^^
>>>> Make that 512*sizeof(buf)
>>>
>>> Sigh.  Try again.  How about "512 * sizeof(char)" ?  Still doesn't make
>>> a difference.  The code still crashes because the &buf is incorrect.
>> 
>> I haven't tried to understand the rest ... but never write
>> 'sizeof(char)' unless you might change the type later. 'sizeof(char)'
>> is by definition 1 -- even on odd-ball architectures where a char is
>> e.g. 16 bits.
>
> You're right.  I normally don't use sizeof(char).  This is obviously a
> contrived example; I just wanted to make the example such that there's
> no way the original poster could argue that the crash is caused by
> something other than &buf.
>
> Then again, it's always a bad idea in C to make assumptions about
> anything.

There are some things you cannot assume, others which few fellow
programmers care to memorize, and others which you often can get
away with (like assuming an int is more than 16 bits, when your code
is tied to a modern Unix anyway).

But sizeof(char) is always 1.

> If you're on Windows and want to use the Unicode versions of
> everything, you'd need to do sizeof().  So using it here would remind
> you that when you move to the 16-bit Microsoft Unicode versions of
> snprintf, you need to change the sizeof(char) lines to sizeof(wchar_t) as well.

Yes -- see "unless you might change the type later" above.
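
Since this is a Python list, the sizes can be checked without a C compiler by asking ctypes, which reports the platform's C type sizes. A small sketch; the variable names are made up, and the size of wchar_t is whatever the local platform uses.

```python
import ctypes

char_size = ctypes.sizeof(ctypes.c_char)    # 1 by definition
wchar_size = ctypes.sizeof(ctypes.c_wchar)  # platform-dependent, commonly 2 or 4

# A 512-element buffer needs a different byte count for each type,
# which is the case where writing the sizeof out actually matters.
narrow_bytes = 512 * char_size
wide_bytes = 512 * wchar_size
print(char_size, wchar_size, narrow_bytes, wide_bytes)
```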

/Jorgen

-- 
  // Jorgen Grahn <grahn@  Oo  o.   .  .
\X/     snipabacken.se>   O  o   .
Reply Jorgen 7/1/2010 7:09:57 AM

On 6/30/10 11:58 PM, Jorgen Grahn wrote:
> On Wed, 2010-06-30, Steven D'Aprano wrote:
>> On Wed, 30 Jun 2010 14:14:38 +0000, Jorgen Grahn wrote:
>>> On Tue, 2010-06-29, Stephen Hansen wrote:
>>>>
>>>> There's nothing silly about it.
>>>>
>>>> It is an exaggeration though: but it does represent a good thing to
>>>> keep in mind.
>>>
>>> Not an exaggeration: it's an absolute. It literally says that any time
>>> you try to solve a problem with a regex, (A) it won't solve the problem
>>> and (B) it will in itself become a problem.  And it doesn't tell you
>>> why: you're supposed to accept or reject this without thinking.
>>
>> It's a *two sentence* summary, not a reasoned and nuanced essay on the
>> pros and cons for REs.
>
> Well, perhaps you cannot say anything useful about REs in general in
> two sentences, and should use either more words, or not say anything
> at all.
>
> The way it was used in the quoted text above is one example of what I
> mean. (Unless other details have been trimmed -- I can't check right
> now.) If he meant to say "REs aren't really a good solution for this
> kind of problem, even though they look tempting", then he should have
> said that.

The way it is used above (even with more stripping) is exactly where it 
is legitimate.

Regular expressions are a powerful tool.

The use of a powerful tool when a simple tool is available that achieves 
the same end is inappropriate, because power *always* has a cost.

The entire point of the quote is that when you look at a problem, you 
should *begin* from the position that a complex, powerful tool is not 
what you need to solve it.

You should always begin from a position that a simple tool will suffice 
to do what you need.

The quote does not deny the power of regular expressions; it challenges 
the widely held assumption and belief that comes from *somewhere* that they 
are the best way to approach any problem that is text related.

Does it come off as negative towards regular expressions? Certainly. But 
not because of any fault of re's on their own, but because there is this 
widespread perception that they are the swiss army knife that can solve 
any problem by just flicking out the right little blade.

It's about redefining perception.

Regular expressions are not the go-to solution for anything to do with 
text. Regular expressions are the tool you reach for when nothing else 
will work.

It's not your first step; it's your last (or, at least, one that happens 
way later than most people come around expecting it to be).

-- 

    ... Stephen Hansen
    ... Also: Ixokai
    ... Mail: me+list/python (AT) ixokai (DOT) io
    ... Blog: http://meh.ixokai.io/

Reply python3307 (206) 7/1/2010 7:19:09 AM

On Wed, 30 Jun 2010 23:40:06 -0600, Michael Torrie wrote:

> Given "char buf[512]", buf's type is char * according to the compiler
> and every C textbook I know of.

No, the type of "buf" is "char [512]", i.e. "array of 512 chars". If you
use "buf" as an rvalue (rather than an lvalue), it will be implicitly
converted to char*.

If you take its address, you'll get a "pointer to array of 512 chars",
i.e. a pointer to the array rather than to the first element. Converting
this to a char* will yield a pointer to the first element.

If buf was declared "char *buf", then taking its address will yield a
char**, and converting this to a char* will produce a pointer to the first
byte of the pointer, which is unlikely to be useful.
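
The same distinction can be poked at from Python with ctypes, which follows the C type rules closely enough for a demonstration. This is only a sketch: the variable names are made up, and ctypes is standing in for a C compiler here.

```python
import ctypes

# Equivalent of "char buf[512];": an array object, not a pointer.
buf = (ctypes.c_char * 512)()

# Equivalent of "char *p = buf;": a separate pointer variable that
# holds the address of the first element.
p = ctypes.cast(buf, ctypes.POINTER(ctypes.c_char))

array_size = ctypes.sizeof(buf)      # like "sizeof buf": the whole array
pointer_size = ctypes.sizeof(p)      # like "sizeof p": just the pointer

# &buf and &buf[0] are the same address (in C only the types differ).
addr_of_array = ctypes.addressof(buf)
addr_of_first = ctypes.cast(p, ctypes.c_void_p).value

# &p is the address of the pointer variable itself, which lives elsewhere.
addr_of_pointer = ctypes.addressof(p)

print(array_size, pointer_size)
print(addr_of_array == addr_of_first, addr_of_pointer == addr_of_array)
```

This is why "&v, sizeof v" describes the buffer when v is an array, but describes only the pointer variable when v is a pointer.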

Reply Nobody 7/1/2010 7:24:06 AM

Stephen Hansen wrote:
> On 6/30/10 11:58 PM, Jorgen Grahn wrote:
>> On Wed, 2010-06-30, Steven D'Aprano wrote:
>>> On Wed, 30 Jun 2010 14:14:38 +0000, Jorgen Grahn wrote:
>>>> On Tue, 2010-06-29, Stephen Hansen wrote:
>>>>>
>>>>> There's nothing silly about it.
>>>>>
>>>>> It is an exaggeration though: but it does represent a good thing to
>>>>> keep in mind.
>>>>
>>>> Not an exaggeration: it's an absolute. It literally says that any time
>>>> you try to solve a problem with a regex, (A) it won't solve the 
>>>> problem
>>>> and (B) it will in itself become a problem.  And it doesn't tell you
>>>> why: you're supposed to accept or reject this without thinking.
>>>
>>> It's a *two sentence* summary, not a reasoned and nuanced essay on the
>>> pros and cons for REs.
>>
>> Well, perhaps you cannot say anything useful about REs in general in
>> two sentences, and should use either more words, or not say anything
>> at all.
>>
>> The way it was used in the quoted text above is one example of what I
>> mean. (Unless other details have been trimmed -- I can't check right
>> now.) If he meant to say "REs aren't really a good solution for this
>> kind of problem, even though they look tempting", then he should have
>> said that.
>
> The way it is used above (even with more stripping) is exactly where 
> it is legitimate.
>
> Regular expressions are a powerful tool.
>
> The use of a powerful tool when a simple tool is available that 
> achieves the same end is inappropriate, because power *always* has a 
> cost.
>
> The entire point of the quote is that when you look at a problem, you 
> should *begin* from the position that a complex, powerful tool is not 
> what you need to solve it.
>
> You should always begin from a position that a simple tool will 
> suffice to do what you need.
>
> The quote does not deny the power of regular expressions; it 
> challenges the widely held assumption and belief that comes from 
> *somewhere* that they are the best way to approach any problem that is 
> text related.
>
> Does it come off as negative towards regular expressions? Certainly. 
> But not because of any fault of re's on their own, but because there 
> is this widespread perception that they are the swiss army knife that 
> can solve any problem by just flicking out the right little blade.
>
> It's about redefining perception.
>
> Regular expressions are not the go-to solution for anything to do with 
> text. Regular expressions are the tool you reach for when nothing else 
> will work.
>
> It's not your first step; it's your last (or, at least, one that happens 
> way later than most people come around expecting it to be).
>

Guys, this dogmatic discussion already took place on this list. Why 
start again?
Re is part of the Python standard library, for some purpose I guess.

JM




Reply jeanmichel (477) 7/1/2010 10:03:24 AM

Stephen Hansen <me+list/python@ixokai.io> wrote:

> The quote does not deny the power of regular expressions; it challenges 
> the widely held assumption and belief that comes from *somewhere* that they 
> are the best way to approach any problem that is text related.

Well, that assumption comes from historical unix usage where traditional 
tools like awk, sed, ed, and grep, made heavy use of regex, and 
therefore people learned to become proficient at them and use them all 
the time.  Somewhat later, the next generation of tools such as vi and 
perl continued that tradition.  Given the tools that were available at 
the time, regex was indeed likely to be the best tool available for most 
text-related problems.

Keep in mind that in the early days, people were working on hard-copy 
terminals [[http://en.wikipedia.org/wiki/ASR-33]] so economy of 
expression was a significant selling point for regexes.

Not trying to further this somewhat silly debate, just adding a bit of 
historical viewpoint to answer the implicit question you ask as to where 
the assumption came from.
Reply Roy 7/1/2010 12:11:03 PM

On 7/1/10 3:03 AM, Jean-Michel Pichavant wrote:
> Re is part of the python standard library, for some purpose I guess.

No, *really*?

So all those people who have been advocating that it's useless and shouldn't 
be are already too late?

Damn.

Well, there goes *that* whole crusade we were all out on. Since we can't 
destroy re, maybe we can go club baby seals.

-- 

    ... Stephen Hansen
    ... Also: Ixokai
    ... Mail: me+list/python (AT) ixokai (DOT) io
    ... Blog: http://meh.ixokai.io/

Reply Stephen 7/1/2010 2:18:53 PM

On 7/1/10 5:11 AM, Roy Smith wrote:
> Stephen Hansen<me+list/python@ixokai.io>  wrote:
>
>> The quote does not deny the power of regular expressions; it challenges
>> the widely held assumption and belief that comes from *somewhere* that they
>> are the best way to approach any problem that is text related.
>
> Well, that assumption comes from historical unix usage where traditional
> tools like awk, sed, ed, and grep, made heavy use of regex, and
> therefore people learned to become proficient at them and use them all
> the time.

Oh, I'm fully aware of the history of re's -- but it's not those old hats 
and even their students and the unix geeks I'm talking about.

It's the newbies and people wandering into the language with absolutely 
no idea about the history of unix, shell scripting and such, who so 
often arrive with the idea firmly planted in their head, that I wonder 
at. Sure, there's going to be a certain amount of cross-pollination from 
unix-geeks to students-of-students-of-students-of unix geeks to spread 
the idea, but it seems too pervasive for that. I just picture a 
re-vangelist camping out in high schools and colleges selling the party 
line or something :)

-- 

    ... Stephen Hansen
    ... Also: Ixokai
    ... Mail: me+list/python (AT) ixokai (DOT) io
    ... Blog: http://meh.ixokai.io/

P.S. And no, unix geeks is not a pejorative term.
Reply Stephen 7/1/2010 2:27:23 PM

Nobody wrote:
> On Wed, 30 Jun 2010 23:40:06 -0600, Michael Torrie wrote:
>> Given "char buf[512]", buf's type is char * according to the compiler
>> and every C textbook I know of.

References from Kernighan & Ritchie _The C Programming Language_ second 
edition:

> No, the type of "buf" is "char [512]", i.e. "array of 512 chars". If you
> use "buf" as an rvalue (rather than an lvalue), it will be implicitly
> converted to char*.

K&R2 A7.1

> If you take its address, you'll get a "pointer to array of 512 chars",
> i.e. a pointer to the array rather than to the first element. Converting
> this to a char* will yield a pointer to the first element.

K&R2 A7.4.2


        Mel.

Reply Mel 7/1/2010 3:36:46 PM

On 07/01/2010 01:24 AM, Nobody wrote:
> No, the type of "buf" is "char [512]", i.e. "array of 512 chars". If you
> use "buf" as an rvalue (rather than an lvalue), it will be implicitly
> converted to char*.

Yes this is true.  I misstated.  I meant that most text books I've seen
say to just use the variable in an *rvalue* as a pointer (can't think of
any lvalue use of an array).

K&R states that arrays (in C anyway) are always *passed* by pointer,
hence when you pass an array to a function it automatically decays into
a pointer.  Which is what you said.  So no need for & and the compiler
warning you get with it.  That's all.

If the OP was striving for pedantic correctness, he would use &buf[0].
Reply Michael 7/1/2010 4:10:16 PM

On 7/1/2010 8:36 AM, Mel wrote:
> Nobody wrote:
>> On Wed, 30 Jun 2010 23:40:06 -0600, Michael Torrie wrote:
>>> Given "char buf[512]", buf's type is char * according to the compiler
>>> and every C textbook I know of.
>
> References from Kernighan & Ritchie _The C Programming Language_ second
> edition:
>
>> No, the type of "buf" is "char [512]", i.e. "array of 512 chars". If you
>> use "buf" as an rvalue (rather than an lvalue), it will be implicitly
>> converted to char*.

    Yes, unfortunately.  The approach to arrays in C is just broken,
for historical reasons.  To understand C, you have to realize that
in the early versions, function declarations weren't visible when 
function calls were compiled.  That came later, in ANSI C. So
parameter passing in C is very dumb.  Billions of crashes due
to buffer overflows later, we're still suffering from that mistake.

    But this isn't a Python issue.

					John Nagle
Reply John 7/1/2010 5:17:02 PM

In message <mailman.2370.1277871088.32709.python-list@python.org>, Michael 
Torrie wrote:

> On 06/29/2010 06:26 PM, Lawrence D'Oliveiro wrote:
>>> I'm not sure you understood me correctly, because I advocate
>>> *not* doing input sanitization. Hard or not -- I don't want to know,
>>> because I don't want to do it.
>> 
>> But no-one has yet managed to come up with an alternative that involves
>> less work.
> 
> Your case is still not persuasive.

So persuade me. I have given an example of code written the way I do it. Now 
let’s see you rewrite it using your preferred technique, just to prove that 
your way is simpler and easier to understand.

Enough hand-waving, let’s see some code!
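
For readers who want to compare the two styles, here is roughly what a parametrized update looks like. It is a sketch under stated assumptions, not anyone's posted code: it uses the standard-library sqlite3 module rather than MySQL, and the table layout, the ALLOWED_FIELDS whitelist and update_item are made up for the example. Values travel as parameters; column names cannot, so they are checked against a fixed list.

```python
import sqlite3

# Hypothetical whitelist: column names cannot be passed as parameters,
# so only known names are ever interpolated into the SQL text.
ALLOWED_FIELDS = ("class_name", "description")

def update_item(conn, item_id, changes):
    """Update the given columns of one row, passing values as parameters."""
    fields = [name for name in changes if name in ALLOWED_FIELDS]
    if not fields:
        return 0
    assignments = ", ".join("%s = ?" % name for name in fields)
    params = [changes[name] for name in fields] + [item_id]
    cur = conn.execute(
        "update items set " + assignments + " where id = ?", params
    )
    return cur.rowcount

# Throwaway in-memory database so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table items (id integer primary key, class_name text, description text)"
)
conn.execute("insert into items (id, class_name, description) values (1, 'old', 'x')")

# A value full of quote characters is stored verbatim, with no escaping code.
rows_changed = update_item(conn, 1, {"class_name": "Robert'); drop table items; --"})
stored = conn.execute("select class_name from items where id = 1").fetchone()[0]
print(rows_changed, stored)
```

Whether this is simpler than building the literal by hand is the question the thread is arguing about; the sketch only shows the shape of the alternative.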
Reply Lawrence 7/1/2010 11:47:00 PM

In message <4c2ccd9c$0$1643$742ec2ed@news.sonic.net>, John Nagle wrote:

> The approach to arrays in C is just broken, for historical reasons.

Nevertheless, it is at least self-consistent. To return to my original 
macro:

    #define Descr(v) &v, sizeof v

As written, this works whatever the type of v: array, struct, whatever.

> So parameter passing in C is very dumb.

Nothing to do with the above issue.
Reply Lawrence 7/1/2010 11:50:59 PM

On Thursday 01 July 2010 16:50:59 Lawrence D'Oliveiro wrote:
> Nevertheless, it is at least self-consistent. To return to my original
> macro:
> 
>     #define Descr(v) &v, sizeof v
> 
> As written, this works whatever the type of v: array, struct, whatever.
> 

Doesn't seem to, sorry. Using Michael Torrie's code example, slightly 
modified...

[rami@tigris ~]$ cat example.c 
#include <stdio.h>

#define Descr(v) &v, sizeof v

int main(int argc, char ** argv)
{
        char *buf = malloc(512 * sizeof(char));
        const int a = 2, b = 3;
        snprintf(Descr(buf), "%d + %d = %d\n", a, b, a + b);
        fprintf(stdout, buf);
        free(buf);
        return 0;
} /*main*/

[rami@tigris ~]$ clang example.c 
example.c:11:18: warning: incompatible pointer types passing 'char **', expected 
'char *' [-pedantic]
        snprintf(Descr(buf), "%d + %d = %d\n", a, b, a + b);
                 ^~~~~~~~~~
example.c:4:18: note: instantiated from:                                                                                               
#define Descr(v) &v, sizeof v
                 ^~~~~~~~~~~~
    <<snip>>
[rami@tigris ~]$ ./a.out 
Segmentation fault


----
Rami Chowdhury
"Passion is inversely proportional to the amount of real information available."
-- Benford's Law of Controversy
+1-408-597-7068 / +44-7875-841-046 / +88-01819-245544
Reply Rami 7/2/2010 3:17:55 AM

In message <mailman.136.1278040489.1673.python-list@python.org>, Rami 
Chowdhury wrote:

> On Thursday 01 July 2010 16:50:59 Lawrence D'Oliveiro wrote:
>
>> Nevertheless, it is at least self-consistent. To return to my original
>> macro:
>> 
>>     #define Descr(v) &v, sizeof v
>> 
>> As written, this works whatever the type of v: array, struct, whatever.
> 
> Doesn't seem to, sorry. Using Michael Torrie's code example, slightly
> modified...
> 
>         char *buf = malloc(512 * sizeof(char));

Again, you misunderstand the difference between a C array and a pointer. 
Study the following example, which does work, and you might grasp the point:

ldo@theon:hack> cat test.c
#include <stdio.h>

int main(int argc, char ** argv)
  {
    char buf[512];
    const int a = 2, b = 3;
    snprintf(&buf, sizeof buf, "%d + %d = %d\n", a, b, a + b);
    fprintf(stdout, buf);
    return
        0;
  } /*main*/
ldo@theon:hack> ./test
2 + 3 = 5

Reply Lawrence 7/3/2010 2:20:26 AM

On Friday 02 July 2010 19:20:26 Lawrence D'Oliveiro wrote:
> In message <mailman.136.1278040489.1673.python-list@python.org>, Rami
> Chowdhury wrote:
> > On Thursday 01 July 2010 16:50:59 Lawrence D'Oliveiro wrote:
> >> Nevertheless, it is at least self-consistent. To return to my original
> >> 
> >> macro:
> >>     #define Descr(v) &v, sizeof v
> >> 
> >> As written, this works whatever the type of v: array, struct, whatever.
> > 
> > Doesn't seem to, sorry. Using Michael Torrie's code example, slightly
> > modified...
> > 
> >         char *buf = malloc(512 * sizeof(char));
> 
> Again, you misunderstand the difference between a C array and a pointer.
> Study the following example, which does work, and you might grasp the
> point:
> 
> ldo@theon:hack> cat test.c
> #include <stdio.h>
> 
> int main(int argc, char ** argv)
>   {
>     char buf[512];
>     const int a = 2, b = 3;
>     snprintf(&buf, sizeof buf, "%d + %d = %d\n", a, b, a + b);
>     fprintf(stdout, buf);
>     return
>         0;
>   } /*main*/
> ldo@theon:hack> ./test
> 2 + 3 = 5

I'm sorry, perhaps you've misunderstood what I was refuting. You posted:
> >> macro:
> >>     #define Descr(v) &v, sizeof v
> >> 
> >> As written, this works whatever the type of v: array, struct, whatever.

With my code example I found that, as others have pointed out, unfortunately it 
doesn't work if v is a pointer to a heap-allocated area. 

----
Rami Chowdhury
"A man with a watch knows what time it is. A man with two watches is never
sure". -- Segal's Law
+1-408-597-7068 / +44-7875-841-046 / +88-01819-245544
Reply Rami 7/3/2010 3:07:24 AM

On Mon, Jun 28, 2010 at 6:44 PM, Gregory Ewing
<greg.ewing@canterbury.ac.nz> wrote:
> Carl Banks wrote:
>
>> Indeed, strncpy does not copy that final NUL if it's at or beyond the
>> nth element. Probably the most mind-bogglingly stupid thing about the
>> standard C library, which has lots of mind-boggling stupidity.
>
> I don't think it was as stupid as that back when C was
> designed

Actually, strncpy had a very specific use case when it was introduced
(dealing with limited-size entries in very old unix filesystem). It
should never be used for C string handling, and I don't think it is
fair to say it is stupid: it does exactly what it was designed for. It
just happens that most people don't know what it was designed for.
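
To make the behaviour concrete, here is a small pure-Python model of what strncpy leaves in an n-byte destination. It is a sketch for illustration only: strncpy_model is a made-up name, the 8-byte field size is arbitrary, and nothing here calls the C library.

```python
def strncpy_model(src: bytes, n: int) -> bytes:
    """Return the n bytes that C's strncpy(dst, src, n) would leave in dst.

    Pure-Python model for illustration; it does not call the C library.
    """
    # A C string ends at its first NUL byte, so ignore anything after it.
    nul = src.find(b"\0")
    if nul != -1:
        src = src[:nul]
    copied = src[:n]
    # Short sources are padded with NULs up to n bytes; sources of length
    # n or more fill the field completely, leaving no terminating NUL.
    return copied + b"\0" * (n - len(copied))


short_name = strncpy_model(b"abc", 8)        # padded with five NULs
exact_name = strncpy_model(b"abcdefgh", 8)   # fills the field, no NUL at all
long_name = strncpy_model(b"abcdefghij", 8)  # truncated, still no NUL
```

The NUL padding and the missing terminator both make sense for a fixed-size field whose length is known to the reader, and neither is what you want from a general-purpose string copy.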

David
Reply David 7/3/2010 3:09:45 AM

In message <mailman.182.1278126257.1673.python-list@python.org>, Rami 
Chowdhury wrote:

> I'm sorry, perhaps you've misunderstood what I was refuting. You posted:
>> >> macro:
>> >>     #define Descr(v) &v, sizeof v
>> >> 
>> >> As written, this works whatever the type of v: array, struct,
>> >> whatever.
> 
> With my code example I found that, as others have pointed out,
> unfortunately it doesn't work if v is a pointer to a heap-allocated area.

It still correctly passes the address and size of that pointer variable. If 
that’s not what you intended, you shouldn’t use it.
Reply Lawrence 7/4/2010 1:23:21 AM

In message <mailman.2128.1277537954.32709.python-list@python.org>, Robert 
Kern wrote:

> On 2010-06-25 19:49 , Lawrence D'Oliveiro wrote:
>
>> Why do people consider input sanitization so hard?
> 
> It's not hard per se; it's just repetitive, prone to the occasional
> mistake, and, frankly, really boring.

But as a programmer, I’m not in the habit of doing “repetitive” and 
“boring”. Look at the example I posted, and you’ll see. It’s the ones trying 
to come up with alternatives to my code who produce things that look 
“repetitive” and “boring”.
0
Reply Lawrence 7/4/2010 1:28:20 AM

In message <pan.2010.06.29.09.35.18.594000@nowhere.com>, Nobody wrote:

> On Tue, 29 Jun 2010 12:30:36 +1200, Lawrence D'Oliveiro wrote:
> 
>>> Seriously, almost every other kind of library uses a binary API. What
>>> makes databases so special that they need a string-command based API?
>> 
>> HTML is also effectively a string-based API.
> 
> HTML is a data format. The sane way to construct or manipulate HTML is via
> the DOM, not string operations.

What is this “DOM” of which you speak? I looked here 
<http://docs.python.org/library/>, but can find nothing that sounds like 
that, that is relevant to HTML.

>> And what about regular expressions?
> 
> What about them? As the saying goes:
> 
> Some people, when confronted with a problem, think
> "I know, I'll use regular expressions."
> Now they have two problems.
> 
> They have some uses, e.g. defining tokens[1]. Using them to match more
> complex constructs is error-prone ...

What if they’re NOT more complex, but they can simply contain user-entered 
data?

>> And all the functionality available through the subprocess
>> module and its predecessors?
> 
> The main reason why everyone recommends subprocess over its predecessors
> is that it allows you to bypass the shell, which is one of the most
> common sources of the type of error being discussed in this thread.

How would you deal with this, then: I wrote a script called ExtractMac, to 
convert various old Macintosh-format documents accumulated over the years 
(stored in AppleDouble form by uploading to a Netatalk server) to more 
cross-platform formats. This has a table of conversion commands to use. For 
example, the entries for PICT and TEXT Macintosh file types look like this:

    "PICT" :
      {
        "type" : "image",
        "ext" : ".png",
        "act" : "convert %(src)s %(dst)s",
      },
    "TEXT" :
      {
        "type" : "text",
        "ext" : ".txt",
        "act" : "LineEndings unix <%(src)s >%(dst)s",
      },
 
The conversion code that uses this table looks like

    Cmd = \
      (
            Act.get("act", "cp -p %(src)s %(dst)s")
        %
          {
            "src" : ShellEscape(Src),
            "dst" : ShellEscape(DstFileName),
          }
      )
    sys.stderr.write("Doing: %s\n" % Cmd)
    Status = os.system(Cmd)

How much simpler would your alternative be? I don’t think it would be 
simpler at all.
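
For readers who want to try the table-driven approach, here is a minimal self-contained sketch. The definition of ShellEscape is not shown in this thread, so shell_escape below is a hypothetical stand-in built on shlex.quote from the Python 3 standard library; ACTIONS and build_command are likewise made-up names, and the table is cut down to the two entries quoted above.

```python
import shlex

# Cut-down copy of the conversion table from the post above.
ACTIONS = {
    "PICT": {"type": "image", "ext": ".png", "act": "convert %(src)s %(dst)s"},
    "TEXT": {"type": "text", "ext": ".txt", "act": "LineEndings unix <%(src)s >%(dst)s"},
}


def shell_escape(s):
    """Hypothetical stand-in for ShellEscape: quote s as a single shell word."""
    return shlex.quote(s)


def build_command(file_type, src, dst):
    """Fill in the command template for file_type, falling back to a plain copy."""
    act = ACTIONS.get(file_type, {})
    template = act.get("act", "cp -p %(src)s %(dst)s")
    return template % {"src": shell_escape(src), "dst": shell_escape(dst)}


pict_cmd = build_command("PICT", "my file.pict", "out.png")
text_cmd = build_command("TEXT", "notes; rm -rf ~", "notes.txt")
print(pict_cmd)
print(text_cmd)
```

Each file name reaches the shell as exactly one word whatever characters it contains, while the templates stay free to use shell features such as redirection.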
Reply ldo (2144) 7/4/2010 2:33:44 AM

On Saturday 03 July 2010 19:33:44 Lawrence D'Oliveiro wrote:
> In message <pan.2010.06.29.09.35.18.594000@nowhere.com>, Nobody wrote:
> > On Tue, 29 Jun 2010 12:30:36 +1200, Lawrence D'Oliveiro wrote:
> >>> Seriously, almost every other kind of library uses a binary API. What
> >>> makes databases so special that they need a string-command based API?
> >>
> >> HTML is also effectively a string-based API.
> >
> > HTML is a data format. The sane way to construct or manipulate HTML is
> > via the DOM, not string operations.
>
> What is this “DOM” of which you speak? I looked here
> <http://docs.python.org/library/>, but can find nothing that sounds like
> that, that is relevant to HTML.
>

The Document Object Model - I don't think the standard library has an HTML DOM
module but there's certainly one for XML (and XHTML):
http://docs.python.org/library/xml.dom.html
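
As a rough illustration of the DOM approach, the sketch below uses xml.dom.minidom, the standard library's lightweight implementation of that interface. The element name and the sample text are arbitrary choices for the example.

```python
from xml.dom.minidom import getDOMImplementation

# Build a tiny document with a single <p> root element.
impl = getDOMImplementation()
doc = impl.createDocument(None, "p", None)
root = doc.documentElement

# Add user-entered text as a text node; no manual escaping is done here.
user_text = "1 < 2 & 3 > 2"
root.appendChild(doc.createTextNode(user_text))

# Serialising the tree turns the markup-significant characters
# into entity references.
markup = root.toxml()
print(markup)
```

The data goes in as a plain string and the serialiser handles the escaping, which is the point being made about building markup through a tree rather than through string operations.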

----
Rami Chowdhury
"Any sufficiently advanced incompetence is indistinguishable from malice."
-- Grey's Law
+1-408-597-7068 / +44-7875-841-046 / +88-01819-245544
Reply rami.chowdhury (138) 7/4/2010 2:43:44 AM

Seeking industry expert candidates

I’m Justin Smith, Director of Tech Recruiting at Express Seattle.  I
am currently seeking candidates to fill Tech Positions for multiple A-
List Clients:
• Quality Assurance Engineer,
• Senior Data Engineer, Search Experience
• Senior Software Development Engineer, UX / UI
• Software Dev Engineer
• Software Dev TEST Engineer
• Software Development Manager,
• Sr Applications Engineer - Strong Linux Systems Administrator
• SR Technical PM, -
• Web Designer/Developer - strong tech and art background
• Business Analyst,
=95	Business Analyst,

Many of our Clients work within a Linux environment.  For greatest
impact, on your resume highlight relevant skills and technologies used
in an environment supported by Linux, languages that show you
understand and know object oriented development, have experience with
high volume sites that are notable and are continually learning new
skills.

Hot List that gets our attention - LAMP Stack Experience, Linux, Perl
and Java/JavaScript Experts that are current in the use and show
expertise.  Microsoft environment and dot net technologies are not
added attractors to many of our clients.

If you are interested in these roles, send me your resume, cover
letter highlighting noteworthy skills and projects with expected base
salary to justin.smith@expresspros.com and I can submit it ASAP.
Justin(dot)Smith(at)ExpressPros(dot)com   DO FEEL FREE TO REFER this
on to a friend or colleague with strong skills as well.

Qualifications:
- Computer Science degree or equivalent work experience (5+ years).
- Expert level fluency in at least one mainstream object-oriented
programming language (C++, Java, Ruby, Python).
- Proven coding skills in C++ and or Java on Unix/Linux platforms is a
must.
- Experience with MySQL or Oracle databases a plus.
- Linux or LAMP Stack experience preferred.
- Experience with HTML5, XML, XSD, WSDL, and SOAP and a history
working with web client software
- Experience with scalable distributed systems is a positive.

Added value attractors if the qualifications are available:
+ Experience with the iPhone SDK and Objective-C. - published app that
is stable, engaging
+ Experience with the BlackBerry SDK and/or J2ME. - published app that
is stable, engaging
+ Experience with the Android SDK. - published app that is stable,
engaging



On Jul 1, 7:18 am, Stephen Hansen <me+list/pyt...@ixokai.io> wrote:
> On 7/1/10 3:03 AM, Jean-Michel Pichavant wrote:
>
> > Re is part of the python standard library, for some purpose I guess.
>
> No, *really*?
>
> So all those people who have been advocating it's useless and shouldn't
> be are already too late?
>
> Damn.
>
> Well, there goes *that* whole crusade we were all out on. Since we can't
> destroy re, maybe we can go club baby seals.
>
> --
>
>     ... Stephen Hansen
>     ... Also: Ixokai
>     ... Mail: me+list/python (AT) ixokai (DOT) io
>     ... Blog: http://meh.ixokai.io/

Reply justin2009smith (2) 7/26/2010 11:19:59 PM

Justin Smith <justin2009smith@gmail.com> writes:

> Seeking industry expert candidates

Please don't reply in an existing thread with an unrelated message. If
you want to start a new discussion, compose a new message, not a reply.

For job advertisements, please don't use this forum at all; instead use
the Python Jobs Board <URL:http://www.python.org/community/jobs/>.

-- 
 \      “We are stuck with technology when what we really want is just |
  `\                                 stuff that works.” —Douglas Adams |
_o__)                                                                  |
Ben Finney
Reply python6 (872) 7/27/2010 3:26:46 AM

On 7/26/2010 4:19 PM, Justin Smith wrote:
> Seeking industry expert candidates
>
> I’m Justin Smith, Director of Tech Recruiting at Express Seattle.  I
> am currently seeking candidates to fill Tech Positions for multiple A-
> List Clients:

    Spammer detected.
    Injection-Info: r27g2000yqb.googlegroups.com;
	posting-host=63.170.35.94;
	posting-account=XlBkJgkAAAC7JNUw8ZEYCvz12vv6mGCK
    Reverse DNS: "franchisevpn.expresspersonnel.com"
    Site analysis: Domain "www.expresspersonnel.com"
	redirected to different domain "www.expresspros.com"
    Site analysis:
	From Secure certificate (Secure certificate, high confidence)
	Express Personnel Services, Inc.
	Oklahoma City, OK
	UNITED STATES
    Oklahoma corporation search:
	EXPRESS SERVICES, INC.
	Filing Number: 2400436307
	Name Type: Legal Name
	Status: In Existence
	Corp type: Foreign For Profit Business Corporation
	Jurisdiction: COLORADO
	Formation Date: 28 Aug 1985
     Colorado corporation search:
	ID: 19871524232
	Name: 	EXPRESS SERVICES, INC.
	Principal Street Address: 8516 NW Expressway,
		Oklahoma City, OK 73162, United States
     Target coordinates:
	35.56973,-97.668001
     Corporate class: Franchiser
Reply nagle (1023) 7/27/2010 6:40:16 PM

John Nagle <nagle@animats.com> writes:

> On 7/26/2010 4:19 PM, Justin Smith wrote:
>> Seeking industry expert candidates
>>
>> I’m Justin Smith, Director of Tech Recruiting at Express Seattle.  I
>> am currently seeking candidates to fill Tech Positions for multiple A-
>> List Clients:
>
>    Spammer detected.

But did you report it? (If so, it helps if you state so).


>    Injection-Info: r27g2000yqb.googlegroups.com;
> 	posting-host=63.170.35.94;

http://www.spamcop.net/sc?track=63.170.35.94 -> looks like abuse goes to
the spammer... A whois gives sprint.net, so you could contact abuse at
sprint.net (see: http://whois.domaintools.com/63.170.35.94 )

[snip address etc.]
Spammers don't care about that. Best course of action, based on my
experience, is to contact abuse at googlegroups.com (now and then it
actually works), and sprint.net.

-- 
John Bokma                                                               j3b

Hacking & Hiking in Mexico -  http://johnbokma.com/
http://castleamber.com/ - Perl & Python Development
Reply john167 (400) 7/27/2010 8:52:05 PM

126 Replies
198 Views
