Just been reading this article
<http://www.theregister.co.uk/2010/06/23/xxs_sql_injection_attacks_testing_remedy/>
which says that a lot of security holes are arising these days because
everybody is concentrating on unit testing of their own particular
components, with less attention being devoted to overall integration
testing.
Fair enough. But it’s disconcerting to see some of the advice being offered
in the reader comments, like “force everyone to use stored procedures”, or
“force everyone to use prepared/parametrized statements”, “never construct
ad-hoc SQL queries” and the like.
I construct ad-hoc queries all the time. It really isn’t that hard to do
safely. All you have to do is read the documentation—for example,
<http://dev.mysql.com/doc/refman/5.0/en/string-syntax.html>—and then write a
routine that takes arbitrary data and turns it into a valid string literal,
like this <http://www.codecodex.com/wiki/Useful_MySQL_Routines#Quoting>.
I’ve done this sort of thing for MySQL, for HTML and JavaScript (in both
Python and JavaScript itself), and for Bash. It’s not hard to verify you’ve
done it correctly. It lets you easily create table-updating code like the
following, which makes it so easy to update the code to track changes in the
database structure:
    sql.cursor.execute \
      (
            "update items set "
        +
            ", ".join
              (
                    tuple
                      (
                            "%(name)s = %(value)s"
                        %
                            {
                                "name" : field[0],
                                "value" : SQLString(Params.getvalue
                                  (
                                    "%s[%s]" % (field[1], urllib.quote(modify_id))
                                  ))
                            }
                        for field in
                            (
                                ("class_name", "modify_class"),
                                ("make", "modify_make"),
                                ("model", "modify_model"),
                                ("details", "modify_details"),
                                ("serial_nr", "modify_serial"),
                                ("inventory_nr", "modify_invent"),
                                ("when_purchased", "modify_when_purchased"),
                                ... you get the idea ...
                                ("location_name", "modify_location"),
                                ("comment", "modify_comment"),
                            )
                      )
                +
                    (
                        "last_modified = %d" % int(time.time()),
                    )
              )
        +
            " where inventory_nr = %s" % SQLString(modify_id)
      )
Lawrence | 6/25/2010 12:25:56 AM
In article <i00t2k$l07$1@lust.ihug.co.nz>,
Lawrence D'Oliveiro <ldo@geek-central.gen.new_zealand> wrote:
> I construct ad-hoc queries all the time. It really isn’t that hard to do
> safely. All you have to do is read the documentation
I get worried when people talk about how easy it is to do something
safely. Let me suggest a couple of things you might not have considered:
1) Somebody is running your application (or the database server) with
the locale set to something unexpected. This might change how numbers,
dates, currency, etc, get formatted, which could change the meaning of
your constructed SQL statement.
2) Somebody runs your application with a different PYTHONPATH, which
causes a different (i.e. malicious) urllib module to get loaded, which
makes urllib.quote() do something you didn't expect.
> I’ve done this sort of thing for MySQL, for HTML and JavaScript (in both
> Python and JavaScript itself), and for Bash. It’s not hard to verify you’ve
> done it correctly. It lets you easily create table-updating code like the
> following, which makes it so easy to update the code to track changes in the
> database structure:
>
> sql.cursor.execute \
> (
> "update items set "
> +
> ", ".join
> (
> tuple
> (
> "%(name)s = %(value)s"
> %
> {
> "name" : field[0],
> "value" : SQLString(Params.getvalue
> (
> "%s[%s]" % (field[1],
> urllib.quote(modify_id))
> ))
> }
> for field in
> (
> ("class_name", "modify_class"),
> ("make", "modify_make"),
> ("model", "modify_model"),
> ("details", "modify_details"),
> ("serial_nr", "modify_serial"),
> ("inventory_nr", "modify_invent"),
> ("when_purchased", "modify_when_purchased"),
> ... you get the idea ...
> ("location_name", "modify_location"),
> ("comment", "modify_comment"),
> )
> )
> +
> (
> "last_modified = %d" % int(time.time()),
> )
> )
> +
> " where inventory_nr = %s" % SQLString(modify_id)
> )
Roy | 6/25/2010 1:02:48 AM
On 2010-06-24 21:02:48 -0400, Roy Smith said:
> In article <i00t2k$l07$1@lust.ihug.co.nz>,
> Lawrence D'Oliveiro <ldo@geek-central.gen.new_zealand> wrote:
>
>> I construct ad-hoc queries all the time. It really isn’t that hard to do
>> safely. All you have to do is read the documentation
>
> I get worried when people talk about how easy it is to do something
> safely.
First: I agree with this. While it's definitely possible to correctly
escape a given SQL dialect under controlled conditions, it's not at all
easy to get it right, and the real world is even more unfriendly than
most people expect. Furthermore there's no reason to do it that way:
Python's DB API spec effectively requires that placeholder parameters
of *some* kind exist. Even if you feel the need to construct SQL, you
can construct it with parameters almost as easily as you can construct
it with the values baked in.
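
A minimal sketch of that point, not code from anyone in the thread: the statement text is built from fixed column names plus placeholders, and the values travel in a separate list. It assumes the standard-library sqlite3 module (which uses "?" placeholders; MySQLdb uses "%s"), and the table, columns and form values are hypothetical.

```python
import sqlite3
import time

# Hypothetical column list, standing in for the longer list in the thread.
FIELDS = ("make", "model", "comment")

def build_update(table, fields, key_column):
    """Return UPDATE text with one '?' per value; no data is embedded in it."""
    assignments = ", ".join("%s = ?" % name for name in fields)
    return "update %s set %s, last_modified = ? where %s = ?" % (
        table, assignments, key_column)

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table items (inventory_nr text primary key, make text, "
    "model text, comment text, last_modified integer)")
conn.execute("insert into items (inventory_nr) values (?)", ("A-1",))

# Invented form data containing characters that would need escaping.
form = {"make": "O'Neil & Sons", "model": 'X"9',
        "comment": "50% off; drop table items"}

sql = build_update("items", FIELDS, "inventory_nr")
params = [form[name] for name in FIELDS] + [int(time.time()), "A-1"]
conn.execute(sql, params)

row = conn.execute(
    "select make, model, comment from items where inventory_nr = ?",
    ("A-1",)).fetchone()
print(sql)
print(row)
```

The column names still end up in the SQL text, so they have to come from the program itself and never from user input.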
With that said...
> 2) Somebody runs your application with a different PYTHONPATH, which
> causes a different (i.e. malicious) urllib module to get loaded, which
> makes urllib.quote() do something you didn't expect.
Someone who can manipulate PYTHONPATH or otherwise add code to the
runtime environment is already in a position to hose your database,
independently of escaping-related issues. It's up to the sysadmin or
user to ensure that their environment is sane, and it's on their head
if they add broken code to a program's runtime environment.
Lawrence D'Oliveiro wrote:
> I’ve done this sort of thing for MySQL, for HTML and JavaScript (in both
> Python and JavaScript itself), and for Bash. It’s not hard to verify you’ve
> done it correctly. It lets you easily create table-updating code like the
> following, which makes it so easy to update the code to track changes in the
> database structure:
>
> sql.cursor.execute \
> (
> "update items set "
> +
> ", ".join
> (
> tuple
> (
> "%(name)s = %(value)s"
> %
> {
> "name" : field[0],
> "value" : SQLString(Params.getvalue
> (
> "%s[%s]" % (field[1],
> urllib.quote(modify_id))
> ))
> }
> for field in
> (
> ("class_name", "modify_class"),
> ("make", "modify_make"),
> ("model", "modify_model"),
> ("details", "modify_details"),
> ("serial_nr", "modify_serial"),
> ("inventory_nr", "modify_invent"),
> ("when_purchased", "modify_when_purchased"),
> ... you get the idea ...
> ("location_name", "modify_location"),
> ("comment", "modify_comment"),
> )
> )
> +
> (
> "last_modified = %d" % int(time.time()),
> )
> )
> +
> " where inventory_nr = %s" % SQLString(modify_id)
> )
Why would I write this when SQLAlchemy, even without using its ORM
features, can do it for me? It even uses the placeholder-generating
strategy I mentioned above, where possible.
Finally, it's worth noting that MySQL is (almost) the only mainstream
database that uses escaping for parameterization. PostgreSQL, SQL
Server, Oracle, DB2, and most other databases support parameters
natively in their communication protocols: parameters aren't injected
into the query string, but are sent separately and processed separately
within the DBMS. This neatly avoids encoding-related and
quoting-related problems entirely, and it means the type of the
parameter can be preserved if it's useful.
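
To illustrate the type-preservation point with something runnable, here is a small sketch using sqlite3 from the standard library. SQLite is an in-process library rather than a client/server protocol, so this only approximates what the databases named above do; the table and the values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A column with no declared type stores each value with its own type.
conn.execute("create table samples (v)")

values = [42, 3.5, "O'Leary", b"\x00\x27", None]
# Each value is bound to the '?' placeholder; none is spliced into the SQL text.
conn.executemany("insert into samples (v) values (?)", [(v,) for v in values])

stored = [row[0] for row in
          conn.execute("select v from samples order by rowid")]
for before, after in zip(values, stored):
    print(type(before).__name__, type(after).__name__, before == after)
```

Nothing is quoted or escaped anywhere in this sketch; the integer comes back as an integer and the bytes as bytes.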
-o
angrybaldguy (338) | 6/25/2010 2:43:26 AM
In message <roy-30B881.21024824062010@news.panix.com>, Roy Smith wrote:
> 1) Somebody is running your application (or the database server) with
> the locale set to something unexpected.
Locales are under program control, so that won’t happen.
This is why I use UTF-8 encoding for everything.
Lawrence | 6/25/2010 3:34:38 AM
In message <2010062422432660794-angrybaldguy@gmailcom>, Owen Jacobson wrote:
> Why would I write this when SQLAlchemy, even without using its ORM
> features, can do it for me?
SQLAlchemy doesn’t seem very flexible. Looking at the code examples
<http://www.sqlalchemy.org/docs/examples.html>, they’re very procedural:
build object, then do a string of separate method calls to add data to it. I
prefer the functional approach, as in my table-update example.
Lawrence | 6/25/2010 3:38:50 AM
On Fri, 25 Jun 2010 12:25:56 +1200, Lawrence D'Oliveiro wrote:
> Just been reading this article
> ...
> which says that a lot of security holes are arising these days because
> everybody is concentrating on unit testing of their own particular
> components, with less attention being devoted to overall integration
> testing.
>
> Fair enough. But it’s disconcerting to see some of the advice being
> offered in the reader comments, like “force everyone to use stored
> procedures”, or “force everyone to use prepared/parametrized
> statements”, “never construct ad-hoc SQL queries” and the like.
>
> I construct ad-hoc queries all the time. It really isn’t that hard to
> do safely.
Wrong.
Even if you get the quoting absolutely correct (which is a very big "if"),
you have to remember to perform it every time, without exception. And you
need to perform it exactly once. As the program gets more complex,
ensuring that it's done in the correct place, and only there, gets harder.
More generally, as a program gets more complex, "this will work so long as
we do X every time without fail" approaches "this won't work".
> All you have to do is read the documentation—for example,
> <http://dev.mysql.com/doc/refman/5.0/en/string-syntax.html>—and then
> write a routine that takes arbitrary data and turns it into a valid
> string literal, like this
> <http://www.codecodex.com/wiki/Useful_MySQL_Routines#Quoting>.
That's okay. Provided the documentation is accurate. And provided that you
update the escaping algorithm whenever the SQL dialect gets extended, or
you switch to a different back-end, or modify the program. IOW, it's not
even remotely okay.
"Unparsing" data so that you get the correct answer out of a subsequent
parsing step is objectively and obviously the wrong approach. The
correct approach is to skip both the unparsing and parsing steps
entirely.
Formal grammars are a useful way to represent graph-like data structures
in a human-readable and human-editable form. But for creation,
modification and use by a computer, it is invariably preferable to operate
upon the graph directly. Textual formats inherit all of the "issues" which
apply to the underlying data structure, then add a few of their own for
good measure.
> I've done this sort of thing for MySQL, for HTML and JavaScript (in both
> Python and JavaScript itself), and for Bash.
And, of course, you're convinced that you got it right every time. That
attitude alone should set alarm bells ringing for anyone who's worked in
this industry for more than five minutes.
nobody (4831) | 6/25/2010 6:47:47 AM
Nobody <nobody@nowhere.com> writes:
> More generally, as a program gets more complex, "this will work so long as
> we do X every time without fail" approaches "this won't work".
QOTW
Paul | 6/25/2010 7:09:44 AM
On Fri, 2010-06-25, Lawrence D'Oliveiro wrote:
> Just been reading this article
> <http://www.theregister.co.uk/2010/06/23/xxs_sql_injection_attacks_testing_remedy/>
> which says that a lot of security holes are arising these days because
> everybody is concentrating on unit testing of their own particular
> components, with less attention being devoted to overall integration
> testing.
I don't do SQL and I don't even understand the terminology properly
... but the discussion around it bothers me.
Do those people really do this?
- accept untrusted user data
- try to sanitize the data (escaping certain characters etc)
- turn this data into executable code (SQL)
- execute it
Like the example in the article
SELECT * FROM hotels WHERE city = '<untrusted>';
If so, it's isomorphic with doing os.popen('zcat -f %s' % untrusted)
in Python (at least on Unix, where 'zcat ...' is executed as a shell
script).
I thought it was well-known that the solution is *not* to try to
sanitize the input -- it's to switch to an interface which doesn't
involve generating an intermediate executable. In the Python example,
that would be something like os.popen2(['zcat', '-f', '--', untrusted]).
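
In current Python that idea might look like the sketch below. os.popen2 is gone from Python 3, so this assumes subprocess.run with an argument list; a small Python child process stands in for zcat so the example runs anywhere, and the untrusted string is invented.

```python
import subprocess
import sys

# Invented hostile input: shell metacharacters, quotes, command substitution.
untrusted = "notes.gz; rm -rf ~ $(whoami) 'quoted'"

# Each list element becomes exactly one argv entry; no shell ever parses it.
# With the real tool this would be ["zcat", "-f", "--", untrusted].
child = [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.argv[1])", untrusted]
result = subprocess.run(child, capture_output=True, text=True, check=True)

# The child received the string byte for byte as a single argument.
print(result.stdout == untrusted)
```

There is no sanitizing step to get right here, because the string is never turned into shell syntax in the first place.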
Am I missing something? If not, I can go back to sleep -- and keep
avoiding SQL and web programming like the plague until that community
has entered the 21st century.
/Jorgen
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
Jorgen | 6/25/2010 12:15:08 PM
On 6/25/2010 12:09 AM, Paul Rubin wrote:
> Nobody<nobody@nowhere.com> writes:
>> More generally, as a program gets more complex, "this will work so long as
>> we do X every time without fail" approaches "this won't work".
Yes. I was just looking at some of my own code. Out of about 100
SQL statements, I'd used manual escaping once, in code where the WHERE
clause is built up depending on what information is available for the
search. It's done properly, using "MySQLdb.escape_string(s)", which
is what's used inside "cursor.execute". Looking at the code, I now
realize that it would have been better to add sections to the SQL
string with standard escapes, and at the same time, append the key
items to a list. Then the list can be converted to a tuple for
submission to "cursor.execute".
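
A rough sketch of that approach, assuming sqlite3 in place of MySQLdb (so the placeholder is "?" rather than "%s"); the hotels table, the column whitelist and the build_search helper are hypothetical.

```python
import sqlite3

def build_search(criteria):
    """Build SELECT text plus a parallel parameter tuple from optional criteria."""
    clauses = []
    params = []
    for column in ("city", "name"):   # fixed whitelist of searchable columns
        value = criteria.get(column)
        if value is not None:
            clauses.append("%s = ?" % column)   # grow the SQL text...
            params.append(value)                # ...and the value list together
    sql = "select name, city from hotels"
    if clauses:
        sql += " where " + " and ".join(clauses)
    return sql, tuple(params)

conn = sqlite3.connect(":memory:")
conn.execute("create table hotels (name text, city text)")
conn.executemany("insert into hotels values (?, ?)", [
    ("Grand", "Cork"), ("O'Neil's Inn", "Cork"), ("Harbour", "Oslo")])

sql, params = build_search({"city": "Cork", "name": "O'Neil's Inn"})
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)
```

The clause list and the parameter list grow in step, so the number of placeholders always matches the number of values.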
John Nagle
John | 6/25/2010 6:58:51 PM
On Fri, 25 Jun 2010 12:15:08 +0000, Jorgen Grahn wrote:
> I don't do SQL and I don't even understand the terminology properly
> ... but the discussion around it bothers me.
>
> Do those people really do this?
Yes. And then some.
Among web developers, the median level of programming knowledge amounts to
the first 3 chapters of "Learn PHP in 7 Days".
It doesn't help that the guy who wrote PHP itself wasn't much better.
> - accept untrusted user data
> - try to sanitize the data (escaping certain characters etc)
> - turn this data into executable code (SQL)
> - executing it
>
> Like the example in the article
>
> SELECT * FROM hotels WHERE city = '<untrusted>';
Yep. Search the BugTraq archives for "SQL injection". And most of those
are for widely-deployed middleware; the zillions of bespoke site-specific
scripts are likely to be worse.
Also: http://xkcd.com/327/
> I thought it was well-known that the solution is *not* to try to
> sanitize the input
Well known by anyone with a reasonable understanding of the principles of
programming, but somewhat less well known by the other 98% of web
developers.
> Am I missing something?
There's a world of difference between a skilled chef and the people
flipping burgers for a minimum wage. And between a chartered civil
engineer and the people laying the asphalt. And between what you
probably consider a programmer and the people doing most web development.
> If not, I can go back to sleep -- and keep
> avoiding SQL and web programming like the plague until that community
> has entered the 21st century.
Don't hold your breath.
Of course, there's no fundamental reason why you can't apply sound
practices to web development. Well, other than the fact that you're
competing against an infinite number of (code-) monkeys for lowest-bidder
contracts.
To be fair, it isn't actually limited to web developers. I've seen the
following in scientific code written in C (or, more likely, ported to C
from Fortran) for Unix:
sprintf(buff, "rm -f %s", filename);
system(buff);
Why bother learning the Unix API when you already know system()?
Nobody | 6/25/2010 11:17:47 PM
On Fri, Jun 25, 2010 at 5:17 PM, Nobody <nobody@nowhere.com> wrote:
> To be fair, it isn't actually limited to web developers. I've seen the
> following in scientific code written in C (or, more likely, ported to C
> from Fortran) for Unix:
>
>        sprintf(buff, "rm -f %s", filename);
>        system(buff);
Tsk, tsk. And it's so easy to fix, too:
    #define BUFSIZE 1000000
    char buff[BUFSIZE];
    if (snprintf(buff, BUFSIZE, "rm -f %s", filename) >= BUFSIZE) {
        printf("No buffer overflow for you!\n");
    } else {
        system(buff);
    }
There, that's much more secure.
ian.g.kelly (1155) | 6/26/2010 12:25:04 AM
In message <pan.2010.06.25.06.47.34.297000@nowhere.com>, Nobody wrote:
> On Fri, 25 Jun 2010 12:25:56 +1200, Lawrence D'Oliveiro wrote:
>
>> I construct ad-hoc queries all the time. It really isn’t that hard to
>> do safely.
>
> Wrong.
>
> Even if you get the quoting absolutely correct (which is a very big "if"),
> you have to remember to perform it every time, without exception.
>
> More generally, as a program gets more complex, "this will work so long as
> we do X every time without fail" approaches "this won't work".
That’s a content-free claim. Why? Because it applies equally to everything.
Replace “quoting” with something like “arithmetic”, and you’ll see what I
mean:
Even if you get the arithmetic absolutely correct (which is a very big
"if"), you have to remember to perform it every time, without exception.
More generally, as a program gets more complex, "this will work so long
as we do X every time without fail" approaches "this won't work".
From which we can conclude, according to your logic, that one shouldn’t be
doing arithmetic.
Next time, try to avoid fallacious arguments.
> And you need to perform it exactly once. As the program gets more complex,
> ensuring that it's done in the correct place, and only there, gets harder.
Nonsense. It only needs to be done at the boundary to the appropriate
component (MySQL, HTML, JavaScript, whatever). That’s the only place which
needs to have knowledge of what’s on the other side. Everything else can
work with arbitrary data without having to worry about such things.
Go back to my example, and you’ll see this: the original updates two dozen
different fields in a database table, yet it only needs two calls to
SQLString: one deals with all the fields requiring updating, while the other
one deals with the key-matching. That’s it. Instead of two dozen different
places needing checking, you only have two.
That’s what “maintainability” is all about.
ldo (2144) | 6/26/2010 12:40:41 AM
In article <mailman.2117.1277511935.32709.python-list@python.org>,
Ian Kelly <ian.g.kelly@gmail.com> wrote:
> On Fri, Jun 25, 2010 at 5:17 PM, Nobody <nobody@nowhere.com> wrote:
> > To be fair, it isn't actually limited to web developers. I've seen the
> > following in scientific code written in C (or, more likely, ported to C
> > from Fortran) for Unix:
> >
> >        sprintf(buff, "rm -f %s", filename);
> >        system(buff);
>
> Tsk, tsk. And it's so easy to fix, too:
>
>     #define BUFSIZE 1000000
>     char buff[BUFSIZE];
>     if (snprintf(buff, BUFSIZE, "rm -f %s", filename) >= BUFSIZE) {
>         printf("No buffer overflow for you!\n");
>     } else {
>         system(buff);
>     }
>
> There, that's much more secure.
I recently fixed a bug in some production code. The programmer was
careful to use snprintf() to avoid buffer overflows. The only problem
is, he wrote something along the lines of:
snprintf(buf, strlen(foo), foo);
I'm sure the code got reviewed originally, and probably looked at dozens
of times over the years. Nobody caught the problem until we ran a
static code analysis tool (Coverity) over it.
To bring this back to something remotely Python related, the point of
all this is that security is hard. A lot of the security best practices
(such as "don't compose SQL queries on the fly with externally tainted
strings") exist because they address ways that people have gotten burned
in the past. It is foolish to think that you're smarter than everybody
else and have thought of every possibility to avoid getting burned by
doing the things that have gotten other people in trouble.
Roy | 6/26/2010 12:43:51 AM
In message <slrni297ec.1m5.grahn+nntp@frailea.sa.invalid>, Jorgen Grahn
wrote:
> I thought it was well-known that the solution is *not* to try to
> sanitize the input -- it's to switch to an interface which doesn't
> involve generating an intermediate executable. In the Python example,
> that would be something like os.popen2(['zcat', '-f', '--', untrusted]).
That’s what I mean. Why do people consider input sanitization so hard?
|
Lawrence | 6/26/2010 12:49:09 AM
On 2010-06-25 20:49:09 -0400, Lawrence D'Oliveiro said:
> In message <slrni297ec.1m5.grahn+nntp@frailea.sa.invalid>, Jorgen Grahn
> wrote:
>
>> I thought it was well-known that the solution is *not* to try to
>> sanitize the input -- it's to switch to an interface which doesn't
>> involve generating an intermediate executable. In the Python example,
>> that would be something like os.popen2(['zcat', '-f', '--', untrusted]).
>
> That’s what I mean. Why do people consider input sanitization so hard?
It's not hard. It's just begging for a visit from the fuckup fairy.
-o
Owen | 6/26/2010 2:56:02 AM
On 06/25/2010 07:49 PM, Lawrence D'Oliveiro wrote:
>> In the Python example, that would be something like
>> os.popen2(['zcat', '-f', '--', untrusted]).
>
> That’s what I mean. Why do people consider input sanitization
> so hard?
It's hard because it requires thinking. Sadly, many of the
people I know who call themselves programmers couldn't code their
way out of a paper bag, let alone think logically about the
security implications of their code.[1]
-tkc
[1] much of which ends up being cargo-cult programming,
cut-n-paste'd from Google search-results.
Tim | 6/26/2010 3:29:23 AM
On Thu, Jun 24, 2010 at 9:38 PM, Lawrence D'Oliveiro
<ldo@geek-central.gen.new_zealand> wrote:
> In message <2010062422432660794-angrybaldguy@gmailcom>, Owen Jacobson wrote:
>
>> Why would I write this when SQLAlchemy, even without using its ORM
>> features, can do it for me?
>
> SQLAlchemy doesn’t seem very flexible. Looking at the code examples
> <http://www.sqlalchemy.org/docs/examples.html>, they’re very procedural:
> build object, then do a string of separate method calls to add data to it. I
> prefer the functional approach, as in my table-update example.
Your example from the first post of the thread rewritten using sqlalchemy:
    conn.execute(
        items.update()
             .where(items.c.inventory_nr == modify_id)
             .values(
                 dict(
                     (field[0], Params.getvalue("%s[%s]" % (field[1],
                                                urllib.quote(modify_id))))
                     for field in [
                         (items.c.class_name, "modify_class"),
                         (items.c.make, "modify_make"),
                         (items.c.model, "modify_model"),
                         (items.c.details, "modify_details"),
                         (items.c.serial_nr, "modify_serial"),
                         (items.c.inventory_nr, "modify_invent"),
                         (items.c.when_purchased, "modify_when_purchased"),
                         ... you get the idea ...
                         (items.c.location_name, "modify_location"),
                         (items.c.comment, "modify_comment"),
                     ]
                 )
             )
             .values(last_modified = time.time())
    )
Doesn't seem any less flexible to me, plus you don't have to worry
about calling your SQLString function at all.
Cheers,
Ian
Ian | 6/26/2010 6:33:16 AM
On 2010-06-25 19:49 , Lawrence D'Oliveiro wrote:
> In message<slrni297ec.1m5.grahn+nntp@frailea.sa.invalid>, Jorgen Grahn
> wrote:
>
>> I thought it was well-known that the solution is *not* to try to
>> sanitize the input -- it's to switch to an interface which doesn't
>> involve generating an intermediate executable. In the Python example,
>> that would be something like os.popen2(['zcat', '-f', '--', untrusted]).
>
> That’s what I mean. Why do people consider input sanitization so hard?
It's not hard per se; it's just repetitive, prone to the occasional mistake,
and, frankly, really boring. When faced with things like that, we do what we do
everywhere else in programming: wrap up the repetitive bits into a simpler
library API and use that everywhere. Wrapping up the escaping code into
SQLString is a step in that direction. However, the standard SQL
parameterization in most of the DB protocols or SQLAlchemy's query construction
removes even more repetition and unnecessary typing. There's just no point in
not using it.
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
Robert | 6/26/2010 7:39:03 AM
On Sat, 26 Jun 2010 12:40:41 +1200, Lawrence D'Oliveiro wrote:
>>> I construct ad-hoc queries all the time. It really isn’t that hard to
>>> do safely.
>>
>> Wrong.
>>
>> Even if you get the quoting absolutely correct (which is a very big "if"),
>> you have to remember to perform it every time, without exception.
>>
>> More generally, as a program gets more complex, "this will work so long as
>> we do X every time without fail" approaches "this won't work".
>
> That’s a content-free claim. Why? Because it applies equally to everything.
> Replace “quoting” with something like “arithmetic”, and you’ll
> see what I mean:
If you omit the arithmetic, the program is likely to fail in very
obvious ways. Escaping is "almost" an identity function, which makes it
far more likely that omission or repetition will go unnoticed.
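
A small sketch of why omission or repetition goes unnoticed. The escape function is a hypothetical, deliberately simplified escaper, not MySQL's real rules.

```python
def escape(text):
    """Hypothetical escaper: backslash-escape backslashes and single quotes."""
    return text.replace("\\", "\\\\").replace("'", "\\'")

plain = "Smith"       # typical test data: nothing to escape
awkward = "O'Neil"    # the data that shows up in production

# For ordinary input, escaping zero, one or two times gives the same string,
# so a forgotten or doubled call passes every test that uses such data.
print(plain == escape(plain) == escape(escape(plain)))

# Only the awkward input tells the three cases apart.
once = escape(awkward)
twice = escape(once)
print(awkward, once, twice)
```

With a parameterized interface there is no escape call to forget or repeat, which is the point being made above.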
>> And you need to perform it exactly once. As the program gets more complex,
>> ensuring that it's done in the correct place, and only there, gets harder.
>
> Nonsense. It only needs to be done at the boundary to the appropriate
> component (MySQL, HTML, JavaScript, whatever).
That assumes that you have a well-defined "boundary", which isn't
necessarily the case.
In any case, you're still trying to make arguments about whether it's easy
or hard to get it right, which completely misses the point. Eliminating
the escaping entirely makes it impossible to get it wrong.
Nobody | 6/26/2010 10:49:18 AM
On Fri, 25 Jun 2010 20:43:51 -0400, Roy Smith wrote:
> To bring this back to something remotely Python related, the point of
> all this is that security is hard.
Oh, this isn't solely a security issue.
Ask anyone with a surname like O'Neil, O'Connor, O'Leary, etc; they've
probably broken a lot of web apps *without even trying*.
Nobody | 6/26/2010 11:04:38 AM
In article <2010062522560231540-angrybaldguy@gmailcom>,
Owen Jacobson <angrybaldguy@gmail.com> wrote:
> It's not hard. It's just begging for a visit from the fuckup fairy.
QOTD?
Roy | 6/26/2010 11:59:23 AM
On Sat, 26 Jun 2010 12:04:38 +0100
Nobody <nobody@nowhere.com> wrote:
> Ask anyone with a surname like O'Neil, O'Connor, O'Leary, etc; they've
> probably broken a lot of web apps *without even trying*.
At least it isn't a problem with the first name field. Oh, wait...
--
D'Arcy J.M. Cain <darcy@druid.net> | Democracy is three wolves
http://www.druid.net/darcy/ | and a sheep voting on
+1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.
D | 6/26/2010 12:07:11 PM
In message <mailman.2123.1277522976.32709.python-list@python.org>, Tim Chase
wrote:
> On 06/25/2010 07:49 PM, Lawrence D'Oliveiro wrote:
> ...
I see that you published my unobfuscated e-mail address on USENET for all to
see. I obfuscated it for a reason, to keep the spammers away. I'm assuming
this was a momentary lapse of judgement, for which I expect an apology.
Otherwise, it becomes grounds for an abuse complaint to your ISP.
Lawrence | 6/27/2010 2:21:50 AM
In message <mailman.2126.1277534032.32709.python-list@python.org>, Ian Kelly
wrote:
> Your example from the first post of the thread rewritten using sqlalchemy:
>
> conn.execute(
> items.update()
> .where(items.c.inventory_nr == modify_id)
> .values(
> dict(
> (field[0], Params.getvalue("%s[%s]" % (field[1],
> urllib.quote(modify_id))))
> for field in [
> (items.c.class_name, "modify_class"),
> (items.c.make, "modify_make"),
> (items.c.model, "modify_model"),
> (items.c.details, "modify_details"),
> (items.c.serial_nr, "modify_serial"),
> (items.c.inventory_nr, "modify_invent"),
> (items.c.when_purchased, "modify_when_purchased"),
> ... you get the idea ...
> (items.c.location_name, "modify_location"),
> (items.c.comment, "modify_comment"),
> ]
> )
> )
> .values(last_modified = time.time())
> )
>
> Doesn't seem any less flexible to me, plus you don't have to worry
> about calling your SQLString function at all.
Except I only needed two calls to SQLString, while you need two dozen
instances of that repetitive items.c boilerplate.
As a human, being repetitive is not my job. That’s what the computer is for.
|
Lawrence | 6/27/2010 2:31:59 AM
In message <2010062522560231540-angrybaldguy@gmailcom>, Owen Jacobson wrote:
> It's not hard. It's just begging for a visit from the fuckup fairy.
That’s the same fallacious argument I pointed out earlier.
|
Lawrence | 6/27/2010 2:33:57 AM
In message <pan.2010.06.26.10.49.02.156000@nowhere.com>, Nobody wrote:
> On Sat, 26 Jun 2010 12:40:41 +1200, Lawrence D'Oliveiro wrote:
>
>>>> I construct ad-hoc queries all the time. It really isn’t that hard to
>>>> do safely.
>>>
>>> Wrong.
>>>
>>> Even if you get the quoting absolutely correct (which is a very big
>>> "if"), you have to remember to perform it every time, without exception.
>>>
>>> More generally, as a program gets more complex, "this will work so long
>>> as we do X every time without fail" approaches "this won't work".
>>
>> That’s a content-free claim. Why? Because it applies equally to
>> everything. Replace “quoting” with something like “arithmetic”, and
>> you’ll see what I mean:
>
> If you omit the arithmetic, the program is likely to fail in very
> obvious ways. Escaping is "almost" an identity function, which makes it
> far more likely that omission or repetition will go unnoticed.
Maybe you need to go back and reread my original posting. The SQLString
routine doesn’t just escape special characters, it generates a full MySQL
string literal, complete with quotation marks. That makes it rather more
likely for a syntax error to occur if I forget to use it, don’t you think?
>>> And you need to perform it exactly once. As the program gets more
>>> complex, ensuring that it's done in the correct place, and only there,
>>> gets harder.
>>
>> Nonsense. It only needs to be done at the boundary to the appropriate
>> component (MySQL, HTML, JavaScript, whatever).
>
> That assumes that you have a well-defined "boundary", which isn't
> necessarily the case.
It’s ALWAYS the case.
> In any case, you're still trying to make arguments about whether it's easy
> or hard to get it right, which completely misses the point. Eliminating
> the escaping entirely makes it impossible to get it wrong.
Except nobody has yet shown an alternative which is easier to get right.
Lawrence | 6/27/2010 2:36:10 AM
In article <i06cju$qqa$2@lust.ihug.co.nz>,
Lawrence D'Oliveiro <ldo@geek-central.gen.new_zealand> wrote:
>In message <mailman.2123.1277522976.32709.python-list@python.org>, Tim Chase
>wrote:
>>
>> On 06/25/2010 07:49 PM, Lawrence D'Oliveiro wrote:
>> ...
>
>I see that you published my unobfuscated e-mail address on USENET for all to
>see. I obfuscated it for a reason, to keep the spammers away. I'm assuming
>this was a momentary lapse of judgement, for which I expect an apology.
>Otherwise, it becomes grounds for an abuse complaint to your ISP.
You are double daft. First, I completely disagree with you about it
being abuse; from my POV anyone posting to Usenet should do so with an
unobfuscated address. Secondly, you are wrong about Tim publishing your
address unless you intended to follow up to a completely different post,
and you owe *him* an apology for a false accusation.
--
Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/
"If you don't know what your program is supposed to do, you'd better not
start writing it." --Dijkstra
aahz | 6/27/2010 2:53:04 AM
On Sat, Jun 26, 2010 at 7:21 PM, Lawrence D'Oliveiro <> wrote:
> In message <mailman.2123.1277522976.32709.python-list@python.org>, Tim Chase
> wrote:
>
>> On 06/25/2010 07:49 PM, Lawrence D'Oliveiro wrote:
>> ...
>
> I see that you published my unobfuscated e-mail address on USENET for all to
> see. I obfuscated it for a reason, to keep the spammers away. I'm assuming
> this was a momentary lapse of judgement, for which I expect an apology.
> Otherwise, it becomes grounds for an abuse complaint to your ISP.
Will you give it a rest already with these threatening messages? Why
are you still using this only-partially-obfuscated address with USENET
anyway? This has happened twice before, it will doubtless happen yet
again. Just use an /entirely invalid/ From address like some other
posters do.
I can't believe you have a form letter for this...
Regards,
Chris
--
Public addresses eventually going bad is a *fact of life*; plan ahead
accordingly.
|
|
0
|
|
|
|
Reply
|
Chris
|
6/27/2010 2:55:49 AM
|
|
On 06/26/2010 09:21 PM, Lawrence D'Oliveiro wrote:
> In message<mailman.2123.1277522976.32709.python-list@python.org>, Tim Chase
> wrote:
>
>> On 06/25/2010 07:49 PM, Lawrence D'Oliveiro wrote:
>> ...
>
> I see that you published my unobfuscated e-mail address on USENET for all to
> see. I obfuscated it for a reason, to keep the spammers away. I'm assuming
> this was a momentary lapse of judgement, for which I expect an apology.
> Otherwise, it becomes grounds for an abuse complaint to your ISP.
I'm sorry...you've got your knickers in a knot? That your spam
filters seem to be insufficient? That you don't have a custom
throwaway address for such public dialogs? For preventing an
"undeliverable" bounce message that your bogus address would have
caused (if your mail provider is RFC-compliant; though your mail
provider may kindly be breaking RFC by disabling "undeliverable"
responses to prevent back-scatter spam)?
Is the abuse charge "waah, he replied to my actual email rather
than the false one I spoofed"?
I'm not sure an abuse complaint to my ISP would net you anything
since the exact out-bound headers show nothing abusive, only the
correcting of an invalid TLD to prevent a bounce (and a distinct
lack of USENET references in the original message that went to
you and CC'ed python-list@python.org).
Having regularly used python.list@tim.thechases.com unobfuscated
for easily over 5 years, the spam to this address has been almost
negligible (or so effectively dealt with by Thunderbird's spam
filters that I've never noticed it).
-tkc
|
|
0
|
|
|
|
Reply
|
Tim
|
6/27/2010 3:23:53 AM
|
|
On 6/26/10 7:21 PM, Lawrence D'Oliveiro wrote:
> In message<mailman.2123.1277522976.32709.python-list@python.org>, Tim Chase
> wrote:
>
>> On 06/25/2010 07:49 PM, Lawrence D'Oliveiro wrote:
>> ...
>
> I see that you published my unobfuscated e-mail address on USENET for all to
> see. I obfuscated it for a reason, to keep the spammers away. I'm assuming
> this was a momentary lapse of judgement, for which I expect an apology.
> Otherwise, it becomes grounds for an abuse complaint to your ISP.
Wow.
Way to be a douchebag.
I was going to say something about the realities of this forum and its
dual-nature and conflicting netiquette and on. But I decided it really
just had no point.
So, I'm left with: wow. You kinda suck*, man.
--
... Stephen Hansen
... Also: Ixokai
... Mail: me+list/python (AT) ixokai (DOT) io
... Blog: http://meh.ixokai.io/
P.S. *Then again, I'm fairly sure anytime someone has a form letter
which contains the words, "I expect an apology", there's some personal
suck going on.
|
|
0
|
|
|
|
Reply
|
Stephen
|
6/27/2010 3:27:52 AM
|
|
Lawrence D'Oliveiro <ldo@geek-central.gen.new_zealand> writes:
> I see that you published my unobfuscated e-mail address on USENET for
> all to see. I obfuscated it for a reason, to keep the spammers away.
> I'm assuming this was a momentary lapse of judgement, for which I
> expect an apology. Otherwise, it becomes grounds for an abuse
> complaint to your ISP.
Er? On what grounds would you complain to their ISP? You might consider
the person rude, but that's not grounds for an abuse complaint. What
part of their ISP's terms of service do you think they have abused by
de-obfuscating information you freely posted to the internet?
--
\ “If you do not trust the source do not use this program.” |
`\ —Microsoft Vista security dialogue |
_o__) |
Ben Finney
|
|
0
|
|
|
|
Reply
|
Ben
|
6/27/2010 3:50:41 AM
|
|
On Sat, Jun 26, 2010 at 8:50 PM, Ben Finney <ben+python@benfinney.id.au> wrote:
> Lawrence D'Oliveiro <ldo@geek-central.gen.new_zealand> writes:
>
>> I see that you published my unobfuscated e-mail address on USENET for
>> all to see. I obfuscated it for a reason, to keep the spammers away.
>> I'm assuming this was a momentary lapse of judgement, for which I
>> expect an apology. Otherwise, it becomes grounds for an abuse
>> complaint to your ISP.
>
> Er? On what grounds would you complain to their ISP? You might consider
> the person rude, but that's not grounds for an abuse complaint. What
> part of their ISP's terms of service do you think they have abused by
> de-obfuscating information you freely posted to the internet?
I routinely post my email on this and other mailing lists and have yet
to get a piece of spam in my inbox as a result. I suggest you get a
better spam filter rather than expecting the rest of the universe to
annoy itself for your benefit.
Geremy Condra
|
|
0
|
|
|
|
Reply
|
geremy
|
6/27/2010 4:04:45 AM
|
|
In message <pan.2010.06.26.11.04.22.328000@nowhere.com>, Nobody wrote:
> Ask anyone with a surname like O'Neil, O'Connor, O'Leary, etc; they've
> probably broken a lot of web apps *without even trying*.
Last I checked, I couldn’t post comments on freedom-to-tinker.com.
|
|
0
|
|
|
|
Reply
|
Lawrence
|
6/27/2010 4:15:18 AM
|
|
In message <roy-854954.20435125062010@news.panix.com>, Roy Smith wrote:
> I recently fixed a bug in some production code. The programmer was
> careful to use snprintf() to avoid buffer overflows. The only problem
> is, he wrote something along the lines of:
>
> snprintf(buf, strlen(foo), foo);
A long while ago I came up with this macro:
#define Descr(v) &v, sizeof v
making the correct version of the above become
snprintf(Descr(buf), foo);
|
|
0
|
|
|
|
Reply
|
Lawrence
|
6/27/2010 4:17:39 AM
|
|
On Sat, Jun 26, 2010 at 8:31 PM, Lawrence D'Oliveiro
<ldo@geek-central.gen.new_zealand> wrote:
> Except I only needed two calls to SQLString, while you need two dozen
> instances of that repetitive items.c boilerplate.
>
> As a human, being repetitive is not my job. That’s what the computer is
> for.
Then why do you have every parameter prefixed with "modify_"? 8-)
But seriously, if that bothers you, then fold the "items.c." portion
into the generator expression with a getattr call. Or just change
them back to the same strings you had originally, and sqlalchemy will
be just as happy to accept them as-is.
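For comparison, the table-driven shape of the original example can be kept while the driver binds the values. What follows is only a minimal sketch, using the standard-library sqlite3 module rather than MySQL or SQLAlchemy; the table layout, the FIELDS mapping and the build_update helper are names made up for illustration. Column names cannot be bound as parameters, so they come only from the fixed list in the code, never from user input.

```python
import sqlite3
import time

# Hypothetical mapping: (column name, form field name). Column names are
# fixed in the source code; only the values come from the outside.
FIELDS = (
    ("make", "modify_make"),
    ("model", "modify_model"),
    ("comment", "modify_comment"),
)

def build_update(form, modify_id):
    """Return (sql, params) for a parameterized UPDATE of one item."""
    assignments = ", ".join("%s = ?" % column for column, _ in FIELDS)
    sql = (
        "update items set " + assignments
        + ", last_modified = ? where inventory_nr = ?"
    )
    params = [form[field] for _, field in FIELDS]
    params += [int(time.time()), modify_id]
    return sql, params

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table items (inventory_nr text, make text, model text,"
    " comment text, last_modified integer)"
)
conn.execute("insert into items (inventory_nr) values (?)", ("A-1",))

# Values that would need careful escaping if pasted into the statement.
form = {
    "modify_make": "O'Neil & Sons",
    "modify_model": "X; drop table items; --",
    "modify_comment": 'said "hi"',
}
sql, params = build_update(form, "A-1")
conn.execute(sql, params)
row = conn.execute(
    "select make, model, comment from items where inventory_nr = ?", ("A-1",)
).fetchone()
```

Adding a column is still a one-line change to FIELDS, and no value is ever turned into a SQL literal by hand.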
Cheers,
Ian
|
|
0
|
|
|
|
Reply
|
Ian
|
6/27/2010 7:31:17 AM
|
|
On Sun, Jun 27, 2010 at 9:47 AM, Lawrence D'Oliveiro
<ldo@geek-central.gen.new_zealand> wrote:
> In message <roy-854954.20435125062010@news.panix.com>, Roy Smith wrote:
>
>> I recently fixed a bug in some production code. The programmer was
>> careful to use snprintf() to avoid buffer overflows. The only problem
>> is, he wrote something along the lines of:
>>
>> snprintf(buf, strlen(foo), foo);
>
> A long while ago I came up with this macro:
>
>    #define Descr(v) &v, sizeof v
>
> making the correct version of the above become
>
>    snprintf(Descr(buf), foo);
>
Not quite right. If buf is a char array, as suggested by the use of
sizeof, then you're not passing a char* to snprintf. You need to lose
the & in your macro.
--
regards,
kushal
|
|
0
|
|
|
|
Reply
|
python2058 (92)
|
6/27/2010 8:15:40 AM
|
|
In message <mailman.2184.1277626565.32709.python-list@python.org>, Kushal
Kumaran wrote:
> On Sun, Jun 27, 2010 at 9:47 AM, Lawrence D'Oliveiro
> <ldo@geek-central.gen.new_zealand> wrote:
>
>> In message <roy-854954.20435125062010@news.panix.com>, Roy Smith wrote:
>>
>>> I recently fixed a bug in some production code. The programmer was
>>> careful to use snprintf() to avoid buffer overflows. The only problem
>>> is, he wrote something along the lines of:
>>>
>>> snprintf(buf, strlen(foo), foo);
>>
>> A long while ago I came up with this macro:
>>
>> #define Descr(v) &v, sizeof v
>>
>> making the correct version of the above become
>>
>> snprintf(Descr(buf), foo);
>
> Not quite right. If buf is a char array, as suggested by the use of
> sizeof, then you're not passing a char* to snprintf.
What am I passing, then?
|
|
0
|
|
|
|
Reply
|
Lawrence
|
6/27/2010 11:46:36 AM
|
|
In message <mailman.2183.1277623909.32709.python-list@python.org>, Ian Kelly
wrote:
> On Sat, Jun 26, 2010 at 8:31 PM, Lawrence D'Oliveiro
> <ldo@geek-central.gen.new_zealand> wrote:
>
>> Except I only needed two calls to SQLString, while you need two dozen
>> instances of that repetitive items.c boilerplate.
>>
>> As a human, being repetitive is not my job. That’s what the computer is
>> for.
>
> Then why do you have every parameter prefixed with "modify_"? 8-)
Touché :). Actually it’s because the same form can be used to add a new
record to the table, so there’s a separate set of input fields for that.
> But seriously, if that bothers you, then fold the "items.c." portion
> into the generator expression with a getattr call. Or just change
> them back to the same strings you had originally, and sqlalchemy will
> be just as happy to accept them as-is.
All this trouble, and it only gets rid of 2 of the 3 instances of data-
escaping in the example.
|
|
0
|
|
|
|
Reply
|
Lawrence
|
6/27/2010 11:51:16 AM
|
|
On Sun, 27 Jun 2010 14:36:10 +1200, Lawrence D'Oliveiro wrote:
>> In any case, you're still trying to make arguments about whether it's easy
>> or hard to get it right, which completely misses the point. Eliminating
>> the escaping entirely makes it impossible to get it wrong.
>
> Except nobody has yet shown an alternative which is easier to get right.
For SQL, use stored procedures or prepared statements. For HTML/XML, use a
DOM (or similar) interface.
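As a rough illustration of both halves of that advice, here is a minimal sketch using only the Python standard library. It assumes sqlite3 as a stand-in for whatever database is actually in use, and xml.etree.ElementTree as the "DOM (or similar)" interface; the table and element names are invented for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

surname = "O'Neil <admin> & co"

# SQL: the statement text is constant; the value travels separately.
conn = sqlite3.connect(":memory:")
conn.execute("create table people (surname text)")
conn.execute("insert into people (surname) values (?)", (surname,))
stored = conn.execute("select surname from people").fetchone()[0]

# XML: build a tree and let the serializer do any escaping.
root = ET.Element("person")
ET.SubElement(root, "surname").text = surname
document = ET.tostring(root, encoding="unicode")
round_trip = ET.fromstring(document).findtext("surname")
```

In neither case does the calling code contain an escaping routine of its own; the apostrophe, angle brackets and ampersand are handled by the library.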
|
|
0
|
|
|
|
Reply
|
Nobody
|
6/27/2010 1:55:23 PM
|
|
On Sat, 2010-06-26, Lawrence D'Oliveiro wrote:
> In message <slrni297ec.1m5.grahn+nntp@frailea.sa.invalid>, Jorgen Grahn
> wrote:
>
>> I thought it was well-known that the solution is *not* to try to
>> sanitize the input -- it's to switch to an interface which doesn't
>> involve generating an intermediate executable. In the Python example,
>> that would be something like os.popen2(['zcat', '-f', '--', untrusted]).
>
> That’s what I mean. Why do people consider input sanitization so hard?
I'm not sure you understood me correctly, because I advocate
*not* doing input sanitization. Hard or not -- I don't want to know,
because I don't want to do it.
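In current Python the same idea is usually spelled with the subprocess module, since os.popen2 no longer exists in Python 3. A minimal sketch, assuming Python 3.7 or later; the child command here is just the Python interpreter echoing its argument, chosen so the example runs anywhere:

```python
import subprocess
import sys

untrusted = "; echo pwned $(id) `id` 'quoted'"

# The argument list is handed to the child process directly; no shell
# ever parses it, so there is nothing to sanitize.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", untrusted],
    capture_output=True,
    text=True,
    check=True,
)
echoed = result.stdout.rstrip("\n")
```

The child sees the string exactly as given, metacharacters and all, because it is never turned into a command line for a shell to interpret.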
/Jorgen
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
|
|
0
|
|
|
|
Reply
|
Jorgen
|
6/27/2010 7:17:54 PM
|
|
On Fri, 2010-06-25, Nobody wrote:
> On Fri, 25 Jun 2010 12:15:08 +0000, Jorgen Grahn wrote:
>
>> I don't do SQL and I don't even understand the terminology properly
>> ... but the discussion around it bothers me.
>>
>> Do those people really do this?
>
> Yes. And then some.
>
> Among web developers, the median level of programming knowledge amounts to
> the first 3 chapters of "Learn PHP in 7 Days".
>
> It doesn't help that the guy who wrote PHP itself wasn't much better.
>
>> - accept untrusted user data
>> - try to sanitize the data (escaping certain characters etc)
>> - turn this data into executable code (SQL)
>> - executing it
>>
>> Like the example in the article
>>
>> SELECT * FROM hotels WHERE city = '<untrusted>';
>
> Yep. Search the BugTraq archives for "SQL injection". And most of those
> are for widely-deployed middleware; the zillions of bespoke site-specific
> scripts are likely to be worse.
>
> Also: http://xkcd.com/327/
Priceless!
As is often the case with xkcd, I learned something, too: there's a
widely used web application/portal/database thingy which silently
strips some characters from my input. I thought it had to do with
HTML, but it's in fact exactly the sequences "'", ')', ';' and '--'
from the comic, and a few more like '>' and undoubtedly some I haven't
noticed yet.
That is surely "input sanitization" gone horribly wrong: I enter "6--8
slices of bread", but the system stores "68 slices of bread".
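The difference between stripping characters and binding values can be shown in a few lines. This is only a sketch: strip_suspicious is a made-up imitation of the behaviour described above, not the real application's code, and sqlite3 stands in for its database.

```python
import sqlite3

def strip_suspicious(text):
    """A deliberately naive 'sanitizer' of the kind described above."""
    for token in ("'", ")", ";", "--", ">"):
        text = text.replace(token, "")
    return text

note = "6--8 slices of bread"
mangled = strip_suspicious(note)  # the user's data is silently altered

# With a bound parameter the text is stored exactly as entered.
conn = sqlite3.connect(":memory:")
conn.execute("create table notes (body text)")
conn.execute("insert into notes (body) values (?)", (note,))
stored = conn.execute("select body from notes").fetchone()[0]
```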
>> I thought it was well-known that the solution is *not* to try to
>> sanitize the input
>
> Well known by anyone with a reasonable understanding of the principles of
> programming, but somewhat less well known by the other 98% of web
> developers.
>
>> Am I missing something?
>
> There's a world of difference between a skilled chef and the people
> flipping burgers for a minimum wage. And between a chartered civil
> engineer and the people laying the asphalt. And between what you
> probably consider a programmer and the people doing most web development.
I don't know them, so I wouldn't know ... What I would *expect* is
that safe tools are provided for them, not just workarounds so they
can keep using the unsafe tools. That's what Python did, with its
multitude of alternatives to os.system and os.popen.
Anyway, thanks. It's always nice to be able to map foreign terminology
like "SQL injection" to something you already know.
/Jorgen
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
|
|
0
|
|
|
|
Reply
|
Jorgen
|
6/27/2010 8:15:11 PM
|
|
On Sun, 2010-06-27, Lawrence D'Oliveiro wrote:
> In message <roy-854954.20435125062010@news.panix.com>, Roy Smith wrote:
>
>> I recently fixed a bug in some production code. The programmer was
>> careful to use snprintf() to avoid buffer overflows. The only problem
>> is, he wrote something along the lines of:
>>
>> snprintf(buf, strlen(foo), foo);
>
> A long while ago I came up with this macro:
>
> #define Descr(v) &v, sizeof v
>
> making the correct version of the above become
>
> snprintf(Descr(buf), foo);
This is off-topic, but I believe snprintf() in C can *never* safely be
the only thing you do to the buffer: you also have to NUL-terminate it
manually in some corner cases. See the documentation.
/Jorgen
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
|
|
0
|
|
|
|
Reply
|
Jorgen
|
6/27/2010 8:30:48 PM
|
|
On Jun 24, 6:02 pm, Roy Smith <r...@panix.com> wrote:
> In article <i00t2k$l0...@lust.ihug.co.nz>,
>  Lawrence D'Oliveiro <l...@geek-central.gen.new_zealand> wrote:
>
> > I construct ad-hoc queries all the time. It really isn’t that hard to do
> > safely. All you have to do is read the documentation
>
> I get worried when people talk about how easy it is to do something
> safely. Let me suggest a couple of things you might not have considered:
>
> 1) Somebody is running your application (or the database server) with
> the locale set to something unexpected. This might change how numbers,
> dates, currency, etc, get formatted, which could change the meaning of
> your constructed SQL statement.
>
> 2) Somebody runs your application with a different PYTHONPATH, which
> causes a different (i.e. malicious) urllib module to get loaded, which
> makes urllib.quote() do something you didn't expect.
Seriously, almost every other kind of library uses a binary API. What
makes databases so special that they need a string-command based API?
How about this instead (where this is a direct binary interface to the
library):
results = rdb_query(table = model,
                    columns = [model.name, model.number])
results = rdb_inner_join(tables = [records,tags],
                         joins = [(records.id,tags.record_id)],
                         columns = [record.name, tag.name])
Well, we know the real reason is that C, Java, and friends lack
expressiveness and so constructing a binary query is an ASCII
nightmare. Still, it hasn't stopped binary APIs in other kinds of
libraries.
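To make the calling convention concrete, here is a small sketch of what such a function could look like from the caller's side. Everything in it is hypothetical: the SCHEMA table, the where argument and this particular rdb_query are invented for illustration, and it still generates SQL text for sqlite3 underneath, so it shows the shape of a structured API rather than a true binary interface.

```python
import sqlite3

# Hypothetical schema description: table name -> allowed column names.
SCHEMA = {"model": ("name", "number")}

def rdb_query(conn, table, columns, where=None):
    """Run a SELECT described by data structures instead of a SQL string.

    Identifiers are checked against SCHEMA and values in `where` are
    bound as parameters, so neither can smuggle in extra SQL.
    """
    allowed = SCHEMA[table]  # KeyError for unknown tables
    where = where or {}
    for column in list(columns) + list(where):
        if column not in allowed:
            raise ValueError("unknown column: %r" % (column,))
    sql = "select %s from %s" % (", ".join(columns), table)
    if where:
        sql += " where " + " and ".join("%s = ?" % c for c in where)
    return conn.execute(sql, list(where.values())).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("create table model (name text, number integer)")
conn.executemany(
    "insert into model values (?, ?)", [("alpha", 1), ("O'Neil", 2)]
)
results = rdb_query(conn, "model", ["name", "number"], where={"name": "O'Neil"})
```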
Carl Banks
|
|
0
|
|
|
|
Reply
|
Carl
|
6/27/2010 10:07:28 PM
|
|
In article
<14e44c9c-04d9-452d-b544-498adfaf7d40@d8g2000yqf.googlegroups.com>,
Carl Banks <pavlovevidence@gmail.com> wrote:
> Seriously, almost every other kind of library uses a binary API. What
> makes databases so special that they need a string-command based API?
> How about this instead (where this a direct binary interface to the
> library):
>
> results = rdb_query(table = model,
> columns = [model.name, model.number])
>
> results = rdb_inner_join(tables = [records,tags],
> joins = [(records.id,tags.record_id)]),
> columns = [record.name, tag.name])
>
> Well, we know the real reason is that C, Java, and friends lack
> expressiveness and so constructing a binary query is an ASCII
> nightmare. Still, it hasn't stopped binary APIs in other kinds of
> libraries.
Well, the answer to that one is simple. SQL, in the hands of somebody
like me, can be used to express a few pathetic joins and what I do with
it could probably be handled with the kind of API you're describing.
But, the language has far more expressivity than that, and a
domain-specific language is really a good fit for what it can do.
The problem is not so much that SQL queries are described as text
strings, but that the distinction between program and data gets lost if
you build the query as one big string. What you need (and which the
Python API supplies) is the ability to clearly distinguish between "this
text is my program" and "this text is a value which my program uses".
Python has the same problem. If I had a text string, s, which I read
from some external source, and wanted to interpret that string as an
integer, I could do (at least) two different things.
# Thing 1
myInteger = int(s)
# Thing 2
myInteger = eval(s)
for properly formed input, either one works, but thing 2 loses the
distinction between program and data and is thus dangerous. Exactly
like building a SQL query by smashing a bunch of strings together.
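The SQL counterpart of "Thing 1" versus "Thing 2" can be sketched with the standard-library sqlite3 module. The hotels table and its rows are made up for the example, echoing the query from the article:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table hotels (name text, city text)")
conn.executemany(
    "insert into hotels values (?, ?)",
    [("Grand", "Paris"), ("Plaza", "Rome")],
)

untrusted = "nowhere' or '1'='1"

# Like eval(s): the value is pasted into the program text and changes
# what the program means.
pasted = conn.execute(
    "select name from hotels where city = '%s'" % untrusted
).fetchall()

# Like int(s): the value stays data and is only compared with the column.
bound = conn.execute(
    "select name from hotels where city = ?", (untrusted,)
).fetchall()
```

The pasted version matches every row, because the apostrophe in the input closes the literal and the rest is read as SQL; the bound version matches nothing, since no hotel is in a city with that odd name.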
|
|
0
|
|
|
|
Reply
|
roy (2043)
|
6/27/2010 10:20:02 PM
|
|
Carl Banks <pavlovevidence@gmail.com> writes:
> Seriously, almost every other kind of library uses a binary API.
Except for the huge number that deal with text protocols or languages.
> What makes databases so special that they need a string-command based
> API?
Because SQL is a text language.
--
\ “In the long run, the utility of all non-Free software |
`\ approaches zero. All non-Free software is a dead end.” —Mark |
_o__) Pilgrim, 2006 |
Ben Finney
|
|
0
|
|
|
|
Reply
|
Ben
|
6/27/2010 11:35:53 PM
|
|
On 2010-06-26 22:33:57 -0400, Lawrence D'Oliveiro said:
> In message <2010062522560231540-angrybaldguy@gmailcom>, Owen Jacobson wrote:
>
>> It's not hard. It's just begging for a visit from the fuckup fairy.
>
> That’s the same fallacious argument I pointed out earlier.
In the sense that "using correct manual escaping leads to SQL injection
vulnerabilities", yes, that's a fallacious argument on its own.
However, as sites like BUGTRAQ amply demonstrate, generating SQL
through string manipulation is a risky development practice[0]. You can
continue to justify your choice to do so however you want, and you may
even be the One True Developer capable of getting it absolutely right
under all circumstances, but I'd still reject patches that introduced a
SQLString-like function and ask that you resubmit them using the
database API's parameterization tools instead.
Assume for the sake of discussion that your SQLString function
perfectly captures the transformation required to turn an arbitrary str
into a MySQL string literal. How do you address the following issues?
1. Other (possibly inexperienced) developers reading your source, who
may not have the skills to implement the same transform correctly,
learn from your programs that writing your own query munger
is okay.
1a. Other (possibly inexperienced) developers decide to copy and paste
your function without fully understanding how it works, in tandem with
any of the other issues below. (If you think this is rare, I invite you
to visit stackoverflow or roseindia some time.)
2. MySQL changes the quoting and escaping rules to address a
bug/feature request/developer whim, introducing a new set of corner
cases into your function and forcing you to re-learn the escaping and
quoting rules. (For people using DB API parameters, this is a matter of
upgrading the DB adapter module to a version that supports the modified
rules.)
3. You decide to switch from MySQL to a more fully-featured RDBMS,
which may have different quoting and escaping rules around string
literals.
3a. *Someone else* decides to port your program to a different RDBMS,
and may not understand that SQLString implements MySQL's quoting and
escaping rules only.
4. MySQL AB finally get off their collective duffs and add real
parameter separation to the MySQL wire protocol, and implement real
prepared statements to massive speed gains in scenarios that are
relevant to your interests; string-based query construction gets left
out in the cold.
4a. As with case 3, except that instead of the rules changing when you
move to a new RDBMS, it's the relative performance of submitting new
queries versus reusing a parameterized query that changes.
On top of the obvious issue of completely avoiding quoting bugs, using
query parameters rather than escaping and string manipulation neatly
saves you from having to address any of these problems (and a multitude
of others) -- the DB API implementation will handle things for you, and
you are propagating good practice in an easy-to-understand form.
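One wrinkle with the porting scenarios in points 3 and 3a is that the placeholder syntax itself varies between drivers; PEP 249 has each module advertise it through a paramstyle attribute. A minimal sketch, assuming only the two positional styles and using sqlite3 (which reports "qmark") so that it runs without a database server; the items table and find_by_serial helper are invented for the example:

```python
import sqlite3

# PEP 249 placeholder syntax for the two positional paramstyles.
PLACEHOLDERS = {"qmark": "?", "format": "%s"}

def find_by_serial(module, conn, serial):
    """Look up an item by serial number using the driver's own placeholder."""
    mark = PLACEHOLDERS[module.paramstyle]
    cursor = conn.cursor()
    cursor.execute("select make from items where serial_nr = " + mark, (serial,))
    return cursor.fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("create table items (make text, serial_nr text)")
conn.execute("insert into items values (?, ?)", ("Acme", "SN'42"))
found = find_by_serial(sqlite3, conn, "SN'42")
```

Only the placeholder token depends on the driver; the quoting and escaping rules for the value never appear in application code.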
I am honestly at a loss trying to understand your position. There is a
huge body of documentation out there about the weaknesses of
string-manipulation-based approaches to query construction, and the use
of query parameters is so compellingly the Right Thing that I have a
very hard time comprehending why anyone would opt not to use them except
out of pure ignorance of their existence. Generating executable code --
including SQL -- from untrusted user input introduces a large
vulnerability surface for very little benefit.
You don't handle function parameters by building up python-language
strs containing the values as literals and eval'ing them, do you?
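Spelled out, the analogy looks something like the sketch below; greet is a throwaway function invented for the purpose.

```python
def greet(name):
    return "Hello, " + name

name = "O'Neil"

# Ordinary call: the value is passed as a value.
direct = greet(name)

# The eval analogue of hand-built SQL: paste the value into source text.
try:
    via_eval = eval("greet('%s')" % name)
except SyntaxError:
    via_eval = None  # the apostrophe ended the string literal early
```

Nobody writes Python calls the second way, and the first way needs no quoting routine at all.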
-o
[0] If you want to be *really* pedantic, string-manipulation-based
query construction is strongly correlated with the occurrence of SQL
injection vulnerabilities and bugs, which is in turn not strongly
correlated with very many other practices. Happy?
|
|
0
|
|
|
|
Reply
|
angrybaldguy (338)
|
6/28/2010 1:49:11 AM
|
|
On Jun 27, 4:35 pm, Ben Finney <ben+pyt...@benfinney.id.au> wrote:
> Carl Banks <pavlovevide...@gmail.com> writes:
> > Seriously, almost every other kind of library uses a binary API.
>
> Except for the huge number that deal with text protocols or languages.
No, not really. Almost all types of libraries have binary APIs,
including those that deal with text protocols or language. Any
control with string commands is something that's built on top of the
binary API. And culturally, programmers interfacing those libraries
expect to and are expected to use the binary API for low-level
programming.
RDBs, as a whole, either don't have binary APIs or they have them but
no one really uses them.
> > What makes databases so special that they need a string-command based
> > API?
>
> Because SQL is a text language.
Circular logic. I'm disappointed, usually when you sit on your
reinforced soapbox and pretense the air of infinite expertise you at
least use reasonable logic.
Also, I was asking about databases. "SQL is a text language" is not
the answer to the question "Why do RDBs use string commands instead of
binary APIs"?
Carl Banks
|
|
0
|
|
|
|
Reply
|
Carl
|
6/28/2010 2:35:05 AM
|
|
On Jun 27, 3:20 pm, Roy Smith <r...@panix.com> wrote:
> In article
> <14e44c9c-04d9-452d-b544-498adfaf7...@d8g2000yqf.googlegroups.com>,
>  Carl Banks <pavlovevide...@gmail.com> wrote:
>
>
>
> > Seriously, almost every other kind of library uses a binary API. What
> > makes databases so special that they need a string-command based API?
> > How about this instead (where this a direct binary interface to the
> > library):
>
> > results = rdb_query(table = model,
> >                     columns = [model.name, model.number])
>
> > results = rdb_inner_join(tables = [records,tags],
> >                          joins = [(records.id,tags.record_id)]),
> >                          columns = [record.name, tag.name])
>
> > Well, we know the real reason is that C, Java, and friends lack
> > expressiveness and so constructing a binary query is an ASCII
> > nightmare. Still, it hasn't stopped binary APIs in other kinds of
> > libraries.
>
> Well, the answer to that one is simple. SQL, in the hands of somebody
> like me, can be used to express a few pathetic joins and what I do with
> it could probably be handled with the kind of API you're describing.
> But, the language has far more expressivity than that, and a
> domain-specific language is really a good fit for what it can do.
I'm not the biggest expert on SQL ever, but the only thing I can think
of is expressions. Statements don't express anything very complex,
and could straightforwardly be represented by function calls. But
it's a fair point.
> The problem is not so much that SQL queries are described as text
> strings,
No, it is the problem, or part of it. String commands are inherently
prone to injection attacks, that's the main problem with them.
> but that the distinction between program and data gets lost if
> you build the query as one big string.
That too.
Carl Banks
|
|
0
|
|
|
|
Reply
|
pavlovevidence (1338)
|
6/28/2010 2:51:59 AM
|
|
On 2010-06-27 22:51:59 -0400, Carl Banks said:
> On Jun 27, 3:20 pm, Roy Smith <r...@panix.com> wrote:
>> In article
>> <14e44c9c-04d9-452d-b544-498adfaf7...@d8g2000yqf.googlegroups.com>,
>>  Carl Banks <pavlovevide...@gmail.com> wrote:
>>
>>
>>
>>> Seriously, almost every other kind of library uses a binary API. What
>>> makes databases so special that they need a string-command based API?
>>> How about this instead (where this a direct binary interface to the
>>> library):
>>
>>> results = rdb_query(table = model,
>>>                     columns = [model.name, model.number])
>>
>>> results = rdb_inner_join(tables = [records,tags],
>>>                          joins = [(records.id,tags.record_id)]),
>>>                          columns = [record.name, tag.name])
>>
>>> Well, we know the real reason is that C, Java, and friends lack
>>> expressiveness and so constructing a binary query is an ASCII
>>> nightmare. Still, it hasn't stopped binary APIs in other kinds of
>>> libraries.
>>
>> Well, the answer to that one is simple. SQL, in the hands of somebody
>> like me, can be used to express a few pathetic joins and what I do with
>> it could probably be handled with the kind of API you're describing.
>> But, the language has far more expressivity than that, and a
>> domain-specific language is really a good fit for what it can do.
>
> I'm not the biggest expert on SQL ever, but the only thing I can think
> of is expressions. Statements don't express anything very complex,
> and could straightforwardly be represented by function calls. But
> it's a fair point.
Off the top of my head, I can think of a few things that would be
tricky to turn into an API:
* Aggregation (GROUP BY, aggregate functions over arbitrary
expressions, HAVING clauses).
* CASE expressions.
* Subqueries.
* Recursive queries (in DBMSes that support them).
* Window clauses (likewise).
* Set operations between queries (UNION, DIFFERENCE, INTERSECT).
* A surprisingly rich set of JOIN clauses beyond the obvious inner
natural joins.
* Various DBMS-specific locking hints.
* Computed inserts and updates.
* Updates and deletes that include joins.
* RETURNING lists on modification queries.
* Explicit (DBMS-side) cursors.
This is by no means an exhaustive list.
Of course, it's possible to represent all of this via an API rather
than a language, and libraries like SQLAlchemy make a reasonable
attempt at doing just that. However, not every programming language has
the kind of structural flexibility to do that well: a library similar
to SQLalchemy would be incredibly clunky (if it worked at all) in, say,
Java or C#, and it'd be nearly impossible to pull off in C. Even LDAP,
which is defined more in terms of APIs than languages, forgoes trying
to define a predicate API and uses a domain-specific filtering language
instead.
There's certainly a useful subset of SQL that could be trivially
replaced with an API. Simple by-the-numbers CRUD queries don't exercise
much of SQL's power. In fact, we can do that already: any ORM can
handle that level just fine.
-o
|
|
0
|
|
|
|
Reply
|
Owen
|
6/28/2010 3:18:25 AM
|
|
Carl Banks <pavlovevidence@gmail.com> writes:
> On Jun 27, 4:35 pm, Ben Finney <ben+pyt...@benfinney.id.au> wrote:
> > Carl Banks <pavlovevide...@gmail.com> writes:
> > > Seriously, almost every other kind of library uses a binary API.
> >
> > Except for the huge number that deal with text protocols or languages.
>
> No, not really. Almost all types of libraries have binary APIs,
> including those that deal with text protocols or language. Any
> control with string commands is something that's built on top of the
> binary API.
I don't know what you mean by this.
Are you referring to the operating system's function call API? It's
trivially true that the OS function call API is “binary”, but that
doesn't seem useful for distinguishing; by that definition, SQL isn't a
“library API” at all. So I assumed you didn't mean that.
Rather, I was taking you to mean the network API used for communicating
with the server; and it's in that context that I'm saying there are a
huge number of text-based network APIs.
If that's not what you mean either, then I need you to explain.
> I'm disappointed, usually when you sit on your reinforced soapbox and
> pretense the air of infinite expertise you at least use reasonable
> logic.
Kindly stop inventing straw men to attack; I deny the position you're
painting for me.
> Also, I was asking about databases. "SQL is a text language" is not
> the answer to the question "Why do RDBs use string commands instead of
> binary APIs"?
To that question, I'd say that SQL isn't a library API, but rather a
network API and a command API, and is thus well implemented with textual
commands.
--
\ “[W]e are still the first generation of users, and for all that |
`\ we may have invented the net, we still don't really get it.” |
_o__) —Douglas Adams |
Ben Finney
Ben | 6/28/2010 3:33:23 AM
On Jun 27, 8:19 pm, Owen Jacobson <angrybald...@gmail.com> wrote:
> On 2010-06-27 22:51:59 -0400, Carl Banks said:
> > On Jun 27, 3:20 pm, Roy Smith <r...@panix.com> wrote:
> >> In article
> >> <14e44c9c-04d9-452d-b544-498adfaf7...@d8g2000yqf.googlegroups.com>,
> >> Carl Banks <pavlovevide...@gmail.com> wrote:
>
> >>> Seriously, almost every other kind of library uses a binary API. What
> >>> makes databases so special that they need a string-command based API?
> >>> How about this instead (where this a direct binary interface to the
> >>> library):
>
> >>> results = rdb_query(table = model,
> >>> columns = [model.name, model.number])
>
> >>> results = rdb_inner_join(tables = [records,tags],
> >>> joins = [(records.id,tags.record_id)]),
> >>> columns = [record.name, tag.name])
>
> >>> Well, we know the real reason is that C, Java, and friends lack
> >>> expressiveness and so constructing a binary query is an ASCII
> >>> nightmare. Still, it hasn't stopped binary APIs in other kinds of
> >>> libraries.
>
> >> Well, the answer to that one is simple. SQL, in the hands of somebody
> >> like me, can be used to express a few pathetic joins and what I do with
> >> it could probably be handled with the kind of API you're describing.
> >> But, the language has far more expressivity than that, and a
> >> domain-specific language is really a good fit for what it can do.
>
> > I'm not the biggest expert on SQL ever, but the only thing I can think
> > of is expressions.  Statements don't express anything very complex,
> > and could straightforwardly be represented by function calls.  But
> > it's a fair point.
>
> Off the top of my head, I can think of a few things that would be
> tricky to turn into an API:
>
> * Aggregation (GROUP BY, aggregate functions over arbitrary
> expressions, HAVING clauses).
> * CASE expressions.
> * Subqueries.
> * Recursive queries (in DBMSes that support them).
> * Window clauses (likewise).
> * Set operations between queries (UNION, DIFFERENCE, INTERSECT).
> * A surprisingly rich set of JOIN clauses beyond the obvious inner
> natural joins.
> * Various DBMS-specific locking hints.
> * Computed inserts and updates.
> * Updates and deletes that include joins.
> * RETURNING lists on modification queries.
> * Explicit (DBMS-side) cursors.
>
> This is by no means an exhaustive list.
I don't know the exact details of all of these, but I'm going to opine
that at least some of these are easily expressible with a function
call API. Perhaps more naturally than with string queries. For
instance, set operations:
query1 = rdb_query(...)
query2 = rdb_query(...)
final_query = rdb_union(query1,query2)
or
final_query = query1 & query2
I'm not sure why GROUP BY couldn't be expressed by a keyword
argument. The complexity of aggregate functions and computed inserts
comes mainly from expressions (which Roy Smith already mentioned), the
actual statements are simple.
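A minimal sketch of the kind of function-call API being discussed, assuming hypothetical names throughout: `Query`, `rdb_query` and `rdb_union` are not from any real library, the `items` table is borrowed from the first post in the thread, and `|` is used for union here because that is Python's own set-union operator. The objects only describe a query; the SQL text and the values are kept apart until something executes them.

```python
# Hypothetical query-object API; nothing here is a real library.
class Query:
    """Opaque description of a SELECT: SQL text plus separately held values."""

    def __init__(self, sql, params=()):
        self.sql = sql
        self.params = tuple(params)

    def __or__(self, other):
        # Set union of two queries; the values never enter the SQL text.
        return Query("%s UNION %s" % (self.sql, other.sql),
                     self.params + other.params)


def rdb_query(table, columns, where=None, group_by=None):
    """Build a Query. table, columns and group_by must be trusted
    identifiers; where maps a column name to a value passed as a parameter."""
    sql = "SELECT %s FROM %s" % (", ".join(columns), table)
    params = []
    if where:
        sql += " WHERE " + " AND ".join("%s = ?" % col for col in where)
        params.extend(where.values())
    if group_by:
        sql += " GROUP BY " + ", ".join(group_by)
    return Query(sql, params)


def rdb_union(*queries):
    """Fold any number of queries into one UNION query."""
    result = queries[0]
    for query in queries[1:]:
        result = result | query
    return result


query1 = rdb_query("items", ["make"], where={"location_name": "lab"})
query2 = rdb_query("items", ["make"], where={"location_name": "o'store"})
final_query = rdb_union(query1, query2)
grouped = rdb_query("items", ["make", "count(*)"], group_by=["make"])
```

Even this toy shows where the difficulty moves to: GROUP BY fits in a keyword argument, but arbitrary expressions in the column list are still plain strings.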
> Of course, it's possible to represent all of this via an API rather
> than a language, and libraries like SQLAlchemy make a reasonable
> attempt at doing just that. However, not every programming language has
> the kind of structural flexibility to do that well: a library similar
> to SQLalchemy would be incredibly clunky (if it worked at all) in, say,
> Java or C#, and it'd be nearly impossible to pull off in C.
Yeah, which was kind of my original theory.
Carl Banks
pavlovevidence (1338) | 6/28/2010 3:48:07 AM
On 6/27/10 7:51 PM, Carl Banks wrote:
> I'm not the biggest expert on SQL ever, but the only thing I can think
> of is expressions. Statements don't express anything very complex,
> and could straightforwardly be represented by function calls.
See, there's really two kinds of SQL out there.
There's the layman's SQL which is pretty straight-forward. Sure, it can
start looking a little complicated if you get multiple clauses in the
WHERE line (and maybe you're ambitious and do a simple inner join), but
it's probably still not bad. That can get translated into an API pretty
easily.
Then there's the type of SQL that results in DBAs having jobs-- and
deservedly so. It's *really* a very flexible and powerful language
capable of doing quite a lot to bend, flex, twist, and interleave that
data in the server while building up a result set for you.
I'm honestly only really in the former camp with a toe into the latter
(I use aggregation and windowing functions over some interesting joins
on occasion, but it takes effort). So I can't give a lot of serious
examples to *prove* I'm right.
So I just have to say: based on my experience and admittedly limited
imagination, converting the full expressive power of SQL into a regular
sort of API would be a very, very, very hairy sort of mess. SQLAlchemy
can do the layman's SQL, and can *kind of* do a *little bit* of the
advanced stuff-- but usually, it does the advanced stuff by just making
it very easy for you to shove it out of the way and do SQL directly.
But still: that's the structured part of SQL which belongs in a string.
The data does not. It should be obvious that when a database provides
you a mechanism to pass data in such that it doesn't need sanitization*
at all, that's preferable to actually doing sanitization, even if you're
divinely capable of perfect sanitization and even if sanitization is a
trivial task that a monkey should be able to handle.
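A small illustration of passing the data outside the statement text. This is only a sketch: it uses Python's standard `sqlite3` module as a stand-in (the thread is about MySQL), the `items` table and its columns are borrowed from the first post, and `?` is sqlite3's placeholder style, whereas MySQLdb uses `%s`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (inventory_nr TEXT, comment TEXT)")

hostile = "x'); DROP TABLE items; --"

# The statement text is fixed; the values travel separately as parameters,
# so no quoting or escaping step exists to get wrong.
conn.execute("INSERT INTO items (inventory_nr, comment) VALUES (?, ?)",
             ("A-1", hostile))

stored = conn.execute("SELECT comment FROM items WHERE inventory_nr = ?",
                      ("A-1",)).fetchone()[0]
row_count = conn.execute("SELECT count(*) FROM items").fetchone()[0]
```

The hostile string comes back byte for byte, and the table is still there with its one row.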
--
... Stephen Hansen
... Also: Ixokai
... Mail: me+list/python (AT) ixokai (DOT) io
... Blog: http://meh.ixokai.io/
P.S. *My computer /swears/ sanitization is spelled wrong. Either I'm
high or it's high. Stupid old school mac mini.
Stephen | 6/28/2010 3:52:01 AM
On 6/27/10 8:48 PM, Carl Banks wrote:
> I don't know the exact details of all of these, but I'm going to opine
> that at least some of these are easily expressible with a function
> call API. Perhaps more naturally than with string queries. For
> instance, set operations:
>
> query1 = rdb_query(...)
> query2 = rdb_query(...)
>
> final_query = rdb_union(query1,query2)
>
> or
>
> final_query = query1 & query2
But, see, that's not actually what's going on behind the scenes in the
database. Unless your "query1" and "query2" objects are opaque
pseudo-objects which do not actually represent results -- the query
planners do a *lot* of stuff by looking at the whole query and computing
just how to go about executing all of the instructions.
The engine of a SQL database is a pretty sophisticated little piece of
coding. Because SQL is declarative, the engine is able to optimize just
how to do everything when it looks at the full query, and even try out a
few different ideas at first before deciding on just which path to take.
(This is an area where parametrized queries are even more important: but
I'm not sure if MySQL does proper prepared queries and caching of
execution plans).
If you go and API it, then you're actually imposing an order on how it
processes the query... unless your API is just a sort of opaque wrapper
for some underlying declarative structure. (Like ORMs try to be)
--
... Stephen Hansen
... Also: Ixokai
... Mail: me+list/python (AT) ixokai (DOT) io
... Blog: http://meh.ixokai.io/
Stephen | 6/28/2010 4:02:57 AM
On Jun 27, 8:52 pm, Stephen Hansen <me+list/pyt...@ixokai.io> wrote:
> Then there's the type of SQL that results in DBA's having jobs-- and
> deservedly so. Its *really* a very flexible and powerful language
> capable of doing quite a lot to bend, flex, twist, and interleave that
> data in the server while building up a result set for you.
All right, I get it.
I'm not talking about SQL, I'm talking about RDBs. But I guess it is
important for serious RDBs to support queries complex enough that a
language like SQL is really needed to express them--even if being called
from an expressive language like Python. Not everything is a simple
inner join. I defer to the community then, as my knowledge of
advanced SQL is minimal.
We'll just have to accept the risk of injection attacks as a trade-off,
and try to educate people to use placeholders when writing SQL.
Carl Banks
Carl | 6/28/2010 4:12:30 AM
On 2010-06-28 00:02:57 -0400, Stephen Hansen said:
> On 6/27/10 8:48 PM, Carl Banks wrote:
>> I don't know the exact details of all of these, but I'm going to opine
>> that at least some of these are easily expressible with a function
>> call API. Perhaps more naturally than with string queries. For
>> instance, set operations:
>>
>> query1 = rdb_query(...)
>> query2 = rdb_query(...)
>>
>> final_query = rdb_union(query1,query2)
>>
>> or
>>
>> final_query = query1 & query2
>
> But, see, that's not actually what's going on behind the scenes in the
> database. Unless your "query1" and "query2" objects are opaque
> pseudo-objects which do not actually represent results -- the query
> planners do a *lot* of stuff by looking at the whole query and
> computing just how to go about executing all of the instructions.
I believe that *is* his point: that we can replace the SQL language
with a "query object model" that lets us specify what we want without
resorting to string-whacking when our needs are dynamic, without
changing the rest of the workflow. This is obviously true: each RDBMS
does something very much like what Carl is proposing, internally.
However, implementing such an API usefully (never mind comfortably) in
a cross-language way is... difficult, and an RDBMS that can only be
used from Python (or even from Python and other Smalltalk-like
languages) is not terribly useful at all.
-o
Owen | 6/28/2010 4:25:59 AM
On Jun 27, 8:33 pm, Ben Finney <ben+pyt...@benfinney.id.au> wrote:
> Carl Banks <pavlovevide...@gmail.com> writes:
> > I'm disappointed, usually when you sit on your reinforced soapbox and
> > pretense the air of infinite expertise you at least use reasonable
> > logic.
>
> Kindly stop inventing straw men to attack; I deny the position you're
> painting for me.
No, this is not a straw man, you are 100 percent guilty of circular
logic as I accused you of.
Plus, I will not kindly do anything for you unless you kindly stop
being condescending and self-righteous when answering questions and
start treating people with respect. It's not just me, you do it to
newbies who have reasonable questions and you end up making them feel
like assholes just for asking. You don't just act that way to newbies
that deserve it. You are part of the reason people are here accusing
the Python community of being unfriendly and unhelpful.
And that's not a strawman, either.
Carl Banks
Carl | 6/28/2010 4:28:33 AM
On Jun 27, 9:02 pm, Stephen Hansen <me+list/pyt...@ixokai.io> wrote:
> On 6/27/10 8:48 PM, Carl Banks wrote:
>
> > I don't know the exact details of all of these, but I'm going to opine
> > that at least some of these are easily expressible with a function
> > call API.  Perhaps more naturally than with string queries.  For
> > instance, set operations:
>
> > query1 = rdb_query(...)
> > query2 = rdb_query(...)
>
> > final_query = rdb_union(query1,query2)
>
> > or
>
> > final_query = query1 & query2
>
> But, see, that's not actually what's going on behind the scenes in the
> database. Unless your "query1" and "query2" objects are opaque
> pseudo-objects which do not actually represent results
That's exactly what they are. Nothing is actually sent to the
database until the user starts retrieving results. This is a fairly
common thing for some interfaces to do.
For instance, OpenGL almost always returns immediately after a command
is posted without doing anything. The driver will queue the command
in memory until some event happens to trigger it (maybe a signal from
the graphics hardware that it is done processing commands, or the queue
being full, or an explicit flush request from the user).
Incidentally, OpenGL has its own DSL for per-vertex and per-pixel
operations (known as vertex and fragment shaders) that replaces an
older binary API. I daresay it's a little less at risk for an
injection attack, seeing that the shaders run on the GPU and only run
simple math operations. But you never know.
Carl Banks
Carl | 6/28/2010 4:43:14 AM
On Sun, Jun 27, 2010 at 5:16 PM, Lawrence D'Oliveiro
<ldo@geek-central.gen.new_zealand> wrote:
> In message <mailman.2184.1277626565.32709.python-list@python.org>, Kushal
> Kumaran wrote:
>
>> On Sun, Jun 27, 2010 at 9:47 AM, Lawrence D'Oliveiro
>> <ldo@geek-central.gen.new_zealand> wrote:
>>
>>> In message <roy-854954.20435125062010@news.panix.com>, Roy Smith wrote:
>>>
>>>> I recently fixed a bug in some production code.  The programmer was
>>>> careful to use snprintf() to avoid buffer overflows.  The only problem
>>>> is, he wrote something along the lines of:
>>>>
>>>> snprintf(buf, strlen(foo), foo);
>>>
>>> A long while ago I came up with this macro:
>>>
>>> #define Descr(v) &v, sizeof v
>>>
>>> making the correct version of the above become
>>>
>>> snprintf(Descr(buf), foo);
>>
>> Not quite right.  If buf is a char array, as suggested by the use of
>> sizeof, then you're not passing a char* to snprintf.
>
> What am I passing, then?
Here's what gcc tells me (I declared buf as char buf[512]):
sprintf.c:8: warning: passing argument 1 of ‘snprintf’ from
incompatible pointer type
/usr/include/stdio.h:363: note: expected ‘char * __restrict__’ but
argument is of type ‘char (*)[512]’
You just need to lose the & from the macro.
--
regards,
kushal
python2058 (92) | 6/28/2010 4:47:56 AM
Carl Banks <pavlovevidence@gmail.com> writes:
> On Jun 27, 8:33 pm, Ben Finney <ben+pyt...@benfinney.id.au> wrote:
> > Carl Banks <pavlovevide...@gmail.com> writes:
> > > I'm disappointed, usually when you sit on your reinforced soapbox and
> > > pretense the air of infinite expertise you at least use reasonable
> > > logic.
> >
> > Kindly stop inventing straw men to attack; I deny the position you're
> > painting for me.
>
> No, this is not a straw man, you are 100% percent guilty of circular
> logic as I accused you of.
The straw man you attacked is as I quoted above.
The claim of circular logic is a separate point, and I addressed it in
the rest of the message. Like you, I stripped the part of the message
that I was not responding to specifically.
> Plus, I will not kindly do anything for you unless you kindly stop
> being condescending and self-righteous when answering questions and
> start treating people with respect.
I always endeavour to treat people with respect, and I leave it to the
independent reader to decide how successful I am in that endeavour.
Respect for a person, though, entails subjecting that person's
statements to criticism where appropriate. Don't mistake exposure of
flaws for self-righteousness, nor criticism for condescension.
This isn't a forum for discussing my style, so I'll limit this message
to merely addressing these slurs.
--
\ “The long-term solution to mountains of waste is not more |
`\ landfill sites but fewer shopping centres.” —Clive Hamilton, |
_o__) _Affluenza_, 2005 |
Ben Finney
Ben | 6/28/2010 4:53:27 AM
On Mon, Jun 28, 2010 at 2:00 AM, Jorgen Grahn <grahn+nntp@snipabacken.se> wrote:
> On Sun, 2010-06-27, Lawrence D'Oliveiro wrote:
>> In message <roy-854954.20435125062010@news.panix.com>, Roy Smith wrote:
>>
>>> I recently fixed a bug in some production code.  The programmer was
>>> careful to use snprintf() to avoid buffer overflows.  The only problem
>>> is, he wrote something along the lines of:
>>>
>>> snprintf(buf, strlen(foo), foo);
>>
>> A long while ago I came up with this macro:
>>
>>     #define Descr(v) &v, sizeof v
>>
>> making the correct version of the above become
>>
>>     snprintf(Descr(buf), foo);
>
> This is off-topic, but I believe snprintf() in C can *never* safely be
> the only thing you do to the buffer: you also have to NUL-terminate it
> manually in some corner cases. See the documentation.
>
snprintf goes to great lengths to be safe, in fact. You might be
thinking of strncpy.
--
regards,
kushal
python2058 (92) | 6/28/2010 4:54:23 AM
On Jun 27, 9:54 pm, Kushal Kumaran <kushal.kumaran+pyt...@gmail.com>
wrote:
> On Mon, Jun 28, 2010 at 2:00 AM, Jorgen Grahn <grahn+n...@snipabacken.se> wrote:
> > On Sun, 2010-06-27, Lawrence D'Oliveiro wrote:
> >> In message <roy-854954.20435125062...@news.panix.com>, Roy Smith wrote:
>
> >>> I recently fixed a bug in some production code.  The programmer was
> >>> careful to use snprintf() to avoid buffer overflows.  The only problem
> >>> is, he wrote something along the lines of:
>
> >>> snprintf(buf, strlen(foo), foo);
>
> >> A long while ago I came up with this macro:
>
> >>     #define Descr(v) &v, sizeof v
>
> >> making the correct version of the above become
>
> >>     snprintf(Descr(buf), foo);
>
> > This is off-topic, but I believe snprintf() in C can *never* safely be
> > the only thing you do to the buffer: you also have to NUL-terminate it
> > manually in some corner cases. See the documentation.
>
> snprintf goes to great lengths to be safe, in fact.  You might be
> thinking of strncpy.
Indeed, strncpy does not copy that final NUL if it's at or beyond the
nth element. Probably the most mind-bogglingly stupid thing about the
standard C library, which has lots of mind-boggling stupidity.
Whenever I do an audit of someone's C code the first thing I do is
search for strncpy and see if they set the nth character to 0. (They
usually didn't.)
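For anyone who has not been bitten by this, here is a toy model in Python of what strncpy writes into the destination. It is only a model of the documented behaviour, not a call into the C library; `strncpy_model` and `terminated` are made-up names, and `src` is assumed to hold the string contents without its terminator and without embedded NULs.

```python
def strncpy_model(src: bytes, n: int) -> bytes:
    """Return the n bytes that C's strncpy(dst, src, n) would write to dst.

    src is the source string's contents, excluding its terminating NUL.
    """
    copied = src[:n]
    # A shorter source is padded with NULs up to n bytes; a source of
    # length >= n fills the buffer completely and leaves no terminator.
    return copied + b"\x00" * (n - len(copied))


def terminated(buf: bytes) -> bool:
    """True if the buffer contains a NUL somewhere, i.e. is a valid C string."""
    return b"\x00" in buf


short_copy = strncpy_model(b"hi", 5)           # padded with NULs
exact_copy = strncpy_model(b"hello", 5)        # fills the buffer, no NUL
long_copy = strncpy_model(b"hello world", 5)   # truncated, no NUL
```

The last two cases are the ones an audit is looking for: the caller has to write the final NUL itself.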
Carl Banks
Carl | 6/28/2010 5:07:10 AM
On Mon, 2010-06-28, Kushal Kumaran wrote:
> On Mon, Jun 28, 2010 at 2:00 AM, Jorgen Grahn <grahn+nntp@snipabacken.se> wrote:
>> On Sun, 2010-06-27, Lawrence D'Oliveiro wrote:
>>> In message <roy-854954.20435125062010@news.panix.com>, Roy Smith wrote:
>>>
>>>> I recently fixed a bug in some production code.  The programmer was
>>>> careful to use snprintf() to avoid buffer overflows.  The only problem
>>>> is, he wrote something along the lines of:
>>>>
>>>> snprintf(buf, strlen(foo), foo);
>>>
>>> A long while ago I came up with this macro:
>>>
>>>     #define Descr(v) &v, sizeof v
>>>
>>> making the correct version of the above become
>>>
>>>     snprintf(Descr(buf), foo);
>>
>> This is off-topic, but I believe snprintf() in C can *never* safely be
>> the only thing you do to the buffer: you also have to NUL-terminate it
>> manually in some corner cases. See the documentation.
>
> snprintf goes to great lengths to be safe, in fact. You might be
> thinking of strncpy.
Yes, it was indeed strncpy I was thinking of. Thanks.
But actually, the snprintf(3) man page I have is not 100% clear on
this issue, so last time I used it, I added a manual NUL-termination
plus a comment saying I wasn't sure it was needed. I normally use C++
or Python, so I am a bit rusty on these things.
/Jorgen
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
Jorgen | 6/28/2010 7:58:34 AM
Carl Banks wrote:
> Indeed, strncpy does not copy that final NUL if it's at or beyond the
> nth element. Probably the most mind-bogglingly stupid thing about the
> standard C library, which has lots of mind-boggling stupidity.
I don't think it was as stupid as that back when C was
designed. Every byte of memory was precious in those days,
and if you had, say, 10 bytes allocated for a string, you
wanted to be able to use all 10 of them for useful data.
So the convention was that a NUL byte was used to mark
the end of the string *if it didn't fill all the available
space*. Functions such as strncpy and snprintf are designed
for use with strings that follow this convention. Proper
usage requires being cognizant of the maximum length and
using appropriate length-limited functions for all operations
on such strings.
--
Greg
Gregory | 6/28/2010 9:44:34 AM
Gregory Ewing <greg.ewing@canterbury.ac.nz> writes:
> I don't think it was as stupid as that back when C was
> designed. Every byte of memory was precious in those days,
> and if you had, say, 10 bytes allocated for a string, you
> wanted to be able to use all 10 of them for useful data.
No I don't think so. Traditional C strings simply didn't carry length
info except for the nul byte at the end. Most string functions expected
the nul to be there. The nul byte convention (instead of having a
header word with a length) arguably saved some space both by eliminating
a multi-byte header and by allowing trailing substrings to be
represented as pointers into a larger string. In retrospect it seems
like a big error.
Paul | 6/28/2010 9:48:34 AM
On Sun, 27 Jun 2010 21:02:57 -0700, Stephen Hansen
<me+list/python@ixokai.io> declaimed the following in
gmane.comp.python.general:
> (This is an area where parametrized queries is even more important: but
> I'm not sure if MySQL does proper prepared queries and caching of
> execution plans).
MySQL version 5 finally added prepared statements and a discrete
parameter passing mechanism...
However, since there likely are many MySQL v4.x installations out
there, which only work with complete string SQL, MySQLdb still formats
full SQL statements (and it uses the Python % string interpolation to do
that, after converting/escaping parameters -- which is why %s is the
only allowed placeholder; even a numeric parameter has been converted to
a quoted string before being inserted in the SQL).
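A toy sketch of that client-side scheme, to show why only %s can work. This is not MySQLdb's actual code: `toy_literal` and `toy_format` are made-up names, the escaping is deliberately simplistic and is not MySQL's real rule set, and whether numbers end up quoted is a detail of the real driver that this sketch does not try to reproduce. The only point is that every parameter is already a Python string by the time it is spliced in.

```python
def toy_literal(value):
    """Turn a Python value into SQL literal text (toy rules, illustration only)."""
    if isinstance(value, (int, float)):
        return str(value)
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return "'" + text + "'"


def toy_format(query, params):
    """Convert every parameter to literal text, then splice with the % operator."""
    return query % tuple(toy_literal(p) for p in params)


sql = toy_format(
    "SELECT * FROM items WHERE inventory_nr = %s AND qty > %s",
    ("A'1", 5),
)

# A %d placeholder fails: the number has become a string before interpolation.
try:
    toy_format("SELECT * FROM items WHERE qty > %d", (5,))
    percent_d_failed = False
except TypeError:
    percent_d_failed = True
```

Here `sql` ends up as one complete statement string, which is all that a server speaking the old protocol can accept.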
It would be nice if MySQLdb could become version aware in a future
release, and use prepared statements on v5 engines... I doubt it can
drop the existing string based queries any time soon... Consider the
arguments about how long Python 2.x will be in use (I'm still on 2.5)...
Imagine the sluggishness in having database engines converted
(especially in a shared provider environment, where the language
specific adapters also need updating -- ODBC drivers, etc.)
--
Wulfraed Dennis Lee Bieber AF6VN
wlfraed@ix.netcom.com HTTP://wlfraed.home.netcom.com/
Dennis | 6/28/2010 10:07:29 AM
On Sun, 27 Jun 2010 21:12:30 -0700 (PDT), Carl Banks
<pavlovevidence@gmail.com> declaimed the following in
gmane.comp.python.general:
> I'm not talking about SQL, I'm talking about RDBs. But I guess it is
> important for serious RDBs to support queries complex enough that a
> language like SQL is really needed to express it--even if being called
> from an expressive language like Python. Not everything is a simple
> inner joins. I defer to the community then, as my knowledge of
> advanced SQL is minimal.
>
SQL is almost a hybrid of relational algebra and relational
calculus, though typically considered more in the latter category. (The
simplistic distinction between the two is that in RA one specifies
/how to/ obtain a result, whereas in RC one specifies /what/ the result
should look like and lets the engine figure out how to generate it.)
"Select field, field, ..., field from ..." is algebra "project"
operation... In RA you'd have to specify the steps...
x1 = join(t1, t2)
x2 = restrict(x1, t1.fld1 = t2.fld3)
result = project(x2, field, ..., field)
SQL:
select field, ..., field from t1, t2 where t1.fld1 = t2.fld3
(implicit join, just as the algebra is a full cross product)
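The algebra steps above can be sketched in runnable form, treating a table as a list of dicts. The function names follow the pseudocode, with the final step written as a projection; the sample rows and the `name` and `price` fields are invented for illustration. A strict relational projection would also drop duplicate rows, which this toy does not bother with.

```python
from itertools import product


def join(t1, t2):
    """Full cross product: every row of t1 paired with every row of t2."""
    return [{**a, **b} for a, b in product(t1, t2)]


def restrict(rows, predicate):
    """Keep only the rows for which the predicate holds."""
    return [row for row in rows if predicate(row)]


def project(rows, *fields):
    """Keep only the named fields of each row."""
    return [{f: row[f] for f in fields} for row in rows]


t1 = [{"fld1": 1, "name": "widget"}, {"fld1": 2, "name": "gadget"}]
t2 = [{"fld3": 1, "price": 10}, {"fld3": 3, "price": 30}]

x1 = join(t1, t2)                                          # 4 combined rows
x2 = restrict(x1, lambda row: row["fld1"] == row["fld3"])  # 1 matching row
result = project(x2, "name", "price")
```

Each intermediate table is actually built here, in the order written. A SQL engine given the single declarative statement is free to pick a different order.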
The classical example of RC is IBM's QBE (query by example) -- which
drew single record tables on the screen, and one filled in a result
table with references to fields in the sources, and included (somehow)
the join criteria...
Somewhere in storage I should have a 400 page text on relational
database theory, which covers relational algebra and calculus, but
predates SQL.
--
Wulfraed Dennis Lee Bieber AF6VN
wlfraed@ix.netcom.com HTTP://wlfraed.home.netcom.com/
Dennis | 6/28/2010 10:07:29 AM
On Sun, 27 Jun 2010 21:49:11 -0400, Owen Jacobson
<angrybaldguy@gmail.com> declaimed the following in
gmane.comp.python.general:
> 4. MySQL AB finally get off their collective duffs and adds real
> parameter separation to the MySQL wire protocol, and implements real
> prepared statements to massive speed gains in scenarios that are
They did with version 5 of MySQL... Also added triggers and stored
procedures as I recall (though possibly limited functionality). But
MySQLdb is still compatible with versions 3.x and 4.x (with some
difficulties in the connection string password handling). It is MySQLdb
that is not version aware and uses the old established "complete SQL
string" query system.
Does MySQL AB still exist? I thought Sun absorbed MySQL, and Oracle
has absorbed Sun...
--
Wulfraed Dennis Lee Bieber AF6VN
wlfraed@ix.netcom.com HTTP://wlfraed.home.netcom.com/
wlfraed (4456) | 6/28/2010 10:07:29 AM
On Sun, 27 Jun 2010 19:35:05 -0700 (PDT), Carl Banks
<pavlovevidence@gmail.com> declaimed the following in
gmane.comp.python.general:
>
> Also, I was asking about databases. "SQL is a text language" is not
> the answer to the question "Why do RDBs use string commands instead of
> binary APIs"?
>
Try this: Why do RDBMSs use SQL?
Prior to SQL (and relational databases) becoming common, one had to
learn an interface that was specific to each database engine (and had
quite different look&feel if the underlying engine was hierarchical or
DBTG network [relational was mostly a theoretical view for manipulating
databases stored under hierarchical or network engines]). If one was
lucky, there was even an interactive query language processor.
Coding for something like a DBTG network database did not allow for
easy changes in queries... What would be a simple join in SQL was
traversing a circular linked list in the DBTG database my college
taught. EG: loop get next "master" record; loop get next sub-record
[etc. until all needed data retrieved] until back to master; until back
to top of database.
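A rough sketch of the contrast, with invented data. The nested dict structure is only a stand-in for owner and member records (a real DBTG database used linked record chains), and Python's standard `sqlite3` module plays the relational side; the table and column names are made up.

```python
import sqlite3

# Toy stand-in for a network database: each owner ("master") record
# carries its own chain of member records.
departments = [
    {"name": "R&D", "members": ["Ann", "Bob"]},
    {"name": "Sales", "members": ["Cy"]},
]


def navigate(masters):
    """Navigational access: the program itself walks every link."""
    rows = []
    for master in masters:                 # loop: get next "master" record
        for member in master["members"]:   # loop: get next sub-record
            rows.append((master["name"], member))
    return rows


# The same data as two flat tables, related only by comparing field values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE department (name TEXT)")
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT)")
conn.executemany("INSERT INTO department VALUES (?)",
                 [(d["name"],) for d in departments])
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [(m, d["name"]) for d in departments for m in d["members"]])

joined = conn.execute(
    "SELECT d.name, e.name FROM department d "
    "JOIN employee e ON e.dept = d.name "
    "ORDER BY d.name, e.name").fetchall()

walked = navigate(departments)
```

Changing the question (say, employees first, then their departments) means rewriting the loops in the first version and editing one string in the second.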
SQL started as an interactive query language, meant to be typed by
(knowledgeable) users at a command prompt. But since it melded with
relational databases so well it became a de facto standard query
language not only for interactive queries but as a common semi-portable
API for embedding into code -- no DBMS specific procedural function
library needed, just one interface to send a query, and one to retrieve
result records.
(Ever notice the cyclic history -- 50s lots of mixed flat files, 60s
hierarchical databases [in which some master record type has links to
related records -- but the data is stored in a tree so finding data fast
really needed careful database design to avoid having to traverse too
much of the tree; imagine needing to read department information records
to access personnel records to access promotion/pay-raise records to
find the current pay rate to produce the weekly paycheck for an
employee; and if the employee changes department you have to move all
their personnel data [promotion history, etc] from one link to another;
or duplicate the personnel record saving an "end date" in the first
department record and an effective start date on the new department
copy], 70s with network [easier to traverse as each record type could
link to any other record type -- doing payroll did not require reading
department records], 80s relational wherein nothing is linked via
pointers but only by logical comparisons of fields [and so easily
implemented as sets of flat files again <G>])
--
Wulfraed Dennis Lee Bieber AF6VN
wlfraed@ix.netcom.com HTTP://wlfraed.home.netcom.com/
wlfraed (4456) | 6/28/2010 10:07:29 AM
In message <mailman.2231.1277700501.32709.python-list@python.org>, Kushal
Kumaran wrote:
> On Sun, Jun 27, 2010 at 5:16 PM, Lawrence D'Oliveiro
> <ldo@geek-central.gen.new_zealand> wrote:
>
>>In message <mailman.2184.1277626565.32709.python-list@python.org>, Kushal
>> Kumaran wrote:
>>
>>> On Sun, Jun 27, 2010 at 9:47 AM, Lawrence D'Oliveiro
>>> <ldo@geek-central.gen.new_zealand> wrote:
>>>
>>>> A long while ago I came up with this macro:
>>>>
>>>> #define Descr(v) &v, sizeof v
>>>>
>>>> making the correct version of the above become
>>>>
>>>> snprintf(Descr(buf), foo);
>>>
>>> Not quite right. If buf is a char array, as suggested by the use of
>>> sizeof, then you're not passing a char* to snprintf.
>>
>> What am I passing, then?
>
> Here's what gcc tells me (I declared buf as char buf[512]):
> sprintf.c:8: warning: passing argument 1 of ‘snprintf’ from
> incompatible pointer type
> /usr/include/stdio.h:363: note: expected ‘char * __restrict__’ but
> argument is of type ‘char (*)[512]’
>
> You just need to lose the & from the macro.
Why does this work, then:
ldo@theon:hack> cat test.c
#include <stdio.h>
int main(int argc, char ** argv)
{
char buf[512];
const int a = 2, b = 3;
snprintf(&buf, sizeof buf, "%d + %d = %d\n", a, b, a + b);
fprintf(stdout, buf);
return
0;
} /*main*/
ldo@theon:hack> ./test
2 + 3 = 5
Lawrence | 6/29/2010 12:26:43 AM
In message
<14e44c9c-04d9-452d-b544-498adfaf7d40@d8g2000yqf.googlegroups.com>, Carl
Banks wrote:
> Seriously, almost every other kind of library uses a binary API. What
> makes databases so special that they need a string-command based API?
HTML is also effectively a string-based API. And what about regular
expressions? And all the functionality available through the subprocess
module and its predecessors?
The reality is, embedding one language within another is a fact of life. I
think it’s important for programmers to be able to deal correctly with it.
Lawrence | 6/29/2010 12:30:36 AM
In message <pan.2010.06.27.13.55.04.500000@nowhere.com>, Nobody wrote:
> On Sun, 27 Jun 2010 14:36:10 +1200, Lawrence D'Oliveiro wrote:
>
>> Except nobody has yet shown an alternative which is easier to get right.
>
> For SQL, use stored procedures or prepared statements.
So feel free to rewrite my example using either stored procedures or
prepared statements, to prove how much easier it is.
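For reference, one possible shape of such a rewrite, as a sketch under stated assumptions rather than a drop-in replacement. It uses the standard `sqlite3` module and its `?` placeholders instead of MySQL, shortens the field list, and takes a plain dict called `form` in place of the CGI `Params` object from the original example. The column names still go into the statement text, which is acceptable only because they are fixed in the source code and never come from user input.

```python
import sqlite3
import time

# (column name, form field name) pairs, as in the original example.
FIELDS = (
    ("make", "modify_make"),
    ("model", "modify_model"),
    ("comment", "modify_comment"),
)


def update_item(conn, form, modify_id):
    """Update one row; form is a plain dict standing in for CGI parameters."""
    assignments = ", ".join("%s = ?" % column for column, _ in FIELDS)
    values = [form[field] for _, field in FIELDS]
    conn.execute(
        "UPDATE items SET " + assignments
        + ", last_modified = ? WHERE inventory_nr = ?",
        values + [int(time.time()), modify_id],
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (inventory_nr TEXT, make TEXT, model TEXT,"
             " comment TEXT, last_modified INTEGER)")
conn.execute("INSERT INTO items (inventory_nr) VALUES (?)", ("A-1",))

update_item(conn,
            {"modify_make": "Acme", "modify_model": "X'1",
             "modify_comment": 'say "hi"'},
            "A-1")
row = conn.execute("SELECT make, model, comment FROM items"
                   " WHERE inventory_nr = ?", ("A-1",)).fetchone()
```

The table-driven structure of the original survives; the difference is that the values are handed over as a list instead of being turned into literals first.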
Lawrence | 6/29/2010 12:32:19 AM
In article <7xmxuffpxp.fsf@ruckus.brouhaha.com>,
Paul Rubin <no.email@nospam.invalid> wrote:
> Gregory Ewing <greg.ewing@canterbury.ac.nz> writes:
> > I don't think it was as stupid as that back when C was
> > designed. Every byte of memory was precious in those days,
> > and if you had, say, 10 bytes allocated for a string, you
> > wanted to be able to use all 10 of them for useful data.
>
> No I don't think so. Traditional C strings simply didn't carry length
> info except for the nul byte at the end. Most string functions expected
> the nul to be there. The nul byte convention (instead of having a
> header word with a length) arguably saved some space both by eliminating
> a multi-byte header and by allowing trailing substrings to be
> represented as pointers into a larger string. In retrospect it seems
> like a big error.
Null-terminated strings predate C. Various assembler languages had
ASCIIZ (or similar) directives long before that.
The nice thing about null-terminated strings is how portable they have
been over various word lengths. Life would have been truly inconvenient
if K&R had picked, say, a 16-bit length field, and then we needed to
bump that up to 32 bits in the 80's, and again to 64 bits in the 90's.
| Roy | 6/29/2010 12:55:53 AM |
On Mon, 28 Jun 2010 20:55:53 -0400, Roy Smith wrote:
> The nice thing about null-terminated strings is how portable they have
> been over various word lengths. Life would have been truly inconvenient
> if K&R had picked, say, a 16-bit length field, and then we needed to
> bump that up to 32 bits in the 80's, and again to 64 bits in the 90's.
Or a Pascal 8 bit length field.
However the cost of null-terminated strings is that they can't store
binary data, and worse, they're slow. In fact, according to some, null-
terminated strings are the *worst* way to implement a string type.
http://www.joelonsoftware.com/articles/fog0000000319.html
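To make the binary-data point concrete, here is a small Python sketch using ctypes and struct. It is only an illustration: the 16-bit little-endian length prefix is an assumption chosen to echo the field size mentioned above, not a format that C or Pascal prescribes.

```python
import ctypes
import struct

data = b"ab\x00cd"  # five bytes with an embedded NUL

# C-style view: reading the buffer as a string stops at the first NUL.
buf = ctypes.create_string_buffer(data)  # copies data and appends a NUL
c_view = buf.value       # what strlen-style code would see
whole_buffer = buf.raw   # all six bytes are still in memory

# Length-prefixed view: a 16-bit count followed by the payload.
prefixed = struct.pack("<H", len(data)) + data
(length,) = struct.unpack_from("<H", prefixed)
payload = prefixed[2:2 + length]
```

The prefixed form keeps the embedded NUL and knows its length without scanning, but the 16-bit field caps the payload at 65535 bytes, which is the portability cost raised earlier in the thread.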
--
Steven
| Steven | 6/29/2010 2:07:50 AM |
On Mon, 28 Jun 2010 03:07:29 -0700, Dennis Lee Bieber wrote:
> Coding for something like a DBTG network database did not allow for
> easy changes in queries... What would be a simple join in SQL was
> traversing a circular linked list in the DBTG database my college
> taught. EG: loop get next "master" record; loop get next sub-record
> [etc. until all needed data retrieved] until back to master; until back
> to top of database.
We'll also note that most of these you'd have to map out where each
field in a record was by hand, any time you wanted to open the file.
Often several times, because there would be multiple record layouts per
file.
--
67. No matter how many shorts we have in the system, my guards will be
instructed to treat every surveillance camera malfunction as a
full-scale emergency.
--Peter Anspach's list of things to do as an Evil Overlord
| Peter | 6/29/2010 3:25:02 AM |
On Tue, Jun 29, 2010 at 5:56 AM, Lawrence D'Oliveiro
<ldo@geek-central.gen.new_zealand> wrote:
> In message <mailman.2231.1277700501.32709.python-list@python.org>, Kushal
> Kumaran wrote:
>
>> On Sun, Jun 27, 2010 at 5:16 PM, Lawrence D'Oliveiro
>> <ldo@geek-central.gen.new_zealand> wrote:
>>
>>> In message <mailman.2184.1277626565.32709.python-list@python.org>, Kushal
>>> Kumaran wrote:
>>>
>>>> On Sun, Jun 27, 2010 at 9:47 AM, Lawrence D'Oliveiro
>>>> <ldo@geek-central.gen.new_zealand> wrote:
>>>>
>>>>> A long while ago I came up with this macro:
>>>>>
>>>>> #define Descr(v) &v, sizeof v
>>>>>
>>>>> making the correct version of the above become
>>>>>
>>>>> snprintf(Descr(buf), foo);
>>>>
>>>> Not quite right.  If buf is a char array, as suggested by the use of
>>>> sizeof, then you're not passing a char* to snprintf.
>>>
>>> What am I passing, then?
>>
>> Here's what gcc tells me (I declared buf as char buf[512]):
>> sprintf.c:8: warning: passing argument 1 of ‘snprintf’ from
>> incompatible pointer type
>> /usr/include/stdio.h:363: note: expected ‘char * __restrict__’ but
>> argument is of type ‘char (*)[512]’
>>
>> You just need to lose the & from the macro.
>
> Why does this work, then:
>
> ldo@theon:hack> cat test.c
> #include <stdio.h>
>
> int main(int argc, char ** argv)
>  {
>     char buf[512];
>     const int a = 2, b = 3;
>     snprintf(&buf, sizeof buf, "%d + %d = %d\n", a, b, a + b);
>     fprintf(stdout, buf);
>     return
>         0;
>  } /*main*/
> ldo@theon:hack> ./test
> 2 + 3 = 5
>
By accident. I hope your compiler warned you about your snprintf call.
Reading these threads might help you understand how char* and char
(*)[512] are different:
http://groups.google.com/group/comp.lang.c++/browse_thread/thread/24708a9204061ce/848ceaf5ec774d81
http://groups.google.com/group/comp.lang.c.moderated/browse_thread/thread/fe264c550947a2e5/32b330cdf8aba3d6
--
regards,
kushal
| python2058 (92) | 6/29/2010 4:19:13 AM |
On Tue, 29 Jun 2010 03:25:02 GMT, "Peter H. Coffin"
<hellsop@ninehells.com> declaimed the following in
gmane.comp.python.general:
> We'll also note that most of these you'd have to map out where each
> field in a record was by hand, any time you wanted to open the file.
> Often several times, because there would be multiple record layouts per
> file.
Ah yes -- you did have to know the entire record structure
beforehand to create the "image" for processing...
And the database engine on the Xerox Sigma running CP/V really got
nasty -- you had to preallocate the expected disk space for the entire
database ahead of time. The engine used what CP/V* called a "random" file
-- you asked for a CONTIGUOUS chunk of disk space, and the OS maintained
NO information about the contents (not even an equivalent of EOF).
* CP/V had some interesting features: file types of "consecutive",
"keyed", and "random". As mentioned, "random" also implied contiguous
disk space allocation; "consecutive" and "keyed" could be disjoint disk
sectors. "Consecutive" is closest to the UNIX "stream"; start from the
beginning and just read... "Keyed" were ISAM files -- and were the most
common file type! The line editor (mid-70s here, editor was line
oriented) used "line numbers" as ISAM keys, so even source code was
being stored in an ISAM file (and the FORTRAN direct access I/O "record
number", as in
read(unit, rec=#) buffer
was not the more common record_length * (#-1) offset; it was an ISAM
key!)
--
Wulfraed Dennis Lee Bieber AF6VN
wlfraed@ix.netcom.com HTTP://wlfraed.home.netcom.com/
| Dennis | 6/29/2010 7:14:32 AM |
On Tue, 29 Jun 2010 12:30:36 +1200, Lawrence D'Oliveiro wrote:
>> Seriously, almost every other kind of library uses a binary API. What
>> makes databases so special that they need a string-command based API?
>
> HTML is also effectively a string-based API.
HTML is a data format. The sane way to construct or manipulate HTML is via
the DOM, not string operations.
> And what about regular expressions?
What about them? As the saying goes:
Some people, when confronted with a problem, think
"I know, I'll use regular expressions."
Now they have two problems.
They have some uses, e.g. defining tokens[1]. Using them to match more
complex constructs is error-prone and should generally be avoided unless
you're going to manually verify the result. Oh, and you should never
generate regexps dynamically; that way madness lies.
[1] Assuming that the language's tokens can be described by a regular
grammar. This isn't always the case, e.g. you can't tokenise PostScript
using regexps, as string literals can contain nested parentheses.
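A sketch of why the PostScript case in the footnote needs a counter rather than a regular expression: the scanner below tracks parenthesis depth by hand. The function name scan_ps_string is made up for this illustration, and real PostScript strings have more escape rules than the single backslash case handled here.

```python
def scan_ps_string(text, start=0):
    """Return the index just past the string literal that opens at text[start]."""
    if text[start] != "(":
        raise ValueError("no string literal at this position")
    depth = 0
    i = start
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i + 1  # balanced: the literal ends here
        i += 1
    raise ValueError("unterminated string literal")


source = "(a (nested) b) show"
literal = source[:scan_ps_string(source)]
```

The depth counter is exactly the piece of state a regular grammar cannot express, which is the limitation the footnote describes.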
> And all the functionality available through the subprocess
> module and its predecessors?
The main reason why everyone recommends subprocess over its predecessors
is that it allows you to bypass the shell, which is one of the most
common sources of the type of error being discussed in this thread.
IOW, rather than having to construct a shell command which (hopefully)
will pass the desired arguments to the child, you just pass the desired
arguments to the child directly, without involving the shell.
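A minimal sketch of that point, assuming a current Python 3 (subprocess.run with capture_output did not exist when this thread was written). The helper name echo_argument is hypothetical, and the child process is just the Python interpreter printing its first argument.

```python
import subprocess
import sys


def echo_argument(untrusted):
    """Run a child process that prints its first argument and return that output."""
    # The command is a list and shell=False is the default, so no shell ever
    # parses `untrusted`; it reaches the child as a single argv entry.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", untrusted],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")


nasty = "it's; echo $(id) `ls` > /dev/null"
echoed = echo_argument(nasty)
```

Nothing in the string needs quoting or escaping, because there is no intermediate command language to get wrong.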
> The reality is, embedding one language within another is a fact of life. I
> think it’s important for programmers to be able to deal correctly with it.
That depends upon what you mean by "embedding". The correct way to use
code written in one language from code written in another is to make the
first accept parameters and make the second pass them, not to have the
second (try to) generate the former dynamically.
Sometimes dynamic code generation is inevitable (e.g. if you're writing a
compiler, you probably need to generate assembler or C code), but it's not
to be done lightly, and it's unwise to take shortcuts (e.g. ad-hoc string
substitutions).
| Nobody | 6/29/2010 9:35:43 AM |
Owen Jacobson <angrybaldguy@gmail.com> wrote:
> However, not every programming language has
> the kind of structural flexibility to do that well: a library similar
> to SQLalchemy would be incredibly clunky (if it worked at all) in, say,
> Java or C#, and it'd be nearly impossible to pull off in C.
I guess you've never used LINQ in C# then?
Microsoft did a pretty impressive job with LINQ: they provided a set of
methods that may be used to query SQL databases and the same methods
also work on any other sequence-like types. They also produced a DSL
that compiles into the LINQ method calls which means that those who
prefer SQL syntax can use it to process non-SQL data.
A LINQ expression produces a generator that allows you to iterate over
the result set (and you can re-use the generator so that if it depends
on the values of other variables or attributes each time you iterate you
get a different set of results).
When you use LINQ on a SQL database internally it generates the correct
SQL to produce the result set on the SQL server, when you use it on an
array or other such sequence it uses generic functions compiled for the
appropriate data types. In order to be able to do this they changed the
language to allow expressions to compile either to executable code or to
a parse tree. For example:
var participants =
Competition.GetParticipants()
.Where(participant=> participant.Score > 80)
.OrderByDescending(participant => participant.Score)
.Select(participant => new { participant.Id,
Name=participant.Name });
If this is operating on a database table the Where method is overloaded
to accept a parse tree as its argument and it can then use that to
generate SQL, but for .Net objects the Where method simply uses the
lambda expression as a callable delegate.
(example cribbed from
http://geekswithblogs.net/shahed/archive/2008/01/28/118992.aspx)
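For comparison, the same "one expression, two uses" idea can be roughed out in Python with operator overloading. This is only a sketch of the concept, not how LINQ or SQLAlchemy is actually implemented; the Column and GreaterThan classes are invented for the illustration.

```python
class Column:
    """A named column; comparing it builds an expression object, not a bool."""

    def __init__(self, name):
        self.name = name

    def __gt__(self, value):
        return GreaterThan(self, value)


class GreaterThan:
    def __init__(self, column, value):
        self.column = column
        self.value = value

    def to_sql(self):
        # Render for a database: SQL text plus a separate parameter list.
        return "%s > ?" % self.column.name, [self.value]

    def matches(self, row):
        # Evaluate directly against an in-memory row (a dict here).
        return row[self.column.name] > self.value


condition = Column("score") > 80
rows = [{"name": "ann", "score": 91}, {"name": "bob", "score": 72}]
in_memory = [row["name"] for row in rows if condition.matches(row)]
sql_text, sql_params = condition.to_sql()
```

The comparison is written once; whether it runs locally or is turned into SQL is decided by whoever consumes the expression object.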
--
Duncan Booth http://kupuguy.blogspot.com
| Duncan | 6/29/2010 10:32:49 AM |
Nobody <nobody@nowhere.com> wrote:
> > And what about regular expressions?
>
> What about them? As the saying goes:
>
> Some people, when confronted with a problem, think
> "I know, I'll use regular expressions."
> Now they have two problems.
That's silly. RE is a good tool. Like all good tools, it is the right
tool for some jobs and the wrong tool for others.
I've noticed over the years a significant anti-RE sentiment in the
Python community. One reason, I suppose, is because Python gives you
some good string manipulation tools, i.e. split(), startswith(),
endswith(), and the 'in' operator, which cover many of the common RE use
cases. But there are still plenty of times when a RE is the best tool
and it's worth investing the effort to learn how to use them effectively.
One tool that Python gives you which makes RE a pleasure is raw strings.
Getting rid of all those extra backslashes really helps improve
readability.
Another great feature is VERBOSE. I've written some truly complicated
REs using that, and still been able to figure out what they meant the
next day :-)
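As a small illustration of those two features together, here is a raw-string pattern compiled with re.VERBOSE. The date format is just an example picked for this sketch (it happens to match the timestamps on these posts).

```python
import re

DATE_RE = re.compile(
    r"""
    (?P<month>\d{1,2}) /   # month, one or two digits
    (?P<day>\d{1,2})   /   # day of the month
    (?P<year>\d{4})        # four-digit year
    """,
    re.VERBOSE,
)

match = DATE_RE.match("6/29/2010")
parts = match.groupdict()
```

In VERBOSE mode the whitespace and the comments inside the pattern are ignored, and the raw string means each \d is written with a single backslash.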
| Roy | 6/29/2010 12:41:03 PM |
On 6/29/10 5:41 AM, Roy Smith wrote:
> Nobody<nobody@nowhere.com> wrote:
>
>>> And what about regular expressions?
>>
>> What about them? As the saying goes:
>>
>> Some people, when confronted with a problem, think
>> "I know, I'll use regular expressions."
>> Now they have two problems.
>
> That's silly. RE is a good tool. Like all good tools, it is the right
> tool for some jobs and the wrong tool for others.
There's nothing silly about it.
It is an exaggeration though: but it does represent a good thing to keep
in mind.
Yes, re is a tool -- and a useful one at that. But it's also a tool which
/seems/ like an omnitool capable of tackling everything.
Regular expressions are a complicated mini-language well suited towards
extensive use in a unix type environment where you want to embed certain
logic of 'what to operate on' into many different commands that aren't
languages at all -- and perl embraced it to make it perl's answer to
text problems. Which is fine.
In Python, certainly it has its uses: many of them in fact, and in many
it really is the best solution.
It's not just that it's the right tool for some jobs and the wrong tool
for others, or -- as you also said -- that Python provides a rather
rich string type which can do many common tasks natively and better, but
that regular expressions live in the front of the mind for so many
people coming to the language that it's the first thing they even think
of, and what should be simple becomes difficult.
So people quote that proverb. It's a good proverb. Like all proverbs, it's
not perfectly applicable to all situations. But it does have an important
lesson to it: you should generally not consider re to be the solution
you're looking for until you are quite sure there's nothing else to
solve the same task.
It obviously applies less to the gurus who know all about regular
expressions and their subtleties including potential pathological behavior.
--
... Stephen Hansen
... Also: Ixokai
... Mail: me+list/python (AT) ixokai (DOT) io
... Blog: http://meh.ixokai.io/
| Stephen | 6/29/2010 2:11:48 PM |
On 29/06/2010 01:55, Roy Smith wrote:
[snips]
> The nice thing about null-terminated strings is how portable they have
> been over various word lengths.
The bad thing about null-terminated strings is the number of off-by-one
errors they've helped to create. I obviously have never created an
off-by-one error myself. :)
Kindest regards.
Mark Lawrence.
| Mark | 6/29/2010 2:31:30 PM |
In message <mailman.2332.1277785175.32709.python-list@python.org>, Kushal
Kumaran wrote:
> On Tue, Jun 29, 2010 at 5:56 AM, Lawrence D'Oliveiro
> <ldo@geek-central.gen.new_zealand> wrote:
>
>> Why does this work, then:
>>
>> ldo@theon:hack> cat test.c
>> #include <stdio.h>
>>
>> int main(int argc, char ** argv)
>> {
>> char buf[512];
>> const int a = 2, b = 3;
>> snprintf(&buf, sizeof buf, "%d + %d = %d\n", a, b, a + b);
>> fprintf(stdout, buf);
>> return
>> 0;
>> } /*main*/
>> ldo@theon:hack> ./test
>> 2 + 3 = 5
>
> By accident.
I have yet to find an architecture or C compiler where it DOESN’T work.
Feel free to try and prove me wrong.
| Lawrence | 6/30/2010 12:25:11 AM |
In message <slrni2f8v2.j19.grahn+nntp@frailea.sa.invalid>, Jorgen Grahn
wrote:
> On Sat, 2010-06-26, Lawrence D'Oliveiro wrote:
>
>> In message <slrni297ec.1m5.grahn+nntp@frailea.sa.invalid>, Jorgen Grahn
>> wrote:
>>
>>> I thought it was well-known that the solution is *not* to try to
>>> sanitize the input -- it's to switch to an interface which doesn't
>>> involve generating an intermediate executable. In the Python example,
>>> that would be something like os.popen2(['zcat', '-f', '--', untrusted]).
>>
>> That’s what I mean. Why do people consider input sanitization so hard?
>
> I'm not sure you understood me correctly, because I advocate
> *not* doing input sanitization. Hard or not -- I don't want to know,
> because I don't want to do it.
But no-one has yet managed to come up with an alternative that involves less
work.
| Lawrence | 6/30/2010 12:26:11 AM |
On Jun 28, 3:07 am, Dennis Lee Bieber <wlfr...@ix.netcom.com> wrote:
> On Sun, 27 Jun 2010 21:02:57 -0700, Stephen Hansen
> <me+list/pyt...@ixokai.io> declaimed the following in
> gmane.comp.python.general:
>
> > (This is an area where parametrized queries is even more important: but
> > I'm not sure if MySQL does proper prepared queries and caching of
> > execution plans).
>
>         MySQL version 5 finally added prepared statements and a discrete
> parameter passing mechanism...
>
>         However, since there likely are many MySQL v4.x installations out
> there, which only work with complete string SQL, MySQLdb still formats
> full SQL statements (and it uses the Python % string interpolation to do
> that, after converting/escaping parameters -- which is why %s is the
> only allowed placeholder; even a numeric parameter has been converted to
> a quoted string before being inserted in the SQL).
>
>         It would be nice if MySQLdb could become version aware in a future
> release, and use prepared statements on v5 engines... I doubt it can
> drop the existing string based queries any time soon... Consider the
> arguments about how long Python 2.x will be in use (I'm still on 2.5)...
> Imagine the sluggishness in having database engines converted
> (especially in a shared provider environment, where the language
> specific adapters also need updating -- ODBC drivers, etc.)
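The mechanism described in the quoted text can be pictured with a toy model. This is emphatically not MySQLdb's code, and the escaping below is deliberately incomplete (it handles only backslash and single quote); quote_literal and interpolate are names invented for the sketch.

```python
def quote_literal(value):
    """Toy escaping: turn any Python value into a single-quoted SQL literal."""
    text = str(value)
    text = text.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + text + "'"


def interpolate(sql, params):
    # After conversion every parameter is already a str, so %s is the only
    # conversion that makes sense in the template.
    return sql % tuple(quote_literal(p) for p in params)


query = interpolate(
    "select * from items where id = %s and owner = %s",
    (42, "O'Brien"),
)
```

The driver still builds one complete SQL string on the client side; the placeholder syntax only controls where the already-escaped text is spliced in.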
Thanks, your replies to this subthread have been most enlightening.
Carl Banks
| pavlovevidence (1338) | 6/30/2010 3:24:36 AM |
On 06/29/2010 06:25 PM, Lawrence D'Oliveiro wrote:
> I have yet to find an architecture or C compiler where it DOESN’T work.
>
> Feel free to try and prove me wrong.
Okay, I will. Your code passes a char (*)[512] when a char* is expected. Every
compiler I know of will give you a *warning*. Mistaking char*, char**,
and char[] is a common mistake that almost every C programmer makes in the
beginning. Now for the proof:
Consider this variation where I use a dynamically allocated buffer
instead of static:
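#include <stdio.h>
#include <stdlib.h>  /* for malloc() and free() */
int main(int argc, char ** argv)
{
    char *buf = malloc(512 * sizeof(char));
    const int a = 2, b = 3;
    /* deliberately wrong: &buf is the address of the pointer, not of the buffer */
    snprintf(&buf, sizeof buf, "%d + %d = %d\n", a, b, a + b);
    fprintf(stdout, buf);
    free(buf);
    return 0;
} /*main*/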
On my machine, an immediate segfault (stack overrun). Your code only
works because your buf is declared as an array, so &buf and buf refer to
the same address. But this equivalence does not hold for any other situation. If your
buffer was dynamically allocated on the heap, instead of passing a
pointer to the buffer (which *is* what buf itself is), you are passing a
pointer to the pointer, which is where buf is stored on the stack, but
not the buffer itself. Instant stack corruption.
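The address relationships can be checked without a C compiler by modelling them in Python's ctypes. This is an analogy rather than the C program itself: the array stands in for `char buf[512]` and the ctypes pointer object stands in for a `char *buf` variable.

```python
import ctypes

# An array object: its own address is the address of its first element,
# which is why &buf happened to work for `char buf[512]`.
array_buf = (ctypes.c_char * 512)()
array_address = ctypes.addressof(array_buf)
first_element_address = ctypes.cast(array_buf, ctypes.c_void_p).value

# A pointer variable: it is stored somewhere else and merely holds that
# address, like `char *buf = malloc(512)`.
pointer_buf = ctypes.cast(array_buf, ctypes.POINTER(ctypes.c_char))
pointed_to_address = ctypes.cast(pointer_buf, ctypes.c_void_p).value
pointer_own_address = ctypes.addressof(pointer_buf)
```

Writing through pointer_own_address would overwrite the pointer's storage rather than the 512-byte buffer, which is the corruption described above.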
| Michael | 6/30/2010 4:05:17 AM |
On 06/29/2010 06:26 PM, Lawrence D'Oliveiro wrote:
>> I'm not sure you understood me correctly, because I advocate
>> *not* doing input sanitization. Hard or not -- I don't want to know,
>> because I don't want to do it.
>
> But no-one has yet managed to come up with an alternative that involves less
> work.
Your case is still not persuasive.
How is using the DB API's placeholders and parameterization more work?
It's the same amount of keystrokes, perhaps even less. You would just
be substituting the API's parameter placeholders for Python's. In fact
with Psycopg2 and the mysql python db apis, it's almost a matter of
simply removing the "%" and putting in a comma, turning python's string
substitution into a method call. And you can leave out the quotes
around where the variables go. If I have to sanitize every input, I
have to do it on each and every field on each and every form action.
With the DB API doing the work I just do it once, in one place. Is this
not easier than manually escaping everything and then embedding it in
the query string?
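As a rough sketch of what that looks like for the table-update example from the first post, here it is with the standard library's sqlite3 module, which uses "?" placeholders (MySQLdb and Psycopg2 use "%s" instead). The FIELDS tuple is shortened, `form` is a hypothetical dict standing in for Params.getvalue, and the urllib.quote step from the original is left out.

```python
import sqlite3
import time

# Column name -> form-field prefix, echoing part of the table in the first post.
FIELDS = (
    ("make", "modify_make"),
    ("model", "modify_model"),
    ("comment", "modify_comment"),
)


def update_item(conn, form, modify_id):
    """Update one row of `items` from submitted form values."""
    # Column names come only from the fixed FIELDS tuple, never from user input.
    assignments = ", ".join("%s = ?" % column for column, _ in FIELDS)
    values = [form["%s[%s]" % (prefix, modify_id)] for _, prefix in FIELDS]
    sql = (
        "update items set %s, last_modified = ? where inventory_nr = ?"
        % assignments
    )
    # All data values travel separately from the SQL text.
    conn.execute(sql, values + [int(time.time()), modify_id])


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table items (inventory_nr text, make text, model text,"
    " comment text, last_modified integer)"
)
conn.execute("insert into items values ('A1', 'x', 'y', 'z', 0)")
form = {
    "modify_make[A1]": "O'Reilly",
    "modify_model[A1]": 'M"1',
    "modify_comment[A1]": "'; drop table items; --",
}
update_item(conn, form, "A1")
row = conn.execute("select make, model, comment from items").fetchone()
```

The field list is still data-driven, so tracking changes in the database structure stays a one-line edit; only the value handling moves into the driver.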
I've not used sqlalchemy, but it looks similarly easy.
| Michael | 6/30/2010 4:11:16 AM |
On 06/29/2010 10:05 PM, Michael Torrie wrote:
> #include <stdio.h>
>
> int main(int argc, char ** argv)
> {
> char *buf = malloc(512 * sizeof(char));
> const int a = 2, b = 3;
> snprintf(&buf, sizeof buf, "%d + %d = %d\n", a, b, a + b);
^^^^^^^^^^
Make that 512*sizeof(buf)
Still segfaults though.
> fprintf(stdout, buf);
> free(buf);
> return 0;
> } /*main*/
| Michael | 6/30/2010 4:17:17 AM |
On 06/29/2010 10:17 PM, Michael Torrie wrote:
> On 06/29/2010 10:05 PM, Michael Torrie wrote:
>> #include <stdio.h>
>>
>> int main(int argc, char ** argv)
>> {
>> char *buf = malloc(512 * sizeof(char));
>> const int a = 2, b = 3;
>> snprintf(&buf, sizeof buf, "%d + %d = %d\n", a, b, a + b);
> ^^^^^^^^^^
> Make that 512*sizeof(buf)
Sigh. Try again. How about "512 * sizeof(char)" ? Still doesn't make
a difference. The code still crashes because the &buf is incorrect.
Another reason python programming is just so much funner and easier!
This little diversion is fun though. C is pretty powerful and I enjoy
it, but it sure keeps one on one's toes. I made a similar mistake to
the &buf thing years ago when I thought I could return strings (char *)
from functions on the stack the way Pascal and BASIC could. It was only
by pure luck that my code worked as the part of the stack being accessed
was invalid and could have been overwritten.
>> fprintf(stdout, buf);
>> free(buf);
>> return 0;
>> } /*main*/
| Michael | 6/30/2010 4:28:25 AM |
On Jun 28, 2:44 am, Gregory Ewing <greg.ew...@canterbury.ac.nz> wrote:
> Carl Banks wrote:
> > Indeed, strncpy does not copy that final NUL if it's at or beyond the
> > nth element.  Probably the most mind-bogglingly stupid thing about the
> > standard C library, which has lots of mind-boggling stupidity.
>
> I don't think it was as stupid as that back when C was
> designed. Every byte of memory was precious in those days,
> and if you had, say, 10 bytes allocated for a string, you
> wanted to be able to use all 10 of them for useful data.
>
> So the convention was that a NUL byte was used to mark
> the end of the string *if it didn't fill all the available
> space*.
I can't think of any function in the standard library that observes
that convention, which inclines me to disbelieve this convention ever
really existed. If it did, there would be functions to support it.
For that matter, I'm not really inclined to believe bytes were *that*
precious in those days.
> Functions such as strncpy and snprintf are designed
> for use with strings that follow this convention. Proper
> usage requires being cognizant of the maximum length and
> using appropriate length-limited functions for all operations
> on such strings.
Well, no. Being cognizant of the string's maximum length doesn't make
you able to pass it to printf, or system, or any other C function.
The obvious rationale behind strncpy's stupid behavior is that it's
not a string function at all, but a memory block function, that stops
at a NUL in case you don't care what's after the NUL in a block. But
it leads you to believe it's a string function by its name.
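The behaviour under discussion can be modelled in a few lines of Python on a bytearray. This is a model of the documented C semantics (copy at most n bytes, pad with NULs if the source is shorter), not a call into the C library.

```python
def strncpy(dest, src, n):
    """Model C strncpy on a bytearray `dest` that holds at least n bytes."""
    end = src.find(b"\0")
    if end == -1:
        end = len(src)
    chunk = src[:min(end, n)]
    # Pads with NULs up to n, but adds no terminator when the source
    # fills all n bytes.
    dest[:n] = chunk + b"\0" * (n - len(chunk))
    return dest


full = strncpy(bytearray(b"XXXXXXXX"), b"hello world", 5)
short = strncpy(bytearray(b"XXXXXXXX"), b"hi", 5)
```

In `full` the five copied bytes run straight into whatever followed them, with no NUL in between; in `short` the remainder of the five-byte field is zero-filled, which fits the reading of strncpy as a fixed-width field filler.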
Carl Banks
| Carl | 6/30/2010 4:49:20 AM |
On Wed, 2010-06-30, Michael Torrie wrote:
> On 06/29/2010 10:17 PM, Michael Torrie wrote:
>> On 06/29/2010 10:05 PM, Michael Torrie wrote:
>>> #include <stdio.h>
>>>
>>> int main(int argc, char ** argv)
>>> {
>>> char *buf = malloc(512 * sizeof(char));
>>> const int a = 2, b = 3;
>>> snprintf(&buf, sizeof buf, "%d + %d = %d\n", a, b, a + b);
>> ^^^^^^^^^^
>> Make that 512*sizeof(buf)
>
> Sigh. Try again. How about "512 * sizeof(char)" ? Still doesn't make
> a difference. The code still crashes because the &buf is incorrect.
I haven't tried to understand the rest ... but never write
'sizeof(char)' unless you might change the type later. 'sizeof(char)'
is by definition 1 -- even on odd-ball architectures where a char is
e.g. 16 bits.
/Jorgen
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
| Jorgen | 6/30/2010 9:00:17 AM |
On Wed, 2010-06-30, Carl Banks wrote:
> On Jun 28, 2:44 am, Gregory Ewing <greg.ew...@canterbury.ac.nz> wrote:
>> Carl Banks wrote:
>> > Indeed, strncpy does not copy that final NUL if it's at or beyond the
>> > nth element.  Probably the most mind-bogglingly stupid thing about the
>> > standard C library, which has lots of mind-boggling stupidity.
>>
>> I don't think it was as stupid as that back when C was
>> designed. Every byte of memory was precious in those days,
>> and if you had, say, 10 bytes allocated for a string, you
>> wanted to be able to use all 10 of them for useful data.
>>
>> So the convention was that a NUL byte was used to mark
>> the end of the string *if it didn't fill all the available
>> space*.
>
> I can't think of any function in the standard library that observes
> that convention,
Me neither, except strncpy(), according to above.
> which inclines me to disbelieve this convention ever
> really existed. If it did, there would be functions to support it.
Maybe others existed, but got killed off early. That would make
strncpy() a living fossil, like the Coelacanth ...
> For that matter, I'm not really inclined to believe bytes were *that*
> precious in those days.
It's somewhat believable. If I handled thousands of student names in a
big C array char[30][], I would resent the fact that 1/30 of the
memory was wasted on NUL bytes. I'm sure plenty of people have done what
Gregory suggests ... but it's not clear that strncpy() was designed to
support those people.
I suppose it's all lost in history.
/Jorgen
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
| Jorgen | 6/30/2010 9:18:28 AM |
On Tue, 29 Jun 2010 08:41:03 -0400, Roy Smith wrote:
>> > And what about regular expressions?
>>
>> What about them? As the saying goes:
>>
>> Some people, when confronted with a problem, think
>> "I know, I'll use regular expressions."
>> Now they have two problems.
>
> That's silly. RE is a good tool. Like all good tools, it is the right
> tool for some jobs and the wrong tool for others.
"When all you have is a hammer, everything looks like a nail" ;)
Except, REs are more like a turbocharged angle grinder: bloody
dangerous in the hands of a novice.
[I was going to say "hole hawg", but then realised that most of my post
would be a quotation explaining it. The reference is to Neal Stephenson's
essay "In the Beginning was the Command Line":
<http://www.cryptonomicon.com/beginning.html>]
> I've noticed over the years a significant anti-RE sentiment in the
> Python community.
IMHO, the sentiment isn't so much against REs per se, but against
excessive or inappropriate use. Apart from making it easy to write
illegible code, they also make it easy to write code that "mostly sort-of
works" but somewhat harder to write code which is actually correct.
It doesn't help that questions on REs often start out by stating a problem
for which REs are inappropriate, e.g. parsing a context-free (or higher)
language, and in the same sentence indicate that the poster is already
predisposed to using REs.
| Nobody | 6/30/2010 12:22:15 PM |
On 06/30/2010 03:00 AM, Jorgen Grahn wrote:
> On Wed, 2010-06-30, Michael Torrie wrote:
>> On 06/29/2010 10:17 PM, Michael Torrie wrote:
>>> On 06/29/2010 10:05 PM, Michael Torrie wrote:
>>>> #include <stdio.h>
>>>>
>>>> int main(int argc, char ** argv)
>>>> {
>>>> char *buf = malloc(512 * sizeof(char));
>>>> const int a = 2, b = 3;
>>>> snprintf(&buf, sizeof buf, "%d + %d = %d\n", a, b, a + b);
>>> ^^^^^^^^^^
>>> Make that 512*sizeof(buf)
>>
>> Sigh. Try again. How about "512 * sizeof(char)" ? Still doesn't make
>> a difference. The code still crashes because the &buf is incorrect.
>
> I haven't tried to understand the rest ... but never write
> 'sizeof(char)' unless you might change the type later. 'sizeof(char)'
> is by definition 1 -- even on odd-ball architectures where a char is
> e.g. 16 bits.
You're right. I normally don't use sizeof(char). This is obviously a
contrived example; I just wanted to make the example such that there's
no way the original poster could argue that the crash is caused by
something other than &buf.
Then again, it's always a bad idea in C to make assumptions about
anything. If you're on Windows and want to use the unicode versions of
everything, you'd need to do sizeof(). So using it here would remind
you that when you move to the 16-bit Microsoft unicode versions of
snprintf you need to change the sizeof(char) lines as well to sizeof(wchar_t).
| Michael | 6/30/2010 2:02:47 PM |
On Tue, 2010-06-29, Stephen Hansen wrote:
> On 6/29/10 5:41 AM, Roy Smith wrote:
>> Nobody<nobody@nowhere.com> wrote:
>>
>>>> And what about regular expressions?
>>>
>>> What about them? As the saying goes:
>>>
>>> Some people, when confronted with a problem, think
>>> "I know, I'll use regular expressions."
>>> Now they have two problems.
>>
>> That's silly. RE is a good tool. Like all good tools, it is the right
>> tool for some jobs and the wrong tool for others.
>
> There's nothing silly about it.
>
> It is an exaggeration though: but it does represent a good thing to keep
> in mind.
Not an exaggeration: it's an absolute. It literally says that any time
you try to solve a problem with a regex, (A) it won't solve the problem
and (B) it will in itself become a problem. And it doesn't tell you
why: you're supposed to accept or reject this without thinking.
How can that be a good thing to keep in mind?
I wouldn't normally be annoyed by the quote, but it is thrown around a
lot in various places, not just here.
> Yes, re is a tool -- and a useful one at that. But it's also a tool which
> /seems/ like an omnitool capable of tackling everything.
That's more like my attitude towards them.
/Jorgen
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
| Jorgen | 6/30/2010 2:14:38 PM |
On 6/30/10 7:14 AM, Jorgen Grahn wrote:
> On Tue, 2010-06-29, Stephen Hansen wrote:
>> On 6/29/10 5:41 AM, Roy Smith wrote:
>>> Nobody<nobody@nowhere.com> wrote:
>>>
>>>>> And what about regular expressions?
>>>>
>>>> What about them? As the saying goes:
>>>>
>>>> Some people, when confronted with a problem, think
>>>> "I know, I'll use regular expressions."
>>>> Now they have two problems.
>>>
>>> That's silly. RE is a good tool. Like all good tools, it is the right
>>> tool for some jobs and the wrong tool for others.
>>
>> There's nothing silly about it.
>>
>> It is an exaggeration though: but it does represent a good thing to keep
>> in mind.
>
> Not an exaggeration: it's an absolute. It literally says that any time
> you try to solve a problem with a regex, (A) it won't solve the problem
> and (B) it will in itself become a problem. And it doesn't tell you
> why: you're supposed to accept or reject this without thinking.
>
> How can that be a good thing to keep in mind?
That it speaks in absolutes is what makes it an exaggeration. Yes, it
literally says something kind of like that (Your 'a' is a
mischaracterization).
It's still a very good thing to keep in mind.
It's a "saying" -- a proverb, an expression. Since when are the wise
remarks of our ancient forefathers literal? Not last I checked.
Reading into a saying as not a guide or suggestion or cautionary tale
but instead a doctrinal absolute is where we run into problems, not in
the repeating of them.
--
... Stephen Hansen
... Also: Ixokai
... Mail: me+list/python (AT) ixokai (DOT) io
... Blog: http://meh.ixokai.io/
| Stephen | 6/30/2010 3:23:24 PM |
On 6/30/2010 8:22 AM, Nobody wrote:
>> I've noticed over the years a significant anti-RE sentiment in the
>> Python community.
>
> IMHO, the sentiment isn't so much against REs per se, but against
> excessive or inappropriate use. Apart from making it easy to write
> illegible code, they also make it easy to write code that "mostly sort-of
> works" but somewhat harder to write code which is actually correct.
>
> It doesn't help that questions on REs often start out by stating a problem
> for which REs are inappropriate, e.g. parsing a context-free (or higher)
> language, and in the same sentence indicate that the poster is already
> predisposed to using REs.
They also often start with a problem that is 'sub-relational-grammar'
and easily solved with string methods, and again the OP proposes to use
the overkill of REs. In other words, people ask "How do I do this with
an RE" rather than "What tool should I use for this, and how".
If people asked "How do I push a pin into a corkboard with a (standard)
hammer" or "How do I break up a concrete sidewalk with a (standard)
hammer", it would not be 'anti-hammer sentiment' to suggest another
tool, like pliers or a jackhammer.
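As a hedged illustration of the "string methods first" point, here is a minimal sketch. The `key = value` input line is made up for this example and does not come from the thread; the RE version is included only for comparison.

```python
import re

line = "  colour = dark blue  "  # hypothetical input, for illustration only

# String-method version: split once on "=", then trim whitespace.
key, _, value = line.partition("=")
key, value = key.strip(), value.strip()

# An RE that extracts the same two fields, shown only for comparison.
match = re.match(r"\s*(\w+)\s*=\s*(.*?)\s*$", line)

print(key, value)
print(match.groups())
```

Both versions extract the same pair here; the first one can be read without knowing any pattern syntax.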
--
Terry Jan Reedy
Terry | 6/30/2010 4:56:48 PM
Terry Reedy wrote:
> On 6/30/2010 8:22 AM, Nobody wrote:
>
>>> I've noticed over the years a significant anti-RE sentiment in the
>>> Python community.
>>
>> IMHO, the sentiment isn't so much against REs per se, but against
>> excessive or inappropriate use. Apart from making it easy to write
>> illegible code, they also make it easy to write code that "mostly sort-of
>> works" but somewhat harder to write code which is actually correct.
>>
>> It doesn't help that questions on REs often start out by stating a
>> problem
>> for which REs are inappropriate, e.g. parsing a context-free (or higher)
>> language, and in the same sentence indicate that the poster is already
>> predisposed to using REs.
>
> They also often start with a problem that is 'sub-relational-grammar'
> and easily solved with string methods, and again the OP proposes to use
> the overkill of REs. In other words, people ask "How do I do this with
> an RE" rather than "What tool should I use for this, and how".
>
> If people asked "How do I push a pin into a corkboard with a (standard)
> hammer" or "How do I break up a concrete sidewalk with a (standard)
> hammer", it would not be 'anti-hammer sentiment' to suggest another
> tool, like pliers or a jackhammer.
I took the time to learn REs about a year ago. It was well worth it,
even though I've only used REs a handful of times since, because when
you need them there is no good substitute. But when you don't, there
are plenty. ;)
~Ethan~
Ethan | 6/30/2010 5:38:11 PM
Jorgen Grahn <grahn+nntp@snipabacken.se> writes:
> It's somewhat believable. If I handled thousands of student names in a
> big C array char[30][], I would resent the fact that 1/30 of the
> memory was wasted on NUL bytes.
But you'd be wasting even more of the memory on bytes left unused when
the student's name is less than 30 chars. If memory is that scarce you
need a different representation.
Paul | 6/30/2010 7:17:40 PM
On Wed, 30 Jun 2010 14:14:38 +0000, Jorgen Grahn wrote:
> On Tue, 2010-06-29, Stephen Hansen wrote:
>> On 6/29/10 5:41 AM, Roy Smith wrote:
>>> Nobody<nobody@nowhere.com> wrote:
>>>
>>>>> And what about regular expressions?
>>>>
>>>> What about them? As the saying goes:
>>>>
>>>> Some people, when confronted with a problem, think "I know, I'll
>>>> use regular expressions." Now they have two problems.
>>>
>>> That's silly. RE is a good tool. Like all good tools, it is the
>>> right tool for some jobs and the wrong tool for others.
>>
>> There's nothing silly about it.
>>
>> It is an exaggeration though: but it does represent a good thing to
>> keep in mind.
>
> Not an exaggeration: it's an absolute. It literally says that any time
> you try to solve a problem with a regex, (A) it won't solve the problem
> and (B) it will in itself become a problem. And it doesn't tell you
> why: you're supposed to accept or reject this without thinking.
It's a *two sentence* summary, not a reasoned and nuanced essay on the
pros and cons for REs.
Sheesh, I can just imagine you as a child, arguing with your teacher on
being told not to run with scissors -- "but teacher, there may be
circumstances where running with scissors is the right thing to do, you
are guilty of over-simplifying a complex topic into a single simplified
sound bite, instead of providing a detailed, rich heuristic for analysing
each and every situation in full before making the decision whether or
not to run with scissors".
If you look at the quote carefully, instead of making a knee-jerk
reaction, you will see that it is *literally* correct. Given some
problem, having decided to solve it with a regex, you DO have two
problems:
(1) Merely making the decision "use REs" doesn't actually solve the
original problem, any more than "use a hammer" solves the problem of "how
do I build a table?". You've decided on an approach and a tool, but your
original problem still applies.
(2) AND you now have the additional problem of dealing with regular
expressions, which are notoriously hard to write, harder to debug,
difficult to maintain, often slow, and incapable of solving certain common
problems (such as parsing nested parentheses).
So it might be a short, simplified quip, but it *is* literally correct.
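For the nested-parentheses case in (2), a minimal sketch of a non-RE approach follows. The function name `parens_balanced` is made up for this example. Checking arbitrarily deep nesting requires counting, which a classical regular expression cannot do, whereas a depth counter handles it in a few lines.

```python
def parens_balanced(text):
    """Return True if every "(" in text has a matching ")"."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ")" appeared before its "("
                return False
    return depth == 0


print(parens_balanced("f(a, (b + c))"))
print(parens_balanced("(()"))
```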
> How can that be a good thing to keep in mind?
Because many people consider REs to be some sort of panacea for solving
every text-based problem, and it's a good thing to open their eyes.
--
Steven
steve9679 (1985) | 6/30/2010 8:30:38 PM
In message <mailman.2369.1277870727.32709.python-list@python.org>, Michael
Torrie wrote:
> Okay, I will. Your code passes a char** when a char* is expected.
No it doesn’t.
> Consider this variation where I use a dynamically allocated buffer
> instead of static:
And so you misunderstand the difference between a C array and a pointer.
Lawrence | 7/1/2010 12:36:45 AM
On 06/30/2010 06:36 PM, Lawrence D'Oliveiro wrote:
> In message <mailman.2369.1277870727.32709.python-list@python.org>,
> Michael Torrie wrote:
>
>> Okay, I will. Your code passes a char** when a char* is expected.
>
> No it doesn’t.
You're right; it doesn't. Your code passes char (*)[512].
warning: passing argument 1 of ‘snprintf’ from incompatible pointer type
/usr/include/stdio.h:385: note: expected ‘char * __restrict__’ but
argument is of type ‘char (*)[512]’
> And so you misunderstand the difference between a C array and a
> pointer.
You make a pretty big assumption.
Given "char buf[512]", buf's type is char * according to the compiler
and every C textbook I know of. With a static char array, there's no
need to take its address since it *is* the address of the first
element. Taking the address can lead to problems if you ever substitute
a dynamically-allocated buffer for the statically-allocated one. For
one-dimensional arrays at least, static arrays and pointers are
interchangeable when calling snprintf. You do not agree?
Anyway, this is far enough away from Python.
Michael | 7/1/2010 5:40:06 AM
On Wed, 2010-06-30, Steven D'Aprano wrote:
> On Wed, 30 Jun 2010 14:14:38 +0000, Jorgen Grahn wrote:
>
>> On Tue, 2010-06-29, Stephen Hansen wrote:
>>> On 6/29/10 5:41 AM, Roy Smith wrote:
>>>> Nobody<nobody@nowhere.com> wrote:
>>>>
>>>>>> And what about regular expressions?
>>>>>
>>>>> What about them? As the saying goes:
>>>>>
>>>>> Some people, when confronted with a problem, think "I know, I'll
>>>>> use regular expressions." Now they have two problems.
>>>>
>>>> That's silly. RE is a good tool. Like all good tools, it is the
>>>> right tool for some jobs and the wrong tool for others.
>>>
>>> There's nothing silly about it.
>>>
>>> It is an exaggeration though: but it does represent a good thing to
>>> keep in mind.
>>
>> Not an exaggeration: it's an absolute. It literally says that any time
>> you try to solve a problem with a regex, (A) it won't solve the problem
>> and (B) it will in itself become a problem. And it doesn't tell you
>> why: you're supposed to accept or reject this without thinking.
>
> It's a *two sentence* summary, not a reasoned and nuanced essay on the
> pros and cons for REs.
Well, perhaps you cannot say anything useful about REs in general in
two sentences, and should use either more words, or not say anything
at all.
The way it was used in the quoted text above is one example of what I
mean. (Unless other details have been trimmed -- I can't check right
now.) If he meant to say "REs aren't really a good solution for this
kind of problem, even though they look tempting", then he should have
said that.
> Sheesh, I can just imagine you as a child, arguing with your teacher on
> being told not to run with scissors -- "but teacher, there may be
> circumstances where running with scissors is the right thing to do, you
> are guilty of over-simplifying a complex topic into a single simplified
> sound bite, instead of providing a detailed, rich heuristic for analysing
> each and every situation in full before making the decision whether or
> not to run with scissors".
When I was a child I expected that kind of argumentation from adults.
I expect something more as an adult.
/Jorgen
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
Jorgen | 7/1/2010 6:58:38 AM
On Wed, 2010-06-30, Michael Torrie wrote:
> On 06/30/2010 03:00 AM, Jorgen Grahn wrote:
>> On Wed, 2010-06-30, Michael Torrie wrote:
>>> On 06/29/2010 10:17 PM, Michael Torrie wrote:
>>>> On 06/29/2010 10:05 PM, Michael Torrie wrote:
>>>>> #include <stdio.h>
>>>>>
>>>>> int main(int argc, char ** argv)
>>>>> {
>>>>> char *buf = malloc(512 * sizeof(char));
>>>>> const int a = 2, b = 3;
>>>>> snprintf(&buf, sizeof buf, "%d + %d = %d\n", a, b, a + b);
>>>> ^^^^^^^^^^
>>>> Make that 512*sizeof(buf)
>>>
>>> Sigh. Try again. How about "512 * sizeof(char)" ? Still doesn't make
>>> a difference. The code still crashes because the &buf is incorrect.
>>
>> I haven't tried to understand the rest ... but never write
>> 'sizeof(char)' unless you might change the type later. 'sizeof(char)'
>> is by definition 1 -- even on odd-ball architectures where a char is
>> e.g. 16 bits.
>
> You're right. I normally don't use sizeof(char). This is obviously a
> contrived example; I just wanted to make the example such that there's
> no way the original poster could argue that the crash is caused by
> something other than &buf.
>
> Then again, it's always a bad idea in C to make assumptions about
> anything.
There are some things you cannot assume, others which few fellow
programmers care to memorize, and others which you often can get
away with (like assuming an int is more than 16 bits, when your code
is tied to a modern Unix anyway).
But sizeof(char) is always 1.
> If you're on Windows and want to use the unicode versions of
> everything, you'd need to do sizeof(). So using it here would remind
> you that when you move to the 16-bit Microsoft unicode versions of
> snprintf you need to change the sizeof(char) lines as well to sizeof(wchar_t).
Yes -- see "unless you might change the type later" above.
/Jorgen
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
Jorgen | 7/1/2010 7:09:57 AM
On 6/30/10 11:58 PM, Jorgen Grahn wrote:
> On Wed, 2010-06-30, Steven D'Aprano wrote:
>> On Wed, 30 Jun 2010 14:14:38 +0000, Jorgen Grahn wrote:
>>> On Tue, 2010-06-29, Stephen Hansen wrote:
>>>>
>>>> There's nothing silly about it.
>>>>
>>>> It is an exaggeration though: but it does represent a good thing to
>>>> keep in mind.
>>>
>>> Not an exaggeration: it's an absolute. It literally says that any time
>>> you try to solve a problem with a regex, (A) it won't solve the problem
>>> and (B) it will in itself become a problem. And it doesn't tell you
>>> why: you're supposed to accept or reject this without thinking.
>>
>> It's a *two sentence* summary, not a reasoned and nuanced essay on the
>> pros and cons for REs.
>
> Well, perhaps you cannot say anything useful about REs in general in
> two sentences, and should use either more words, or not say anything
> at all.
>
> The way it was used in the quoted text above is one example of what I
> mean. (Unless other details have been trimmed -- I can't check right
> now.) If he meant to say "REs aren't really a good solution for this
> kind of problem, even though they look tempting", then he should have
> said that.
The way it is used above (Even with more stripping) is exactly where it
is legitimate.
Regular expressions are a powerful tool.
The use of a powerful tool when a simple tool is available that achieves
the same end is inappropriate, because power *always* has a cost.
The entire point of the quote is that when you look at a problem, you
should *begin* from the position that a complex, powerful tool is not
what you need to solve it.
You should always begin from a position that a simple tool will suffice
to do what you need.
The quote does not deny the power of regular expressions; it challenges
the widely held assumption and belief that comes from *somewhere* that they
are the best way to approach any problem that is text related.
Does it come off as negative towards regular expressions? Certainly. But
not because of any fault of re's on their own, but because there is this
widespread perception that they are the swiss army knife that can solve
any problem by just flicking out the right little blade.
It's about redefining perception.
Regular expressions are not the go-to solution for anything to do with
text. Regular expressions are the tool you reach for when nothing else
will work.
It's not your first step; it's your last (or, at least, one that happens
way later than most people come around expecting it to be).
--
... Stephen Hansen
... Also: Ixokai
... Mail: me+list/python (AT) ixokai (DOT) io
... Blog: http://meh.ixokai.io/
python3307 (206) | 7/1/2010 7:19:09 AM
On Wed, 30 Jun 2010 23:40:06 -0600, Michael Torrie wrote:
> Given "char buf[512]", buf's type is char * according to the compiler
> and every C textbook I know of.
No, the type of "buf" is "char [512]", i.e. "array of 512 chars". If you
use "buf" as an rvalue (rather than an lvalue), it will be implicitly
converted to char*.
If you take its address, you'll get a "pointer to array of 512 chars",
i.e. a pointer to the array rather than to the first element. Converting
this to a char* will yield a pointer to the first element.
If buf was declared "char *buf", then taking its address will yield a
char**, and converting this to a char* will produce a pointer to the first
byte of the pointer, which is unlikely to be useful.
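The same three cases can be modelled from Python with the standard `ctypes` module. This is only a sketch, under the assumption that ctypes' array and pointer types are a fair stand-in for their C counterparts here; all variable names are made up for the illustration.

```python
import ctypes

char_p = ctypes.POINTER(ctypes.c_char)

# "char buf[512]": an array object whose size is that of the whole array.
buf = (ctypes.c_char * 512)()
array_size = ctypes.sizeof(buf)

# Using the array where a char* is wanted: pointer to the first element.
first = ctypes.cast(buf, char_p)
first_addr = ctypes.cast(first, ctypes.c_void_p).value

# "&buf": pointer to the array as a whole; same address, different type.
whole = ctypes.pointer(buf)
whole_addr = ctypes.cast(whole, ctypes.c_void_p).value

# "char *buf": a separate pointer variable; "&buf" is now the address of
# that variable, not of the characters it points to.
ptr_var = ctypes.cast(buf, char_p)
ptr_var_addr = ctypes.addressof(ptr_var)

print(array_size, ctypes.sizeof(first))
print(first_addr == whole_addr, ptr_var_addr == first_addr)
```

The last comparison is the heap-buffer situation: the address of the pointer variable is unrelated to the address of the buffer it refers to.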
Nobody | 7/1/2010 7:24:06 AM
Stephen Hansen wrote:
> On 6/30/10 11:58 PM, Jorgen Grahn wrote:
>> On Wed, 2010-06-30, Steven D'Aprano wrote:
>>> On Wed, 30 Jun 2010 14:14:38 +0000, Jorgen Grahn wrote:
>>>> On Tue, 2010-06-29, Stephen Hansen wrote:
>>>>>
>>>>> There's nothing silly about it.
>>>>>
>>>>> It is an exaggeration though: but it does represent a good thing to
>>>>> keep in mind.
>>>>
>>>> Not an exaggeration: it's an absolute. It literally says that any time
>>>> you try to solve a problem with a regex, (A) it won't solve the
>>>> problem
>>>> and (B) it will in itself become a problem. And it doesn't tell you
>>>> why: you're supposed to accept or reject this without thinking.
>>>
>>> It's a *two sentence* summary, not a reasoned and nuanced essay on the
>>> pros and cons for REs.
>>
>> Well, perhaps you cannot say anything useful about REs in general in
>> two sentences, and should use either more words, or not say anything
>> at all.
>>
>> The way it was used in the quoted text above is one example of what I
>> mean. (Unless other details have been trimmed -- I can't check right
>> now.) If he meant to say "REs aren't really a good solution for this
>> kind of problem, even though they look tempting", then he should have
>> said that.
>
> The way it is used above (Even with more stripping) is exactly where
> it is legitimate.
>
> Regular expressions are a powerful tool.
>
> The use of a powerful tool when a simple tool is available that
> achieves the same end is inappropriate, because power *always* has a
> cost.
>
> The entire point of the quote is that when you look at a problem, you
> should *begin* from the position that a complex, powerful tool is not
> what you need to solve it.
>
> You should always begin from a position that a simple tool will
> suffice to do what you need.
>
> The quote does not deny the power of regular expressions; it
> challenges the widely held assumption and belief that comes from
> *somewhere* that they are the best way to approach any problem that is
> text related.
>
> Does it come off as negative towards regular expressions? Certainly.
> But not because of any fault of re's on their own, but because there
> is this widespread perception that they are the swiss army knife that
> can solve any problem by just flicking out the right little blade.
>
> It's about redefining perception.
>
> Regular expressions are not the go-to solution for anything to do with
> text. Regular expressions are the tool you reach for when nothing else
> will work.
>
> It's not your first step; it's your last (or, at least, one that happens
> way later than most people come around expecting it to be).
>
Guys, this dogmatic discussion already took place on this list. Why
start again?
Re is part of the Python standard library, for some purpose I guess.
JM
jeanmichel (477) | 7/1/2010 10:03:24 AM
Stephen Hansen <me+list/python@ixokai.io> wrote:
> The quote does not deny the power of regular expressions; it challenges
> the widely held assumption and belief that comes from *somewhere* that they
> are the best way to approach any problem that is text related.
Well, that assumption comes from historical unix usage where traditional
tools like awk, sed, ed, and grep, made heavy use of regex, and
therefore people learned to become proficient at them and use them all
the time. Somewhat later, the next generation of tools such as vi and
perl continued that tradition. Given the tools that were available at
the time, regex was indeed likely to be the best tool available for most
text-related problems.
Keep in mind that in the early days, people were working on hard-copy
terminals <http://en.wikipedia.org/wiki/ASR-33> so economy of
expression was a significant selling point for regexes.
Not trying to further this somewhat silly debate, just adding a bit of
historical viewpoint to answer the implicit question you ask as to where
the assumption came from.
Roy | 7/1/2010 12:11:03 PM
On 7/1/10 3:03 AM, Jean-Michel Pichavant wrote:
> Re is part of the Python standard library, for some purpose I guess.
No, *really*?
So all those people who have been advocating that it's useless and shouldn't
be there are already too late?
Damn.
Well, there goes *that* whole crusade we were all out on. Since we can't
destroy re, maybe we can go club baby seals.
--
... Stephen Hansen
... Also: Ixokai
... Mail: me+list/python (AT) ixokai (DOT) io
... Blog: http://meh.ixokai.io/
Stephen | 7/1/2010 2:18:53 PM
On 7/1/10 5:11 AM, Roy Smith wrote:
> Stephen Hansen<me+list/python@ixokai.io> wrote:
>
>> The quote does not deny the power of regular expressions; it challenges
>> the widely held assumption and belief that comes from *somewhere* that they
>> are the best way to approach any problem that is text related.
>
> Well, that assumption comes from historical unix usage where traditional
> tools like awk, sed, ed, and grep, made heavy use of regex, and
> therefore people learned to become proficient at them and use them all
> the time.
Oh, I'm fully aware of the history of re's -- but it's not those old hats
and even their students and the unix geeks I'm talking about.
It's the newbies and people wandering into the language with absolutely
no idea about the history of unix, shell scripting and such, who so
often arrive with the idea firmly planted in their head, that I wonder
at. Sure, there's going to be a certain amount of cross-pollination from
unix-geeks to students-of-students-of-students-of unix geeks to spread
the idea, but it seems more pervasive than that. I just picture a
re-vangelist camping out in high schools and colleges selling the party
line or something :)
--
... Stephen Hansen
... Also: Ixokai
... Mail: me+list/python (AT) ixokai (DOT) io
... Blog: http://meh.ixokai.io/
P.S. And no, unix geeks is not a pejorative term.
Stephen | 7/1/2010 2:27:23 PM
Nobody wrote:
> On Wed, 30 Jun 2010 23:40:06 -0600, Michael Torrie wrote:
>> Given "char buf[512]", buf's type is char * according to the compiler
>> and every C textbook I know of.
References from Kernighan & Ritchie _The C Programming Language_ second
edition:
> No, the type of "buf" is "char [512]", i.e. "array of 512 chars". If you
> use "buf" as an rvalue (rather than an lvalue), it will be implicitly
> converted to char*.
K&R2 A7.1
> If you take its address, you'll get a "pointer to array of 512 chars",
> i.e. a pointer to the array rather than to the first element. Converting
> this to a char* will yield a pointer to the first element.
K&R2 A7.4.2
Mel.
Mel | 7/1/2010 3:36:46 PM
On 07/01/2010 01:24 AM, Nobody wrote:
> No, the type of "buf" is "char [512]", i.e. "array of 512 chars". If you
> use "buf" as an rvalue (rather than an lvalue), it will be implicitly
> converted to char*.
Yes this is true. I misstated. I meant that most text books I've seen
say to just use the variable in an *rvalue* as a pointer (can't think of
any lvalue use of an array).
K&R states that arrays (in C anyway) are always *passed* by pointer,
hence when you pass an array to a function it automatically decays into
a pointer. Which is what you said. So no need for & and the compiler
warning you get with it. That's all.
If the OP was striving for pedantic correctness, he would use &buf[0].
Michael | 7/1/2010 4:10:16 PM
On 7/1/2010 8:36 AM, Mel wrote:
> Nobody wrote:
>> On Wed, 30 Jun 2010 23:40:06 -0600, Michael Torrie wrote:
>>> Given "char buf[512]", buf's type is char * according to the compiler
>>> and every C textbook I know of.
>
> References from Kernighan & Ritchie _The C Programming Language_ second
> edition:
>
>> No, the type of "buf" is "char [512]", i.e. "array of 512 chars". If you
>> use "buf" as an rvalue (rather than an lvalue), it will be implicitly
>> converted to char*.
Yes, unfortunately. The approach to arrays in C is just broken,
for historical reasons. To understand C, you have to realize that
in the early versions, function declarations weren't visible when
function calls were compiled. That came later, in ANSI C. So
parameter passing in C is very dumb. Billions of crashes due
to buffer overflows later, we're still suffering from that mistake.
But this isn't a Python issue.
John Nagle
John | 7/1/2010 5:17:02 PM
In message <mailman.2370.1277871088.32709.python-list@python.org>, Michael
Torrie wrote:
> On 06/29/2010 06:26 PM, Lawrence D'Oliveiro wrote:
>>> I'm not sure you understood me correctly, because I advocate
>>> *not* doing input sanitization. Hard or not -- I don't want to know,
>>> because I don't want to do it.
>>
>> But no-one has yet managed to come up with an alternative that involves
>> less work.
>
> Your case is still not persuasive.
So persuade me. I have given an example of code written the way I do it. Now
let’s see you rewrite it using your preferred technique, just to prove that
your way is simpler and easier to understand.
Enough hand-waving, let’s see some code!
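One possible answer, offered as a hedged sketch and not as anyone's actual code from this thread: the same table-driven update written with placeholders. It makes several assumptions: the standard-library `sqlite3` module stands in for MySQL (its placeholder is `?`, where MySQLdb uses `%s`), the `items` table is cut down to a few columns, a plain dict named `form` replaces the `Params` form object, and the `urllib.quote` step on the key is omitted.

```python
import sqlite3
import time

# Column name -> form field prefix, abridged from the original example.
FIELDS = (
    ("make", "modify_make"),
    ("model", "modify_model"),
    ("comment", "modify_comment"),
)


def update_item(conn, form, modify_id):
    """Update one row; values travel as parameters, never as SQL text."""
    # Column names come from the fixed table above, not from user input,
    # so interpolating them into the statement is safe.
    assignments = ", ".join("%s = ?" % name for name, _ in FIELDS)
    values = [form["%s[%s]" % (prefix, modify_id)] for _, prefix in FIELDS]
    conn.execute(
        "update items set %s, last_modified = ? where inventory_nr = ?"
        % assignments,
        values + [int(time.time()), modify_id],
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table items (inventory_nr text, make text, model text,"
    " comment text, last_modified integer)"
)
conn.execute("insert into items (inventory_nr) values ('A1')")
form = {
    "modify_make[A1]": "O'Reilly",  # embedded quote needs no escaping
    "modify_model[A1]": "x'; drop table items; --",
    "modify_comment[A1]": "ok",
}
update_item(conn, form, "A1")
row = conn.execute("select make, model, comment from items").fetchone()
print(row)
```

The field table is still the only thing to edit when the database structure changes; the difference is that no quoting routine is involved.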
Lawrence | 7/1/2010 11:47:00 PM
In message <4c2ccd9c$0$1643$742ec2ed@news.sonic.net>, John Nagle wrote:
> The approach to arrays in C is just broken, for historical reasons.
Nevertheless, it is at least self-consistent. To return to my original
macro:
#define Descr(v) &v, sizeof v
As written, this works whatever the type of v: array, struct, whatever.
> So parameter passing in C is very dumb.
Nothing to do with the above issue.
Lawrence | 7/1/2010 11:50:59 PM
On Thursday 01 July 2010 16:50:59 Lawrence D'Oliveiro wrote:
> Nevertheless, it is at least self-consistent. To return to my original
> macro:
>
> #define Descr(v) &v, sizeof v
>
> As written, this works whatever the type of v: array, struct, whatever.
>
Doesn't seem to, sorry. Using Michael Torrie's code example, slightly
modified...
[rami@tigris ~]$ cat example.c
#include <stdio.h>
#define Descr(v) &v, sizeof v
int main(int argc, char ** argv)
{
    char *buf = malloc(512 * sizeof(char));
    const int a = 2, b = 3;
    snprintf(Descr(buf), "%d + %d = %d\n", a, b, a + b);
    fprintf(stdout, buf);
    free(buf);
    return 0;
} /*main*/
[rami@tigris ~]$ clang example.c
example.c:11:18: warning: incompatible pointer types passing 'char **', expected
'char *' [-pedantic]
snprintf(Descr(buf), "%d + %d = %d\n", a, b, a + b);
^~~~~~~~~~
example.c:4:18: note: instantiated from:
#define Descr(v) &v, sizeof v
^~~~~~~~~~~~
<<snip>>
[rami@tigris ~]$ ./a.out
Segmentation fault
----
Rami Chowdhury
"Passion is inversely proportional to the amount of real information available."
-- Benford's Law of Controversy
+1-408-597-7068 / +44-7875-841-046 / +88-01819-245544
Rami | 7/2/2010 3:17:55 AM
In message <mailman.136.1278040489.1673.python-list@python.org>, Rami
Chowdhury wrote:
> On Thursday 01 July 2010 16:50:59 Lawrence D'Oliveiro wrote:
>
>> Nevertheless, it is at least self-consistent. To return to my original
>> macro:
>>
>> #define Descr(v) &v, sizeof v
>>
>> As written, this works whatever the type of v: array, struct, whatever.
>
> Doesn't seem to, sorry. Using Michael Torrie's code example, slightly
> modified...
>
> char *buf = malloc(512 * sizeof(char));
Again, you misunderstand the difference between a C array and a pointer.
Study the following example, which does work, and you might grasp the point:
ldo@theon:hack> cat test.c
#include <stdio.h>
int main(int argc, char ** argv)
{
    char buf[512];
    const int a = 2, b = 3;
    snprintf(&buf, sizeof buf, "%d + %d = %d\n", a, b, a + b);
    fprintf(stdout, buf);
    return 0;
} /*main*/
ldo@theon:hack> ./test
2 + 3 = 5
Lawrence | 7/3/2010 2:20:26 AM
On Friday 02 July 2010 19:20:26 Lawrence D'Oliveiro wrote:
> In message <mailman.136.1278040489.1673.python-list@python.org>, Rami
> Chowdhury wrote:
> > On Thursday 01 July 2010 16:50:59 Lawrence D'Oliveiro wrote:
> >> Nevertheless, it is at least self-consistent. To return to my original
> >> macro:
> >>
> >> #define Descr(v) &v, sizeof v
> >>
> >> As written, this works whatever the type of v: array, struct, whatever.
> >
> > Doesn't seem to, sorry. Using Michael Torrie's code example, slightly
> > modified...
> >
> > char *buf = malloc(512 * sizeof(char));
>
> Again, you misunderstand the difference between a C array and a pointer.
> Study the following example, which does work, and you might grasp the
> point:
>
> ldo@theon:hack> cat test.c
> #include <stdio.h>
>
> int main(int argc, char ** argv)
> {
>     char buf[512];
>     const int a = 2, b = 3;
>     snprintf(&buf, sizeof buf, "%d + %d = %d\n", a, b, a + b);
>     fprintf(stdout, buf);
>     return 0;
> } /*main*/
> ldo@theon:hack> ./test
> 2 + 3 = 5
I'm sorry, perhaps you've misunderstood what I was refuting. You posted:
> >> macro:
> >> #define Descr(v) &v, sizeof v
> >>
> >> As written, this works whatever the type of v: array, struct, whatever.
With my code example I found that, as others have pointed out, unfortunately it
doesn't work if v is a pointer to a heap-allocated area.
----
Rami Chowdhury
"A man with a watch knows what time it is. A man with two watches is never
sure". -- Segal's Law
+1-408-597-7068 / +44-7875-841-046 / +88-01819-245544
Rami | 7/3/2010 3:07:24 AM
On Mon, Jun 28, 2010 at 6:44 PM, Gregory Ewing
<greg.ewing@canterbury.ac.nz> wrote:
> Carl Banks wrote:
>
>> Indeed, strncpy does not copy that final NUL if it's at or beyond the
>> nth element. Probably the most mind-bogglingly stupid thing about the
>> standard C library, which has lots of mind-boggling stupidity.
>
> I don't think it was as stupid as that back when C was
> designed
Actually, strncpy had a very specific use case when it was introduced
(dealing with limited-size entries in a very old Unix filesystem). It
should never be used for C string handling, and I don't think it is
fair to say it is stupid: it does exactly what it was designed for. It
just happens that most people don't know what it was designed for.
David
David | 7/3/2010 3:09:45 AM
In message <mailman.182.1278126257.1673.python-list@python.org>, Rami
Chowdhury wrote:
> I'm sorry, perhaps you've misunderstood what I was refuting. You posted:
>> >> macro:
>> >> #define Descr(v) &v, sizeof v
>> >>
>> >> As written, this works whatever the type of v: array, struct,
>> >> whatever.
>
> With my code example I found that, as others have pointed out,
> unfortunately it doesn't work if v is a pointer to a heap-allocated area.
It still correctly passes the address and size of that pointer variable. If
that’s not what you intended, you shouldn’t use it.
Lawrence | 7/4/2010 1:23:21 AM
In message <mailman.2128.1277537954.32709.python-list@python.org>, Robert
Kern wrote:
> On 2010-06-25 19:49 , Lawrence D'Oliveiro wrote:
>
>> Why do people consider input sanitization so hard?
>
> It's not hard per se; it's just repetitive, prone to the occasional
> mistake, and, frankly, really boring.
But as a programmer, I’m not in the habit of doing “repetitive” and
“boring”. Look at the example I posted, and you’ll see. It’s the ones trying
to come up with alternatives to my code who produce things that look
“repetitive” and “boring”.
Lawrence | 7/4/2010 1:28:20 AM
In message <pan.2010.06.29.09.35.18.594000@nowhere.com>, Nobody wrote:
> On Tue, 29 Jun 2010 12:30:36 +1200, Lawrence D'Oliveiro wrote:
>
>>> Seriously, almost every other kind of library uses a binary API. What
>>> makes databases so special that they need a string-command based API?
>>
>> HTML is also effectively a string-based API.
>
> HTML is a data format. The sane way to construct or manipulate HTML is via
> the DOM, not string operations.
What is this “DOM” of which you speak? I looked here
<http://docs.python.org/library/>, but can find nothing that sounds like
that, that is relevant to HTML.
>> And what about regular expressions?
>
> What about them? As the saying goes:
>
> Some people, when confronted with a problem, think
> "I know, I'll use regular expressions."
> Now they have two problems.
>
> They have some uses, e.g. defining tokens[1]. Using them to match more
> complex constructs is error-prone ...
What if they’re NOT more complex, but they can simply contain user-entered
data?
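For that case the standard library already ships a quoting routine, `re.escape`. A minimal sketch, with a made-up user string and a made-up `total:` prefix, comparing a pattern built with and without it:

```python
import re

user_text = "1+1=2 (approx.)"  # hypothetical user-entered data

# Unescaped: "+", "(", ")" and "." are interpreted as regex syntax.
unsafe = re.compile(r"^total:\s*" + user_text + r"$")

# Escaped: re.escape backslash-escapes the metacharacters so that the
# user's text can only match itself, character for character.
safe = re.compile(r"^total:\s*" + re.escape(user_text) + r"$")

print(bool(safe.match("total: 1+1=2 (approx.)")))
print(bool(safe.match("total: 11=2 approxX")))
print(bool(unsafe.match("total: 11=2 approxX")))
```

The unescaped pattern accepts a string the user never typed and rejects the one they did, which is the regex analogue of the quoting problems discussed in this thread.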
>> And all the functionality available through the subprocess
>> module and its predecessors?
>
> The main reason why everyone recommends subprocess over its predecessors
> is that it allows you to bypass the shell, which is one of the most
> common sources of the type of error being discussed in this thread.
How would you deal with this, then: I wrote a script called ExtractMac, to
convert various old Macintosh-format documents accumulated over the years
(stored in AppleDouble form by uploading to a Netatalk server) to more
cross-platform formats. This has a table of conversion commands to use. For
example, the entries for PICT and TEXT Macintosh file types look like this:
"PICT" :
    {
        "type" : "image",
        "ext" : ".png",
        "act" : "convert %(src)s %(dst)s",
    },
"TEXT" :
    {
        "type" : "text",
        "ext" : ".txt",
        "act" : "LineEndings unix <%(src)s >%(dst)s",
    },
The conversion code that uses this table looks like
Cmd = \
    (
        Act.get("act", "cp -p %(src)s %(dst)s")
    %
        {
            "src" : ShellEscape(Src),
            "dst" : ShellEscape(DstFileName),
        }
    )
sys.stderr.write("Doing: %s\n" % Cmd)
Status = os.system(Cmd)
How much simpler would your alternative be? I don’t think it would be
simpler at all.
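For comparison, one possible shape of a shell-free version is sketched below. It is not code from this thread, and it rests on assumptions: the table stores argument lists in place of command strings, the names `ACTIONS`, `DEFAULT`, `build_argv` and `run_action` are made up, and the `<src >dst` redirection is replaced by opening the files and passing them to `subprocess.call`. The `convert` and `LineEndings` commands are taken from the post above.

```python
import subprocess

# Same table idea, but each action is an argument list, so no shell
# ever parses the file names.
ACTIONS = {
    "PICT": {"ext": ".png", "argv": ["convert", "{src}", "{dst}"]},
    "TEXT": {"ext": ".txt", "argv": ["LineEndings", "unix"], "redirect": True},
}
DEFAULT = {"argv": ["cp", "-p", "{src}", "{dst}"]}


def build_argv(act, src, dst):
    """Substitute file names into the argument list; no quoting is needed."""
    return [arg.format(src=src, dst=dst) for arg in act["argv"]]


def run_action(act, src, dst):
    """Run one conversion and return the command's exit status."""
    argv = build_argv(act, src, dst)
    if act.get("redirect"):
        # Equivalent of "<src >dst", done by opening the files ourselves.
        with open(src, "rb") as fin, open(dst, "wb") as fout:
            return subprocess.call(argv, stdin=fin, stdout=fout)
    return subprocess.call(argv)


print(build_argv(ACTIONS["PICT"], "my pic; rm -rf ~.pict", "out file.png"))
```

Whether this is simpler is a judgment call. It trades the `ShellEscape` routine for a `redirect` flag, and a file name beginning with `-` could still be read as an option by the tool itself.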
ldo (2144) | 7/4/2010 2:33:44 AM
On Saturday 03 July 2010 19:33:44 Lawrence D'Oliveiro wrote:
> In message <pan.2010.06.29.09.35.18.594000@nowhere.com>, Nobody wrote:
> > On Tue, 29 Jun 2010 12:30:36 +1200, Lawrence D'Oliveiro wrote:
> >>> Seriously, almost every other kind of library uses a binary API. What
> >>> makes databases so special that they need a string-command based API?
> >>
> >> HTML is also effectively a string-based API.
> >
> > HTML is a data format. The sane way to construct or manipulate HTML is
> > via the DOM, not string operations.
>
> What is this “DOM” of which you speak? I looked here
> <http://docs.python.org/library/>, but can find nothing that sounds like
> that, that is relevant to HTML.
>
The Document Object Model - I don't think the standard library has an HTML
DOM module but there's certainly one for XML (and XHTML):
http://docs.python.org/library/xml.dom.html
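As a small illustration of the DOM approach using the standard library's `xml.dom.minidom`, the sketch below builds an XHTML-style paragraph around user-supplied text. The function name `render_comment` and the `class="comment"` attribute are invented for the example. The point is that the text is attached as a node and the serializer does the escaping, so no string quoting routine is involved.

```python
from xml.dom.minidom import getDOMImplementation

def render_comment(user_text):
    """Build a <p> element holding user_text and serialize it."""
    doc = getDOMImplementation().createDocument(None, "p", None)
    para = doc.documentElement
    para.setAttribute("class", "comment")
    # The text goes in as a node; markup characters in it are escaped
    # only when the tree is serialized by toxml().
    para.appendChild(doc.createTextNode(user_text))
    return para.toxml()

html = render_comment("<script>alert('hi')</script> & more")
```

The angle brackets and the ampersand in the input come out as entities in the result rather than as live markup.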
----
Rami Chowdhury
"Any sufficiently advanced incompetence is indistinguishable from malice."
-- Grey's Law
+1-408-597-7068 / +44-7875-841-046 / +88-01819-245544
|
|
0
|
|
|
|
Reply
|
rami.chowdhury (138)
|
7/4/2010 2:43:44 AM
|
|
Seeking industry expert candidates
I’m Justin Smith, Director of Tech Recruiting at Express Seattle. I
am currently seeking candidates to fill Tech Positions for multiple A-
List Clients:
• Quality Assurance Engineer,
• Senior Data Engineer, Search Experience
• Senior Software Development Engineer, UX / UI
• Software Dev Engineer
• Software Dev TEST Engineer
• Software Development Manager,
• Sr Applications Engineer – Strong Linux Systems Administrator
• SR Technical PM, -
• Web Designer/Developer – strong tech and art background
• Business Analyst,
Many of our Clients work within a Linux environment. For greatest
impact, on your resume highlight relevant skills and technologies used
in an environment supported by Linux, languages that show you
understand and know object oriented development, have experience with
high volume sites that are notable and are continually learning new
skills.
Hot List that gets our attention – LAMP Stack Experience, Linux, Perl
and Java/JavaScript Experts that are current in the use and show
expertise. Microsoft environment and dot net technologies are not
added attractors to many of our clients.
If you are interested in these roles, send me your resume, cover
letter highlighting noteworthy skills and projects with expected base
salary to justin.smith@expresspros.com and I can submit it ASAP.
Justin(dot)Smith(at)ExpressPros(dot)com DO FEEL FREE TO REFER this
on to a friend or colleague with strong skills as well.
Qualifications:
- Computer Science degree or equivalent work experience (5+ years).
- Expert level fluency in at least one mainstream object-oriented
programming language (C++, Java, Ruby, Python).
- Proven coding skills in C++ and or Java on Unix/Linux platforms is a
must.
- Experience with MySQL or Oracle databases a plus.
- Linux or LAMP Stack experience preferred.
- Experience with HTML5, XML, XSD, WSDL, and SOAP and a history
working with web client software
- Experience with scalable distributed systems is a positive.
Added value attractors if the qualifications are available:
+ Experience with the iPhone SDK and Objective-C. – published app that
is stable, engaging
+ Experience with the BlackBerry SDK and/or J2ME. – published app that
is stable, engaging
+ Experience with the Android SDK. – published app that is stable,
engaging
If you are interested in these roles, send me your resume, cover
letter highlighting noteworthy skills and projects with expected base
salary to justin.smith@expresspros.com and I can submit it ASAP.
Justin(dot)Smith(at)ExpressPros(dot)com DO FEEL FREE TO REFER this on
to a friend or colleague with strong skills as well.
On Jul 1, 7:18 am, Stephen Hansen <me+list/pyt...@ixokai.io> wrote:
> On 7/1/10 3:03 AM, Jean-Michel Pichavant wrote:
>
> > Re is part of the python standard library, for some purpose I guess.
>
> No, *really*?
>
> So all those people who have been advocating its useless and shouldn't
> be are already too late?
>
> Damn.
>
> Well, there goes *that* whole crusade we were all out on. Since we can't
> destroy re, maybe we can go club baby seals.
>
> --
>
>     ... Stephen Hansen
>     ... Also: Ixokai
>     ... Mail: me+list/python (AT) ixokai (DOT) io
>     ... Blog:http://meh.ixokai.io/
|
|
0
|
|
|
|
Reply
|
justin2009smith (2)
|
7/26/2010 11:19:59 PM
|
|
Justin Smith <justin2009smith@gmail.com> writes:
> Seeking industry expert candidates
Please don't reply in an existing thread with an unrelated message. If
you want to start a new discussion, compose a new message, not a reply.
For job advertisements, please don't use this forum at all; instead use
the Python Jobs Board <URL:http://www.python.org/community/jobs/>.
--
\ “We are stuck with technology when what we really want is just |
`\ stuff that works.” —Douglas Adams |
_o__) |
Ben Finney
|
|
0
|
|
|
|
Reply
|
python6 (872)
|
7/27/2010 3:26:46 AM
|
|
On 7/26/2010 4:19 PM, Justin Smith wrote:
> Seeking industry expert candidates
>
> I’m Justin Smith, Director of Tech Recruiting at Express Seattle. I
> am currently seeking candidates to fill Tech Positions for multiple A-
> List Clients:
Spammer detected.
Injection-Info: r27g2000yqb.googlegroups.com;
posting-host=63.170.35.94;
posting-account=XlBkJgkAAAC7JNUw8ZEYCvz12vv6mGCK
Reverse DNS: "franchisevpn.expresspersonnel.com"
Site analysis: Domain "www.expresspersonnel.com"
redirected to different domain "www.expresspros.com"
Site analysis:
From Secure certificate (Secure certificate, high confidence)
Express Personnel Services, Inc.
Oklahoma City, OK
UNITED STATES
Oklahoma corporation search:
EXPRESS SERVICES, INC.
Filing Number: 2400436307
Name Type: Legal Name
Status: In Existence
Corp type: Foreign For Profit Business Corporation
Jurisdiction: COLORADO
Formation Date: 28 Aug 1985
Colorado corporation search:
ID: 19871524232
Name: EXPRESS SERVICES, INC.
Principal Street Address: 8516 NW Expressway,
Oklahoma City, OK 73162, United States
Target coordinates:
35.56973,-97.668001
Corporate class: Franchiser
|
|
0
|
|
|
|
Reply
|
nagle (1023)
|
7/27/2010 6:40:16 PM
|
|
John Nagle <nagle@animats.com> writes:
> On 7/26/2010 4:19 PM, Justin Smith wrote:
>> Seeking industry expert candidates
>>
>> I’m Justin Smith, Director of Tech Recruiting at Express Seattle. I
>> am currently seeking candidates to fill Tech Positions for multiple A-
>> List Clients:
>
> Spammer detected.
But did you report it? (If so, it helps if you state so).
> Injection-Info: r27g2000yqb.googlegroups.com;
> posting-host=63.170.35.94;
http://www.spamcop.net/sc?track=63.170.35.94 -> looks like abuse goes to
the spammer... A whois gives sprint.net, so you could contact abuse at
sprint.net (see: http://whois.domaintools.com/63.170.35.94 )
[snip address etc.]
Spammers don't care about that. Best course of action, based on my
experience, is to contact abuse at googlegroups.com (now and then it
actually works), and sprint.net.
--
John Bokma j3b
Hacking & Hiking in Mexico - http://johnbokma.com/
http://castleamber.com/ - Perl & Python Development
|
|
0
|
|
|
|
Reply
|
john167 (400)
|
7/27/2010 8:52:05 PM
|
|
|