Index after the fact?

  • Follow


My original schema / database / table create script did not include all my necessary keys. 

Is it too late to know specify new keys?
Right now there is only one primary key.

I've noticed that certain queries are very slow when selecting on a column that isn't a key.

So my question is can a new key be added now?
Will that destroy any contents of the current table?
Will that speed up the results of select statements on columns that aren't keys?

TIA.
B. 
0
Reply bjobrien62 (510) 11/25/2011 9:35:10 PM

On 11/25/2011 4:35 PM, SpreadTooThin wrote:
> My original schema / database / table create script did not include all my necessary keys.
>
> Is it too late to know specify new keys?
> Right now there is only one primary key.
>
> I've noticed that certain queries are very slow when selecting on a column that isn't a key.
>
> So my question is can a new key be added now?

http://dev.mysql.com/doc/refman/5.0/en/create-index.html

> Will that destroy any contents of the current table?

No.

> Will that speed up the results of select statements on columns that aren't keys?

If you build the index(es) properly, yes.
0
Reply doug_at_milmacdotcom (64) 11/25/2011 9:54:52 PM


On Nov 25, 2:54=A0pm, Doug Miller <doug_at_milmacdot...@example.com>
wrote:
> On 11/25/2011 4:35 PM, SpreadTooThin wrote:
>
> > My original schema / database / table create script did not include all=
 my necessary keys.
>
> > Is it too late to know specify new keys?
> > Right now there is only one primary key.
>
> > I've noticed that certain queries are very slow when selecting on a col=
umn that isn't a key.
>
> > So my question is can a new key be added now?
>
> http://dev.mysql.com/doc/refman/5.0/en/create-index.html
>
> > Will that destroy any contents of the current table?
>
> No.
>
> > Will that speed up the results of select statements on columns that are=
n't keys?
>
> If you build the index(es) properly, yes.

And of course new insert will automatically update the new indecies
(indexes) as well?
0
Reply bjobrien62 (510) 11/25/2011 10:07:17 PM

On 11/25/2011 5:07 PM, SpreadTooThin wrote:

>
> And of course new insert will automatically update the new indecies
*indices
> (indexes) as well?

Yes.
0
Reply doug_at_milmacdotcom (64) 11/25/2011 10:30:48 PM

> My original schema / database / table create script did not include all my necessary keys. 
> 
> Is it too late to know specify new keys?

No.  Check out commands such as:
	CREATE INDEX ...
	ALTER TABLE foo ADD INDEX (last_name, first_name);
	ALTER TABLE foo ADD UNIQUE (last_name, first_name);
	ALTER TABLE foo ADD PRIMARY KEY (member_number);
	ALTER TABLE foo DROP PRIMARY KEY;
	ALTER TABLE foo DROP INDEX ... 

If you have less than optimal indexes, you may want to drop the old
ones and add new ones (which can be done in one large ALTER TABLE).

> Right now there is only one primary key.

You are permitted to have at most one primary key, for much the
same reason that if you have twins, one child is considered the
primary child and the other one is only a twin.  Ok, so that analogy
doesn't work so well.  "primary key" has a vague connotation of
"most important key", which is why there can be only one.  If you
want a different primary key, drop the old one first, then add a
new one.

You may have, however, multiple unique indexes.  Each serves two
purposes:  speeding lookups and preventing duplicates.  (If you
want to allow duplicates, use a non-unique index).  Only one of
them can be the primary key, if for no other reason that indexes
have names and only one of them can be named 'PRIMARY'.

I'm sure there is a limit on the number of indexes you can have but
unless you go really nuts creating them, it's not normally a problem.
(All storage engines allow at least 16 indexes.  Some allow many
more.) Note that having lots of indexes slows down inserting and
deleting rows, and updating key fields in rows.  Depending on the
ratio of looking up records vs. changing them in the mix of queries
your application uses, this can be a good thing or a bad thing.

(There are a few applications for a write-mostly, read-rarely
database, such as an ISP recording dynamic IP addresses assigned
to users (where login time is not billed).  There might be 100,000
logins/logouts a day, and half a dozen lookups of the form "who had
this IP at this time" from abuse complaints.)


> I've noticed that certain queries are very slow when selecting on a column that isn't a key.
> 
> So my question is can a new key be added now?

Yes.  

> Will that destroy any contents of the current table?

No.  However, if you attempt to add a unique index when there are
duplicates in the table, adding the index will fail.  Make it a
non-unique index or somehow eliminate the duplicates.  Also, adding
an index with a large quantity of data can be slow.


> Will that speed up the results of select statements on columns that aren't keys?

With intelligent selection of the new key, (which might involve
multiple columns) it should speed up the query.  If you do it while
your application is running, it may pause for a while while the
index is being created, depending on how much data you have in
there.  Unless you are doing something real-time and safety-critical
like guiding a missile, this is not normally a problem.


0
Reply gordonb.trwcc (1) 11/26/2011 4:16:21 AM

Thanks very much for your detailed reply.

I was wondering if there is a way to analyze the sql that is being used to access the database and customize the indices based on the sql?
0
Reply bjobrien62 (510) 11/27/2011 4:44:03 PM

SpreadTooThin wrote:
> Thanks very much for your detailed reply.
> 
> I was wondering if there is a way to analyze the sql that is being used to access the database and customize the indices based on the sql?
I am no guru, but look at the 'explain' command.

It is easy to add indices after the creation of a table.

My approach is to index until performance is acceptable - but beware of 
caching.  MySQL caches results so unless you change data between tests a 
second lookup is massively faster than the first.
0
Reply tnp (2247) 11/27/2011 6:05:43 PM

> Thanks very much for your detailed reply.
> 
> I was wondering if there is a way to analyze the sql that is being used to access the database and customize the indices based on the sql?

If you need to get hold of the queries being used, you can turn on
the MySQL log.  Or look at the source code of your applications.
Then look at the SQL.

First, if you have auto_increment fields, you need a unique index
on that field (probably a primary key).

Also, if you need to enforce uniqueness on the values in a particular
column or combination of particular columns, you need a unique index
on that (the columns may be in any order).  This is a candidate for
a primary key if you don't already have one.

Now, look at the WHERE and ON clauses used in your queries.

	WHERE field = 'value'
or	WHERE field > 'value'
or	WHERE field < 'value'
suggests an index on field.

	WHERE field1 = 'value1' AND field2 = 'value2'
suggests an index on (field1, field2) or (field2, field1).

	WHERE field LIKE '%value%'
does not suggest an index, since there is no leading character
in the pattern.

	WHERE field LIKE 'value%'
(with no leading % or _ wildcards) suggests an index on field.

	WHERE last_name = 'value1' and first_name = 'value2' and 
		middle_name > 'S'
suggests an index on (last_name, first_name, middle_name) or
(first_name, last_name, middle_name).  The exact-equality tests go
first, then an inequality test or a LIKE with a non-wildcard prefix.

	If you JOIN two tables on one or more fields being equal,
that suggests an index on those fields (in any order).

Now, try to merge all the suggestions into a reasonable (and small)
set of indexes.  Keep in mind that the most important indexes are
those suggested by queries that are:
	- run most often
	- deal with the most data
	- are most time-critical
	- are the most time-consuming
	- are most selective (narrow down the choices the most)

There is no point in having multiple indexes on the same fields in
the same order on the same table.  Drop duplicates.

If a suggested index has multiple fields, and one or more of those
fields is a unique index or primary key compared for exact equality,
you might as well drop all but one unique value, as the unique value
will select at most one record, and the others do not add much (that
one record will either be included or excluded by the values of the
other fields).  Since you already have a unique index for the
uniqueness, no additional indexes are needed for that suggested
index.

Order of fields in an index matters.  Phone books are ordered by
(last_name, first_name), and that ordering is pretty useless for
searching for first_name='George' AND street_address='123 Main
Street'.  You end up having to scan the whole book.

An index on (a, b, c, d) works as an index on (a), (a, b), and (a, b, c)
as well, but *not* on (b, c, d).

An index on (a, b) works as a *PARTIAL* substitute for an index on
(a, b, c), or (a, b, c, d) (but it's slightly slower).  

MySQL generally uses at most one index on the WHERE clause and one
index on each JOIN.  An index on (a) and a separate index on (b)
is not a complete substitute for an index on (b, a) or (a, b) (but
it is a partial substitute, using the index on (a) as a partial
substitute for one on (a,b) ).

MySQL can also use an index for the fields in an ORDER BY clause.

An index on a field that has few different values does little to
improve the search.  For example, for a customer database for a
lawn-mowing service or other *local* small business in Fort Worth,
Texas, an index on (planet, country, state, city) as opposed to
just (city) may be rather pointless since 99.9% of them are for
('Earth', 'United States', 'Texas', city), (hint:  Fort Worth is
not near the Texas state boundary) and, surprise, no entries for
Australia, France, Oklahoma, or Mars.  For a mail-order business,
this can be very different.


Example:  I have a database of video recordings, which is accessed
by private web pages.  The 'recording' table has one entry for each
copy of a particular recording.  There are other tables I'm not
going to mention here.

1.  The table has an `id` auto_increment field.  This is the primary key.

2.  A given `track` on a given `disk` may contain no more than one
episode recording, so this suggests a unique index on (disk, track)
or (track, disk) to prevent duplicates.

The web site has several pages that use the following WHERE clauses:

3.  WHERE series = 'value'			
		(all the recordings for a particular series)
		suggests an index on (series).
4.  WHERE series = 'value1' and prod = 'value2'
		(all the recordings of a particular episode (series
		and production number) - there can be more than one
		such recording)
		suggests an index on (series, prod) or (prod, series)
5.  WHERE disk = 'value'
		(all the recordings on a particular disk)
		suggests an index on (disk)

6.  Updates to particular records, for example, when the recording is
moved from a DVR hard disk to a DVD, are done by:
WHERE id = 'value'.

There are other fields, such as recording length, recording date,
quality, and title, which do not need an index.  You might think
that this is not normalized, as title might seem to depend only on
(series, prod), but it turns out that the same episode can have
different titles on different recordings.

Ok, we've got suggestions for indexes:

1.  (id) unique
2.  (disk, track) or (track, disk) unique
3.  (series)
4.  (series, prod) or (prod, series)
5.  (disk)
6.  (id)

I need a primary key of (id) to satisfy #1, which also covers #6.
From #2, I choose a unique index on (disk, track) which also satisfies #5.
If I had chosen (track, disk), that wouldn't satisfy #5.
From #4, I choose a non-unique index on (series, prod) which also
satisfies #3.  

That's 3 indexes.

0
Reply gordonb.03qgo (1) 11/29/2011 1:14:56 AM

I just had this sinking feeling...
While the ALTER TABLE command is in progress, should clients be stopped from doing inserts into that table?

because.. they are!!!!
0
Reply bjobrien62 (510) 12/1/2011 7:15:47 PM

SpreadTooThin wrote:
> I just had this sinking feeling...
> While the ALTER TABLE command is in progress, should clients be stopped from doing inserts into that table?
> 
> because.. they are!!!!


I've never bothered ;-)

My guess is it locks the table against inserts while indexing...oddly 
enough MySQL seems to have been written by people with brains larger 
than peas. And it would seem the logical thing to do.



0
Reply tnp (2247) 12/1/2011 8:22:25 PM

On 12/1/2011 2:15 PM, SpreadTooThin wrote:
> I just had this sinking feeling...
> While the ALTER TABLE command is in progress, should clients be stopped from doing inserts into that table?
>
> because.. they are!!!!

Yes, think about it.  To build the index, MySQL has to scan the entire 
table.  While that is going on, you wouldn't want changes to the table - 
it could corrupt the index.

MySQL *could* keep track of the changes and apply them to the index 
after the fact, but to do so would be fairly complicated - for an action 
which is performed very seldom.  So rather, MySQL locks the table while 
the index is being built.

Unless you have a huge table, it shouldn't take that long.  Just do it 
when the system is not very active.

-- 
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================
0
Reply jstucklex (14341) 12/1/2011 8:34:35 PM

can I create an alter table and add three indices all at one?
so rather than:
ALTER TABLE MyTable ADD INDEX (a);
ALTER TABLE MyTable ADD INDEX (b);
ALTER TABLE MyTable ADD INDEX (c);

Can this be done?
ALTER TABLE MyTable ADD INDEX (a), (b), (c);
0
Reply bjobrien62 (510) 12/2/2011 7:22:09 PM

On 12/2/2011 2:22 PM, SpreadTooThin wrote:
> can I create an alter table and add three indices all at one?
> so rather than:
> ALTER TABLE MyTable ADD INDEX (a);
> ALTER TABLE MyTable ADD INDEX (b);
> ALTER TABLE MyTable ADD INDEX (c);
>
> Can this be done?
> ALTER TABLE MyTable ADD INDEX (a), (b), (c);

You can, syntax would be

ALTER TABLE MyTable ADD INDEX ndxa (a),
                      ADD INDEX ndxb (b),
                      ADD INDEX ndxc (c);

P.S. Note that I added names to the indexes.  I find its a good idea to 
name your indexes - that way if you have you can more easily tell what 
the indexes pertain to.

Not sure how much this is going to help, though.  MySQL still has to 
lock the tables - and creating 3 indexes at once will take more time 
than creating 1. I don't know if it's 3x the amount of time, but if it's 
taking a long time to create the indexes, I would think a break between 
them to allow waiting programs to access the table.

One other thing - are you sure you need all the indexes?  Too many 
indexes are bad also - they slow down UPDATE (affecting the indexed 
column(s)), INSERT and DELETE statements.  For best performance you 
should have the indexes you need, but only the indexes you need.

-- 
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================
0
Reply jstucklex (14341) 12/2/2011 8:28:05 PM

Well I looked at the SQL and the WHERE statements and I can see that these indexes are necessary.  (My table has 12.5 Million Rows at the moment.)
The insert still seems quite fast, and the application lag that I was experiencing before adding these indexes is GONE! Yay.

Thanks all for the tips.
0
Reply bjobrien62 (510) 12/3/2011 2:29:05 AM

> I just had this sinking feeling...
> While the ALTER TABLE command is in progress, should clients be stopped from doing inserts into that table?
> 
> because.. they are!!!!

If the table before and after the ALTER TABLE is acceptable to your
application (as would be the case for adding/deleting indexes),
you're fine (MySQL table locking will handle it), unless the
applications are so real-time that a locked table pause to rebuild
an index is unacceptable.  (For example, if your client is running
your heart pacemaker, be careful.)

If you are renaming or deleting a column used by the application and
intend to fix the application to no longer use that column, yes, you
need to stop your clients and fix them.

If the number or order of columns is changing, this may break your SQL.
One way to avoid this is to program defensively.  In INSERT sql,
list the columns:
	INSERT INTO foo (bar, baz, name) VALUES (99, 12, 'George');
NOT
	INSERT INTO foo VALUES (99, 12, 'George');
which will break if someone adds a 4th column.

Don't use SELECT * FROM ...; .  Name the columns.  This protects you
against problems when adding or re-arranging the order of the columns.



0
Reply gordonb.bg25f (1) 12/3/2011 3:46:21 AM

SpreadTooThin wrote:
> Well I looked at the SQL and the WHERE statements and I can see that these indexes are necessary.  (My table has 12.5 Million Rows at the moment.)
> The insert still seems quite fast, and the application lag that I was experiencing before adding these indexes is GONE! Yay.
> 

Always nice to hear good news..
> Thanks all for the tips.
0
Reply tnp (2247) 12/3/2011 4:06:36 AM

Gordon Burditt wrote:
>> I just had this sinking feeling...
>> While the ALTER TABLE command is in progress, should clients be stopped from doing inserts into that table?
>>
>> because.. they are!!!!
> 
> If the table before and after the ALTER TABLE is acceptable to your
> application (as would be the case for adding/deleting indexes),
> you're fine (MySQL table locking will handle it), unless the
> applications are so real-time that a locked table pause to rebuild
> an index is unacceptable.  (For example, if your client is running
> your heart pacemaker, be careful.)
> 
> If you are renaming or deleting a column used by the application and
> intend to fix the application to no longer use that column, yes, you
> need to stop your clients and fix them.
> 
> If the number or order of columns is changing, this may break your SQL.
> One way to avoid this is to program defensively.  In INSERT sql,
> list the columns:
> 	INSERT INTO foo (bar, baz, name) VALUES (99, 12, 'George');
> NOT
> 	INSERT INTO foo VALUES (99, 12, 'George');
> which will break if someone adds a 4th column.

well will it?

Does Mysql complain or simply fill the first three?

I must say my preference is:

INSERT INTO foo set bar='99', baz='12', name='George');


Simply because I find it easier to relate to things as name/value pairs...


> 
> Don't use SELECT * FROM ...; .  Name the columns.  This protects you
> against problems when adding or re-arranging the order of the columns.
> 
> 

Again, if using - say - PHP the data is orgainised in a field name 
indexed associative array, so it doesn't matter what the order is if you 
use the indices to access it.

> 
0
Reply tnp (2247) 12/3/2011 4:11:17 AM

>> If the number or order of columns is changing, this may break your SQL.
>> One way to avoid this is to program defensively.  In INSERT sql,
>> list the columns:
>>       INSERT INTO foo (bar, baz, name) VALUES (99, 12, 'George');
>> NOT
>>       INSERT INTO foo VALUES (99, 12, 'George');
>> which will break if someone adds a 4th column.
> 
> well will it?

Yes, the error i get is:

ERROR 1136 (21S01): Column count doesn't match value count at row 1


> Again, if using - say - PHP the data is orgainised in a field name 
> indexed associative array, so it doesn't matter what the order is if you 
> use the indices to access it.

I have found that to be a problem on occasion when the value being
selected is an expression (perhaps a messy, complicated one) and
trying to figure out what name it has in the associative array
isn't always easy.

It may also make it much harder to figure out what code actually uses
a given field, should you be thinking of changing its type or changing
what's in it.
0
Reply gordon13 (229) 12/3/2011 5:57:10 AM

On 12/2/2011 11:11 PM, The Natural Philosopher wrote:
> Gordon Burditt wrote:
>>> I just had this sinking feeling...
>>> While the ALTER TABLE command is in progress, should clients be
>>> stopped from doing inserts into that table?
>>>
>>> because.. they are!!!!
>>
>> If the table before and after the ALTER TABLE is acceptable to your
>> application (as would be the case for adding/deleting indexes),
>> you're fine (MySQL table locking will handle it), unless the
>> applications are so real-time that a locked table pause to rebuild
>> an index is unacceptable. (For example, if your client is running
>> your heart pacemaker, be careful.)
>>
>> If you are renaming or deleting a column used by the application and
>> intend to fix the application to no longer use that column, yes, you
>> need to stop your clients and fix them.
>>
>> If the number or order of columns is changing, this may break your SQL.
>> One way to avoid this is to program defensively. In INSERT sql,
>> list the columns:
>> INSERT INTO foo (bar, baz, name) VALUES (99, 12, 'George');
>> NOT
>> INSERT INTO foo VALUES (99, 12, 'George');
>> which will break if someone adds a 4th column.
>
> well will it?
>
> Does Mysql complain or simply fill the first three?
>
> I must say my preference is:
>
> INSERT INTO foo set bar='99', baz='12', name='George');
>
>
> Simply because I find it easier to relate to things as name/value pairs...
>
>

Which is non-standard SQL and will fail on every other RDBMS on the 
market (and even in MySQL running in STRICT mode).


>>
>> Don't use SELECT * FROM ...; . Name the columns. This protects you
>> against problems when adding or re-arranging the order of the columns.
>>
>>
>
> Again, if using - say - PHP the data is orgainised in a field name
> indexed associative array, so it doesn't matter what the order is if you
> use the indices to access it.
>
>>

Once again you miss the point...

-- 
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================
0
Reply jstucklex (14341) 12/3/2011 2:14:55 PM

On 12/3/2011 12:57 AM, Gordon Burditt wrote:
>>> If the number or order of columns is changing, this may break your SQL.
>>> One way to avoid this is to program defensively.  In INSERT sql,
>>> list the columns:
>>>        INSERT INTO foo (bar, baz, name) VALUES (99, 12, 'George');
>>> NOT
>>>        INSERT INTO foo VALUES (99, 12, 'George');
>>> which will break if someone adds a 4th column.
>>
>> well will it?
>
> Yes, the error i get is:
>
> ERROR 1136 (21S01): Column count doesn't match value count at row 1
>
>

Gordon, don't try to argue with TNP.  There are 10 year olds with more 
programming aptitude than he has.  But he's not afraid to tell people 
why his ways are best.

>> Again, if using - say - PHP the data is orgainised in a field name
>> indexed associative array, so it doesn't matter what the order is if you
>> use the indices to access it.
>
> I have found that to be a problem on occasion when the value being
> selected is an expression (perhaps a messy, complicated one) and
> trying to figure out what name it has in the associative array
> isn't always easy.
>
> It may also make it much harder to figure out what code actually uses
> a given field, should you be thinking of changing its type or changing
> what's in it.

Quite true.   But immaterial for a script kiddie like TNP.

-- 
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================
0
Reply jstucklex (14341) 12/3/2011 2:17:05 PM

19 Replies
66 Views

(page loaded in 0.291 seconds)

Similiar Articles:


















7/24/2012 1:54:44 AM


Reply: