I made a query with a join of one table to itself, but the execution time
of thequery takes too long. Afer 200 sec I killed it.
SELECT COUNT(*)
FROM `statistics_psm` AS `psm1`
LEFT JOIN `statistics_psm` AS `psm2`
USING(`RemoteAddr`)
I know, the query above makes no sense, its just a simplified version of
the original.
Result of EXPLAIN:
+-------+-------+---------------+------------+
| table | type | possible_keys | key |
+-------+-------+---------------+------------+
| psm1 | index | NULL | RemoteAddr |
| psm2 | ref | RemoteAddr | RemoteAddr |
+-------+-------+---------------+------------+
+-------+---------------------+-------+-------------+
| table | ref | rows | Extra |
+-------+---------------------+-------+-------------+
| psm1 | NULL | 47034 | Using index |
| psm2 | xxx.psm1.RemoteAddr | 3 | Using index |
+-------+---------------------+-------+-------------+
Table structure:
CREATE TABLE IF NOT EXISTS `priz24_statistics_psm` (
`psm_id` int(10) unsigned NOT NULL default '0',
`products_id` int(10) unsigned NOT NULL default '0',
`RemoteAddr` varchar(39) NOT NULL default '',
`Datetime` datetime NOT NULL default '0000-00-00 00:00:00',
`Referer` varchar(255) NOT NULL default '',
PRIMARY KEY (`psm_id`,`products_id`,`RemoteAddr`,`Datetime`),
KEY `RemoteAddr` (`RemoteAddr`)
) TYPE=MyISAM;
Name Typ Kardinalität Feld
PRIMARY PRIMARY 47006 psm_id
products_id
RemoteAddr
Datetime
RemoteAddr INDEX 15668 RemoteAddr
Angaben Value
Format dynamic
rows 47,006
rowlength ø 41
rowsize ø 85 Bytes
I have tested the query on two different MySQL versions, 3.0.x und 5.x,
with no differnece.
Can somebody tell me, why is this query soo slow and how to speed it up?
PS:
PSM means comparison shopping site.
The table counts the clicks to products in an online shop from different
comparison shopping sites.
|
|
0
|
|
|
|
Reply
|
Frank
|
2/15/2008 7:47:20 AM |
|
Frank Arthur schreef:
> I made a query with a join of one table to itself, but the execution time
> of thequery takes too long. Afer 200 sec I killed it.
>
> SELECT COUNT(*)
> FROM `statistics_psm` AS `psm1`
> LEFT JOIN `statistics_psm` AS `psm2`
> USING(`RemoteAddr`)
>
you did not read the manual completly...
you need to specify the relationship between `psm1` and `psm2`
http://dev.mysql.com/doc/refman/5.0/en/join.html
SELECT COUNT(*)
FROM `statistics_psm` AS `psm1`
LEFT JOIN `statistics_psm` AS `psm2`
ON (`psm1`.`psm_id`=`psm2`.`psm_id`)
USING(`RemoteAddr`)
--
Luuk
|
|
0
|
|
|
|
Reply
|
Luuk
|
2/15/2008 8:38:36 AM
|
|
Luuk wrote:
> Frank Arthur schrieb:
> you did not read the manual completly... you need to specify the
> relationship between `psm1` and `psm2`
> http://dev.mysql.com/doc/refman/5.0/en/join.html
>
> SELECT COUNT(*)
> FROM `statistics_psm` AS `psm1`
> LEFT JOIN `statistics_psm` AS `psm2`
> ON (`psm1`.`psm_id`=`psm2`.`psm_id`)
> USING(`RemoteAddr`)
You are wrong.
I specified the Relationship with:
USING(`RemoteAddr`)
This is the same as:
ON `psm1`.`RemoteAddr` = `psm2`.`RemoteAddr`
I don't want to join by psm_id, because (for the original query) I need
the relation between RemoteAddr with and without psm_id.
|
|
0
|
|
|
|
Reply
|
Frank
|
2/15/2008 9:22:54 AM
|
|
Frank Arthur schreef:
> Luuk wrote:
>
>> Frank Arthur schrieb:
>> you did not read the manual completly... you need to specify the
>> relationship between `psm1` and `psm2`
>> http://dev.mysql.com/doc/refman/5.0/en/join.html
>>
>> SELECT COUNT(*)
>> FROM `statistics_psm` AS `psm1`
>> LEFT JOIN `statistics_psm` AS `psm2`
>> ON (`psm1`.`psm_id`=`psm2`.`psm_id`)
>> USING(`RemoteAddr`)
>
> You are wrong.
>
> I specified the Relationship with:
> USING(`RemoteAddr`)
> This is the same as:
> ON `psm1`.`RemoteAddr` = `psm2`.`RemoteAddr`
>
> I don't want to join by psm_id, because (for the original query) I need
> the relation between RemoteAddr with and without psm_id.
sorry, i overlooked...
But a second look at your query is think your result set can be large,
and because of that slow.
for every `RemoteAddr` you are linking al other `RemoteAddr` values, so
if a `RemoteAddr` is used often in your database you will get a lot of
results
if `RemoteAddr` is unique, you'll only get about 47K records
if `RemoteAddr` is used on two records your result set is 47K*2 = 94K
records
....
if `RemoteAddr` is used on 400 records your result set is 47K*400 = 18.8
milion records...
can you post the results of ?:
select RemoteAddr, count(*) c from statistics_psm group by RemoteAddr
order by c desc limit 10;
--
Luuk
|
|
0
|
|
|
|
Reply
|
Luuk
|
2/15/2008 10:11:57 AM
|
|
Luuk wrote:
> Frank Arthur schreef:
>> I specified the Relationship with:
>> USING(`RemoteAddr`)
>> This is the same as:
>> ON `psm1`.`RemoteAddr` = `psm2`.`RemoteAddr`
>
> sorry, i overlooked...
No problem.^^
> But a second look at your query is think your result set can be large,
> and because of that slow.
>
> for every `RemoteAddr` you are linking al other `RemoteAddr` values, so
> if a `RemoteAddr` is used often in your database you will get a lot of
> results
>
> if `RemoteAddr` is unique, you'll only get about 47K records if
> `RemoteAddr` is used on two records your result set is 47K*2 = 94K
> records
> ...
> if `RemoteAddr` is used on 400 records your result set is 47K*400 = 18.8
> milion records...
>
>
> can you post the results of ?:
mysql> SELECT `RemoteAddr`
-> , COUNT(*) AS `c`
-> FROM `statistics_psm`
-> GROUP BY `RemoteAddr`
-> ORDER BY `c` DESC
-> LIMIT 10;
+----------------+-------+
| RemoteAddr | c |
+----------------+-------+
| 66.249.66.20 | 19303 |
| 38.98.120.68 | 3609 |
| 84.189.229.26 | 395 |
| 69.65.122.206 | 310 |
| 84.189.235.199 | 293 |
| 121.246.24.116 | 144 |
| 84.189.217.18 | 94 |
| 87.194.5.102 | 85 |
| 84.189.238.249 | 80 |
| 84.189.246.222 | 75 |
+----------------+-------+
10 rows in set (0.04 sec)
You may right. 19303 * 19303 = 372605809 rows
This is too much für a fast query.
Hmm, I may tray to use a temporary table and delete such IPs with too
much entries.
|
|
0
|
|
|
|
Reply
|
Frank
|
2/15/2008 10:51:43 AM
|
|
|
4 Replies
413 Views
(page loaded in 0.111 seconds)
Similiar Articles: Identitying duplicate values using self-join - comp.databases ...Identitying duplicate values using self-join - comp.databases ... Identitying duplicate values using self-join - comp.databases ... join to a table itself takes too long ... Hibernate: table without primary key - comp.lang.java.databases ...join to a table itself takes too long - comp.databases.mysql ... I made a query with a join of one table to itself, but ... Referer` varchar(255) NOT NULL default ... Non-unique columns via ODBC driver - comp.lang.perl.misc ...Non-unique columns via ODBC driver - comp.lang.perl.misc ... join to a table itself takes too long - comp.databases.mysql ..... get a lot of results if `RemoteAddr` is ... How to connect to a remote MySQL database using Java - comp.lang ...I suggest trying a type-4 (all Java) driver. Joe ... How to set HTTP connect timeout - comp.lang.javascript. join to a table itself takes too long - comp.databases.mysql ... Table with 400000 rows takes a long time to count - comp.databases ...... the table itself ... self-join - comp.databases ... Table with 400000 rows takes a long ... long? ..... COUNT(*) takes long ... count(*) from my table: I have noticed too and ... How best to detect duplicate values in a column? - comp.databases ...- Join the table with itself, looking for matching title but different questIDs. ... But "surveyquestion" is too long a name for a table, and "question" is totally ... How to optimize primary key change on a large table? - comp ...... the table? I would first be worried about the data itself, and only after that about the time it takes. ... Long PK is to ... order) FROM table ... two files with join ... reading a huge HDF5 file in matlab...takes forever - comp.soft-sys ...When i used the following syntax, it ended up taking a very long ... in increments Part 2 is the actual function itself ... (printing them on command window took too much time ... Compiled Matlab GUI code starts up too slow... - comp.soft-sys ...Obviously, I think it takes too much time, but I wonder ... program before even opening the matlab, how long does it take ... and when it completes the job terminates itself. IBM COBOL Migration to Windows COBOL - comp.lang.cobolThese objects are generated from your RDB table ... > > I guess the replacement system development is taking too long ... probably not a good move as the XML standard itself ... Left outer join takes a long time to excute - Dev ShedVisit Dev Shed to discuss Left outer join takes a long time to ... This is scaled down table structure for testing ... metadata and firebird supports updateable views too) ... Join (SQL) - Wikipedia, the free encyclopediaAs a special case, a table (base table, view, or joined table) can JOIN to itself in a self-join. ... and left outer joins may replace each other as long as the table ... 7/28/2012 3:10:57 AM
|