Joining a table to itself takes too long

I wrote a query that joins a table to itself, but it takes too long 
to execute. After 200 seconds I killed it.

SELECT COUNT(*)
  FROM `statistics_psm` AS `psm1`
    LEFT JOIN `statistics_psm` AS `psm2`
      USING(`RemoteAddr`)

I know the query above makes no sense; it's just a simplified version of 
the original.

Result of EXPLAIN:

+-------+-------+---------------+------------+---------------------+-------+-------------+
| table | type  | possible_keys | key        | ref                 | rows  | Extra       |
+-------+-------+---------------+------------+---------------------+-------+-------------+
| psm1  | index | NULL          | RemoteAddr | NULL                | 47034 | Using index |
| psm2  | ref   | RemoteAddr    | RemoteAddr | xxx.psm1.RemoteAddr |     3 | Using index |
+-------+-------+---------------+------------+---------------------+-------+-------------+


Table structure:

CREATE TABLE IF NOT EXISTS `priz24_statistics_psm` (
  `psm_id` int(10) unsigned NOT NULL default '0',
  `products_id` int(10) unsigned NOT NULL default '0',
  `RemoteAddr` varchar(39) NOT NULL default '',
  `Datetime` datetime NOT NULL default '0000-00-00 00:00:00',
  `Referer` varchar(255) NOT NULL default '',
  PRIMARY KEY  (`psm_id`,`products_id`,`RemoteAddr`,`Datetime`),
  KEY `RemoteAddr` (`RemoteAddr`)
) TYPE=MyISAM;

Name        Type     Cardinality  Field
PRIMARY     PRIMARY  47006        psm_id, products_id, RemoteAddr, Datetime
RemoteAddr  INDEX    15668        RemoteAddr

Statement      Value
Format         dynamic
Rows           47,006
Row length  ø  41
Row size    ø  85 bytes

I have tested the query on two different MySQL versions, 3.0.x and 5.x, 
with no difference.
Can somebody tell me why this query is so slow and how to speed it up?

PS:
PSM means comparison shopping site.
The table counts the clicks to products in an online shop from different 
comparison shopping sites.
Reply Frank 2/15/2008 7:47:20 AM

Frank Arthur wrote:
> I wrote a query that joins a table to itself, but it takes too long 
> to execute. After 200 seconds I killed it.
> 
> SELECT COUNT(*)
>   FROM `statistics_psm` AS `psm1`
>     LEFT JOIN `statistics_psm` AS `psm2`
>       USING(`RemoteAddr`)
> 

You did not read the manual completely...
you need to specify the relationship between `psm1` and `psm2`
http://dev.mysql.com/doc/refman/5.0/en/join.html

  SELECT COUNT(*)
    FROM `statistics_psm` AS `psm1`
      LEFT JOIN `statistics_psm` AS `psm2`
      ON (`psm1`.`psm_id`=`psm2`.`psm_id`)
        USING(`RemoteAddr`)


-- 
Luuk
Reply Luuk 2/15/2008 8:38:36 AM


Luuk wrote:

> Frank Arthur wrote:
> You did not read the manual completely... you need to specify the
> relationship between `psm1` and `psm2`
> http://dev.mysql.com/doc/refman/5.0/en/join.html
> 
>   SELECT COUNT(*)
>     FROM `statistics_psm` AS `psm1`
>       LEFT JOIN `statistics_psm` AS `psm2`
>       ON (`psm1`.`psm_id`=`psm2`.`psm_id`)
>         USING(`RemoteAddr`)

You are wrong.

I specified the Relationship with:
USING(`RemoteAddr`)
This is the same as:
ON `psm1`.`RemoteAddr` = `psm2`.`RemoteAddr`

I don't want to join by psm_id, because (for the original query) I need 
the relation between RemoteAddr with and without psm_id.
Reply Frank 2/15/2008 9:22:54 AM

Frank Arthur wrote:
> Luuk wrote:
> 
>> Frank Arthur wrote:
>> You did not read the manual completely... you need to specify the
>> relationship between `psm1` and `psm2`
>> http://dev.mysql.com/doc/refman/5.0/en/join.html
>>
>>   SELECT COUNT(*)
>>     FROM `statistics_psm` AS `psm1`
>>       LEFT JOIN `statistics_psm` AS `psm2`
>>       ON (`psm1`.`psm_id`=`psm2`.`psm_id`)
>>         USING(`RemoteAddr`)
> 
> You are wrong.
> 
> I specified the Relationship with:
> USING(`RemoteAddr`)
> This is the same as:
> ON `psm1`.`RemoteAddr` = `psm2`.`RemoteAddr`
> 
> I don't want to join by psm_id, because (for the original query) I need 
> the relation between RemoteAddr with and without psm_id.

Sorry, I overlooked that...

But on a second look at your query, I think your result set can be large, 
and slow because of that.

For every row you are linking all rows that share its `RemoteAddr` value, so 
if a `RemoteAddr` is used often in your database you will get a lot of 
results.

If `RemoteAddr` is unique, you'll only get about 47K records.
If every `RemoteAddr` is used on two records, your result set is 47K*2 = 94K 
records.
....
If every `RemoteAddr` is used on 400 records, your result set is 47K*400 = 18.8 
million records...


Can you post the results of this query?
select RemoteAddr, count(*) c from statistics_psm group by RemoteAddr 
order by c desc limit 10;
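
As a rough check that avoids running the join at all: an address that occurs 
c times contributes c*c rows to the self-join, so the total row count should 
be the sum of c*c over all addresses. A minimal sketch, assuming MySQL 4.1 or 
later (it relies on a derived table, which older versions do not support) and 
the table name from the query above; the alias `t` is only illustrative:

```sql
-- Estimate the number of rows the self-join on RemoteAddr would produce,
-- without actually materializing the join.
SELECT SUM(t.c * t.c) AS joined_rows
  FROM (
    -- c = number of rows sharing one RemoteAddr value
    SELECT COUNT(*) AS c
      FROM `statistics_psm`
      GROUP BY `RemoteAddr`
  ) AS t;
```

Because `RemoteAddr` is declared NOT NULL, every row matches at least itself, 
so the LEFT JOIN and an INNER JOIN should give the same count here.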

-- 
Luuk
Reply Luuk 2/15/2008 10:11:57 AM

Luuk wrote:

> Frank Arthur wrote:
>> I specified the Relationship with:
>> USING(`RemoteAddr`)
>> This is the same as:
>> ON `psm1`.`RemoteAddr` = `psm2`.`RemoteAddr`
> 
> Sorry, I overlooked that...

No problem.^^

> But on a second look at your query, I think your result set can be large,
> and slow because of that.
> 
> For every row you are linking all rows that share its `RemoteAddr` value, so
> if a `RemoteAddr` is used often in your database you will get a lot of
> results.
> 
> If `RemoteAddr` is unique, you'll only get about 47K records.
> If every `RemoteAddr` is used on two records, your result set is 47K*2 = 94K
> records.
> ...
> If every `RemoteAddr` is used on 400 records, your result set is 47K*400 = 18.8
> million records...
> 
> 
> Can you post the results of this query?

mysql> SELECT `RemoteAddr`
    ->      , COUNT(*) AS `c`
    ->   FROM `statistics_psm`
    ->   GROUP BY `RemoteAddr`
    ->   ORDER BY `c` DESC
    ->   LIMIT 10;
+----------------+-------+
| RemoteAddr     | c     |
+----------------+-------+
| 66.249.66.20   | 19303 |
| 38.98.120.68   |  3609 |
| 84.189.229.26  |   395 |
| 69.65.122.206  |   310 |
| 84.189.235.199 |   293 |
| 121.246.24.116 |   144 |
| 84.189.217.18  |    94 |
| 87.194.5.102   |    85 |
| 84.189.238.249 |    80 |
| 84.189.246.222 |    75 |
+----------------+-------+
10 rows in set (0.04 sec)

You may be right. 19303 * 19303 = 372605809 rows from that one address alone.
That is too much for a fast query.
Hmm, I may try using a temporary table and deleting the IPs that have too 
many entries.
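
One possible shape for that idea, as a sketch only. MySQL does not allow a 
TEMPORARY table to be referenced more than once in the same query, so a 
filtered copy of the data could not be joined to itself. The sketch therefore 
stores only the frequent addresses and excludes them with an anti-join. The 
table name `heavy_addr` and the threshold of 100 are hypothetical and would 
need tuning against the real data:

```sql
-- Collect the addresses that occur too often (threshold is an assumption).
CREATE TEMPORARY TABLE `heavy_addr` (PRIMARY KEY (`RemoteAddr`))
  SELECT `RemoteAddr`
    FROM `statistics_psm`
    GROUP BY `RemoteAddr`
    HAVING COUNT(*) > 100;

-- Self-join as before, but skip rows whose address is in heavy_addr.
-- The temporary table is referenced only once in this query.
SELECT COUNT(*)
  FROM `statistics_psm` AS `psm1`
    LEFT JOIN `heavy_addr` AS `h`
      ON `h`.`RemoteAddr` = `psm1`.`RemoteAddr`
    LEFT JOIN `statistics_psm` AS `psm2`
      ON `psm2`.`RemoteAddr` = `psm1`.`RemoteAddr`
  WHERE `h`.`RemoteAddr` IS NULL;

DROP TEMPORARY TABLE `heavy_addr`;
```

Whether this is fast enough depends on how many rows remain below the 
threshold; the top entries in the list above look like crawlers, so dropping 
them may also make the statistics more meaningful.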
Reply Frank 2/15/2008 10:51:43 AM
