f

#### Conversion List -> Multiset

```[Please redirect me to some appropriate group if I am off topic here]

For some practical application I need (something similar to...) some
algorithm that does the following:

Given a

list of size n containing pairs (l_i, 1)
(where possibly l_i == l_j for some i != j)

transform this to a

list containing pairs (u_j, card {i : l_i = u_j} )
(with u_i != u_j for i != j)

[This is essentially counting the duplicated elements]

Now I know that I can simply sort the first list in time O(n log n), and
collect identical elements in O(n) afterwards. So this is O(n log n) total.

I wonder whether there could be some better algorithm as I don't need the
u_i _ordered_ in the end.

Hashing comes to mind, but the l_i are 'large', and for practical reasons
I cannot copy the original list to a complete new structure like a hash
map.

[In case numbers might help: n is about 10^4...10^8, and there are not
more than about 20 repetitions of any element in the original list]

Andre'
``` 0 7/1/2003 10:19:19 AM comp.theory  5139 articles. 1 followers. 2 Replies 367 Views Similar Articles

[PageSpeed] 36

```Andr� P�nitz <poenitz@gmx.net> writes:

> [Please redirect me to some appropriate group if I am off topic here]
>
> For some practical application I need (something similar to...) some
> algorithm that does the following:
>
> Given a
>
>   list of size n containing pairs (l_i, 1)
>   (where possibly l_i == l_j for some i != j)
>
> transform this to a
>
>   list containing pairs (u_j, card {i : l_i = u_j} )
>   (with u_i != u_j for i != j)
>
> [This is essentially counting the duplicated elements]
>
> Now I know that I can simply sort the first list in time O(n log n), and
> collect identical elements in O(n) afterwards. So this is O(n log n) total.
>
> I wonder whether there could be some better algorithm as I don't need the
> u_i _ordered_ in the end.

Look up "semi-sorting".  Semi-sorting is making identical elements
identical elements.  I believe linear-time algorithms for semi-sorting
exist, but the linearity may be in the range of possible numbers
rather than the length of the list.

Torben Mogensen
``` 0 7/1/2003 11:03:32 AM
```Torben �gidius Mogensen <torbenm@diku.dk> wrote:
>> I wonder whether there could be some better algorithm as I don't need the
>> u_i _ordered_ in the end.
>
> Look up "semi-sorting".
> Semi-sorting is making identical elements adjacent, but with no
> requirements about the ordering of groups of identical elements.

This seems to be indeed what I am looking for.

Unfortunately, google is not too chatty on that subject, so I'll have to
dig deeper.

Thanks for the hint!

Andre'

--
Those who desire to give up Freedom in order to gain Security, will not have,
nor do they deserve, either one.     (T. Jefferson or B. Franklin or both...)
``` 0 7/2/2003 7:49:34 AM