
### data mining


```In data mining you are typically dealing with millions of rows of data
and if you are talking about internet browsing data, even 100 million
rows  or more.

Let us say that you have 30 attributes (explanatory variables) in each
row plus a response variable (0 = no response 1 = response).

There are all kinds of analysis one can do on such a data set and I
would like some advice on designing a program to do one of them
("Decision Trees")  with Fortran.

We want to split the original data set into two subsets (after which
the analysis can be repeated on each of the two subsets) by splitting
on any one of the thirty attributes.  If it is a true-false type
attribute then there is only one way to split on that attribute, but in
other cases there would be more choices.  The aim of the split is to
create "pure" subsets so that one subset has more responders (based on
count or the proportion or perhaps other measures) than the other and
the top level attribute to be split on would be the one that makes the
difference as high as possible.  There might be some constraints as to
how big or small each subset can be.

If we are talking about a small number of rows then this is a pretty
elementary problem as far as I can see it.  There are free and
commercial packages that offer to do this - but if one were to do this
from scratch in Fortran, I would appreciate the group's suggestions as
to how this kind of volume of data can be handled.

```
 0
Reply analyst41 (233) 7/1/2007 1:05:03 PM
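
The split search described in the post needs only counts per attribute value, not the rows themselves, so the data volume can be handled in one streaming pass. Below is a minimal sketch in Python rather than Fortran (the table-of-counts design should carry over). It assumes every attribute is categorical with few distinct values, that each row ends with the 0/1 response, and that "difference in response rate" is the chosen purity measure, which is only one of the measures the post mentions. The names `count_table`, `best_split` and `min_size` are hypothetical.

```python
from collections import defaultdict


def count_table(rows, n_attrs):
    """One pass over the data: counts[a][value] = [n_rows, n_responders]."""
    counts = [defaultdict(lambda: [0, 0]) for _ in range(n_attrs)]
    for *attrs, response in rows:
        for a, value in enumerate(attrs):
            cell = counts[a][value]
            cell[0] += 1
            cell[1] += response
    return counts


def best_split(counts, min_size=1):
    """Return (gap, attribute, value) for the best 'equals value' vs 'rest'
    split, or None if no split leaves min_size rows on both sides.
    min_size must be at least 1 so that neither side is empty."""
    best = None
    for a, table in enumerate(counts):
        total_n = sum(n for n, _ in table.values())
        total_r = sum(r for _, r in table.values())
        for value, (n, r) in table.items():
            rest_n, rest_r = total_n - n, total_r - r
            if n < min_size or rest_n < min_size:
                continue
            # Difference in response rate between the two subsets.
            gap = abs(r / n - rest_r / rest_n)
            if best is None or gap > best[0]:
                best = (gap, a, value)
    return best
```

Because `rows` can be any iterable, it could be a generator reading a file record by record; memory use then depends on the number of distinct attribute values, not on the number of rows.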


```<analyst41@hotmail.com> wrote in message
> <snip>

My own opinion is that so-called "data mining" is methodologically unsound,
to say nothing of its legality.  If you are "harvesting" millions of rows of
data, you might as well get your subsets by using the 29 dimensions of
compatibility.  My suggestion for what to do with the data is throw it out
before a court tells you to.
--

```
 0
Reply invalid (121) 7/1/2007 6:16:24 PM

```On Jul 1, 2:16 pm, "Wade Ward" <inva...@invalid.nyet> wrote:
> <analys...@hotmail.com> wrote in message
> <snip>

Let us assume that the data being "mined" can be done so legally.
From what I understand, when credit card companies send those letters
outlining our "privacy rights", our usage of the card after receiving
those letters amounts to giving them our permission to "mine" our
purchase behavior.

If we split on all thirty attributes, the final subsets would most
likely be too small to be statistically stable.  The trick is to find
5-10 attributes that would give you the maximum differentiation among
the final subsets, while making sure that each subset is large enough
to allow a stable "scoring" of any customer/prospect who falls into
that subset.

```
 0
Reply analyst41 (233) 7/1/2007 7:54:50 PM
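
The idea of stopping before the subsets become too small can be sketched as recursive partitioning with two stopping rules. This is a minimal in-memory Python sketch, not a design for 100 million rows: `grow`, `min_size` and `max_depth` are hypothetical names, rows are assumed to be tuples of categorical attributes ending in the 0/1 response, and the depth limit stands in for "use only a handful of attributes" along any path.

```python
def response_rate(rows):
    """Fraction of rows whose last field (the response) is 1."""
    return sum(r[-1] for r in rows) / len(rows)


def grow(rows, min_size, max_depth, depth=0):
    """Recursively split rows into leaves of at least min_size rows.
    min_size must be at least 1 so that no subset is empty."""
    best = None
    if depth < max_depth:
        n_attrs = len(rows[0]) - 1
        for a in range(n_attrs):
            for value in {r[a] for r in rows}:
                left = [r for r in rows if r[a] == value]
                right = [r for r in rows if r[a] != value]
                if len(left) < min_size or len(right) < min_size:
                    continue  # one side would be too small to score stably
                gap = abs(response_rate(left) - response_rate(right))
                if best is None or gap > best[0]:
                    best = (gap, a, value, left, right)
    if best is None or best[0] == 0:
        # Leaf: no allowed split separates responders any further.
        return {"n": len(rows), "rate": response_rate(rows)}
    _, a, value, left, right = best
    return {"attr": a, "value": value,
            "eq": grow(left, min_size, max_depth, depth + 1),
            "ne": grow(right, min_size, max_depth, depth + 1)}
```

Each leaf carries its row count and response rate, which is the "score" a customer or prospect falling into that subset would receive.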

```analyst41@hotmail.com wrote:
> <snip>

From what little I know, this is essentially the k-means clustering problem.
I don't know of Fortran source, but a related search some time ago
led me to the C/C++ open-source clustering library.  Whether it will be
of any benefit to your problem I don't know...

--

http://bonsai.ims.u-tokyo.ac.jp/%7Emdehoon/software/cluster/software.htm#source

```
 0
Reply none1568 (7455) 7/1/2007 9:16:16 PM

```On Jul 1, 4:16 pm, dpb <n...@non.net> wrote:

<snip>

> What little I know is it is essentially the k-means clustering problem.
>   I don't know of Fortran source, but a related search some time ago had
> led me to the C/C++ open-source clustering library.

There is plenty of Fortran code for clustering, for example kmeans
clustering code at http://people.scs.fsu.edu/~burkardt/f_src/kmeans/kmeans.html

```
 0
Reply beliavsky (2211) 7/1/2007 9:43:57 PM

```What you are after is cluster analysis and segmentation analysis
methods.

I'm interested in such problems since I'm CEO of Tau Systems, which has
supplied market research data analysis software since 1972.
And yes, all we sell is written in Fortran F77.

It is very easy to arrive at statistically incorrect conclusions,
using accurate tools the wrong way. To continue in your project you
MUST use a professional, experienced statistician as advisor.

By all means write programs to manage and massage your data, and to
sort and select data into specified subsets. But don't try writing
the statistical analysis part.
Buy a commercial dedicated segmentation analysis package, or use SAS
or similar vast systems.

Lastly I would point out that ASCII-based systems are readable, but
binary systems are inherently 128 times faster in parallel processing
and occupy one eighth of the storage medium space and access time. Also
binary systems can have fixed format fields, whereas ASCII systems
with multiple-response replies taking up variable lengths need a data
map as part of the file.

```
 0
Reply tbwright (1098) 7/1/2007 10:53:13 PM
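
The storage point can be illustrated with a small sketch of a fixed-length binary record. It is written in Python and assumes, purely for illustration, that all 30 attributes are true/false (the original post does not say so), so that a whole row plus its response fits in one 32-bit word. `pack_row` and `unpack_row` are hypothetical names, and the sketch says nothing about the speed figure quoted above.

```python
import struct

# One little-endian unsigned 32-bit word per row (fixed-length record).
RECORD = struct.Struct("<I")


def pack_row(flags):
    """Pack up to 32 values of 0 or 1 (30 attributes + response) into 4 bytes."""
    word = 0
    for bit, flag in enumerate(flags):
        word |= (flag & 1) << bit
    return RECORD.pack(word)


def unpack_row(record, n_flags):
    """Recover the first n_flags 0/1 values from a 4-byte record."""
    (word,) = RECORD.unpack(record)
    return [(word >> bit) & 1 for bit in range(n_flags)]


flags = [1, 0, 1] + [0] * 27 + [1]           # 30 attributes + response
packed = pack_row(flags)                      # binary record
text = ("".join(map(str, flags)) + "\n").encode("ascii")  # ASCII record
```

Under these assumptions a row takes 4 bytes packed, against 32 bytes when written as one ASCII character per flag plus a newline, which matches the "one eighth" storage figure. Fixed-length records also allow row k to be located at byte offset 4*k without a data map.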

```On Jul 1, 5:53 pm, Terence <tbwri...@cantv.net> wrote:

<snip>

> It is very easy to arrive at statistically incorrect conclusions,
> using accurate tools the wrong way. To continue in your project you
> MUST use a professional, experienced statistician as advisor.
>
> By all means write programs to manage and massage your data, and to
> sort and  select data into specified subsets. But don't try writing
> the statistical analysis part.
> Buy a commercial dedicated segmentation analysis package, or use SAS
> or similar vast systems.

R, which is open source (C and Fortran), may be of comparable quality
to commercial statistical software.

```
 0
Reply beliavsky (2211) 7/2/2007 12:43:42 AM

```On Jul 1, 8:43 pm, Beliavsky <beliav...@aol.com> wrote:
> <snip>

Thanks to all for the replies.  I'll check out the clustering Fortran
code and see if it can be adapted to what I want to do.

```
 0
Reply analyst41 (233) 7/2/2007 10:49:00 AM

```<analyst41@hotmail.com> wrote in message
> <snip>
>
> Thanks to all for the replies.  I'll check out the clustering Fortran
> code and see if it can be adapted to what I want to do.
And how is clustering going to restore order in these subsets of strip-mined
data, take for instance {'U','K','C','F'}?  Your output will be exactly what
you tell it to be.
--
ww

```
 0
Reply invalid (121) 7/2/2007 8:58:27 PM

```> And how is clustering going to restore order in these subsets of strip-mined
> data, take for instance {'U','K','C','F'}?  Your output will be exactly what
> you tell it to be.
> --

The point is, cluster analysis in n-dimensional space will locate the

Segmentation analysis allows the identification and marking (with a
code) of members of those groups for selecting out the members, or by
reaching ranking score levels of partial sums of given scoring
coefficients.

Most commercial software (for surveys) will handle up to about 65k
cases (6000 is rare; the whole USA is covered by 1024 households by
A.J. Nielson; the largest I ever handled was a survey of US Veterans
with several hundreds of thousands).

Our data mining poster has no idea of what he is getting into with
100M cases.
But I DO have a sorting program for that.....

( And just what is  {'U','K','C','F'}?? )

```
 0
Reply tbwright (1098) 7/3/2007 5:08:33 AM

```"Terence" <tbwright@cantv.net> wrote in message
>> And how is clustering going to restore order in these subsets of
>> strip-mined
>> data, take for instance {'U','K','C','F'}?  Your output will be exactly
>> what
>> you tell it to be.
>
> The point is, cluster analysis in n-dimensional space will locate the
>
> Segmentation analysis allows the identification and marking (with a
> code) of members of those groups for selecting out the members, or by
> reaching ranking score levels of partial sums of given scoring
> coefficients.
This sounds like statistics with a judge, which is not methodologically
flawed.

> Most commercial software (for surveys) will handle up to about 65k
> cases (6000 is rare; the whole USA is covered by 1024 households by
> A.J. Nielson; the largest I ever handled was a survey of US Veterans
> with several hundreds of thousands).
>
> Our data mining poster has no idea of what he is getting into with
> 100M cases.
> But I DO have a sorting program for that.....
>
> ( And just what is  {'U','K','C','F'}?? )
(You asked).  It's a clusterfuck, or any other of 4! outcomes.
--
WW

```
 0
Reply invalid (121) 7/3/2007 5:07:06 PM

```<analyst41@hotmail.com> wrote in message
> <snip>
>>
>> My own opinion is that so-called "data mining" is methodologically unsound,
>> to say nothing of its legality.  If you are "harvesting" millions of rows of
>> data, you might as well get your subsets by using the 29 dimensions of
>> compatibility.  My suggestion for what to do with the data is throw it out
>> before a court tells you to.
I just sorted my socks using the 29 dimensions and got all matches.  Does
this make me self-compatible?

> Let us assume that the data being "mined" can be done so legally.
> From what I understand, when credit card companies send those letters
> outlining our "privacy rights", our usage of the card after receiving
> those letters amounts to giving them our permission to "mine" our
> purchase behavior.
Let's not.
--
mtp

```
 0
Reply invalid163 (957) 7/4/2007 12:08:23 AM

```> > ( And just what is  {'U','K','C','F'}?? )
>
> (You asked).  It's a clusterfuck, or any other of 4! outcomes.

Ah, No! It's a sorting problem, or even a permutation problem, but not
clustering!

```
 0
Reply tbwright (1098) 7/4/2007 2:49:42 AM

```> > Let us assume that the data being "mined" can be done so legally.
> > From what I understand, when credit card companies send those letters
> > outlining our "privacy rights", our usage of the card after receiving
> > those letters amounts to giving them our permission to "mine" our
> > purchase behavior.

If you read the really small print, you'll find they "might only pass
it on to associated companies".
And guess what companies they are associated with?

```
 0
Reply tbwright (1098) 7/4/2007 2:51:45 AM

```Terence <tbwright@cantv.net> wrote:

> > > Let us assume that the data being "mined" can be done so legally.
> > > From what I understand, when credit card companies send those letters
> > > outlining our "privacy rights", our usage of the card after receiving
> > > those letters amounts to giving them our permission to "mine" our
> > > purchase behavior.
>
> If you read the really small print, you'll find they "might only pass
> it on to associated companies".
> And guess what companies they are associated with?

You'll also find that they might do other things that they are legally
allowed to. It is often worded in such a way that you might think it is
things that they are legally required to do, but it invariably says
"allowed", not "required". They say that they can do anything with the
data that they are allowed to do, and they are allowed to do pretty much
anything with it as long as they cover it in the letter. Translation:
they reserve the right to do whatever they feel like.

My "language lawyer" skills help me in reading the fine print of
contracts and laws as well.

I'm afraid I'm wandering too far off topic, so I'll stop. Couldn't resist
the one post. (Besides, from the above short note I get the impression
it is one area where Terence and I have at least similar perspectives.
:-))

--
Richard Maine                    | Good judgement comes from experience;
email: last name at domain . net | experience comes from bad judgement.
domain: summertriangle           |  -- Mark Twain
```
 0
Reply nospam47 (9744) 7/4/2007 3:21:58 AM

``` ...one area where Terence and I have at least similar perspectives.
Richard Maine

(Perspective) == (same object seen from different viewpoints)?

>> obj <<

But remove the object and head-on crash?   :o)>

```
 0
Reply tbwright (1098) 7/4/2007 5:02:42 AM

```"Terence" <tbwright@cantv.net> wrote in message
> ...one area where Terence and I have at least similar perspectives.
> Richard Maine
>
> (Perspective) == (same object seen from different viewpoints)?
>
>>> obj <<
>
> But remove the object and head-on crash?   :o)>
But that's part of the good news with geometries that are post-Gaussian.
The object defines the point of view, so almost anything works for frames of
reference to jibe.
--
WW

```
 0
Reply invalid (121) 7/4/2007 8:12:38 AM

```On Jul 3, 11:21 pm, nos...@see.signature (Richard Maine) wrote:
> <snip>

Yup - the regulars here will be ready with a million posts on hoary
trivialities like " Are 'go to' statements good or bad ?" - but pose
a  practical problem such as repeatedly partitioning a large data set
- and one mostly gets infantile responses.


```
 0
Reply analyst41 (233) 7/4/2007 5:00:18 PM

```analyst41@hotmail.com wrote:

> <snip>
>
> Yup - the regulars here will be ready with a million posts on hoary
> trivialities like " Are 'go to' statements good or bad ?" - but pose
> a  practical problem such as repeatedly partitioning a large data set
> - and one mostly gets infantile responses.

I consider the tone of responses mostly well reasoned and civil.  I do
agree that, at times, there seems to be an unrealistic over-emphasis on
the goal of total portability.


--

Gary Scott
mailto:garylscott@sbcglobal dot net

Fortran Library:  http://www.fortranlib.com

Support the Original G95 Project:  http://www.g95.org
-OR-
Support the GNU GFortran Project:  http://gcc.gnu.org/fortran/index.html

If you want to do the impossible, don't hire an expert because he knows
it can't be done.

-- Henry Ford
```
 0
Reply garylscott (1357) 7/4/2007 5:27:29 PM

```analyst41@hotmail.com wrote:
....

> Yup - the regulars here will be ready with a million posts on hoary
> trivialities like " Are 'go to' statements good or bad ?" - but pose
> a  practical problem such as repeatedly partitioning a large data set
> - and one mostly gets infantile responses.

Hey, it's still usenet... :)

I know there are folks doing humongous data mining of the sort you're
discussing, but beyond the rudiments of theory I don't have enough
direct application experience in the area to add anything on
algorithms specifically for such large datasets.  I do recall that back when
I was still reading JASA and Technometrics and such regularly I came
across some of that kind of thing, but I've not even unpacked them since
the move back to the farm, so there's no practical way to look for what I'm
thinking of.  It wasn't in my direct area of interest, so it wouldn't
have made it out of the original journals into the clippings files...

I guess I'd start w/ some literature searches though, if posed w/ the task.

--
```
 0
Reply none1568 (7455) 7/4/2007 5:42:23 PM

Similar Articles

12/11/2013 8:37:12 AM
page loaded in 248170 ms. (1)

Similar Artilces:

Career in Data Mining
Hi all, I'm a Bachelor in Computer Engineering, and going to study Masters (major in Knowledge-Based systems). I'm quite fascinated by the concept of data-mining and knowledge-based systems, and so I'd like to pursue my career in this field. However, I'm not too sure about the opportunities available in the field. Apart from research, what else is (commonly) available? I'd be most interested in developing knowledge-based software (e.g. using neural networks), but I'd still be very interested in any of the computational side of things in this field. Another question I ...

Data mining using qualitative modelling
Hallo Can anybody suggest how to do data mining using qualitative modelling ? Regards Rob In article <1088940266.703578@nntp>, Ynot Ant <hartlebr@undergrad.ee.wits.ac.za> writes >Hallo > >Can anybody suggest how to do data mining using qualitative modelling ? > >Regards > >Rob > > Definition of Data Mining “A new discipline lying at the interface of statistics, data base technology, pattern recognition, and machine learning, and concerned with secondary analysis of large data bases in order to find previously unsuspected relationships, which ar...

Entry level Data mining jobs?
Hi , Can someone please tell me where can i find job postings n data mining. I already checked kdnuggets, they have senior level openings posted not entry level. Thanks Frist, see : http://www.kdnuggets.com/jobs/index.html Second, go see directly the business you're looking for.. About 50% of good job are never publish... So take the lead... Relating to Kdnuggets poll , Banking, CRM (Most big business), Credit Scoring and Direct Marketing/ Fundraising are the leading business who apply Datamining. Hope that helps. Bourne_Jason wrote: > Hi , > Can someone please tell me where...

Re: What exactly is DATA Mining? #24
> -----Original Message----- > From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On > Behalf Of Peter Flom > Sent: Friday, April 04, 2008 2:50 PM > To: SAS-L@LISTSERV.UGA.EDU > Subject: Re: What exactly is DATA Mining? > > Sigurd Hermansen <HERMANS1@WESTAT.COM> wrote > > >A prior thread touched on your sample vs. population concern. Many > >statisticians consider almost all collections of data a > sample from a > >theoretically larger population. They will even reach out to an all > >possible worlds view if pressed on that issue...

Re: What exactly is DATA Mining? #19
I've done "Data Warehousing", and the term "Data Shaping" seems to me to = be very similar to data warehousing; which is to reformat the data, and = combine data from multiple sources, and put it into the form that will = most likely to be used by the users. What bothers me is that there has come to be a field known as "Data = Warehousing/Data Mining". In my mind, a good database/computer person = could certainly do data warehousing, but I'm more sceptical about = someone with no statistical background being able to do data mining. I don't have a de...

Data Mining Search Engine: About 1300 site
Dear all, I invite you to contribute to Data Mining Search Engine http://www.google.com/coop/cse?cx=008996274310193057962:1ggn7mmwz9i Data Mining Search Engine: Search papers, conferences, blogs, scientists home pages related to data mining Best Regards, -- Motaz K. Saad ...

"Data Mining" search engine
Hello, If you are interested in data mining (or a related field such as machine learning) feel free to use the Data Mining Search Engine: http://tinyurl.com/35c9jn Kind regards. -- Sandro Saitta http://www.dataminingblog.com http://www.website-ranking-search-engines.com ...

data mining using qualitative modelling #2
Hallo Can anybody suggest how to do data mining using qualitative modelling ? Regards Rob >Can anybody suggest how to do data mining using qualitative modelling ? > >Regards > >Rob > > > > > > > > Quantify the qualitative. That goes against the grain for many 'qualitative' researchers, but it usually produces richer, deeper insights. Laurie Ynot Ant wrote: > Hallo > > Can anybody suggest how to do data mining using qualitative modelling ? > > Regards > > Rob > > For example, have a look on fuzzy logic based...

Using GAs for data mining and anomaly detection
Greetings, I am looking for papers on using Genetic Algorithms or other evolutionary/nature inspired methods (such as GP or swarm methods) for anomaly detection in data. Thank you in advance for any help, Best, Ariel ...

New data mining technique: hidden decision trees
Hidden Decision Trees is a statistical and data mining methodology (just like logistic regression, SVM, neural networks or decision trees) to handle problems with large amounts of data, non-linearities and strongly correlated dependent variables. The technique is easy to implement in any programming language. It is more robust than decision trees or logistic regression. Implementations typically rely heavily on large, granular hash tables. No decision tree is actually built (thus the name hidden decision trees), but the final output of an hidden decision tree procedure consists of a few hund...

CFP: KDD Cup 2008 and the Workshop on Mining Medical Data
Call for Participation: KDD Cup 2008 and the Workshop on Mining Medical Data KDD Cup is the first and the oldest data mining competition, and is an integral part of the annual ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). Based on data provided by Siemens Medical Solutions USA, this year's KDD Cup competition focuses on the early detection of breast cancer from X-ray images of the breast. We are looking forward to an interesting competition and your participation. We particularly encourage the participation of students. There are 2 different parallel o...

Conference on Systems Analysis, Data Mining and Optimization in Biomedicine: Last Call for Participation
Conference on Systems Analysis, Data Mining and Optimization in Biomedicine - Last Call for Participation February 2-4, 2005 J. Wayne Reitz Union University of Florida Gainesville, FL http://www.ise.ufl.edu/cao/biomedicine2005 In recent years, experimental methods in biomedicine have resulted in massive amounts of data. The urgent need for efficient methods of processing and understanding this data has resulted in the rapid development of a new exciting research direction - applying interdisciplinary approaches incorporating systems analysis, data mining and optimization techniques to the ...

Fourth Summerschool on Advanced Statistics and Data Mining (Madrid, July 6th-17th, 2009)
Dear colleagues, the Polytechnical Univ. of Madrid organizes a summerschool on "Advanced Statistics and Data Mining" in Madrid between July 6th and July 17th. The summerschool comprises 18 courses divided in 2 weeks. Attendees may register in each course independently. Registration will be considered upon strict arrival order. For more information, please, visit http://www.dia.fi.upm.es/index.php?page=presentation&hl=es_ES or http://biocomp.cnb.csic.es/~coss/Docencia/ADAM/ADAM.htm. Best regards, Carlos Oscar *List of courses and brief description* (full descr...

Call for Papers Reminder: The 2012 International Conference of Data Mining and Knowledge Engineering (ICDMKE 2012)
Call for Papers Reminder: The 2012 International Conference of Data Mining and Knowledge Engineering (ICDMKE 2012) CFP Reminder: The 2012 International Conference of Data Mining and Knowledge Engineering (ICDMKE 2012) From: International Association of Engineers (IAENG) Draft Paper Submission Deadline: 6 March, 2012 Camera-Ready papers & Registration Deadline: 31 March, 2012 ICDMKE 2012: London, U.K., 4-6 July, 2012 http://www.iaeng.org/WCE2012/ICDMKE2012.html The conference ICDMKE'12 is held under the World Congress on Engineering 2012. The WCE 2012 is organized by International Ass...

Call for Papers Reminder (extended): IAENG International Conference on Data Mining and Applications (ICDMA 2009)
CFP Extended: IAENG International Conference on Data Mining and Applications ICDMA 2009 From: International Association of Engineers The 2009 IAENG International Conference on Data Mining and Applications 18-20 March, 2009, Hong Kong http://www.iaeng.org/IMECS2009/ICDMA2009.html All submitted papers will be under peer review and accepted papers will be published in the conference proceedings (ISBN: 978-988-17012-2-0). Revised and expanded versions of the selected papers may be included as book chapters in the standalone edited books under the framework of cooperation between Springer, America...

CFP with extended deadline of Mar. 24, 2011: The 2011 International Conference on Data Mining (DMIN'11), USA, July 18-21, 2011
CALL FOR PAPERS and Call For Workshop/Session Proposals DMIN'11 The 2011 International Conference on Data Mining Date and Location: July 18-21, 2011 Monte Carlo Resort & Casino, Las Vegas, Nevada, USA http://www.dmin--2011.com http://www.world-academy-of-science.org ****************************************************************** *** EXTENDED DEADLINE for Submission of Papers: March 24, 2011 *** ************************...

New type of data mining: run URLs in every combination, lost information in every infinite order of numbers and letters
New type of data mining: run URLs in every combination, lost information in every infinite order of numbers, letters, - and _, .org .net .com .tv .mil .gov, both on https:// and http://. Every webpage doesn't have a link to another. If you could do this in the past and the future, because the human brain only remembers so much % of their lifetime. But this information run with A.I. on an internet crypto worm on all the web pages, hard drives and video games, you can find web pages and e-mails that the search engines can pick up and don't. and know every e-mail on the planet a build a ne...

MINE-MINE! ALL MINE!
I just bought the rights to Loadstar, Commodore Mailink, Ahoy, RUN, Compute! and the Commodore brand name! OCTOBER FOOLS!!! ...