Would machine learning help here?

Hi, I have a problem for which I'm not sure some sort of
machine learning would be appropriate.

Suppose I have a dataset consisting of thousands of tuples
and a score for each tuple which was determined through an
unknown process. Now, for a new input tuple, I want
to get a new predicted score. I have already sifted through
the data and have determined there is no clear mapping
between the tuple and the score; two tuples with the same
data may have differing scores.

For example, the tuple could consist of a person's age,
address, gender, favourite book. There has already been
a score associated for each tuple through some means.
Now I want to input myself and predict my score (rather
than running it through the scoring process, which is
not what I want).

What could help me here? I was looking at a neural
network implementation. I have never taken AI or
machine learning classes, so some pointers to websites
or books would be appreciated.

Reply digital_puer (190) 8/16/2007 5:08:20 PM

On Aug 16, 12:08 pm, Digital Puer <digital_p...@hotmail.com> wrote:
> Hi, I have a problem for which I'm not sure some sort of
> machine learning would be appropriate.
>
> Suppose I have a dataset consisting of thousands of tuples
> and a score for each tuple which was determined through an
> unknown process. Now, for a new input tuple, I want
> to get a new predicted score. I have already sifted through
> the data and have determined there is no clear mapping
> between the tuple and the score; two tuples with the same
> data may have differing scores.

So the score isn't based on the tuple. There's no learning
process that can help you. As far as the tuples are concerned,
the scores are inconsistent.

>
> For example, the tuple could consist of a person's age,
> address, gender, favourite book. There has already been
> a score associated for each tuple through some means.
> Now I want to input myself and predict my score (rather
> than running it through the scoring process, which is
> not what I want).
>
> What could help me here? I was looking at a neural
> network implementation. I have never taken AI or
> machine learning classes, so some pointers to websites
> or books would be appreciated.


Reply mensanator 8/16/2007 5:27:45 PM


On Aug 16, 10:27 am, "mensana...@aol.com" <mensana...@aol.com> wrote:
> On Aug 16, 12:08 pm, Digital Puer <digital_p...@hotmail.com> wrote:
>
> > Hi, I have a problem for which I'm not sure some sort of
> > machine learning would be appropriate.
>
> > Suppose I have a dataset consisting of thousands of tuples
> > and a score for each tuple which was determined through an
> > unknown process. Now, for a new input tuple, I want
> > to get a new predicted score. I have already sifted through
> > the data and have determined there is no clear mapping
> > between the tuple and the score; two tuples with the same
> > data may have differing scores.
>
> So the score isn't based on the tuple. There's no learning
> process that can help you. As far as the tuples are concerned,
> the scores are inconsistent.
>

I should have said that these situations are rather rare.
They are outliers.

Generally speaking, there is some sort of pattern that maps
the tuples to the scores, but I cannot easily see what it is.
It is certainly not a linear relationship between any one or
two of the tuple data points and the scores.




>
> > For example, the tuple could consist of a person's age,
> > address, gender, favourite book. There has already been
> > a score associated for each tuple through some means.
> > Now I want to input myself and predict my score (rather
> > than running it through the scoring process, which is
> > not what I want).
>
> > What could help me here? I was looking at a neural
> > network implementation. I have never taken AI or
> > machine learning classes, so some pointers to websites
> > or books would be appreciated.


Reply Digital 8/16/2007 6:04:54 PM

Digital Puer wrote:
> Suppose I have a dataset consisting of thousands of tuples
> and a score for each tuple which was determined through an
> unknown process. Now, for a new input tuple, I want
> to get a new predicted score. I have already sifted through
> the data and have determined there is no clear mapping
> between the tuple and the score; two tuples with the same
> data may have differing scores.
> What could help me here? I was looking at a neural
> network implementation. I have never taken AI or
> machine learning classes, so some pointers to websites
> or books would be appreciated.

This looks like a possible application to radial basis function 
networks, assuming that your target function somehow depends smoothly on 
the input data.
You can start reading here:

http://en.wikipedia.org/wiki/Radial_basis_function

	Christian
Reply Christian 8/17/2007 1:32:00 PM

On Aug 17, 2:32 pm, Christian Gollwitzer <Christian.Gollwit...@uni-bayreuth.de> wrote:
> Digital Puer wrote:
> > Suppose I have a dataset consisting of thousands of tuples
> > and a score for each tuple which was determined through an
> > unknown process. Now, for a new input tuple, I want
> > to get a new predicted score. I have already sifted through
> > the data and have determined there is no clear mapping
> > between the tuple and the score; two tuples with the same
> > data may have differing scores.
> > What could help me here? I was looking at a neural
> > network implementation. I have never taken AI or
> > machine learning classes, so some pointers to websites
> > or books would be appreciated.
>
> This looks like a possible application to radial basis function
> networks, assuming that your target function somehow depends smoothly on
> the input data.
> You can start reading here:
>
> http://en.wikipedia.org/wiki/Radial_basis_function
>

A good suggestion.

For a quick (implementation) solution the o.p. might even try nearest
neighbour. Let's say the tuples and scores (values) are doubles.

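// Nearest neighbour in Java: return the score of the closest training tuple.
// (Here 'vector' is simply double[]; squared Euclidean distance is assumed.)
static double nn(double[] tup, double[][] trainTup, double[] trainVal) {
  int indexNN = 0;
  double best = Double.POSITIVE_INFINITY;
  for (int i = 0; i < trainTup.length; i++) {
    double d = 0.0;  // squared distance from tup to training tuple i
    for (int j = 0; j < tup.length; j++) {
      d += (tup[j] - trainTup[i][j]) * (tup[j] - trainTup[i][j]);
    }
    if (d < best) { best = d; indexNN = i; }
  }
  return trainVal[indexNN];
}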

'nearest' might use a Euclidean metric, but maybe not -- depends on
the data; and if not, the problem may get more difficult. Radial basis
functions may assume something like a Euclidean metric. The nice thing
about NN and RBF is that whilst they qualify as 'machine learning',
the 'learning' consists mainly of memorising.
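
To make the 'memorising' point concrete, here is a rough Python sketch of an RBF-style predictor. It is really a Gaussian kernel-weighted average of the stored scores (closer to a GRNN than to a trained RBF network), and it assumes numeric tuples and a Euclidean metric. The names rbf_predict and width, and the sample data, are made up for illustration.

```python
import math

def rbf_predict(tup, train_tups, train_vals, width=1.0):
    """Kernel-weighted average of the stored training scores."""
    weights = []
    for t in train_tups:
        # Squared Euclidean distance between the query and a stored tuple.
        d2 = sum((a - b) ** 2 for a, b in zip(tup, t))
        # Gaussian radial basis function: nearby tuples get larger weights.
        weights.append(math.exp(-d2 / (2.0 * width ** 2)))
    total = sum(weights)
    if total == 0.0:
        # Every weight underflowed (query far from all stored tuples):
        # fall back to the plain mean of the training scores.
        return sum(train_vals) / len(train_vals)
    return sum(w * v for w, v in zip(weights, train_vals)) / total

# Hypothetical two-component tuples and scores, purely for illustration.
train_tups = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
train_vals = [1.0, 2.0, 3.0, 10.0]
print(rbf_predict((0.1, 0.1), train_tups, train_vals, width=0.5))
```

A small width behaves much like nearest neighbour; a large width smooths towards the overall mean, so width would have to be tuned on the data.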

You might want to do something about the conflicting data; for example
averaging conflicting scores at a specific tuple value; but maybe that
means that the tuples contain integer components?
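
A minimal Python sketch of that averaging step, assuming 'conflicting' means exactly equal tuples (which is realistic for integer or categorical components, much less so for raw floating-point values). The function name average_conflicts is hypothetical.

```python
from collections import defaultdict

def average_conflicts(tups, vals):
    """Collapse identical tuples into one entry holding their mean score."""
    groups = defaultdict(list)
    for t, v in zip(tups, vals):
        groups[tuple(t)].append(v)  # tuples are hashable, so they can be dict keys
    merged_tups = list(groups)      # first-seen order is preserved
    merged_vals = [sum(vs) / len(vs) for vs in groups.values()]
    return merged_tups, merged_vals

# Two identical tuples with different scores are merged into one.
print(average_conflicts([(30, 1), (30, 1), (45, 0)], [1.0, 3.0, 5.0]))
```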

Also, you might need to 'standardise' the data -- so that each
component of the tuple contributes the same to a distance.
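
One common way to do this is z-scoring: subtract each component's mean and divide by its standard deviation, both measured on the training tuples only. A small Python sketch, assuming numeric components; the function names are made up for illustration.

```python
import math

def fit_standardiser(tups):
    """Return per-component means and standard deviations of the training tuples."""
    n = len(tups)
    dims = len(tups[0])
    means = [sum(t[j] for t in tups) / n for j in range(dims)]
    stds = []
    for j in range(dims):
        var = sum((t[j] - means[j]) ** 2 for t in tups) / n
        # A constant component has zero spread; use 1.0 to avoid dividing by zero.
        stds.append(math.sqrt(var) or 1.0)
    return means, stds

def standardise(tup, means, stds):
    """Rescale one tuple so each component has comparable influence on a distance."""
    return tuple((x - m) / s for x, m, s in zip(tup, means, stds))

# Hypothetical (age, income) tuples: income would otherwise swamp age in a distance.
train = [(20.0, 100000.0), (40.0, 300000.0)]
means, stds = fit_standardiser(train)
print(standardise((20.0, 100000.0), means, stds))
```

The same means and stds would then be applied to any new tuple before computing its distance to the training tuples.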

The first two books by Masters would be a good starting point, plus
the Bishop book. There's a rather nice new book by Bishop, see
http://research.microsoft.com/%7Ecmbishop/PRML/.

The following also might help

http://research.microsoft.com/~minka/statlearn/glossary/

and the newsgroup comp.ai.neural-nets. I suppose I should set follow-ups
to there but I'm never certain of the netiquette of that.

Best regards,

Jon C.

@Book{masters-pnnr93,
  author =       "T. Masters",
  title =        "Practical Neural Network Recipes in C++",
  publisher =    "Academic Press",
  address =      "London",
  year =         "1993"
} (the original, maybe basic enough to ignore, but maybe there are
tantalising references to it from [masters-aann95])

@Book{masters-aann95,
  author =       "T. Masters",
  title =        "Advanced Algorithms for Neural Networks: a C++ sourcebook",
  publisher =    "John Wiley \& Sons",
  address =      "New York",
  year =         "1995"
} (covers PNN and GRNN (roughly RBF))


@Book{masters-sipnn,
  author =       "T. Masters",
  title =        "Signal and Image Processing with Neural Networks: C++ sourcebook",
  publisher =    "John Wiley \& Sons",
  address =      "New York",
  year =         "1995"
} (a lot on wavelets, Fourier transforms, texture, shape; note I have
not used it)

@Book{masters-nnh95,
  author =       "T. Masters",
  title =        "Neural, Novel \& Hybrid Algorithms for Time Series Prediction",
  publisher =    "John Wiley \& Sons",
  address =      "New York",
  year =         "1995"
}

@Book{bishop95,
  author =       "C.M. Bishop",
  title =        "Neural Networks for Pattern Recognition",
  publisher =    "Oxford University Press",
  address =      "Oxford, U.K.",
  year =         "1995"}


Reply jg 8/17/2007 5:17:10 PM

On Aug 16, 1:08 pm, Digital Puer <digital_p...@hotmail.com> wrote:
> Hi, I have a problem for which I'm not sure some sort of
> machine learning would be appropriate.
>
> Suppose I have a dataset consisting of thousands of tuples
> and a score for each tuple which was determined through an
> unknown process. Now, for a new input tuple, I want
> to get a new predicted score. I have already sifted through
> the data and have determined there is no clear mapping
> between the tuple and the score; two tuples with the same
> data may have differing scores.
>
> For example, the tuple could consist of a person's age,
> address, gender, favourite book. There has already been
> a score associated for each tuple through some means.
> Now I want to input myself and predict my score (rather
> than running it through the scoring process, which is
> not what I want).
>
> What could help me here? I was looking at a neural
> network implementation. I have never taken AI or
> machine learning classes, so some pointers to websites
> or books would be appreciated.

Any of a number of modeling processes might approximate this mapping,
such as neural networks, logistic regression, k-nearest neighbors,
etc.  Which one works best for your particular problem would need to
be determined experimentally.  Obviously, the existence of
observations which are inconsistent within the available input
variables means that a perfect approximation of the original mapping
is impossible, but that does not preclude a useful approximation.
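
As a rough illustration of what "determined experimentally" can look like, the Python sketch below holds out part of the data and compares two candidate models by mean absolute error. The data are synthetic, and the function names and the 25% split are assumptions made for the example.

```python
import random

def nearest_neighbour(train_tups, train_vals):
    """Model that returns the score of the closest stored tuple (squared Euclidean)."""
    def predict(tup):
        best = min(range(len(train_tups)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(tup, train_tups[i])))
        return train_vals[best]
    return predict

def mean_baseline(train_tups, train_vals):
    """Model that ignores the tuple and always predicts the mean training score."""
    mean = sum(train_vals) / len(train_vals)
    return lambda tup: mean

def holdout_mae(make_model, tups, vals, test_fraction=0.25, seed=0):
    """Fit on a random subset, report mean absolute error on the held-out rest."""
    idx = list(range(len(tups)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(idx) * test_fraction))
    test, train = idx[:n_test], idx[n_test:]
    model = make_model([tups[i] for i in train], [vals[i] for i in train])
    return sum(abs(model(tups[i]) - vals[i]) for i in test) / n_test

# Synthetic stand-in for the real data: a nonlinear score plus some noise.
rng = random.Random(1)
tups = [(rng.uniform(-3, 3),) for _ in range(400)]
vals = [t[0] ** 2 + rng.gauss(0, 0.1) for t in tups]
for name, maker in [("mean baseline", mean_baseline), ("1-NN", nearest_neighbour)]:
    print(name, holdout_mae(maker, tups, vals))
```

A model is only worth keeping if it beats the trivial baseline on data it has not seen.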

Empirical modeling has been developed under a number of labels, so you
may find what you're looking for under "data mining", "supervised
learning", "inferential statistics", "machine learning", "pattern
recognition", etc.

More help could be provided if you could explain more specifically.
Some items of interest would be:

-How long are the tuples?
-What data types (numeric, categorical, etc.) comprise the tuples?
-How many historical observations do you have?
-What modeling tools (a statistical package, writing your own in Java,
etc.) do you have at your disposal?


-Will Dwinnell
http://matlabdatamining.blogspot.com/

Reply Predictor 8/18/2007 11:21:49 AM

On Aug 16, 1:27 pm, "mensana...@aol.com" <mensana...@aol.com> wrote:
> On Aug 16, 12:08 pm, Digital Puer <digital_p...@hotmail.com> wrote:
>
> > Hi, I have a problem for which I'm not sure some sort of
> > machine learning would be appropriate.
>
> > Suppose I have a dataset consisting of thousands of tuples
> > and a score for each tuple which was determined through an
> > unknown process. Now, for a new input tuple, I want
> > to get a new predicted score. I have already sifted through
> > the data and have determined there is no clear mapping
> > between the tuple and the score; two tuples with the same
> > data may have differing scores.
>
> So the score isn't based on the tuple. There's no learning
> process that can help you. As far as the tuples are concerned,
> the scores are inconsistent.

While it's true that inconsistent data prohibits a perfect
approximation, that is rarely the goal in predictive modeling.  In
some situations, only a small improvement over chance ("guessing") is
quite profitable.  Being correct only 60% of the time in predicting
red/black in roulette, for example, would be very useful.  Naturally,
the larger the proportion of inconsistent observations and the more
dispersed the outcomes are of those inconsistent observations, the
worse the upper limit on possible performance.
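
For a numeric score judged by squared error, that limit can be estimated directly from the data: among identical tuples the best possible prediction is their mean score, so the remaining spread is error that no function of the tuple alone can remove. A small Python sketch, assuming exact-match duplicates; the name mse_floor is hypothetical.

```python
from collections import defaultdict

def mse_floor(tups, vals):
    """Lowest mean squared error reachable on this data by any function of the tuple."""
    groups = defaultdict(list)
    for t, v in zip(tups, vals):
        groups[tuple(t)].append(v)
    sse = 0.0
    for vs in groups.values():
        mean = sum(vs) / len(vs)
        # Spread of the scores around the best single prediction for this tuple.
        sse += sum((v - mean) ** 2 for v in vs)
    return sse / len(vals)

# Two identical tuples disagree (scores 0 and 2); the third is consistent.
print(mse_floor([(1,), (1,), (2,)], [0.0, 2.0, 5.0]))
```

A floor of zero means the data are consistent; the larger the floor relative to the overall variance of the scores, the less any model can achieve.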


-Will Dwinnell
http://matlabdatamining.blogspot.com/

Reply Predictor 8/18/2007 11:27:33 AM

