Hi!

I am a newbie with artificial neural networks, so please keep the answers simple ;-)

I need to implement digit classification (OCR) with the digits 0..9 as output and approx. 3800 training images. I also have ~1800 test images.

I found the OCR example shipped with Matlab and played with it. It uses a feedforward NN with backpropagation (newff).

Here are my questions:

How can I determine the optimal number of training images? If I use all 3800 it takes some time to train. I played with subsets of 200, 400, 500 and 800 images, and the results were not increasing linearly. Should I "re-train" the net with the same training set a few times?

The train function shows its progress in a chart with the epochs on the X axis and the performance on the Y axis. What is the meaning, and how does it relate to the overall "detection rate" of the net when fed with test data?

I played with a performance goal between 1.0 and 0.5 and found 0.8 to be the best for my data set. Is there a way to calculate the optimum? How is it related to the other parameters?

The number of hidden units ;-) I tested using 10, 11, 12, 13 ... 20 hidden units, and depending on the other parameters 13-16 were optimal. Does this sound reasonable?

Even with (in my eyes) optimal settings I get only a success rate of 95%. I test all 1800 test images against the net and at the end compute:

  number_of_correct_results / total_test_images * 100 = XX%

I know it depends on the image data, but: are there any other general ways, common mistakes, or reasons for this "low" success rate?

And here my final question ;-) If I try a NN without hidden units (I already tried newp, which gave me an 83% success rate) - which one would be recommendable for digit classification?

I hope the questions are not too stupid :-)

TIA,
--
----------------------------------------------------------------
,yours Thomas Zangl - thomas@tzis.net - http://www.tzis.net/ -
- Freelancer - IT Consulting & Software Development -
Use Y.A.M.C! now! Get it at http://www.borg-kindberg.ac.at/yamc/
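[Editor's note: the success-rate computation described above can be sketched in a few lines. This is a hedged illustration in Python rather than Matlab, with made-up label lists; the function name `success_rate` is not from the original thread.]

```python
# Minimal sketch of the success-rate formula from the post:
# number_of_correct_results / total_test_images * 100.
# The predicted/actual digit lists below are invented example data.
def success_rate(predicted, actual):
    """Percentage of test images whose predicted digit matches the label."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual) * 100

predicted = [3, 1, 4, 1, 5, 9, 2, 6]
actual    = [3, 1, 4, 1, 5, 9, 2, 7]  # one mistake out of 8 images
print(success_rate(predicted, actual))  # 87.5
```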


4/29/2006 7:18:34 AM

Thomas Zangl wrote:
> I am a newbie with artificial neural networks. Please answer simple ;-)
>
> I need to implement digit classification (OCR) with numbers 0..9 as
> output and approx. 3800 training images. I also have ~1800 test images.
>
> I found the OCR example shipped with Matlab and played with it. It uses
> a feedforward NN with backpropagation (newff).
>
> How can I determine the optimal number of training images? If I use
> 3800 it takes some time to train. I played with subsets of 200, 400,
> 500, 800 images and the results were not increasing linearly. Should I
> "re-train" the net with the same training set a few times?

Inputs: What and how many?
Outputs: What and how many?
Hidden nodes: How many and why?

> The train function shows its progress in a chart with the epochs on the
> X axis and the performance on the Y axis. What is the meaning

Mean-Square-Error of the output nodes.

> and how does it relate to the overall "detection rate" of the net when
> feeding with test-data?

There is no analytic relationship between the classification error rate, a
discontinuous quantity, and the MSE (continuous) of the class-conditional
posterior probability (achieved with 0/1 targets). Cross entropy is another
continuous objective function that can be used for classification. However,
it is not supported by MATLAB.

> I played with a performance goal between 1.0 and 0.5 and found 0.8 to be
> the best for my data-set. Is there a way to calculate the optimum? How
> is it related to the other parameters?

My rule of thumb is MSE/variance of targets < 0.01.

> The number of hidden units ;-) I tested using 10, 11, 12, 13 ... 20
> hidden units and depending on the other parameters 13-16 were optimal.
> Does this sound reasonable?

Search the archives using

greg-heath Neq Nw H

> Even with (in my eyes) optimal settings I get only a success-rate of
> 95%. I test all 1800 test-images against the net and at the end:
> number_of_correct_results / total_test_images * 100 = XX%
> I know it depends on the image data but: Are there any other - general
> ways or most common mistakes - reasons for this "low" success-rate?

Typically, 95% sounds good.

> And here my final question ;-) If I try a NN w/o hidden units (I already
> tried newp, gave me 83% success-rate) - which one would be recommendable
> for digit classification?

newff without hidden nodes is the only one left. Don't expect it to be
any better.

Hope this helps.

Greg
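[Editor's note: Greg's "MSE/variance of targets < 0.01" rule of thumb is easy to check numerically. Below is a hedged Python illustration with invented 0/1 targets and network outputs; the helper names `mse` and `variance` are not part of any toolbox discussed here.]

```python
# Illustration of the rule of thumb: MSE / variance-of-targets < 0.01.
# Targets are 0/1 class indicators; the outputs are made-up example values
# from a well-trained network.
def mse(outputs, targets):
    """Mean squared error between outputs and targets."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(targets)

def variance(values):
    """Population variance of a list of values."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

targets = [0, 0, 1, 0, 1, 1, 0, 1]
outputs = [0.02, 0.05, 0.97, 0.01, 0.93, 0.99, 0.04, 0.96]

ratio = mse(outputs, targets) / variance(targets)
print(ratio < 0.01)  # True: training has converged well enough by this rule
```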


4/30/2006 8:36:11 AM

Greg Heath wrote:

Hi!

> Inputs: What and how many?

The image is represented as 64 int values encoding 16 colors. After
transforming, they can be displayed as an image with w=8 and h=8.

So, inputs = 64.

> Outputs: What and how many?

Output is the classification, so the number range is 0...9 (= 10 outputs).

> Hidden nodes: How many and why?

As stated below, we tried 1 .. 20 hidden units.

>> I played with a performance goal between 1.0 and 0.5 and found 0.8 to be
>> the best for my data-set. Is there a way to calculate the optimum? How
>> is it related to the other parameters?
>
> My rule of thumb is MSE/variance of targets < 0.01.

We have been using SSE as the performance function...

> Search archives using
>
> greg-heath Neq Nw H

I get ~10 hits; I assume you mean this part:

---- cite ---
For an I-H-O MLP, the number of weight/bias unknowns is

Nw = (I+1)*H+(H+1)*O = O+(I+O+1)*H

and the number of training equations is

Neq = Ntrn*O.
---- cite ---

> Typically, 95% sounds good.

Are there ways to improve this? Tweaks/tricks/pre-processing of the test data?

> newff without hidden nodes is the only one left. Don't expect it to be
> any better.

Ok. I also tried 2 hidden layers, but the results did not improve. They even
got a bit worse (~93%).

TIA,
--
----------------------------------------------------------------
,yours Thomas Zangl - thomas@tzis.net - http://www.tzis.net/ -
- Freelancer - IT Consulting & Software Development -
Use Y.A.M.C! now! Get it at http://www.borg-kindberg.ac.at/yamc/
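[Editor's note: the cited Nw formula can be verified with a few lines of arithmetic. A hedged Python sketch, plugged in for this thread's network (I=64 inputs, O=10 outputs); the function name `nw` is illustrative only.]

```python
# Number of weight/bias unknowns for an I-H-O MLP, per the cited formula:
# Nw = (I+1)*H + (H+1)*O, which equals O + (I+O+1)*H after expanding.
I, O = 64, 10  # 64 pixel inputs, 10 digit-class outputs

def nw(I, H, O):
    """Weight/bias unknowns of a one-hidden-layer I-H-O network."""
    return (I + 1) * H + (H + 1) * O

for H in range(10, 21):
    # the two algebraic forms in the cited post agree
    assert nw(I, H, O) == O + (I + O + 1) * H
    print(H, nw(I, H, O))
```

For the hidden-layer sizes tried in the thread, this gives 760 unknowns at H=10 up to 1510 at H=20.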


4/30/2006 9:27:43 AM

Thomas Zangl wrote:
> Greg Heath wrote:
>
> Hi!
>
>> Inputs: What and how many?
>
> The image is represented as 64 int values encoding 16 colors. After
> transforming, they can be displayed as an image with w=8 and h=8.
>
> So, inputs = 64
>
>> Outputs: What and how many?
>
> Output is the classification, so the number range is 0...9 (= 10 outputs)
>
>> Hidden nodes: How many and why?
>
> As stated below, we tried 1 .. 20 hidden units.

Nw = 10+(64+10+1)*20 = 1,510
Neq = 9*Ntrn   % Only 9 outputs are independent
Neq >~ 10*Nw ==> Ntrn >~ 15,100/9 ~ 1,700

Since you are not starved for training data, use 1700 or more.

>>> I played with a performance goal between 1.0 and 0.5 and found 0.8 to be
>>> the best for my data-set. Is there a way to calculate the optimum? How
>>> is it related to the other parameters?
>>
>> My rule of thumb is MSE/variance of targets < 0.01.
>
> We have been using SSE as the performance function...

MSE = SSE/(Ntrn-Nw)

>> Search archives using
>>
>> greg-heath Neq Nw H
>
> I get ~10 hits; I assume you mean this part:
> ---- cite ---
> For an I-H-O MLP, the number of weight/bias unknowns is
>
> Nw = (I+1)*H+(H+1)*O = O+(I+O+1)*H
>
> and the number of training equations is
>
> Neq = Ntrn*O.

Actually, O should be replaced by the number of independent outputs:
probably (O-1) in your case.

> ---- cite ---
>
>> Typically, 95% sounds good.
>
> Are there ways to improve this? Tweaks/tricks/pre-processing of test-data?

Search for ensembles and committees.

>> newff without hidden nodes is the only one left. Don't expect it to be
>> any better.
>
> Ok. I also tried 2 hidden layers but results did not increase. They even
> got a bit worse (~93%).

Hope this helps.

Greg
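[Editor's note: the training-set-size rule of thumb worked out above (Neq >~ 10*Nw with Neq = 9*Ntrn) can be reproduced with plain arithmetic. A hedged Python sketch; variable names are illustrative.]

```python
# Recommended minimum training-set size per the rule of thumb above:
# with 9 independent outputs, Neq = 9*Ntrn, and we want Neq >~ 10*Nw.
import math

I, O, H = 64, 10, 20
Nw = O + (I + O + 1) * H           # 10 + 75*20 = 1510 weight/bias unknowns
independent_outputs = O - 1        # only 9 of the 10 outputs are independent
Ntrn_min = math.ceil(10 * Nw / independent_outputs)
print(Nw, Ntrn_min)  # 1510 1678, i.e. roughly 1,700 training images
```

Since ~3800 training images are available, this bound is comfortably met.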


5/1/2006 4:54:45 PM