2007-05-29 9:52 PM Is the Netflix Prize winnable? (2)
Some more thoughts on the subject that I wanted to note down before I forgot them, as I'm keen on giving this experiment a go soonish.
The SVD should probably be run with the regularisation factor (k) set to 0. It might not actually matter, but with k=0 you get the maximum amount of overfitting, and the RMSE should become less overfit as we supply more data to the algorithm.

We also need to keep the feature arrays (vectors, if you prefer) the same size for each data subset, which means keeping the number of movies and customers constant. This presents a problem for the smaller sample sets, where we have, say, 0.78m ratings spread over 0.48m customers: that's 1.625 ratings per customer on average, not nearly enough to get a good idea of a customer's preferences. This probably means I'll end up with fewer sample points on my RMSE convergence plot, say 6 points instead of 8 (losing the two smallest data sets).

It's also worth noting that Netflix's own 1 billion ratings will, of course, be made up largely of ratings from customers who are not in the Netflix Prize data set. This means they have an even larger matrix than the prize data set, but not necessarily one that is significantly less sparse.
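To make the plan concrete, here is a minimal Python sketch, assuming a Funk-style SVD trained by stochastic gradient descent with regularisation factor `k` (this variant updates all features together rather than one at a time). It keeps the feature arrays a fixed size across subsets, computes the ratings-per-customer figure, and skips subsets that are too sparse to be useful. The function names, learning rate, feature count and the `min_per_customer` cut-off are all hypothetical choices, not taken from the experiment itself.

```python
import math
import random


def train_svd(ratings, n_customers, n_movies, n_features=4,
              lrate=0.01, k=0.0, epochs=50, seed=0):
    """Fit customer and movie feature vectors by stochastic gradient descent.

    ratings: list of (customer_index, movie_index, rating) tuples.
    k: regularisation factor; k=0 disables regularisation (maximum overfitting).
    The arrays are sized by n_customers and n_movies, not by the subset,
    so every data subset produces feature arrays of the same shape.
    """
    rng = random.Random(seed)

    def init():
        # Small random offsets break the symmetry between features.
        return [0.1 + rng.uniform(-0.01, 0.01) for _ in range(n_features)]

    cust = [init() for _ in range(n_customers)]
    movie = [init() for _ in range(n_movies)]
    for _ in range(epochs):
        for c, m, r in ratings:
            err = r - sum(a * b for a, b in zip(cust[c], movie[m]))
            for f in range(n_features):
                cf, mf = cust[c][f], movie[m][f]
                # The k * value terms shrink features toward zero; with k=0 they vanish.
                cust[c][f] += lrate * (err * mf - k * cf)
                movie[m][f] += lrate * (err * cf - k * mf)
    return cust, movie


def rmse(ratings, cust, movie):
    """Root mean squared error of the dot-product predictions on `ratings`."""
    se = sum((r - sum(a * b for a, b in zip(cust[c], movie[m]))) ** 2
             for c, m, r in ratings)
    return math.sqrt(se / len(ratings))


def ratings_per_customer(n_ratings, n_customers):
    """Average ratings per customer, counting every customer in the full set."""
    return n_ratings / n_customers


def convergence_points(subsets, probe, n_customers, n_movies,
                       min_per_customer=2.0, **svd_args):
    """Return (subset size, probe RMSE) pairs for an RMSE convergence plot.

    Subsets averaging fewer than min_per_customer ratings per customer
    (a hypothetical cut-off) are skipped, like dropping the smallest sets.
    """
    points = []
    for subset in subsets:
        if ratings_per_customer(len(subset), n_customers) < min_per_customer:
            continue
        cust, movie = train_svd(subset, n_customers, n_movies, **svd_args)
        points.append((len(subset), rmse(probe, cust, movie)))
    return points


print(ratings_per_customer(780_000, 480_000))  # 1.625
```

Because `train_svd` sizes its arrays from `n_customers` and `n_movies`, customers who are absent from a small subset simply keep their initial feature values, which is one way to see why such sparse subsets say little about customer preferences.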
© 2001-2010 JournalScape.com. All rights reserved. All content rights reserved by the author.