Class OneClassTest

java.lang.Object
  extended by OneClassTest

public class OneClassTest
extends java.lang.Object

This is the main test file for One-Class Collaborative Filtering.

Since:
2011. 7. 12
Version:
20110712
Author:
Joonseok Lee

Field Summary
static java.lang.String[] columnName
          The list of item names, provided with the dataset.
static java.lang.String dataFileName
          The name of the data file used for testing.
static int ITEM_ORIENTED
           
static boolean ITEM_SIM_PREFETCH
          Indicates whether to load a pre-calculated item similarity file.
static int itemCount
          The number of items.
static SparseVector itemRateAverage
          Average of ratings for each item.
static double[][] itemSimilarity
          Item-to-item similarity matrix.
static java.lang.String itemSimilarityFileName
          The name of the pre-calculated item similarity file, if one is used.
static int MAX_DIFF
           
static int maxValue
          Maximum rating value present in the dataset.
static int minValue
          Minimum rating value present in the dataset.
static int NEIGHBOR_SIZE
          The number of similar users/items to be used for estimation in neighborhood-based methods.
static SparseMatrix rateMatrix
          Rating matrix, with users as rows and items as columns.
static int SCORE_REFLECT
           
static boolean SPLIT_PREFETCH
          Indicates whether to load a split file.
static double TEST_RATIO
          Proportion of items to be used for testing.
static SparseMatrix testMatrix
          Rating matrix for test items.
static int THRESHOLD_ROULETTE
           
static int THRESHOLD_UNIFORM
           
static int UNIFORM_RANDOM
           
static int USER_ORIENTED
           
static boolean USER_SIM_PREFETCH
          Indicates whether to load a pre-calculated user similarity file.
static int userCount
          The number of users.
static SparseVector userRateAverage
          Average of ratings for each user.
 
Constructor Summary
OneClassTest()
           
 
Method Summary
private static void calculateAverage()
          Calculate the average rating for each user and each item.
static void constantModelTest(int method)
          Test interface for fast Constant Model baselines.
static void fastNPCATest(double validationRatio, int maxIter)
          Test interface for fast NPCA.
private static double getItemSuitability(int[] itemList, int targetItem)
          Calculate the suitability of an item for a specific user, assuming that the user likes items in the itemList.
private static void hideRatingsAbs(double min, double max)
          Remove ratings between min and max (inclusive) from rateMatrix.
private static void hideRatingsTic(double tic)
           
private static int locateInProbDist(double[] distribution, double value, int min, int max)
          Return the index of the given value in a cumulative probability distribution.
static void main(java.lang.String[] argv)
          Test examples for every algorithm.
static void matrixFactorizationTest(int method, int features, double learningRate, double regularizer, double momentum, int maxIter)
          Test interface for fast Matrix-Factorization-based algorithms.
static void memoryBasedTest(int k, int method, int similarityMethod, boolean defaultUse, double defaultValue)
          Test interface for Memory-based algorithms.
static void rankBasedTest(double kernelWidth)
          Test interface for Mingxuan's Rank-based algorithm.
private static void readArff(java.lang.String fileName)
          Read a data file in ARFF format and store it in the rating matrix.
private static void readItemSimData()
          Read the pre-calculated item similarity data file.
private static void readSplitData(java.lang.String fileName)
          Split the rating matrix into training and test sets according to the given split data file.
private static void recoverTestItems()
          Move the items in testMatrix back to the original rateMatrix.
private static void sampleNegative(int sampleCount, int method)
           
static void slopeOneTest()
          Test interface for slope-one algorithm.
private static void split(double testRatio)
          Move the items to be used for testing from rateMatrix to testMatrix.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

TEST_RATIO

public static final double TEST_RATIO
Proportion of items to be used for testing.

See Also:
Constant Field Values

NEIGHBOR_SIZE

public static final int NEIGHBOR_SIZE
The number of similar users/items to be used for estimation in neighborhood-based methods.

See Also:
Constant Field Values

SPLIT_PREFETCH

public static boolean SPLIT_PREFETCH
Indicates whether to load a split file.


USER_SIM_PREFETCH

public static boolean USER_SIM_PREFETCH
Indicates whether to load a pre-calculated user similarity file.


ITEM_SIM_PREFETCH

public static boolean ITEM_SIM_PREFETCH
Indicates whether to load a pre-calculated item similarity file.


UNIFORM_RANDOM

public static int UNIFORM_RANDOM

USER_ORIENTED

public static int USER_ORIENTED

ITEM_ORIENTED

public static int ITEM_ORIENTED

MAX_DIFF

public static int MAX_DIFF

THRESHOLD_UNIFORM

public static int THRESHOLD_UNIFORM

THRESHOLD_ROULETTE

public static int THRESHOLD_ROULETTE

SCORE_REFLECT

public static int SCORE_REFLECT

rateMatrix

public static SparseMatrix rateMatrix
Rating matrix, with users as rows and items as columns.


testMatrix

public static SparseMatrix testMatrix
Rating matrix for test items. It must not be accessed during the training and validation phases.


userRateAverage

public static SparseVector userRateAverage
Average of ratings for each user.


itemRateAverage

public static SparseVector itemRateAverage
Average of ratings for each item.


userCount

public static int userCount
The number of users.


itemCount

public static int itemCount
The number of items.


maxValue

public static int maxValue
Maximum rating value present in the dataset.


minValue

public static int minValue
Minimum rating value present in the dataset.


columnName

public static java.lang.String[] columnName
The list of item names, provided with the dataset.


dataFileName

public static java.lang.String dataFileName
The name of the data file used for testing.


itemSimilarityFileName

public static java.lang.String itemSimilarityFileName
The name of the pre-calculated item similarity file, if one is used.


itemSimilarity

public static double[][] itemSimilarity
Item-to-item similarity matrix.

Constructor Detail

OneClassTest

public OneClassTest()
Method Detail

main

public static void main(java.lang.String[] argv)
Test examples for every algorithm. This method also parses the given command-line parameters.

Parameters:
argv - The argument list, whose elements are separated by spaces. The first element is the data file name, and the second is the algorithm name. The third and subsequent elements are parameters for the chosen algorithm. Please refer to our web site for the detailed syntax.

constantModelTest

public static void constantModelTest(int method)
Test interface for fast Constant Model baselines. Print MAE, RMSE, and the rank-based half-life score for the given test data.

Parameters:
method - The code of the algorithm to be tested.
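The MAE and RMSE figures printed by these test interfaces are standard error metrics. As an illustration only, the sketch below computes both over parallel arrays of predicted and actual ratings. The class name ErrorMetrics and the array-based inputs are assumptions, not part of this API, and the rank-based half-life score is not shown.

```java
/** Hypothetical helper (not part of this API): error metrics over parallel arrays. */
final class ErrorMetrics {
    private ErrorMetrics() {}

    /** Mean Absolute Error: the average of |predicted - actual|. */
    static double mae(double[] predicted, double[] actual) {
        checkLengths(predicted, actual);
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(predicted[i] - actual[i]);
        }
        return sum / actual.length;
    }

    /** Root Mean Squared Error: the square root of the average squared error. */
    static double rmse(double[] predicted, double[] actual) {
        checkLengths(predicted, actual);
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double diff = predicted[i] - actual[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum / actual.length);
    }

    private static void checkLengths(double[] predicted, double[] actual) {
        if (predicted.length != actual.length || actual.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
    }
}
```

For predictions {3, 4, 5} against actual ratings {2, 4, 7}, the absolute errors are 1, 0, and 2, so MAE is 1.0 and RMSE is sqrt(5/3), about 1.291. RMSE penalizes the single large error more heavily than MAE does.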

memoryBasedTest

public static void memoryBasedTest(int k,
                                   int method,
                                   int similarityMethod,
                                   boolean defaultUse,
                                   double defaultValue)
Test interface for Memory-based algorithms. Print MAE, RMSE, and the rank-based half-life score for the given test data.

Parameters:
k - The neighborhood size.
method - The code of the algorithm to be tested.
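Memory-based methods predict a rating from the ratings of the k most similar neighbors. The following is a minimal sketch of one common variant, user-based k-nearest-neighbor prediction with mean-centered ratings, and is not necessarily what memoryBasedTest implements. It assumes a dense double[][] rating array with Double.NaN marking unrated entries (in place of SparseMatrix) and a precomputed user-user similarity matrix. The class UserKnn is hypothetical, and the defaultUse/defaultValue options are not modeled.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of user-based k-nearest-neighbor rating prediction. */
final class UserKnn {
    private UserKnn() {}

    /**
     * Predicts the rating of user u for item i.
     * ratings[user][item] holds a rating, or Double.NaN if unrated (assumption of this sketch).
     * sim[u][v] is a precomputed user-user similarity; mean[u] is user u's average rating.
     */
    static double predict(double[][] ratings, double[][] sim, double[] mean, int u, int i, int k) {
        // Candidate neighbors: other users who rated item i.
        List<Integer> candidates = new ArrayList<>();
        for (int v = 0; v < ratings.length; v++) {
            if (v != u && !Double.isNaN(ratings[v][i])) {
                candidates.add(v);
            }
        }
        // Keep the k candidates most similar to u.
        candidates.sort(Comparator.comparingDouble((Integer v) -> sim[u][v]).reversed());
        List<Integer> neighbors = candidates.subList(0, Math.min(k, candidates.size()));

        // Similarity-weighted average of the neighbors' mean-centered ratings.
        double num = 0.0;
        double den = 0.0;
        for (int v : neighbors) {
            num += sim[u][v] * (ratings[v][i] - mean[v]);
            den += Math.abs(sim[u][v]);
        }
        // Fall back to the user's own mean when no usable neighbor exists.
        return den == 0.0 ? mean[u] : mean[u] + num / den;
    }
}
```

Mean-centering lets a neighbor who rates everything high still contribute useful information: only how far that neighbor's rating sits above or below the neighbor's own average is transferred to the target user.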

slopeOneTest

public static void slopeOneTest()
Test interface for the slope-one algorithm. Build a model with the given data, and print MAE, RMSE, and the rank-based half-life score.
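Slope One predicts a rating from the average rating differences between pairs of items. The sketch below implements the Weighted Slope One scheme of Lemire and Maclachlan as a generic illustration; it is not the implementation behind slopeOneTest. It assumes a dense double[][] rating array with Double.NaN marking unrated entries, and the class SlopeOne is hypothetical.

```java
/** Hypothetical sketch of the Weighted Slope One predictor. */
final class SlopeOne {
    private final double[][] dev;  // dev[j][i]: average of (r_uj - r_ui) over users who rated both
    private final int[][] count;   // count[j][i]: number of users who rated both j and i

    /** ratings[user][item] is a rating, or Double.NaN if unrated (assumption of this sketch). */
    SlopeOne(double[][] ratings, int itemCount) {
        dev = new double[itemCount][itemCount];
        count = new int[itemCount][itemCount];
        for (double[] row : ratings) {
            for (int j = 0; j < itemCount; j++) {
                if (Double.isNaN(row[j])) continue;
                for (int i = 0; i < itemCount; i++) {
                    if (i == j || Double.isNaN(row[i])) continue;
                    dev[j][i] += row[j] - row[i];
                    count[j][i]++;
                }
            }
        }
        // Turn the accumulated differences into averages.
        for (int j = 0; j < itemCount; j++) {
            for (int i = 0; i < itemCount; i++) {
                if (count[j][i] > 0) dev[j][i] /= count[j][i];
            }
        }
    }

    /** Predicts the rating for item j given one user's row; NaN if no co-rated item exists. */
    double predict(double[] userRow, int j) {
        double num = 0.0;
        int den = 0;
        for (int i = 0; i < userRow.length; i++) {
            if (i == j || Double.isNaN(userRow[i]) || count[j][i] == 0) continue;
            // Each rated item i votes (r_ui + dev[j][i]), weighted by its co-rating count.
            num += (dev[j][i] + userRow[i]) * count[j][i];
            den += count[j][i];
        }
        return den == 0 ? Double.NaN : num / den;
    }
}
```

With user A rating items 0 and 1 as 1 and 1.5, and user B rating only item 0 as 2, the average deviation of item 1 over item 0 is 0.5, so B's predicted rating for item 1 is 2 + 0.5 = 2.5.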


matrixFactorizationTest

public static void matrixFactorizationTest(int method,
                                           int features,
                                           double learningRate,
                                           double regularizer,
                                           double momentum,
                                           int maxIter)
Test interface for fast Matrix-Factorization-based algorithms. Build a model with the given data, and print MAE, RMSE, and the rank-based half-life score.

Parameters:
method - The code of the algorithm to be tested.
features - The number of features in low-rank matrix representation.
learningRate - The learning rate for gradient-descent.
regularizer - The regularization parameter.
momentum - The momentum parameter.
maxIter - The maximum number of iterations.
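The parameters above (features, learningRate, regularizer, momentum, maxIter) match those of regularized matrix factorization trained by gradient descent with momentum. The sketch below shows that generic technique with per-rating stochastic updates. Whether the library uses this exact update rule is an assumption, and the class name, the {user, item, rating} triple format, and the random initialization are hypothetical.

```java
import java.util.Random;

/** Hypothetical sketch: regularized matrix factorization trained by SGD with momentum. */
final class MatrixFactorizationSketch {
    final double[][] p;             // user feature vectors
    final double[][] q;             // item feature vectors
    private final double[][] vp;    // momentum "velocity" for p
    private final double[][] vq;    // momentum "velocity" for q

    MatrixFactorizationSketch(int users, int items, int features, long seed) {
        Random rnd = new Random(seed);
        p = new double[users][features];
        q = new double[items][features];
        vp = new double[users][features];
        vq = new double[items][features];
        // Small random initialization breaks the symmetry between features.
        for (double[] row : p) for (int f = 0; f < features; f++) row[f] = 0.1 * rnd.nextGaussian();
        for (double[] row : q) for (int f = 0; f < features; f++) row[f] = 0.1 * rnd.nextGaussian();
    }

    /** Predicted rating: dot product of the user and item feature vectors. */
    double predict(int u, int i) {
        double s = 0.0;
        for (int f = 0; f < p[u].length; f++) s += p[u][f] * q[i][f];
        return s;
    }

    /** triples[n] = {user, item, rating}; one iteration is one pass over all observed ratings. */
    void train(double[][] triples, double learningRate, double regularizer, double momentum, int maxIter) {
        for (int iter = 0; iter < maxIter; iter++) {
            for (double[] t : triples) {
                int u = (int) t[0];
                int i = (int) t[1];
                double err = t[2] - predict(u, i);
                for (int f = 0; f < p[u].length; f++) {
                    double pu = p[u][f];  // keep old values so both gradients use them
                    double qi = q[i][f];
                    // Gradients of 0.5*err^2 + 0.5*regularizer*(|p_u|^2 + |q_i|^2).
                    double gp = -err * qi + regularizer * pu;
                    double gq = -err * pu + regularizer * qi;
                    // Momentum update: v <- momentum*v - learningRate*grad; x <- x + v.
                    vp[u][f] = momentum * vp[u][f] - learningRate * gp;
                    vq[i][f] = momentum * vq[i][f] - learningRate * gq;
                    p[u][f] += vp[u][f];
                    q[i][f] += vq[i][f];
                }
            }
        }
    }
}
```

The momentum term carries part of the previous step forward, which tends to speed up progress along consistent gradient directions; setting momentum to 0 reduces this to plain stochastic gradient descent.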

fastNPCATest

public static void fastNPCATest(double validationRatio,
                                int maxIter)
Test interface for fast NPCA. Build a model with the given data, and print MAE, RMSE, and the rank-based half-life score.

Parameters:
validationRatio - Fraction of items to be used for validation.
maxIter - The maximum number of iterations.

rankBasedTest

public static void rankBasedTest(double kernelWidth)
Test interface for Mingxuan's Rank-based algorithm. Build a model with the given data, and print MAE, RMSE, and the rank-based half-life score.

Parameters:
kernelWidth - The kernel bandwidth.

split

private static void split(double testRatio)
Move the items to be used for testing from rateMatrix to testMatrix.

Parameters:
testRatio - Proportion of items to be used for testing.
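A minimal sketch of such a split, assuming that a testRatio fraction of each user's rated items is chosen at random. The per-user scheme, the nested Map representation (user -> item -> rating) used in place of SparseMatrix, and the class SplitSketch are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical sketch of a per-user random train/test split of a sparse rating matrix. */
final class SplitSketch {
    private SplitSketch() {}

    /**
     * Moves round(testRatio * n) randomly chosen ratings of each user (n = that user's
     * rating count) from train into test. Both maps are user -> (item -> rating).
     */
    static void split(Map<Integer, Map<Integer, Double>> train,
                      Map<Integer, Map<Integer, Double>> test,
                      double testRatio, long seed) {
        Random rnd = new Random(seed);
        for (Map.Entry<Integer, Map<Integer, Double>> e : train.entrySet()) {
            List<Integer> items = new ArrayList<>(e.getValue().keySet());
            Collections.sort(items);           // fixed order before shuffling, for reproducibility
            Collections.shuffle(items, rnd);
            int testCount = (int) Math.round(testRatio * items.size());
            Map<Integer, Double> testRow = test.computeIfAbsent(e.getKey(), k -> new HashMap<>());
            for (int item : items.subList(0, testCount)) {
                // Remove the rating from the training row and store it in the test row.
                testRow.put(item, e.getValue().remove(item));
            }
        }
    }
}
```

Because the items are sorted before being shuffled with a seeded Random, the same seed produces the same split for the same data, which keeps comparisons between algorithms reproducible.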

recoverTestItems

private static void recoverTestItems()
Move the items in testMatrix back to the original rateMatrix.


hideRatingsAbs

private static void hideRatingsAbs(double min,
                                   double max)
Remove ratings between min and max (inclusive) from rateMatrix.

Parameters:
min - Minimum rating value to be deleted.
max - Maximum rating value to be deleted.
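As an illustration, the removal can be done in one pass over each user's ratings. This sketch assumes a nested Map (user -> item -> rating) in place of SparseMatrix and returns the number of ratings removed; the class name and the return value are hypothetical.

```java
import java.util.Map;

/** Hypothetical sketch of removing all ratings inside a closed value range. */
final class HideRatingsSketch {
    private HideRatingsSketch() {}

    /** Removes every rating r with min <= r <= max; returns the number of ratings removed. */
    static int hideRatingsAbs(Map<Integer, Map<Integer, Double>> rateMatrix, double min, double max) {
        int removed = 0;
        for (Map<Integer, Double> row : rateMatrix.values()) {
            int before = row.size();
            // Removing through the values() view also removes the corresponding map entries.
            row.values().removeIf(r -> r >= min && r <= max);
            removed += before - row.size();
        }
        return removed;
    }
}
```

One plausible use in a one-class setting: hiding the low ratings (for example 1 to 3 on a 5-point scale) leaves only the higher ratings, which can then be treated as positive feedback.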

hideRatingsTic

private static void hideRatingsTic(double tic)

sampleNegative

private static void sampleNegative(int sampleCount,
                                   int method)

locateInProbDist

private static int locateInProbDist(double[] distribution,
                                    double value,
                                    int min,
                                    int max)
Return the index of the given value in a cumulative probability distribution.

Parameters:
distribution - The cumulative probability distribution
value - The data value to locate
min - The minimum index to consider
max - The maximum index to consider
Returns:
The index in the probability distribution
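Locating a value in a cumulative distribution can be done by binary search for the first index whose cumulative probability reaches the value. The sketch below assumes that this is the intended semantics (the smallest index idx in [min, max] with distribution[idx] >= value); the class name is hypothetical.

```java
/** Hypothetical sketch of locating a value in a cumulative probability distribution. */
final class ProbDistSketch {
    private ProbDistSketch() {}

    /**
     * Returns the smallest index idx in [min, max] with distribution[idx] >= value,
     * assuming distribution is non-decreasing. If value exceeds distribution[max], returns max.
     */
    static int locateInProbDist(double[] distribution, double value, int min, int max) {
        int lo = min;
        int hi = max;
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;  // avoids int overflow of (lo + hi)
            if (distribution[mid] >= value) {
                hi = mid;      // mid qualifies; the answer is mid or to its left
            } else {
                lo = mid + 1;  // the answer lies strictly to the right of mid
            }
        }
        return lo;
    }
}
```

A typical use is roulette-wheel sampling: draw value uniformly from [0, 1), and each index is returned with probability equal to its share of the distribution. For the cumulative distribution {0.1, 0.4, 0.9, 1.0}, the value 0.5 maps to index 2.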

getItemSuitability

private static double getItemSuitability(int[] itemList,
                                         int targetItem)
Calculate the suitability of an item for a specific user, assuming that the user likes items in the itemList.

Parameters:
itemList - The list of items liked by the user.
targetItem - The item whose suitability for the user is to be evaluated.
Returns:
The suitability ranging from 0 (not relevant) to 1 (perfectly suited).
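The scoring rule behind getItemSuitability is not documented here. One simple rule that yields a value in the stated 0 to 1 range is the mean similarity between the target item and the liked items, clamped to [0, 1]. The sketch below uses that rule purely as a hypothetical example, passing a precomputed item-item similarity matrix in the role of the itemSimilarity field.

```java
/** Hypothetical sketch of an item suitability score in [0, 1]. */
final class SuitabilitySketch {
    private SuitabilitySketch() {}

    /**
     * Hypothetical rule: mean similarity between targetItem and the liked items,
     * clamped to [0, 1]. Returns 0 for an empty itemList.
     */
    static double getItemSuitability(double[][] itemSimilarity, int[] itemList, int targetItem) {
        if (itemList.length == 0) return 0.0;
        double sum = 0.0;
        for (int item : itemList) {
            sum += itemSimilarity[item][targetItem];
        }
        double mean = sum / itemList.length;
        // Negative similarities are treated as "not relevant" (0).
        return Math.max(0.0, Math.min(1.0, mean));
    }
}
```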

readItemSimData

private static void readItemSimData()
Read the pre-calculated item similarity data file. Make sure that the similarity file is compatible with the split file you are using, for a fair comparison.


readArff

private static void readArff(java.lang.String fileName)
Read a data file in ARFF format and store it in the rating matrix. Peripheral information, such as the maximum/minimum rating values and the user/item counts, is also set in this method.

Parameters:
fileName - The name of data file.

readSplitData

private static void readSplitData(java.lang.String fileName)
Split the rating matrix into training and test sets according to the given split data file.

Parameters:
fileName - The name of the split data file.

calculateAverage

private static void calculateAverage()
Calculate the average rating for each user and each item. The results are stored in userRateAverage and itemRateAverage, respectively.
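A minimal sketch of the averaging step, assuming a nested Map (user -> item -> rating) in place of SparseMatrix and plain double[] results in place of SparseVector. Users or items without any rating get Double.NaN; that convention, like the class name, is an assumption.

```java
import java.util.Map;

/** Hypothetical sketch of per-user and per-item rating averages. */
final class AverageSketch {
    private AverageSketch() {}

    /**
     * Computes average ratings from rateMatrix (user -> (item -> rating)).
     * Returns {userAverage, itemAverage}; entries with no ratings are Double.NaN.
     */
    static double[][] calculateAverage(Map<Integer, Map<Integer, Double>> rateMatrix,
                                       int userCount, int itemCount) {
        double[] userSum = new double[userCount];
        double[] itemSum = new double[itemCount];
        int[] userN = new int[userCount];
        int[] itemN = new int[itemCount];
        // One pass over all observed ratings accumulates both sums and counts.
        for (Map.Entry<Integer, Map<Integer, Double>> row : rateMatrix.entrySet()) {
            int u = row.getKey();
            for (Map.Entry<Integer, Double> cell : row.getValue().entrySet()) {
                int i = cell.getKey();
                double r = cell.getValue();
                userSum[u] += r;
                userN[u]++;
                itemSum[i] += r;
                itemN[i]++;
            }
        }
        double[] userAvg = new double[userCount];
        double[] itemAvg = new double[itemCount];
        for (int u = 0; u < userCount; u++) userAvg[u] = userN[u] == 0 ? Double.NaN : userSum[u] / userN[u];
        for (int i = 0; i < itemCount; i++) itemAvg[i] = itemN[i] == 0 ? Double.NaN : itemSum[i] / itemN[i];
        return new double[][] {userAvg, itemAvg};
    }
}
```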