prea.data.splitter
Class DataSplitManager

java.lang.Object
  extended by prea.data.splitter.DataSplitManager
Direct Known Subclasses:
KfoldCrossValidation, PredefinedSplit, SimpleSplit

public abstract class DataSplitManager
extends java.lang.Object

This class implements the data-splitting functions that are common to the individual model selection methods.

Since:
2012. 4. 20
Version:
1.1
Author:
Joonseok Lee

Field Summary
protected  int itemCount
          The number of items.
protected static SparseVector itemRateAverage
          Average of ratings for each item.
static int K_FOLD_CROSS_VALIDATION
          Evaluation with K-fold cross-validation.
 int maxValue
          Maximum rating value present in the dataset.
 int minValue
          Minimum rating value present in the dataset.
static int PREDEFINED_SPLIT
          Use predefined split file.
protected  SparseMatrix rateMatrix
          Rating matrix for each user (row) and item (column).
static int SIMPLE_SPLIT
          Randomly split the data into train and test sets.
protected  SparseMatrix testMatrix
          Rating matrix for test items.
protected  int userCount
          The number of users.
protected static SparseVector userRateAverage
          Average of ratings for each user.
 
Constructor Summary
DataSplitManager(SparseMatrix originalMatrix, int max, int min)
          Construct a data set manager.
 
Method Summary
protected  void calculateAverage(double defaultValue)
          Calculate average of ratings for each user and each item.
 SparseVector getItemRateAverage()
          Getter method for the average rating of each item.
 SparseMatrix getTestMatrix()
          Getter method for rating matrix with test data.
 SparseVector getUserRateAverage()
          Getter method for the average rating of each user.
protected  void recoverTestItems()
          Moves the items in testMatrix back into the original rateMatrix.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

SIMPLE_SPLIT

public static final int SIMPLE_SPLIT
Randomly split the data into train and test sets.

See Also:
Constant Field Values
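
As a rough illustration of what a random train/test split involves, the sketch below shuffles a list of ratings and holds out a fraction of them as the test set. It is not PREA's implementation: the class SimpleSplitSketch, the method split, and the testRatio and seed parameters are hypothetical names chosen for this example, and a plain java.util.List stands in for SparseMatrix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Sketch only: a plain List stands in for PREA's SparseMatrix.
class SimpleSplitSketch {
    /**
     * Shuffles a copy of the ratings and holds out round(testRatio * n) of them.
     * Returns a two-element list: index 0 is the train set, index 1 the test set.
     * The name, parameters, and return shape are assumptions for illustration.
     */
    static <T> List<List<T>> split(List<T> ratings, double testRatio, long seed) {
        if (testRatio < 0.0 || testRatio > 1.0) {
            throw new IllegalArgumentException("testRatio must be in [0, 1]");
        }
        List<T> shuffled = new ArrayList<>(ratings);
        Collections.shuffle(shuffled, new Random(seed)); // seeded for repeatability
        int testSize = (int) Math.round(shuffled.size() * testRatio);
        List<T> test = new ArrayList<>(shuffled.subList(0, testSize));
        List<T> train = new ArrayList<>(shuffled.subList(testSize, shuffled.size()));
        List<List<T>> parts = new ArrayList<>();
        parts.add(train);
        parts.add(test);
        return parts;
    }

    public static void main(String[] args) {
        List<Integer> ratingIds = Arrays.asList(0, 1, 2, 3, 4, 5, 6, 7, 8, 9);
        List<List<Integer>> parts = split(ratingIds, 0.2, 42L);
        System.out.println("train size = " + parts.get(0).size());
        System.out.println("test size  = " + parts.get(1).size());
    }
}
```

Fixing the seed makes the split repeatable, which helps when comparing algorithms on the same held-out ratings.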

PREDEFINED_SPLIT

public static final int PREDEFINED_SPLIT
Use a predefined split file.

See Also:
Constant Field Values

K_FOLD_CROSS_VALIDATION

public static final int K_FOLD_CROSS_VALIDATION
Evaluation with K-fold cross-validation.

See Also:
Constant Field Values
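
For readers unfamiliar with K-fold cross-validation, the sketch below shows one common way to assign each rating to a fold: shuffle the rating indices, then deal them out round-robin. In round f, the ratings labeled f form the test set and all others form the train set. This is an assumption about the general technique, not a description of KfoldCrossValidation; the names KFoldSketch and assignFolds are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Sketch only: generic fold assignment, not PREA's KfoldCrossValidation.
class KFoldSketch {
    /**
     * Returns an array where entry i is the fold (0..k-1) of rating i.
     * Fold sizes differ by at most one.
     */
    static int[] assignFolds(int ratingCount, int k, long seed) {
        if (k < 1 || k > ratingCount) {
            throw new IllegalArgumentException("k must be between 1 and ratingCount");
        }
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < ratingCount; i++) {
            order.add(i);
        }
        Collections.shuffle(order, new Random(seed));
        int[] fold = new int[ratingCount];
        for (int pos = 0; pos < ratingCount; pos++) {
            // Deal the shuffled indices out round-robin across the k folds.
            fold[order.get(pos)] = pos % k;
        }
        return fold;
    }

    public static void main(String[] args) {
        int[] fold = assignFolds(10, 3, 7L);
        for (int i = 0; i < fold.length; i++) {
            System.out.println("rating " + i + " -> fold " + fold[i]);
        }
    }
}
```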

rateMatrix

protected SparseMatrix rateMatrix
Rating matrix for each user (row) and item (column).


testMatrix

protected SparseMatrix testMatrix
Rating matrix for test items. It must not be referenced during the training and validation phases.


userCount

protected int userCount
The number of users.


itemCount

protected int itemCount
The number of items.


maxValue

public int maxValue
Maximum rating value present in the dataset.


minValue

public int minValue
Minimum rating value present in the dataset.


userRateAverage

protected static SparseVector userRateAverage
Average of ratings for each user.


itemRateAverage

protected static SparseVector itemRateAverage
Average of ratings for each item.

Constructor Detail

DataSplitManager

public DataSplitManager(SparseMatrix originalMatrix,
                        int max,
                        int min)
Construct a data set manager.

Method Detail

recoverTestItems

protected void recoverTestItems()
Moves the items in testMatrix back into the original rateMatrix.
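
The move described above can be pictured with the following minimal sketch. Nested maps (user to item to rating) stand in for SparseMatrix, and the class name RecoverSketch is hypothetical. Whether the real method also empties testMatrix is not stated in this documentation; the sketch assumes it does, since the items are described as moved rather than copied.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: nested maps (user -> item -> rating) stand in for PREA's SparseMatrix.
class RecoverSketch {
    static void recoverTestItems(Map<Integer, Map<Integer, Double>> rateMatrix,
                                 Map<Integer, Map<Integer, Double>> testMatrix) {
        for (Map.Entry<Integer, Map<Integer, Double>> row : testMatrix.entrySet()) {
            // Put every held-out rating of this user back into the rating matrix.
            rateMatrix.computeIfAbsent(row.getKey(), user -> new HashMap<>())
                      .putAll(row.getValue());
        }
        // Assumption: "moved" means the test matrix ends up empty.
        testMatrix.clear();
    }

    public static void main(String[] args) {
        Map<Integer, Map<Integer, Double>> rate = new HashMap<>();
        Map<Integer, Map<Integer, Double>> test = new HashMap<>();
        rate.computeIfAbsent(0, user -> new HashMap<>()).put(0, 4.0);
        test.computeIfAbsent(0, user -> new HashMap<>()).put(1, 2.0);
        test.computeIfAbsent(1, user -> new HashMap<>()).put(0, 5.0);
        recoverTestItems(rate, test);
        System.out.println("rateMatrix = " + rate);
        System.out.println("testMatrix = " + test);
    }
}
```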


calculateAverage

protected void calculateAverage(double defaultValue)
Calculate the average rating for each user and each item. The results are stored in two sparse vectors, userRateAverage and itemRateAverage. This method should be called after the data has been split into train and test sets.
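
The computation can be sketched as follows. This is an illustration under stated assumptions, not PREA's code: plain arrays stand in for SparseMatrix and SparseVector, the names AverageSketch and averages are hypothetical, and defaultValue is assumed to be the fallback for a user or item with no ratings in the train set (the documentation does not define it).

```java
import java.util.Arrays;

// Sketch only: plain arrays stand in for PREA's SparseMatrix and SparseVector.
class AverageSketch {
    /**
     * Averages the ratings in each group, where a group is a user or an item.
     * ids[i] is the user (or item) index of the i-th train rating.
     * Assumption: defaultValue is used for groups with no train ratings.
     */
    static double[] averages(int[] ids, double[] values, int groupCount, double defaultValue) {
        double[] sum = new double[groupCount];
        int[] count = new int[groupCount];
        for (int i = 0; i < ids.length; i++) {
            sum[ids[i]] += values[i];
            count[ids[i]]++;
        }
        double[] avg = new double[groupCount];
        for (int g = 0; g < groupCount; g++) {
            avg[g] = count[g] > 0 ? sum[g] / count[g] : defaultValue;
        }
        return avg;
    }

    public static void main(String[] args) {
        // Three train ratings as (user, item, value) triples.
        int[] users = {0, 0, 1};
        int[] items = {0, 1, 0};
        double[] values = {4.0, 2.0, 5.0};
        // User 2 has no ratings, so it falls back to the default of 3.5.
        System.out.println(Arrays.toString(averages(users, values, 3, 3.5))); // [3.0, 5.0, 3.5]
        System.out.println(Arrays.toString(averages(items, values, 2, 3.5))); // [4.5, 2.0]
    }
}
```

Computing the averages only after the split keeps test ratings out of the statistics, which matches the restriction that testMatrix is not consulted during training.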


getTestMatrix

public SparseMatrix getTestMatrix()
Getter method for rating matrix with test data.

Returns:
Rating matrix with test data.

getUserRateAverage

public SparseVector getUserRateAverage()
Getter method for the average rating of each user.

Returns:
A sparse vector with each user's average rating.

getItemRateAverage

public SparseVector getItemRateAverage()
Getter method for the average rating of each item.

Returns:
A sparse vector with each item's average rating.