Class MemoryBased

java.lang.Object
  extended by MemoryBased

public class MemoryBased
extends java.lang.Object

This class implements memory-based collaborative filtering (CF) algorithms, including user-based CF and item-based CF (Sarwar et al., UAI 1998).

Since:
2011. 7. 12
Version:
20110712
Author:
Joonseok Lee

Field Summary
 double defaultValue
          The default voting value, if used.
 boolean defaultVote
          Indicates whether the default vote value is used.
static int INVERSE_USER_FREQUENCY
          Similarity Measure Code for Inverse User Frequency
static int ITEM_BASED
          Algorithm Code for Item-based CF
 int itemCount
          The number of items.
 SparseVector itemRateAverage
          Average of ratings for each item.
 java.lang.String itemSimilarityFileName
          The name of the pre-calculated item similarity file, if one is used.
 boolean itemSimilarityPrefetch
          Indicates whether a pre-calculated item similarity file is used.
 int maxValue
          The maximum rating value in the dataset.
static int MEAN_ABS_DIFF
          Similarity Measure Code for Mean Absolute Difference (MAD)
static int MEAN_SQUARE_DIFF
          Similarity Measure Code for Mean Squared Difference (MSD)
 int minValue
          The minimum rating value in the dataset.
 int neighborSize
          The number of neighbors used for estimation.
static int PEARSON_CORR
          Similarity Measure Code for Pearson Correlation
 int rateCount
          The total number of ratings in the rating matrix.
 SparseMatrix rateMatrix
          Rating matrix for each user (row) and item (column)
 int similarityMethod
          The method code for similarity measure.
static int SIMPLE_WEIGHTED_AVG
          Estimation Method Code for Simple Weighted Average
 SparseMatrix testMatrix
          Rating matrix for test items.
static int USER_BASED
          Algorithm Code for User-based CF
 int userCount
          The number of users.
 SparseVector userRateAverage
          Average of ratings for each user.
 java.lang.String userSimilarityFileName
          The name of the pre-calculated user similarity file, if one is used.
 boolean userSimilarityPrefetch
          Indicates whether a pre-calculated user similarity file is used.
static int VECTOR_COS
          Similarity Measure Code for Vector Cosine
static int WEIGHTED_SUM
          Estimation Method Code for Weighted Sum
 
Constructor Summary
MemoryBased(SparseMatrix rm, SparseMatrix tm, SparseVector ura, SparseVector ira, int uc, int ic, int max, int min, int ns, boolean usp, boolean isp, java.lang.String usfn, java.lang.String isfn, int sim, boolean df, double dv)
          Construct a memory-based model with the given data.
 
Method Summary
 double estimation(boolean rowOriented, int activeIndex, int targetIndex, int[] ref, int refCount, double[] refWeight, int method)
          Estimate a rating based on neighborhood data.
 EvaluationMetrics evaluate(int method)
          Evaluate the designated algorithm with the given test data.
 SparseVector itemBased(int userNo, int[] testItemIndex, int k, SparseMatrix itemSim)
          Predict ratings for a given user regarding given set of items, by item-based CF algorithm.
private  EvaluationMetrics itemBasedEvaluate()
          Evaluate the item-based CF algorithm with the given test data.
private  SparseMatrix readItemSimData(int[] validationItemSet)
          Read the pre-calculated item similarity data file.
 double similarity(boolean rowOriented, SparseVector i1, SparseVector i2, double i1Avg, double i2Avg, int method)
          Calculate similarity between two given vectors.
 SparseVector userBased(int userNo, int[] testItemIndex, int k, double[] userSim)
          Predict ratings for a given user regarding given set of items, by user-based CF algorithm.
private  EvaluationMetrics userBasedEvaluate()
          Evaluate the user-based CF algorithm with the given test data.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

USER_BASED

public static final int USER_BASED
Algorithm Code for User-based CF

See Also:
Constant Field Values

ITEM_BASED

public static final int ITEM_BASED
Algorithm Code for Item-based CF

See Also:
Constant Field Values

PEARSON_CORR

public static final int PEARSON_CORR
Similarity Measure Code for Pearson Correlation

See Also:
Constant Field Values

VECTOR_COS

public static final int VECTOR_COS
Similarity Measure Code for Vector Cosine

See Also:
Constant Field Values

MEAN_SQUARE_DIFF

public static final int MEAN_SQUARE_DIFF
Similarity Measure Code for Mean Squared Difference (MSD)

See Also:
Constant Field Values

MEAN_ABS_DIFF

public static final int MEAN_ABS_DIFF
Similarity Measure Code for Mean Absolute Difference (MAD)

See Also:
Constant Field Values

INVERSE_USER_FREQUENCY

public static final int INVERSE_USER_FREQUENCY
Similarity Measure Code for Inverse User Frequency

See Also:
Constant Field Values

WEIGHTED_SUM

public static final int WEIGHTED_SUM
Estimation Method Code for Weighted Sum

See Also:
Constant Field Values

SIMPLE_WEIGHTED_AVG

public static final int SIMPLE_WEIGHTED_AVG
Estimation Method Code for Simple Weighted Average

See Also:
Constant Field Values

rateMatrix

public SparseMatrix rateMatrix
Rating matrix for each user (row) and item (column)


testMatrix

public SparseMatrix testMatrix
Rating matrix for test items. It must not be referenced during the training phase.


userRateAverage

public SparseVector userRateAverage
Average of ratings for each user.


itemRateAverage

public SparseVector itemRateAverage
Average of ratings for each item.


userCount

public int userCount
The number of users.


itemCount

public int itemCount
The number of items.


rateCount

public int rateCount
The total number of ratings in the rating matrix.


maxValue

public int maxValue
The maximum rating value in the dataset.


minValue

public int minValue
The minimum rating value in the dataset.


neighborSize

public int neighborSize
The number of neighbors used for estimation.


userSimilarityPrefetch

public boolean userSimilarityPrefetch
Indicates whether a pre-calculated user similarity file is used.


itemSimilarityPrefetch

public boolean itemSimilarityPrefetch
Indicates whether a pre-calculated item similarity file is used.


userSimilarityFileName

public java.lang.String userSimilarityFileName
The name of the pre-calculated user similarity file, if one is used.


itemSimilarityFileName

public java.lang.String itemSimilarityFileName
The name of the pre-calculated item similarity file, if one is used.


defaultVote

public boolean defaultVote
Indicates whether the default vote value is used.


defaultValue

public double defaultValue
The default voting value, if used.


similarityMethod

public int similarityMethod
The method code for similarity measure.

Constructor Detail

MemoryBased

public MemoryBased(SparseMatrix rm,
                   SparseMatrix tm,
                   SparseVector ura,
                   SparseVector ira,
                   int uc,
                   int ic,
                   int max,
                   int min,
                   int ns,
                   boolean usp,
                   boolean isp,
                   java.lang.String usfn,
                   java.lang.String isfn,
                   int sim,
                   boolean df,
                   double dv)
Construct a memory-based model with the given data.

Parameters:
rm - The rating matrix which will be used for training.
tm - The rating matrix which will be used for testing.
ura - The average of ratings for each user.
ira - The average of ratings for each item.
uc - The number of users in the dataset.
ic - The number of items in the dataset.
max - The maximum rating value in the dataset.
min - The minimum rating value in the dataset.
ns - The neighborhood size.
usp - Whether the pre-calculated user similarity file is used.
isp - Whether the pre-calculated item similarity file is used.
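usfn - The name of the pre-calculated user similarity file, if one is used.
isfn - The name of the pre-calculated item similarity file, if one is used.
sim - The method code for the similarity measure.
df - Whether to use the default vote value.
dv - The default voting value, if used.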
Method Detail

userBased

public SparseVector userBased(int userNo,
                              int[] testItemIndex,
                              int k,
                              double[] userSim)
Predict ratings for a given user regarding given set of items, by user-based CF algorithm.

Parameters:
userNo - The user ID.
testItemIndex - The list of items whose ratings will be predicted.
k - The neighborhood size.
userSim - The similarity vector between the target user and all the other users.
Returns:
The predicted ratings for each item.
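
The role of k and userSim can be illustrated with a small, self-contained sketch of neighbor selection. It is not taken from this class: the class name NeighborSelectionSketch and the method topKNeighbors are hypothetical, and it assumes that userSim[u] holds the similarity between the active user and user u.

```java
import java.util.Comparator;
import java.util.stream.IntStream;

// Hypothetical helper, not part of MemoryBased.
class NeighborSelectionSketch {
    /**
     * Returns the indices of the k users most similar to the active user,
     * in descending order of similarity. The active user is excluded.
     * Assumption: userSim[u] is the similarity between the active user and user u.
     */
    static int[] topKNeighbors(double[] userSim, int activeUser, int k) {
        return IntStream.range(0, userSim.length)
                .filter(u -> u != activeUser)
                .boxed()
                .sorted(Comparator.comparingDouble((Integer u) -> userSim[u]).reversed())
                .limit(k)
                .mapToInt(Integer::intValue)
                .toArray();
    }
}
```

Sorting the full candidate list keeps the sketch short; for large user counts a bounded priority queue would avoid sorting every user.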

itemBased

public SparseVector itemBased(int userNo,
                              int[] testItemIndex,
                              int k,
                              SparseMatrix itemSim)
Predict ratings for a given user regarding given set of items, by item-based CF algorithm.

Parameters:
userNo - The user ID.
testItemIndex - The list of items whose ratings will be predicted.
k - The neighborhood size.
itemSim - The similarity matrix containing the similarity between every pair of items.
Returns:
The predicted ratings for each item.

evaluate

public EvaluationMetrics evaluate(int method)
Evaluate the designated algorithm with the given test data.

Parameters:
method - The code of the algorithm to be tested. It must be one of the following: USER_BASED or ITEM_BASED.
Returns:
The result of evaluation, such as MAE, RMSE, and rank-score.
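
For reference, MAE and RMSE can be computed as in the sketch below. It uses plain arrays rather than EvaluationMetrics, whose interface is not shown here; the class name ErrorMetricsSketch and its methods are hypothetical, and both arrays are assumed to have the same length with one entry per test rating.

```java
// Hypothetical helper, not part of MemoryBased or EvaluationMetrics.
class ErrorMetricsSketch {
    /** Mean absolute error: the average of |actual - predicted|. */
    static double mae(double[] actual, double[] predicted) {
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(actual[i] - predicted[i]);
        }
        return sum / actual.length;
    }

    /** Root mean squared error: the square root of the average squared error. */
    static double rmse(double[] actual, double[] predicted) {
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double diff = actual[i] - predicted[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum / actual.length);
    }
}
```

RMSE penalizes large individual errors more heavily than MAE, so the two values diverge when a few predictions are far off.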

userBasedEvaluate

private EvaluationMetrics userBasedEvaluate()
Evaluate the user-based CF algorithm with the given test data.

Returns:
The result of evaluation, such as MAE, RMSE, and rank-score.

itemBasedEvaluate

private EvaluationMetrics itemBasedEvaluate()
Evaluate the item-based CF algorithm with the given test data.

Returns:
The result of evaluation, such as MAE, RMSE, and rank-score.

similarity

public double similarity(boolean rowOriented,
                         SparseVector i1,
                         SparseVector i2,
                         double i1Avg,
                         double i2Avg,
                         int method)
Calculate similarity between two given vectors.

Parameters:
rowOriented - Use true if user-based, false if item-based.
i1 - The first vector to calculate similarity.
i2 - The second vector to calculate similarity.
i1Avg - The average of elements in the first vector.
i2Avg - The average of elements in the second vector.
method - The code of similarity measure to be used. It can be one of the following: PEARSON_CORR, VECTOR_COS, MEAN_SQUARE_DIFF, MEAN_ABS_DIFF, or INVERSE_USER_FREQUENCY.
Returns:
The similarity value between two vectors i1 and i2.
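
As an illustration of two of the measures named above, the following sketch computes Pearson correlation and vector cosine on dense arrays. It uses the common textbook definitions, which may differ in detail from this class. The class name SimilaritySketch is hypothetical, and the use of 0.0 to mark a missing rating is an assumption of the sketch, standing in for the absent entries of a SparseVector.

```java
// Hypothetical helper, not part of MemoryBased.
class SimilaritySketch {
    /**
     * Pearson correlation over entries rated in both vectors.
     * Assumption: 0.0 marks a missing rating.
     */
    static double pearson(double[] a, double[] b, double aAvg, double bAvg) {
        double num = 0.0, sumSqA = 0.0, sumSqB = 0.0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] == 0.0 || b[i] == 0.0) continue; // skip entries not rated on both sides
            double da = a[i] - aAvg;
            double db = b[i] - bAvg;
            num += da * db;
            sumSqA += da * da;
            sumSqB += db * db;
        }
        double denom = Math.sqrt(sumSqA * sumSqB);
        return denom == 0.0 ? 0.0 : num / denom; // no usable overlap: report zero similarity
    }

    /** Vector cosine, treating missing ratings as 0.0. */
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        double denom = Math.sqrt(normA) * Math.sqrt(normB);
        return denom == 0.0 ? 0.0 : dot / denom;
    }
}
```

Pearson correlation subtracts each vector's average before comparing, which compensates for users who rate systematically high or low; vector cosine does not.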

estimation

public double estimation(boolean rowOriented,
                         int activeIndex,
                         int targetIndex,
                         int[] ref,
                         int refCount,
                         double[] refWeight,
                         int method)
Estimate a rating based on neighborhood data.

Parameters:
rowOriented - Use true if user-based, false if item-based.
activeIndex - The active user index for user-based CF; the item index for item-based CF.
targetIndex - The target item index for user-based CF; the user index for item-based CF.
ref - The indices of the neighbors that will be used for estimation.
refCount - The number of neighbors that will be used for estimation.
refWeight - The weight of each neighbor.
method - The code of estimation method. It can be one of the following: WEIGHTED_SUM or SIMPLE_WEIGHTED_AVG.
Returns:
The estimated rating value.
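
The two estimation methods can be sketched as follows. The formulas are the commonly used forms of a mean-offset weighted sum and a simple weighted average; this page does not state the exact formulas behind WEIGHTED_SUM and SIMPLE_WEIGHTED_AVG, so the correspondence is an assumption. The class name EstimationSketch, its methods, and the array-based parameters are hypothetical.

```java
// Hypothetical helper, not part of MemoryBased.
class EstimationSketch {
    /**
     * Mean-offset weighted sum:
     * activeAvg + sum(w_i * (r_i - avg_i)) / sum(|w_i|).
     * Falls back to activeAvg when all weights are zero.
     */
    static double weightedSum(double activeAvg, double[] neighborRatings,
                              double[] neighborAvgs, double[] weights) {
        double num = 0.0, denom = 0.0;
        for (int i = 0; i < weights.length; i++) {
            num += weights[i] * (neighborRatings[i] - neighborAvgs[i]);
            denom += Math.abs(weights[i]);
        }
        return denom == 0.0 ? activeAvg : activeAvg + num / denom;
    }

    /**
     * Simple weighted average: sum(w_i * r_i) / sum(|w_i|).
     * Returns NaN when all weights are zero, since no estimate is available.
     */
    static double simpleWeightedAvg(double[] neighborRatings, double[] weights) {
        double num = 0.0, denom = 0.0;
        for (int i = 0; i < weights.length; i++) {
            num += weights[i] * neighborRatings[i];
            denom += Math.abs(weights[i]);
        }
        return denom == 0.0 ? Double.NaN : num / denom;
    }
}
```

The mean-offset form needs the average rating of each neighbor and of the active index, while the simple weighted average needs only the neighbor ratings and weights.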

readItemSimData

private SparseMatrix readItemSimData(int[] validationItemSet)
Read the pre-calculated item similarity data file. Make sure that the similarity file is compatible with the split file you are using, for a fair comparison.

Parameters:
validationItemSet - The list of items which will be used for validation.
Returns:
The item similarity matrix.