prea.main
Class Splitter

java.lang.Object
  extended by prea.main.Splitter

public class Splitter
extends java.lang.Object

This class saves the train/test split and the similarity prefetch files. These files can be reused to repeat experiments in the same environment.

Since:
2012. 4. 20
Version:
1.1
Author:
Joonseok Lee

Field Summary
static java.lang.String[] columnName
          The list of item names, provided with the dataset.
static int itemCount
          The number of items.
static SparseVector itemRateAverage
          Average rating for each item.
static int MEAN_ABS_DIFF
          Similarity measure code for Mean Absolute Difference.
static int MEAN_SQUARE_DIFF
          Similarity measure code for Mean Squared Difference.
static int PEARSON_CORR
          Similarity measure code for Pearson Correlation.
static SparseMatrix rateMatrix
          Rating matrix for the training dataset.
static double testRatio
          Ratio of the dataset to be used for testing.
static int userCount
          The number of users.
static SparseVector userRateAverage
          Average rating for each user.
static int VECTOR_COS
          Similarity measure code for Vector Cosine.
 
Constructor Summary
Splitter()
           
 
Method Summary
static void main(java.lang.String[] argv)
          Main method: reads the ARFF file and writes the split and similarity results.
private static void readArff(java.lang.String fileName)
          Read the data file in ARFF format and store it in the rating matrix.
private static double similarity(boolean rowOriented, SparseVector i1, SparseVector i2, double i1Avg, double i2Avg, int method)
          Calculate similarity between two given vectors.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

testRatio

public static double testRatio
Ratio of the dataset to be used for testing.
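
A minimal sketch of how a test ratio can drive a train/test split. It is not taken from the Splitter source: the class SplitSketch, the Rating type, the seed parameter, and the per-rating random assignment are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical helper; not part of prea.main.Splitter. */
final class SplitSketch {
    /** One observed rating as a (user, item, value) triple. */
    static final class Rating {
        final int user;
        final int item;
        final double value;

        Rating(int user, int item, double value) {
            this.user = user;
            this.item = item;
            this.value = value;
        }
    }

    /**
     * Sends each rating to the test list with probability testRatio
     * and keeps the rest in the train list.
     * Returns a two-element list: index 0 is train, index 1 is test.
     */
    static List<List<Rating>> split(List<Rating> all, double testRatio, long seed) {
        if (testRatio < 0.0 || testRatio > 1.0) {
            throw new IllegalArgumentException("testRatio must be in [0, 1]");
        }
        Random rnd = new Random(seed); // fixed seed makes the split repeatable
        List<Rating> train = new ArrayList<>();
        List<Rating> test = new ArrayList<>();
        for (Rating r : all) {
            if (rnd.nextDouble() < testRatio) {
                test.add(r);
            } else {
                train.add(r);
            }
        }
        List<List<Rating>> result = new ArrayList<>();
        result.add(train);
        result.add(test);
        return result;
    }
}
```

With a fixed seed the same input yields the same split, which matches the stated goal of repeating experiments in the same environment. The realized test share is only approximately testRatio, because each rating is assigned independently.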


PEARSON_CORR

public static final int PEARSON_CORR
Similarity measure code for Pearson Correlation.

See Also:
Constant Field Values

VECTOR_COS

public static final int VECTOR_COS
Similarity measure code for Vector Cosine.

See Also:
Constant Field Values

MEAN_SQUARE_DIFF

public static final int MEAN_SQUARE_DIFF
Similarity measure code for Mean Squared Difference.

See Also:
Constant Field Values

MEAN_ABS_DIFF

public static final int MEAN_ABS_DIFF
Similarity measure code for Mean Absolute Difference.

See Also:
Constant Field Values

rateMatrix

public static SparseMatrix rateMatrix
Rating matrix for the training dataset.


userRateAverage

public static SparseVector userRateAverage
Average rating for each user.


itemRateAverage

public static SparseVector itemRateAverage
Average rating for each item.
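
As an illustration of the per-user and per-item averages, the sketch below computes both from a plain array of rating triples. It does not use the PREA SparseVector or SparseMatrix classes, whose APIs are not shown on this page. The class AverageSketch, the 0-based indices, and the choice of 0.0 for users or items without ratings are assumptions.

```java
/** Hypothetical helper; not part of prea.main.Splitter. */
final class AverageSketch {
    /**
     * Averages rating values grouped by one column of the triples.
     *
     * @param ratings   rows of {userIndex, itemIndex, value}, indices 0-based
     * @param keyColumn 0 to average per user, 1 to average per item
     * @param size      number of users (or items)
     * @return average rating per key; 0.0 where a key has no ratings
     */
    static double[] averageBy(double[][] ratings, int keyColumn, int size) {
        double[] sum = new double[size];
        int[] count = new int[size];
        for (double[] r : ratings) {
            int key = (int) r[keyColumn];
            sum[key] += r[2];
            count[key]++;
        }
        double[] avg = new double[size];
        for (int i = 0; i < size; i++) {
            avg[i] = count[i] == 0 ? 0.0 : sum[i] / count[i];
        }
        return avg;
    }
}
```

Calling averageBy(ratings, 0, userCount) would correspond to the role of userRateAverage, and averageBy(ratings, 1, itemCount) to that of itemRateAverage.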


columnName

public static java.lang.String[] columnName
The list of item names, provided with the dataset.


userCount

public static int userCount
The number of users.


itemCount

public static int itemCount
The number of items.

Constructor Detail

Splitter

public Splitter()
Method Detail

main

public static void main(java.lang.String[] argv)
Main method: reads the ARFF file and writes the split and similarity results.

Parameters:
argv - The argument list. The first two arguments are required: the input file name and the test set ratio. The next two are optional and indicate whether similarity is computed and printed for users and for items.
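
The argument layout described above could be parsed roughly as follows. This is a sketch, not the actual main implementation: the class ArgsSketch and its field names are hypothetical, and the page does not state how the two optional arguments are encoded, so the sketch assumes the strings "true" and "false" and a ratio between 0 and 1.

```java
/** Hypothetical holder for the parsed arguments; not part of prea.main.Splitter. */
final class ArgsSketch {
    final String inputFile;
    final double testRatio;
    final boolean userSimilarity;
    final boolean itemSimilarity;

    private ArgsSketch(String inputFile, double testRatio,
                       boolean userSimilarity, boolean itemSimilarity) {
        this.inputFile = inputFile;
        this.testRatio = testRatio;
        this.userSimilarity = userSimilarity;
        this.itemSimilarity = itemSimilarity;
    }

    /** Parses: inputFile testRatio [userSimilarity] [itemSimilarity]. */
    static ArgsSketch parse(String[] argv) {
        if (argv.length < 2) {
            throw new IllegalArgumentException(
                "usage: <inputFile> <testRatio> [userSimilarity] [itemSimilarity]");
        }
        double ratio = Double.parseDouble(argv[1]);
        if (!(ratio >= 0.0 && ratio <= 1.0)) { // assumed valid range
            throw new IllegalArgumentException("testRatio must be in [0, 1]");
        }
        // Assumed encoding: optional flags default to false when omitted.
        boolean userSim = argv.length > 2 && Boolean.parseBoolean(argv[2]);
        boolean itemSim = argv.length > 3 && Boolean.parseBoolean(argv[3]);
        return new ArgsSketch(argv[0], ratio, userSim, itemSim);
    }
}
```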

similarity

private static double similarity(boolean rowOriented,
                                 SparseVector i1,
                                 SparseVector i2,
                                 double i1Avg,
                                 double i2Avg,
                                 int method)
Calculate similarity between two given vectors.

Parameters:
rowOriented - Use true if user-based, false if item-based.
i1 - The first vector to calculate similarity.
i2 - The second vector to calculate similarity.
i1Avg - The average of elements in the first vector.
i2Avg - The average of elements in the second vector.
method - The code of the similarity measure to be used. It can be one of the following: PEARSON_CORR, VECTOR_COS, MEAN_SQUARE_DIFF, MEAN_ABS_DIFF, or INVERSE_USER_FREQUENCY (the last is not declared as a field of this class).
Returns:
The similarity value between two vectors i1 and i2.
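
To make the four measures declared in this class concrete, here is a hedged sketch that computes them over the indices present in both vectors. It is not the Splitter implementation. Vectors are modeled as Map<Integer, Double> because the SparseVector API is not shown on this page, the constant values are placeholders (the real ones are on the Constant Field Values page), the rowOriented parameter is omitted, and the two difference measures are returned as raw mean differences without any conversion to a similarity score.

```java
import java.util.Map;

/** Hypothetical helper; not part of prea.main.Splitter. */
final class SimilaritySketch {
    // Placeholder codes; the actual constant values are not given on this page.
    static final int PEARSON_CORR = 1;
    static final int VECTOR_COS = 2;
    static final int MEAN_SQUARE_DIFF = 3;
    static final int MEAN_ABS_DIFF = 4;

    static double similarity(Map<Integer, Double> v1, Map<Integer, Double> v2,
                             double avg1, double avg2, int method) {
        if (method < PEARSON_CORR || method > MEAN_ABS_DIFF) {
            throw new IllegalArgumentException("unknown method code: " + method);
        }
        double num = 0.0;  // numerator, or running sum of differences
        double den1 = 0.0; // squared norm of the first vector
        double den2 = 0.0; // squared norm of the second vector
        int common = 0;    // number of indices present in both vectors
        for (Map.Entry<Integer, Double> e : v1.entrySet()) {
            Double other = v2.get(e.getKey());
            if (other == null) {
                continue; // only co-rated entries contribute
            }
            double a = e.getValue();
            double b = other;
            common++;
            switch (method) {
                case PEARSON_CORR:
                    num += (a - avg1) * (b - avg2);
                    den1 += (a - avg1) * (a - avg1);
                    den2 += (b - avg2) * (b - avg2);
                    break;
                case VECTOR_COS:
                    num += a * b;
                    den1 += a * a;
                    den2 += b * b;
                    break;
                case MEAN_SQUARE_DIFF:
                    num += (a - b) * (a - b);
                    break;
                default: // MEAN_ABS_DIFF
                    num += Math.abs(a - b);
                    break;
            }
        }
        if (common == 0) {
            return 0.0; // no overlap: nothing to compare
        }
        if (method == PEARSON_CORR || method == VECTOR_COS) {
            double den = Math.sqrt(den1 * den2);
            return den == 0.0 ? 0.0 : num / den;
        }
        return num / common; // mean squared or mean absolute difference
    }
}
```

Restricting every sum to co-rated entries is one common convention for sparse rating vectors. Other conventions, such as computing the cosine norms over all entries of each vector, give different values.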

readArff

private static void readArff(java.lang.String fileName)
Read the data file in ARFF format and store it in the rating matrix. Peripheral information, such as max/min values and user/item counts, is also set in this method.

Parameters:
fileName - The name of data file.
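
The following sketch shows one way to read an ARFF file of the kind described here. It is an illustration under assumptions, not the readArff implementation: the class ArffSketch is hypothetical, the data section is assumed to use ARFF sparse rows of the form {index value, index value}, and attribute names are assumed to contain no spaces or quotes. Whether the PREA data files follow this exact layout is not stated on this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical ARFF reader; not part of prea.main.Splitter. */
final class ArffSketch {
    /** Names from the @ATTRIBUTE lines, in declaration order. */
    final List<String> attributeNames = new ArrayList<>();
    /** One sparse row per data line: attribute index to value. */
    final List<Map<Integer, Double>> rows = new ArrayList<>();

    static ArffSketch parse(Iterable<String> lines) {
        ArffSketch result = new ArffSketch();
        boolean inData = false;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // skip blank lines and ARFF comments
            }
            if (!inData) {
                if (line.regionMatches(true, 0, "@attribute", 0, 10)) {
                    String[] parts = line.split("\\s+");
                    if (parts.length > 1) {
                        result.attributeNames.add(parts[1]);
                    }
                } else if (line.regionMatches(true, 0, "@data", 0, 5)) {
                    inData = true;
                }
                continue; // other header lines such as @relation are ignored
            }
            if (!line.startsWith("{") || !line.endsWith("}")) {
                throw new IllegalArgumentException("expected a sparse row: " + line);
            }
            Map<Integer, Double> row = new TreeMap<>();
            String body = line.substring(1, line.length() - 1).trim();
            if (!body.isEmpty()) {
                for (String pair : body.split(",")) {
                    String[] kv = pair.trim().split("\\s+");
                    row.put(Integer.parseInt(kv[0]), Double.parseDouble(kv[1]));
                }
            }
            result.rows.add(row);
        }
        return result;
    }
}
```

The lines of a file can be obtained with java.nio.file.Files.readAllLines before calling parse. Counts and max/min values of the kind mentioned above could then be derived from attributeNames and rows.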