Tutorial

Table of Contents #

How to Setup and Complie
- Setting up with Eclipse
- Compiling on Command Line
How to Run
Using in Other Tools
- Using in Matlab

How to Setup and Compile #

Note that this tutorial is based on latest version (Helios Service Release 1, Build id: 20100917-0705) of Eclipse IDE. If you are using older version, you may see somewhat different screen. Please update your Eclipse in that case.

Setting up with Eclipse

Download Eclipse Project file.
Run Eclipse, and select "File > Import". You will see the following window.
Choose "Existing Projects into Workspace", then press "Next" button. You will see the following window.
Choose a radio button "Select archive file:", and select the zip file downloaded by step 1.
Make sure that both "Prea" and "UJMP_src" in the "Projects:" box are checked.
Press the button "Finish". Then, you can see that the project is added in your workspace.
(If you do not use auto-compile option) Select "Project > Build Project" to compile the project.
Download dataset from Download section and unzip them in the Prea directory.

Compiling on Command Line

Download source code file.
Unzip the downloaded file in a directory, and locate yourself in the same directory.
Run the next command. This automatically compiles all files needed to run this program.
> javac prea/main/Prea.java
> javac prea/main/PreaRank.java
* Note that we have two files containing main method. RankBasedSVD and PairedGlobalLLORMA are executed with PreaRank, and all other methods are done with Prea.

How to Run #

Running the Program

Make sure that you successfully compiled the source code, or have executables in the directory.
First, you can run the program without specifying anything. In this case, the program runs all algorithms with a default demo data file "netflix_3m1k", bundled with the install file.

> java prea/main/Prea

For more advanced use, you can specify three kinds of arguments. More than one argument can be used simultaneously, independent to each other. For an argument which was not specified, the default value in the table below would be applied. Detailed explanation about data split can be found in Documentation section.

Options	Syntax	Default value	Example
Input file	-f [File Name]	netflix_3m1k	java Prea -f movieLens_1M
Data Split	-s simple [Test ratio]	simple, 0.2	java Prea -s simple 0.2
	-s pred [Split file name]		java Prea -s pred netflix_3m1k_split.txt
	-s kcv [Fold count]		java Prea -s kcv 5
	-s rank [Train point per user]*		java Prea -s rank 20
Algorithm	-a [Algorithm code. See below.]	(Run all.)	java Prea -a regsvd [Parameters of the algorithm. See below.]

* This number contains the number of data points used for validation set.

As mentioned above, you can apply more than one option such like

> java prea/main/Prea -s kcv 5 -f movieLens_100K

[data file]: the name of data file, without the file extension.
Ex) netflix_3m1k for netflix_3m1k.arff.

RankBasedSVD and PairedGlobalLLORMA are executed with PreaRank. For example,

> java prea/main/PreaRank -s rank 20 -f movieLens_100K

The table below lists algorithm code and parameters.

Algorithm	Keyword	Parameter List Example
Constant	const	java prea/main/Prea -a const
Overall Average	allavg	java prea/main/Prea -a allavg
User Average	useravg	java prea/main/Prea -a useravg
Item Average	itemavg	java prea/main/Prea -a itemavg
Random	random	java prea/main/Prea -a random
User-based CF	userbased	java prea/main/Prea -a userbased [neighbor size(k)] [similarity method]* [(optional) default] [default value] [(optional) usersim]** [user similarity prefetch file name]
		java prea/main/Prea -a userbased 50 pearson java prea/main/Prea -a userbased 50 pearson default 3.0 java prea/main/Prea -s pred netflix_3m1k_split.txt -a userbased 50 mad default 3.0 usersim netflix_3m1k_userSim.txt
Item-based CF	itembased	java prea/main/Prea -a itembased [neighbor size(k)] [similarity method]* [(optional) default] [default value] [(optional) itemsim]** [item similarity prefetch file name]
		java prea/main/Prea -a itembased 50 cosine java prea/main/Prea -a itembased 50 cosine default 3.0 java prea/main/Prea -s pred netflix_3m1k_split.txt -a itembased 50 msd default 3.0 itemsim movielens_1M_itemSim.txt
Slope One	slopeone	java prea/main/Prea -a slopeone
Regularized SVD	regsvd	java prea/main/Prea -a regsvd [feature count] [learning rate] [regularizer] [max iteration]
		java prea/main/Prea -a regsvd 10 0.005 0.1 200
NMF	nmf	java prea/main/Prea -a nmf [feature count] [regularizer] [max iteration]
		java prea/main/Prea -a nmf 100 0.0001 5
PMF	pmf	java prea/main/Prea -a pmf [feature count] [learning rate] [regularizer] [momentum] [max iteration]
		java prea/main/Prea -a pmf 10 50 0.4 0.8 20
Bayesian PMF	bpmf	java prea/main/Prea -a bpmf [feature count] [max iteration]
		java prea/main/Prea -a bpmf 2 20
Nonlinear PMF	nlpmf	java prea/main/Prea -a nlpmf [feature count] [learning rate] [momentum] [max iteration] [kernel inverse width] [kernel variance RBF] [kernel variance bias] [kernel variance white]
		java prea/main/Prea -a nlpmf 10 0.0001 0.9 2 1 1 0.11 5
Fast NPCA	npca	java prea/main/Prea -a npca [validation ratio] [max iteration]
		java prea/main/Prea -a npca 0.15 5
Rank-based CF	rank	java prea/main/Prea -a rank [kernel width]
		java prea/main/Prea -a rank 1.0
Singleton Global LLORMA	sgllorma	java prea/main/Prea -a sgllorma [feature count] [learning rate] [regularizer] [max iteration] [model count]
		java prea/main/Prea -a sgllorma 5 0.1 0.001 100 50
Singleton Parallel LLORMA	spllorma	java prea/main/Prea -a spllorma [feature count] [learning rate] [regularizer] [max iteration] [model count] [thread count]
		java prea/main/Prea -a spllorma 10 0.01 0.001 100 50 8
Rank-based SVD	ranksvd	java prea/main/Prea -a ranksvd [loss code]*** [local model rank] [learning rate]
		java prea/main/Prea -a ranksvd log_mult 1 1500
Paired Global LLORMA	pgllorma	java prea/main/Prea -a pgllorma [loss code]*** [local model count] [local model rank] [learning rate]
		java prea/main/Prea -a pgllorma log_mult 5 1 1500

* Similarity method should be one of {pearson, cosine, msd, mad, invuserfreq}.
** Similarity prefetch file should be used together with corresponding train/test split file for accurate test.
*** log_mult

, log_add

,
exp_mult

, exp_add

,
hinge_mult

, hinge_add

,
abs

, sqr

,
expreg

,
l1reg

How to Read Result

The experimental result is shown in console with plain text. First, it shows properties of the data file, including file name, user/item count, and rating density. Each row represents each algorithm, and each column lists various evaluation metrics. Note that for HLU and NDCG, higher values mean better performance, while lower values are better for all other metrics. The following figure shows an example of execution with Netflix Tiny set.

* Note that exact values can be different for each run since train/test split is randomized as well as some algorithms have randomized component.

Tip: Increasing Heap Memory

As the data file grows bigger, this program requires lots of memory space. You may encounter the following message:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

In this case, the following Java option can be useful:

> java -Xms1024m -Xmx2048m Prea ...

This option means heap size is initially set to 1024MB (1GB), and can be expanded upto 2048MB (2GB). You may set both size equally.

Using in Other Tools #

PREA provides wrapper functions for other popular tools such as Matlab. With these, it is possible to conduct an experiment by calling PREA in other tools.

Using in Matlab

Download source code file, and unzip it in a directory.
Locate the Matlab current directory to the directory. In GUI version, you can use the following red box. Make sure that there are all executable files and two matlab files (matlab2arff.m, run.m).
Load the data file in Matlab. The matrix should in the format of users (row) by items (column). If it is reversed, it is necessary to transpose it at this point.

Run the wrapper function with the following formats.

The simplest format specifies only matrix name in Matlab. In this case, it runs constant model (Median) as a default. For example,

prea(movielens);
You can specify the output ARFF file from the Matlab matrix, if you want to use it again later. For example, the following code will create "movielens.arff" file in the directory.

prea(movielens, 'movielens');
With another parameter, you can specify a CF algorithm keyword to be used. For example,
prea(movielens, 'movielens', 'itembased');
prea(movielens, 'movielens', 'bpmf');
prea(movielens, 'movielens', 'npca');

Lastly, you can specify arguments for each algorithm. This takes two steps; first, you should make "options" with parameters you want to specify, then, add it as a parameter to "run" function. (Without specifying these parameters just as the previous case, default parameters are used.)

options = struct('feature_count', 5, 'maxiter', 20);
prea(movielens, 'movielens', 'bpmf');

The list of available options for each algorithm is as follows:

Algorithm	Keyword	Parameter List
Constant	const	None
Overall Average	allavg	None
User Average	useravg	None
Item Average	itemavg	None
Random	random	None
User-based CF	userbased	neighbor_size
Item-based CF	itembased	neighbor_size
Slope One	slopeone	None
Regularized SVD	regsvd	feature_count, learning_rate, regularizer, maxiter
NMF	nmf	feature_count, learning_rate, maxiter
PMF	pmf	feature_count, learning_rate, regularizer, momentum, maxiter
Bayesian PMF	bpmf	feature_count, maxiter
Nonlinear PMF	nlpmf	feature_count, learning_rate, momentum, maxiter, kernel_inverse_width, kernel_var_rbf, kernel_var_bias, kernel_var_white
Fast NPCA	npca	validation_ratio, maxiter
Rank-based CF	rank	kernel_width
Singleton Global LLORMA	sgllorma	feature_count, learning_rate, regularizer, maxiter, maxmodel
Singleton Parallel LLORMA	spllorma	feature_count, learning_rate, regularizer, maxiter, maxmodel, thread

The output from the toolkit is shown in Matlab screen.