Recommendation

The recommendation dataset should be in the "User x Item" format, a table where users are listed in the rows and items (products, movies, music, etc.) are listed in the columns. Each cell in the table contains the rating (score or evaluation) given by a user to an item. If the user has not rated the item, the cell should be 0.

STRUCTURE

  1. Rows: Each row represents a user, for example:

    • User 1 → Row 1

    • User 2 → Row 2

    • User 3 → Row 3

  2. Columns: Each column represents an item, for example:

    • Item A → Column 1

    • Item B → Column 2

    • Item C → Column 3

  3. Values: Each cell contains the score/rating that the user gave to the item:

    • A higher number means the user liked the item more (e.g., 5 on a scale from 1 to 5).

    • A lower number means the user liked the item less (e.g., 1 or 2).

    • Cells with 0 indicate that the user did not rate that item.
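
As a rough sketch of this structure, written in Python rather than Delphi, the function below builds the matrix from individual (user, item, rating) records and leaves 0 in every cell that has no rating. The record format and the `build_matrix` name are hypothetical and are not part of DelphAI; the ratings reproduce the values of the EXAMPLE table on this page.

```python
def build_matrix(ratings, users, items):
    """Return one row per user and one column per item; unrated cells stay 0."""
    user_row = {user: r for r, user in enumerate(users)}   # user -> row index
    item_col = {item: c for c, item in enumerate(items)}   # item -> column index
    matrix = [[0] * len(items) for _ in users]              # 0 = not rated
    for user, item, rating in ratings:
        matrix[user_row[user]][item_col[item]] = rating
    return matrix


users = ["User 1", "User 2", "User 3", "User 4"]
items = ["ItemA", "ItemB", "ItemC", "ItemD"]
ratings = [  # hypothetical (user, item, rating) records
    ("User 1", "ItemA", 5), ("User 1", "ItemB", 3), ("User 1", "ItemD", 4),
    ("User 2", "ItemA", 4), ("User 2", "ItemC", 2),
    ("User 3", "ItemB", 1), ("User 3", "ItemC", 5), ("User 3", "ItemD", 3),
    ("User 4", "ItemA", 2),
]

matrix = build_matrix(ratings, users, items)
for row in matrix:
    print(row)
```

Starting from a matrix of zeros means the "not rated" rule holds automatically for every pair that never appears in the records.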

EXAMPLE

  ItemA | ItemB | ItemC | ItemD
  ------|-------|-------|-------
  5     | 3     | 0     | 4
  4     | 0     | 2     | 0
  0     | 1     | 5     | 3
  2     | 0     | 0     | 0

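  • ItemA, ItemB, etc.: Each column corresponds to an item.

  • Rows: Each row corresponds to a user (the first row is User 1, the second is User 2, and so on).

  • Values: Ratings given by users (e.g., on a scale from 1 to 5); 0 means the user did not rate the item.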

DATA IMPORT

All DelphAI objects can import the dataset from either a CSV file or a TDataset.

  1. CSV:

    • The CSV file must follow the same format as the table above: a header line with the item names, followed by one line per user.

    • Example of a CSV file:

      ItemA,ItemB,ItemC,ItemD
      5,3,0,4
      4,0,2,0
      0,1,5,3
      2,0,0,0

  2. TDataset/Query:

    • The dataset can be stored in a relational database.

    • Example of the result of a SELECT query on the database table:

      ItemA | ItemB | ItemC | ItemD
      ------|-------|-------|-------
      5     | 3     | NULL  | 4
      4     | NULL  | 2     | NULL
      NULL  | 1     | 5     | 3
      2     | NULL  | NULL  | NULL

    • Use an SQL query to select the data:

      SELECT * FROM UserRatings;
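
The sketch below illustrates, in Python rather than Delphi, that both import sources should describe the same matrix. It is not DelphAI's loader: Python's `sqlite3` module stands in for the database behind a TDataset, and `matrix_from_csv` and `matrix_from_db` are hypothetical helpers. This page does not say whether DelphAI converts NULL to 0 on its own, so the query assumes you want to follow the "No Missing Values" rule and wraps each column in `COALESCE`.

```python
import csv
import io
import sqlite3

# The CSV example from this section, kept inline so the sketch is self-contained.
CSV_TEXT = """ItemA,ItemB,ItemC,ItemD
5,3,0,4
4,0,2,0
0,1,5,3
2,0,0,0
"""


def matrix_from_csv(text):
    """Parse the header (item names) and one rating row per user."""
    reader = csv.reader(io.StringIO(text))
    items = next(reader)  # first line: item names
    matrix = [[int(value) for value in row] for row in reader if row]
    return items, matrix


def matrix_from_db(conn, items):
    """Select the ratings in insertion order, turning NULL (not rated) into 0."""
    # Column names come from the CSV header here; never build SQL from untrusted input.
    columns = ", ".join(f"COALESCE({c}, 0) AS {c}" for c in items)
    rows = conn.execute(f"SELECT {columns} FROM UserRatings ORDER BY rowid").fetchall()
    return [list(row) for row in rows]


# Hypothetical in-memory copy of the UserRatings table, with NULL for unrated items.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE UserRatings (ItemA INTEGER, ItemB INTEGER, ItemC INTEGER, ItemD INTEGER)"
)
conn.executemany(
    "INSERT INTO UserRatings VALUES (?, ?, ?, ?)",
    [(5, 3, None, 4), (4, None, 2, None), (None, 1, 5, 3), (2, None, None, None)],
)

items, csv_matrix = matrix_from_csv(CSV_TEXT)
db_matrix = matrix_from_db(conn, items)
print(csv_matrix == db_matrix)  # True
```

Doing the NULL-to-0 conversion in the query keeps the database free to store "no rating" as NULL while the loaded matrix still contains only numbers.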

RULES AND TIPS FOR CREATING THE DATASET

  1. Rating Scale:

    • Choose one rating scale (e.g., 1 to 5 or 1 to 10) and use it for all ratings. Because 0 is reserved for items the user did not rate, the scale should start at 1.

  2. No Missing Values:

    • If a user did not rate an item, the cell must contain 0.

  3. Matrix Position:

    • Keep a record of which user and which item from the database each row and column index corresponds to, so that the real information (such as names, IDs, etc.) can be restored after prediction.
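
A minimal Python sketch of these three rules, assuming a 1 to 5 scale: `validate` (a hypothetical helper, not a DelphAI function) rejects missing or out-of-scale cells, and two dictionaries keep the mapping from matrix positions back to real identifiers. The user names and movie titles are made-up placeholders.

```python
def validate(matrix, low=1, high=5):
    """Raise ValueError if a row is ragged or a cell is missing or off the scale."""
    width = len(matrix[0]) if matrix else 0
    for r, row in enumerate(matrix):
        if len(row) != width:
            raise ValueError(f"row {r} has {len(row)} columns, expected {width}")
        for c, value in enumerate(row):
            if value is None:  # rule 2: use 0, never leave a cell empty
                raise ValueError(f"cell ({r}, {c}) is missing; use 0 for 'not rated'")
            if value != 0 and not (low <= value <= high):  # rule 1: one scale
                raise ValueError(f"cell ({r}, {c}) = {value} is outside {low}..{high}")


# Rule 3: hypothetical real identifiers, stored alongside the matrix.
row_to_user = {0: "alice", 1: "bob", 2: "carol", 3: "dave"}
col_to_item = {0: "Movie 101", 1: "Movie 205", 2: "Movie 317", 3: "Movie 412"}

matrix = [[5, 3, 0, 4], [4, 0, 2, 0], [0, 1, 5, 3], [2, 0, 0, 0]]
validate(matrix)

# Translate matrix positions back to real names; here, the unrated cells,
# which are the ones a recommender would produce predictions for.
unrated = [
    (row_to_user[r], col_to_item[c])
    for r, row in enumerate(matrix)
    for c, value in enumerate(row)
    if value == 0
]
print(unrated[:2])
```

Keeping the maps outside the matrix lets the matrix itself stay purely numeric, as the format at the top of this page requires.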

EXAMPLE DATASET

The official repository contains a CSV file with a "User x Item" matrix for testing, together with the movie name that corresponds to each column index.
