Classification

The classification dataset is similar to the regression dataset; it is an organized table where each row represents a unique sample, and each column contains specific information about that sample. The purpose of this type of dataset is to enable an AI model to learn how to categorize or classify each sample into a specific group or class.

STRUCTURE

Rows
- Each row is an independent sample and represents something that will be classified.
- For example, if you want to create a model that identifies different types of fruit, each row in the dataset would represent a specific fruit.
Property Columns (X)
- Also known as input variables or "X"; the model uses them as the basis for its predictions.
- These columns contain the information that helps the model make decisions.
- Each column is a feature or attribute that describes the sample.
- Examples of features:
  - Weight of a fruit (in grams).
  - Size of the fruit (in cm).
  - Whether the skin is smooth (yes = 1, no = 0).
Class Column (Y)
- This is the output variable that indicates the class or category of each sample.
- The class is what we want the model to learn to predict. In the case of fruits, the class will be the name of the type of fruit, such as "apple," "banana," or "orange."
- This column can contain:
  - Strings: For example, "apple," "banana," "orange."
  - Numbers: For example, 1 = "apple," 2 = "banana," 3 = "orange."
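The two representations carry the same information, so string labels can be converted to numeric codes and back. A minimal Python sketch of that mapping follows; the helper name `encode_labels` is hypothetical and is not part of DelphAI, which may handle string classes on its own.

```python
# Illustrative only: maps string class labels to integer codes and back.
# `encode_labels` is a hypothetical helper, not a DelphAI function.

def encode_labels(labels):
    """Return (codes, mapping), numbering classes from 1 in order of first appearance."""
    mapping = {}
    codes = []
    for label in labels:
        if label not in mapping:
            mapping[label] = len(mapping) + 1
        codes.append(mapping[label])
    return codes, mapping


labels = ["apple", "banana", "orange", "banana", "apple"]
codes, mapping = encode_labels(labels)
print(codes)    # [1, 2, 3, 2, 1]
print(mapping)  # {'apple': 1, 'banana': 2, 'orange': 3}

# Invert the mapping to turn a predicted code back into a class name.
names = {code: label for label, code in mapping.items()}
print(names[2])  # banana
```

Keeping the mapping alongside the dataset matters: a numeric prediction is only meaningful if it can be translated back to the class name it stands for.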

EXAMPLE

Here is an example of a dataset for classifying fruits based on their features:

Weight | Size | Smooth skin | Fruit Type
-------|------|-------------|-----------
150    | 6.5  | 1           | Apple
120    | 12   | 0           | Banana
200    | 4.9  | 0           | Orange
180    | 29   | 0           | Watermelon

The feature columns (X) are: Weight, Size, Smooth Skin.
The class column (Y) is: Fruit Type.

Each row represents a specific fruit, with its features (weight, size, skin) and the type (class) it belongs to.

DATA IMPORT

Every DelphAI object can import the dataset from either a CSV file or a TDataset.

CSV:

The CSV file must follow the same format as the table above.

Example of a CSV file:

ParamA,ParamB,ParamC,Result
150,6.5,1,Apple
120,12,0,Banana
200,4.9,0,Orange
180,29,0,Watermelon
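To make the layout concrete, the following Python sketch reads a CSV of this shape and separates the feature columns (X) from the class column (Y). It uses only the standard library and is independent of DelphAI; the function name `split_features_and_classes` is hypothetical, and the sketch assumes a header row and a class column in the last position.

```python
import csv
import io

# Sample data shaped like the CSV example: three numeric features, class label last.
CSV_TEXT = """ParamA,ParamB,ParamC,Result
150,6.5,1,Apple
120,12,0,Banana
200,4.9,0,Orange
180,29,0,Watermelon
"""


def split_features_and_classes(csv_text):
    """Return (header, X, y): X is a list of float rows, y the list of class labels."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # assumes the first line holds the column names
    X, y = [], []
    for row in reader:
        X.append([float(value) for value in row[:-1]])  # feature columns
        y.append(row[-1])                               # class column (last)
    return header, X, y


header, X, y = split_features_and_classes(CSV_TEXT)
print(header[:-1])  # ['ParamA', 'ParamB', 'ParamC']
print(X[0])         # [150.0, 6.5, 1.0]
print(y)            # ['Apple', 'Banana', 'Orange', 'Watermelon']
```

To read from a file instead of a string, pass `open(path, newline="", encoding="utf-8")` to `csv.reader` in place of the `io.StringIO` object.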

TDataset/Query:

The dataset can be stored in a relational database.

Example of the table contents in the database:

ParamA | ParamB | ParamC | Result
-------|--------|--------|--------
150    | 6.5    | 1      | Apple
120    | 12     | 0      | Banana
200    | 4.9    | 0      | Orange
180    | 29     | 0      | Watermelon

Use an SQL query to select the data:
```
SELECT * FROM Fruits;
```
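As a rough illustration of what such a query returns, the Python sketch below builds the example table in an in-memory SQLite database and selects its rows. The database engine, the column types, and the use of Python are all assumptions made for the example; in a Delphi application the same query would be run through a TDataset descendant against your own database.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real one, and the
# table and column names mirror the example table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Fruits (ParamA REAL, ParamB REAL, ParamC INTEGER, Result TEXT)"
)
conn.executemany(
    "INSERT INTO Fruits VALUES (?, ?, ?, ?)",
    [
        (150, 6.5, 1, "Apple"),
        (120, 12, 0, "Banana"),
        (200, 4.9, 0, "Orange"),
        (180, 29, 0, "Watermelon"),
    ],
)

# Naming the columns keeps the class column last regardless of how the
# table was declared.
rows = conn.execute(
    "SELECT ParamA, ParamB, ParamC, Result FROM Fruits ORDER BY rowid"
).fetchall()
conn.close()

for row in rows:
    print(row)
```

Listing the columns explicitly instead of using `SELECT *` is a design choice in this sketch: it fixes the column order, so the class column stays in the last position even if the table definition changes.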

RULES AND TIPS FOR CREATING THE DATASET

Data Consistency:
- All rows must have the same number of columns.
- All values must be numeric, except in the class column (the last column), which may contain strings.
No Missing Values:
- Every cell must have a value (no "gaps" are allowed).
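These rules can be checked before importing the data. The sketch below is one possible pre-import check written in Python with the standard library only; `validate_dataset` is a hypothetical helper, not a DelphAI function, and it assumes the CSV has a header row and the class column in the last position.

```python
import csv
import io


def validate_dataset(csv_text):
    """Return a list of problems found; an empty list means the rules hold."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["dataset is empty"]
    header, data = rows[0], rows[1:]
    problems = []
    for line_no, row in enumerate(data, start=2):  # line 1 is the header
        # Rule: all rows must have the same number of columns.
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} columns, got {len(row)}"
            )
            continue
        # Rule: every cell must have a value.
        for name, value in zip(header, row):
            if value.strip() == "":
                problems.append(f"line {line_no}: missing value in {name}")
        # Rule: every column except the last (class) column must be numeric.
        for name, value in zip(header[:-1], row[:-1]):
            if value.strip() == "":
                continue  # already reported as missing
            try:
                float(value)
            except ValueError:
                problems.append(f"line {line_no}: {name} is not numeric: {value!r}")
    return problems


good = "ParamA,ParamB,ParamC,Result\n150,6.5,1,Apple\n120,12,0,Banana\n"
bad = "ParamA,ParamB,ParamC,Result\n150,,1,Apple\n120,big,0,Banana\n200,4.9\n"

print(validate_dataset(good))  # []
for problem in validate_dataset(bad):
    print(problem)
```

For the `bad` sample, the check reports one missing value, one non-numeric feature, and one row with too few columns.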

EXAMPLE DATASET

You can find an example of the CSV file in the official repository.
