Classification
The classification dataset is similar to the regression dataset; it is an organized table where each row represents a unique sample, and each column contains specific information about that sample. The purpose of this type of dataset is to enable an AI model to learn how to categorize or classify each sample into a specific group or class.
STRUCTURE
Rows
Each row is an independent sample and represents something that will be classified.
For example, if you want to create a model that identifies different types of fruit, each row in the dataset would represent a specific fruit.
Property Columns (X)
Also known as input variables or "X", these columns hold the information the model uses to make its predictions.
Each column is a feature or attribute that describes the sample.
Examples of features:
Weight of a fruit (in grams).
Size of the fruit (in cm).
Smooth skin (1 = yes, 0 = no).
Class Column (Y)
This is the output variable that indicates the class or category of each sample.
The class is what we want the model to learn to predict. In the case of fruits, the class will be the name of the type of fruit, such as "apple," "banana," or "orange."
This column can contain:
Strings: For example, "apple," "banana," "orange."
Numbers: For example, 1 = "apple," 2 = "banana," 3 = "orange."
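The two forms carry the same information. As a minimal sketch (plain Python, not DelphAI code; the helper name `encode_labels` is hypothetical), string classes can be mapped to numeric codes like this:

```python
def encode_labels(labels):
    """Map each distinct class name to an integer code, starting at 1."""
    mapping = {}
    codes = []
    for label in labels:
        if label not in mapping:
            # The first new label gets 1, the next new label gets 2, and so on.
            mapping[label] = len(mapping) + 1
        codes.append(mapping[label])
    return codes, mapping


codes, mapping = encode_labels(["apple", "banana", "orange", "apple"])
print(codes)    # [1, 2, 3, 1]
print(mapping)  # {'apple': 1, 'banana': 2, 'orange': 3}
```

Keeping the mapping around makes it possible to translate a predicted number back into its class name.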
EXAMPLE
Here is an example of a dataset for classifying fruits based on their features:
Weight (g) | Size (cm) | Smooth Skin | Fruit Type
150 | 6.5 | 1 | Apple
120 | 12 | 0 | Banana
200 | 4.9 | 0 | Orange
180 | 43 | 1 | Watermelon
The feature columns (X) are: Weight, Size, Smooth Skin.
The class column (Y) is: Fruit Type.
Each row represents a specific fruit, with its characteristics (weight, size, smooth skin) and the type (class) it belongs to.
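To make the X/Y split concrete, here is a small sketch in plain Python (an illustration only, not DelphAI code) that holds the same four samples as lists and separates the feature columns from the class column:

```python
# Each inner list is one sample: weight (g), size (cm), smooth skin, fruit type.
rows = [
    [150, 6.5, 1, "Apple"],
    [120, 12, 0, "Banana"],
    [200, 4.9, 0, "Orange"],
    [180, 43, 1, "Watermelon"],
]

# The class column is the last one; everything before it is a feature.
X = [row[:-1] for row in rows]
y = [row[-1] for row in rows]

print(X[0])  # [150, 6.5, 1]
print(y)     # ['Apple', 'Banana', 'Orange', 'Watermelon']
```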
DATA IMPORT
Every DelphAI object can import the dataset from a CSV file or from a TDataset.
CSV:
The CSV file must follow the same format as the table above.
Example of a CSV file:
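As a rough sketch, the fruit samples could be written as CSV text with Python's standard `csv` module. The header row, the column names, and the comma delimiter are assumptions made for illustration; the example file in the official repository is the reference for the exact layout DelphAI expects.

```python
import csv
import io

# Assumed column names; the class column comes last.
header = ["weight", "size", "smooth_skin", "fruit_type"]
rows = [
    [150, 6.5, 1, "Apple"],
    [120, 12, 0, "Banana"],
    [200, 4.9, 0, "Orange"],
    [180, 43, 1, "Watermelon"],
]

# Write to an in-memory buffer so the result can be inspected as text.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(header)
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

Each sample becomes one line of text, for instance `150,6.5,1,Apple`.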
TDataset/Query:
The dataset can be stored in a relational database.
Example of a SELECT query on the table in the database:
Use an SQL query to select the data:
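The sketch below illustrates the idea with Python's built-in `sqlite3` module rather than a Delphi TDataset. The table name `fruits` and its column names are hypothetical. The point is the shape of the SELECT: feature columns first, class column last.

```python
import sqlite3

# In-memory database used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fruits ("
    "weight REAL, size REAL, smooth_skin INTEGER, fruit_type TEXT)"
)
conn.executemany(
    "INSERT INTO fruits VALUES (?, ?, ?, ?)",
    [
        (150, 6.5, 1, "Apple"),
        (120, 12, 0, "Banana"),
        (200, 4.9, 0, "Orange"),
        (180, 43, 1, "Watermelon"),
    ],
)

# Select the feature columns first and the class column last.
query = (
    "SELECT weight, size, smooth_skin, fruit_type "
    "FROM fruits ORDER BY rowid"
)
rows = conn.execute(query).fetchall()
conn.close()

for row in rows:
    print(row)
```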
RULES AND TIPS FOR CREATING THE DATASET
Data Consistency:
All rows must have the same number of columns.
All column values must be numeric, except those in the class column (the last column), which may be strings or numbers.
No Missing Values:
Every cell must have a value (no "gaps" are allowed).
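These rules can be checked before importing. The following is a minimal sketch in plain Python (the function name `validate_rows` is hypothetical and not part of DelphAI); it assumes the rows have already been read as lists of strings, with the class column last:

```python
def validate_rows(rows):
    """Return a list of problems found in a classification dataset."""
    if not rows:
        return ["dataset is empty"]
    problems = []
    width = len(rows[0])  # every row must match the first row's column count
    for i, row in enumerate(rows, start=1):
        if len(row) != width:
            problems.append(f"row {i}: expected {width} columns, found {len(row)}")
            continue
        for j, cell in enumerate(row, start=1):
            if cell is None or str(cell).strip() == "":
                problems.append(f"row {i}, column {j}: missing value")
            elif j < width:
                # Feature columns (all but the last) must parse as numbers.
                try:
                    float(cell)
                except (TypeError, ValueError):
                    problems.append(f"row {i}, column {j}: not numeric")
    return problems


good = [["150", "6.5", "1", "Apple"], ["120", "12", "0", "Banana"]]
bad = [
    ["150", "6.5", "1", "Apple"],
    ["120", "", "0", "Banana"],     # missing value
    ["200", "big", "0", "Orange"],  # non-numeric feature
    ["180", "43"],                  # wrong number of columns
]

print(validate_rows(good))  # []
for problem in validate_rows(bad):
    print(problem)
```

An empty result means the dataset satisfies all three rules.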
EXAMPLE DATASET
You can find an example of the CSV file in the official repository.