Regression

Regression methods use machine learning, mathematics, and statistics to find relationships between variables. Their goal is to predict numeric values, such as the price of an item or a future value of some quantity.

For example, imagine you want to predict the price of a house. You can use information such as the size of the house, the number of bedrooms, and its location. From this data, a regression method estimates the most likely price of the house.
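To make "finding a relationship" concrete, the sketch below fits the simplest regression model there is: a straight line, price = intercept + slope * size, computed with ordinary least squares from a single feature. It is a standalone illustration of the general idea and does not show how TEasyAIRegression works internally. The sizes and prices are made-up toy values, and the names TLine and FitLine are hypothetical.

```pascal
program SimpleLinearRegression;

{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

type
  // Fitted line: y = Intercept + Slope * x
  TLine = record
    Intercept: Double;
    Slope: Double;
  end;

// Fits a straight line to the points (aX[i], aY[i]) by ordinary least squares.
// Assumes both arrays have the same length and aX is not constant.
function FitLine(const aX, aY: TArray<Double>): TLine;
var
  i, n: Integer;
  meanX, meanY, sxy, sxx: Double;
begin
  n := Length(aX);
  meanX := 0;
  meanY := 0;
  for i := 0 to n - 1 do
  begin
    meanX := meanX + aX[i];
    meanY := meanY + aY[i];
  end;
  meanX := meanX / n;
  meanY := meanY / n;

  // Slope = covariance(x, y) / variance(x); the line passes through the means.
  sxy := 0;
  sxx := 0;
  for i := 0 to n - 1 do
  begin
    sxy := sxy + (aX[i] - meanX) * (aY[i] - meanY);
    sxx := sxx + Sqr(aX[i] - meanX);
  end;
  Result.Slope := sxy / sxx;
  Result.Intercept := meanY - Result.Slope * meanX;
end;

var
  line: TLine;
begin
  // Made-up toy data: house size (square feet) and price.
  line := FitLine([1000.0, 1500.0, 2000.0, 2500.0],
                  [200000.0, 275000.0, 350000.0, 425000.0]);
  WriteLn('price = ', line.Intercept:0:2, ' + ', line.Slope:0:2, ' * size');
  // Use the fitted line to estimate the price of an 1800 sq ft house.
  WriteLn('Estimated price for 1800 sq ft: ', (line.Intercept + line.Slope * 1800):0:2);
end.
```

The toy values lie exactly on a line, so the fit gives a slope of 150 and an intercept of 50000, and the 1800 sq ft estimate is 320000.00. Real data is noisy and usually involves several features, which is where the more elaborate models tested by FindBestModel come in.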

EXAMPLE OF USE

The example above can be reproduced in just a few lines. First, train the model and generate a file containing all the configuration needed for later use:

uses
  UEasyAI;
  
procedure TrainModel;
var
  vEasyAIClass: TEasyAIRegression;
begin
  vEasyAIClass := TEasyAIRegression.Create;
  try
    vEasyAIClass.LoadDataset('C:\DelphAI\DelphAI\Datasets\Housing Price.csv');
    vEasyAIClass.FindBestModel('C:\Example\trainedFile-Housing-price');
  finally
    vEasyAIClass.Free;
  end;
end;

When FindBestModel finishes, the file is created at the path passed as a parameter. An alert message then indicates whether the dataset must be loaded again before the model is used.
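The guide does not say which models need the dataset. One common reason a model needs its training data at prediction time is that it is instance-based. k-nearest neighbors (k-NN) regression, for example, stores no formula: it predicts by averaging the targets of the training rows closest to the new sample. The sketch below is a minimal k-NN regressor written only to illustrate that idea. It is not UEasyAI code, and the data and the names TRow, MakeRow, BuildToyData and KnnPredict are made up.

```pascal
program KnnRegressionSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

type
  // One training row: its feature values and the known target (price).
  TRow = record
    Features: TArray<Double>;
    Target: Double;
  end;

function MakeRow(const aFeatures: TArray<Double>; aTarget: Double): TRow;
begin
  Result.Features := aFeatures;
  Result.Target := aTarget;
end;

// Made-up toy data: [size in sq ft, bedrooms] -> price.
function BuildToyData: TArray<TRow>;
begin
  SetLength(Result, 4);
  Result[0] := MakeRow([1000.0, 2.0], 200000);
  Result[1] := MakeRow([1500.0, 3.0], 260000);
  Result[2] := MakeRow([2000.0, 3.0], 330000);
  Result[3] := MakeRow([2600.0, 4.0], 420000);
end;

// Squared Euclidean distance; assumes both vectors have the same length.
function SquaredDistance(const aA, aB: TArray<Double>): Double;
var
  i: Integer;
begin
  Result := 0;
  for i := 0 to High(aA) do
    Result := Result + Sqr(aA[i] - aB[i]);
end;

// Averages the targets of the aK training rows nearest to aSample.
// The whole training set is needed here, at prediction time.
// Assumes aK >= 1 and a non-empty training set.
function KnnPredict(const aTrain: TArray<TRow>; const aSample: TArray<Double>; aK: Integer): Double;
var
  used: TArray<Boolean>;
  i, j, best: Integer;
  d, bestDist: Double;
begin
  if aK > Length(aTrain) then
    aK := Length(aTrain);
  SetLength(used, Length(aTrain)); // new elements start as False
  Result := 0;
  for j := 1 to aK do
  begin
    // Find the nearest row that has not been picked yet.
    best := -1;
    bestDist := 0;
    for i := 0 to High(aTrain) do
      if not used[i] then
      begin
        d := SquaredDistance(aTrain[i].Features, aSample);
        if (best < 0) or (d < bestDist) then
        begin
          best := i;
          bestDist := d;
        end;
      end;
    used[best] := True;
    Result := Result + aTrain[best].Target;
  end;
  Result := Result / aK;
end;

begin
  // The two nearest rows are the 1500 and 2000 sq ft houses; prints their mean price.
  WriteLn('Estimated price: ', KnnPredict(BuildToyData, [1600.0, 3.0], 2):0:2);
end.
```

For the sample [1600, 3] with k = 2, the program averages 260000 and 330000 and prints 295000.00. The features are not scaled here, so size dominates the distance. Practical k-NN implementations normally normalize features first.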

To use the model from the generated file:

uses
  System.SysUtils, Vcl.Dialogs, UEasyAI;
  
procedure ShowPredictedHousesPrice;
var
  vEasyAIClass: TEasyAIRegression;
begin
  vEasyAIClass := TEasyAIRegression.Create;
  try
    vEasyAIClass.LoadDataset('C:\DelphAI\DelphAI\Datasets\Housing Price.csv'); // Only needed if the alert said the best model requires the dataset.
    vEasyAIClass.LoadFromFile('C:\Example\trainedFile-Housing-price');
    // Property values of the house to evaluate, in the same column order used for training:
    // Square_Footage = 2459
    // Num_Bedrooms = 1
    // Num_Bathrooms = 1
    // Year_Built = 1964
    // Lot_Size = 3.1047807561601664
    // Garage_Size = 0
    // Neighborhood_Quality = 4
    ShowMessage('House price: ' + FormatFloat('##0.00', vEasyAIClass.Predict([2459, 1, 1, 1964, 3.1047807561601664, 0, 4])));
  finally
    vEasyAIClass.Free;
  end;
end;

The structure of the dataset used in this example can be found here.

CLASSES AND METHODS GUIDE

TEasyAIRegression

  • constructor Create; : creates a new TEasyAIRegression instance.

  • procedure LoadDataset(aDataSet : String; aHasHeader : Boolean = True); : loads the dataset for training, or for prediction with models that require it.

    • aDataSet : CSV file path.

    • aHasHeader : indicates whether the first line of the file is a header row (default True).

  • procedure LoadDataset(aDataSet : TDataSet); : loads the dataset for training, or for prediction with models that require it.

    • aDataSet : TDataSet object containing the data to be loaded.

  • procedure FindBestModel(aPathResultFile: String; aMode : TEasyTestingMode = tmStandard; aMaxThreads : Integer = 0; aCsvResultModels : String = ''; aLogFile : String = ''); : tests multiple candidate models to find the best one and prepares it for use in predictions.

    • aPathResultFile : path of the file where the configuration of the best model is saved. Later, load this file with LoadFromFile to prepare the object for predictions.

    • aMode : search mode, with three options:

      • tmFast : tests only the models most likely to perform best; this is the fastest mode;

      • tmStandard : tests the models most likely to perform best and also explores more extreme parameter values (default);

      • tmExtensive : tests a large number of models, including models based on other methods; this is the slowest mode.

    • aMaxThreads : optional; the maximum number of threads used simultaneously. If set to 0 (the default), all threads available on the CPU are used.

    • aCsvResultModels : optional; path of a CSV file to be saved, listing each tested model and its results.

    • aLogFile : optional; path of the log file.

  • procedure LoadFromFile(aPath : String); : loads the file generated by FindBestModel and prepares the best model found for predictions.

  • function Predict(aSample : TArray<Double>) : Double; : returns the predicted value for the sample.

    • aSample : property values of the sample, without the result column. For example, if the model was trained on 5 property columns plus 1 result column, pass an array containing only the 5 property values.
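New data may arrive as full rows in the same layout as the training CSV. In that case the result column has to be dropped before calling Predict. The helper below is a hypothetical sketch (ToSample is not part of UEasyAI). It assumes you know the zero-based index of the result column. It copies every other value and keeps the original column order, which is presumably the order the model was trained on, as in the housing example.

```pascal
program BuildPredictSample;

{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

// Hypothetical helper (not part of UEasyAI): returns aRow without the value at
// aResultIndex, keeping the remaining property values in their original order.
// Returns an empty array if aResultIndex is out of range.
function ToSample(const aRow: TArray<Double>; aResultIndex: Integer): TArray<Double>;
var
  i, j: Integer;
begin
  if (aResultIndex < 0) or (aResultIndex > High(aRow)) then
    Exit(nil);
  SetLength(Result, Length(aRow) - 1);
  j := 0;
  for i := 0 to High(aRow) do
    if i <> aResultIndex then
    begin
      Result[j] := aRow[i];
      Inc(j);
    end;
end;

var
  row, sample: TArray<Double>;
  i: Integer;
begin
  // Made-up row: 5 property columns followed by the result column (index 5).
  row := [120.0, 3.0, 2.0, 1990.0, 1.0, 250000.0];
  sample := ToSample(row, 5);
  // Prints the 5 property values; this is the array Predict would receive.
  for i := 0 to High(sample) do
    Write(sample[i]:0:1, ' ');
  WriteLn;
end.
```

Because the result is a TArray<Double>, it can be passed directly to Predict.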
