Definition: Classification is a type of predictive modeling that assigns observations (instances) to one of a predefined set of classes (categories). A classification model is trained on historical data whose classes are already known, and is then used to predict the class of new, unseen observations.

Examples: email spam detection, sentiment analysis, and disease diagnosis (e.g., classifying whether or not a patient has a particular disease based on their symptoms).
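To make the definition concrete, the sketch below shows the train-then-predict pattern in plain Python. It uses a nearest-centroid classifier, chosen only for brevity (it is one of many possible classification algorithms, not necessarily the one this tool uses). The patient data, the feature names, and the helper functions `train_nearest_centroid` and `predict` are hypothetical.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)

def train_nearest_centroid(X, y):
    """Learn one centroid (mean feature vector) per class label."""
    sums, counts = {}, {}
    for features, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def predict(centroids, features):
    """Assign the class whose centroid is closest to the new observation."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

# Hypothetical historical data: [body_temperature_c, cough_days] -> diagnosis
X_train = [[36.8, 0], [37.0, 1], [38.9, 5], [39.2, 7]]
y_train = ["healthy", "healthy", "flu", "flu"]

centroids = train_nearest_centroid(X_train, y_train)
print(predict(centroids, [38.7, 4]))  # → flu
```

The model "learns" only the class centroids from labeled history; prediction for a new patient then needs nothing but those centroids.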


1. Select Independent Columns (X):

  • Identify and choose the independent columns in your dataset.

  • These columns, often referred to as features or predictors, are the variables that will be used to predict the dependent variable (Y).

2. Select Dependent Column (Y):

  • Identify the dependent variable or target variable (Y) that you aim to predict.

  • This column represents the output to be predicted from the independent variables (X). In classification, it holds the class labels (for example, "spam" or "not spam").

3. Cross-Validation:

  • Choose the number of folds (k) for cross-validation. Cross-validation is a resampling technique used to estimate how well a predictive model will generalize to an independent dataset.

  • A common method is k-fold cross-validation: the dataset is divided into k subsets (folds), the model is trained on k-1 folds and tested on the remaining fold, and this is repeated k times so that each fold serves as the test set exactly once. The k scores are then averaged.
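Steps 1 and 2 amount to splitting each row of the dataset into its feature values (X) and its label (Y). A minimal sketch, assuming the data is a list of rows stored as Python dicts; the helper `select_columns` and the column names `num_links`, `num_exclamations`, and `label` are hypothetical.

```python
def select_columns(rows, feature_columns, target_column):
    """Split table rows into a feature matrix X and a target vector y."""
    if target_column in feature_columns:
        # Using Y as its own predictor would make the model trivially "accurate".
        raise ValueError("the target column must not be one of the feature columns")
    X = [[row[col] for col in feature_columns] for row in rows]
    y = [row[target_column] for row in rows]
    return X, y

# Hypothetical email dataset: each row is one email (one observation).
emails = [
    {"num_links": 0, "num_exclamations": 1, "label": "not spam"},
    {"num_links": 7, "num_exclamations": 9, "label": "spam"},
    {"num_links": 1, "num_exclamations": 0, "label": "not spam"},
]

X, y = select_columns(emails, ["num_links", "num_exclamations"], "label")
print(X)  # → [[0, 1], [7, 9], [1, 0]]
print(y)  # → ['not spam', 'spam', 'not spam']
```

With pandas, the equivalent selection is typically `X = df[feature_columns]` and `y = df[target_column]`.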
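Step 3 can be sketched in plain Python as well. This is a minimal illustration under stated assumptions, not this tool's own implementation: the folds are contiguous and unshuffled, the model is a hypothetical majority-class baseline (`majority_class_fit`) standing in for a real classifier, the score is accuracy, and the data are invented.

```python
from collections import Counter

def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; every sample lands in exactly one test fold."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    # Spread samples as evenly as possible: the first n_samples % k folds get one extra.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

def majority_class_fit(X_train, y_train):
    """Baseline classifier: ignores the features and predicts the most common label."""
    label = Counter(y_train).most_common(1)[0][0]
    return lambda features: label

def cross_validate(fit, X, y, k):
    """Train on k-1 folds, measure accuracy on the held-out fold, repeat k times."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(y), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(model(X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return scores

# Hypothetical data: 6 emails with two features each and a spam label.
X = [[0, 1], [1, 0], [7, 9], [0, 2], [2, 1], [1, 1]]
y = ["not spam", "not spam", "spam", "not spam", "not spam", "not spam"]

scores = cross_validate(majority_class_fit, X, y, k=3)
print(scores)                     # → [1.0, 0.5, 1.0]
print(sum(scores) / len(scores))  # mean accuracy across the 3 folds
```

In practice, scikit-learn provides this through `sklearn.model_selection.cross_val_score(estimator, X, y, cv=k)`. For classifiers given an integer `cv`, it uses stratified folds, which keep each class's proportion roughly the same in every fold.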


Summary Page

Simulator Overview:

Actionable Insights:
