lib package
Submodules
lib.data_processing module
-
lib.data_processing.copy_upsample(X, y, over_sampling)
Apply upsampling to the minority data.
Parameters: - X (pandas DataFrame) – Training features
- y (pandas Series) – Training labels
Returns: resampled data
Return type: tuple
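A minimal sketch of copy-based upsampling, for illustration only. It assumes that `over_sampling` is the target minority-to-majority ratio and that rows are duplicated by sampling with replacement; the helper name `copy_upsample_sketch` and its `random_state` argument are hypothetical and not part of `lib`.

```python
import pandas as pd


def copy_upsample_sketch(X, y, over_sampling, random_state=0):
    """Duplicate minority rows until minority/majority reaches over_sampling (hypothetical)."""
    counts = y.value_counts()
    minority_label = counts.idxmin()
    n_minority = int(counts.min())
    n_majority = int(counts.max())
    # Number of extra minority rows needed to reach the target ratio.
    n_extra = max(int(over_sampling * n_majority) - n_minority, 0)
    minority_idx = pd.Series(y[y == minority_label].index)
    # Sample minority row labels with replacement, then append the copies.
    extra_idx = minority_idx.sample(n=n_extra, replace=True, random_state=random_state)
    X_res = pd.concat([X, X.loc[extra_idx]], ignore_index=True)
    y_res = pd.concat([y, y.loc[extra_idx]], ignore_index=True)
    return X_res, y_res


X = pd.DataFrame({"f": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})
y = pd.Series([0, 0, 0, 0, 1, 1])
X_res, y_res = copy_upsample_sketch(X, y, over_sampling=1.0)
```

With `over_sampling=1.0` the two minority rows are copied until both classes have four rows each.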
-
lib.data_processing.fill_missing_knn(df, na_column, n_neighbors=10, algorithm='ball_tree')
Fill missing data using k-nearest neighbors.
Parameters: - df (pandas DataFrame) – Data Frame
- na_column (list) – columns with NaNs
- n_neighbors (int) – number of neighbors
- algorithm (string) – nearest neighbors algorithm
Returns: DataFrame without NaNs
Return type: pandas DataFrame
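One way such a fill could work is sketched below with scikit-learn's `KNeighborsRegressor`, which averages the target column over the nearest complete rows. This is an assumption about the approach, not the library's actual code: the helper name `fill_missing_knn_sketch` is hypothetical, and the sketch assumes every column outside `na_column` is numeric and complete.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor


def fill_missing_knn_sketch(df, na_column, n_neighbors=10, algorithm="ball_tree"):
    """Fill NaNs in each listed column from its nearest complete rows (hypothetical)."""
    out = df.copy()
    # Assumption: all columns not listed in na_column are numeric and have no NaNs.
    features = [c for c in df.columns if c not in na_column]
    for col in na_column:
        missing = out[col].isna()
        if not missing.any():
            continue
        known = ~missing
        # Never ask for more neighbors than there are complete rows.
        k = min(n_neighbors, int(known.sum()))
        knn = KNeighborsRegressor(n_neighbors=k, algorithm=algorithm)
        knn.fit(out.loc[known, features], out.loc[known, col])
        out.loc[missing, col] = knn.predict(out.loc[missing, features])
    return out


df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 10.0], "b": [1.0, 2.0, np.nan, 10.0]})
filled = fill_missing_knn_sketch(df, na_column=["b"], n_neighbors=2)
```

Here the missing `b` value is replaced by the mean of `b` over the two rows whose `a` is closest to 3.0.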
-
lib.data_processing.k_fold_prediction(X, y, n_splits, model, reg_params, fit_parameters, upsample_kwargs)
Perform k-fold train-and-predict.
Parameters: - X (pandas DataFrame) – Training features
- y (pandas Series) – Training labels
- n_splits (int) – number of k splits
- model (object) – model with fit and predict methods
- reg_params (dictionary) – model kwargs
- fit_parameters (dictionary) – model fit kwargs
- upsample_kwargs (dictionary) – syntetic_sampling kwargs
Returns: predictions
Return type: pandas Series
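The k-fold train-and-predict idea can be sketched as follows: each row is predicted by a model trained on the other folds. The sketch assumes `model` is a class instantiated with `reg_params`, uses contiguous unshuffled folds, and leaves out the upsampling step controlled by `upsample_kwargs`. `MajorityModel` and `k_fold_prediction_sketch` are hypothetical names used only for illustration.

```python
import numpy as np
import pandas as pd


class MajorityModel:
    """Hypothetical stand-in model: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = y.mode().iloc[0]
        return self

    def predict(self, X):
        return np.full(len(X), self.label_)


def k_fold_prediction_sketch(X, y, n_splits, model, reg_params, fit_parameters):
    """Out-of-fold predictions: every row is predicted by a model that never saw it."""
    all_idx = np.arange(len(X))
    # Contiguous, unshuffled folds (an assumption made to keep the sketch short).
    folds = np.array_split(all_idx, n_splits)
    preds = np.empty(len(X), dtype=y.dtype)
    for test_idx in folds:
        train_idx = np.setdiff1d(all_idx, test_idx)
        fold_model = model(**reg_params)
        fold_model.fit(X.iloc[train_idx], y.iloc[train_idx], **fit_parameters)
        preds[test_idx] = fold_model.predict(X.iloc[test_idx])
    return pd.Series(preds, index=y.index)


X = pd.DataFrame({"f": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})
y = pd.Series([0, 0, 0, 0, 0, 1])
preds = k_fold_prediction_sketch(X, y, 3, MajorityModel, {}, {})
```

The returned Series shares the index of `y`, so predictions can be compared with the labels row by row.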
-
lib.data_processing.ohe(df, columns, drop_first=True)
Apply OneHotEncoder.
Parameters: - df (pandas DataFrame) – Data Frame
- columns (list) – columns to encode
- drop_first (boolean) – drop the first one-hot column of each encoded feature
Returns: dataframe with ohe
Return type: pandas DataFrame
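The effect of `drop_first` can be illustrated with `pandas.get_dummies`, used here as a stand-in; whether `lib` relies on it or on scikit-learn's `OneHotEncoder` is not stated in this page. The helper name `ohe_sketch` and the sample data are hypothetical.

```python
import pandas as pd


def ohe_sketch(df, columns, drop_first=True):
    """One-hot encode the listed columns (hypothetical stand-in using get_dummies)."""
    return pd.get_dummies(df, columns=columns, drop_first=drop_first)


df = pd.DataFrame({"color": ["red", "green", "blue"], "size": [1, 2, 3]})
encoded = ohe_sketch(df, columns=["color"])
```

With `drop_first=True` the first category in sorted order (`blue`) gets no column of its own, which avoids a redundant, perfectly collinear feature.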
-
lib.data_processing.read_data(path, remove_nans=True, apply_ohe=True)
Read and transform data.
Parameters: - path (string) – path to csv file
- remove_nans (boolean) – remove NaNs from the DataFrame
- apply_ohe (boolean) – apply OneHotEncoder to categorical data
Returns: clean data with the transformations applied
Return type: pandas DataFrame
-
lib.data_processing.syntetic_sampling(X, y, over_sampling, under_sampling)
Apply Synthetic Minority Oversampling Technique (SMOTE) to an unbalanced class.
Parameters: - X (pandas DataFrame) – Training features
- y (pandas Series) – Training labels
Returns: resampled data
Return type: tuple
-
lib.data_processing.train_test_sample(X, y, test_size, upsample_type=None, over_sampling=None, under_sampling=None)
Split the data into train and test samples and apply transformations to the train sample.
Parameters: - X (pandas DataFrame) – Training features
- y (pandas Series) – Training labels
- test_size (float) – test size for the split
- upsample_type – sampling method, defaults to None
- over_sampling (float) – oversample rate, defaults to None
- under_sampling (float) – undersample rate, defaults to None
Returns: train and test split
Return type: tuple
lib.model module
-
class lib.model.BinaryClassifier
Bases: object
Binary Classifier Model
-
dump(path=None)
Export the model to a file.
Parameters: path (string) – path to dump model, defaults to None
-
fit(X, y)
Fit the binary classifier model.
Parameters: - X (pandas DataFrame) – Training features
- y (pandas Series) – Training labels
-
k_fold_prediction(X, y, n_splits=5)
Perform k-fold train-and-predict.
Parameters: - X (pandas DataFrame) – Training features
- y (pandas Series) – Training labels
- n_splits (int) – number of k splits, defaults to 5
Returns: k fold predictions
Return type: pandas Series
-
-
class lib.model.MultiClassifier
Bases: object
Multiclass Classifier Model
-
fit(X, y)
Fit the multiclass classifier model.
Parameters: - X (pandas DataFrame) – Training features
- y (pandas Series) – Training labels
-
lib.utils module
-
lib.utils.Evaluate(true_label, predicted_label, predicted_prob, labels)
Plot the confusion matrix and display the accuracy, F1, and ROC AUC scores.
Parameters: - true_label (array) – ground truth values
- predicted_label (array) – predicted values
- predicted_prob (array) – probability for each predicted class
- labels (list) – list containing label strings
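The scores listed above can be computed with scikit-learn's metric functions, as in the sketch below. It covers the binary case only, omits plotting, and assumes `predicted_prob` holds the positive-class probability for each sample; the helper name `evaluate_sketch` and the sample values are hypothetical.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score, roc_auc_score


def evaluate_sketch(true_label, predicted_label, predicted_prob):
    """Return the confusion matrix and the three scores (plotting omitted)."""
    return {
        "confusion_matrix": confusion_matrix(true_label, predicted_label),
        "accuracy": accuracy_score(true_label, predicted_label),
        "f1": f1_score(true_label, predicted_label),
        # Assumption: predicted_prob is the positive-class probability per sample.
        "roc_auc": roc_auc_score(true_label, predicted_prob),
    }


scores = evaluate_sketch(
    true_label=[0, 0, 1, 1],
    predicted_label=[0, 1, 1, 1],
    predicted_prob=[0.1, 0.6, 0.7, 0.9],
)
```

Accuracy and F1 depend on the hard predictions, while ROC AUC depends only on how the probabilities rank positives against negatives.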
-
lib.utils.arg_nearest(array, value)
Find the index of the array element nearest to a given value.
Parameters: - array (array) – numpy array
- value (float) – desired value
Returns: index
Return type: int
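A nearest-index lookup of this kind is commonly written with NumPy as below. This is a sketch of the technique, not necessarily the library's implementation; the helper name `arg_nearest_sketch` is hypothetical.

```python
import numpy as np


def arg_nearest_sketch(array, value):
    """Return the index of the element with the smallest absolute distance to value."""
    array = np.asarray(array)
    return int(np.abs(array - value).argmin())


idx = arg_nearest_sketch(np.array([0.1, 0.4, 0.75, 0.9]), 0.5)
```

If several elements are equally close, `argmin` returns the first of them.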
-
lib.utils.plot_confusion_matrix(cm, labels, suptitle='Confusion Matrix')
Wrapper around _subplot_cm: plot the normalized and non-normalized confusion matrices.
Parameters: - cm (array) – confusion matrix array
- labels (list) – list containing label strings
- suptitle (string) – plot title, defaults to Confusion Matrix
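The normalized variant of a confusion matrix is typically obtained by dividing each row by its total, so that each row shows the fraction of a true class assigned to each predicted class. The sketch below assumes row-wise normalization, which this page does not confirm; the helper name `normalize_confusion_matrix` is hypothetical, and plotting is omitted.

```python
import numpy as np


def normalize_confusion_matrix(cm):
    """Divide each row by its sum; rows that sum to zero stay all zeros."""
    cm = np.asarray(cm, dtype=float)
    row_sums = cm.sum(axis=1, keepdims=True)
    # Skip the division where a row has no samples to avoid dividing by zero.
    return np.divide(cm, row_sums, out=np.zeros_like(cm), where=row_sums != 0)


cm = np.array([[8, 2], [1, 9]])
cm_norm = normalize_confusion_matrix(cm)
```

Each row of `cm_norm` sums to 1, which makes classes of different sizes directly comparable.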