MNIST Classification with Common Machine Learning Algorithms and a CNN Model

This chapter covers the common machine learning algorithms from the CIS lectures, using the MNIST dataset for demonstration. Note that only 1,000 samples are used for training and testing here in order to speed up the demonstrations; to achieve better performance, use the full training and test sets.

Project Repository: Basic-Mnist-Classification

I would really appreciate it if you gave the repository a star and followed my channel. Please!

SVM: Support Vector Machine

Support Vector Machine (SVM) is a popular machine learning algorithm used for classification and regression analysis. The basic idea behind SVM is to find a hyperplane in a high-dimensional space that separates the different classes of data points as best as possible. In other words, SVM tries to find the boundary between two classes of data by maximizing the margin between them. This margin is the distance between the closest data points of each class to the separator or hyperplane.

SVM can also handle non-linearly separable datasets through a technique called kernel trick, which maps the input data into a higher-dimensional space where it becomes linearly separable. Some popular kernel functions include linear, polynomial, radial basis function (RBF), and sigmoid.

SVM has been widely used in various fields such as image recognition, text classification, bioinformatics, and finance due to its effectiveness in handling complex datasets and relatively good performance in comparison to other algorithms.
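To make the margin and kernel ideas concrete before the MNIST demo, here is a minimal, self-contained sketch (not part of the project repository) on a synthetic 2-D dataset: only the samples nearest the boundary become support vectors, and on the user side the kernel trick amounts to swapping the kernel argument.

from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two separable 2-D blobs; a linear SVM finds the maximum-margin line between them.
X, y = make_blobs(n_samples=200, centers=2, random_state=0)

svm_linear = SVC(kernel='linear', C=1.0)
svm_linear.fit(X, y)

# Only the points closest to the boundary become support vectors and define the margin.
print("Support vectors per class:", svm_linear.n_support_)

# Kernel trick: switching to an RBF kernel handles non-linear boundaries
# without changing anything else in the code.
svm_rbf = SVC(kernel='rbf', gamma='scale')
svm_rbf.fit(X, y)
print("RBF training accuracy:", svm_rbf.score(X, y))

The full demo on the MNIST subset follows.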

from tensorflow.keras.datasets import mnist
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
from sklearn.svm import SVC
import seaborn as sns

# Load MNIST and keep only the first 1,000 samples, flattened to 784-dim vectors
(X_train, y_train), (X_test, y_test) = mnist.load_data()

X_train = X_train.reshape((60000, 28*28))[:1000]
X_test = X_test.reshape((10000, 28*28))[:1000]
y_train = y_train.astype(int)[:1000]
y_test = y_test.astype(int)[:1000]

# Train a linear-kernel SVM
svm = SVC(kernel='linear')
svm.fit(X_train, y_train)

# Predict on the test subset and report accuracy
y_pred = svm.predict(X_test)

accuracy = svm.score(X_test, y_test)
print("Accuracy: ", accuracy)

# Plot the confusion matrix
cm = confusion_matrix(y_test, y_pred)

plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d')
plt.title('Confusion Matrix')
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.show()

Accuracy: 0.853

[Figure: confusion matrix for the linear SVM on the 1,000-sample test subset]

KNN: K-Nearest Neighbor

KNN (K-Nearest Neighbors) is one of the simplest machine learning algorithms and can be used for both classification and regression. It is a supervised learning method. The idea is: if the majority of the K most similar samples (i.e. the nearest neighbors in feature space) belong to a certain class, then the query sample is assigned to that class as well. In other words, the classification decision depends only on the classes of the one or few nearest training samples.
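As a quick illustration of how the choice of K affects the neighbor vote, here is a small sketch on scikit-learn's built-in 8x8 digits dataset (used instead of MNIST only so that it runs in seconds):

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Small 8x8 digits dataset keeps this illustration fast.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each test sample receives the majority label of its K nearest training samples,
# so K controls how local (and how noise-sensitive) the decision is.
for k in (1, 5, 15):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    print(f"k={k}: test accuracy = {knn.score(X_test, y_test):.3f}")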

from tensorflow.keras.datasets import mnist
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
import seaborn as sns

(X_train, y_train), (X_test, y_test) = mnist.load_data()

X_train = X_train.reshape((60000, 28*28))[:1000]
X_test = X_test.reshape((10000, 28*28))[:1000]
y_train = y_train.astype(int)[:1000]
y_test = y_test.astype(int)[:1000]

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

y_pred = knn.predict(X_test)

accuracy = knn.score(X_test, y_test)
print("Accuracy: ", accuracy)

cm = confusion_matrix(y_test, y_pred)

plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d')
plt.title('Confusion Matrix')
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.show()

Accuracy: 0.815

[Figure: confusion matrix for the KNN classifier on the 1,000-sample test subset]

DTC: Decision Tree Classifier

Decision tree learning is a supervised learning approach used in statistics, data mining and machine learning. In this formalism, a classification or regression decision tree is used as a predictive model to draw conclusions about a set of observations.

Tree models where the target variable can take a discrete set of values are called classification trees; in these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels. Decision trees where the target variable can take continuous values (typically real numbers) are called regression trees. More generally, the concept of regression tree can be extended to any kind of object equipped with pairwise dissimilarities such as categorical sequences.[1]

Decision trees are among the most popular machine learning algorithms given their intelligibility and simplicity.[2]

In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data (but the resulting classification tree can be an input for decision making).
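The "branches are conjunctions of features, leaves are class labels" structure is easy to see by printing a small fitted tree. The sketch below (on the iris dataset, purely for illustration) uses scikit-learn's export_text:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Fit a shallow classification tree on the iris data.
iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Each printed path is a conjunction of feature thresholds ending in a leaf,
# i.e. a class label, mirroring the description above.
print(export_text(tree, feature_names=iris.feature_names))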

from tensorflow.keras.datasets import mnist
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier
import seaborn as sns

(X_train, y_train), (X_test, y_test) = mnist.load_data()

X_train = X_train.reshape((60000, 28*28))[:1000]
X_test = X_test.reshape((10000, 28*28))[:1000]
y_train = y_train.astype(int)[:1000]
y_test = y_test.astype(int)[:1000]

dtc = DecisionTreeClassifier(max_depth=10)
dtc.fit(X_train, y_train)

y_pred = dtc.predict(X_test)

accuracy = dtc.score(X_test, y_test)
print("Accuracy: ", accuracy)

cm = confusion_matrix(y_test, y_pred)

plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d')
plt.title('Confusion Matrix')
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.show()

Accuracy: 0.646

RF: Random Forest

A random forest is an ensemble of decision trees: each tree is trained on a bootstrap sample of the data and considers a random subset of features at each split, and the forest predicts by majority vote (or by averaging, for regression). This usually reduces the variance of a single decision tree.

from tensorflow.keras.datasets import mnist
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
import seaborn as sns

(X_train, y_train), (X_test, y_test) = mnist.load_data()

X_train = X_train.reshape((60000, 28*28))[:1000]
X_test = X_test.reshape((10000, 28*28))[:1000]
y_train = y_train.astype(int)[:1000]
y_test = y_test.astype(int)[:1000]

rfc = RandomForestClassifier(n_estimators=100, max_depth=10)
rfc.fit(X_train, y_train)

y_pred = rfc.predict(X_test)

accuracy = rfc.score(X_test, y_test)
print("Accuracy: ", accuracy)

cm = confusion_matrix(y_test, y_pred)

plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d')
plt.title('Confusion Matrix')
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.show()

Accuracy: 0.862

[Figure: confusion matrix for the random forest on the 1,000-sample test subset]

CNN: Convolutional Neural Network

The model below stacks three convolutional layers (with max pooling after the first two), followed by a flattening step, a 64-unit dense layer, and a 10-unit output layer that produces logits; it is trained with sparse categorical cross-entropy on the 1,000-sample subset.

import tensorflow as tf
from tensorflow.keras.datasets import mnist
import matplotlib.pyplot as plt

(X_train, y_train), (X_test, y_test) = mnist.load_data()

X_train = X_train.reshape((60000, 28, 28, 1))[:1000]
X_test = X_test.reshape((10000, 28, 28, 1))[:1000]
y_train = y_train.astype(int)[:1000]
y_test = y_test.astype(int)[:1000]

X_train = X_train / 255.0
X_test = X_test / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10)  # logits; softmax is applied inside the loss via from_logits=True
])

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

history = model.fit(X_train, y_train, epochs=50, validation_data=(X_test, y_test))

fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(12, 6))

# Plot the training and validation loss
ax[0].plot(history.history['loss'], label='Training Loss')
ax[0].plot(history.history['val_loss'], label='Validation Loss')
ax[0].set_title('Loss During Training')
ax[0].set_xlabel('Epoch')
ax[0].set_ylabel('Loss')
ax[0].legend()

# Plot the training and validation accuracy
ax[1].plot(history.history['accuracy'], label='Training Accuracy')
ax[1].plot(history.history['val_accuracy'], label='Validation Accuracy')
ax[1].set_title('Accuracy During Training')
ax[1].set_xlabel('Epoch')
ax[1].set_ylabel('Accuracy')
ax[1].legend()

plt.show()

[Figure: loss and accuracy curves during CNN training]

GBDT: Gradient Boosting Decision Tree

The classic gradient boosting tree implementations are XGBoost and LightGBM. Some differences between them:

  • Computational efficiency: LightGBM builds trees with a histogram-based technique and leaf-wise growth, which makes finding split points fast, so it is usually quicker than XGBoost to train on large datasets.
  • Memory usage: LightGBM's histogram binning of feature values also keeps memory usage low, which makes it well suited to large, high-dimensional data.
  • Regularization: both libraries support L1 and L2 regularization (plus controls such as subsampling and minimum child weight) to reduce overfitting.
  • Distributed computing: both libraries can train in parallel across multiple machines; XGBoost integrates with common cluster frameworks, and LightGBM provides data-parallel and feature-parallel learning modes.
  • Sampling: both support row and column subsampling per tree; LightGBM additionally offers gradient-based one-side sampling (GOSS) and exclusive feature bundling (EFB) to further speed up training.

Overall, both XGBoost and LightGBM perform well in practice; the main differences lie in their design philosophy and implementation details. Choose based on the characteristics of the specific problem: LightGBM is often the faster option on large-scale, high-dimensional data, while XGBoost is a mature, widely used default with extensive documentation. A minimal sketch of their scikit-learn-style interfaces is shown after this paragraph, followed by the full MNIST demos using each library's native API.
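Both libraries also provide scikit-learn-style wrappers. The sketch below (assuming xgboost and lightgbm are installed; it uses a small synthetic dataset rather than the MNIST subset) shows that the L1/L2 penalties are exposed as reg_alpha and reg_lambda in both interfaces:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

# A small synthetic multi-class problem, just for illustration.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Both wrappers expose L1 (reg_alpha) and L2 (reg_lambda) regularization.
models = {
    "XGBoost": XGBClassifier(n_estimators=100, max_depth=3, learning_rate=0.1,
                             reg_alpha=0.1, reg_lambda=1.0),
    "LightGBM": LGBMClassifier(n_estimators=100, num_leaves=31, learning_rate=0.1,
                               reg_alpha=0.1, reg_lambda=1.0),
}

for name, clf in models.items():
    clf.fit(X_train, y_train)
    print(f"{name} test accuracy: {clf.score(X_test, y_test):.3f}")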

XGBoost

import xgboost as xgb
from sklearn.metrics import accuracy_score, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns
from tensorflow.keras.datasets import mnist
import numpy as np

(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Reshape the data for training and testing
X_train = X_train.reshape((60000, 28*28))[:1000]
X_test = X_test.reshape((10000, 28*28))[:1000]
y_train = y_train.astype(int)[:1000]
y_test = y_test.astype(int)[:1000]

# Create DMatrix objects from the data
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

# Define the hyperparameters for the XGBoost model
param = {
    'max_depth': 3,
    'eta': 0.1,
    'objective': 'multi:softmax',
    'num_class': 10
}

# Train the XGBoost model
num_rounds = 50
bst = xgb.train(param, dtrain, num_rounds)

# Predict the labels for the test set
preds = bst.predict(dtest)

# Calculate the accuracy of the classifier
acc = accuracy_score(y_test, preds)
print("Accuracy: {:.2f}%".format(acc * 100))

# Plot the confusion matrix
cm = confusion_matrix(y_test, preds)
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d')
plt.title('Confusion Matrix')
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.show()

Accuracy: 82.60%

[Figure: confusion matrix for the XGBoost model on the 1,000-sample test subset]

LightGBM

import lightgbm as lgb
from sklearn.metrics import accuracy_score, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns
from tensorflow.keras.datasets import mnist
import numpy as np

(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Reshape the data for training and testing
X_train = X_train.reshape((60000, 28*28))[:1000]
X_test = X_test.reshape((10000, 28*28))[:1000]
y_train = y_train.astype(int)[:1000]
y_test = y_test.astype(int)[:1000]

# Create Dataset objects from the data
train_data = lgb.Dataset(X_train, label=y_train)
test_data = lgb.Dataset(X_test, label=y_test)

# Define the hyperparameters for the LightGBM model
param = {
    'max_depth': 3,
    'learning_rate': 0.1,
    'objective': 'multiclass',
    'num_class': 10
}

# Train the LightGBM model
num_rounds = 50
bst = lgb.train(param, train_data, num_rounds)

# Predict the labels for the test set
preds = bst.predict(X_test)
preds = np.argmax(preds, axis=1)

# Calculate the accuracy of the classifier
acc = accuracy_score(y_test, preds)
print("Accuracy: {:.2f}%".format(acc * 100))

# Plot the confusion matrix
cm = confusion_matrix(y_test, preds)
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d')
plt.title('Confusion Matrix')
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.show()

Accuracy: 82.90%

[Figure: confusion matrix for the LightGBM model on the 1,000-sample test subset]