BRAIN-STROKE-PREDICTION-APP
This application predicts whether a person is likely to be affected by a brain stroke. It is built around a machine learning model hosted on the Flask framework with HTML web pages. Operations are carried out with Git, GitHub, Docker, AWS, AWS ECR, Jenkins, and Kubernetes.
Let's quick-start the project with some concepts of machine learning!
Machine learning is based on algorithms that can learn from data without relying on rules-based programming. Put simply, machines train on datasets through algorithms in order to perform tasks. Machine learning algorithms/problems are classified into supervised learning and unsupervised learning.
Our dataset: https://www.kaggle.com/fedesoriano/stroke-prediction-dataset
# Dataset Columns #
1) id: unique identifier
2) gender: “Male”, “Female” or “Other”
3) age: age of the patient
4) hypertension: 0 if the patient doesn’t have hypertension, 1 if the patient has hypertension
5) heart_disease: 0 if the patient doesn’t have any heart diseases, 1 if the patient has a heart disease
6) ever_married: “No” or “Yes”
7) work_type: “children”, “Govt_job”, “Never_worked”, “Private” or “Self-employed”
8) Residence_type: “Rural” or “Urban”
9) avg_glucose_level: average glucose level in blood
10) bmi: body mass index
11) smoking_status: “formerly smoked”, “never smoked”, “smokes” or “Unknown”
12) stroke: 1 if the patient had a stroke or 0 if not
From this data we have to predict whether a patient had a stroke or not, so we use supervised classification algorithms to build our models. You can find detailed information about the dataset at the link above.
Without further delay, let's dive into building a machine learning model from the dataset.
Let's look at an overview of the architecture of our model-building process.
Take a look at the Jupyter notebook.
Data Visualisation:
By visualizing each feature, we come to know whether it is useful for predicting our outcome, which is why visualizing every feature matters.
Data visualization is the graphical representation of information and data.
- I used the seaborn library to plot the graphs.
I provided the Jupyter notebook above; from it you can properly visualize each feature of the dataset and the correlations between the features.
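As a minimal sketch of that visualization step (assuming the Kaggle CSV has been downloaded; the file name is illustrative), the per-feature plots and a correlation heatmap can be produced like this:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the Kaggle stroke dataset (file name is illustrative)
df = pd.read_csv("healthcare-dataset-stroke-data.csv")

# How each category of a feature relates to the stroke target
sns.countplot(x="smoking_status", hue="stroke", data=df)
plt.show()

# Correlation between the numeric features
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap="coolwarm")
plt.show()
```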
Data Preprocessing
A raw dataset contains a lot of material that a machine/system can't understand. We have to arrange it correctly so that the machine can understand it.
The most important task of a machine learning engineer is data preprocessing.
The more accurately and efficiently the data is prepared, the better the accuracy and performance of the resulting machine learning model.
More time spent on data preprocessing leads to a better machine learning model.
In our dataset, visualizing the graphs shows that some data points are missing in the bmi column; we have to fill these in to get an accurate model.
I used the mean of bmi to fill in these blanks.
From the Jupyter notebook above, you can identify them clearly.
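A short sketch of that imputation, continuing with the DataFrame df from the snippet above:

```python
# bmi has missing values; replace them with the column mean
print(df["bmi"].isna().sum())  # how many rows are missing bmi
df["bmi"] = df["bmi"].fillna(df["bmi"].mean())
```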
The bmi feature has some data points around 70–100, which won't be a human's body mass index. These are known as outliers, and we used seaborn plots to identify them.
We removed these outliers with the IQR method.
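A sketch of the IQR method on the bmi column (the conventional 1.5 multiplier is assumed; the notebook's exact bounds may differ):

```python
# Interquartile range of bmi
q1, q3 = df["bmi"].quantile(0.25), df["bmi"].quantile(0.75)
iqr = q3 - q1

# Drop rows whose bmi falls outside [q1 - 1.5*IQR, q3 + 1.5*IQR]
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
df = df[(df["bmi"] >= lower) & (df["bmi"] <= upper)]
```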
The next major point: our dataset contains English words like "Male" or "Female", which a computer can't understand. We know that computers only understand numbers.
I used one-hot encoding to transform these words into numerical values so that the computer can understand them.
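With pandas this can be done in one call (a sketch; the notebook may use sklearn's OneHotEncoder instead):

```python
import pandas as pd

# Turn each categorical column into 0/1 indicator columns
categorical = ["gender", "ever_married", "work_type",
               "Residence_type", "smoking_status"]
df = pd.get_dummies(df, columns=categorical)
```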
Balancing the target field
We observed that the stroke column (the target field) is imbalanced: around 4700 rows have stroke = 0 and only around 270 have stroke = 1. If we build a model on this as-is, it will hurt the model: it will favor the majority class, and we will always get 0 as the prediction. To overcome this we have to do under-sampling or over-sampling.
I used over-sampling with the SMOTE method to balance the data; this brings the minority class up to the size of the majority class.
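A sketch with the imbalanced-learn library (dropping the id column is my assumption, since it carries no predictive signal):

```python
import numpy as np
from imblearn.over_sampling import SMOTE

# Separate features and target
X = df.drop(columns=["stroke", "id"])
y = df["stroke"]

# SMOTE synthesizes minority-class samples until both classes are equal
X_resampled, y_resampled = SMOTE(random_state=42).fit_resample(X, y)
print(np.bincount(y_resampled))  # both counts should now match
```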
Now our dataset is clean.
Next, we split the dataset into a training set and a testing set.
I used train_test_split to split the data, so that I can train the model on one portion and test it on the other to get the model's accuracy score.
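A sketch of the split (the 80/20 ratio and random seed are assumptions; the notebook may use different values):

```python
from sklearn.model_selection import train_test_split

# Hold out 20% of the balanced data for testing
X_train, X_test, y_train, y_test = train_test_split(
    X_resampled, y_resampled, test_size=0.2, random_state=42)
```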
Building Model
I trained models with the LogisticRegression, DecisionTreeClassifier, RandomForestClassifier, and support vector machine algorithms.
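A sketch of that comparison with default hyperparameters (the notebook's settings may differ):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "DecisionTreeClassifier": DecisionTreeClassifier(),
    "RandomForestClassifier": RandomForestClassifier(),
    "SVM": SVC(),
}

# Train each model and report its accuracy on the held-out test set
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, accuracy_score(y_test, model.predict(X_test)))
```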
I got the highest accuracy with the RandomForestClassifier model, so I saved that model for hosting in a Flask application.
import joblib

# Serialize the trained RandomForestClassifier to disk for the Flask app
joblib_file = "Brain.h5"
joblib.dump(Random_model, joblib_file)
Video: I have done the practical; you can check the video below.
Now it's time to develop a web application for our ML model.
I used the Flask web framework to develop our machine learning application.
I used port 5000 to run our Flask application, which is Flask's default port.
This is the app.py code.
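The actual app.py is in the repo; here is a minimal sketch of what such an app looks like (the route, form field names, and template name are my assumptions, not the original code):

```python
import joblib
import numpy as np
from flask import Flask, render_template, request

app = Flask(__name__)
model = joblib.load("Brain.h5")  # the model saved earlier with joblib

@app.route("/", methods=["GET", "POST"])
def predict():
    result = None
    if request.method == "POST":
        # Field names are illustrative; they must match the HTML form,
        # and the values must mirror the preprocessing used in training
        fields = ("gender", "age", "hypertension", "heart_disease",
                  "ever_married", "work_type", "Residence_type",
                  "avg_glucose_level", "bmi", "smoking_status")
        features = [float(request.form[f]) for f in fields]
        result = int(model.predict(np.array([features]))[0])
    return render_template("index.html", result=result)

if __name__ == "__main__":
    # Port 5000 is Flask's default
    app.run(host="0.0.0.0", port=5000)
```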
HTML Pages:
Our web application looks like this:
Our application has now been built successfully; it's time to deploy it to the production world.
For deploying our application we use DevOps practices.
The Architecture of Deployment
I wrote a Dockerfile to containerize our ML application.
I used the CentOS image as my base image, installed software such as Python, Keras, scikit-learn, Pandas, and TensorFlow, and copied in the Flask code and HTML pages. You can check the details in my Dockerfile.
The first and foremost step:
Our machine learning application code, along with the Dockerfile, is on our local computer.
With the help of Git, I pushed the code to GitHub.
Now for the important and interesting tool: JENKINS!
Prerequisites:
- Jenkins service should be started.
- In the Jenkins node, the Docker service should be started.
- Jenkins user should be added to the docker group.
- The CloudBees Docker Build and Publish, Docker Pipeline, Docker, GitHub, and Kubernetes Continuous Deployment plugins installed in Jenkins.
I hosted Jenkins on an EC2 instance.
ISSUE:
To deploy Kubernetes pods in the cluster from the YAML file via Jenkins, we have to downgrade the Kubernetes Continuous Deployment plugin to version 1.
Reference for this issue is:- https://github.com/jenkinsci/kubernetes-cd-plugin/issues/134#issuecomment-810451397
1) Creating a k8s cluster on AWS EKS
Prerequisites: install the AWS CLI on our base OS and log in to our AWS account with the Access Key and Secret Key. Set up eksctl on our base OS (I used the Jenkins node, an EC2 instance).
Now create a YAML file describing the EKS cluster.
eksctl create cluster -f eks.yml
Now install kubectl and connect to our cluster.
Note:
After creating the k8s cluster, we should bind the cluster-admin cluster role.
By default, the system:anonymous user has no cluster role bound, which blocks cluster access. Execute the following command to bind the cluster-admin cluster role to system:anonymous, which will give you the required access.
command:
kubectl create clusterrolebinding cluster-system-anonymous --clusterrole=cluster-admin --user=system:anonymous
Jenkins pipeline:
Stage 1 of the pipeline goes to GitHub and checks out the code.
Stage 2: after stage 1, a Docker image is built and pushed to Docker Hub with my Docker Hub credentials, which I stored in Jenkins.
Stage 3: after the image is pushed to Docker Hub in stage 2, the Kubernetes deployment YAML file that I keep in GitHub launches the pods in the Kubernetes cluster running on AWS EKS, whose credentials (kubeconfig file) are stored in Jenkins. After deploying the pods, I exposed the deployment with a k8s Service of type LoadBalancer.
I installed the kubectl command on the Jenkins node so that after deploying I can verify the result and retrieve the load balancer URL.
We can use webhooks in GitHub so that whenever any code is pushed to the repo, our pipeline starts automatically.
Pipeline:
Yahoo!!! Our application is deployed. To connect to it, copy the k8s Service (LoadBalancer) URL with port 5000 (the Flask application port); this is the home page of our application.
Video: I have done the practical; you can check the video below.
Wait! What if we don't want to push the Docker image to a public registry (Docker Hub)?
We have a private registry: AWS ECR.
Prerequisites:
- An AWS ECR repo.
- We have to create an IAM role with the 'AmazonEC2ContainerRegistryFullAccess' policy and attach it to the Jenkins instance so that Jenkins can connect to ECR.
Now let's build a Jenkins pipeline to push the docker image to the ECR repo and connect to the Kubernetes cluster to deploy the pods and expose them.
Architecture:
Stage 1: check out the GitHub code.
Stage 2: the Docker image is built and tagged with the ECR repo name.
Stage 3: the Docker image is pushed to ECR.
Stage 4: with the help of the Kubernetes YAML file, pods are launched in the EKS cluster with our ECR image, and the pods are exposed with a LoadBalancer.
Pipeline:
Issue:
When I tried to log in to AWS ECR from Jenkins to push the image, it failed with the default command. I found a solution on Stack Overflow: there is a slight change in the command (in AWS CLI v2, `aws ecr get-login` was replaced by `aws ecr get-login-password`, whose output is piped into `docker login`).