Machine Learning

Machine Learning Projects

Here are some machine learning projects. They cover classification, regression, clustering, data mining, dimensionality reduction, ensemble learning, neural networks, and other methods.

Quick Link to Projects

Perceptron

Spam and Ham Email Filter

Hand Digit Number Recognition

Streaming Human Detection System

Amazon Sentiment Analysis

Perceptron

Description

This is a basic perceptron written in Python.
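
For readers who have not seen one, here is a minimal sketch of the perceptron learning rule in plain Python. It is not the project's code: the function names, the learning rate, the epoch count, and the logical-AND toy data are all assumptions chosen for illustration.

```python
def train_perceptron(samples, labels, epochs=10, lr=1.0):
    """Train a binary perceptron on 0/1 labels (illustrative sketch)."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            prediction = 1 if activation > 0 else 0
            error = y - prediction  # 0 when correct, +1 or -1 when wrong
            # Perceptron rule: nudge the weights toward the misclassified sample
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


def predict(weights, bias, x):
    """Return 1 if the sample falls on the positive side of the boundary."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


# Toy data (assumed, not from the project): logical AND is linearly separable
AND_INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND_LABELS = [0, 0, 0, 1]

weights, bias = train_perceptron(AND_INPUTS, AND_LABELS)
predictions = [predict(weights, bias, x) for x in AND_INPUTS]
```

Because AND is linearly separable, the updates stop changing once every sample is classified correctly. A problem like XOR would never settle with a single perceptron.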

Project Code: Perceptron

Posted on Nov 1, 2018 by Rob

Spam and Ham Email Filter

Description

This is a spam and ham email filter trained and tested on the Enron emails. The model is a unigram-based Naive Bayes classifier.
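
As a rough illustration of what a unigram Naive Bayes filter does, here is a small sketch in plain Python. It is not the project's code: the whitespace tokenizer, the add-one (Laplace) smoothing, the function names, and the four toy messages are assumptions, and the real project trains on the Enron emails instead.

```python
import math
from collections import Counter


def train_naive_bayes(documents, labels):
    """Count unigrams per class (illustrative sketch)."""
    word_counts = {}
    class_counts = Counter(labels)
    vocabulary = set()
    for text, label in zip(documents, labels):
        tokens = text.lower().split()
        word_counts.setdefault(label, Counter()).update(tokens)
        vocabulary.update(tokens)
    return word_counts, class_counts, vocabulary


def classify(text, word_counts, class_counts, vocabulary):
    """Pick the class with the highest log prior plus log likelihood."""
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in class_counts.items():
        score = math.log(doc_count / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one smoothing keeps unseen class/word pairs from zeroing the score
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up toy messages, only to exercise the functions
TOY_DOCS = [
    "win free money now",
    "free prize claim now",
    "meeting schedule for monday",
    "please review the attached report",
]
TOY_LABELS = ["spam", "spam", "ham", "ham"]

model = train_naive_bayes(TOY_DOCS, TOY_LABELS)
spam_guess = classify("claim your free money", *model)
ham_guess = classify("monday meeting report", *model)
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together on long emails.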

Project Code: Spam - Ham Filter

Posted on Nov 1, 2018 by Rob

Number Recognition

Description

I created the data set myself, using the digits 0 to 9. The model is a CNN built with Keras in Python.

Project Code: Num Reg

Posted on Nov 1, 2018 by Rob

Streaming Human Detection System

Description

This is my first attempt at exploring OpenCV's video system and its recognition system. Using Python and OpenCV, this is a basic image recognition system for video. The system uses haarcascade_fullbody.xml, haarcascade_eye.xml and haarcascade_frontalface_default.xml, which are pre-trained models.

Project Code: Dect System

Posted on Nov 1, 2018 by Rob

Amazon Sentiment Analysis

Description

This is sentiment analysis on Amazon reviews.

The first approach is a Bag of Words model. It reaches around 85-86% accuracy at determining sentiment, that is, whether a review is good or bad. The user rating of a product, from 1 to 5 stars, is binned into good or bad, 0 or 1.
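
The two steps described above, binning star ratings and turning a review into word counts, can be sketched in plain Python. This is not the project's code: the cut-off of 4 stars, the choice of 1 for good, the whitespace tokenizer, and the function names are all assumptions, since the description does not say where the ratings are split.

```python
from collections import Counter


def bin_rating(stars, threshold=4):
    """Map a 1-5 star rating to 1 (good) or 0 (bad); threshold is assumed."""
    return 1 if stars >= threshold else 0


def build_vocabulary(reviews):
    """Assign each distinct word a column index, in order of first appearance."""
    vocabulary = {}
    for text in reviews:
        for token in text.lower().split():
            vocabulary.setdefault(token, len(vocabulary))
    return vocabulary


def to_count_vector(text, vocabulary):
    """Bag of words: one count per vocabulary word, word order discarded."""
    counts = Counter(text.lower().split())
    vector = [0] * len(vocabulary)
    for token, n in counts.items():
        if token in vocabulary:
            vector[vocabulary[token]] = n
    return vector


# Made-up reviews, only to show the shapes involved
reviews = ["great great product", "bad product"]
vocabulary = build_vocabulary(reviews)
features = [to_count_vector(r, vocabulary) for r in reviews]
targets = [bin_rating(5), bin_rating(2)]
```

The resulting count vectors and 0/1 targets are what a simple classifier would then be trained on.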

A few other approaches are explored using Keras embeddings, including an LSTM (83%), an RNN (78%) and a NN (84%).

All the Keras results fell between 78% and 84%, which is worse than the Unigram or Bigram models. The count-vectorized Unigram model gets around 85-86% accuracy.

Why these results? The LSTM model did not get much hyperparameter tuning, such as tuning the embedding dimensionality or the output dimensionality. A lack of regularization likely hurt it as well. But the big thing is that an LSTM isn't the best fit for sentiment analysis; it is better suited to question-answer types of problems.

The RNN also did not do that well. This is likely because it is a small network, and we only consider the first 100 words in a review. Thus the RNN has less information to train on than the bag of words model, and RNNs are not the best for very long sequences.
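
To make the 100-word limit concrete, here is a small sketch of how a review might be cut down and padded to a fixed length before it reaches an embedding layer. It is not the project's preprocessing: the function name, the padding id 0, the unknown-word id 1, and the toy word index are assumptions.

```python
def encode_review(text, word_index, max_len=100, pad_id=0, unknown_id=1):
    """Keep only the first max_len words, map them to ids, pad the rest."""
    tokens = text.lower().split()[:max_len]  # everything after max_len is dropped
    ids = [word_index.get(token, unknown_id) for token in tokens]
    ids += [pad_id] * (max_len - len(ids))  # pad short reviews to a fixed length
    return ids


# Hypothetical word index, only for illustration
word_index = {"good": 2, "bad": 3}

short_ids = encode_review("good bad ugly", word_index, max_len=5)
long_ids = encode_review("good " * 150, word_index)
```

Anything a reviewer writes after the cut-off never reaches the network, while a bag of words model counts every word in the review.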

And lastly our convolutional network. This model didn't do the worst, but also not the best. With some tuning it would likely do better, but it again runs into the issue of not enough data. The CNN should perform as well as, if not better than, the LSTM, and has the benefit of being faster, which is what happened in our case.

Project Code: Sentiment Analysis

Posted on Nov 1, 2018 by Rob

Rob's Github Pages