Hello and welcome to my GitHub Pages site. I created these pages to highlight and help navigate the various projects I have worked on over the years. Some are school projects, some are work related, and most were done for fun. Some projects are set to private because of rights restrictions or because they were assignments for a class.
Here are some machine learning projects. They include classification, regression, clustering, data mining, dimensionality reduction, ensemble learning, neural networks and other methods.
Project Code: Spam - Ham Filter
Project Code: Num Reg
This is my first attempt at exploring OpenCV's video and recognition systems. It is a basic image recognition system for video, built with Python and OpenCV. It uses three pre-trained models: haarcascade_fullbody.xml, haarcascade_eye.xml and haarcascade_frontalface_default.xml.
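As a rough idea of how the three cascades can be run over video frames, here is a minimal sketch. It is not the project's actual code: the function names, the webcam source `0`, and the `scaleFactor` and `minNeighbors` values are illustrative assumptions, and it assumes the `opencv-python` package, which ships the cascade files under `cv2.data.haarcascades`.

```python
CASCADE_FILES = [
    "haarcascade_fullbody.xml",
    "haarcascade_eye.xml",
    "haarcascade_frontalface_default.xml",
]


def load_cascades():
    """Load the three pre-trained Haar cascades bundled with opencv-python."""
    import cv2  # imported lazily so this file loads even without OpenCV installed

    cascades = {}
    for name in CASCADE_FILES:
        classifier = cv2.CascadeClassifier(cv2.data.haarcascades + name)
        if classifier.empty():
            raise IOError(f"could not load {name}")
        cascades[name] = classifier
    return cascades


def detect_in_video(source=0):
    """Draw a box around every detection in each frame; press 'q' to quit."""
    import cv2

    cascades = load_cascades()
    capture = cv2.VideoCapture(source)  # assumed: 0 = default webcam, or a file path
    try:
        while True:
            ok, frame = capture.read()
            if not ok:  # end of video or camera failure
                break
            # Haar cascades work on grayscale images
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            for classifier in cascades.values():
                # scaleFactor and minNeighbors are untuned, illustrative values
                boxes = classifier.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
                for (x, y, w, h) in boxes:
                    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
            cv2.imshow("detections", frame)
            if cv2.waitKey(1) & 0xFF == ord("q"):
                break
    finally:
        capture.release()
        cv2.destroyAllWindows()
```

Calling `detect_in_video()` opens the default camera, while `detect_in_video("clip.mp4")` would read from a (hypothetical) file instead.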
Project Code: Dect System
This is Sentiment Analysis.
The first approach is a Bag of Words model. It reaches around 85-86% accuracy at determining sentiment, that is, whether a review is good or bad. The user rating of 1-5 stars is binned as good or bad, 0 or 1.
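The binning and the bag of words features can be sketched in a few lines of plain Python. This is only an illustration: the threshold of 4 stars for "good", the mapping of good to 1, the tokenizing regex, and the sample reviews are all assumptions, not taken from the project.

```python
import re
from collections import Counter

GOOD_THRESHOLD = 4  # assumption: 4-5 stars count as "good", 1-3 as "bad"


def bin_rating(stars):
    """Map a 1-5 star rating to a binary label (assumed: 1 = good, 0 = bad)."""
    return 1 if stars >= GOOD_THRESHOLD else 0


def bag_of_ngrams(text, n=1):
    """Count n-grams in a text; n=1 gives unigrams, n=2 gives bigrams."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Made-up reviews, for illustration only
reviews = [("Great product, works great", 5), ("Broke after one day", 1)]
for text, stars in reviews:
    print(bin_rating(stars), dict(bag_of_ngrams(text)))
```

Word order is thrown away in the unigram counts, which is what makes this a "bag" of words. Passing `n=2` keeps a little local order by counting adjacent word pairs.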
A few other approaches are explored using Keras embeddings: an LSTM (83%), an RNN (78%) and a NN (84%).
All the Keras results fell between 78% and 84%, which is worse than the unigram and bigram models. The unigram count vectors reach around 85-86% accuracy.
Why these results? The LSTM model had little hyperparameter tuning, such as of the embedding dimensionality or the output dimensionality. A lack of regularization likely hurt it as well. The bigger issue is that an LSTM isn't the best fit for sentiment analysis and is better suited to question-answering types of problems.
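To make the hyperparameters mentioned above concrete, here is a hedged sketch of a small Keras LSTM classifier. It is not the model from the project: the function name and every default value (vocabulary size, embedding dimensionality, number of LSTM units, dropout rate) are assumptions chosen only to show where each knob lives. It assumes TensorFlow is installed.

```python
def build_lstm_classifier(vocab_size=10_000, embedding_dim=32, lstm_units=32, dropout=0.2):
    """Build a binary sentiment classifier: Embedding -> LSTM -> sigmoid."""
    from tensorflow import keras  # imported lazily; requires TensorFlow

    model = keras.Sequential([
        # embedding_dim is the "embedding dimensionality" hyperparameter
        keras.layers.Embedding(input_dim=vocab_size, output_dim=embedding_dim),
        # lstm_units is the LSTM output dimensionality; dropout adds regularization
        keras.layers.LSTM(lstm_units, dropout=dropout, recurrent_dropout=dropout),
        # a single sigmoid unit predicts good (1) vs bad (0)
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

Tuning would then mean trying several values, for example `build_lstm_classifier(embedding_dim=64, lstm_units=64, dropout=0.3)`, and comparing validation accuracy.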
The RNN also did not do well. This is likely because it is a small network and we only consider the first 100 words of a review. The RNN therefore has less information to train on than the bag of words model, and RNNs are not the best choice for very long sequences.
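The effect of the 100-word cutoff can be seen in a small sketch. The cutoff itself comes from the text above. The function name, the right-padding of short reviews, and the pad id of 0 are assumptions made for illustration.

```python
MAX_WORDS = 100  # from the text: only the first 100 words of a review are used


def truncate_and_pad(token_ids, max_len=MAX_WORDS, pad_id=0):
    """Keep the first max_len ids and right-pad shorter sequences with pad_id."""
    clipped = list(token_ids[:max_len])
    return clipped + [pad_id] * (max_len - len(clipped))


long_review = list(range(1, 151))  # stand-in for a 150-word review
fixed = truncate_and_pad(long_review)
print(len(long_review) - len(fixed), "words never reach the network")
```

Everything past word 100 is dropped before training, whereas the bag of words model counts every word in the review.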
Lastly, our convolutional network. This model didn't do the worst, but it also wasn't the best. With some tuning it would likely do better, but it again runs into the issue of not enough data. The CNN should perform as well as, if not better than, the LSTM, and it has the benefit of being faster, which it was in our case.
Project Code: Sentiment Analysis