Machine Learning in Python
In this post I experiment with various classification models using subsets of some real-world data sets. In particular, I use the K-Nearest-Neighbor algorithm to classify text documents, experiment with and compare classifiers that are part of the scikit-learn machine learning package for Python, and use some of the preprocessing capabilities of the pandas and scikit-learn packages.
For this problem I use an image segmentation data set for clustering, with PCA as an approach to reduce dimensionality and noise in the data. I compare the results of clustering the data with and without PCA. The data set is divided into three segments. The first segment contains data about the images, with each line representing one image. Images are represented by 19 features. The second segment contains the class labels (the type of image) and a numeric class label for each corresponding image. The images are clustered, and the class labels are used to measure the completeness and homogeneity of the generated clusters. The data set used in this problem is based on the Image Segmentation data set at the UCI Machine Learning Repository.
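The comparison described above can be sketched roughly as follows. This is only an illustration under several assumptions: the data is a synthetic stand-in generated with make_blobs rather than the actual image segmentation file, K-means is assumed as the clustering algorithm (the post does not name one), and the choices of 7 clusters and 5 principal components are arbitrary.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import completeness_score, homogeneity_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the image data (assumption): 700 rows, 19 features,
# 7 ground-truth classes. Replace with the real feature matrix and labels.
X, y = make_blobs(n_samples=700, n_features=19, centers=7, random_state=0)

# Put all features on the same scale so no single feature dominates PCA.
X_scaled = StandardScaler().fit_transform(X)


def cluster_and_score(features, labels, n_clusters=7):
    """Cluster with K-means (assumed algorithm) and score against true labels."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    predicted = km.fit_predict(features)
    return (
        homogeneity_score(labels, predicted),
        completeness_score(labels, predicted),
    )


# Clustering on the full 19-dimensional data.
raw_scores = cluster_and_score(X_scaled, y)

# Clustering after projecting onto the first 5 principal components
# (the number of components is an arbitrary choice for this sketch).
X_reduced = PCA(n_components=5, random_state=0).fit_transform(X_scaled)
pca_scores = cluster_and_score(X_reduced, y)

print("without PCA: homogeneity=%.3f completeness=%.3f" % raw_scores)
print("with PCA:    homogeneity=%.3f completeness=%.3f" % pca_scores)
```

Homogeneity is high when each cluster contains members of a single class, and completeness is high when all members of a class land in the same cluster, so reporting both gives a fuller picture than either one alone.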
For association rule discovery I experiment with the Apriori algorithm, using a modified version of the Apriori implementation in Machine Learning in Action (it has been modified to compute lift values for rules in addition to confidence). The data set is based on a music playlist data set obtained from Yes.com. Each line of the data contains a sequence of songs played as part of one playlist. The songs are represented by integer values. In addition, the data set contains the mapping between the integer codes and the song titles and artists (the format of the song names is [song title]::[artist]).
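To make the two rule measures concrete, here is a minimal sketch of how confidence and lift can be computed for a single rule. It is not the modified Machine Learning in Action code; the function names and the tiny playlist sample are hypothetical, and the songs are plain integer codes as in the data set.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def rule_metrics(antecedent, consequent, transactions):
    """Return (confidence, lift) for the rule antecedent -> consequent."""
    a = frozenset(antecedent)
    c = frozenset(consequent)
    supp_a = support(a, transactions)
    supp_c = support(c, transactions)
    if supp_a == 0 or supp_c == 0:
        raise ValueError("antecedent and consequent must both occur in the data")
    supp_both = support(a | c, transactions)
    confidence = supp_both / supp_a   # estimate of P(consequent | antecedent)
    lift = confidence / supp_c        # > 1 suggests a positive association
    return confidence, lift


# Hypothetical playlists; each one is a set of integer song codes.
playlists = [frozenset(p) for p in ([1, 2, 3], [1, 2], [2, 3], [1, 2, 4], [3, 4])]

confidence, lift = rule_metrics({1}, {2}, playlists)
print("rule 1 -> 2: confidence=%.2f lift=%.2f" % (confidence, lift))
```

In this sample song 1 appears in three playlists and song 2 appears in all three of them, so the confidence is 1.0. Song 2 appears in four of the five playlists overall, so the lift is 1.0 / 0.8 = 1.25.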
Finally, I use a joke ratings data set based on the Jester Online Joke Recommender System. The data set contains ratings on 100 jokes provided by 1000 users. The ratings have been normalized to be between 1 and 21 (a 20-point scale), with 1 being the lowest rating. This experiment returns the most similar jokes based on a query provided by a user.
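One possible way to find similar jokes from ratings alone is sketched below. It assumes the query is a joke index and that similarity means the cosine between mean-centered rating columns (which is the same as the Pearson correlation between two jokes). The post does not state which similarity measure was used, and the small ratings matrix here is made up for illustration.

```python
import math


def centered_columns(ratings):
    """Transpose user rows into joke columns and subtract each joke's mean."""
    columns = []
    for column in zip(*ratings):
        mean = sum(column) / len(column)
        columns.append([value - mean for value in column])
    return columns


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar_jokes(ratings, query_joke, top_n=2):
    """Return the top_n (joke_index, similarity) pairs closest to query_joke."""
    columns = centered_columns(ratings)
    query = columns[query_joke]
    scored = [
        (index, cosine(query, column))
        for index, column in enumerate(columns)
        if index != query_joke
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Made-up ratings on the 1-21 scale: rows are users, columns are jokes.
ratings = [
    [20, 18, 3, 10],
    [15, 16, 5, 12],
    [4, 6, 19, 9],
    [2, 3, 21, 11],
]

similar = most_similar_jokes(ratings, query_joke=0, top_n=2)
for joke_index, score in similar:
    print("joke %d: similarity %.3f" % (joke_index, score))
```

Centering each joke on its own mean matters on a scale that has no negative values: without it, every pair of jokes would look similar simply because all ratings are positive.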
The tree image presented below was generated within Python and then converted to PNG on Linux. See the following link for further information.
Python code in link: