Machine Learning

Many algorithms can be used for machine learning. Below, I divide them into two major sections. Section 1 is based on what I have seen used in the field (I cannot claim to have seen everything). Section 2 is based on other references.


Section 1

Regression analysis, Bayes methods, and k-Means/kNN are very common in practice, especially in government. Sometimes they are used stand-alone; at other times they are used in conjunction with each other or with other models.

  1. C4.5 (Decision Trees)
  2. k-Means (clustering)
  3. k-Nearest Neighbors (kNN)
  4. Naive Bayes
  5. Regression Analysis (Linear/Multiple/Logistic)
  6. Bayesian Networks
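
To make one of the entries above concrete, here is a minimal sketch of k-Nearest Neighbors classification in plain Python. The function name `knn_predict`, the toy data, and the choice of k = 3 are assumptions made for illustration only; they do not come from any particular project or library.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Return the majority label among the k training points closest to query.

    train: list of (features, label) pairs, where features is a tuple of floats.
    """
    # Order the training points by Euclidean distance to the query point.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    # Take a majority vote among the labels of the k nearest points.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated groups labelled "A" and "B".
train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
]

print(knn_predict(train, (1.1, 1.0)))  # a point near the "A" group
print(knn_predict(train, (4.9, 5.0)))  # a point near the "B" group
```

kNN has no training phase: all the work happens at prediction time, which is part of why it is easy to combine with other models.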

Artificial Neural Networks (ANN) might become very popular in the near future. I have tried to implement them in some of my projects and am still studying them as part of my PhD research. ANNs are very flexible and promising, and can be used in dynamic machine learning environments. However, not many places are using them yet, as they are relatively complex to understand and to apply.
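
As a rough illustration of the basic building block of an ANN, the sketch below trains a single artificial neuron (a perceptron) on the logical AND function in plain Python. This is not a full neural network, and the function names, learning rate, and epoch count are assumptions chosen for the example.

```python
def predict(weights, bias, x):
    """Fire (return 1) if the weighted sum of the inputs exceeds zero."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0

def train_perceptron(samples, epochs=20, lr=1.0):
    """Learn weights and a bias with the classic perceptron update rule."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            # The error is +1, 0, or -1; weights move only on a mistake.
            error = target - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

# Truth table for logical AND, which is linearly separable.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(and_samples)
print([predict(weights, bias, x) for x, _ in and_samples])
```

A single neuron like this can only separate classes with a straight line. Practical ANNs stack many such units in layers and train them with backpropagation, which is where much of the complexity comes from.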


Section 2

In 2006, the IEEE International Conference on Data Mining (ICDM) identified the top 10 ML algorithms as:

  1. C4.5 (Decision Trees)
  2. k-Means (clustering)
  3. Support Vector Machines (SVM)
  4. Apriori
  5. Expectation Maximization (EM)
  6. PageRank
  7. AdaBoost
  8. k-Nearest Neighbors (kNN)
  9. Naive Bayes
  10. Classification and Regression Tree (CART)

An answer to the Quora question, from 2011, lists the following as potential candidates or additions:

  1. Kernel Density Estimation and Non-parametric Bayes Classifier
  2. K-Means
  3. Kernel Principal Components Analysis
  4. Linear Regression
  5. Neighbors (Nearest, Farthest, Range, k, Classification)
  6. Non-Negative Matrix Factorization
  7. Support Vector Machines
  8. Dimensionality Reduction
  9. Fast Singular Value Decomposition
  10. Decision Tree
  11. Bootstrapped SVM
  12. Gaussian Processes
  13. Logistic Regression
  14. Logit Boost
  15. Model Tree
  16. Naïve Bayes
  17. PLS
  18. Random Forest
  19. Ridge Regression
  20. Support Vector Machine
  21. Attribute importance: MDL
  22. Anomaly detection: one-class SVM
  23. Clustering: k-means, orthogonal partitioning
  24. Association: Apriori
  25. Feature extraction: NNMF

And a 2015 answer provides the following:

  1. Linear regression
  2. Logistic regression
  3. k-means
  4. SVMs
  5. Random Forests
  6. Matrix Factorization/SVD
  7. Gradient Boosted Decision Trees/Machines
  8. Naive Bayes
  9. Artificial Neural Networks
  10. For the last one I’d let you pick one of the following:
      • Bayesian Networks
      • Elastic Nets
      • Any other clustering algo besides k-means
      • LDA
      • Conditional Random Fields
      • HDPs or other Bayesian non-parametric model

A few other algorithms developed or re-developed at Data Science Central’s research lab:

  • Jackknife regression
  • Feature extraction / selection (mentioned above, but this version is very different)
  • Hidden decision trees
  • Indexation and tagging algorithms


