DATA SCIENCE
The covariance matrix of the data points is analyzed here to understand which dimensions (mostly) or data points (sometimes) are more important.
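One common way to read "important dimensions" off a covariance matrix is to find its leading eigenvector, as principal component analysis does. The sketch below assumes that interpretation. It uses only the standard library, approximates the eigenvector with power iteration, and all function names and the toy data are illustrative.

```python
import math

def covariance_matrix(points):
    """Sample covariance matrix of a list of equal-length rows."""
    n = len(points)
    dims = len(points[0])
    means = [sum(row[j] for row in points) / n for j in range(dims)]
    centered = [[row[j] - means[j] for j in range(dims)] for row in points]
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dims)]
        for i in range(dims)
    ]

def mat_vec(matrix, vec):
    """Multiply a square matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def leading_direction(cov, iterations=200):
    """Power iteration: approximate the eigenvector with the largest eigenvalue."""
    vec = [1.0] * len(cov)
    for _ in range(iterations):
        nxt = mat_vec(cov, vec)
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # Rayleigh quotient gives the matching eigenvalue for a unit vector.
    eigenvalue = sum(v * c for v, c in zip(vec, mat_vec(cov, vec)))
    return eigenvalue, vec

# Toy data (made up): the second column is roughly twice the first.
points = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2], [5.0, 9.8]]
cov = covariance_matrix(points)
eigenvalue, direction = leading_direction(cov)

# Share of total variance captured by the leading direction.
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
print(direction, explained)
```

Here nearly all of the variance lies along a single direction, so the two original dimensions could reasonably be replaced by one.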
If you have used Numerical Analysis code in college, you can use it to fit curves in Machine Learning for very small datasets with low dimensions.
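As an illustration of the kind of curve fitting meant here, the following sketch fits a straight line by ordinary least squares using the closed-form solution. Choosing least squares and a straight line is an assumption made for the example, and the function name and data are hypothetical.

```python
def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```

This direct solution is practical only because the dataset is tiny and has one input dimension, which matches the caveat in the text.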
The algorithm takes as input the number of clusters to generate and the number of iterations in which it will try to converge the clusters.
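The two inputs described above, the number of clusters and the number of iterations, map directly onto the parameters of a k-means style loop. The sketch below assumes k-means is the algorithm in question. Seeding the centroids with the first k points is a simplification chosen to keep the example deterministic, since practical implementations usually pick starting centroids at random or with a scheme such as k-means++.

```python
def k_means(points, k, iterations):
    """Cluster points into k groups using a fixed number of iterations."""
    # Assumption: the first k points serve as the initial centroids.
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for idx, point in enumerate(points):
            distances = [
                sum((a - b) ** 2 for a, b in zip(point, centroid))
                for centroid in centroids
            ]
            assignments[idx] = distances.index(min(distances))
        # Update step: move each centroid to the mean of its members.
        for c_idx in range(k):
            members = [p for p, a in zip(points, assignments) if a == c_idx]
            if members:
                centroids[c_idx] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments

# Toy data (made up): two well-separated groups, interleaved.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.8, 1.1), (7.9, 8.3)]
centroids, assignments = k_means(points, k=2, iterations=10)
print(assignments)  # [0, 1, 0, 1, 0, 1]
```

A fixed iteration count is used instead of a convergence test so that the second input from the text is visible as a parameter.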
Logistic Regression is trained using optimization methods like Gradient Descent or L-BFGS. NLP practitioners often refer to it as the Maximum Entropy Classifier.
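To make the training step concrete, here is a minimal sketch of logistic regression fitted with batch gradient descent on the log loss. The learning rate, epoch count, function names, and data are illustrative assumptions. L-BFGS is not shown.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(features, labels, learning_rate=0.1, epochs=2000):
    """Batch gradient descent on the average log loss."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            # Gradient of the log loss w.r.t. the score is (prediction - label).
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias

# Toy one-dimensional data (made up): small values are class 0, large are class 1.
features = [[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]]
labels = [0, 0, 0, 1, 1, 1]
weights, bias = train_logistic_regression(features, labels)
predictions = [int(sigmoid(weights[0] * x[0] + bias) >= 0.5) for x in features]
print(predictions)
```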
You can optimize the loss function using optimization methods like L-BFGS or even SGD. Another innovation in SVMs is the use of kernels on data for feature engineering.
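The SGD option mentioned above can be sketched for a linear SVM as stochastic subgradient descent on the L2-regularized hinge loss. This is a simplified illustration with hypothetical hyperparameters and toy data. It visits examples in a fixed order instead of sampling them, and it does not use kernels.

```python
def train_linear_svm(features, labels, learning_rate=0.01, reg=0.01, epochs=200):
    """Subgradient descent on the regularized hinge loss, one example at a time."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            margin = y * (sum(w * xi for w, xi in zip(weights, x)) + bias)
            # The L2 penalty shrinks the weights on every step.
            weights = [w - learning_rate * reg * w for w in weights]
            if margin < 1:
                # Hinge loss is active: push the example toward its side.
                weights = [w + learning_rate * y * xi for w, xi in zip(weights, x)]
                bias += learning_rate * y
    return weights, bias

def predict(weights, bias, x):
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0 else -1

# Toy data (made up), labels in {-1, +1}, symmetric around the origin.
features = [[-2.0, -1.0], [-1.0, -2.0], [-1.5, -1.5], [2.0, 1.0], [1.0, 2.0], [1.5, 1.5]]
labels = [-1, -1, -1, 1, 1, 1]
weights, bias = train_linear_svm(features, labels)
predictions = [predict(weights, bias, x) for x in features]
print(predictions)
```

Updating on one example at a time is what makes this SGD. A batch method such as L-BFGS would use the whole dataset for each step, and would typically be applied to a smoothed loss because the plain hinge loss is not differentiable at the margin.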
These are basically multilayered Logistic Regression classifiers. FFNNs can be used for classification and, as autoencoders, for unsupervised feature learning.
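The "multilayered Logistic Regression" view can be made concrete with a forward pass in which every unit is a logistic regression over the previous layer's outputs. In the sketch below the weights are hand-picked, not trained, so that the network computes XOR, something a single logistic regression cannot represent. All names and values are illustrative.

```python
import math

def logistic_unit(inputs, weights, bias):
    """A single logistic regression: weighted sum followed by a sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weight_rows, biases):
    """A layer is several logistic units reading the same inputs."""
    return [logistic_unit(inputs, w, b) for w, b in zip(weight_rows, biases)]

# Hand-picked weights: hidden units approximate OR and NAND, the output unit AND.
HIDDEN_WEIGHTS = [[20.0, 20.0], [-20.0, -20.0]]
HIDDEN_BIASES = [-10.0, 30.0]
OUTPUT_WEIGHTS = [[20.0, 20.0]]
OUTPUT_BIASES = [-30.0]

def forward(inputs):
    hidden = layer(inputs, HIDDEN_WEIGHTS, HIDDEN_BIASES)
    return layer(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES)[0]

xor_outputs = [round(forward(p)) for p in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(xor_outputs)  # [0, 1, 1, 0]
```

In practice the weights would be learned with backpropagation. They are fixed here only to keep the example short.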
Almost any state-of-the-art Vision-based Machine Learning result in the world today has been achieved using Convolutional Neural Networks.
Pure RNNs are rarely used now but their counterparts like LSTMs and GRUs are state of the art in most sequence modeling tasks.
A CRF models each element of a sequence so that neighboring elements affect the label of a component, instead of all labels being independent of each other.
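The effect of neighbors on a label can be illustrated with linear-chain decoding. The sketch below is not a trained CRF. It assumes made-up per-position scores (emissions) and label-to-label scores (transitions), then compares choosing each label independently with Viterbi decoding over the whole sequence. All names and numbers are hypothetical.

```python
def independent_decode(emissions):
    """Pick the best label at each position, ignoring the neighbors."""
    return [max(scores, key=scores.get) for scores in emissions]

def viterbi_decode(emissions, transitions):
    """Best-scoring label sequence when adjacent labels interact."""
    labels = list(emissions[0])
    best = {label: emissions[0][label] for label in labels}
    backpointers = []
    for scores in emissions[1:]:
        new_best, pointers = {}, {}
        for cur in labels:
            # Best previous label given that the current label is `cur`.
            prev = max(labels, key=lambda p: best[p] + transitions[(p, cur)])
            new_best[cur] = best[prev] + transitions[(prev, cur)] + scores[cur]
            pointers[cur] = prev
        best = new_best
        backpointers.append(pointers)
    # Walk the backpointers from the best final label to recover the path.
    path = [max(labels, key=best.get)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]

# The middle position weakly prefers "B" on its own.
emissions = [{"A": 2.0, "B": 0.0}, {"A": 0.9, "B": 1.0}, {"A": 2.0, "B": 0.0}]
# Transitions reward keeping the same label as the neighbor.
transitions = {("A", "A"): 0.5, ("B", "B"): 0.5, ("A", "B"): -0.5, ("B", "A"): -0.5}

independent_path = independent_decode(emissions)
chain_path = viterbi_decode(emissions, transitions)
print(independent_path)  # ['A', 'B', 'A']
print(chain_path)        # ['A', 'A', 'A']
```

The middle label flips from B to A once its neighbors are taken into account, which is the dependence between labels that the text describes.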
Earlier versions like CART trees were once used for simple data, but with larger and larger datasets, the bias-variance tradeoff needs to be handled with better algorithms.