How can I avoid overfitting?

The following is an answer I wrote on Quora.

Quora: How can I avoid overfitting?
So first of all, let’s have a look at an extreme case:

[1] From Christopher Bishop’s book Pattern Recognition and Machine Learning.
The blue dots are observed data X = \{x_1, \ldots, x_n\} generated by a noisy model:
f(x) = \sin(x) + \epsilon, where \epsilon could be zero-mean Gaussian noise.
If you use a very complex 9th-order polynomial to fit the data, the red line is what you get. This is what we call “overfitting”: the polynomial tries its best to fit the seen data, while the noise actually dominates what is learned.
[2] Now consider the same case, but with more observed data:

As you might expect, the fitted function gets smoother, even with a 9th-order polynomial. In this case, the noise is averaged out and no longer dominates. So the first countermeasure against overfitting is simply to get more data.

What if getting more data is not possible? (In many real cases, unfortunately, it is not.) Then we need to resort to controlling the complexity of the estimated function f(x), which can usually be done in the following ways.
The first way is rather straightforward: model selection. The objective is to spot the optimal model hyperparameters, e.g., the polynomial order or the penalty coefficient. For that, you need to evaluate your model several times with various settings, typically on held-out data, together with a proper evaluation criterion, e.g., the error rate or the F1-score, and choose the setting with the best result.
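The evaluation loop just described can be sketched as k-fold cross-validation over the polynomial degree; the data and settings are again illustrative assumptions.

```python
# Model selection sketch: choose the polynomial degree by 5-fold
# cross-validation with mean squared error as the criterion.
import numpy as np

rng = np.random.default_rng(1)
n = 60
x = rng.uniform(0.0, 2.0 * np.pi, n)
t = np.sin(x) + rng.normal(0.0, 0.3, n)

folds = np.array_split(rng.permutation(n), 5)   # fixed shuffled 5-fold split

def cv_error(degree):
    errors = []
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        coeffs = np.polyfit(x[train], t[train], degree)
        pred = np.polyval(coeffs, x[held_out])
        errors.append(np.mean((pred - t[held_out]) ** 2))
    return np.mean(errors)                      # average held-out error

scores = {d: cv_error(d) for d in range(1, 10)}
best = min(scores, key=scores.get)              # degree with the lowest CV error
print(best, scores[best])
```

Each candidate degree is scored only on data it was not trained on, so a degree that merely memorizes the noise is penalized rather than rewarded.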
The second way is regularization. This technique is commonly used in many models; even when a model is not formulated with an explicit regularization term at the outset, it can often be shown to be equivalent to one. The basic idea is rather simple.

W = \sum_i V(f(x_i), t_i) + \lambda \Omega (f)
Here V(\cdot) is a loss function measuring the empirical error on the observations, and \Omega(f) is a penalty term expressing that the complexity of f(x) should be kept as small as possible. The coefficient \lambda controls the trade-off between empirical loss and complexity. Note that if this objective is convex, the solution is unique.
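As one concrete instance of this objective, take squared loss for V and a quadratic penalty \Omega(f) = \lVert w \rVert^2 on the coefficients of a degree-9 polynomial, i.e., ridge regression; the data here are illustrative assumptions.

```python
# Ridge-regression instance of the penalized objective:
#   W = sum_i (phi(x_i) . w - t_i)^2 + lambda * ||w||^2
import numpy as np

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0.0, 2.0 * np.pi, 15))
t = np.sin(x) + rng.normal(0.0, 0.3, 15)

Phi = np.vander(x, 10, increasing=True)   # degree-9 polynomial features x^0 .. x^9

# lambda = 0: plain least squares (minimum-norm solution via lstsq)
w_unreg = np.linalg.lstsq(Phi, t, rcond=None)[0]

# lambda > 0: closed-form ridge solution (Phi^T Phi + lambda I)^-1 Phi^T t
lam = 10.0
w_reg = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ t)

print(np.linalg.norm(w_unreg), np.linalg.norm(w_reg))  # the penalty shrinks the weights
```

Increasing \lambda pulls the coefficient vector toward zero, trading a bit of empirical fit for a simpler, smoother function.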
One more thing….
It’s also worthwhile to mention the bias-variance decomposition. When you have only a few data points and train a very complex function on them, it is very likely that you hit overfitting. If you repeat this process many times, you will also notice that the function f(x) you get differs from run to run. This is what we call variance: your model is quite unstable. This is easy to understand from the discussion above, because your model is dominated by the random noise even though your training error (which relates to the bias) is minimal. There is a fundamental trade-off between bias and variance, which plays a central role in machine learning.
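The instability described here can be made visible by refitting on many freshly drawn datasets and measuring how much the prediction at one fixed point fluctuates; the setup is again an illustrative assumption.

```python
# Variance sketch: refit a simple and a complex model on fresh noisy
# datasets and compare how much their predictions fluctuate.
import numpy as np

rng = np.random.default_rng(3)
x_grid = np.linspace(0.0, 2.0 * np.pi, 11)   # 11 fixed input locations
x0 = np.pi / 2.0                             # fixed query point

def prediction_variance(degree, trials=200):
    preds = []
    for _ in range(trials):
        t = np.sin(x_grid) + rng.normal(0.0, 0.3, x_grid.size)  # fresh noise each trial
        coeffs = np.polyfit(x_grid, t, degree)
        preds.append(np.polyval(coeffs, x0))
    return np.var(preds)                     # spread of f(x0) across refits

var_simple = prediction_variance(degree=3)
var_complex = prediction_variance(degree=9)
print(var_simple, var_complex)
```

The degree-9 model chases the noise in every fresh sample, so its prediction at x0 varies far more across refits than the degree-3 model’s; that spread is exactly the variance term of the decomposition.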



Manually install a package in MiKTeX

First, check the README files and any available documentation of the package, and perhaps the beginning of the .dtx file, for installation information.

Installing a package available as dtx/ins bundle:

  • Download the contents of the package directory. .dtx is the extension of a documented source file; .ins is the extension of an installation file.
  • Run LaTeX (or TeX) on the .ins file. This may be done from your editor or at the command prompt (latex packagename.ins). It usually produces one or more files ending in .sty, and perhaps some additional files. As you now have .cls or .sty files (or the like), the remaining steps are the same as in the next alternative:

Installing sty or cls files:

  • Create a new directory with the package name in your tex directory structure. With MiKTeX that directory might be C:\Program Files\MiKTeX 2.8\texmf\tex\latex\packagename\.
  • Copy the package files (*.sty, *.cls, etc.) into this directory.
  • Make the new package known to MiKTeX by refreshing the MiKTeX filename database. To do this, click Start / Programs / MiKTeX 2.8 / Maintenance / Settings (or similar) to open the MiKTeX options, then click the “Refresh FNDB” button. The installation is then complete.
  • If you did not download the documentation already, you can generate it by running pdfLaTeX or LaTeX on the .dtx file. Compile twice to get correct references.
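Put together, the dtx/ins route looks roughly like the following command-prompt session. The package name foo and the MiKTeX 2.8 path are placeholders; initexmf --update-fndb is MiKTeX’s command-line counterpart of the “Refresh FNDB” button.

```shell
rem Hypothetical package "foo", distributed as foo.dtx + foo.ins
latex foo.ins
rem -> produces foo.sty (and possibly further files)

copy foo.sty "C:\Program Files\MiKTeX 2.8\texmf\tex\latex\foo\"

rem Refresh the filename database from the command line
initexmf --update-fndb

rem Generate the documentation; compile twice for correct references
pdflatex foo.dtx
pdflatex foo.dtx
```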