• Model Selection and Evaluation

    To evaluate our models effectively we need data to evaluate them with. To achieve this we split our data into three groups: Training, Validation and Evaluation (sometimes known as Test). How we go about splitting our data depends a lot on how much data we have. There are two common techniques for splitting data: holdout and cross-validation. The former is really only suitable for large datasets, as it removes a fairly sizeable chunk of the data from training. On smaller datasets this leaves insufficient data in each group, so the model fails to capture the underlying trend and generalise, and the danger of overfitting increases. These issues can be counteracted to an extent using cross-validation, which rotates the validation role across several folds so that every example is used for both training and validation.
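    A minimal sketch of both techniques, using only the standard library (the split ratios and helper names here are illustrative, not a fixed convention):

    ```python
    import random

    def holdout_split(data, train=0.6, val=0.2, seed=0):
        """Shuffle the data and split it into training/validation/test sets.

        The remaining fraction (here 0.2) becomes the test set.
        """
        items = list(data)
        random.Random(seed).shuffle(items)
        n = len(items)
        n_train = int(n * train)
        n_val = int(n * val)
        return (items[:n_train],
                items[n_train:n_train + n_val],
                items[n_train + n_val:])

    def kfold_indices(n, k=5):
        """Yield (train_indices, val_indices) pairs for k-fold cross-validation.

        Each of the k folds serves as the validation set exactly once,
        so every example is used for both training and validation.
        """
        idx = list(range(n))
        fold = n // k
        for i in range(k):
            val = idx[i * fold:(i + 1) * fold] if i < k - 1 else idx[i * fold:]
            val_set = set(val)
            train = [j for j in idx if j not in val_set]
            yield train, val
    ```

    With cross-validation the model is trained k times, once per fold, and the k validation scores are averaged; this makes far better use of a small dataset than a single holdout split, at the cost of k times the training work.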

  • The Curse Of Dimensionality

    The curse of dimensionality refers to the fact that high dimensional space behaves in ways that are counterintuitive, and the higher the number of dimensions the more pronounced these effects become. For example, the volume of the space grows exponentially with the number of dimensions, so a fixed amount of data becomes increasingly sparse, and distances between points tend to concentrate, making intuitions built in two or three dimensions unreliable.
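    One concrete illustration of this weirdness: the fraction of a unit hypercube occupied by its inscribed ball collapses towards zero as the dimension grows, meaning almost all of the volume ends up in the corners. A small sketch (the closed-form ball-volume formula is standard; the helper name is ours):

    ```python
    import math

    def ball_fraction(d):
        """Fraction of the unit hypercube [0,1]^d occupied by its
        inscribed ball of radius 0.5, using the standard formula
        V_d(r) = pi^(d/2) / Gamma(d/2 + 1) * r^d.
        """
        r = 0.5
        return math.pi ** (d / 2) / math.gamma(d / 2 + 1) * r ** d

    for d in (2, 3, 10, 20):
        print(d, ball_fraction(d))
    ```

    In 2 dimensions the inscribed circle covers about 79% of the square, but by 10 dimensions the inscribed ball covers well under 1% of the cube, and by 20 dimensions it is essentially negligible.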