author | Phil Burton <phil@d3r.com> | 2019-02-25 13:37:59 +0000 |
---|---|---|
committer | Phil Burton <phil@d3r.com> | 2019-02-25 13:37:59 +0000 |
commit | 3431e667a5c6475043ebfd97b43a3fdc4b078596 (patch) | |
tree | cd9eb1249e42de8ee1c7e99fd83cb7f091637b7c /day1/supervised-learning.txt | |
parent | 4e8368f4d847e5c1352302fc53658dfab2c72a9b (diff) |
Diffstat (limited to 'day1/supervised-learning.txt')
-rw-r--r-- | day1/supervised-learning.txt | 77 |
1 files changed, 77 insertions, 0 deletions
diff --git a/day1/supervised-learning.txt b/day1/supervised-learning.txt
new file mode 100644
index 0000000..1dce7da
--- /dev/null
+++ b/day1/supervised-learning.txt
@@ -0,0 +1,77 @@
+# Learning: the hows and whys of machine learning
+
+Liam Wiltshire
+https://liam-wiltshire.github.io/talks/?talk=machinelearning&conference=phpuk
+https://joind.in/event/php-uk-conference-2019/learning-the-hows-and-whys-of-machine-learning
+
+## Overview
+
+Charge backs
+
+## Supervised learning
+Training data
+Learning functions
+Categorisation / Classification
+Regression - Where do we sit on a line
+
+## Naive Bayes classifier
+Standardise words
+- Un pluralise
+- Un gender
+- Un tense
+- etc
+
+More data == better
+
+## Tokenisation
+https://en.wikipedia.org/wiki/Benford%27s_law
+https://php-ml.readthedocs.io
+
+Unique tokens for each unique context
+
+## Imbalanced data
+One category has more data
+99% data not charge back
+Just being accurate, not very helpful
+ - Started by flagging 100% as fine.
+ - Need to collect more data, change methods, resample data
+
+## Understand data
+- context
+- Common data vs specific data
+- Continuous vs discrete data
+
+## KNN
+K Nearest Neighbours
+https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
+ - Distances
+ - less sensitive to imbalance
+ - Keep K odd (no draws)
+
+## Handling nominal data
+
+Binary
+- Increase amounts of dimensions
+- normalisation required
+- equal scales
+
+## Contextless data is meaningless
+Is it normal?
+
+## Next to try
+Weighting
+Different dimensions
+Change K value (was 3NN)
+Remove outliers
+Diff distance function
+weighted distance
+
+
+
+
+# Useful links
+https://en.wikipedia.org/wiki/Benford%27s_law
+https://php-ml.readthedocs.io
+https://liam-wiltshire.github.io/talks/?talk=machinelearning&conference=phpuk
+https://joind.in/event/php-uk-conference-2019/learning-the-hows-and-whys-of-machine-learning
+https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
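
The KNN and nominal-data notes in this file (distances, an odd K such as the 3NN mentioned, binary encoding that adds dimensions, normalisation to equal scales) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration: the order data, the payment-method categories, the function names, and the choice of Euclidean distance with min-max scaling. The notes point to PHP-ML; Python is used only to keep the example short and self-contained.

```python
import math
from collections import Counter


def one_hot(value, categories):
    """Encode a nominal value as one binary dimension per category."""
    return [1.0 if value == c else 0.0 for c in categories]


def min_max_scaler(column):
    """Build a function that rescales values to [0, 1] from the training range."""
    lo, hi = min(column), max(column)
    span = (hi - lo) or 1.0  # avoid dividing by zero on a constant column
    return lambda x: (x - lo) / span


def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=3):
    """Majority vote among the k training points closest to the query.

    With two classes, an odd k means the vote cannot end in a draw.
    """
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical orders: (amount, payment method, label). Not real data.
METHODS = ["card", "paypal", "voucher"]
orders = [
    (20.0, "card", "ok"),
    (35.0, "paypal", "ok"),
    (25.0, "card", "ok"),
    (30.0, "paypal", "ok"),
    (480.0, "voucher", "chargeback"),
    (500.0, "voucher", "chargeback"),
    (450.0, "card", "chargeback"),
]

# Scale the continuous column so it sits on the same 0..1 range
# as the binary one-hot dimensions.
scale_amount = min_max_scaler([amount for amount, _, _ in orders])


def to_vector(amount, method):
    """One scaled amount dimension plus one binary dimension per method."""
    return [scale_amount(amount)] + one_hot(method, METHODS)


train = [(to_vector(a, m), label) for a, m, label in orders]

prediction_high = knn_predict(train, to_vector(470.0, "voucher"), k=3)
prediction_low = knn_predict(train, to_vector(28.0, "card"), k=3)
print(prediction_high, prediction_low)
```

Without the scaling step, the raw amount (tens to hundreds) would swamp the 0/1 method dimensions in the distance calculation, which is one reading of the "normalisation required" and "equal scales" bullets. Swapping `euclidean` for another function or weighting the votes by distance would correspond to the "Next to try" items.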