Today's query is:
So Alpha is coming along (Some folks have taken to calling him HAL) though for now the Alpha team has to actively curate the information, which means they are scrubbing all the data sets he has to work with. This is obviously not the ideal - he should be acquiring and scrubbing his own data sets. Turns out though that this is mind bogglingly difficult. For several reasons.
Seems to me that a very important next step for these guys is to implement a feedback system so that a user can tell Alpha directly how valuable his answer was. Once they do this he can begin learning by SARSA methods. (SARSA is an acronym for an algorithmic paradigm used for machine learning. It stands for State Action Reward State Action)
Using SARSA, Alpha can begin trying new things with users and when they rate the results highly (give him a big Reward) he will have a better idea of how to answer (what Action to take) when he receives that same question (is in the same State again.) later.
While this will mitigate some of the difficulties above (the curators don't need to decide boundaries and interfaces etc, Alpha can figure them out for himself) It creates an entirely new - and very large - problem set. It probably has some fancy name in the machine learning literature which I've not learned yet, but what it is, is parenting.
I'll put together some thoughts about that for my next note.
-j
No comments:
Post a Comment