Hi All,
I am facing a data cleaning problem (I work in R and Microsoft Azure) and was hoping to get some suggestions.
My problem is as follows:
In horse racing data, a common field is a horse's time or weight rating for each of its last 10 runs. Where a horse has only had, say, 5 runs, the data for runs 6-10 will necessarily be missing, and this missingness is itself informative. Are there any recommendations on how to clean data in this situation in order to train a model? The cleaning method would also need to work on new data at prediction time, applying the same transformations that were applied to the training set.
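To make the requirement concrete, here is a minimal sketch of the kind of thing I have in mind. It is written in plain Python purely for illustration (the same pattern should carry over to R), and it is only one possible approach, not something I know to be the right one. The ratings, the names `fit_cleaner` and `apply_cleaner`, the choice of median fill, and the fallback value of 0.0 are all my own assumptions. The idea is to learn the fill values from the training set only, keep a flag for each missing run plus a count of runs actually had so the "missingness" is not lost, and then reuse the stored values on new data.

```python
from statistics import median

N_RUNS = 10  # number of past runs stored per horse (from the description above)


def fit_cleaner(train_rows):
    """Learn one fill value per run slot from the training data only."""
    fill = []
    for slot in range(N_RUNS):
        seen = [row[slot] for row in train_rows if row[slot] is not None]
        # Assumption: fall back to 0.0 if a slot is never observed in training.
        fill.append(median(seen) if seen else 0.0)
    return fill


def apply_cleaner(row, fill):
    """Turn one raw row into model features using the stored fill values."""
    was_missing = [1 if v is None else 0 for v in row]
    filled = [fill[i] if v is None else v for i, v in enumerate(row)]
    n_runs = N_RUNS - sum(was_missing)
    # Features: filled ratings, one missing flag per slot, and the run count.
    return filled + was_missing + [n_runs]


# Hypothetical ratings, most recent run first; None marks a run that never happened.
train = [
    [70, 68, 71, 65, 66, None, None, None, None, None],
    [80, 82, 79, 81, 78, 77, 80, 83, 79, 81],
    [60, 62, None, None, None, None, None, None, None, None],
]
fill_values = fit_cleaner(train)  # computed once, from training data only

# A new horse at prediction time is cleaned with the stored training values.
new_horse = [75, 74, 73, None, None, None, None, None, None, None]
features = apply_cleaner(new_horse, fill_values)
```

The point of splitting it into a fit step and an apply step is that nothing is recomputed from the new data, so predictions see exactly the same transformation as training did.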
I would really appreciate any ideas!
Many thanks

