Training a model - data preparation

  • MiddaC
    Junior Member
    • Nov 2012
    • 2

    #1

    Training a model - data preparation

    Hi All,
    I am facing a data cleaning problem (I work in R and Microsoft Azure) and I was hoping to get some suggestions.

    My problem is outlined as follows:

    In horse racing data, a common field is a horse's time or weight rating for each of its last 10 runs. Where a horse has only had, say, 5 runs, the data for runs 6-10 will logically be missing, and this missing data has meaning. Are there any recommendations on how to clean the data in this instance in order to train a model? The cleaning method would also need to work on new data at prediction time, applying the same transformations that were applied to the training set.

    I would really appreciate any ideas!

    Many thanks
  • jabe
    Senior Member
    • Dec 2014
    • 705

    #2
    You could consider what typically happens with the horses from the same stable and/or trainer. Gaps between runs may be worth considering too, not to mention the standard of each race.


    • MiddaC
      Junior Member
      • Nov 2012
      • 2

      #3
      Thanks, Jabe. One to think about for sure.


      • SystematicBettingDotCom
        Junior Member
        • Jul 2013
        • 11

        #4
        Are you modelling in Python or R? In Python you have to take care of the missing values. There are a few options, such as replacing them with the mean of that column, but as you say this may lose the meaning carried by horses with fewer runs. You could attach an extra field to each of the run fields, set to 0 if it is a genuine run or 1 if the value was created by averaging or some other method, i.e. it was a missing value.
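
The flag idea above could be sketched roughly as follows. This is only an illustration in plain Python, not a definitive implementation: the function names `fit_column_means` and `add_missing_flags`, the three-run layout, and the rating values are all made up for the example. The fill values are learned from the training rows only and then reused unchanged on new data, which is one way to keep the transformation identical at prediction time.

```python
from statistics import mean


def fit_column_means(rows):
    """Learn one fill value per column from the training rows, ignoring None."""
    n_cols = len(rows[0])
    means = []
    for col in range(n_cols):
        observed = [row[col] for row in rows if row[col] is not None]
        # Assumption: fall back to 0.0 if a column has no observed values at all.
        means.append(mean(observed) if observed else 0.0)
    return means


def add_missing_flags(rows, means):
    """Fill None with the training mean and append one 0/1 flag per column.

    Flag 0 = genuine run, flag 1 = value was filled in (run never happened).
    """
    out = []
    for row in rows:
        filled = [means[i] if v is None else v for i, v in enumerate(row)]
        flags = [1 if v is None else 0 for v in row]
        out.append(filled + flags)
    return out


# Hypothetical ratings for each horse's last 3 runs (None = no such run).
train = [
    [80.0, 78.0, 75.0],
    [70.0, None, None],
    [90.0, 82.0, None],
]

# Fit on the training set only.
means = fit_column_means(train)
train_ready = add_missing_flags(train, means)

# Apply the same learned fill values to a new horse at prediction time.
new_horse = [[85.0, None, None]]
new_ready = add_missing_flags(new_horse, means)
```

Each output row holds the filled ratings followed by the flags, so the model can still tell a lightly raced horse from one with a full history. In R the same pattern would apply: store the training-set fill values and reuse them when scoring new data.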
