Monday, 31 August 2015

How to preprocess high-cardinality categorical features?

I have a data file containing features of different mobile devices. One categorical column has 1421 distinct values. I am trying to train a logistic regression model on this column together with the other data I have. My question is: will this high-cardinality column affect the model I am training? If so, how should I preprocess it so that it has fewer distinct values?
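To make the kind of preprocessing I am asking about concrete, here is a minimal sketch of one possible approach: collapsing rarely seen categories into a single catch-all level before one-hot encoding. The function names, the `min_count` threshold, the `OTHER` label, and the sample device values are all assumptions made up for illustration; they do not come from my data file, and this may not be the best option for a logistic regression.

```python
from collections import Counter


def collapse_rare(values, min_count=2, other_label="OTHER"):
    """Replace categories seen fewer than min_count times with other_label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]


def one_hot(values):
    """One-hot encode a list of categories; returns (sorted levels, rows)."""
    levels = sorted(set(values))
    index = {level: i for i, level in enumerate(levels)}
    rows = []
    for v in values:
        row = [0] * len(levels)
        row[index[v]] = 1
        rows.append(row)
    return levels, rows


# Hypothetical device-model column (made-up values, not the real data).
devices = ["A1", "A1", "B2", "C3", "A1", "B2", "D4"]

# Categories that appear only once are merged into "OTHER".
collapsed = collapse_rare(devices, min_count=2)

# The reduced column now needs fewer indicator columns.
levels, rows = one_hot(collapsed)
```

With a threshold like this, the number of indicator columns depends on how many categories are frequent enough to keep, so the threshold would have to be tuned against the actual frequency distribution of the 1421 values.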



via Chebli Mohamed
