Label Encoder vs. One Hot Encoder in Machine Learning
Originally published here: http://blog.contactsunny.com/data-science/label-encoder-vs-one-hot-encoder-in-machine-learning
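Update: scikit-learn now has a class called ColumnTransformer (in sklearn.compose), which is used to apply encoders such as OneHotEncoder to selected columns. It does not replace LabelEncoder itself, which is still available. You can check out this updated post about ColumnTransformer to know more.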
If you're new to Machine Learning, you might get confused between these two: Label Encoder and One Hot Encoder. Both encoders are part of the scikit-learn library in Python, and they are used to convert categorical data, or text data, into numbers that our predictive models can work with. Today, let's understand the difference between the two with a simple example.
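As a quick preview of that difference, here is a minimal sketch, assuming scikit-learn and NumPy are installed. The country names are made-up values chosen only for illustration, not data from this post.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical category values, invented for this illustration.
countries = np.array(["France", "Spain", "Germany", "Spain", "France"])

# LabelEncoder maps each distinct value to a single integer.
# The classes are sorted, so France -> 0, Germany -> 1, Spain -> 2.
label_encoder = LabelEncoder()
labels = label_encoder.fit_transform(countries)

# OneHotEncoder expects 2D input and produces one binary column per category.
# It returns a sparse matrix by default, so we convert it to a dense array.
one_hot_encoder = OneHotEncoder()
one_hot = one_hot_encoder.fit_transform(countries.reshape(-1, 1)).toarray()

print(label_encoder.classes_)
print(labels)   # [0 2 1 2 0]
print(one_hot)  # one row per sample, with a single 1 in each row
```

Label encoding gives one column of integers, while one hot encoding gives one column per category. The rest of this post looks at why that matters.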
Label Encoding
To begin with, you can find the scikit-learn documentation for LabelEncoder here. Now, let's consider the following data:
In this example, the first column is the country column, which is all text. As you might know by now, most machine learning models work on numbers, so we can't leave raw text in our data if we're going to run a model on it. Before we can run a model, we need to make this data ready for it.
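To see the problem concretely, here is a small sketch, assuming pandas and scikit-learn are installed. The column names and values are hypothetical stand-ins, not the data from this post, and LinearRegression is just one arbitrary model picked for the demonstration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data: names and values are invented for illustration.
data = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain"],
    "Age": [44, 27, 30, 38],
    "Salary": [72000, 48000, 54000, 61000],
})

features = data[["Country", "Age"]]  # "Country" is still plain text
target = data["Salary"]

try:
    LinearRegression().fit(features, target)
    fit_succeeded = True
except ValueError as error:
    # scikit-learn tries to convert the inputs to floats and fails on "France".
    fit_succeeded = False
    print("Fit failed:", error)
```

The fit raises an error because of the text column, which is why the country values have to be encoded as numbers first.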
And to convert this kind of categorical text data into model-understandable numerical data, we use the Label Encoder class. So all we have to do, to label encode the…