.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "auto_examples/preprocessing/plot_dummy_code_encoder.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code or to run this example in your browser via JupyterLite or Binder. .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_examples_preprocessing_plot_dummy_code_encoder.py: =============================================== Comparing DummyCode Encoder with Other Encoders =============================================== .. currentmodule:: scikitplot.preprocessing The :class:`DummyCodeEncoder` to encode each categorical features into dummy/indicator 0/1 variables. In this example, we will compare various different approaches for handling categorical features: :class:`GetDummies`, :class:`TargetEncoder`, :class:`OrdinalEncoder`, :class:`OneHotEncoder`, and dropping the category. .. note:: `fit(X, y).transform(X)` does not equal `fit_transform(X, y)` because a cross fitting scheme is used in `fit_transform` for encoding. See the :ref:`User Guide ` for details. .. GENERATED FROM PYTHON SOURCE LINES 20-24 .. code-block:: Python # Authors: The scikit-plots developers # SPDX-License-Identifier: BSD-3-Clause .. GENERATED FROM PYTHON SOURCE LINES 25-28 Loading Data ======================== First, we load the "autoscout24" dataset: .. GENERATED FROM PYTHON SOURCE LINES 28-33 .. code-block:: Python from scikitplot.datasets import load_dataset df = load_dataset("autoscout24") df .. raw:: html
id price make model model_version registration_date mileage_km_raw vehicle_type body_type fuel_category primary_fuel transmission power_kw power_hp nr_seats nr_doors country_code zip city latitude longitude is_used seller_is_dealer offer_type description equipment_comfort equipment_entertainment equipment_extra equipment_safety
0 1df9ec13-c5a4-4d6d-85cd-dc198b43ed36 36900.0 Mercedes-Benz CLA 180 SB AMG Night Edition Pano/StdHG/Dist/Tot 2025-07-01 3500.0 Car Other Gasoline Super 95 Automatic 100.0 136.0 5.0 5.0 DE 86633 Neuburg an der Donau 48.727590 11.200660 False True U <ul><li>Fahrzeug-Nr. für Kundenanfragen: 05502... ['360° camera', 'Armrest', 'Automatic climate ... ['Android Auto', 'Apple CarPlay', 'Bluetooth',... ['Alloy wheels', 'Ambient lighting', 'Automati... ['ABS', 'Adaptive Cruise Control', 'Adaptive h...
1 9f455135-68ca-4dd8-aa39-139b27a7dcf2 147990.0 Porsche Panamera 4 E-Hybrid*SPORT-DESIGN*EXCLUSIVE-MANUFAKTUR*B... 2025-05-01 16990.0 Car Compact Electric/Gasoline Super 95 Automatic 346.0 470.0 4.0 5.0 AT 8141 Premstätten 46.956710 15.402450 True True U Dieser neue <strong>Porsche Panamera 4 E-Hybri... ['360° camera', 'Air conditioning', 'Air suspe... ['Android Auto', 'Apple CarPlay', 'Bluetooth',... ['Alloy wheels (21")', 'Ambient lighting', 'Au... ['ABS', 'Adaptive Cruise Control', 'Adaptive h...
2 b1c0eaff-a414-49f3-bf20-dd3672fe5233 26900.0 BMW 320 d Touring mhev 48V Luxury auto 2020-09-01 62503.0 Car Station wagon Electric/Diesel Electricity Automatic 140.0 190.0 5.0 5.0 IT 12100 Cuneo - Cn 44.425960 7.556430 True True U <strong>Prima di recarsi presso una nostra sed... [] [] [] []
3 e5f80b61-74ac-4394-8c0a-ec756145ae9b 69900.0 Porsche 997 4S WLS GT3 AeroCup Klappe Chrono Bose 1.Hd ! 2008-02-01 88017.0 Car Coupe Gasoline Regular/Benzine 91 Automatic 280.0 381.0 4.0 2.0 DE 89155 Erbach 48.302620 9.907480 True True U <strong>Porsche 997 Carrera 4S WLS Exclusive -... ['Air conditioning', 'Armrest', 'Automatic cli... ['CD player', 'Hands-free equipment', 'On-boar... ['Alloy wheels', 'Automatically dimming interi... ['ABS', 'Alarm system', 'Central door lock', '...
4 9fc53b43-2210-4515-a499-2aa1c8e83e4d 42264.0 Mercedes-Benz A 250 e con tecnologí híbrida EQ 2025-07-01 6000.0 Car Sedan Electric/Gasoline NaN Automatic 160.0 218.0 5.0 5.0 ES 46470 VALENCIA 39.403970 -0.385390 True True U <strong>Precio al contado: 44800 euros</strong... ['Automatic climate control', 'Cruise control'] ['Bluetooth'] ['Alloy wheels'] ['ABS', 'Side airbag']
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
111 649dbdd4-d01b-4bd6-b712-47809b2651df 1495.0 Alfa Romeo 147 1.6 T.Spark Progression 2001-10-01 214745.0 Car Compact Gasoline Super 95 Manual 77.0 105.0 5.0 3.0 NL 1704 RX HEERHUGOWAARD 52.685610 4.830900 True True U Inruiler, zo mee!!<br /><br /><strong>Meer inf... ['Air conditioning', 'Electrical side mirrors'... ['CD player', 'Radio'] [] ['ABS', 'Alarm system', 'Central door lock', '...
112 6dda33d9-579c-46e8-905c-885bd870d006 2500.0 BMW 320 320i 2005-07-01 242100.0 Car Sedan Gasoline NaN Manual 125.0 170.0 5.0 4.0 DE 90411 Nürnberg 49.474428 11.103918 True False U Schäden vorne krilo, tsprapina Tür vorne recht... ['Air conditioning', 'Armrest', 'Electrical si... ['On-board computer'] ['Alloy wheels', 'Emergency tyre'] ['ABS', 'Alarm system', 'Central door lock wit...
113 f4168e59-46f6-417c-a51f-1e07f8e0a082 34350.0 Alfa Romeo Junior Ibrida Q4 1.2 MHEV eAWD e-DCT6 2025-10-01 10.0 Car Off-Road/Pick-up Gasoline NaN Automatic 100.0 136.0 5.0 5.0 AT 4240 Freistadt 48.492650 14.503010 False True N Wunderschöner Alfa Romeo Junior Ibrida Q4, eAW... ['Air conditioning', 'Armrest', 'Automatic cli... ['Android Auto', 'Apple CarPlay', 'Bluetooth',... ['Alloy wheels', 'Automatically dimming interi... ['ABS', 'Adaptive Cruise Control', 'Adaptive h...
114 9e36a6c7-b78d-45d2-acae-f42fb4d48239 25999.0 BMW 535 535d Touring Sport-Aut. 2016-07-01 128000.0 Car Station wagon Diesel NaN Automatic 230.0 313.0 5.0 5.0 DE 85276 pfaffenhofen 48.529860 11.503230 True False U Saison Fahrzeug 5-10 ,original M-Paket von wer... ['Air suspension', 'Armrest', 'Automatic clima... ['Bluetooth', 'CD player', 'Hands-free equipme... ['Alloy wheels', 'Cargo barrier', 'Electronic ... ['ABS', 'Bi-Xenon headlights', 'Central door l...
115 5add7191-7888-4d00-af1e-f279f9d5fdd7 1800.0 BMW 320 320i touring 2001-01-01 231000.0 Car Station wagon Gasoline NaN Manual 125.0 170.0 5.0 5.0 DE 37351 NaN 51.342832 10.252152 True False U Zum Verkauf Steht ein BMW e46 320i vfl mit dem... ['Air conditioning', 'Armrest', 'Automatic cli... ['Android Auto', 'Bluetooth', 'CD player', 'Di... ['Alloy wheels', 'Catalytic Converter', 'Emerg... ['ABS', 'Central door lock', 'Central door loc...

116 rows × 29 columns



.. GENERATED FROM PYTHON SOURCE LINES 34-36 .. code-block:: Python df.info() .. rst-class:: sphx-glr-script-out .. code-block:: none RangeIndex: 116 entries, 0 to 115 Data columns (total 29 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 id 116 non-null object 1 price 116 non-null float64 2 make 116 non-null object 3 model 116 non-null object 4 model_version 115 non-null object 5 registration_date 116 non-null object 6 mileage_km_raw 116 non-null float64 7 vehicle_type 116 non-null object 8 body_type 116 non-null object 9 fuel_category 116 non-null object 10 primary_fuel 57 non-null object 11 transmission 114 non-null object 12 power_kw 116 non-null float64 13 power_hp 116 non-null float64 14 nr_seats 112 non-null float64 15 nr_doors 111 non-null float64 16 country_code 116 non-null object 17 zip 116 non-null object 18 city 115 non-null object 19 latitude 116 non-null float64 20 longitude 116 non-null float64 21 is_used 116 non-null bool 22 seller_is_dealer 116 non-null bool 23 offer_type 116 non-null object 24 description 113 non-null object 25 equipment_comfort 116 non-null object 26 equipment_entertainment 116 non-null object 27 equipment_extra 116 non-null object 28 equipment_safety 116 non-null object dtypes: bool(2), float64(8), object(19) memory usage: 24.8+ KB .. GENERATED FROM PYTHON SOURCE LINES 37-39 For this example, we use the following subset of numerical and categorical features in the data. Candidate target_features = ["seller_is_dealer", "price"] .. GENERATED FROM PYTHON SOURCE LINES 39-70 .. code-block:: Python target_name = "price" numerical_features = [ "seller_is_dealer", "mileage_km_raw", "power_kw", "power_hp", # "nr_seats", "latitude", "longitude", ] categorical_features = [ # "id", "make", "model", "body_type", "fuel_category", # "primary_fuel", # "transmission", ] equipment_features = [ "equipment_comfort", "equipment_entertainment", "equipment_extra", "equipment_safety", ] df = df[numerical_features + categorical_features + equipment_features + [target_name]] df[equipment_features] = df[equipment_features].replace(r"\[|\]|'", "", regex=True) df.T .. raw:: html
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 ... 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115
seller_is_dealer True True True True True True True True True True True True True True True True True True True True True True True True True True True False True True True False True True True True True True True False ... True True True False False True True True True True True True True True False True True True True True True True True True False True False True True True False True True False True True False True False False
mileage_km_raw 3500.0 16990.0 62503.0 88017.0 6000.0 15500.0 94600.0 25400.0 7033.0 72500.0 20300.0 160600.0 64308.0 12500.0 62280.0 53500.0 23196.0 87915.0 14000.0 85977.0 85497.0 235300.0 50.0 85593.0 21358.0 64626.0 177253.0 128000.0 114999.0 57600.0 37172.0 89500.0 109304.0 34000.0 150000.0 13819.0 77000.0 84000.0 1.0 196000.0 ... 8000.0 121188.0 55445.0 182300.0 110000.0 157495.0 62588.0 117554.0 93586.0 12850.0 88700.0 15250.0 14500.0 52000.0 199000.0 91000.0 9900.0 2900.0 56330.0 131703.0 9900.0 10000.0 5788.0 103000.0 159000.0 61885.0 123164.0 157526.0 87741.0 162000.0 380541.0 52041.0 67000.0 95000.0 39936.0 214745.0 242100.0 10.0 128000.0 231000.0
power_kw 100.0 346.0 140.0 280.0 160.0 200.0 195.0 283.0 150.0 404.0 220.0 140.0 183.0 110.0 111.0 135.0 127.0 103.0 230.0 100.0 140.0 140.0 588.0 110.0 346.0 110.0 85.0 110.0 120.0 100.0 135.0 293.0 110.0 375.0 120.0 150.0 150.0 100.0 92.0 110.0 ... 145.0 140.0 110.0 190.0 140.0 390.0 272.0 404.0 135.0 353.0 100.0 90.0 430.0 110.0 110.0 190.0 145.0 185.0 180.0 165.0 110.0 180.0 66.0 100.0 125.0 280.0 125.0 293.0 96.0 140.0 120.0 283.0 210.0 206.0 455.0 77.0 125.0 100.0 230.0 125.0
power_hp 136.0 470.0 190.0 381.0 218.0 272.0 265.0 385.0 204.0 549.0 299.0 190.0 249.0 150.0 151.0 184.0 173.0 140.0 313.0 136.0 190.0 190.0 799.0 150.0 470.0 150.0 116.0 150.0 163.0 136.0 184.0 398.0 150.0 510.0 163.0 204.0 204.0 136.0 125.0 150.0 ... 197.0 190.0 150.0 258.0 190.0 530.0 370.0 549.0 184.0 480.0 136.0 122.0 585.0 150.0 150.0 258.0 197.0 252.0 245.0 224.0 150.0 245.0 90.0 136.0 170.0 381.0 170.0 398.0 131.0 190.0 163.0 385.0 286.0 280.0 619.0 105.0 170.0 136.0 313.0 170.0
latitude 48.72759 46.95671 44.42596 48.30262 39.40397 51.78649 50.79976 50.88688 48.3468 51.09111 51.4595 48.50503 52.29444 48.11777 40.30452 49.11204 49.49586 52.24283 40.50052 43.38405 52.29892 52.56014 49.1482 38.34967 52.08303 49.77554 52.19726 45.6602 39.40666 51.6189 52.4318 50.9747 42.13092 45.25556 50.98218 52.20721 45.18686 41.72291 44.9959 47.8944 ... 50.01429 48.27601 41.55724 50.08384 38.0983 52.22616 48.79136 47.04049 51.9964 51.33054 48.20804 52.31028 51.05798 45.07848 49.75478 47.91965 50.94582 49.75162 51.89442 52.72111 41.78959 49.90362 51.11182 50.63921 52.0119 50.97362 40.17597 53.14879 44.70997 50.62295 48.30522 50.94001 43.48807 42.4125 48.76962 52.68561 49.474428 48.49265 48.52986 51.342832
longitude 11.20066 15.40245 7.55643 9.90748 -0.38539 4.64821 6.77983 4.66458 10.90495 11.09989 6.99468 9.21102 9.8299 13.15253 -3.4558 8.4725 0.1479 6.75995 -3.89211 -5.81072 13.27423 13.36774 9.21955 -0.47338 11.58187 6.68476 4.92202 8.79348 -0.38353 7.63174 4.87335 4.895 -0.41563 4.72005 9.79119 8.7816 11.25692 12.60516 7.69085 13.12509 ... 10.21599 14.02283 2.03801 8.78389 13.3488 5.97542 9.77233 15.4663 4.69895 3.29288 13.52729 4.93382 5.21715 7.55672 6.63915 16.21685 6.89468 8.11854 10.1577 8.2684 12.59509 8.85771 4.08961 3.05859 4.36026 11.06165 18.03049 7.05146 8.02503 3.03501 12.38638 4.01635 -5.71842 12.76893 11.97135 4.8309 11.103918 14.50301 11.50323 10.252152
make Mercedes-Benz Porsche BMW Porsche Mercedes-Benz Mercedes-Benz BMW Porsche Mercedes-Benz Porsche Porsche BMW Volvo Mercedes-Benz BMW BMW BMW BMW Mercedes-Benz BMW BMW BMW Mercedes-Benz Volvo Porsche BMW BMW BMW Audi BMW BMW BMW BMW BMW Audi Mercedes-Benz Audi BMW Ford BMW ... Mercedes-Benz BMW Audi BMW BMW BMW Porsche Porsche BMW Porsche BMW Mercedes-Benz Mercedes-Benz BMW BMW BMW Mercedes-Benz Audi Porsche BMW BMW BMW Suzuki BMW BMW Porsche Audi BMW Honda BMW BMW Porsche BMW Alfa Romeo BMW Alfa Romeo BMW Alfa Romeo BMW BMW
model CLA 180 Panamera 320 997 A 250 CLA 250 530 992 GLC 200 Panamera Boxster 420 XC90 CLA 200 X1 120 i3 218 CLE 300 118 320 320 G 63 AMG XC40 Cayenne 318 316 118 Q5 116 420 X5 318 M3 A6 C 300 Q3 118 Focus 418 ... GLC 300 520 Q3 330 X4 M850 991 Cayenne 330 992 118 Citan G 63 AMG X2 318 430 E 220 A5 Macan 325 X2 330 Swift 218 320 Macan A5 X5 HR-V X3 320 911 X5 Giulia iX 147 320 Junior 535 320
body_type Other Compact Station wagon Coupe Sedan Sedan Station wagon Coupe Off-Road/Pick-up Sedan Convertible Sedan Off-Road/Pick-up Station wagon Off-Road/Pick-up Sedan Sedan Sedan Coupe Compact Station wagon Station wagon Off-Road/Pick-up Off-Road/Pick-up Off-Road/Pick-up Station wagon Compact Sedan Off-Road/Pick-up Sedan Compact Off-Road/Pick-up Station wagon Sedan Station wagon Station wagon Off-Road/Pick-up Sedan Sedan Coupe ... Off-Road/Pick-up Station wagon Off-Road/Pick-up Sedan Off-Road/Pick-up Sedan Convertible Off-Road/Pick-up Station wagon Convertible Compact Transporter Off-Road/Pick-up Off-Road/Pick-up Sedan Coupe Sedan Sedan Off-Road/Pick-up Station wagon Off-Road/Pick-up Station wagon Compact Station wagon Coupe Off-Road/Pick-up Sedan Off-Road/Pick-up Off-Road/Pick-up Off-Road/Pick-up Other Coupe Off-Road/Pick-up Sedan Off-Road/Pick-up Compact Sedan Off-Road/Pick-up Station wagon Station wagon
fuel_category Gasoline Electric/Gasoline Electric/Diesel Gasoline Electric/Gasoline Electric Diesel Gasoline Gasoline Gasoline Gasoline Diesel Diesel Diesel Diesel Gasoline Electric Gasoline Electric/Gasoline Gasoline Diesel Diesel Gasoline Diesel Electric/Gasoline Diesel Gasoline Diesel Diesel Gasoline Gasoline Electric/Gasoline Diesel Gasoline Diesel Electric/Gasoline Diesel Gasoline Electric/Gasoline Diesel ... Electric/Diesel Diesel Gasoline Diesel Diesel Gasoline Gasoline Electric/Gasoline Electric/Gasoline Gasoline Gasoline Electric Gasoline Diesel Diesel Diesel Diesel Electric/Gasoline Gasoline Diesel Diesel Gasoline Gasoline Gasoline Gasoline Gasoline CNG Electric/Gasoline Gasoline Diesel Diesel Gasoline Diesel Gasoline Electric Gasoline Gasoline Gasoline Diesel Gasoline
equipment_comfort 360° camera, Armrest, Automatic climate contro... 360° camera, Air conditioning, Air suspension,... Air conditioning, Armrest, Automatic climate c... Automatic climate control, Cruise control Air conditioning, Automatic climate control, C... Air conditioning, Air suspension, Armrest, Aut... Air conditioning, Armrest, Automatic climate c... 360° camera, Air conditioning, Automatic clima... 360° camera, Air conditioning, Air suspension,... Air conditioning, Automatic climate control, 2... Air conditioning, Armrest, Automatic climate c... Air conditioning, Armrest, Automatic climate c... Automatic climate control, Electric tailgate, ... Armrest, Automatic climate control, Cruise con... Air conditioning, Automatic climate control, 2... Air conditioning, Armrest, Automatic climate c... Air conditioning, Armrest, Automatic climate c... 360° camera, Air suspension, Armrest, Automati... Automatic climate control, Electrical side mir... 360° camera, Air conditioning, Air suspension,... Armrest, Automatic climate control, Cruise con... Air conditioning, Cruise control, Electrical s... Armrest, Automatic climate control, Cruise con... Automatic climate control, Cruise control Air conditioning, Armrest, Automatic climate c... Air conditioning, Automatic climate control, 2... Air suspension, Armrest, Automatic climate con... 360° camera, Armrest, Automatic climate contro... Air conditioning, Armrest, Automatic climate c... Armrest, Automatic climate control, Cruise con... Air conditioning, Armrest, Automatic climate c... Air conditioning, Armrest, Automatic climate c... Automatic climate control, 2 zones, Cruise con... Armrest, Automatic climate control, Cruise con... ... 360° camera, Automatic climate control, 4 zone... Air suspension, Armrest, Automatic climate con... Air conditioning, Automatic climate control, M... Armrest, Automatic climate control, Electrical... Armrest, Cruise control, Electric tailgate, El... Air conditioning, Automatic climate control, C... Air conditioning, Armrest, Automatic climate c... 360° camera, Air suspension, Cruise control, H... 360° camera, Air conditioning, Automatic clima... 360° camera, Air conditioning, Armrest, Automa... Air conditioning, Automatic climate control, C... Air conditioning, Automatic climate control, A... 360° camera, Air conditioning, Armrest, Automa... Air conditioning, Armrest, Automatic climate c... 360° camera, Armrest, Automatic climate contro... Armrest, Automatic climate control, Cruise con... 360° camera, Air conditioning, Automatic clima... 360° camera, Armrest, Automatic climate contro... Air conditioning, Armrest, Automatic climate c... Air conditioning, Armrest, Automatic climate c... Armrest, Automatic climate control, 3 zones, C... 360° camera, Air conditioning, Armrest, Automa... Air conditioning, Cruise control, Electrical s... Air conditioning, Armrest, Automatic climate c... Air conditioning, Cruise control, Leather seat... Air conditioning, Automatic climate control, 3... Armrest, Automatic climate control, Cruise con... 360° camera, Air conditioning, Air suspension,... Armrest, Automatic climate control, Cruise con... Air conditioning, Armrest, Automatic climate c... Air conditioning, Armrest, Automatic climate c... 360° camera, Air conditioning, Air suspension,... Air conditioning, Electrical side mirrors, Lum... Air conditioning, Armrest, Electrical side mir... Air conditioning, Armrest, Automatic climate c... Air suspension, Armrest, Automatic climate con... Air conditioning, Armrest, Automatic climate c...
equipment_entertainment Android Auto, Apple CarPlay, Bluetooth, Digita... Android Auto, Apple CarPlay, Bluetooth, Digita... CD player, Hands-free equipment, On-board comp... Bluetooth Android Auto, Apple CarPlay, Bluetooth, Digita... Android Auto, Apple CarPlay, Bluetooth, Digita... Bluetooth, On-board computer Android Auto, Apple CarPlay, Digital cockpit, ... Android Auto, Apple CarPlay, Bluetooth, Digita... Apple CarPlay, Bluetooth, CD player, Digital r... Bluetooth, CD player, Hands-free equipment, On... Android Auto, Apple CarPlay, Bluetooth, Digita... Bluetooth, Digital cockpit, Hands-free equipme... Bluetooth, CD player, Hands-free equipment, On... Digital radio Android Auto, Apple CarPlay, Bluetooth, Digita... Android Auto, Apple CarPlay, Bluetooth, Digita... Bluetooth, CD player, Digital radio, Hands-fre... Android Auto, Apple CarPlay, Bluetooth, Digita... Bluetooth, USB Android Auto, Apple CarPlay, Bluetooth, Digita... Android Auto, Apple CarPlay, Bluetooth, Digita... CD player, On-board computer, Radio Bluetooth, Digital radio, Hands-free equipment... Bluetooth, USB Bluetooth, CD player, On-board computer, Radio Android Auto, Apple CarPlay, Bluetooth, Digita... Apple CarPlay, Bluetooth, Digital cockpit, Han... Apple CarPlay, Bluetooth, CD player, Induction... Bluetooth, CD player, Hands-free equipment, In... Android Auto, Apple CarPlay, Bluetooth, Digita... Android Auto, Apple CarPlay, Bluetooth, Digita... Bluetooth, Digital radio, Hands-free equipment... On-board computer, USB CD player, Hands-free equipment, On-board comp... ... Android Auto, Apple CarPlay, Digital cockpit, ... Bluetooth, Digital cockpit, Digital radio, Han... Bluetooth Bluetooth, CD player, Hands-free equipment, MP... Bluetooth, Hands-free equipment, MP3, On-board... Android Auto, Apple CarPlay, Bluetooth, Digita... Apple CarPlay, Bluetooth, CD player, Digital r... Android Auto, Apple CarPlay, Bluetooth, Digita... Android Auto, Apple CarPlay, Bluetooth, Digita... On-board computer Digital radio, Hands-free equipment, Radio Android Auto, Apple CarPlay, Bluetooth, Digita... Bluetooth, Digital cockpit, Digital radio, Han... Bluetooth, Hands-free equipment, MP3, On-board... Bluetooth, Hands-free equipment, MP3, On-board... Android Auto, Apple CarPlay, Digital cockpit, ... Android Auto, Apple CarPlay, Bluetooth, Digita... Bluetooth, Digital cockpit, Digital radio, Han... Apple CarPlay, Bluetooth, CD player, Digital r... Android Auto, Apple CarPlay, Bluetooth, CD pla... Android Auto, Apple CarPlay, Bluetooth, Digita... Radio, Sound system, USB Apple CarPlay, Bluetooth, Digital radio, Integ... Android Auto, Apple CarPlay, Bluetooth, Digita... CD player, Digital radio, On-board computer, R... Bluetooth, Hands-free equipment, MP3, On-board... Android Auto, Apple CarPlay, Bluetooth, Digita... Bluetooth, CD player, MP3, On-board computer, ... Android Auto, Apple CarPlay, Bluetooth, CD pla... Apple CarPlay, Bluetooth, Digital radio, On-bo... Android Auto, Apple CarPlay, Bluetooth, Digita... CD player, Radio On-board computer Android Auto, Apple CarPlay, Bluetooth, Digita... Bluetooth, CD player, Hands-free equipment, MP... Android Auto, Bluetooth, CD player, Digital ra...
equipment_extra Alloy wheels, Ambient lighting, Automatically ... Alloy wheels (21"), Ambient lighting, Automati... Alloy wheels, Automatically dimming interior m... Alloy wheels Alloy wheels (19"), Ambient lighting, Automati... Alloy wheels (18"), Ambient lighting, Automati... Shift paddles, Touch screen Ambient lighting, Automatically dimming interi... Alloy wheels, Ambient lighting, Automatically ... Alloy wheels, Automatically dimming interior m... All season tyres, Alloy wheels, Ambient lighti... Alloy wheels, Automatically dimming interior m... Alloy wheels, Automatically dimming interior m... All season tyres, Alloy wheels, Ambient lighti... Alloy wheels Alloy wheels (16"), Voice Control Alloy wheels, Automatically dimming interior m... All season tyres, Alloy wheels, Trailer hitch All season tyres, Alloy wheels (23"), Ambient ... Alloy wheels Alloy wheels, Ambient lighting, Automatically ... Alloy wheels, Automatically dimming interior m... Alloy wheels (17"), Sport seats, Sport suspension Alloy wheels, Ambient lighting, Sport seats Alloy wheels, Emergency tyre repair kit, Sport... Alloy wheels (17"), Automatically dimming inte... Alloy wheels, Cargo barrier, Electronic parkin... Alloy wheels, Sport seats, Sport suspension, V... Alloy wheels, Automatically dimming interior m... Alloy wheels, Ambient lighting, Automatically ... Alloy wheels (19"), Ambient lighting, Automati... Alloy wheels, Automatically dimming interior m... Alloy wheels, Ambient lighting, Automatically ... Alloy wheels, Automatically dimming interior m... ... Alloy wheels (20"), Ambient lighting, Automati... Alloy wheels (18"), Ambient lighting, Automati... Alloy wheels Alloy wheels, Particle filter, Sport seats Alloy wheels, Ambient lighting, Automatically ... Alloy wheels (20"), Automatically dimming inte... Alloy wheels, Automatically dimming interior m... Alloy wheels, Roof rack, Ski bag, Spoiler, Spo... Alloy wheels (19"), Ambient lighting, Automati... Alloy wheels (21"), Ambient lighting, Automati... Alloy wheels Alloy wheels (16") Alloy wheels, Ambient lighting, Automatically ... Alloy wheels (19"), Ambient lighting, Automati... Alloy wheels, Ambient lighting, Automatically ... Alloy wheels, Ambient lighting, Automatically ... Alloy wheels (20"), Ambient lighting, Automati... Alloy wheels, Ambient lighting, Automatically ... All season tyres, Alloy wheels, Automatically ... Alloy wheels, Roof rack, Shift paddles, Sport ... Alloy wheels (19"), Ambient lighting, Shift pa... Alloy wheels, Ambient lighting, Automatically ... Alloy wheels, Automatically dimming interior m... Touch screen Alloy wheels, Automatically dimming interior m... Alloy wheels, Ambient lighting, Automatically ... Alloy wheels (21"), Automatically dimming inte... Alloy wheels, Cargo barrier, Headlight washer ... Automatically dimming interior mirror, Ski bag... Alloy wheels (20"), Automatically dimming inte... Alloy wheels, Ambient lighting, Automatically ... Alloy wheels, Emergency tyre Alloy wheels, Automatically dimming interior m... Alloy wheels, Cargo barrier, Electronic parkin... Alloy wheels, Catalytic Converter, Emergency t...
equipment_safety ABS, Adaptive Cruise Control, Adaptive headlig... ABS, Adaptive Cruise Control, Adaptive headlig... ABS, Alarm system, Central door lock, Daytime ... ABS, Side airbag ABS, Adaptive Cruise Control, Alarm system, Bl... ABS, Adaptive headlights, Alarm system, Blind ... ABS, Central door lock, Central door lock with... Adaptive Cruise Control, Blind spot monitor, E... ABS, Adaptive Cruise Control, Adaptive headlig... ABS, Adaptive headlights, Bi-Xenon headlights,... ABS, Adaptive headlights, Bi-Xenon headlights,... ABS, Adaptive Cruise Control, Alarm system, Bi... Adaptive Cruise Control, Blind spot monitor, C... ABS, Central door lock, Driver-side airbag, El... Emergency system, Passenger-side airbag ABS, Alarm system, Central door lock, Central ... ABS, Alarm system, Central door lock, Daytime ... ABS, Alarm system, Central door lock, Daytime ... ABS, Adaptive Cruise Control, Alarm system, Bl... ABS, Central door lock, Driver-side airbag, El... ABS, Adaptive Cruise Control, Adaptive headlig... ABS, Central door lock, Driver-side airbag, El... ABS, Alarm system, Central door lock, Central ... Adaptive headlights, Alarm system, Central doo... Isofix ABS, Central door lock, Daytime running lights... ABS, Alarm system, Central door lock, Central ... ABS, Alarm system, Central door lock with remo... ABS, Adaptive headlights, Alarm system, Centra... ABS, Adaptive Cruise Control, Adaptive headlig... ABS, Blind spot monitor, Central door lock wit... ABS, Adaptive Cruise Control, Alarm system, Ce... ABS, Adaptive headlights, Central door lock, C... Central door lock, Central door lock with remo... ABS, Central door lock, Driver-side airbag, El... ... ABS, Adaptive Cruise Control, Blind spot monit... ABS, Adaptive headlights, Alarm system, Centra... Isofix ABS, Central door lock with remote control, Dr... ABS, Adaptive headlights, Alarm system, Bi-Xen... ABS, Adaptive headlights, Alarm system, Blind ... ABS, Adaptive headlights, Alarm system, Centra... ABS, Adaptive Cruise Control, Adaptive headlig... ABS, Adaptive Cruise Control, Adaptive headlig... ABS, Adaptive Cruise Control, Adaptive headlig... ABS, Central door lock, Driver-side airbag, Fo... ABS, Alarm system, Driver drowsiness detection... ABS, Adaptive Cruise Control, Adaptive headlig... ABS, Central door lock, Central door lock with... ABS, Alarm system, Blind spot monitor, Central... ABS, Central door lock, Daytime running lights... Adaptive Cruise Control, Blind spot monitor, D... ABS, Adaptive Cruise Control, Blind spot monit... ABS, Adaptive headlights, Alarm system, Centra... ABS, Central door lock, Daytime running lights... ABS, Adaptive headlights, Alarm system, Bi-Xen... ABS, Adaptive Cruise Control, Alarm system, Ce... ABS, Daytime running lights, Driver-side airba... ABS, Central door lock with remote control, Da... ABS, Alarm system, Central door lock with remo... ABS, Adaptive Cruise Control, Adaptive headlig... Alarm system, Central door lock, Central door ... ABS, Adaptive Cruise Control, Adaptive headlig... ABS, Central door lock with remote control, Dr... ABS, Alarm system, Central door lock, Driver-s... ABS, Alarm system, Central door lock, Central ... ABS, Adaptive Cruise Control, Adaptive headlig... ABS, Alarm system, Central door lock, Central ... ABS, Alarm system, Central door lock with remo... ABS, Adaptive Cruise Control, Adaptive headlig... ABS, Bi-Xenon headlights, Central door lock wi... ABS, Central door lock, Central door lock with...
price 36900.0 147990.0 26900.0 69900.0 42264.0 58945.0 33849.0 110995.0 51890.0 79971.0 65500.0 17490.0 39550.0 40900.0 30890.0 19455.0 18900.0 23900.0 56900.0 16990.0 21440.0 9950.0 505750.0 23142.0 92890.0 27910.0 3240.0 16490.0 31390.0 12980.0 41950.0 57500.0 18590.0 107990.0 20750.0 47570.0 35990.0 18490.0 19900.0 21300.0 ... 66800.0 25890.0 35273.0 14900.0 20000.0 64900.0 108880.0 89950.0 33945.0 169911.0 19900.0 38599.0 222999.0 33990.0 14400.0 24900.0 61990.0 70990.0 53900.0 18490.0 43890.0 48500.0 16750.0 17900.0 7000.0 69890.0 26500.0 52990.0 15280.0 15490.0 3500.0 108799.0 61900.0 16300.0 67890.0 1495.0 2500.0 34350.0 25999.0 1800.0

15 rows × 116 columns



.. GENERATED FROM PYTHON SOURCE LINES 71-77 .. code-block:: Python X = df[numerical_features + categorical_features + equipment_features] y = df[target_name] X.shape, y.shape, y.hist() .. image-sg:: /auto_examples/preprocessing/images/sphx_glr_plot_dummy_code_encoder_001.png :alt: plot dummy code encoder :srcset: /auto_examples/preprocessing/images/sphx_glr_plot_dummy_code_encoder_001.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-script-out .. code-block:: none ((116, 14), (116,), ) .. GENERATED FROM PYTHON SOURCE LINES 78-84 Training and Evaluating Pipelines with Different Encoders ========================================================= In this section, we will evaluate pipelines with :class:`~sklearn.ensemble.HistGradientBoostingRegressor` with different encoding strategies. First, we list out the encoders we will be using to preprocess the categorical features: .. GENERATED FROM PYTHON SOURCE LINES 84-112 .. code-block:: Python import re from sklearn.compose import ColumnTransformer # To use the experimental IterativeImputer, we need to explicitly ask for it: from sklearn.experimental import enable_iterative_imputer # noqa: F401 from sklearn.impute import IterativeImputer, KNNImputer, SimpleImputer from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, TargetEncoder from scikitplot.preprocessing import DummyCodeEncoder categorical_preprocessors = [ ("drop", "drop"), ( "ordinal", OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1), ), ( "one_hot", OneHotEncoder(handle_unknown="ignore", sparse_output=False), ), ( "target", TargetEncoder(target_type="continuous"), ), ( "dummy_code", DummyCodeEncoder(sep=lambda s: re.split(r'\s*[,;|/]\s*', s.lower()), sparse_output=False), ), ] .. GENERATED FROM PYTHON SOURCE LINES 113-114 Next, we evaluate the models using cross validation and record the results: .. GENERATED FROM PYTHON SOURCE LINES 114-160 .. code-block:: Python from sklearn.ensemble import HistGradientBoostingRegressor from sklearn.model_selection import cross_validate from sklearn.pipeline import make_pipeline n_cv_folds = 5 max_iter = 100 results = [] def evaluate_model_and_store(name, pipe): result = cross_validate( pipe, X, y, scoring="neg_root_mean_squared_error", cv=n_cv_folds, return_train_score=True, ) rmse_test_score = -result["test_score"] rmse_train_score = -result["train_score"] results.append( { "preprocessor": name, "rmse_test_mean": rmse_test_score.mean(), "rmse_test_std": rmse_train_score.std(), "rmse_train_mean": rmse_train_score.mean(), "rmse_train_std": rmse_train_score.std(), } ) for name, categorical_preprocessor in categorical_preprocessors: preprocessor = ColumnTransformer( [ ("numerical", "passthrough", numerical_features), ("categorical", categorical_preprocessor, categorical_features + equipment_features), ], verbose_feature_names_out = False, )#.set_output(transform="pandas") pipe = make_pipeline( preprocessor, HistGradientBoostingRegressor(random_state=0, max_iter=max_iter) )#.set_output(transform="pandas") # display(pipe) evaluate_model_and_store(name, pipe) .. GENERATED FROM PYTHON SOURCE LINES 161-167 Native Categorical Feature Support ================================== In this section, we build and evaluate a pipeline that uses native categorical feature support in :class:`~sklearn.ensemble.HistGradientBoostingRegressor`, which only supports up to 255 unique categories. In our dataset, the most of the categorical features have more than 255 unique categories: .. GENERATED FROM PYTHON SOURCE LINES 167-170 .. code-block:: Python n_unique_categories = df[categorical_features + equipment_features].nunique().sort_values(ascending=False) n_unique_categories .. rst-class:: sphx-glr-script-out .. code-block:: none equipment_safety 103 equipment_comfort 103 equipment_extra 94 equipment_entertainment 84 model 70 make 9 body_type 8 fuel_category 6 dtype: int64 .. GENERATED FROM PYTHON SOURCE LINES 171-175 To workaround the limitation above, we group the categorical features into low cardinality and high cardinality features. The high cardinality features will be target encoded and the low cardinality features will use the native categorical feature in gradient boosting. .. GENERATED FROM PYTHON SOURCE LINES 175-205 .. code-block:: Python high_cardinality_features = n_unique_categories[n_unique_categories > 25].index low_cardinality_features = n_unique_categories[n_unique_categories <= 25].index mixed_encoded_preprocessor = ColumnTransformer( [ ("numerical", "passthrough", numerical_features), ( "high_cardinality", TargetEncoder(target_type="continuous"), high_cardinality_features, ), ( "low_cardinality", OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1), low_cardinality_features, ), ], verbose_feature_names_out=False, ) # The output of the of the preprocessor must be set to pandas so the # gradient boosting model can detect the low cardinality features. mixed_encoded_preprocessor.set_output(transform="pandas") mixed_pipe = make_pipeline( mixed_encoded_preprocessor, HistGradientBoostingRegressor( random_state=0, max_iter=max_iter, categorical_features=low_cardinality_features ), ) mixed_pipe .. raw:: html
Pipeline(steps=[('columntransformer',
                     ColumnTransformer(transformers=[('numerical', 'passthrough',
                                                      ['seller_is_dealer',
                                                       'mileage_km_raw', 'power_kw',
                                                       'power_hp', 'latitude',
                                                       'longitude']),
                                                     ('high_cardinality',
                                                      TargetEncoder(target_type='continuous'),
                                                      Index(['equipment_safety', 'equipment_comfort', 'equipment_extra',
           'equipment_entertainment', 'model'],
          dtype='object')),
                                                     ('low_cardinality',
                                                      OrdinalEncoder(handle_unknown='use_encoded_value',
                                                                     unknown_value=-1),
                                                      Index(['make', 'body_type', 'fuel_category'], dtype='object'))],
                                       verbose_feature_names_out=False)),
                    ('histgradientboostingregressor',
                     HistGradientBoostingRegressor(categorical_features=Index(['make', 'body_type', 'fuel_category'], dtype='object'),
                                                   random_state=0))])
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.


.. GENERATED FROM PYTHON SOURCE LINES 206-207 Finally, we evaluate the pipeline using cross validation and record the results: .. GENERATED FROM PYTHON SOURCE LINES 207-209 .. code-block:: Python evaluate_model_and_store("mixed_target", mixed_pipe) .. GENERATED FROM PYTHON SOURCE LINES 210-244 .. code-block:: Python mixed_encoded_preprocessor = ColumnTransformer( [ ("numerical", "passthrough", numerical_features), ( "high_cardinality", TargetEncoder(target_type="continuous"), list(set(high_cardinality_features) - set(equipment_features)), ), ( "low_cardinality", OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1), low_cardinality_features, ), ( "equipment", DummyCodeEncoder(sep=lambda s: re.split(r'\s*[,;|/]\s*', s.lower()), sparse_output=False), equipment_features, ), ], verbose_feature_names_out=False, ) # The output of the of the preprocessor must be set to pandas so the # gradient boosting model can detect the low cardinality features. mixed_encoded_preprocessor.set_output(transform="pandas") mixed_pipe = make_pipeline( mixed_encoded_preprocessor, HistGradientBoostingRegressor( random_state=0, max_iter=max_iter, ), ) mixed_pipe .. raw:: html
Pipeline(steps=[('columntransformer',
                     ColumnTransformer(transformers=[('numerical', 'passthrough',
                                                      ['seller_is_dealer',
                                                       'mileage_km_raw', 'power_kw',
                                                       'power_hp', 'latitude',
                                                       'longitude']),
                                                     ('high_cardinality',
                                                      TargetEncoder(target_type='continuous'),
                                                      ['model']),
                                                     ('low_cardinality',
                                                      OrdinalEncoder(handle_unknown='use_encoded_value',
                                                                     unknown_value=-1),
                                                      Index(['make', 'body_type', 'fuel_category'], dtype='object')),
                                                     ('equipment',
                                                      DummyCodeEncoder(sep=<function <lambda> at 0x7f5ea35d2e80>,
                                                                       sparse_output=False),
                                                      ['equipment_comfort',
                                                       'equipment_entertainment',
                                                       'equipment_extra',
                                                       'equipment_safety'])],
                                       verbose_feature_names_out=False)),
                    ('histgradientboostingregressor',
                     HistGradientBoostingRegressor(random_state=0))])
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.


.. GENERATED FROM PYTHON SOURCE LINES 245-246 Finally, we evaluate the pipeline using cross validation and record the results: .. GENERATED FROM PYTHON SOURCE LINES 246-248 .. code-block:: Python evaluate_model_and_store("mixed_dummy", mixed_pipe) .. GENERATED FROM PYTHON SOURCE LINES 249-252 Plotting the Results ==================== In this section, we display the results by plotting the test and train scores: .. GENERATED FROM PYTHON SOURCE LINES 252-293 .. code-block:: Python import matplotlib.pyplot as plt import pandas as pd results_df = ( pd.DataFrame(results).set_index("preprocessor").sort_values("rmse_test_mean") ) fig, (ax1, ax2) = plt.subplots( 1, 2, figsize=(12, 8), sharey=True, constrained_layout=True ) xticks = range(len(results_df)) name_to_color = dict( zip((r["preprocessor"] for r in results), ["C0", "C1", "C2", "C3", "C4", "C5", "C6"]) ) for subset, ax in zip(["test", "train"], [ax1, ax2]): mean, std = f"rmse_{subset}_mean", f"rmse_{subset}_std" data = results_df[[mean, std]].sort_values(mean) ax.bar( x=xticks, height=data[mean], yerr=data[std], width=0.9, color=[name_to_color[name] for name in data.index], ) ax.set( title=f"RMSE ({subset.title()})", xlabel="Encoding Scheme", xticks=xticks, xticklabels=data.index, ) # plt.xticks(rotation=9, ha='right') # ax.set_xticks(ax.get_xticks(), ax.get_xticklabels(), rotation=9, ha='right') ax.tick_params(axis='x', labelrotation=9) # iterate through every other container; the even containers are ErrorbarContainer # The BarContainer objects are at the odd indices, which can be extracted with ax.containers[1::2] # The BarContainer objects are at the even indices, which can be extracted with ax.containers[0::2] for c in ax.containers[1::2]: # add the annotation ax.bar_label(c, label_type='center') .. image-sg:: /auto_examples/preprocessing/images/sphx_glr_plot_dummy_code_encoder_002.png :alt: RMSE (Test), RMSE (Train) :srcset: /auto_examples/preprocessing/images/sphx_glr_plot_dummy_code_encoder_002.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 294-318 When evaluating the predictive performance on the test set, dropping the categories perform the worst and the target encoders performs the best. This can be explained as follows: - Dropping the categorical features makes the pipeline less expressive and underfitting as a result; - Due to the high cardinality and to reduce the training time, the one-hot encoding scheme uses `max_categories=20` which prevents the features from expanding too much, which can result in underfitting. - If we had not set `max_categories=20`, the one-hot encoding scheme would have likely made the pipeline overfitting as the number of features explodes with rare category occurrences that are correlated with the target by chance (on the training set only); - The ordinal encoding imposes an arbitrary order to the features which are then treated as numerical values by the :class:`~sklearn.ensemble.HistGradientBoostingRegressor`. Since this model groups numerical features in 256 bins per feature, many unrelated categories can be grouped together and as a result overall pipeline can underfit; - When using the target encoder, the same binning happens, but since the encoded values are statistically ordered by marginal association with the target variable, the binning use by the :class:`~sklearn.ensemble.HistGradientBoostingRegressor` makes sense and leads to good results: the combination of smoothed target encoding and binning works as a good regularizing strategy against overfitting while not limiting the expressiveness of the pipeline too much. .. rst-class:: sphx-glr-timing **Total running time of the script:** (0 minutes 4.148 seconds) .. _sphx_glr_download_auto_examples_preprocessing_plot_dummy_code_encoder.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: binder-badge .. image:: images/binder_badge_logo.svg :target: https://mybinder.org/v2/gh/scikit-plots/scikit-plots/main?urlpath=lab/tree/notebooks/auto_examples/preprocessing/plot_dummy_code_encoder.ipynb :alt: Launch binder :width: 150 px .. container:: lite-badge .. image:: images/jupyterlite_badge_logo.svg :target: ../../lite/lab/index.html?path=auto_examples/preprocessing/plot_dummy_code_encoder.ipynb :alt: Launch JupyterLite :width: 150 px .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: plot_dummy_code_encoder.ipynb ` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: plot_dummy_code_encoder.py ` .. container:: sphx-glr-download sphx-glr-download-zip :download:`Download zipped: plot_dummy_code_encoder.zip ` .. include:: plot_dummy_code_encoder.recommendations .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_