
Regression task to predict the sale prices of houses in King County, Washington (including Seattle), sold between May 2014 and May 2015.

Contains 19 features and 21613 observations. The target column is "price", so the data frame has 20 columns in total.
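A minimal base-R sketch of separating the target from the features, assuming the data are held in a data frame shaped like kc_housing. The helper split_target() is hypothetical (it is not part of mlr3data), and the toy data frame only mimics three of the real columns, using values from the first three rows of the str() output in the Examples.

```r
# Hypothetical helper (not part of mlr3data): split a data frame into
# a feature data frame, a target vector, and a model formula.
split_target <- function(df, target = "price") {
  stopifnot(target %in% names(df))
  features <- setdiff(names(df), target)
  list(
    X = df[, features, drop = FALSE],
    y = df[[target]],
    formula = reformulate(features, response = target)
  )
}

# Toy data frame mimicking three kc_housing columns
# (values copied from the first three rows of the str() output)
toy <- data.frame(
  price       = c(221900, 538000, 180000),
  bedrooms    = c(3L, 3L, 2L),
  sqft_living = c(1180L, 2570L, 770L)
)

parts <- split_target(toy)
ncol(parts$X)             # 2 feature columns
deparse(parts$formula)    # "price ~ bedrooms + sqft_living"
```

On the full data the same call would yield 19 feature columns, since "price" is the only non-feature column.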

Pre-processing

  • The id column has been removed.

  • Dates in column "date" have been converted from strings to POSIXct.

  • Values of 0 in feature "yr_renovated" have been replaced with NA.

  • Values of 0 in feature "sqft_basement" have been replaced with NA.

  • Feature "waterfront" has been converted to logical.
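The pre-processing steps above can be sketched in base R as follows. This is a hypothetical reconstruction, not the code mlr3data actually uses: the helper preprocess_kc(), the raw encodings (ISO date strings, "waterfront" stored as 0/1), the UTC time zone, and the id values are all assumptions. The remaining values come from the first two rows of the str() output in the Examples.

```r
# Hypothetical helper (not part of mlr3data) applying the listed steps
preprocess_kc <- function(raw) {
  df <- raw
  df$id <- NULL                                    # drop the id column
  df$date <- as.POSIXct(df$date, format = "%Y-%m-%d", tz = "UTC")  # string -> POSIXct (tz assumed)
  df$yr_renovated[which(df$yr_renovated == 0)] <- NA    # 0 -> NA
  df$sqft_basement[which(df$sqft_basement == 0)] <- NA  # 0 -> NA
  df$waterfront <- as.logical(df$waterfront)       # 0/1 -> FALSE/TRUE (raw coding assumed)
  df
}

# Hypothetical raw input; ids are made up, other values follow the str() output
raw <- data.frame(
  id            = c(1L, 2L),
  date          = c("2014-10-13", "2014-12-09"),
  price         = c(221900, 538000),
  waterfront    = c(0L, 0L),
  sqft_basement = c(0L, 400L),
  yr_renovated  = c(0L, 1991L),
  stringsAsFactors = FALSE
)

kc <- preprocess_kc(raw)
str(kc)
```

which() drops NA comparisons, so the replacement only touches entries that are exactly 0.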

Examples

data("kc_housing", package = "mlr3data")
str(kc_housing)
#> 'data.frame':	21613 obs. of  20 variables:
#>  $ date         : POSIXct, format: "2014-10-13" "2014-12-09" ...
#>  $ price        : num  221900 538000 180000 604000 510000 ...
#>  $ bedrooms     : int  3 3 2 4 3 4 3 3 3 3 ...
#>  $ bathrooms    : num  1 2.25 1 3 2 4.5 2.25 1.5 1 2.5 ...
#>  $ sqft_living  : int  1180 2570 770 1960 1680 5420 1715 1060 1780 1890 ...
#>  $ sqft_lot     : int  5650 7242 10000 5000 8080 101930 6819 9711 7470 6560 ...
#>  $ floors       : num  1 2 1 1 1 1 2 1 1 2 ...
#>  $ waterfront   : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
#>  $ view         : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ condition    : int  3 3 3 5 3 3 3 3 3 3 ...
#>  $ grade        : int  7 7 6 7 8 11 7 7 7 7 ...
#>  $ sqft_above   : int  1180 2170 770 1050 1680 3890 1715 1060 1050 1890 ...
#>  $ sqft_basement: int  NA 400 NA 910 NA 1530 NA NA 730 NA ...
#>  $ yr_built     : int  1955 1951 1933 1965 1987 2001 1995 1963 1960 2003 ...
#>  $ yr_renovated : int  NA 1991 NA NA NA NA NA NA NA NA ...
#>  $ zipcode      : int  98178 98125 98028 98136 98074 98053 98003 98198 98146 98038 ...
#>  $ lat          : num  47.5 47.7 47.7 47.5 47.6 ...
#>  $ long         : num  -122 -122 -122 -122 -122 ...
#>  $ sqft_living15: int  1340 1690 2720 1360 1800 4760 2238 1650 1780 2390 ...
#>  $ sqft_lot15   : int  5650 7639 8062 5000 7503 101930 6819 9711 8113 7570 ...
#>  - attr(*, "index")= int(0)
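In the ten rows printed above, sqft_living equals sqft_above + sqft_basement whenever sqft_basement is present, and equals sqft_above alone where it is NA. That is consistent with NA meaning "no basement". The sketch below checks this on those rows and derives two indicator columns that may help learners that cannot handle missing values. The names has_basement and was_renovated are hypothetical, and the identity has only been checked on the rows shown here, not on the full data.

```r
# Values copied from the first ten rows of the str() output above
sqft_living   <- c(1180L, 2570L, 770L, 1960L, 1680L, 5420L, 1715L, 1060L, 1780L, 1890L)
sqft_above    <- c(1180L, 2170L, 770L, 1050L, 1680L, 3890L, 1715L, 1060L, 1050L, 1890L)
sqft_basement <- c(NA, 400L, NA, 910L, NA, 1530L, NA, NA, 730L, NA)
yr_renovated  <- c(NA, 1991L, NA, NA, NA, NA, NA, NA, NA, NA)

# Treat NA as a basement of 0 sq ft and check the area identity
basement0 <- ifelse(is.na(sqft_basement), 0L, sqft_basement)
all(sqft_living == sqft_above + basement0)   # TRUE for these rows

# Hypothetical indicator features replacing the NA-coded information
has_basement  <- !is.na(sqft_basement)
was_renovated <- !is.na(yr_renovated)
sum(has_basement)    # 4
sum(was_renovated)   # 1
```

With mlr3data installed, the same identity could be tested on all rows with with(kc_housing, all(sqft_living == sqft_above + ifelse(is.na(sqft_basement), 0L, sqft_basement))); that result is not shown here.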