Regression data set for predicting house sale prices in King County (which includes Seattle), covering homes sold between May 2014 and May 2015.

Contains 21613 observations of 19 features plus the target column "price".

Source

https://www.kaggle.com/harlfoxem/housesalesprediction

Pre-processing

  • Id column has been removed.

  • Dates in column "date" have been converted from strings to POSIXct.

  • Values 0 in feature "yr_renovated" have been replaced with NA.

  • Values 0 in feature "sqft_basement" have been replaced with NA.

  • Feature "waterfront" has been converted to logical.
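
The steps above can be sketched in base R roughly as follows. This is only an illustration, not the script used to build the packaged data: the data frame `raw` is a hypothetical three-row stand-in for the Kaggle CSV, and the raw date layout ("20141013T000000") and the UTC time zone are assumptions.

```r
# Hypothetical stand-in for the raw Kaggle CSV; only the columns touched by the
# pre-processing are included, and the values are illustrative.
raw <- data.frame(
  id            = c(1, 2, 3),
  date          = c("20141013T000000", "20141209T000000", "20150225T000000"),
  price         = c(221900, 538000, 180000),
  yr_renovated  = c(0L, 1991L, 0L),
  sqft_basement = c(0L, 400L, 0L),
  waterfront    = c(0L, 0L, 1L),
  stringsAsFactors = FALSE
)

kc <- raw

# Remove the id column.
kc$id <- NULL

# Convert date strings to POSIXct (format string and time zone are assumptions).
kc$date <- as.POSIXct(kc$date, format = "%Y%m%dT%H%M%S", tz = "UTC")

# Replace 0 with NA in "yr_renovated" and "sqft_basement".
kc$yr_renovated[kc$yr_renovated == 0L] <- NA
kc$sqft_basement[kc$sqft_basement == 0L] <- NA

# Convert the 0/1 waterfront indicator to logical.
kc$waterfront <- as.logical(kc$waterfront)

str(kc)
```

Because the zeros are recoded as NA, "yr_renovated" and "sqft_basement" contain missing values that a learner without native NA support would need to have imputed or dropped first.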

Examples

data("kc_housing", package = "mlr3data")
str(kc_housing)
#> 'data.frame': 21613 obs. of 20 variables:
#>  $ date         : POSIXct, format: "2014-10-13" "2014-12-09" ...
#>  $ price        : num 221900 538000 180000 604000 510000 ...
#>  $ bedrooms     : int 3 3 2 4 3 4 3 3 3 3 ...
#>  $ bathrooms    : num 1 2.25 1 3 2 4.5 2.25 1.5 1 2.5 ...
#>  $ sqft_living  : int 1180 2570 770 1960 1680 5420 1715 1060 1780 1890 ...
#>  $ sqft_lot     : int 5650 7242 10000 5000 8080 101930 6819 9711 7470 6560 ...
#>  $ floors       : num 1 2 1 1 1 1 2 1 1 2 ...
#>  $ waterfront   : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
#>  $ view         : int 0 0 0 0 0 0 0 0 0 0 ...
#>  $ condition    : int 3 3 3 5 3 3 3 3 3 3 ...
#>  $ grade        : int 7 7 6 7 8 11 7 7 7 7 ...
#>  $ sqft_above   : int 1180 2170 770 1050 1680 3890 1715 1060 1050 1890 ...
#>  $ sqft_basement: int NA 400 NA 910 NA 1530 NA NA 730 NA ...
#>  $ yr_built     : int 1955 1951 1933 1965 1987 2001 1995 1963 1960 2003 ...
#>  $ yr_renovated : int NA 1991 NA NA NA NA NA NA NA NA ...
#>  $ zipcode      : int 98178 98125 98028 98136 98074 98053 98003 98198 98146 98038 ...
#>  $ lat          : num 47.5 47.7 47.7 47.5 47.6 ...
#>  $ long         : num -122 -122 -122 -122 -122 ...
#>  $ sqft_living15: int 1340 1690 2720 1360 1800 4760 2238 1650 1780 2390 ...
#>  $ sqft_lot15   : int 5650 7639 8062 5000 7503 101930 6819 9711 8113 7570 ...
#>  - attr(*, "index")= int(0)