Classification data to predict the fate of passengers on the ocean liner "Titanic".
Contains 10 features and 1309 observations. The target column is "survived".
Pre-processing
All column names have been changed to snake_case.
The training and test sets have been joined. Observations of the test set have a missing value in the target column "survived".
Column "survived" has been re-encoded to a factor with levels "yes" and "no".
Id column has been removed.
Passenger class "pclass" has been converted to an ordered factor.
Features "sex" and "embarked" have been converted to factors.
Empty strings in "cabin" and "embarked" have been encoded as missing values.
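The pre-processing steps listed above can be sketched in base R roughly as follows. This is only an illustration, not the package's actual preparation script: the data frame `raw`, its values, and its capitalized Kaggle-style column names are hypothetical, and the 0/1 coding of the raw target is an assumption.

```r
# Hypothetical raw rows; names and values are assumptions for illustration.
raw = data.frame(
  PassengerId = 1:3,
  Survived = c(0L, 1L, NA),   # NA marks a row that came from the test set
  Pclass = c(3L, 1L, 2L),
  Sex = c("male", "female", "female"),
  Cabin = c("", "C85", ""),
  Embarked = c("S", "C", ""),
  stringsAsFactors = FALSE
)

# Lower-case the names. This is enough for the toy columns here; a name
# such as "SibSp" would need extra handling to become "sib_sp".
names(raw) = tolower(names(raw))

# Remove the id column.
raw$passengerid = NULL

# Re-encode the target as a factor with levels "yes" and "no" (in that order).
raw$survived = factor(raw$survived, levels = c(1L, 0L), labels = c("yes", "no"))

# Passenger class becomes an ordered factor: "1" < "2" < "3".
raw$pclass = factor(raw$pclass, levels = 1:3, ordered = TRUE)

# Encode empty strings as missing values before converting to factors.
raw$cabin[raw$cabin == ""] = NA
raw$embarked[raw$embarked == ""] = NA

# Convert the remaining categorical features to factors.
raw$sex = factor(raw$sex)
raw$embarked = factor(raw$embarked, levels = c("C", "Q", "S"))

str(raw)
```

Replacing the empty strings before calling `factor()` keeps `""` from becoming a factor level of its own.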
Examples
data("titanic", package = "mlr3data")
str(titanic)
#> 'data.frame': 1309 obs. of 11 variables:
#> $ survived: Factor w/ 2 levels "yes","no": 2 1 1 1 2 2 2 2 1 1 ...
#> $ pclass : Ord.factor w/ 3 levels "1"<"2"<"3": 3 1 3 1 3 3 1 3 3 2 ...
#> $ name : chr "Braund, Mr. Owen Harris" "Cumings, Mrs. John Bradley (Florence Briggs Thayer)" "Heikkinen, Miss. Laina" "Futrelle, Mrs. Jacques Heath (Lily May Peel)" ...
#> $ sex : Factor w/ 2 levels "female","male": 2 1 1 1 2 2 2 2 1 1 ...
#> $ age : num 22 38 26 35 35 NA 54 2 27 14 ...
#> $ sib_sp : int 1 1 0 1 0 0 0 3 0 1 ...
#> $ parch : int 0 0 0 0 0 0 0 1 2 0 ...
#> $ ticket : chr "A/5 21171" "PC 17599" "STON/O2. 3101282" "113803" ...
#> $ fare : num 7.25 71.28 7.92 53.1 8.05 ...
#> $ cabin : chr NA "C85" NA "C123" ...
#> $ embarked: Factor w/ 3 levels "C","Q","S": 3 1 3 3 3 2 3 3 3 1 ...
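Because observations from the test set carry a missing value in "survived", the joined data can be separated again by checking the target for NA. The following is a minimal sketch that assumes the mlr3data package is installed; the variable names `is_test`, `train`, and `test` are chosen here for illustration.

```r
data("titanic", package = "mlr3data")

# Rows from the original test set have no recorded outcome.
is_test = is.na(titanic$survived)

train = titanic[!is_test, ]  # labelled observations
test = titanic[is_test, ]    # observations with a missing target

# The two parts together cover every row of the joined data.
nrow(train) + nrow(test) == nrow(titanic)
```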