Input Formats
Vowpal Wabbit text
This is the native text input format used by Vowpal Wabbit. See the Vowpal Wabbit wiki for details on how to use it.
DsJson
This is the DsJson format supported by Vowpal Wabbit. It is provided for compatibility with Vowpal Wabbit.
Json format
This is a newline-delimited JSON (ndjson) format. Each line of input is one example passed to the reduction stack, so each complete JSON input object must be flattened onto a single line.
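As a sketch of what flattening to a single line means in practice, the Python below serializes example objects with the standard `json` module, which emits no raw newlines unless `indent` is set, and reads them back one line at a time. The helpers `to_ndjson` and `read_ndjson` are hypothetical names for illustration, not part of this library.

```python
import io
import json
from typing import Any, Iterable, Iterator


def to_ndjson(examples: Iterable[dict[str, Any]]) -> str:
    """Serialize each example as one JSON line (hypothetical helper)."""
    # json.dumps without `indent` never emits a raw newline: newlines inside
    # string values are escaped, so each object stays on one line.
    return "".join(json.dumps(ex) + "\n" for ex in examples)


def read_ndjson(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Yield one parsed example per non-blank line (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines, such as a trailing empty line
            yield json.loads(line)


examples = [
    {"features": {"my_namespace": {"feature1": 1.0}}},
    # A string feature value containing a newline still serializes to one line.
    {"features": {"my_namespace2": [1.0, 2.0], "notes": {"text": "two\nlines"}}},
]
text = to_ndjson(examples)
parsed = list(read_ndjson(io.StringIO(text)))
```

Pretty-printed JSON (for example, output of `json.dumps(..., indent=2)`) would span several lines and therefore could not be used directly as input in this format.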
The exact format depends on the type of input expected (features and label), which is determined by the reductions in use.
In each format, the `label` property is optional. If it is not provided, the input can be used only for prediction, not for training.
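Because `label` is optional in every format, a stream of examples may mix trainable and prediction-only inputs. The Python sketch below partitions ndjson lines on the presence of a `label` property; `split_by_label` is a hypothetical helper, and it treats an example as trainable whenever the `label` key is present, which is an assumption about how an explicit `null` label would be handled.

```python
import json
from typing import Any


def split_by_label(lines: list[str]) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Partition ndjson lines into (trainable, prediction_only) examples.

    Hypothetical helper: an example counts as trainable only if it has a
    "label" property; otherwise it can be used for prediction alone.
    """
    trainable: list[dict[str, Any]] = []
    prediction_only: list[dict[str, Any]] = []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        example = json.loads(line)
        if "label" in example:
            trainable.append(example)
        else:
            prediction_only.append(example)
    return trainable, prediction_only


lines = [
    '{"label": {"value": 1.0}, "features": {"ns": [1.0, 2.0]}}',
    '{"features": {"ns": [3.0]}}',
]
train, predict = split_by_label(lines)
```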
Feature format
Each format uses the same feature definition structure. It is an object where the keys are namespace names and the values are the contents of each namespace.
Namespace contents can be one of the following:
- List of floats
- List of strings
- Object where the keys are feature names and the values are one of the following:
  - Float
  - String
  - Bool
Example
```json
{
  "my_namespace": {
    "feature1": 1.0,
    "feature2": "value",
    "feature3": true
  },
  "my_namespace2": [1.0, 2.0, 3.0],
  "my_namespace3": ["value1", "value2", "value3"]
}
```
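The namespace rules above can be checked mechanically. The following is a minimal Python sketch of such a check, written only from the rules listed in this section; `validate_features` is a hypothetical name, not a function provided by the library. It also accepts JSON integers wherever a float is expected, which is an assumption about how the library treats them.

```python
from typing import Any


def validate_features(features: Any) -> None:
    """Raise ValueError if `features` does not match the feature format.

    Hypothetical checker: an object whose values are a list of floats,
    a list of strings, or an object mapping feature names to a float,
    string, or bool.
    """
    if not isinstance(features, dict):
        raise ValueError("features must be an object keyed by namespace")
    for ns, contents in features.items():
        if isinstance(contents, list):
            # bool is a subclass of int in Python, so exclude it explicitly.
            all_num = all(
                isinstance(v, (int, float)) and not isinstance(v, bool) for v in contents
            )
            all_str = all(isinstance(v, str) for v in contents)
            if not (all_num or all_str):
                raise ValueError(f"namespace {ns!r}: list must be all floats or all strings")
        elif isinstance(contents, dict):
            for name, value in contents.items():
                if not isinstance(value, (int, float, str, bool)):
                    raise ValueError(f"namespace {ns!r}: feature {name!r} has an unsupported type")
        else:
            raise ValueError(f"namespace {ns!r}: contents must be a list or an object")


# The example above, as the equivalent Python value; this passes silently.
example = {
    "my_namespace": {"feature1": 1.0, "feature2": "value", "feature3": True},
    "my_namespace2": [1.0, 2.0, 3.0],
    "my_namespace3": ["value1", "value2", "value3"],
}
validate_features(example)
```

Mixing floats and strings in one list, or nesting objects inside a feature value, is rejected by this sketch because neither appears among the allowed namespace contents.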
Note: the namespace name `:default` can be used to refer to the default namespace.
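Variants
There are currently two variants. Both use the feature format described above. In the schemas that follow, quoted type names such as "float" stand for a value of that type, and "<feature format>" stands for an object in the feature format.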
Simple
Simple input is for regression scenarios.
This format produces a `SimpleLabel` and `SparseFeatures`.
```json
{
  "label": {
    "value": "float",
    "weight": "float"
  },
  "features": "<feature format>"
}
```
If the `weight` property is not provided, it defaults to 1.0.
`label` is optional. If it is not provided, the input can be used only for prediction, not for training.
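A minimal Python sketch of reading one Simple example and applying the documented `weight` default. `SimpleLabelSketch` and `parse_simple` are hypothetical names used only for illustration; the library's actual `SimpleLabel` type is not shown here, and treating a missing `features` property as an error is an assumption.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SimpleLabelSketch:
    """Illustrative stand-in for a Simple label; not the library's SimpleLabel."""
    value: float
    weight: float = 1.0  # documented default when "weight" is omitted


def parse_simple(line: str) -> tuple[Optional[SimpleLabelSketch], dict[str, Any]]:
    """Parse one Simple-format ndjson line into (label or None, features)."""
    example = json.loads(line)
    raw_label = example.get("label")
    label = None
    if raw_label is not None:
        label = SimpleLabelSketch(
            value=float(raw_label["value"]),
            weight=float(raw_label.get("weight", 1.0)),
        )
    # Assumption: "features" is required; a missing key raises KeyError.
    return label, example["features"]


label, features = parse_simple('{"label": {"value": 2.5}, "features": {"ns": [1.0]}}')
print(label)  # SimpleLabelSketch(value=2.5, weight=1.0)
```

A line without `label` returns `None` for the label, which corresponds to a prediction-only input.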
CB
CB input is for contextual bandit scenarios.
This format produces a `CBLabel` and `CBAdfFeatures`.
```json
{
  "label": {
    "action": "integer",
    "cost": "float",
    "probability": "float"
  },
  "shared": "<feature format>",
  "actions": [
    "<feature format>",
    "..."
  ]
}
```
`action` in the label is 0-indexed: it is the position of the chosen action within the `actions` list.
`label` is optional. If it is not provided, the input can be used only for prediction, not for training.
`shared` is optional. If it is not provided, no shared features are used.
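The Python sketch below assembles a single CB line and checks that the 0-indexed `action` points inside `actions`. `make_cb_line` is a hypothetical helper; the label values in the usage example are arbitrary placeholders, and the range check is a convenience added here rather than documented library behavior.

```python
import json
from typing import Any, Optional


def make_cb_line(
    actions: list[dict[str, Any]],
    shared: Optional[dict[str, Any]] = None,
    label: Optional[tuple[int, float, float]] = None,
) -> str:
    """Build one CB-format ndjson line (hypothetical helper).

    `label` is (action, cost, probability), where `action` is a 0-based
    index into `actions`. Omit it to produce a prediction-only example.
    """
    example: dict[str, Any] = {}
    if label is not None:
        action, cost, probability = label
        if not 0 <= action < len(actions):
            raise IndexError(f"action {action} is out of range for {len(actions)} actions")
        example["label"] = {"action": action, "cost": cost, "probability": probability}
    if shared is not None:  # "shared" is optional
        example["shared"] = shared
    example["actions"] = actions
    return json.dumps(example)


line = make_cb_line(
    actions=[{"topic": ["sports"]}, {"topic": ["politics"]}],
    shared={"user": {"age": 30.0}},
    label=(1, 0.0, 0.5),  # index 1 is the second action, since indexing starts at 0
)
```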