Input Formats¶
Vowpal Wabbit text
This is the text format that Vowpal Wabbit uses. See the docs on the VW wiki to learn how to use it.
DsJson
This is the DsJson format that is offered by Vowpal Wabbit. This is provided for compatibility.
Json format¶
This is a newline delimited (ndjson) format. Each line of input is one example to be provided to the reduction stack. This means each complete json input object json needs to be flattened to a single line.
The exact format to be used depends on the type of input being provided (features and label), which is determined based on the reductions being used.
In each format the label property is optional. If not provided, the input can only be used for predictions and not training.
Feature format¶
Each format uses the same feature definition structure. It is an object where the keys are namespaces and the values are the contents of each namespace.
Namespace contents can be one of the following:
List of floats
List of strings
Object where the keys are feature names and the values are one of the following:
Float
String
Bool
Example¶
{
"my_namespace": {
"feature1": 1.0,
"feature2": "value",
"feature3": true
},
"my_namespace2": [1.0, 2.0, 3.0],
"my_namespace3": ["value1", "value2", "value3"]
}
Note: the namespace name :default can be used to correspond to the Default namespace.
Variants¶
Currently there are two variants which share common structures.
Simple¶
Simple input is for regression scenarios.
This format produces a SimpleLabel and SparseFeatures.
{
"label": {
"value": "float",
"weight": "float"
},
"features": "<feature format>"
}
If the
weightproperty is not provided, it is assumed to be 1.0.labelis optional. If not provided, the input can only be used for predictions and not training.
CB¶
CB input is for contextual bandit scenarios.
This format produces a CBLabel and CBAdfFeatures.
{
"label": {
"action": "integer",
"cost": "float",
"probability": "float"
},
"shared": "<feature format>",
"actions": [
"<feature format>",
"..."
]
}
actionin the label is 0 indexed.labelis optional. If not provided, the input can only be used for predictions and not training.sharedis optional. If not provided, there are no shared features to be used.