Data2Vis - Automatic Generation of Data Visualizations Using Sequence-to-Sequence Recurrent Neural Networks
These are my notes from reading Data2Vis: Automatic Generation of Data Visualizations Using Sequence-to-Sequence Recurrent Neural Networks.
Abstract
- end-to-end trainable neural translation model
- formulate visualization generation as a language translation problem, where data specifications are mapped to visualization specifications in a declarative language (Vega-Lite).
- Vega-Lite specifications are written in JSON format
- multilayered attention-based encoder-decoder network with LSTM units
- introduces two metrics: language syntax validity and visualization grammar syntax validity
Related Work
Declarative Visualization Specification
One of our aims with Data2Vis is to bridge the gap between speed and expressivity in specifying visualizations.
Automated Visualization
We pose visualization specification as a machine translation problem and introduce Data2Vis, a deep neural translation model trained to automatically translate data specifications to visualization specifications. Data2Vis emphasizes the creation of visualizations using rules learned from examples, without resorting to a predefined enumeration or extraction of constraints, rules, heuristics, and features.
- Machine Translation Problem
DNNs for Machine Translation
Data2Vis is also a sequence-to-sequence model using the textual source and target specifications directly for translation, without relying on explicit syntax representations.
Model
- the data visualization problem as a Seq2Seq translation problem
- input: a dataset (fields and values in JSON format)
- output: a valid Vega-Lite visualization specification
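To make the input/output pair concrete, here is a minimal sketch of how one training pair could be serialized to text. The rows, field names (`country`, `year`, `gdp`), and the helper names `to_source_text` / `to_target_text` are hypothetical; how many rows the paper actually samples per example is not stated in these notes.

```python
import json

# Hypothetical data rows; the encoder sees them as a JSON string.
rows = [
    {"country": "A", "year": 2001, "gdp": 1.5},
    {"country": "B", "year": 2001, "gdp": 2.0},
]

# A hand-written Vega-Lite spec of the kind the decoder should emit.
target_spec = {
    "mark": "bar",
    "encoding": {
        "x": {"field": "country", "type": "nominal"},
        "y": {"field": "gdp", "type": "quantitative"},
    },
}

def to_source_text(records):
    """Serialize data records into the compact source string for the encoder."""
    return json.dumps(records, separators=(",", ":"))

def to_target_text(spec):
    """Serialize a Vega-Lite spec into the target string for the decoder.

    sort_keys gives every target a canonical key order, which keeps
    equivalent specs textually identical.
    """
    return json.dumps(spec, separators=(",", ":"), sort_keys=True)

source = to_source_text(rows)
target = to_target_text(target_spec)
print(source)
print(target)
```

Both sides are plain strings, which is what lets the problem be treated as text-to-text translation.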
- encoder-decoder architecture
The encoder reads and encodes a source sequence into a fixed-length vector, and the decoder outputs a translation based on this vector.
- Attention Mechanism
Attention mechanisms allow a model to focus on aspects of an input sequence while generating output tokens.
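A minimal sketch of one attention step, assuming simple dot-product scoring between the decoder state and each encoder state. The notes do not say which scoring function Data2Vis uses, so treat this as a stand-in that only illustrates the idea of weighting encoder positions differently at each output step.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot_product_attention(query, keys, values):
    """Return (context_vector, weights) for a single decoder step.

    query: current decoder state (list of floats)
    keys, values: encoder states, one vector per input position
    """
    # Score each input position by its similarity to the decoder state.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    # Context vector = attention-weighted sum of the encoder states.
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return context, weights

encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
context, weights = dot_product_attention(decoder_state, encoder_states, encoder_states)
```

Positions whose encoder state aligns with the decoder state (the first and third here) receive most of the weight.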
- Beam Search algorithm
The beam search algorithm used in sequence-to-sequence neural translation models keeps track of the k most probable partial output sequences at each decoding step, where k is known as the beam width. This enables the generation of the k most likely output sequences for a given input sequence.
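A minimal beam search sketch. The model is abstracted as a hypothetical function `next_token_probs(seq)` returning a probability for each possible next token; `toy_model` is an invented distribution, not anything from the paper. For simplicity, finished hypotheses leave the beam, so the live beam can shrink below k.

```python
import math

def beam_search(next_token_probs, start, end_token, beam_width, max_len):
    """Keep the beam_width most probable partial sequences at each step.

    Returns up to beam_width (sequence, log_probability) pairs, best first.
    """
    beams = [(list(start), 0.0)]  # (sequence, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for token, p in next_token_probs(seq).items():
                if p > 0:
                    candidates.append((seq + [token], score + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            if seq[-1] == end_token:
                finished.append((seq, score))  # complete hypothesis
            else:
                beams.append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # include unfinished hypotheses if max_len was hit
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished[:beam_width]

def toy_model(seq):
    """Hypothetical next-token distribution over a tiny vocabulary."""
    last = seq[-1]
    if last == "<s>":
        return {"{": 0.6, "[": 0.4}
    if last == "{":
        return {"}": 0.9, "]": 0.1}
    if last == "[":
        return {"]": 0.9, "}": 0.1}
    return {"</s>": 1.0}

results = beam_search(toy_model, ["<s>"], "</s>", beam_width=2, max_len=5)
```

With k = 2 the search returns two distinct complete outputs, which is how Data2Vis can offer several candidate visualizations for one input.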
Three techniques used: bidirectional encoding, differential weighting of context via an attention mechanism, and beam search
- character-based sequence model
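Because the model works on characters rather than words, the vocabulary is just the set of characters that appear in the source and target strings. A minimal sketch, assuming three reserved special tokens (`<pad>`, `<s>`, `</s>`); this id layout and the helper names are illustrative, not taken from the paper.

```python
def build_char_vocab(texts):
    """Map every character seen in the texts to an integer id.

    Ids 0-2 are reserved for padding, start and end markers (assumed layout).
    """
    specials = ["<pad>", "<s>", "</s>"]
    chars = sorted({ch for text in texts for ch in text})
    return {tok: i for i, tok in enumerate(specials + chars)}

def encode(text, vocab):
    """Turn a string into a list of ids wrapped in start/end markers."""
    return [vocab["<s>"]] + [vocab[ch] for ch in text] + [vocab["</s>"]]

def decode(ids, vocab):
    """Invert encode(), dropping the special tokens."""
    inverse = {i: tok for tok, i in vocab.items()}
    specials = {"<pad>", "<s>", "</s>"}
    return "".join(inverse[i] for i in ids if inverse[i] not in specials)

texts = ['{"mark":"bar"}']
vocab = build_char_vocab(texts)
ids = encode(texts[0], vocab)
```

A character vocabulary stays small and never hits out-of-vocabulary field names, at the cost of much longer sequences.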
Data and Preprocessing
- the model must select a subset of fields to focus on when creating visualizations (most datasets have multiple fields that cannot all be simultaneously visualized)
- the model must learn differences in data types across the data fields (numeric, string, temporal, ordinal, categorical, etc.), which in turn guides how each field is specified in the generation of a visualization specification.
- the model must learn the appropriate transformations to apply to a field given its data type (e.g., aggregate transform does not apply to string fields).
- view-level transforms : aggregate, bin, calculate, filter, timeUnit
- field-level transforms : aggregate, bin, sort, timeUnit
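Data2Vis has to learn the type and transform constraints above implicitly from examples. To make the constraints themselves explicit, here is a hand-coded sketch: a simplified type-guessing heuristic and a hypothetical rule that numeric aggregates such as `mean` are not allowed on string (nominal) fields. Neither function comes from the paper; they only illustrate what the model must pick up.

```python
from datetime import datetime

def _is_number(v):
    if isinstance(v, bool):
        return False
    if isinstance(v, (int, float)):
        return True
    try:
        float(v)
        return True
    except (TypeError, ValueError):
        return False

def _is_date(v):
    if not isinstance(v, str):
        return False
    try:
        datetime.fromisoformat(v)  # accepts ISO strings like "2001-01-01"
        return True
    except ValueError:
        return False

def infer_field_type(values):
    """Guess a Vega-Lite-style type from sample values (simplified heuristic)."""
    if all(_is_number(v) for v in values):
        return "quantitative"
    if all(_is_date(v) for v in values):
        return "temporal"
    return "nominal"

# Hypothetical rule table: these aggregates only make sense on numbers.
NUMERIC_ONLY_AGGREGATES = {"mean", "sum", "median"}

def aggregate_allowed(aggregate, field_type):
    """Return True if the aggregate may be applied to a field of this type."""
    if aggregate == "count":
        return True  # counting works for any field
    if aggregate in NUMERIC_ONLY_AGGREGATES:
        return field_type == "quantitative"
    return False  # unknown aggregates are rejected in this sketch
```

For example, `aggregate_allowed("mean", infer_field_type(["A", "B"]))` is `False`, mirroring the note that aggregation does not apply to string fields.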
Evaluation Metrics
- language syntax validity (lsv)
- a measure of how well a model learns the syntax of the underlying language used to specify the visualization.
- grammar syntax validity (gsv)
- a measure of how well a model learns the syntax of the grammar for visualization specification.
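Both metrics can be computed as the fraction of generated outputs that pass a check. A minimal sketch, assuming lsv means "parses as JSON" and gsv means "is a valid Vega-Lite spec". The `is_plausible_vegalite` check is a hypothetical, very rough stand-in (a subset of mark types plus an `encoding` object); a faithful gsv would validate each output against the official Vega-Lite JSON schema.

```python
import json

# Subset of Vega-Lite mark types, used only by this rough check.
VALID_MARKS = {"area", "bar", "circle", "line", "point",
               "rect", "rule", "square", "text", "tick"}

def is_valid_json(text):
    """Language-level check: does the output parse as JSON?"""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

def is_plausible_vegalite(text):
    """Grammar-level check (hypothetical simplification of Vega-Lite rules)."""
    try:
        spec = json.loads(text)
    except json.JSONDecodeError:
        return False
    if not isinstance(spec, dict):
        return False
    mark = spec.get("mark")
    if isinstance(mark, dict):  # Vega-Lite also allows {"type": "bar", ...}
        mark = mark.get("type")
    return (isinstance(mark, str) and mark in VALID_MARKS
            and isinstance(spec.get("encoding"), dict))

def syntax_validity(outputs):
    """Return (lsv, gsv) as fractions of outputs passing each check."""
    n = len(outputs)
    lsv = sum(is_valid_json(o) for o in outputs) / n
    gsv = sum(is_plausible_vegalite(o) for o in outputs) / n
    return lsv, gsv

outputs = [
    '{"mark":"bar","encoding":{"x":{"field":"a","type":"nominal"}}}',
    '{"mark":"bar"',  # truncated: not valid JSON
    '{"data":1}',     # valid JSON, but not a visualization spec
]
lsv, gsv = syntax_validity(outputs)
```

Every grammar-valid output is also language-valid, so gsv can never exceed lsv.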
Experiments
Results
Limitations
- Field Selection and Transformation
- Training Data
Future Work
- Training Data and Training Strategy
- Extending Data2Vis to Generate Multiple Plausible Visualizations
- Targeting Additional Grammars
- Natural Language and Visualization Specification