
Data2Vis - Automatic Generation of Data Visualizations Using Sequence-to-Sequence Recurrent Neural Networks

This post is a summary written after reading Data2Vis: Automatic Generation of Data Visualizations Using Sequence-to-Sequence Recurrent Neural Networks.



Abstract

  • end-to-end trainable neural translation model
  • formulates visualization generation as a language translation problem, where data specifications are mapped to visualization specifications in a declarative language (Vega-Lite)
    • Vega-Lite specifications are expressed in JSON format (see the sketch after this list)
  • multilayered attention-based encoder-decoder network with LSTM units
  • introduces two metrics: language syntax validity and visualization grammar syntax validity
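
Since Vega-Lite specifications are plain JSON, both sides of the translation are just text. A minimal sketch of the kind of input/output pair involved (the field names and values here are made up for illustration):

```python
import json

# Hypothetical input: a dataset given as a list of JSON records
# (field names and values are made up for illustration).
source = [
    {"name": "A", "price": 30, "date": "2010-01-01"},
    {"name": "B", "price": 45, "date": "2010-02-01"},
]

# Corresponding output: a minimal valid Vega-Lite specification (JSON)
# that plots price over time for the dataset above.
target = {
    "mark": "line",
    "encoding": {
        "x": {"field": "date", "type": "temporal"},
        "y": {"field": "price", "type": "quantitative"},
    },
}

# Both sides of the "translation" are plain text to the model.
print(json.dumps(source))
print(json.dumps(target, indent=2))
```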

Declarative Visualization Specification

One of our aims with Data2Vis is to bridge the gap between speed and expressivity in specifying visualizations.

Automated Visualization

We pose visualization specification as a machine translation problem and introduce Data2Vis, a deep neural translation model trained to automatically translate data specifications to visualization specifications. Data2Vis emphasizes the creation of visualizations using rules learned from examples, without resorting to a predefined enumeration or extraction of constraints, rules, heuristics, and features.

  • Machine Translation Problem

DNNs for Machine Translation

Data2Vis is also a sequence-to-sequence model using the textual source and target specifications directly for translation, without relying on explicit syntax representations.
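
Working directly on the textual specifications means a character-level vocabulary over the raw JSON strings is enough; no parser or grammar machinery is needed at training time. A minimal sketch of such a tokenization (an illustration, not the paper's actual preprocessing code):

```python
# Character-level encoding of a specification string: the model sees
# raw characters, not syntax-aware tokens.
spec = '{"mark": "bar"}'

vocab = sorted(set(spec))                      # character vocabulary
stoi = {ch: i for i, ch in enumerate(vocab)}   # char -> integer id
ids = [stoi[ch] for ch in spec]                # encoded sequence

decoded = "".join(vocab[i] for i in ids)       # decoding inverts the mapping
assert decoded == spec
```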

Model


  • formulates the data visualization problem as a seq2seq translation problem
    • input: a dataset (fields and values in JSON format)
    • output: a valid Vega-Lite visualization specification

  • encoder-decoder architecture

    where the encoder reads and encodes a source sequence into a fixed-length vector, and a decoder outputs a translation based on this vector.

  • Attention Mechanism

    Attention mechanisms allow a model to focus on aspects of an input sequence while generating output tokens. (A NumPy sketch follows at the end of this list.)

  • Beam Search algorithm

    The beam search algorithm used in sequence-to-sequence neural translation models keeps track of the k most probable output tokens at each step of decoding, where k is known as the beam width. This enables the generation of the k most likely output sequences for a given input sequence. (A toy implementation is sketched at the end of this list.)

  • three techniques: bidirectional encoding, differential weighting of context via an attention mechanism, and beam search

  • character-based sequence model
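
The attention quote above corresponds to a weighted sum over encoder states, recomputed at every decoding step. A minimal dot-product attention sketch in NumPy (the paper's model uses attention over an LSTM encoder-decoder; this generic version is only meant to show the mechanism):

```python
import numpy as np

def attend(query, encoder_states):
    """Return a context vector: encoder states weighted by similarity
    to the current decoder state.

    query:          shape (d,)   current decoder hidden state
    encoder_states: shape (T, d) one vector per input position
    """
    scores = encoder_states @ query            # (T,) similarity per position
    weights = np.exp(scores - scores.max())    # numerically stable softmax...
    weights /= weights.sum()                   # ...over input positions
    return weights @ encoder_states            # (d,) context vector

rng = np.random.default_rng(0)
context = attend(rng.normal(size=8), rng.normal(size=(5, 8)))
print(context.shape)  # (8,)
```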
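
The beam search description maps directly onto a small decoding loop: expand each of the k best partial sequences by every candidate token, then keep the k highest-scoring results. A toy sketch with a stubbed next-token distribution (hypothetical, for illustration only):

```python
import math

def beam_search(step_fn, start, k=3, max_len=10, eos="<eos>"):
    """step_fn(seq) -> {token: probability}; returns the k most
    probable sequences found within max_len decoding steps."""
    beams = [([start], 0.0)]                   # (token list, log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == eos:                 # finished beams carry over as-is
                candidates.append((seq, score))
                continue
            for tok, p in step_fn(seq).items():
                candidates.append((seq + [tok], score + math.log(p)))
        # keep only the k most probable partials: k is the beam width
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
    return beams

# Stub model: a fixed next-token distribution, just to run the loop.
dist = {"{": 0.5, "}": 0.3, "<eos>": 0.2}
for seq, score in beam_search(lambda seq: dist, "<s>", k=2, max_len=3):
    print(seq, round(score, 3))
```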

Data and Preprocessing

  1. the model must select a subset of fields to focus on when creating visualizations (most datasets have multiple fields that cannot all be simultaneously visualized)
  2. the model must learn differences in data types across the data fields (numeric, string, temporal, ordinal, categorical, etc.), which in turn guides how each field is specified in the generation of a visualization specification
  3. the model must learn the appropriate transformations to apply to a field given its data type (e.g., the aggregate transform does not apply to string fields); a minimal sketch of such a check follows after this list
  • view-level transforms : aggregate, bin, calculate, filter, timeUnit
  • field-level transforms : aggregate, bin, sort, timeUnit
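
Point 3 implies a type-aware validity check during generation. A minimal sketch of such a lookup (the mapping is illustrative, condensed from the transform lists above; Vega-Lite's actual rules are richer):

```python
# Which transforms plausibly apply to which inferred field type.
# Illustrative only; condensed from the view- and field-level lists above.
APPLICABLE = {
    "numeric":  {"aggregate", "bin", "sort", "calculate", "filter"},
    "temporal": {"timeUnit", "sort", "filter"},
    "string":   {"sort", "filter"},  # e.g. aggregate does not apply
}

def transform_is_valid(transform: str, field_type: str) -> bool:
    return transform in APPLICABLE.get(field_type, set())

assert transform_is_valid("bin", "numeric")
assert not transform_is_valid("aggregate", "string")
```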

Evaluation Metrics

  • language syntax validity (lsv)
    • a measure of how well the model learns the syntax of the underlying language used to specify the visualization (here, JSON)
  • grammar syntax validity (gsv)
    • a measure of how well the model learns the syntax of the grammar for visualization specification (here, Vega-Lite)
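
Both metrics can be computed mechanically over a batch of generated outputs: lsv asks whether a generated string parses as JSON at all, gsv whether the parsed result is a valid Vega-Lite specification. A sketch under that assumption (`looks_like_vegalite` is a hypothetical stand-in; a real check would validate against the published Vega-Lite JSON schema):

```python
import json

def looks_like_vegalite(spec: dict) -> bool:
    # Hypothetical stand-in: a real implementation would validate
    # against the Vega-Lite JSON schema (e.g. with the jsonschema package).
    return "mark" in spec and "encoding" in spec

def lsv_gsv(outputs):
    """outputs: list of generated specification strings."""
    parsed = []
    for text in outputs:
        try:
            parsed.append(json.loads(text))
        except json.JSONDecodeError:
            pass
    lsv = len(parsed) / len(outputs)   # fraction that is valid JSON
    gsv = sum(looks_like_vegalite(p) for p in parsed
              if isinstance(p, dict)) / len(outputs)
    return lsv, gsv

print(lsv_gsv(['{"mark": "bar", "encoding": {}}', "not json", "42"]))
```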

Experiments

Results

(screenshots: examples of generated visualizations from the paper)

Limitations

  • Field Selection and Transformation
  • Training Data

Future Work

  • Training Data and Training Strategy
  • Extending Data2Vis to Generate Multiple Plausible Visualizations
  • Targeting Additional Grammars
  • Natural Language and Visualization Specification