Originally posted: 2018-08-22. Last updated: 2024-09-02. View source code for this page here.
The range of data visualisation tools available to data scientists is vast¹. If they’re anything like me, beginner data scientists often don’t put too much thought into which tool to learn — and often just pick a tool on the basis of some impressive outputs they’ve seen online.
On any given team, there will tend to be a range of personal favourites. This inconsistency imposes a cost on the team in two ways:
Data visualisation tools can take months to master, so a range of tools makes it harder to collaborate and share work, and harder to maintain codebases.
Not all tools are equal. The quality of tools varies in their flexibility, the quality of their APIs, and the quality of the data visualisation they produce.
As a result, it makes sense for teams to spend time choosing a ‘sensible default’, and stick with it in most circumstances. I think Vega-Lite is the right choice for standardisation, for the following reasons²:
Consistent, disciplined API underpinned by well-designed grammar of graphics. Data visualisation is the process of mapping different elements of a dataset to visual encodings such as positions on the x and y axes, colours, or sizes. I’ve never seen this more clearly expressed than in the Vega-Lite API. This makes the API extremely predictable, and easy to understand.
Easily editable charts with embedded data. Vega-Lite embeds data directly in the chart, enabling anyone with access to modify the visualization in their web browswer using the Vega-Lite editor, without needing to request access to the data or how the chart was generated.
Founded on academic research. Vega-Lite is a product of the UW Interactive Data Lab, one of the leading academic centres for data visualisation. It’s heavily informed by research on visual perception and how to convey data visually as accurately as possible. This is a great video intro to some of this research³.
Advanced interactivity. Vega-Lite provides the most powerful declarative system I’ve seen for advanced interactivity between charts — such as brushing one chart to filter the data displayed in another one.
Language agnostic. Vega-Lite charts are fully specified by a declarative .json based syntax. You can build Vega Lite charts by writing raw json, from Python (using Altair), from R (using the R interface to Altair), and from Julia.
Looks great. Good data visualisation can never be purely theoretical — an interesting visualisation which ‘breaks’ the rules can be more compelling than something more grounded in theory that looks boring or outdated. Vega-Lite looks modern and slick using the default settings, and is also themable.
Coherent ‘escape valves’ when Vega-Lite is not sufficiently flexible. Visualisation libraries involve a trade off between flexibility and ease of use. With most libraries, there are some customisations or chart types which are simply not possible, and the user has to resort to a learning a completely different tool (e.g. d3.js) to achieve their goals. Vega-Lite is is part of a more coherent set of abstractions. Users can fall back on the more verbose, but more customisable Vega syntax when Vega-Lite is not sufficient, and Vega will be instantly familiar. And both Vega and Vega-Lite share a range of core concepts with d3.js (an imperative language), which gives essentially unlimited flexibility. Coming to Vega-Lite from d3.js, I found that it immediately ‘made sense’ because of these shared concepts.
Web-enabled by default, and easy to host. It is trivial to embed Vega-Lite into any website, which can then be hosted on a static web host because data processing and interaction is all dealt with by the front-end. The ease of embedding charts has resulted in it being adopted within a number of high profile websites and tools, such as Wikipedia and Kibana. If you need to embed graphics elsewhere, you can convert them into a svg or high resolution png.
Vega-Lite charts are easy to incorporate into more complex webapps. Vega View provides hooks so that Vega-Lite charts can be modified by external events, and also means that charts can expose events to other parts of the website. This means that user interaction with a chart can drive the behaviour of other parts of your website.
GUIs and other spec generation tools. Voyager2 is visualisation recommendation tool which will allow the user to build specs using point-and-click, or will automatically build a range of Vega charts based on a dataset. Draco is a cutting edge visualisation recommendation tool that won the InfoVis 2018 Best Paper Award, that suggests specs based optimising ease of visual perception. Lyra provide a GUI in which users can create heavily customised charts using point-and-click drawing tools.
Built into key data science tools by default. A Vega-Lite viewer is built into Jupyterlab by default. Voyager can be installed into JupyterLab as an extension, enabling datasets to be opened in Voyager. Pandas dataframes can be extended to support auto-generation of vega charts. JupyterLab looks set to become the most popular development environment for data scientists, so Vega’s popularity looks set to continue growing.
Reactive dataflows and strong separation of concerns result in high performance. Vega-Lite is underpinned by a sophisticated engine that is aware of what needs to be re-computed when inputs change or in reaction to other events. This eliminates unnecessary code execution. More generally, there is a strong separation of concerns between the specification and its evaluation, enabling iterative performance enhancement behind the scenes. More details are here.
Free and open source. You can find the main Vega-Lite codebase here.
The Vega-Lite docs have a good getting started tutorial.
If you’re a Python user, I recommend Altair to write your Vega-Lite specs. Jake VanderPlas is the author of Altair, and ran an excellent tutorial at Pycon 2018.
For further details, the Vega-Lite team did a great talk at OpenVis 2017, and there is another more in depth talk by the team here.
Jupyter users who just want to get their feet wet can try pdvega.
¹ For a sense of the scale of the landscape, see here for a graphic showing just the Python part.
² I have come to this conclusion after working with quite a wide range of highly-recommended (and very good) tools, including matplotlib, ggplot2, d3.js, highcharts, ggivs, Vega, Vega-Lite and plotly.
³ These influences run deep. The defaults built into vega-lite are carefully thought out, and grounded in research. For instance, how much does a typical data scientist think about which colour scales to use? This is actually the subject of rigorous academic studies — and the best ones are the defaults which are built into Vega-Lite.