Creating Basic Charts using d3.js

by Ben Lorica (last updated Apr/2012)

The set of tools I use to create charts includes Excel & R (for generating static images), and Processing, Protovis, and the Google Visualization API (for interactive graphics). I tend to customize the charts I create, so any tool I choose to learn & use needs to be flexible in that regard. I use Processing and R for prototyping and designing visualizations that I plan to deliver on the web -- the final product is either a static image or something done through JavaScript. Both Protovis and the Google Visualization API use JSON and JavaScript, and are great for delivering charts in a web browser. Recently the creators of Protovis announced that they would cease development, and instead focus their efforts on a new visualization library called d3.js:
D3 allows you to bind arbitrary data to a Document Object Model (DOM), and then apply data-driven transformations to the document. As a trivial example, you can use D3 to generate a basic HTML table from an array of numbers. Or, use the same data to create an interactive SVG bar chart with smooth transitions and interaction.

D3 is not a traditional visualization framework. Rather than provide a monolithic system with all the features anyone may ever need, D3 solves only the crux of the problem: efficient manipulation of documents based on data. This gives D3 extraordinary flexibility, exposing the full capabilities of underlying technologies such as CSS3, HTML5 and SVG. It avoids learning a new intermediate proprietary representation. With minimal overhead, D3 is extremely fast, supporting large datasets and dynamic behaviors for interaction and animation. And, for those common needs, D3's functional style allows code reuse through a diverse collection of optional modules.

The creators of d3.js provide examples for creating a wide variety of static and interactive charts. In this note, I'll recreate a few basic charts using d3, and in the process provide a few more examples that might help you learn this elegant but relatively new package. I intend to update this document whenever I come across a chart that I think would be interesting to see in d3.

  • Paired Bar Charts
  • Time-series: Two Line Charts (with Title & Legend)
  • Color-coded Bubble Charts & Scatterplots
  • Beyond simple averages: The Median and the Interquartile Range
  • Static, Stacked Bar Chart (with Title & Legend)
  • Annotated Stacked Bar Charts ("magazine-style")
  • Dot Plot: Annotated & Color-coded Markers
  • Multiple Histograms: Trellis-style Comparison
  • Basic Treemap: U.S. Unemployment & Elections
  • Time-series: Line chart from Finance/Economics
  • Geographic Heatmaps: 2010 U.S. Census
  • Transitioning between two Scatterplots
  • 2 x 2 Matrix Chart

Paired Bar Charts

A July/2011 blog post on stalled budget negotiations in the US came with a bar chart that placed Obama's proposed tax increases and spending cuts alongside those of recent presidents. In the chart below, maroon bars represent budget cuts (% share of total budget deal), while blue bars represent tax increases. I didn't have access to the raw data, so the chart is based on eyeballing the values in the original bar chart. Since the chart relies on very few data points, I just hard-coded the data. (For a similar example, see my Decision 2012 page.)
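
The geometry behind a paired bar chart with hard-coded data can be sketched in plain JavaScript. This is only an illustration, not the code behind the chart: the `deals` values are made-up placeholders, and `linearScale` and `pairedBars` are hypothetical helper names (d3 has its own scale helpers that play the role of `linearScale`).

```javascript
// Hypothetical values, NOT the figures read off the original chart.
var deals = [
  { label: "President A", cuts: 60, taxes: 40 },
  { label: "President B", cuts: 80, taxes: 20 }
];

// Linear scale: maps a value in [d0, d1] to a pixel position in [r0, r1].
function linearScale(d0, d1, r0, r1) {
  return function (v) {
    return r0 + (v - d0) * (r1 - r0) / (d1 - d0);
  };
}

// Compute one rectangle per bar for a paired (grouped) vertical bar chart.
function pairedBars(data, width, height) {
  var groupWidth = width / data.length;
  var barWidth = groupWidth / 3;          // two bars plus one bar-width of padding
  var y = linearScale(0, 100, height, 0); // shares are percentages; SVG y grows downward
  var bars = [];
  data.forEach(function (d, i) {
    var x0 = i * groupWidth + barWidth / 2;
    [["cuts", "maroon"], ["taxes", "blue"]].forEach(function (s, j) {
      var top = y(d[s[0]]);
      bars.push({
        label: d.label, series: s[0], fill: s[1],
        x: x0 + j * barWidth, y: top,
        width: barWidth, height: height - top
      });
    });
  });
  return bars;
}
```

Each returned object carries the attributes an SVG `rect` needs (`x`, `y`, `width`, `height`, `fill`), so the drawing step reduces to binding this array to rectangles.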

Time-series: Two Line Charts (with Title & Legend)

This example comes from a recent Econbrowser post on income inequality in the US, from 1920-2008. The chart gives the percentage share of income (excluding capital gains) of the Top 1% and Top 5% households. For this particular example, the underlying data set had too many observations, so hard-coding wasn't a good option. Instead, I used d3 to read the chart's data from a CSV file. (I have a much more polished version of this chart on my Decision 2012 page.)
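
In the browser, d3's `d3.csv` takes care of loading and parsing the file, and the parsed values arrive as strings that still need converting to numbers. The sketch below reproduces that step in plain JavaScript so the shape of the data is visible. The column names (`year`, `top1`, `top5`) and the sample values are assumptions for illustration, not the contents of the actual file, and `parseCsv` and `toSeries` are hypothetical helpers.

```javascript
// Parse simple CSV text (no quoted fields) into an array of row objects.
function parseCsv(text) {
  var lines = text.trim().split(/\r?\n/);
  var header = lines[0].split(",");
  return lines.slice(1).map(function (line) {
    var cells = line.split(",");
    var row = {};
    header.forEach(function (name, i) { row[name.trim()] = cells[i].trim(); });
    return row;
  });
}

// Build one series per income group, converting strings to numbers.
function toSeries(rows, keys) {
  return keys.map(function (key) {
    return {
      name: key,
      points: rows.map(function (r) { return { year: +r.year, share: +r[key] }; })
    };
  });
}

// Hypothetical sample: column names and values are made up.
var sample = "year,top1,top5\n1920,14.5,27.0\n1921,15.5,30.5\n";
var series = toSeries(parseCsv(sample), ["top1", "top5"]);
```

Each entry of `series` then maps to one line in the chart, plus one row in the legend.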

From 1980 to the present, there is a clear upward trend in the share of income belonging to both the Top 1% and Top 5% wealthiest households.

Color-coded Bubble Charts & Scatterplots

The following example was inspired by one of Hans Rosling's Gapminder demos. I used d3 to read the data associated with the chart from this CSV file.
  • Horizontal Axis => Murders per 100,000 population
  • Vertical Axis => Burglaries per 100,000 population
  • Bubble Size => State Population
  • Bubble color ("Red states, Blue states") => Political classification of each state into Red / Blue / Purple.
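
One detail in the encodings listed above is worth spelling out: when bubble size stands for population, a common choice is to scale the radius with the square root of the value, so that the circle's area (not its radius) is proportional to population. The sketch below assumes that choice; whether the chart here does the same is not stated. The palette and the helper names `bubbleRadius` and `toBubble` are hypothetical.

```javascript
// Radius such that the circle AREA is proportional to the value.
function bubbleRadius(value, maxValue, maxRadius) {
  return maxRadius * Math.sqrt(value / maxValue);
}

// Hypothetical palette for the Red / Blue / Purple classification.
var partyColor = { Red: "#c0392b", Blue: "#2980b9", Purple: "#8e44ad" };

// Turn one state record into the visual attributes of its bubble.
// murders and burglaries are left in data units; x/y scales are applied later.
function toBubble(state, maxPopulation, maxRadius) {
  return {
    x: state.murders,
    y: state.burglaries,
    r: bubbleRadius(state.population, maxPopulation, maxRadius),
    fill: partyColor[state.party]
  };
}
```

With this scaling, a state with a quarter of the largest population gets half the largest radius, and therefore a quarter of the largest area.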

    Beyond simple averages: The Median and the Interquartile Range

    Whenever possible, I like to draw charts that provide visual representations for both averages and deviations. There are several standard options available, such as boxplots, histograms, or even approximate density functions. I have found that simply drawing the middle 50% of a distribution, in addition to highlighting the median, does a pretty good job of capturing variation and average behavior. It has the added benefit of being easy to explain: the bar represents the region from the 25th to the 75th percentile, and the red line is the median.
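
The three numbers each bar needs (25th percentile, median, 75th percentile) are simple to compute. Below is a minimal sketch using linear interpolation between order statistics, which is one common quantile definition; d3 also provides `d3.quantile` for sorted arrays. The names `quantile` and `summarize` here are hypothetical helpers, not the chart's code.

```javascript
// Linear-interpolation quantile of an ascending-sorted array, with 0 <= p <= 1.
function quantile(sorted, p) {
  var h = (sorted.length - 1) * p;
  var lo = Math.floor(h);
  var hi = Math.min(lo + 1, sorted.length - 1);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (h - lo);
}

// Summary used to draw one bar: it spans q1..q3, with a line at the median.
function summarize(values) {
  var sorted = values.slice().sort(function (a, b) { return a - b; });
  return {
    q1: quantile(sorted, 0.25),
    median: quantile(sorted, 0.5),
    q3: quantile(sorted, 0.75)
  };
}
```

The bar's width is then `q3 - q1`, the interquartile range.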

    Below is an example of such a chart. The data come from the Dow Jones / Credit Suisse Hedge Fund Indices and summarize monthly (percentage) returns from Jan/2006 to Aug/2011. As you can see, it's very easy to pick out highly volatile trading styles -- simply look at the width of the bars! For another example, see the following chart from my page on Decision 2012.

    Static, Stacked Bar Chart (with Title & Legend)

    The creators of d3 provide an example of a bar chart that transitions between a stacked & grouped layout. I've already provided a separate example of a static, grouped (horizontal) bar chart. Here is an example of a static, stacked (vertical) bar chart, with accompanying legend and title.
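
The core of a stacked layout is a running total per category: each segment starts where the previous one ended. A minimal sketch of that bookkeeping follows. The revenue figures are made-up placeholders rather than reported numbers, and `stackSeries` is a hypothetical helper (d3 ships its own stack layout for the same purpose).

```javascript
// For each category, stack the series values bottom-up: every segment
// gets a base (y0) and a top (y1), both in data units.
function stackSeries(rows, keys) {
  return rows.map(function (row) {
    var running = 0;
    var segments = keys.map(function (key) {
      var seg = { key: key, y0: running, y1: running + row[key] };
      running = seg.y1;
      return seg;
    });
    return { label: row.label, total: running, segments: segments };
  });
}

// Hypothetical revenue figures (in $ billions), for illustration only.
var quarters = [
  { label: "Q1", iPhone: 10, iPad: 4, iPod: 3 },
  { label: "Q2", iPhone: 12, iPad: 3, iPod: 2 }
];
var stacked = stackSeries(quarters, ["iPhone", "iPad", "iPod"]);
```

The largest `total` sets the domain of the vertical scale, and each segment becomes one rectangle running from `y0` to `y1`.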

    Over the most recent quarter, combined revenue from iPhones and iPads exceeded $19 Billion! iPod revenues are relatively smaller, although there is a noticeable spike during the Christmas Holidays (Q1 of Apple's fiscal year).

    Annotated Stacked Bar Charts ("magazine-style")

    A graphic that's common in financial/business publications is the annotated stacked bar chart. Below is a recent example I found in the Atlantic blog of James Fallows. The chart is actually from a NY Times editorial on the size of the U.S. budget deficit.
    ... under Mr. Bush, tax cuts and war spending were the biggest policy drivers of the swing from projected surpluses to deficits from 2002 to 2009. Budget estimates that didn't foresee the recessions in 2001 and in 2008 and 2009 also contributed to deficits. Mr. Obama's policies, taken out to 2017, add to deficits, but not by nearly as much.
    In my d3 rendition of the graphic, I didn't quite capture all the subtle features: for example I didn't actually format font elements in different sections, as the NY Times did. But I was surprised how quickly one can create "magazine-style" graphics like this in d3. (I have a similar graphic -- on the Buffett Tax Rule -- on my Decision 2012 page.)

    Dot Plot: Annotated & Color-coded Markers

    This example was inspired by an April/2011 infographic from the Economist. I don't usually use this type of chart to represent inequality or concentration. But seeing the markers displayed along with the Gini Coefficients of the corresponding countries piqued my interest. The smaller the distance between the markers (which usually corresponds to a lower share of the top 20%), the lower the Gini Coefficient.

    In my d3 rendition, I didn't use the blue background used by the Economist, and I also didn't stylize the font elements. (For another example of a Dot Plot, see my Decision 2012 page.)

    Multiple Histograms: Trellis-style Comparison

    From the time I started using S/S-Plus, I found myself relying on Trellis displays: the use of common axes to display relationships & distributions, conditional on values of other variables. I find this graphical style useful in exploring data sets with many variables and observations. Luckily Trellis graphs are very easy to create in R, and other statistics packages. This next example is an attempt to render something similar in d3.

    Below is what histograms of a variable look like for three distinct values of the country variable (these conditional distributions use fake data). For "special effects", I added a transition element, which you see in action if you reload the page by clicking on the reload button below:
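
What makes the panels comparable is that every histogram uses the same bin edges and the same count axis. A minimal sketch of that idea, with hypothetical helper names (`histogram`, `trellisHistograms`) and equal-width bins assumed:

```javascript
// Count values into `binCount` equal-width bins over a given [min, max] range.
function histogram(values, min, max, binCount) {
  var counts = [];
  for (var i = 0; i < binCount; i++) counts.push(0);
  var width = (max - min) / binCount;
  values.forEach(function (v) {
    if (v < min || v > max) return;  // ignore values outside the range
    // The maximum value falls into the last bin rather than past it.
    var idx = Math.min(Math.floor((v - min) / width), binCount - 1);
    counts[idx] += 1;
  });
  return counts;
}

// One histogram per group, all sharing the same x range and y maximum.
function trellisHistograms(groups, binCount) {
  var all = [];
  Object.keys(groups).forEach(function (k) { all = all.concat(groups[k]); });
  var min = Math.min.apply(null, all);
  var max = Math.max.apply(null, all);
  var panels = {};
  var maxCount = 0;
  Object.keys(groups).forEach(function (k) {
    panels[k] = histogram(groups[k], min, max, binCount);
    maxCount = Math.max(maxCount, Math.max.apply(null, panels[k]));
  });
  return { min: min, max: max, maxCount: maxCount, panels: panels };
}
```

Using `min`, `max`, and `maxCount` for the axes of every panel is what gives the trellis its common scales.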

    Basic Treemap: U.S. Unemployment & Elections

    I took the treemap example that comes with d3, and I fed it different data and tweaked the colors and labels. In the example below, the size and color of the squares are as follows:
  • Electoral Votes: Size (number of electoral college votes), Color (year-over-year change in unemployment rate from Jun/2010 to Jun/2011; green corresponds to an improvement/decrease, red corresponds to a worsening/increase)
  • Population: Size (state population), Color (current classification as a Red / Blue / Purple state)

    You'll notice that relative to their population size, smaller states have a larger share of electoral votes. (Use the buttons below to toggle back and forth between the two "views".)

    Time-series: Line chart from Finance/Economics

    In this example I wanted to create charts common in finance and economics. In finance, many time-series charts display related series -- such as a stock price chart, with trading volume underneath. Economists like to display time-series charts with some temporal regions highlighted, such as periods when an economy is in recession.

    In the chart below, I tried to do both. The data set was big enough that, in order to get a bar chart "effect" for the bottom graph, I only drew bars twice a year (otherwise the bars would be too thin or appear as an area chart).
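
Both ideas reduce to small calculations. The sketch below assumes a monthly series, so keeping every 6th observation yields two bars per year, and it computes a shaded band from a start and end position on the time axis. The helper names `everyNth` and `bandRect` are hypothetical, and the positions may be numbers or `Date` objects (dates subtract as milliseconds).

```javascript
// Keep every `step`-th observation; step = 6 turns a monthly series
// into two observations (bars) per year.
function everyNth(series, step) {
  return series.filter(function (_, i) { return i % step === 0; });
}

// Rectangle for a highlighted period, given the time domain of the
// x axis and the pixel size of the plot area.
function bandRect(start, end, domainStart, domainEnd, width, height) {
  var span = domainEnd - domainStart;
  var x0 = (start - domainStart) / span * width;
  var x1 = (end - domainStart) / span * width;
  return { x: x0, y: 0, width: x1 - x0, height: height };
}
```

Drawing the bands first and the line on top keeps the highlighted periods in the background.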

    I highlighted the two terms of Reagan and Clinton, as well as the 4-month period prior to November of their re-election years. Notice that under Reagan unemployment surged close to 11%, but by the time he ran for re-election, unemployment was falling and hovered around 7.5%. Under Clinton unemployment trended down, and during his re-election campaign it was around 5.2%. I also highlighted the period during the re-election campaigns of Bush I & II. Clinton's predecessor (Bush I) ran at a time when unemployment had just peaked. In contrast, Bush II ran for re-election when unemployment was around 5.5%.

    If recent trends continue, Obama's re-election prospects look good.

    Geographic Heatmaps: 2010 U.S. Census

    I'm developing this as a standalone page; please click here. I also created another geographic heat map, this time at the state level.

    Transitioning between two Scatterplots

    In this example, I attempt a smooth transition between two charts (in this case, two scatterplots). As you toggle between the two buttons, two different data sets get drawn. As an alternative to a trellis-style presentation, I sometimes use this form of animation to highlight differences between data sets. (I have a similar chart on my Decision 2012 page.)
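
Conceptually, a smooth transition moves each point from its position in the first data set to its position in the second. d3's transitions handle this interpolation (and the timing) for you; the sketch below only illustrates the underlying arithmetic, assuming plain linear interpolation and points matched by array index. `lerp` and `tween` are hypothetical helper names.

```javascript
// Linear interpolation between a and b: t = 0 gives a, t = 1 gives b.
function lerp(a, b, t) {
  return a + (b - a) * t;
}

// Positions of every point at progress t (0..1) of the transition.
// Points in `from` and `to` are matched by array index.
function tween(from, to, t) {
  return from.map(function (p, i) {
    return { x: lerp(p.x, to[i].x, t), y: lerp(p.y, to[i].y, t) };
  });
}
```

If the two data sets differ in length or order, the points need a stable key (for example a name) so that each one travels to its own counterpart.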

    2 x 2 Matrix Chart

    The matrix chart is especially popular among management consultants. I'm not a big user, but when I do use matrix charts, I try to highlight statistically interesting regions. I have an example in a page dedicated to charts for Decision 2012.

    NOTE: Reproduction & reuse allowed under Creative Commons Attribution.