08 July 2017 | by Tim Bock

Customization of Bubble Charts for Correspondence Analysis in Q


When you insert a bubble chart in Q (Create > Charts > Visualization > Labeled Bubbleplot), you can customize some aspects of its appearance using the controls that appear in the object inspector on the right of the screen. More advanced customizations can be performed by instead inserting an R Output (Create > R Output) and writing code. I illustrate this by explaining how I created the visualizations in my post Using Bubble Charts to Show Significant Relationships and Residuals in Correspondence Analysis, shown below.

The visualization above is created by a fairly lengthy chunk of code, which is shown in the next section. Fortunately, you do not need to understand all of it! In this post I walk through some of the key steps of customizing bubble charts by modifying this code.

Hooking up the code (not as scary as it looks)

The code below creates a correspondence analysis, and then presents this using a bubble chart. To reproduce a similar visualization with your own data:

  1. Create a table in Q that contains the data you want to analyze. This is no different from how you would normally set up a correspondence analysis.
  2. Right-click on the table in the Report tree, select Reference name, and copy its contents. When I did this, the reference name of my table was table.Q9.
  3. Duplicate the table by pressing + Duplicate.
  4. Right-click on the duplicated table, select Statistics – Cells, and choose z-Statistic. Repeat this process to remove any other statistics (or hold down Ctrl on your keyboard and do this in a single step).
  5. Right-click on the duplicated table in the Report tree, select Reference name, and change the reference name to table.zScores (or anything else you want).
  6. From the Create menu, select R Output and paste in the code below, modifying the first 12 lines as needed. In the first line, replace table.Q9 with the reference name of your table (see step 2). If you chose a name other than table.zScores in step 5, use it in the second line. In the third line, replace Egypt with the name of the row that contains the standardized residuals you wish to use, and fill in lines 4 to 7 with the labels that you wish to appear on the final visualization.
x = table.Q9
z = table.zScores
row.to.use = "Egypt"
row.label = "Country"
column.label = "Concern"
title = "Traveler's concerns about different countries (bubbles relate to Egypt)"
legend.title = "Strength of relationship"
# Removing rows and columns to be ignored
remove = c("NET", "Total")
x = x[!rownames(x) %in% remove, !colnames(x) %in% remove]
z = z[row.to.use, !colnames(z) %in% remove]
colnames(x) = paste0(colnames(x), ": ", round(x[row.to.use,]), "%")
# Default circle size (this is relative to the z-scores)
z[abs(z) <= 1.96] <- 0 # Zeroes non-significant z-scores; comment this line out to turn off the significance testing.
default.size = 0.1 # Minimum circle size
my.ca = ca::ca(x)
coords = flipDimensionReduction::CANormalization(my.ca, "Principal")
n.rows = nrow(coords$row.coordinates)
n.columns = nrow(coords$column.coordinates)
coords = rbind(coords$row.coordinates, coords$column.coordinates)
# Creating the 'group' variable
n = n.rows + n.columns
groups <- rep("No association", n.columns)
groups[z > 0] = paste0("Weakness of ", row.to.use)
groups[z < 0] = paste0("Strength of ", row.to.use)
groups <- c(rep(row.label, n.rows), groups)
# Setting bubble size
bubble.size <- c(rep(default.size, n.rows), abs(z))
# Labeling the dimensions
singular.values <- round(my.ca$sv^2, 6)
variance.explained <- paste(as.character(round(100 * prop.table(singular.values), 1)), "%", sep = "")[c(1, 2)]
column.labels <- paste("Dimension", c(1, 2), paste0("(", variance.explained, ")"))
bubble.size[bubble.size < default.size] <- default.size
rhtmlLabeledScatter::LabeledScatter(X = coords[, 1],
    Y = coords[, 2],
    Z = bubble.size,
    label = rownames(coords),
    label.alt = rownames(coords),
    group = groups,
    colors = c("Black", "Purple", "#FA614B", "#3E7DCC"),
    fixed.aspect = TRUE,
    title = title,
    x.title = column.labels[1],
    y.title = column.labels[2],
    z.title = legend.title,
    axis.font.size = 10,
    labels.font.size = 14,
    title.font.size = 20,
    legend.font.size = 15,
    y.title.font.size = 16,
    x.title.font.size = 16)
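
To see what the data-preparation and axis-labeling lines are doing, here is a minimal sketch in plain R. The small matrices and the singular values are made-up stand-ins, not the data from the post, and nothing here needs Q or the ca package.

```r
# Hypothetical stand-ins for the two Q tables (percentages and z-scores).
x <- matrix(c(10, 20, 30,
              40, 25, 65,
              50, 45, 95),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("Egypt", "Mexico", "NET"),
                            c("Safety", "Cost", "Total")))
z <- matrix(c( 2.5, -2.1, 0,
              -2.5,  2.1, 0,
               0,    0,   0),
            nrow = 3, byrow = TRUE, dimnames = dimnames(x))
row.to.use <- "Egypt"

# Drop the summary row and column, as in the main code.
remove <- c("NET", "Total")
x <- x[!rownames(x) %in% remove, !colnames(x) %in% remove]
# Keep only the z-scores for the row of interest (this becomes a named vector).
z <- z[row.to.use, !colnames(z) %in% remove]
print(x)
print(z)

# Axis labels: share of the total squared singular values, per dimension.
# These singular values are invented; in the main code they come from my.ca$sv.
sv <- c(0.4, 0.2, 0.1)
singular.values <- round(sv^2, 6)
variance.explained <- paste(as.character(round(100 * prop.table(singular.values), 1)), "%", sep = "")[c(1, 2)]
column.labels <- paste("Dimension", c(1, 2), paste0("(", variance.explained, ")"))
print(column.labels)  # "Dimension 1 (76.2%)" "Dimension 2 (19%)"
```

Running a toy version like this first is an easy way to check that the names in remove match the row and column labels of your own table.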

Turning off the significance testing

The visualization below is the same as the one above, except that the significance testing has been turned off. This was achieved by:

  • Commenting out line 14 (i.e., typing a # at the very beginning of the line, which prevents that line of code from being run).
  • Removing "Purple", from line 40 and swapping the order of the last two colors (so they read "#3E7DCC", "#FA614B"). This is where you customize the colors. You can type in a color code, or a color name such as "Red" or "Blue".
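
The effect of commenting out line 14 can be sketched on a toy vector of z-scores. The values and the helper make.groups below are hypothetical, written only for illustration; the grouping logic mirrors lines 14 and 23 to 25 of the main code.

```r
z <- c(Safety = 2.5, Cost = -2.1, Food = 0.8)  # made-up z-scores
row.to.use <- "Egypt"

# Hypothetical helper: assigns each column to a group, with or without
# the significance test from line 14.
make.groups <- function(z, apply.significance = TRUE) {
  if (apply.significance) z[abs(z) <= 1.96] <- 0
  groups <- rep("No association", length(z))
  groups[z > 0] <- paste0("Weakness of ", row.to.use)
  groups[z < 0] <- paste0("Strength of ", row.to.use)
  groups
}

with.test <- make.groups(z, apply.significance = TRUE)
without.test <- make.groups(z, apply.significance = FALSE)
print(with.test)     # Food falls into "No association"
print(without.test)  # Food is now treated as a weakness
```

In this toy case, once the test is off no column is left in the "No association" group, which is presumably why one fewer color is needed in line 40.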

Showing only the positive residuals

The next plot shows only the positive residuals (i.e., the concerns about Egypt that have the strongest relationship). It was created by:

  • Removing the three letters abs from line 28.
  • Commenting out line 25.
  • In line 40, replacing #3E7DCC with Purple.
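
A small sketch of why removing abs from line 28 hides the negative residuals, using made-up z-scores. Without abs, negative values stay negative and are then raised to the minimum size by line 33 of the main code, so they are drawn as the smallest possible bubble.

```r
z <- c(Safety = 2.5, Cost = -2.1, Food = 0)  # hypothetical z-scores
default.size <- 0.1
n.rows <- 2  # number of row (country) points, also hypothetical

# Original line 28: bubble size is the absolute z-score.
size.abs <- c(rep(default.size, n.rows), abs(z))
size.abs[size.abs < default.size] <- default.size

# Line 28 with abs removed: negative z-scores collapse to the minimum size.
size.pos <- c(rep(default.size, n.rows), z)
size.pos[size.pos < default.size] <- default.size

print(unname(size.abs))  # 0.1 0.1 2.5 2.1 0.1
print(unname(size.pos))  # 0.1 0.1 2.5 0.1 0.1
```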


Taking the data values off the chart

Lastly, to remove the percentages from the visualization, comment out line 12, which leaves us with the visualization below.
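
For reference, this is roughly what line 12 does to the column names, shown on a made-up two-by-two table. Commenting the line out simply leaves the original names in place.

```r
# Hypothetical percentages, not the data from the post.
x <- matrix(c(10.4, 20.6,
              40.0, 25.0),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("Egypt", "Mexico"), c("Safety", "Cost")))
row.to.use <- "Egypt"

plain.labels <- colnames(x)
# Same expression as line 12: append the rounded percentage for the chosen row.
percent.labels <- paste0(colnames(x), ": ", round(x[row.to.use, ]), "%")

print(plain.labels)    # "Safety" "Cost"
print(percent.labels)  # "Safety: 10%" "Cost: 21%"
```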

More advanced customizations

If you hover your mouse over the word LabeledScatter in Properties > R CODE (line 34), a tooltip shows all the definitions of the parameters in this function, which allow further customization to be performed.
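
If you prefer to see the parameter names as text, one possible approach in plain R is to list a function's formal arguments. The helper list.arguments is a hypothetical name used only for this sketch, and the call on LabeledScatter only runs if the rhtmlLabeledScatter package happens to be installed.

```r
# Hypothetical helper: returns the argument names of any R function.
list.arguments <- function(fn) names(formals(fn))

# Works on any ordinary R function, for example base R's paste0.
print(list.arguments(paste0))

# Only attempted when the package is available in the current R session.
if (requireNamespace("rhtmlLabeledScatter", quietly = TRUE)) {
  print(list.arguments(rhtmlLabeledScatter::LabeledScatter))
}
```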


Author: Tim Bock

Tim Bock is the founder of Displayr. Tim is a data scientist, who has consulted, published academic papers, and won awards, for problems/techniques as diverse as neural networks, mixture models, data fusion, market segmentation, IPO pricing, small sample research, and data visualization. He has conducted data science projects for numerous companies, including Pfizer, Coca Cola, ACNielsen, KFC, Weight Watchers, Unilever, and Nestle. He is also the founder of Q www.qresearchsoftware.com, a data science product designed for survey research, which is used by all the world’s seven largest market research consultancies. He studied econometrics, maths, and marketing, and has a University Medal and PhD from the University of New South Wales (Australia’s leading research university), where he was an adjunct member of staff for 15 years.
