# Focusing the Results of Correspondence Analysis in Q

**Correspondence analysis is often used to visualize a table of data. The goal is to represent as much information as possible, as accurately as possible. However, there may be circumstances when we are interested in a specific row of the table (usually a brand). Such a point may not be represented well in the standard scatterplot output. In this post we explain a new technique developed by Q for rotating the results of correspondence analysis to focus on a specific point.**

Correspondence analysis outputs consist of coordinates (usually plotted on a scatterplot) that explain the most variation across all of the brands. When we are interested in a specific brand, it can be useful to apply a focused rotation, described below. This is a novel technique that we developed, described in the paper A brand's eye view of correspondence analysis, published in the International Journal of Market Research.

## Start your engines

The data we are using describes the characteristics that people associate with cars. The input table below is labelled by 14 car brands along the rows. The columns are labelled by characteristics. Each cell indicates the strength of association between a characteristic and a car.

The chart below shows the correspondence analysis resulting from this data. In Q it is created from **Create > Dimension Reduction > Correspondence Analysis of a Table**. The data is plotted with normalization of principal coordinates. This means that we can compare distances between column labels and distances between row labels, but not the distance between a row label and a column label. See this post for a more in-depth discussion about normalization and interpretation of correspondence analysis.

The dimensions output by correspondence analysis are in decreasing order of variance explained. This means that later dimensions explain smaller portions of the variance. The chart shows only the first two dimensions, which, for this example, capture only 53.4% of the variance. So the hidden dimensions contain a substantial amount of information. Importantly, from the plot alone we cannot tell how much information about any given point (brand) is retained.
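To make this concrete, here is a minimal sketch of correspondence analysis in Python with NumPy, using a small hypothetical table rather than the car data (the post's own analysis was done in R with the flipDimensionReduction and ca packages). The dimensions emerge from a singular value decomposition already sorted by the share of variance (inertia) they explain:

```python
import numpy as np

# Toy brand-by-attribute table (hypothetical counts, not the car data).
N = np.array([[10.0, 4.0,  2.0],
              [ 3.0, 9.0,  5.0],
              [ 1.0, 6.0, 12.0]])

P = N / N.sum()        # correspondence matrix
r = P.sum(axis=1)      # row masses
c = P.sum(axis=0)      # column masses

# Standardised residuals from the independence model, then SVD.
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sv, Vt = np.linalg.svd(S, full_matrices=False)

# Principal coordinates for rows and columns.
row_pc = (U / np.sqrt(r)[:, None]) * sv
col_pc = (Vt.T / np.sqrt(c)[:, None]) * sv

# Each dimension's share of total inertia, in decreasing order;
# a 2D plot shows only the first two shares' worth of information.
inertia = sv ** 2
shares = inertia / inertia.sum()
print(shares)
```

Because the independence model is subtracted before the SVD, the trivial dimension carries (near) zero inertia, so an r-by-c table yields at most min(r, c) - 1 meaningful dimensions.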

## Our first car

As an example, *Mini Cooper* is relatively close to the origin. This could be because it is poorly represented by the two plotted dimensions. Or it could genuinely be the case that *Mini Cooper* is close to the origin in all dimensions.

If we were performing this analysis to find the relationship of *Mini Cooper* to the other cars and characteristics, we could not draw any strong conclusions from this plot. The best we could say is that in the first two dimensions alone, there is little to discriminate it.

## Quality of the map

We can create a table showing how much variance is represented in each dimension with **Create > Dimension Reduction > Diagnostic > Quality**. The resulting table (below) shows, before the row label of each car, the share of that car's variance captured by the first two dimensions. Since *Mini Cooper* has only 16%, we can now say that the plot above hides much of the information for this brand.
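The quality measure in that table can be computed directly from the principal coordinates: for each row it is the fraction of that row's squared coordinates (its inertia) that falls in the first two dimensions. A hedged sketch, with a function name and toy coordinates of our own invention rather than Q's implementation:

```python
import numpy as np

def point_quality(principal_coords, k=2):
    """Fraction of each point's inertia captured by the first k dimensions.

    A point with low quality is largely hidden by a k-dimensional plot;
    a point with high quality is shown faithfully.
    """
    sq = np.asarray(principal_coords, dtype=float) ** 2
    return sq[:, :k].sum(axis=1) / sq.sum(axis=1)

# Hypothetical principal coordinates for three rows in three dimensions.
coords = np.array([[0.50,  0.10, 0.02],
                   [0.05,  0.04, 0.60],   # mostly hidden in the first two dims
                   [0.30, -0.20, 0.01]])
quality = point_quality(coords)
```

The second row sits near the origin of the first two dimensions yet carries most of its inertia in the third, exactly the *Mini Cooper* situation described above.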

## Making a sharp turn

In order to find out more about the Mini, we rotate the results so that all of its variance is in the first dimension. This means that there is no longer any hidden information about this point. We shift the focus of the output onto *Mini Cooper*.

In Q, this is done by entering *Mini Cooper* in the box labeled *Focus row or column*. The effect of the rotation is shown below.
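One simple way to achieve such a focus, shown here as a sketch of the general idea rather than Q's exact implementation, is to apply a Householder reflection that maps the focus row's coordinate vector onto the first axis. Because the map is orthogonal, all pairwise distances between points are preserved; only the axes change:

```python
import numpy as np

def focus_rotation(coords, focus_idx):
    """Orthogonally transform coordinates so one chosen row lies on dim 1.

    Uses a Householder reflection; being orthogonal, it leaves every
    pairwise distance between points unchanged.
    """
    coords = np.asarray(coords, dtype=float)
    x = coords[focus_idx]
    target = np.zeros_like(x)
    target[0] = np.linalg.norm(x)    # same length, all on the first axis
    v = x - target
    if np.allclose(v, 0):
        return coords.copy()         # already on the first axis
    v /= np.linalg.norm(v)
    H = np.eye(len(x)) - 2.0 * np.outer(v, v)
    return coords @ H                # H is symmetric, so H.T == H

# Hypothetical 5-dimensional row coordinates; focus on row 0.
rows = np.array([[ 0.2,  0.3, -0.1,  0.05,  0.4],
                 [ 0.1, -0.2,  0.3,  0.00, -0.1],
                 [-0.3,  0.1,  0.2, -0.20,  0.0]])
rotated = focus_rotation(rows, focus_idx=0)
```

After the transformation the focus row's coordinates beyond the first dimension are zero, so nothing about that point is hidden by a plot of the first dimensions.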

In this case, correspondence analysis produces embeddings in five-dimensional space. If you find this difficult to visualize, join the club. What matters here is that there is no longer any hidden information about *Mini Cooper*. We can now see that it is more related to *Fiat 500* than to the other cars. This makes intuitive sense, as they are both small cars. We have gained insight by focusing on what differentiates *Mini Cooper* from the other cars.

However, note that the chart as a whole explains 46.3% of the variance, in contrast to 53.4% in the first chart. The price we pay for the rotation is that the first two dimensions no longer contain as much variance as possible about all of the data. It is no longer the best general representation of all the points.

## Buying a new car

As another example, let’s rotate to focus on the *VW Golf*. Notice how the plot below is very similar to the original, except rotated on the page.

This rotation is easier to visualize. We have turned the page clockwise by about 135 degrees, and the relationship between *VW Golf* and the other cars has been closely maintained. The total variance explained has dropped by only 0.1% from the original plot. All of this tells us that *VW Golf* was well represented originally. This confirms the 99% variance in the first two dimensions from the quality table above.

## Try it out

The analysis in this post was performed in Q. The flipDimensionReduction package (available on GitHub), which uses the ca package, performed the correspondence analysis. Click here to download the QPack with all the analysis in this post.

The car data is from a latent feature analysis performed in Meulders, M. (2013). An R Package for Probabilistic Latent Feature Analysis of Two-Way Two-Mode Frequencies. Journal of Statistical Software, 54(14), 1-29. This analysis uses data from Van Gysel, E. (2011). Perceptuele analyse van automodellen met probabilistische feature modellen [translation from Dutch: Perceptual analysis of car models with probabilistic feature models]. Master thesis. Hogeschool-Universiteit Brussel.

#### Author: Jake Hoare

After escaping from physics to a career in banking, then escaping from banking, I decided to go back to BASIC and study computing. This led me to rediscover artificial intelligence and data science. I now get to indulge myself at Displayr working in the Data Science team, sometimes on machine learning.