The Efficient Algorithm For Choice Model Experimental Designs
In this blog post, I describe the Efficient algorithm for generating choice model designs. This algorithm is used for generating choice model designs with partial profiles, with no constant attributes specified.
Efficient algorithm differs from other algorithms in that is also accepts priors. Priors may be specified as fixed values which correspond to utilities of attribute levels (parameters of the choice model), or as a multivariate normal distribution (in terms of means and standard deviations). Priors are supposed to reflect the utilities of the respondents who will be given the survey: a high utility for an attribute level indicates a stronger preference and vice versa. A distribution of priors is preferred over fixed priors when there is variation or uncertainty in respondents utilities. When a prior is supplied, the design will be created to be optimal for respondents with utilities matching the prior.
An example of a prior specification is shown in the table below. Due to the coding of attributes, the means and standard deviations of the first level of each attribute are set to zero. The mean for 5 star safety is greater than the mean for 4 stars, which in turn is greater than the mean for 3 stars, indicating a preference for safer cars. Similarly, higher car prices correspond to lower means, as expected.
The Efficient algorithm is just the Partial Profiles algorithm with zero constant attributes specified, but this significantly simplifies the method (the number of steps is half as many), and also results in faster computation times compared to Partial Profiles with a nonzero number of constant attributes. The algorithm essentially works around minimizing D-error. Priors are incorporated through D-error, as explained in this article.
The idea behind the algorithm is simple: start with a random design, and cycle sequentially through each element in the design (identified by a question, alternative and attribute), finding out which level for the element minimizes D-error the most, and then updating the element with the optimal level. At the end of the cycle, if the D-error is improved, the cycle is repeated, until no more improvements are made.
This is a slight oversimplification of the algorithm, as there are actually inner (alternatives and attributes) and outer (questions) cycles that are both subject to this repeating rule. The steps below provide a more rigorous description of the algorithm:
1. Randomly generate a design (this is the only source of randomness, the rest of the algorithm is deterministic).
2. Compute the D-error of the the design.
3. Begin loop over questions, with the current question denoted by s.
4. Compute the D-error of the the design.
5. Begin loop over alternatives, with the current alternative denoted by j.
6. Begin loop over attributes, with the current attribute denoted by f.
7. Loop through each level of attribute f to find the level l* that minimizes D-error when it is inserted into the design at question s, alternative j, attribute f.
8. Update the design with l* at question s, alternative j, attribute f.
9. Continue loop over attributes f (step 6) until all attributes have been covered.
10. Continue loop over alternatives j (step 5) until all alternatives have been covered.
11. If the D-error of the design is better than the D-error at step 4, go back to step 5.
10. Continue loop over questions s (step 3) until all questions have been covered.
13. If the D-error of the design is better than the D-error at step 2, go back to step 3, otherwise, the algorithm is done!
This algorithm produces a design that is locally-optimal in terms of D-error, and although it is not necessarily the best possible design for the specifications, it has been shown in the original research paper (see below) to be very close to optimal.
Readers seeking the original source of this algorithm should refer to the paper(( D. P. Cuervo, R. Kessels, P. Goos and K. Sörensen (2016). An integrated algorithm for the optimal design of stated choice experiments with partial profiles. Transportation Research Part B 93.)) listed in the references below. The algorithm is presented in pseudocode and is known as the integrated algorithm by the authors.