[R-bloggers] How to use Hierarchical Bayes for Choice Modeling In R (and 4 more aRticles)

How to use Hierarchical Bayes for Choice Modeling In R

Posted: 31 Jul 2018 05:00 AM PDT

(This article was first published on R – Displayr, and kindly contributed to R-bloggers)

There are multiple add-on packages available in R to fit choice models in a Bayesian framework. These include bayesm, choiceModelR, and flipChoice. In this article, I will focus on the use of flipChoice, which relies on the state-of-the-art probabilistic programming language stan to fit the choice model.

Getting Started

I assume you are already familiar with choice models, so I'll focus solely on their estimation using flipChoice. You can install flipChoice from GitHub using the devtools package, but you may need to first install devtools using the command install.packages("devtools") in R. You can then install flipChoice using devtools::install_github("Displayr/flipChoice"), as shown below. The main function that handles the fitting of choice models in the package is FitChoiceModel.
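For reference, the two installation commands from the paragraph above in one place:

## Install devtools (if you don't already have it), then flipChoice from GitHub
install.packages("devtools")
devtools::install_github("Displayr/flipChoice")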

FitChoiceModel offers great flexibility in the types of inputs it accepts.  Data can be in any of five different formats, including Sawtooth CHO file format, Q experiment question format, a ChoiceModelDesign output from flipChoice::ChoiceModelDesign, Sawtooth Dual file format, and JMP file format.

Inputs/Function Arguments

I'll key you in on some of the arguments of the FitChoiceModel function that you should be aware of when fitting your own choice model.

  • design – use this argument if you have a ChoiceModelDesign output from flipChoice
  • choices – for use with design, a data.frame with the choices made by each respondent (one column per question)
  • questions – for use with design, a data.frame with the IDs of the tasks presented to each respondent (with one column per question and one row for each respondent)
  • data – use if you have a data.frame in Q experiment question format (example below)
  • cho.file – A text string giving the path to a CHO file
  • respondent.ids – for use with cho.file, a vector of respondent IDs
  • design.file – A text string giving the path (possibly a URL) to a Sawtooth or JMP design file
  • attribute.levels.file – A text string giving the path to a file containing the names of the levels of each attribute, for use with cho.file or design.file
  • hb.iterations, hb.chains, hb.max.tree.depth, hb.adapt.delta – parameters controlling the stan Monte Carlo algorithm. The function tries to provide sensible defaults for these, but they should be adjusted if there are any warnings about sampling problems or the diagnostic functions (mentioned below) indicate any convergence issues.
  • classes – For multi-class HB, the number of latent classes. The default is to only have one class.
  • tasks.left.out – If you wish to perform cross validation to check the predictive accuracy of your model, this specifies the number of questions (tasks) to leave out.
  • cov.formula, cov.data – can be used to specify respondent-specific covariates (see here)
  • hb.prior.mean, hb.prior.sd – Optional vectors specifying the prior means and standard deviations of the mean parameters

Additional parameters are discussed in the documentation; type ?flipChoice::FitChoiceModel at the R prompt to view it. The function also accepts arguments which will be passed on to rstan::sampling and rstan::stan.
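As an illustration, here is a minimal sketch of a call using the design, choices and questions arguments. Note that my.design, my.choices and my.questions are hypothetical placeholder objects, not data shipped with the package:

library(flipChoice)

## Hypothetical inputs: my.design is a ChoiceModelDesign from flipChoice::ChoiceModelDesign,
## my.choices and my.questions are data.frames with one column per question
## (the choice made and the task ID shown to each respondent, respectively).
fit <- FitChoiceModel(design = my.design,
                      choices = my.choices,
                      questions = my.questions,
                      hb.iterations = 500, hb.chains = 2)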

Example data

I'll walk you through the rest of this method using data from a discrete choice experiment. This experiment investigated consumer preferences when purchasing a 12-pack of chicken eggs from a supermarket. Each of the 380 respondents was asked eight questions, each requiring a choice between three alternative 12-packs of eggs.

The alternatives varied in their levels of seven attributes:

  • egg weight (55g, 60g, 65g, or 70g)
  • egg quality (caged, barn, or free range)
  • price ($2, $3, $4, $5, or $6)
  • chicken feed used (not stated, grain and fish, or vegetables only)
  • egg uniformity (all eggs the same or some different)
  • if the eggs were organic (not stated or antibiotic and hormone free)
  • whether any proceeds from the sale went to charity (not stated or 10% donated to RSPCA).

I've provided an example question below:

Example question from the egg preference discrete choice experiment.

Fitting your Choice Model

The eggs data is included in the package in multiple formats, so you can experiment and try out the different input formats accepted by FitChoiceModel.

To fit a choice model to the eggs data in CHO file format, we can use the following commands:

library(flipChoice)
cho.file <- system.file("testdata", "Training.cho", package = "flipChoice")
attribute.lvls.file <- system.file("testdata", "Attribute_labels_-_Training.xlsx", package = "flipChoice")
data(cho.ids, package = "flipChoice")  # vector of respondent IDs
fit <- FitChoiceModel(cho.file = cho.file,
                      attribute.levels.file = attribute.lvls.file,
                      respondent.ids = cho.ids,
                      hb.iterations = 500, hb.chains = 2)

The function will issue a warning if it detects any issues with sampling, but we should also check for convergence by examining the effective sample sizes and Rhat values for the mean and standard deviation parameters. These are included in the summary statistics produced by ExtractParameterStats(fit). We can examine trace plots for the same parameters using TracePlots(fit). Estimates of the posterior medians and 95% posterior intervals for these parameters can be plotted using the function PlotPosteriorIntervals. My colleague, Justin, has discussed the diagnostics in more detail (including images) for a very similar model (MaxDiff) here.
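Collected in one place, the diagnostic checks mentioned above look like this:

## Effective sample sizes and Rhat values for the mean and standard deviation parameters
ExtractParameterStats(fit)

## Trace plots for the same parameters (look for well-mixed, stationary chains)
TracePlots(fit)

## Posterior medians and intervals for the parameters
PlotPosteriorIntervals(fit)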

We hope this post has helped you use Hierarchical Bayes to create your choice model. But if you're feeling a bit confused and overwhelmed, it's OK! Feel free to check out our guide on how to use Hierarchical Bayes to create a choice model in Displayr, it's easier, we promise. 

To leave a comment for the author, please follow the link and comment on their blog: R – Displayr.


Judging Freehand Circle Drawing Competitions

Posted: 30 Jul 2018 03:00 PM PDT

(This article was first published on Theory meets practice..., and kindly contributed to R-bloggers)

Abstract:

In 2007 Alexander Overwijk went viral with his 'Perfect Circle' video. The same year a World Freehand Circle Drawing Championship was organized, which he won. In this post we show how a mobile camera, R and the imager package can be used to develop an image analysis based method to judge future instances of the championship.



This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. The markdown+Rknitr source code of this blog is available under a GNU General Public License (GPL v3) license from github.

Introduction

A few years back I watched with awe the 2007 video of Alexander Overwijk freehand-drawing a 1m diameter circle:




Note: Depending on your browser you might need to click "Watch on Youtube" to see the video.

Ever since watching that video I have wondered how one would go about judging the winner of such an alleged World Freehand Circle Drawing Championship (WFHCDC). While researching for this post I finally figured it out. In the story behind the video on his web page, Alexander "reveals":

They have a laser machine called the circleometer that creates the perfect circle closest to the one you drew. The circleometer then calculates the difference in area between the laser circle and the circle that you drew. The machine then calibrates the area difference as if you had drawn a circle with radius one meter. The person with the smallest area difference is declared the world freehand circle drawing champion.

Aha! Imaginary circleometers are expensive and my dean of study most likely isn't open to investing in perfect circle measurement equipment… So here is a cheaper solution involving a mobile device camera, R and the imager package by Simon Barthelmé et al. Altogether a combination of modern data science tools, which my dean of study is most likely to approve! We'll use a screenshot from the perfect circle video as a motivating example to guide us through the three phases of the method:

  1. Image Rectification
  2. Freehand circle identification and perfect circle estimation
  3. Quantifying deviation from the perfect circle

We start by loading the screenshot into R using imager:

library("imager")  file <- "circle2.png"  img <- imager::load.image(file.path(fullFigPath, file))

Image Rectification

The image clearly suffers from perspective distortions caused by the camera being positioned to the right of the circle and, hence, not being orthogonal to the blackboard plane. Furthermore, small lens distortions are also visible – for example, the right vertical line of the blackboard arcs slightly. Since the video contains no details about what sort of lens equipment was used, for the sake of simplicity, we will ignore lens distortions in this post. If such information is available one can use a program such as RawTherapee (available under a GNU GPL v3 license) to read the EXIF information in the meta data of the image and automatically correct for lens distortion.

To rectify the image we estimate the parameters of the 2D projection based on 4 ground control points (GCPs). We use R's locator function to determine the pixel locations of the four corner points of the blackboard in the image, but could just as well use any image analysis program such as Gimp. Furthermore, we need the true object coordinates of these GCPs. Unfortunately, these are only approximately available due to lack of knowledge of the size of the blackboard in the classroom. As a consequence, a guesstimate of the horizontal length is used.

plot(img)
p <- locator(4)
p <- round(cbind(p$x, p$y))
dump(list = c("p"), "")

These points are now used to rectify the image by a Direct Linear Transformation (DLT) based on exactly 4 control points (Hartley and Zisserman 2004, Chapter 4)1. That is, the parameters of the 3×3 transformation matrix \(H\) in homogeneous coordinates are estimated such that \(p' = H p\), see the code on github for details.
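For reference, here is a minimal sketch of what such a 4-point DLT estimate could look like. The function name estimate_homography and the normalization H[3,3] = 1 are my own choices; the author's github code may differ in detail:

## Hypothetical sketch: estimate the 3x3 homography H mapping image points p
## to object points p.prime from exactly 4 correspondences, fixing H[3,3] = 1.
estimate_homography <- function(p, p.prime) {
  A <- matrix(0, nrow = 8, ncol = 8)
  b <- numeric(8)
  for (j in 1:4) {
    x  <- p[j, 1];       y  <- p[j, 2]
    xp <- p.prime[j, 1]; yp <- p.prime[j, 2]
    ## Two linear equations per correspondence (x, y) -> (xp, yp)
    A[2*j - 1, ] <- c(x, y, 1, 0, 0, 0, -xp * x, -xp * y)
    A[2*j,     ] <- c(0, 0, 0, x, y, 1, -yp * x, -yp * y)
    b[2*j - 1]   <- xp
    b[2*j]       <- yp
  }
  h <- solve(A, b)
  matrix(c(h, 1), nrow = 3, byrow = TRUE)
}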

We can implement the rectifying transformation using the imager::warp function:

##Transform image coordinates (x',y') to (x,y), i.e. note we specify
##the back transformation p = H * p', so H here is the inverse.
map.persp.inv <- function(x, y, H) {
  out_image <- H %*% rbind(x, y, 1)
  list(x = out_image[1,]/out_image[3,], y = out_image[2,]/out_image[3,])
}
##Pad dx_blackboard pixels to the right to make space for blackboard
img_padded <- pad(img, nPix = dx_blackboard, axes = "x", pos = 1)
##Warp image
warp <- imwarp(img_padded, map = function(x, y) map.persp.inv(x, y, solve(H)),
               coordinates = "absolute", direction = "backward")

The result looks as follows: Please notice the different x-axes of the two images when comparing them. For faster computation and better visualization in the remainder of this post, we crop the x-axis of the image to the relevant parts of the circle.

warp <- imsub(warp, x %inr% c(dx_blackboard, nrow(warp)))

Freehand circle identification

As described in the imager edge detection tutorial, we use the length of the gradient to determine the edges in the image. This can be done by applying filters to the image.

##Edge detection function. Sigma is the size of the blur window.
detect.edges <- function(im, sigma = 1) {
  isoblur(im, sigma) %>% imgradient("xy") %>% enorm() %>% imsplit("c") %>% add
}
##Edge detection filter sequence.
edges <- detect.edges(warp, 1) %>% sqrt

To detect the circle from this we specify a few seed points for a watershed algorithm with a priority map inversely proportional to the gradient magnitude: a few points outside the circle and a few points inside the circle. Note: a perfect circle would have no border, but when drawing a circle with a piece of chalk it is bound to have a thin border line.
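The mask object used in the next code block is the result of this watershed step, which is not shown in the excerpt above. A minimal sketch of what it might look like follows; the seed coordinates and the exact priority map are placeholders, not the author's values:

## Hypothetical sketch of the watershed step producing 'mask' (used below)
seeds <- imfill(dim = dim(edges))   # empty seed image, same size as 'edges'
seeds[50, 50, 1, 1] <- 1            # a seed pixel outside the circle (placeholder coordinates)
seeds[900, 700, 1, 1] <- 2          # a seed pixel inside the circle (placeholder coordinates)
priority <- 1 / (1 + edges)         # priority map: high where the gradient is small
mask <- watershed(seeds, priority)  # label each pixel 1 (outside) or 2 (inside)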

We can now extract the circle by:

##Just the circle
freehandCircle <- (warp * (mask == 2) > 0) %>% grayscale
##Total area covered by the circle
freehandDisc <- label(freehandCircle, high_connectivity = TRUE) > 0
##Thin border of the circle: the difference between the disc and a dilated copy of it
dilatedDisc <- freehandDisc %>% dilate_rect(sx = 3, sy = 3)
freehandCircleThinBorder <- (freehandDisc - dilatedDisc) != 0

Perfect circle estimation

Once the freehand circle path in the image has been identified, we need to find the best fitting perfect circle matching this path. This problem is elegantly solved by Coope (1993), who formulates it as finding the center and radius of the circle minimizing the squared Euclidean distance to the \(m\) data points \(a_j\), \(j=1,\ldots,m\). Denoting by \(c\) the center of the circle and by \(r>0\) the radius, we want to find the solution of

\[
\min_{c\in \mathbb{R}^2, r>0} \sum_{j=1}^m F_j(c,r)^2, \quad\text{where}\quad F_j(c,r) = \left|r - ||c-a_j||_2\right|,
\]

and \(||x||_2\) denotes the Euclidean norm. Because the curve fitting minimizes the distance between an observed point \(a_j\) and its closest point on the circle, and thus involves both the \(x\) and the \(y\) direction, this is a so-called total least squares problem. The problem is non-linear and can only be solved by iterative numerical methods. However, the dimension of the parameter space can be reduced by one, because given the center \(c\) the optimal radius is \(r(c)=\frac{1}{m} \sum_{j=1}^m ||c-a_j||_2\).

##Compute radius given center
radius_given_center <- function(center, dist = NULL) {
  if (is.null(dist)) {
    a <- as.matrix(where(freehandCircleThinBorder > 0))
    dist <- sqrt((a[,1] - center[1])^2 + (a[,2] - center[2])^2)
  }
  return(mean(dist))
}

##Target function of the total least squares criterion of Coope (1993)
target_tls <- function(theta) {
  ##Extract parameters
  center <- exp(theta[1:2])
  ##Total least squares criterion from Coope (1993)
  a <- as.matrix(where(freehandCircleThinBorder > 0))
  dist <- sqrt((a[,1] - center[1])^2 + (a[,2] - center[2])^2)
  ##Compute radius given center
  radius <- radius_given_center(center, dist)
  F <- abs(radius - dist)
  sum(F^2)
}

##Note: 'background' (used as a starting value for the center) is defined in the
##author's full script, which is not shown in this excerpt.
res_tls <- optim(par = log(c(x = background[1,1], y = background[1,2])), fn = target_tls)
center <- exp(res_tls$par)
fit_tls <- c(center, radius = radius_given_center(center))
fit_tls
##        x        y   radius
## 894.3885 707.3191 518.4119

We illustrate the freehand circle (in black) and the fitted circle (magenta) on top of each other using the alpha channel. You have to study the image carefully to detect differences between the two curves!
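The author produces this overlay using the alpha channel of the images; a simpler base-graphics sketch that gives a similar picture (the transparency value and line width are my own choices) is:

## Plot the freehand circle and overlay the fitted circle in semi-transparent magenta
plot(freehandCircleThinBorder)
theta <- seq(0, 2 * pi, length.out = 720)
lines(fit_tls["x"] + fit_tls["radius"] * cos(theta),
      fit_tls["y"] + fit_tls["radius"] * sin(theta),
      col = rgb(1, 0, 1, alpha = 0.5), lwd = 2)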

Quantifying the circularness of the freehand circle

We quantify the circularness of the freehand circle by contrasting the area covered by it with the area of the fitted perfect circle. The closer this ratio is to 1, the more perfect the freehand circle.

##Area of the freehand drawn disc
areaFreehandDisc <- sum(freehandDisc)

##Area of the disc corresponding to the idealized circle fitted
##to the freehand circle
areaIdealDisc <- pi * fit_tls["radius"]^2

##Ratio between the two areas
ratio_area <- as.numeric(areaFreehandDisc / areaIdealDisc)
ratio_area
## [1] 0.9971778

Yup, it's a pretty perfect circle! Note also that the calibration is in pixel units relative to the area of the fitted circle, and not to a circle with a diameter of 1m as described in the above text about the "circleometer". Since the fitted circle already takes the desired shape into account, my intuition is that this ratio is a pretty good way to quantify circularness. However, to avoid measure hacking, we use the circleometer approach as a backup measure: for each point on the freehand circle we measure its distance to the fitted perfect circle and integrate/sum this up over the path of the freehand circle. We can approximate this integration using image pixels as follows.

##Create a pixel based circle in an image of the same size as the
##freehandCircle img. For visibility we use a border of 'border' pixels
##s.t. circle goes [radius - border/2, radius + border/2].
Circle <- function(center, radius, border) {
  as.cimg(function(x, y) {
    lhs <- (x - center[1])^2 + (y - center[2])^2
    return((lhs >= (radius - border/2)^2) & (lhs <= (radius + border/2)^2))
  }, dim = dim(freehandCircle))
}

##Build pixel circle based on the fitted parameters
C_tls <- Circle(fit_tls[1:2], fit_tls[3], border = 1)
##Calculate Euclidean distance to circle for each pixel in the image
dist <- distance_transform(C_tls, value = 1, metric = 2)
##Distance between outer border of freehand circle and perfect circle
area_difference <- sum(dist[freehandCircleThinBorder > 0])

##Compute area difference and scale it by the area of the fitted disc
ratio_areadifference <- as.numeric(area_difference / areaIdealDisc)

The image below illustrates this by overlaying the result on top of the distance map. For better visualization we zoom in on the 270-300 degree part of the circle (i.e. the bottom right). In magenta is the fitted perfect circle, in gray the freehand circle and the area between the two paths is summed up over the entire path of the freehand circle:

We obtain ratio_areadifference = 0.01735. Thus this measure, too, tells us: it's a pretty perfect circle! To summarise: the output on the display of the judge's Circle-O-Meter App (available under a GPL v3 license) at the World Freehand Circle Drawing Championship would be as follows: 😉

Discussion

We took elements of computer vision, image analysis and total least squares to segment a chalk-drawn circle on a blackboard and provided measures of its circularness. Since we did not have direct access to the measurements of the blackboard in object space, a little guesstimation was necessary; nevertheless, the results show that it was a pretty circular freehand circle!

With the machinery in place for judging freehand circles, it's time to send out the call for contributions to the 2nd World Freehand Circle Drawing Championship (online edition!). Stay tuned for the call: participants would upload their photo plus minor modifications of a general analysis R-script computing the two area-ratio measures, and submit their contribution by a pull request to the github WFHCDC repository. You can spend the anxious waiting time practicing your freehand 1m diameter circles – it's a good way to loosen up long & unproductive meetings!

Appendix

If, instead of the total least squares criterion involving \(F_j(c,r)\) mentioned in the text, we solve the related criterion \[
\sum_{j=1}^m f_j(c,r)^2, \quad\text{where}\quad f_j(c,r) =
||c-a_j||_2^2 - r^2,
\]
then a much simpler solution emerges. Coope (1993) explains that this alternative criterion geometrically corresponds to minimizing the product

\[
\text{(distance to the closest point on the circle)}\times \text{(distance to
the furthest point on the circle)}
\]

over the measurement points. In order to obtain the solution, write the residuals \(f_j\) as \(f_j(c,r) = c^T c - 2 c^T a_j + a_j^T a_j - r^2\) and perform a change of variables from \((c_1, c_2, r)'\) to \[
y =
\left[
\begin{matrix}
2 c_1 \\
2 c_2 \\
r^2 - c^T c \\
\end{matrix}
\right]
\quad \text{and let} \quad
b_j =
\left[
\begin{matrix}
a_{j1} \\
a_{j2} \\
1
\end{matrix}
\right].
\]
The minimization problem then becomes \[
\min_{y \in \mathbb{R}^3} \sum_{j=1}^m \left( a_j^T a_j - b_j^T y \right)^2,
\]
which can be written as a linear least squares (LLS) problem \[
\min_y ||By - d||_2^2,
\]
where \(B\) is the \(m\times 3\) matrix with the \(b_j^T\) as rows and \(d\) is the \(m\)-vector with entries \(d_j = a_j^T a_j\). This expression is then easily solved using the standard least squares machinery.

##Fast linear least squares problem as described in Coope (1993)
fitCircle_lls <- function(freehandCircle) {
  a <- as.matrix(where(freehandCircle > 0))
  B <- cbind(a, 1)
  d <- a[,1]^2 + a[,2]^2
  y <- solve(t(B) %*% B) %*% t(B) %*% d
  x <- 1/2 * y[1:2]
  r <- as.numeric(sqrt(y[3] + t(x) %*% x))
  return(c(x = x[1], y = x[2], radius = r))
}

##Fit using the linear least squares procedure of Coope (1993)
fit_lls <- fitCircle_lls(freehandCircleThinBorder)

##Compare TLS and LLS fit
rbind(lls = fit_lls, tls = fit_tls)
##            x        y   radius
## lls 894.3666 707.3901 518.4295
## tls 894.3885 707.3191 518.4119

In other words: the results are nearly identical.

Literature

Coope, I.D. 1993. "Circle Fitting by Linear and Nonlinear Least Squares." Journal of Optimization Theory and Applications 76 (2): 381–88. doi:10.1007/BF00939613.

Hartley, R., and A. Zisserman. 2004. Multiple View Geometry in Computer Vision. 2nd ed. Cambridge University Press. http://www.robots.ox.ac.uk/~vgg/hzbook/.


  1. Alternatively, see slide 18 and onward in https://ags.cs.uni-kl.de/fileadmin/inf_ags/3dcv-ws11-12/3DCV_WS11-12_lec04.pdf↩

To leave a comment for the author, please follow the link and comment on their blog: Theory meets practice....


New R package debugr – use automatic debug messages to improve your code

Posted: 30 Jul 2018 02:41 PM PDT

(This article was first published on Topics in R, and kindly contributed to R-bloggers)

debugr is a new package designed to support debugging in R. It mainly provides the dwatch() function, which prints debug output to the console or to a file. Debug output can consist of a static text message, the values of one or more objects (potentially transformed by applying some functions), or the values of one or more (possibly complex) R expressions. Whether or not a debug message is displayed can be made dependent on the evaluation of a criterion phrased as an R expression. Generally, debug messages are only shown if the debug mode is activated. The debug mode is activated and deactivated with debugr_switchOn() and debugr_switchOff(), respectively, which change the logical debugr.active value in the global options. Since debug messages are only displayed in debug mode, the dwatch() function calls can even remain in the original code, as they stay silent and won't have any effect until the debug mode is switched on again. Read more:
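A minimal sketch of the workflow described above; note that the argument names msg and objs are my reading of the package description and should be checked against the dwatch() documentation:

library(debugr)

myfunc <- function(x) {
  for (i in seq_len(x)) {
    ## Hypothetical call: 'msg' (static text) and 'objs' (names of objects to print)
    ## are assumed argument names, see ?dwatch for the actual interface.
    dwatch(msg = "inside the loop", objs = c("i"))
  }
  invisible(x)
}

myfunc(3)            # debug mode off: the dwatch() call stays silent
debugr_switchOn()    # activate debug mode (sets debugr.active in the global options)
myfunc(3)            # now the debug messages are printed
debugr_switchOff()   # deactivate again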

To leave a comment for the author, please follow the link and comment on their blog: Topics in R.


A Certification for R Package Quality

Posted: 30 Jul 2018 09:35 AM PDT

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

There are more than 12,000 packages for R available on CRAN, and many others available on Github and elsewhere. But how can you be sure that a given R package follows best development practices for high-quality, secure software?

Based on a recent survey of R users about the challenges of selecting R packages, the R Consortium now recommends a way for package authors to self-validate that their package follows best practices for development. The CII Best Practices Badge Program, developed by the Linux Foundation's Core Infrastructure Initiative, defines a set of criteria that open-source software projects should follow for quality and security. The criteria relate to open-source license standards, documentation, secure development and delivery practices, version control and bug reporting practices, build and test processes, and much more. R packages that meet these standards send a signal to R users that the package can be relied upon, and as the standards document notes:

There is no set of practices that can guarantee that software will never have defects or vulnerabilities; even formal methods can fail if the specifications or assumptions are wrong. Nor is there any set of practices that can guarantee that a project will sustain a healthy and well-functioning development community. However, following best practices can help improve the results of projects. For example, some practices enable multi-person review before release, which can both help find otherwise hard-to-find technical vulnerabilities and help build trust and a desire for repeated interaction among developers from different organizations.

Developers can self-validate their packages (free of charge) using an online tool to check their adherence to the standards and get a "passing", "silver" or "gold" rating. Passing projects are eligible to display the badge (shown above) as a sign of quality and security, and almost 200 projects qualify as of this writing. A number of R packages have already gone through or started the certification process, including ggplot2, the R Consortium projects covr and DBI, and the built-in R package Matrix. For more details on how R packages can achieve a passing certification, and the R Consortium survey that led to the recommendation, see the R Consortium blog post at the link below.

R Consortium: Should R Consortium Recommend CII Best Practices Badge for R Packages: Latest Survey Results

 

To leave a comment for the author, please follow the link and comment on their blog: Revolutions.


R Consortium Proposal Accepted!

Posted: 30 Jul 2018 06:00 AM PDT

(This article was first published on R – AriLamstein.com, and kindly contributed to R-bloggers)

Today I am happy to announce that my proposal to the R Consortium was accepted!

I first announced that I was submitting a proposal back in March. As a reminder, the proposal has two distinct parts:

  1. Creating a guide to working with US Census data in R.
  2. Creating an R Consortium Working Group focused on US Census Data.

If you'd like to read the proposal in its entirety, you can do so here.

I am currently planning to develop the "Census Guide" in public using github. If you'd like to follow along with the development, you can do so by visiting the github repository and clicking the "watch" button:

View the Github Repository 

The post R Consortium Proposal Accepted! appeared first on AriLamstein.com.

To leave a comment for the author, please follow the link and comment on their blog: R – AriLamstein.com.

