[R-bloggers] Why R? 2020 (Remote) Call for Papers Extended (and 4 more aRticles)
- Why R? 2020 (Remote) Call for Papers Extended
- fairmodels: let’s fight with biased Machine Learning models (part 1 — detection)
- Spatial GLMM(s) using the INLA Approximation
- rfm 0.2.2
- FNN-VAE for noisy time series forecasting
Why R? 2020 (Remote) Call for Papers Extended Posted: 31 Jul 2020 09:00 AM PDT
[This article was first published on http://r-addict.com, and kindly contributed to R-bloggers].

We decided to give you one more week to submit a talk or a workshop to the Call for Papers for the 2020.whyr.pl remote conference. If you are interested in active participation, please fill in the form at 2020.whyr.pl/submit/. The new deadline for submissions is 2020-08-07 23:59 CEST (UTC+2)! Looking forward to your submissions!

As the meeting is held in English, we invite R users from all over the globe! We will stream the conference on youtube.com/WhyRFoundation. The channel already contains past R webinars and keynote talks from previous conferences.
fairmodels: let’s fight with biased Machine Learning models (part 1 — detection) Posted: 31 Jul 2020 08:36 AM PDT
[This article was first published on Stories by Przemyslaw Biecek on Medium, and kindly contributed to R-bloggers].

fairmodels: let's fight with biased Machine Learning models (part 1 — detection)

Author: Jakub Wiśniewski

TL;DR

The fairmodels R package facilitates bias detection through model visualizations. It implements a few mitigation strategies that can reduce bias. It enables easy-to-use checks of fairness metrics and comparisons between different machine learning (ML) models.

Longer version

Fairness in ML is a quickly emerging field. Big companies like IBM and Google have already developed some tools (see AIF360), with a growing community of users. Unfortunately, there aren't many tools for discovering bias and discrimination in machine learning models created in R, so checking the fairness of a classifier created in R can be a difficult task. This is why the R package fairmodels was created.

Introduction to fairness concepts

What does it mean for a model to be fair? Imagine we have a classification model whose decisions would have some impact on a human. For example, the model must decide whether some individuals will get a loan or not. What we don't want is for our model's predictions to be based on sensitive (later called protected) attributes such as sex, race, nationality, etc., because that could potentially harm some unprivileged groups of people. However, not using such variables might not be enough, because the correlations are usually hidden deep inside the data. That is what fairness in ML is for: it checks whether privileged and unprivileged groups are treated similarly and, if not, it offers some bias mitigation techniques.

There are numerous fairness metrics, such as Statistical Parity, Equalized Odds, Equal Opportunity, and more. They check whether model properties on privileged and unprivileged groups are the same.

Many of these metrics can be derived from the confusion matrix. For example, Equal Opportunity ensures an equal TPR (True Positive Rate) among subgroups of the protected variable. However, knowing these rates alone is not the essential information for us. We would like to know whether the difference between these rates for the privileged group and the unprivileged ones is significant. Let's say that the acceptable difference in fairness metrics is 0.1. We will call this epsilon. The TPR criterion for this metric would be:

\(|TPR_{privileged} - TPR_{unprivileged}| < \epsilon\)

Such a criterion is double-sided. It also ensures that there is not much difference in favour of the unprivileged group.

fairmodels as a bias-detection tool

fairmodels is an R package for discovering, eliminating, and visualizing bias. Its main function, fairness_check(), enables the user to quickly check if popular fairness metrics are satisfied. fairness_check() returns an object called fairness_object, which wraps models together with metrics in a useful structure. To create this object we need to provide, among other things, a DALEX explainer of our model, the protected variable, and the privileged group.
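Returning to the double-sided epsilon criterion above, here is a minimal sketch in R (the group rates are hypothetical, not taken from the article):

```r
# hypothetical true positive rates for two subgroups of a protected variable
tpr_privileged   <- 0.81
tpr_unprivileged <- 0.74
epsilon          <- 0.1

# double-sided check: neither group's TPR may exceed the other's by more than epsilon
abs(tpr_privileged - tpr_unprivileged) < epsilon
#> [1] TRUE
```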
So let's see how it works in practice. We will fit a logistic regression model on the German credit data, predicting whether a given customer is a good or bad credit risk (the Risk variable used in the code below). Sex will be used as the protected variable.
1. Create a model

```r
library(fairmodels)
```

2. Create an explainer

```r
library(DALEX)
```

3. Use fairness_check(). Here the epsilon value is set to its default, which is 0.1.

```r
# the call was truncated in this digest; the protected and privileged
# arguments below are a hedged reconstruction based on the fairness_check() API
fobject <- fairness_check(explainer_lm,
                          protected  = german$Sex,
                          privileged = "male")
```

Now we can check the level of bias with the print and plot methods (their output appears as images in the original post).

As we can see, checking fairness is not difficult. What is more complicated is comparing the discrimination between models. But even this can easily be done with fairmodels!

fairmodels is flexible

When we have many models, they can be passed into one fairness_check() together. Moreover, an iterative approach is possible: as we explain a model and it does not satisfy the fairness criteria, we can add other models, along with the fairness_object, to fairness_check(). That way, even the same model with different parameters and/or trained on different data can be compared with the previous one(s).

```r
library(ranger)
rf_model <- ranger(Risk ~ ., data = german, probability = TRUE)
```

That's it. The ranger model passes our fairness criteria (epsilon = 0.1) and is therefore fair (see the plots in the original post; a sketch of this comparison step follows below).

Summary

fairmodels is a flexible and easy-to-use tool for asserting that an ML model is fair. It can handle multiple models, trained on different data, no matter whether the data was encoded or the features standardized, etc. It facilitates the bias detection process in multiple models while allowing those models to be compared with each other.

Learn more
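A hedged sketch of that comparison step: wrap the ranger model in a DALEX explainer and pass it to fairness_check() together with the earlier fairness_object. The explain() arguments are standard DALEX usage, not verbatim from the post, and assume german$Risk has levels "good"/"bad":

```r
library(DALEX)

# wrap the random forest in an explainer; y must be the numeric target
explainer_rf <- explain(rf_model,
                        data  = german[, names(german) != "Risk"],
                        y     = as.numeric(german$Risk == "good"),
                        label = "ranger")

# the protected variable and privileged level are inherited from fobject
fobject2 <- fairness_check(explainer_rf, fobject)
plot(fobject2)
```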
Spatial GLMM(s) using the INLA Approximation Posted: 30 Jul 2020 05:00 PM PDT
[This article was first published on Corey Sparks R blog, and kindly contributed to R-bloggers].

The INLA Approach to Bayesian models

The Integrated Nested Laplace Approximation, or INLA, approach is a recently developed, computationally simpler method for fitting Bayesian models (Rue et al., 2009), compared to traditional Markov Chain Monte Carlo (MCMC) approaches. INLA fits models that are classified as latent Gaussian models, which are applicable in many settings (Martino & Rue, 2010). In general, INLA fits a general form of additive model such as:

\(\eta = \alpha + \sum_{j=1}^{n_f} f^{(j)}(u_{ij}) + \sum_{k=1}^{n_\beta}\beta_k z_{ki} + \epsilon_i\)

where \(\eta\) is the linear predictor for a generalized linear model formula, and is composed of a linear function of some variables u, the \(\beta\) are the effects of covariates z, and \(\epsilon\) is an unstructured residual (Rue et al., 2009). As this model is often parameterized as a Bayesian one, we are interested in the posterior marginal distributions of all the model parameters. Rue and Martino (2007) show that the posterior marginal for the random effects (x) in such models can be approximated as:

\(\tilde{p}(x_i|y) = \sum_k \tilde{p}(x_i|\theta_k, y) \tilde{p}(\theta_k|y) \Delta_k\)

via numerical integration (Rue & Martino, 2007; Schrodle & Held, 2011a, 2011b). The posterior distribution of the hyperparameters (\(\theta\)) of the model can also be approximated as:

\(\tilde{p}(\theta | y) \propto \left. \frac{p(x, \theta, y)}{\tilde{p}_G(x|\theta, y)} \right|_{x = x^*(\theta)}\)

where G is a Gaussian approximation of the posterior and \(x^*(\theta)\) is the mode of the conditional distribution \(p(x|\theta,y)\). Thus, instead of using MCMC to find an iterative, sampling-based estimate of the posterior, it is arrived at numerically. This method of fitting the spatial models specified above has been presented by numerous authors (Blangiardo & Cameletti, 2015; Blangiardo et al., 2013; Lindgren & Rue, 2015; Martins et al., 2013; Schrodle & Held, 2011a, 2011b), with results comparable to MCMC.

Libraries

Data

I have the data on my github site under the nhgis_vs page. These are data from the NHGIS project by IPUMS, who started providing birth and death data from the US Vital Statistics program. The data we will use here are infant mortality rates in US counties between 2000 and 2007.

Census intercensal population estimates

From the Census population estimates program.

Data prep

Get census data using tidycensus

Here I get data from the 2000 decennial census summary file 3.

Create expected numbers of cases

In count data models, and in spatial epidemiology, we have to express the raw counts of events relative to some expected value, or population offset; see this Rpub for a reminder.

Next we make the spatial information: we get the polygons directly from the census.

Construction of spatial relationships

Contiguity-based neighbors

In a general sense, we can think of a square grid. Cells that share common elements of their geometry are said to be "neighbors". There are several ways to describe these patterns, and for polygons, we generally use the rules of the chess board.
Rook adjacency: neighbors must share a line segment.

Queen adjacency: neighbors must share a vertex or a line segment.

If polygons share these boundaries (based on the specific definition, rook or queen), they are said to be "spatial neighbors" of one another. The figure below illustrates this principle. For an observation of interest, the pink area, the Rook-adjacent areas are those in green, because they share a line segment. In the second part of the figure, on the right, the pink area has a different set of neighbors compared to the Rook rule, because the area also shares vertices with other polygons, making them Queen neighbors.

Figure: Adjacency using chessboard rules.

Order of adjacency

The figure above also highlights the order of adjacency among observations. By order of adjacency, we simply mean that observations are either immediate neighbors (the green areas), or they are neighbors of immediate neighbors. These are referred to as first- and second-order neighbors. So we can see that the yellow polygons are the neighboring areas for this tract, which allows us to think about the spatial structure of the area surrounding this part of campus.

For an example, let's consider the case of San Antonio again. If our data are polygons, then there is a function for this in the spdep package (poly2nb(), used in the sketch below).

Distance-based association

The queen and rook rules are useful for polygon features, but distance-based contiguity is useful for all feature types (points, polygons, lines). The idea is similar to the polygon adjacency rule above, but the rule is based on the calculated distance between areas. There are a variety of distance metrics used in statistics, but the most commonly assumed is the Euclidean distance. The Euclidean distance between any two points is:

\[D = \sqrt{\left(x_1 - x_2\right)^2 + \left(y_1 - y_2\right)^2}\]

where x and y are the coordinates of each of the two areas. For polygons, these coordinates are typically the centroid of the polygon (you may have noticed this above when we were plotting the neighbor lists), while for point features, they are the two-dimensional geometry of the feature. The collection of these distances between all features forms what is known as the distance matrix, which summarizes all distances between all features in the data.

K nearest neighbors
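The post's code is not reproduced in this digest; here is a minimal sketch of these neighbor constructions with the spdep package (polys, an sf polygon object, is hypothetical):

```r
library(spdep)
library(sf)

# contiguity-based neighbors
nb_queen <- poly2nb(polys, queen = TRUE)   # shared vertex or line segment
nb_rook  <- poly2nb(polys, queen = FALSE)  # shared line segment only

# distance-based neighbors: all features within 50 km, using polygon centroids
coords  <- st_coordinates(st_centroid(polys))
nb_dist <- dnearneigh(coords, d1 = 0, d2 = 50000)

# k nearest neighbors, here with k = 5
nb_knn <- knn2nb(knearneigh(coords, k = 5))
```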
Plot geographies

Model setup
We can fit these models using the Bayesian framework with INLA. First, we consider the basic GLM for the mortality outcome, without any hierarchical structure. We can write this model as a Negative Binomial model, for instance, as:

\[\text{Deaths}_{ij} \sim NB(\mu_{ij}, \gamma)\]

\[\mu_{ij} = \log(E_d)_{ij} + X'\beta\]

INLA will use vague Normal priors for the \(\beta\)'s, and we have other parameters in the model to specify priors for. INLA does not require you to specify all priors, as all parameters have a default prior specification. In this example, I will use a \(Gamma(1, .5)\) prior for all hierarchical variance terms.

Plot our observed vs fitted values

Basic county-level random intercept model

Now we add basic nesting of rates within counties, with a random intercept term for each county. This allows heterogeneity in the mortality rate for each county, over and above each county's observed characteristics. This model would be:

\[\text{Deaths}_{ij} \sim NB(\mu_{ij}, \gamma)\]

\[\mu_{ij} = \log(E_d)_{ij} + X'\beta + u_j\]

\[u_j \sim \text{Normal}(0, \tau_u)\]

where \(\tau_u\) here is the precision, not the variance, and precision = 1/variance. INLA puts a log-gamma prior on the precision by default.

Marginal distributions of hyperparameters

We can plot the posterior marginal of the hyperparameter in this model, in this case \(\sigma_u = 1/\tau_u\).

BYM model

This is the model with spatial correlation, the Besag, York, and Mollie (1991) model, plus temporal heterogeneity:

\[\text{Deaths}_{ij} \sim NB(\mu_{ij}, \gamma)\]

\[\mu_{ij} = \log(E_d)_{ij} + X'\beta + u_j + v_j + \gamma_t\]

which has two random effects: one an IID random effect, and the second a spatially correlated random effect, specified as a conditionally autoregressive prior for the \(v_j\)'s. This is the Besag model:

\[v_j|v_{\neq j} \sim \text{Normal}\left(\frac{1}{n_j}\sum_{i\sim j}v_i, \frac{1}{n_j\tau}\right)\]

and \(u_j\) is an IID normal random effect, \(\gamma_t\) is also given an IID Normal random effect specification, and there are now three hyperparameters, \(\tau_u\), \(\tau_v\), and \(\tau_{\gamma}\), each given log-gamma priors.

For the BYM model we must specify the spatial connectivity matrix in the random effect. The fitted model indicates very low spatially correlated variance in these data.

Space-time mapping of the fitted values

Map of spatial random effects

It is common to map the random effects from the BYM model to look for spatial trends; in this case, there are no strong spatial signals.

Exceedence probabilities

In Bayesian spatial models that are centered on an epidemiological type of outcome, it is common to examine the data for spatial clustering. One way to do this is to examine the clustering in the relative risk from one of these GLMM models. For instance, if \(\theta\) is the relative risk

\[\theta = \exp(\beta_0 + \beta_1 x_1 + u_j)\]

from one of our Negative Binomial models above, we can use the posterior marginals of the relative risk to ask whether \(\theta > \theta^*\), where \(\theta^*\) is a specific level of excess risk, say 25% extra, or \(\theta > 1.25\). If the density \(\text{Pr}(\theta > \theta^*)\) is high, then there is evidence that the excess risk is not only high, but significantly high.

To get the exceedence probabilities from one of our models, we can use the marginal posterior distributions of the fitted relative risks (a sketch follows below). So we see lots of occasions where the exceedence probability is greater than .9. We can visualize these in a map.
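The post's model-fitting code is omitted in this digest; below is a hedged sketch of a BYM-style Negative Binomial fit and the exceedence computation in R-INLA (functions as documented at r-inla.org; the data frame dat, the covariate x1, and the other object names are hypothetical, and nb_queen comes from the neighbor sketch above):

```r
library(INLA)   # installed from r-inla.org, not CRAN
library(spdep)

# expected counts: each county's population times the overall death rate
dat$E_d <- dat$pop * (sum(dat$deaths) / sum(dat$pop))

# spatial connectivity matrix from the neighbor list built earlier
H <- nb2mat(nb_queen, style = "B", zero.policy = TRUE)

# BYM random effect combines an IID term and a spatially correlated term
form <- deaths ~ scale(x1) + f(county_id, model = "bym", graph = H)

fit <- inla(form, family = "nbinomial", data = dat, E = E_d,
            control.predictor = list(compute = TRUE, link = 1))

# exceedence probability Pr(theta > 1.25) for one area's relative risk
marg <- fit$marginals.fitted.values[[1]]
1 - inla.pmarginal(1.25, marg)
```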
The map shows several areas of the South where the infant mortality rate is significantly higher than the national rate, with high posterior probability.

References

Besag, J., York, J., & Mollie, A. (1991). Bayesian image restoration, with two applications in spatial statistics. Annals of the Institute of Statistical Mathematics, 43(1), 1-20. https://doi.org/10.1007/BF00116466
rfm 0.2.2 Posted: 30 Jul 2020 05:00 PM PDT
[This article was first published on Rsquared Academy Blog - Explore Discover Learn, and kindly contributed to R-bloggers].

We're excited to announce the release of rfm 0.2.2. In this blog post, we will summarize the changes implemented in the current (0.2.2) and previous release.

Segmentation

In previous versions, … We are grateful to @leungi for bringing this to our attention and also for fixing it. Now, …

In the above example, the interval used to define the Champions segment is a subset of Loyal Customers. In the previous versions, those customers who … (see the segment sketch below for how these intervals are specified).
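For context, here is a hedged sketch of how segments and their score intervals are specified with rfm (the rfm_segment() arguments follow the package documentation; the segment names, bounds, and analysis date below are illustrative, not the post's):

```r
library(rfm)

# build an RFM table from the package's demo transaction data
analysis_date <- as.Date("2007-01-01")   # hypothetical analysis date
rfm_result <- rfm_table_order(rfm_data_orders, customer_id, order_date,
                              revenue, analysis_date)

# each segment is a set of lower/upper bounds on the R, F and M scores;
# note how the Champions intervals sit inside the Loyal Customers intervals
segment_names   <- c("Champions", "Loyal Customers", "Others")
recency_lower   <- c(4, 3, 1);  recency_upper   <- c(5, 5, 5)
frequency_lower <- c(4, 3, 1);  frequency_upper <- c(5, 5, 5)
monetary_lower  <- c(4, 3, 1);  monetary_upper  <- c(5, 5, 5)

segments <- rfm_segment(rfm_result, segment_names,
                        recency_lower, recency_upper,
                        frequency_lower, frequency_upper,
                        monetary_lower, monetary_upper)
head(segments)
```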
Visualization

From version 0.2.1, all plotting functions use an additional argument …

Custom Threshold for RFM Scores

Lots of users wanted to know the threshold used for generating the RFM scores. From version 0.2.1, … Another request (see here) was to be able to use custom or user-specified thresholds for generating the RFM scores.

If you look at the above output, we have 5 bins/scores and there are six different values. Let us focus on the … Let us look at the quantiles used for generating the scores. The intervals are created in the below style (a hedged example follows):
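The interval listing itself is elided in this digest; as a stand-in, here is a hedged sketch of supplying custom thresholds by passing bin boundaries instead of a bin count (per the rfm documentation; the cut points below are made up):

```r
# five bins defined by four interior cut points per dimension
rfm_custom <- rfm_table_order(rfm_data_orders, customer_id, order_date,
                              revenue, analysis_date,
                              recency_bins   = c(115, 181, 297, 482),
                              frequency_bins = c(4, 5, 6, 8),
                              monetary_bins  = c(256, 382, 506, 666))
rfm_custom
```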
Since rfm uses left-closed intervals to generate the scores, we add … We have used the values from the threshold table to reproduce the earlier result. If you observe carefully, we have specified …

We have tried our best to explain how to use custom thresholds, but we completely understand that it can be confusing to implement at the beginning. If you have any questions about this method, feel free to write to us at support@rsquaredacademy.com and our team will be happy to help you.

Acknowledgements

We are grateful to @leungi, @gfagherazzi and @DavidGarciaEstaun for their inputs.

Learning More

Feedback

As the reader of this blog, you are our most important critic and commentator. We welcome your comments. You can email us to let us know what you did or did not like about this post.

Email: support@rsquaredacademy.com
FNN-VAE for noisy time series forecasting Posted: 29 Jul 2020 05:00 PM PDT
[This article was first published on RStudio AI Blog, and kindly contributed to R-bloggers].

From the article's training code, checking the variance of the latent encodings on the test set:

```r
training_loop_vae(ds_train)

test_batch <- as_iterator(ds_test) %>% iter_next()
encoded <- encoder(test_batch[[1]][1:1000])
test_var <- tf$math$reduce_variance(encoded, axis = 0L)
print(test_var %>% as.numeric() %>% round(5))
```

Experimental setup and data

The idea was to add white noise to a deterministic series. This time, the Roessler system was chosen, mainly for the prettiness of its attractor, apparent even in its two-dimensional projections:

Figure: Roessler attractor, two-dimensional projections.

Like we did for the Lorenz system in the first part of this series, we use … to generate the data. Then, noise is added to the desired degree by drawing from a normal distribution centered at zero, with standard deviations varying between 1 and 2.5. Here you can compare the effects of adding no noise (top), standard deviation 1 (middle), and standard deviation 2.5 Gaussian noise (bottom):

Figure: Roessler series with added noise. Top: none. Middle: SD = 1. Bottom: SD = 2.5.

Otherwise, preprocessing proceeds as in the previous posts. In the upcoming results section, we'll compare forecasts not just to the "real", after-noise-addition test split of the data, but also to the underlying Roessler system, that is, the thing we're really interested in. (Just that in the real world, we can't do that check.) This second test set is prepared for forecasting just like the other one; to avoid duplication we don't reproduce the code.

Results

The LSTM used for comparison with the VAE described above is identical to the architecture employed in the previous post. While with the VAE, an … As a result, in all cases, there was one latent variable with high variance and a second one of minor importance. For all others, variance was close to 0. "In all cases" here means: in all cases where FNN regularization was used.

As already hinted at in the introduction, the main regularizing factor providing robustness to noise here seems to be FNN loss, not KL divergence. So for all noise levels, besides FNN-regularized LSTM and VAE models, we also tested their non-constrained counterparts.

Low noise

Seeing how all models did superbly on the original deterministic series, a noise level of 1 can almost be treated as a baseline. Here you see sixteen 120-timestep predictions from both regularized models, FNN-VAE (dark blue) and FNN-LSTM (orange). The noisy test data, both input (…

Figure: Roessler series with added Gaussian noise of standard deviation 1. Grey: actual (noisy) test data. Green: underlying Roessler system. Orange: predictions from FNN-LSTM. Dark blue: predictions from FNN-VAE.

Despite the noise, forecasts from both models look excellent. Is this due to the FNN regularizer? Looking at forecasts from their unregularized counterparts, we have to admit these do not look any worse. (For better comparability, the sixteen sequences to forecast were initially picked at random, but then used to test all models and conditions.)

Figure: Roessler series with added Gaussian noise of standard deviation 1. Grey: actual (noisy) test data. Green: underlying Roessler system. Orange: predictions from unregularized LSTM. Dark blue: predictions from unregularized VAE.

What happens when we start to add noise?
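For reference, a hedged sketch of the data generation just described: simulating the Roessler system with deSolve and adding Gaussian noise. The post's own generation code is elided in this digest, and the textbook parameter values a = 0.2, b = 0.2, c = 5.7 are an assumption:

```r
library(deSolve)

# the Roessler system's three coupled ODEs
roessler <- function(t, state, parms) {
  with(as.list(c(state, parms)), {
    dx <- -y - z
    dy <- x + a * y
    dz <- b + z * (x - c)
    list(c(dx, dy, dz))
  })
}

times <- seq(0, 2500, by = 0.05)
sim <- ode(y = c(x = 1, y = 1, z = 0), times = times,
           func = roessler, parms = c(a = 0.2, b = 0.2, c = 5.7))

# univariate series plus zero-centered Gaussian noise (SDs 1 to 2.5 in the post)
series <- sim[, "x"]
noisy  <- series + rnorm(length(series), mean = 0, sd = 2.5)
```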
Substantial noise

Between noise levels 1.5 and 2, something changed, or became noticeable from visual inspection. Let's jump directly to the highest level used, though: 2.5. Here first are predictions obtained from the unregularized models.

Figure: Roessler series with added Gaussian noise of standard deviation 2.5. Grey: actual (noisy) test data. Green: underlying Roessler system. Orange: predictions from unregularized LSTM. Dark blue: predictions from unregularized VAE.

Both LSTM and VAE get "distracted" a bit too much by the noise, the latter to an even higher degree. This leads to cases where predictions strongly "overshoot" the underlying non-noisy rhythm. This is not surprising, of course: they were trained on the noisy version; predicting fluctuations is what they learned. Do we see the same with the FNN models?

Figure: Roessler series with added Gaussian noise of standard deviation 2.5. Grey: actual (noisy) test data. Green: underlying Roessler system. Orange: predictions from FNN-LSTM. Dark blue: predictions from FNN-VAE.

Interestingly, we now see a much better fit to the underlying Roessler system! Especially the VAE model, FNN-VAE, surprises with a whole new smoothness of predictions; but FNN-LSTM turns up much smoother forecasts as well.

"Smooth, fitting the system…" By now you may be wondering: when are we going to come up with more quantitative assertions? If quantitative implies "mean squared error" (MSE), and if MSE is taken to be some divergence between forecasts and the true target from the test set, the answer is that this MSE doesn't differ much between any of the four architectures. Put differently, it is mostly a function of noise level.

However, we could argue that what we're really interested in is how well a model forecasts the underlying process. And there, we see differences. In the following plot, we contrast MSEs obtained for the four model types (grey: VAE; orange: LSTM; dark blue: FNN-VAE; green: FNN-LSTM). The rows reflect noise levels (1, 1.5, 2, 2.5); the columns represent MSE in relation to the noisy ("real") target on the one hand (left), and in relation to the underlying system on the other (right). For better visibility of the effect, MSEs have been normalized as fractions of the maximum MSE in a category.

So, if we want to predict signal plus noise (left), it is not extremely critical whether we use FNN or not. But if we want to predict the signal only (right), FNN loss becomes increasingly effective as noise in the data increases. This effect is far stronger for VAE vs. FNN-VAE than for LSTM vs. FNN-LSTM: the distance between the grey line (VAE) and the dark blue one (FNN-VAE) becomes larger and larger as we add more noise.

Figure: Normalized MSEs obtained for the four model types (grey: VAE; orange: LSTM; dark blue: FNN-VAE; green: FNN-LSTM). Rows are noise levels (1, 1.5, 2, 2.5); columns are MSE as related to the real target (left) and the underlying system (right).

Summing up

Our experiments show that when noise is likely to obscure measurements from an underlying deterministic system, FNN regularization can strongly improve forecasts. This is the case especially for convolutional VAEs, and probably convolutional autoencoders in general. And if an FNN-constrained VAE performs as well as an LSTM for time series prediction, there is a strong incentive to use the convolutional model: it trains significantly faster.

With that, we conclude our mini-series on FNN-regularized models.
As always, we'd love to hear from you if you were able to make use of this in your own work! Thanks for reading!