[R-bloggers] Riddler: Can You Roll The Perfect Bowl? (and 6 more aRticles) |
- Riddler: Can You Roll The Perfect Bowl?
- gratia 0.4.1 released
- RSqLParser – tool to parse your SQL queries.
- Learning Shiny for Production
- Superspreading and the Gini Coefficient
- Mimic Excel’s Conditional Formatting in R
- drat 0.1.6: Rewritten macOS binary support
Riddler: Can You Roll The Perfect Bowl? Posted: 31 May 2020 06:41 AM PDT [This article was first published on Posts | Joshua Cook, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here) Want to share your content on R-bloggers? click here if you have a blog, or here if you don't. FiveThirtyEight's Riddler Express
Plan
I will approximate the solution to this puzzle by simulating the game

Setup

Simulate a single pass
I split the code into two pieces. The first simulates a bowl with a
A single simulation can be run by calling
Below are the results from running the simulation at angles between 90

Find the smallest angle
The second part of the code is to find the smallest (narrowest) angle at
The purpose of the
For efficiency, the while loop uses a memoised version of
From the print-out above, we can see how the algorithm jumps back an
With that successful proof-of-concept, the following code runs the
The simulation took 89 steps. The plot below shows the angle and
The following plot shows each of the paths tried, again, coloring the
Finally, we can find the approximated angle by taking the smallest angle
The algorithm approximates the solution to be: 53.1301 degrees (0.9273
The simulation with this angle is shown in an animated plot below.

Acknowledgements
Repetitive tasks were sped up using the

To leave a comment for the author, please follow the link and comment on their blog: Posts | Joshua Cook. R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job. Want to share your content on R-bloggers? click here if you have a blog, or here if you don't. This posting includes an audio/video/photo media file: Download Now |
gratia 0.4.1 released Posted: 31 May 2020 06:00 AM PDT [This article was first published on From the Bottom of the Heap - R, and kindly contributed to R-bloggers].

After a slight snafu related to the 1.0.0 release of dplyr, a new version of gratia is out and available on CRAN. This release brings a number of new features, including differences of smooths, partial residuals on partial plots of univariate smooths, and a number of utility functions, while under the hood gratia works for a wider range of models that can be fitted by mgcv.

Partial residuals
RSqLParser – tool to parse your SQL queries. Posted: 30 May 2020 11:01 PM PDT [This article was first published on R – FordoX, and kindly contributed to R-bloggers].

A slow-performing query is a ticking bomb that can explode at any time, i.e. cause a huge performance overhead in your application, especially when the database servers are under load. Knowing the ins and outs of your SQL query is of utmost importance in defusing the bomb.

This is not the only scenario in which knowing your SQL is important. From your slow query logs, you might want to find the most used tables and the times at which a particular table gets the most hits. This information can help you decide when to take dumps or run ALTER queries on the table.

Say, for instance, you have a relatively large SQL query embedded in your application code with dozens of bind variables scattered through it. For debugging purposes, you might want to replace those variables with values of your choice and run the query in an SQL execution tool that does not support dynamic bind variable replacement.

To cater to all these needs, I felt there was a need for an SQL parser in R and came up with this package – RSqlParser, inspired by Java's JSqlParser. This tool will come in handy for carrying out many analyses on SQL queries. With this package, you can design your own free tool to identify the reasons for your poorly performing queries or to address your various other use cases.

RSqlParser is a non-validating SQL parser. It expects syntactically correct SQL statements. It can be used to get various components of SQL statements. Currently, it supports only SELECT statements.

Methods
There are currently 4 methods in the package:
There are many more methods waiting to be released in upcoming versions of the package. In addition, upcoming versions of the package should be able to parse all DML and DDL statements. Until then, if you face any issue using the package, please let me know.
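The bind-variable debugging scenario described above can be sketched in base R. This is only an illustration of the idea, not RSqlParser's API: the helper `fill_binds()`, the `:name` placeholder syntax, and the sample query are all assumptions made for this example, and values containing backslashes or text that looks like another placeholder are not handled.

```r
# Hypothetical helper (not part of RSqlParser): replace ":name" bind
# variables in a SQL string with literal values for debugging.
fill_binds <- function(sql, values) {
  for (name in names(values)) {
    value <- values[[name]]
    # Quote character values, doubling embedded single quotes; leave numbers bare
    literal <- if (is.character(value)) {
      paste0("'", gsub("'", "''", value, fixed = TRUE), "'")
    } else {
      as.character(value)
    }
    # Match ":name" only when it is not followed by another identifier character
    pattern <- paste0(":", name, "(?![A-Za-z0-9_])")
    sql <- gsub(pattern, literal, sql, perl = TRUE)
  }
  sql
}

# Hypothetical query with two bind variables
query <- "SELECT id FROM orders WHERE customer = :cust AND total > :min_total"
filled <- fill_binds(query, list(cust = "O'Brien", min_total = 100))
print(filled)
```

The negative lookahead keeps a short name such as `:cust` from matching inside a longer placeholder such as `:customer_id`.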
Learning Shiny for Production Posted: 30 May 2020 05:00 PM PDT [This article was first published on Colin Fay, and kindly contributed to R-bloggers].

Hey Shiny devs of the world!
I'm leading a training in July about
During this online workshop, I'll share a lot of what we know about
I'm very excited about this training as it will cover a lot of the
We still have a couple of tickets left to this session, so if you want
See you there!
Superspreading and the Gini Coefficient Posted: 30 May 2020 03:00 PM PDT [This article was first published on Theory meets practice..., and kindly contributed to R-bloggers].

Abstract: We look at superspreading in infectious disease transmission from a statistical point of view. We characterise heterogeneity in the offspring distribution by the Gini coefficient instead of the usual dispersion parameter of the negative binomial distribution. This allows us to consider more flexible offspring distributions.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. The markdown+Rknitr source code of this blog is available under a GNU General Public License (GPL v3) from GitHub.

Motivation
The recent Science report on superspreading during the COVID-19 pandemic by Kai Kupferschmidt has made the dispersion parameter \(k\) of the negative binomial distribution a hot quantity in the discussions of how to determine effective interventions. This short blog post aims at understanding the math behind statements such as "Probably about 10% of cases lead to 80% of the spread" and at replicating them with computations in R.

Warning: This post reflects my own process of learning what superspreading is more than an attempt to make any statements of importance.

Superspreading
Lloyd-Smith et al. (2005) show that the 2002-2004 SARS-CoV-1 epidemic was driven by a small number of events where one case directly infected a large number of secondary cases – a so-called superspreading event. This means that for SARS-CoV-1 the distribution of how many secondary cases each primary case generates is heavy tailed. More specifically, the effective reproduction number describes the mean number of secondary cases a primary case generates during the outbreak, i.e. it is the mean of the offspring distribution.
In order to address dispersion around this mean, Lloyd-Smith et al. (2005) use the negative binomial distribution with mean \(R(t)\) and over-dispersion parameter \(k\) as a probability model for the offspring distribution. The number of offspring that case \(i\), which got infected at time \(t_i\), causes is given by \[ Y_i \sim \operatorname{NegBin}(R(t_i), k), \] so that \(\operatorname{E}(Y_i) = R(t_i)\) and \(\operatorname{Var}(Y_i) = R(t_i)\,(1 + R(t_i)/k)\).

That the dispersion parameter \(k\) is making epidemiological fame is a little surprising, because it is a parameter of a specific parametric model, one that might be inadequate for the observed data. A secondary objective of this post is thus to focus more on describing the heterogeneity of the offspring distribution using classical statistical concepts such as the Gini coefficient.

Negative binomial distributed number of secondary cases
Let's assume \(k=0.45\) as done in Adam et al. (2020). This is a slightly higher estimate than the \(k=0.1\) estimate by Endo et al. (2020) quoted in the Science article. We want to derive statements like "the x% most active spreaders infected y% of all cases" as a function of \(k\). The PMF of the offspring distribution with mean 2.5 and dispersion 0.45 looks as follows:

So we observe that 43% of the cases never manage to infect a secondary case, whereas some cases manage to generate more than 10 new cases. The mean of the distribution is checked empirically to equal the specified \(R(t)\) of 2.5:

Lloyd-Smith et al. (2005) define a superspreader to be a primary case that generates more secondary cases than the 99th percentile of the Poisson distribution with mean \(R(t)\). We use this to compute the proportion of superspreaders in our distribution:

So 10% of the cases will generate more than 7 new cases. To get to statements such as "10% generate 80% of the cases" we also need to know how many cases those 10% generate out of the 2.5 average.

In other words, the superspreaders generate (on average) 1.19 of the 2.5 new cases of a generation, i.e. 48%.
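The computations described in this section can be sketched in base R as follows. This is a minimal reconstruction under stated assumptions, not the author's original code: the truncation of the support at 2000 and all variable names are choices made for this example, while \(R(t) = 2.5\), \(k = 0.45\), and the superspreader definition are taken from the text.

```r
# Values taken from the text: mean R = 2.5 and dispersion k = 0.45
R <- 2.5
k <- 0.45

# Offspring PMF on a truncated support (assumption: mass beyond 2000 is negligible)
y <- 0:2000
pmf <- dnbinom(y, mu = R, size = k)

p_zero <- pmf[y == 0]        # share of cases with no secondary cases
mean_check <- sum(y * pmf)   # empirical check of the mean, should be close to R

# Superspreader threshold: 99% quantile of a Poisson distribution with mean R
threshold <- qpois(0.99, lambda = R)
is_super <- y > threshold

p_super <- sum(pmf[is_super])                        # proportion of superspreaders
offspring_super <- sum(y[is_super] * pmf[is_super])  # their part of the mean R
share_super <- offspring_super / R                   # their share of all secondary cases

print(round(c(p_zero = p_zero, mean = mean_check, threshold = threshold,
              p_super = p_super, offspring_super = offspring_super,
              share_super = share_super), 3))
```

If the sketch matches the post's setup, the printed values should agree with the figures quoted above: roughly 43% of cases without offspring, a threshold of 7, about 10% superspreaders, and about 1.19 of the 2.5 mean offspring (48%) attributable to them.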
These statements can also be made without formulating a superspreader threshold by graphing the cumulative share of the distribution of primary cases against the cumulative share of secondary cases these generate. This is exactly what the Lorenz curve is doing. However, for outbreak analysis it appears clearer to graph the cumulative distribution in decreasing order of the number of offspring, i.e. following Lloyd-Smith et al. (2005) we plot the cumulative share as \(P(Y\geq y)\) instead of \(P(Y \leq y)\). This is a variation of the Lorenz curve, but allows statements such as "the x% of cases with the highest number of offspring generate y% of the secondary cases".

The Gini coefficient can be computed with the standard formula for a discrete distribution with support on the non-negative integers, i.e. \[ G = \frac{1}{2\mu} \sum_{y=0}^{\infty} \sum_{z=0}^{\infty} f(y)\, f(z)\, |y - z|, \] where \(f\) denotes the PMF and \(\mu\) the mean of the distribution.

A plot of the relationship between the dispersion parameter and the Gini index, given a fixed value of \(R(t)=2.5\), looks as follows:

We see that the Gini index converges from above to the Gini index of the Poisson distribution with mean \(R(t)\). In our case this limit is

Red Marble Toy Example
Consider the toy example offspring distribution used by Christian Drosten in his Coronavirus Update podcast episode 44 on COVID-19 superspreading (in German). The described hypothetical scenario is translated to an offspring distribution, where a primary case either generates 1 (with probability 9/10) or 10 (with probability 1/10) secondary cases:

In other words, when fitting a negative binomial distribution to these data (probably not a good idea) we get a dispersion parameter of 0.59. The Gini coefficient allows for a more sensible description of offspring distributions that are clearly not negative binomial.

Discussion
The effect of superspreaders underlines the stochastic nature of the dynamics of a person-to-person transmitted disease in a population.
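For the Gini coefficient discussed in the preceding sections, a minimal base R sketch is given here. It assumes the standard double-sum formula for a discrete distribution, \(G = \frac{1}{2\mu}\sum_y \sum_z f(y) f(z) |y - z|\); the function name `gini_discrete()` and the truncation of the support at 300 are choices made for this example and are not taken from the original post.

```r
# Gini coefficient of a discrete distribution given its support and PMF:
# G = sum_y sum_z f(y) f(z) |y - z| / (2 * mu)
gini_discrete <- function(support, pmf) {
  mu <- sum(support * pmf)
  abs_diff <- abs(outer(support, support, "-"))
  sum(outer(pmf, pmf) * abs_diff) / (2 * mu)
}

# Truncated support (assumption: mass beyond 300 is negligible for these parameters)
y <- 0:300
gini_nb   <- gini_discrete(y, dnbinom(y, mu = 2.5, size = 0.45))
gini_pois <- gini_discrete(y, dpois(y, lambda = 2.5))

# Toy example from the text: 1 secondary case w.p. 9/10, 10 secondary cases w.p. 1/10
gini_toy <- gini_discrete(c(1, 10), c(0.9, 0.1))

print(round(c(negbin = gini_nb, poisson = gini_pois, toy = gini_toy), 3))
```

Because the function only needs a support and a PMF, it applies equally to the negative binomial model and to distributions such as the two-point toy example that are clearly not negative binomial.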
The dispersion parameter \(k\) is conditional on the assumption of a given parametric model for the offspring distribution (negative binomial). The Gini index is an alternative characterisation to measure heterogeneity. However, in both cases the parameters are to be interpreted together with the expectation of the distribution. Estimation of the dispersion parameter is orthogonal to the mean in the negative binomial and it's straightforward to also get confidence intervals for it. This is less straightforward for the Gini index.

A heavy tailed offspring distribution can make the disease easier to control by targeting intervention measures to restrict superspreading (Lloyd-Smith et al. 2005). The hope is that such interventions are "cheaper" than interventions which target the entire population of infectious contacts. However, the success of such a targeted strategy also depends on how large the contribution of superspreaders really is. Hence, some effort is needed to quantify the effect of superspreaders. Furthermore, the above treatment also underlines that heterogeneity can be a helpful feature to exploit when trying to control a disease. Another aspect of such heterogeneity, namely its influence on the threshold of herd immunity, has recently been investigated by my colleagues at Stockholm University (Britton, Ball, and Trapman 2020).

Literature
Adam, DC, P Wu, J Wong, E Lau, T Tsang, S Cauchemez, G Leung, and B Cowling. 2020. "Clustering and Superspreading Potential of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) Infections in Hong Kong." Research Square. https://doi.org/10.21203/rs.3.rs-29548/v1.
Britton, T, F Ball, and P Trapman. 2020. "The Disease-Induced Herd Immunity Level for COVID-19 Is Substantially Lower Than the Classical Herd Immunity Level." https://arxiv.org/abs/2005.03085.
Endo, A, Centre for the Mathematical Modelling of Infectious Diseases COVID-19 Working Group, S Abbott, AJ Kucharski, and S Funk. 2020. "Estimating the Overdispersion in COVID-19 Transmission Using Outbreak Sizes Outside China [Version 1; Peer Review: 1 Approved, 1 Approved with Reservations]." Wellcome Open Res. https://doi.org/10.12688/wellcomeopenres.15842.1.
Lloyd-Smith, J. O., S. J. Schreiber, P. E. Kopp, and W. M. Getz. 2005. "Superspreading and the Effect of Individual Variation on Disease Emergence." Nature 438 (7066): 355–59. https://doi.org/10.1038/nature04153.
Mimic Excel’s Conditional Formatting in R Posted: 30 May 2020 02:14 PM PDT [This article was first published on triKnowBits, and kindly contributed to R-bloggers].

The DT package is an interface between R and the JavaScript DataTables library (RStudio DT documentation). In Example 3 (at this page) they show how to heatmap-format a table. This post modifies the example to
Here we generate data similar to that in Example 3, but with average values growing by column:
set.seed(12345)

Using the code in the example (modified to green), the darker values naturally appear in columns V4 and V5. But that's not what we want.

For each column to have its own scale, simply apply RStudio's algorithm to each column of df in a loop. The trick to notice is that formatStyle wants a datatable object as its first argument, and produces a datatable object as its result. Therefore, start off with a plain-Jane datatable and successively format each column, saving the result each time, almost like building a ggplot. At the end, view the final result.
# Start with a (relatively) plain, unformatted datatable object

Actuaries in the crowd might recognize the image at the top of the post as the table of link ratios from the GenIns dataset in the ChainLadder package. There do not appear to be any distinctive trends in the ratios by age.
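The per-column loop described in this post can be sketched as follows. This is an illustrative reconstruction, not the author's code: the simulated data, the helper `column_scale()`, and the exact green shades are assumptions, and the DT calls (`datatable()`, `formatStyle()`, `styleInterval()`) only run if the DT package is installed.

```r
set.seed(12345)
# Hypothetical stand-in data: five columns (V1..V5) whose means grow by column
df <- as.data.frame(sapply(1:5, function(j) rnorm(10, mean = j)))

# Breaks and colours for a single column, following the quantile-based
# idea of the DT heatmap example but shading towards green
column_scale <- function(x) {
  brks <- quantile(x, probs = seq(0.05, 0.95, 0.05), na.rm = TRUE)
  shade <- round(seq(255, 40, length.out = length(brks) + 1))
  clrs <- paste0("rgb(", shade, ",255,", shade, ")")
  list(brks = brks, clrs = clrs)
}

# One scale per column instead of one scale for the whole table
scales <- lapply(df, column_scale)

if (requireNamespace("DT", quietly = TRUE)) {
  # Start with a plain datatable and format one column at a time,
  # saving the result after each step
  tbl <- DT::datatable(df)
  for (col in names(df)) {
    tbl <- DT::formatStyle(
      tbl, col,
      backgroundColor = DT::styleInterval(scales[[col]]$brks, scales[[col]]$clrs)
    )
  }
  # Evaluate `tbl` at the console to view the formatted table
}
```

Computing the breaks inside the loop's lookup table, rather than once for the whole data frame, is what gives each column its own colour scale.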
drat 0.1.6: Rewritten macOS binary support Posted: 30 May 2020 12:01 PM PDT [This article was first published on Thinking inside the box , and kindly contributed to R-bloggers].

A new version of drat arrived on CRAN overnight, once again taking advantage of the fully automated process available for such packages with few reverse depends and no open issues. As we remarked at the last release fourteen months ago when we scored the same nice outcome: Being a simple package can have its upsides…

This release is mostly the work of Felix Ernst who took on what became a rewrite of how binary macOS packages are handled. If you need to distribute binary packages for macOS users, this may help. Two more small updates were made, see below for full details.

drat stands for drat R Archive Template, and helps with easy-to-create and easy-to-use repositories for R packages. Since its inception in early 2015 it has found reasonably widespread adoption among R users because repositories with marked releases are the better way to distribute code. As your mother told you: Friends don't let friends install random git commit snapshots. Rolled-up releases it is.

The
Courtesy of CRANberries, there is a comparison to the previous release. More detailed information is on the drat page. If you like this or other open-source work I do, you can now sponsor me at GitHub. For the first year, GitHub will match your contributions. This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.