[R-bloggers] Covid-19 interactive map (using R with shiny, leaflet and dplyr) (and 8 more aRticles)


Covid-19 interactive map (using R with shiny, leaflet and dplyr)

Posted: 12 Mar 2020 12:29 PM PDT

[This article was first published on R-posts.com, and kindly contributed to R-bloggers].

The department of Public Health of the Strasbourg University Hospital (GMRC, Prof. Meyer) and the Laboratory of Biostatistics and Medical Informatics of the Strasbourg Medicine Faculty (Prof. Sauleau), to the extent of their means and expertise, are contributing to the fight against the Covid-19 infection. Doctor Fabacher has produced an interactive map for global monitoring of the infection, accessible at:

https://thibautfabacher.shinyapps.io/covid-19/

This map, which complements the Johns Hopkins University tool (Coronavirus COVID-19 Global Cases by Johns Hopkins CSSE), focuses on the evolution of the number of cases per country over a given period, expressed as incidence and prevalence. It is updated daily.
The period of interest can be defined by the user, and it is possible to choose:

  • The count of new cases over a period of time, or the same count relative to the population of the country (incidence).
  • The total case count over a period of time, or the same count relative to the population (prevalence).

This map is made using R with the shiny, leaflet and dplyr packages.

Code available here:

https://github.com/DrFabach/Corona/blob/master/shiny.r
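
For readers new to this stack, here is a minimal, hypothetical sketch of how shiny, leaflet and dplyr typically combine into such a map (toy data and names invented for illustration; this is not the authors' code, which lives in the repository above):

library(shiny)
library(leaflet)
library(dplyr)

# Toy data: one row per country, with coordinates and case counts
cases <- data.frame(
  country   = c("France", "Italy", "Spain"),
  lat       = c(46.2, 41.9, 40.4),
  lng       = c(2.2, 12.6, -3.7),
  confirmed = c(2876, 12462, 2277)
)

ui <- fluidPage(
  sliderInput("min_cases", "Minimum confirmed cases", 0, 15000, 0),
  leafletOutput("map")
)

server <- function(input, output, session) {
  output$map <- renderLeaflet({
    cases %>%
      filter(confirmed >= input$min_cases) %>%  # dplyr filters the data
      leaflet() %>%                             # leaflet draws the map
      addTiles() %>%
      addCircleMarkers(~lng, ~lat,
                       radius = ~sqrt(confirmed) / 5,
                       label  = ~paste0(country, ": ", confirmed))
  })
}

shinyApp(ui, server)  # shiny serves the interactive page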

Reference
Dong E, Du H, Gardner L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect Dis; published online Feb 19. https://doi.org/10.1016/S1473-3099(20)30120-1.

To leave a comment for the author, please follow the link and comment on their blog: R-posts.com.


Keep Calm and Use vtreat (in R and in Python)

Posted: 12 Mar 2020 11:56 AM PDT

[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers].

A big thank you to Dmytro Perepolkin for sharing a "Keep Calm and Use vtreat" poster!

[Poster image: "Keep Calm and Use vtreat"]

Also, we have translated the Python vtreat steps from our recent "Cross-Methods are a Leak/Variance Trade-Off" article into R vtreat steps here.

This R port demonstrates the fit/prepare notation, which is new to the R version of the package!
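
For the curious, the fit/prepare notation looks roughly like the following sketch, adapted from the vtreat package documentation (the toy data is invented; see the linked R port for the real worked example):

library(vtreat)
library(wrapr)

# Toy data: numeric outcome y, predictors with missing values
d <- data.frame(
  x = c(1, 2, 3, NA, 5, 6),
  c = c("a", "a", "b", "b", NA, "b"),
  y = c(1, 2, 3, 4, 5, 6),
  stringsAsFactors = FALSE
)

# Declare the treatment plan
transform_design <- vtreat::NumericOutcomeTreatment(
  var_list = c("x", "c"),
  outcome_name = "y"
)

# fit_prepare() fits the transform and returns a cross-validated
# treated frame in one step (mitigating nested-model bias)
unpack[
  transform  = treatments,
  d_prepared = cross_frame
] <- fit_prepare(transform_design, d)

# prepare() applies the fitted transform to new data
d_new_prepared <- prepare(transform, d)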

We want vtreat to be a platform-agnostic (works in R, works in Python, works elsewhere), well-documented, standard methodology.

To this end, Nina and I have re-organized the basic vtreat use documentation.

To leave a comment for the author, please follow the link and comment on their blog: R – Win-Vector Blog.


Corona in Belgium

Posted: 12 Mar 2020 07:14 AM PDT

[This article was first published on bnosac :: open analytical helpers, and kindly contributed to R-bloggers].

I lost a few hours this afternoon digging into the Corona virus data, mainly because of reading this article at this website, which gives a nice view of the potential issues that can arise when collecting data and of the hidden factors to be aware of, and which also covers Belgium.

As a Belgian, I was interested to see how Corona might impact our lives in the coming weeks, and out of curiosity I wanted to see how we are doing compared to other countries in containing the Corona virus outbreak – especially since we still do not have a government in Belgium, one year after the elections. In what follows, I'll show some graphs using data available at https://github.com/CSSEGISandData/COVID-19 (it provides up-to-date statistics on Corona cases). If you want to reproduce this, pull the repository and execute the R code shown below.

Data

Let's see first if the data is exactly what is shown at our National Television.

library(data.table)
library(lattice)

## read all daily report csv files and stack them into one data.table
x <- list.files("csse_covid_19_data/csse_covid_19_daily_reports/", pattern = ".csv", full.names = TRUE)
x <- data.frame(file = x, date = substr(basename(x), 1, 10), stringsAsFactors = FALSE)
x <- split(x$file, x$date)
x <- lapply(x, fread)
x <- rbindlist(x, fill = TRUE, idcol = "date")
x$date <- as.Date(x$date, format = "%m-%d-%Y")
x <- setnames(x,
              old = c("date", "Country/Region", "Province/State", "Confirmed", "Deaths", "Recovered"),
              new = c("date", "region", "subregion", "confirmed", "death", "recovered"))

## keep Hubei and a set of European countries plus Singapore
x <- subset(x, subregion %in% "Hubei" |
               region %in% c("Belgium", "France", "Netherlands", "Spain", "Singapore", "Germany", "Switzerland", "Italy"))
x$area <- ifelse(x$subregion %in% "Hubei", x$subregion, x$region)
x <- x[!duplicated(x, by = c("date", "area")), ]
x <- x[, c("date", "area", "confirmed", "death", "recovered")]
subset(x, area %in% "Belgium" & confirmed > 1)

Yes, the data from https://github.com/CSSEGISandData/COVID-19 looks correct indeed: the same numbers as reported on Belgian television.

date       area    confirmed death recovered
2020-03-01 Belgium         2     0         1
2020-03-02 Belgium         8     0         1
2020-03-03 Belgium        13     0         1
2020-03-04 Belgium        23     0         1
2020-03-05 Belgium        50     0         1
2020-03-06 Belgium       109     0         1
2020-03-07 Belgium       169     0         1
2020-03-08 Belgium       200     0         1
2020-03-09 Belgium       239     0         1
2020-03-10 Belgium       267     0         1
2020-03-11 Belgium       314     3         1

Exponential number of cases of Corona

Now, is the outbreak really exponential? Let's make some graphs.

What is clear when looking at the plots is that infections indeed grow at an exponential rate, except in Singapore, where the government managed to completely isolate the Corona cases, while in Belgium and other European countries the governments missed the opportunity to isolate the cases and we are now in a phase of trying to slow down the spread to reduce the impact.

[Plot: confirmed cases of Corona per country]

You can reproduce the plot as follows

trellis.par.set(strip.background = list(col = "lightgrey"))
xyplot(confirmed ~ date | area, data = x, type = "b", pch = 20,
       scales = list(y = list(relation = "free", rot = 0), x = list(rot = 45, format = "%A %d/%m")),
       layout = c(5, 2), main = sprintf("Confirmed cases of Corona\n(last date in this graph is %s)", max(x$date)))
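
As a quick, hypothetical numeric check of the exponential claim (my addition, not part of the original analysis): if growth is exponential, log(confirmed) should be roughly linear in time, and the exponentiated slope estimates the day-over-day growth factor.

be <- subset(x, area %in% "Belgium" & confirmed > 0)
fit <- lm(log(confirmed) ~ as.numeric(date), data = be)
exp(coef(fit)[2]) # estimated daily multiplication factor of confirmed cases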

Compare to other countries – onset

It is clear that the onset of Corona differs between countries. Let's define day 0 as the first day on which more than 75 persons had Corona in the country. That allows us to compare countries. In Belgium we started to have more than 75 patients with Corona on Friday 2020-03-06. In the Netherlands that was one day earlier.

date       area        confirmed
2020-01-22 Hubei             444
2020-02-17 Singapore          77
2020-02-23 Italy             155
2020-02-29 Germany            79
2020-02-29 France            100
2020-03-01 Spain              84
2020-03-04 Switzerland        90
2020-03-05 Netherlands        82
2020-03-06 Belgium           109

Reproduce as follows:

## order by date (most recent first) and compute, per area, the number of
## days since case 75 and the newly confirmed cases per day
x <- x[order(x$date, x$area, decreasing = TRUE), ]
x <- x[, days_since_case_onset := as.integer(date - min(date[confirmed > 75])), by = list(area)]
x <- x[, newly_confirmed := as.integer(confirmed - shift(confirmed, n = 1, type = "lead")), by = list(area)]
onset <- subset(x, days_since_case_onset == 0, select = c("date", "area", "confirmed"))
onset[order(onset$date), ]

Compare to other countries – what can we expect?

Now, are we doing better than other countries in the EU? The following plot shows the log of the number of people diagnosed with Corona since the onset date shown above. It looks like Belgium has learned from the issues in Italy, but it still hasn't managed to deal with the virus outbreak the way e.g. Singapore has.

Based on the blue line, we can expect Belgium to have roughly 1100 confirmed cases next week (log(1100) ≈ 7) or, if we follow the trend of France, roughly 3000 patients with Corona (log(3000) ≈ 8). We hope it is only the former.

[Plot: log(confirmed cases) since onset, by country]

Reproduce as follows:

xyplot(log(confirmed) ~ days_since_case_onset | "Log(confirmed cases) of Corona since onset of sick person nr 75",
       groups = area,
       data = subset(x, days_since_case_onset >= 0 &
                        area %in% c("Hubei", "France", "Belgium", "Singapore", "Netherlands", "Italy")),
       xlab = "Days since Corona onset (confirmed case 75)", ylab = "Log of number of confirmed cases",
       auto.key = list(space = "right", lines = TRUE),
       type = "b", pch = 20, lwd = 2)

Compared to the Netherlands

Now, are we doing better than the Netherlands? Currently it looks like we are, but time will tell. Given the trend shown above, I can only hope everyone in Belgium follows the government guidelines as strictly as possible.

[Plot: newly confirmed cases of Corona, Belgium vs. the Netherlands]

Reproduce as follows:

xyplot(newly_confirmed ~ date | "Newly confirmed cases of Corona", groups = area,
       data = subset(x, area %in% c("Belgium", "Netherlands") & date > as.Date("2020-03-01")),
       xlab = "Date", ylab = "Number of new Corona cases",
       scales = list(x = list(rot = 45, format = "%A %d/%m", at = seq(as.Date("2020-03-01"), Sys.Date(), by = "day"))),
       auto.key = list(space = "right", lines = TRUE),
       type = "b", pch = 20, lwd = 2)

To leave a comment for the author, please follow the link and comment on their blog: bnosac :: open analytical helpers.


Dear Data Scientists – how to ease your job!

Posted: 12 Mar 2020 01:37 AM PDT

[This article was first published on R-Bloggers – eoda GmbH, and kindly contributed to R-bloggers].

You are the modern Indiana Jones of digitalization! Always on the lookout to increase knowledge, show relations and take your organization to the next level. Just like the great discoverers Leif Eriksson, Vasco da Gama, and Lewis and Clark, you never let yourself be stopped from pursuing an idea and discovering new possibilities. Like Margaret Hamilton, Katherine Johnson or Marie Curie, you recognize structures and patterns and find ways to solve problems or make processes more efficient. And just like these people, you have a huge pool of methods, tools and knowledge that you use every day. With your skills you can simplify the daily work of your colleagues, relieve them and bring order to the data jungle. Yet your own work is often difficult and inconvenient.

Every day you face a multitude of challenges in your projects: How do you get feedback from the departments? How do you share your scripts with other data scientists, or distribute them quickly and easily? How do you connect to and access data quickly? Are performance and compliance less interesting to you than making your results available to different groups?

Wouldn't it be great if you also had a solution that would make your work easier?

YUNA – The data science platform from data scientists for data scientists

Let's consider the common data science languages R, Python and Julia. Maybe you have been involved in projects where analysis scripts were available in different languages. A platform that could run all these languages, whichever one a script is written in, fully and with all their (connectivity) packages and libraries – that would be something, wouldn't it? And if you could continue using your usual IDEs, you wouldn't even have to change the way you work.

Imagine you could run the scripts you develop in a scalable environment, one in which you wouldn't have to worry about data retrieval, user queries and data sources. And when all agents work together, thanks to dynamic load balancing, you and your analysis earn the "big" in "big data".

Pair the above with advanced script-execution logging, so you always know exactly what your scripts are outputting. Imagine a platform where parameterized, automated, sequential script execution is standard. Imagine being able to evaluate and optimize your scripts before, during and after production.

Or in short: Would it be exciting to work in a software solution that was developed in collaboration with you? By people who experience exactly the same things as you do every day.

With this idea YUNA was developed – the data science platform by data scientists for data scientists.

Data Science is a team sport

In data science projects, you are rarely a lone fighter – even if you are the only person really involved in the field. Be it the conception of the use case, the presentation of results or the planning of the next project: you often work together with different people who have their very own requirements – and you often have to act as the translator so that the others understand your complex work.

With YUNA, many questions can be answered by tracing results back to the actual data source, such as machine sensors. Business users can pose business questions to which you find the answers. Coordination overhead is greatly reduced, and you can concentrate on the essentials – data science, your passion, your job.

Find more information about YUNA at: https://www.eoda.de/en/leistungen/yuna

To leave a comment for the author, please follow the link and comment on their blog: R-Bloggers – eoda GmbH.


All you need to know on clustering with Factoshiny…

Posted: 11 Mar 2020 09:56 PM PDT

[This article was first published on François Husson, and kindly contributed to R-bloggers].

The Factoshiny function of the Factoshiny package proposes a complete clustering strategy that allows you:

  • to draw a hierarchical tree and a partition
  • to describe and characterize the clusters by quantitative and categorical variables
  • to handle large numbers of individuals thanks to the complementarity of the K-means and hierarchical clustering algorithms
  • to consider categorical variables or contingency tables

Implementation with R software

See this video and the audio transcription of this video:
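
If you prefer code to video, here is a rough sketch of the workflow, using the decathlon example data shipped with FactoMineR (an illustration of my own, not the course material):

library(FactoMineR)
library(Factoshiny)

data(decathlon, package = "FactoMineR")

# Interactive route: opens a shiny interface where you can build the
# tree, cut it into a partition and describe the clusters
res.shiny <- Factoshiny(decathlon)

# Script route, roughly what the interface drives under the hood:
res.pca  <- PCA(decathlon[, 1:10], graph = FALSE)        # quantitative variables
res.hcpc <- HCPC(res.pca, nb.clust = -1, graph = FALSE)  # tree + automatic partition
res.hcpc$desc.var                                        # characterize clusters by variables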

Course videos

Theoretical and practical information on clustering is available in these 4 course videos (here are the slides and the audio transcription of the courses):


Introduction


Materials

Here is the material used in the videos:


To leave a comment for the author, please follow the link and comment on their blog: François Husson.


Top 5 R resources on COVID-19 Coronavirus

Posted: 11 Mar 2020 05:00 PM PDT

[This article was first published on R on Stats and R, and kindly contributed to R-bloggers].

Photo by CDC

The Coronavirus is a serious concern around the globe. With its expansion, there are also more and more online resources about it. This article presents a selection of the best R resources on the COVID-19 virus.

This list is by no means exhaustive. I am not aware of all R resources available online about the Coronavirus, so please feel free to let me know in the comments or by contacting me if you believe that another resource (R package, Shiny app, R code, data, etc.) deserves to be on this list.

R Shiny apps

Coronavirus tracker

Developed by John Coene, this Shiny app tracks the spread of the coronavirus, based on three data sources (Johns Hopkins, Weixin and DXY Data). The Shiny app, built with shinyMobile (which makes it responsive on different screen sizes), presents in a really nice way the number of deaths and of confirmed, suspected and recovered cases by time and region.

The code is available on GitHub.

COVID-19 outbreak

Developed by the department of Public Health of the Strasbourg University Hospital and the Laboratory of Biostatistics and Medical Informatics of the Strasbourg Medicine Faculty, this Shiny app shows an interactive map for global monitoring of the infection. It focuses on the evolution of the number of cases per country and for a given period in terms of incidence and prevalence.

The code is available on GitHub.

R packages

{nCov2019}

The {nCov2019} package gives you access to epidemiological data on the coronavirus outbreak.1 The package gives real-time statistics and includes historical data. The vignette explains the main functions and possibilities of the package.

Furthermore, the authors of the package also developed a website with interactive plots and time-series forecasts, which could be useful in informing the public and studying how the virus spread in populous countries.
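
Getting started looks roughly like this (a sketch assuming the interface described in the package vignette; check the vignette for the current function arguments):

# remotes::install_github("GuangchuangYu/nCov2019")
library(nCov2019)
x <- get_nCov2019(lang = "en")   # latest real-time statistics
h <- load_nCov2019(lang = "en")  # historical data, for time trends
x                                # printing shows a global summary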

R code

Analyzing COVID-19 outbreak data with R

Written by Tim Churches, these two articles (part 1 and part 2) explore the R tools and packages that can be used to analyze COVID-19 data. Moreover, they present R code to analyze how contagious the Coronavirus is, using the SIR model (an epidemiological model).

The code is available on GitHub (part 1 and part 2).
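
For a flavour of what such an analysis involves, here is a minimal SIR model solved with deSolve; the parameters below are illustrative values chosen for this sketch, not the ones estimated in Tim Churches' posts.

library(deSolve)

# SIR: susceptible -> infected -> recovered
sir <- function(time, state, parameters) {
  with(as.list(c(state, parameters)), {
    dS <- -beta * S * I / N
    dI <-  beta * S * I / N - gamma * I
    dR <-  gamma * I
    list(c(dS, dI, dR))
  })
}

N    <- 11.5e6                               # population size (illustrative)
init <- c(S = N - 314, I = 314, R = 0)       # 314 initial cases
pars <- c(beta = 0.5, gamma = 1 / 7, N = N)  # transmission and recovery rates (illustrative)
out  <- ode(y = init, times = 0:60, func = sir, parms = pars)
head(as.data.frame(out))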

COVID-19 Data Analysis with {tidyverse} and {ggplot2}

An analysis of data around the Coronavirus with the {tidyverse} and {ggplot2} packages, for China and worldwide.

Both documents are a mix of data cleaning, data processing and visualizations of the confirmed/cured cases and death rates across countries or regions.

Data

Thanks for reading. I hope you will find these R resources on the COVID-19 Coronavirus useful. Feel free to let me know in the comments if I missed one.

As always, if you have a question or a suggestion related to the topic covered in this article, please add it as a comment so other readers can benefit from the discussion. If you find a mistake or bug, you can inform me by raising an issue on GitHub. For all other requests, you can contact me here.

Get updates every time a new article is published by subscribing to this blog.

Related articles:


  1. The package has also been the subject of a preprint.↩

To leave a comment for the author, please follow the link and comment on their blog: R on Stats and R.


Persistent config and data for R packages

Posted: 11 Mar 2020 05:00 PM PDT

[This article was first published on Posts on R-hub blog, and kindly contributed to R-bloggers].

Does your R package work best with some configuration?
You probably want it to be easily found by your package.
Does your R package download huge datasets that don't change much on the provider side?
Maybe you want to save the corresponding data somewhere persistent so that things will go faster during the next R session.
In this blog post we shall explain how an R package developer can go about using and setting persistent configuration and data on the user's machine.

Preface: standard locations on the user's machine

Throughout this post we'll often refer to standard locations on the user's machine.
As explained by Gábor Csárdi in an R-pkg-devel email, "Applications can actually store user level configuration information, cached data, logs, etc. in the user's home directory, and there is a standard way to do this [depending on the operating system]."
R packages that are on CRAN cannot write to the home directory without getting confirmation from the user, but they can and should use standard locations.
To find where those are, package developers can use the rappdirs package.

# Using a reference class object
rhub_app <- rappdirs::app_dir("rhub", "r-hub")
rhub_app$cache()
## [1] "/home/maelle/.cache/rhub"

# or functions
rappdirs::user_cache_dir("rhub")
## [1] "/home/maelle/.cache/rhub"

On top of these non-R specific standard locations, we'll also mention the standard homes of R options and environment variables, .Rprofile and .Renviron.

User preferences

As written in Android developer guidance and probably every customer service guide ever, "Everyone likes it when you remember their name".
Everyone probably likes it too when the barista at their favourite coffee shop remembers their usual order.
As an R package developer, what can you do for your R package to correctly assess user preferences and settings?

Using options

In R, options allow the user to set and examine a variety of global options which affect the way in which R computes and displays its results. For instance, for the usethis package, the usethis.quiet option can control whether usethis is chatty.1

Users can use a project-level or more global user-level .Rprofile.
The use of a project-level .Rprofile overrides the user-level .Rprofile unless the project-level .Rprofile contains the following lines as mentioned in the blogdown book:

# in .Rprofile of the project
if (file.exists('~/.Rprofile')) {
  base::sys.source('~/.Rprofile', envir = environment())
}
# then set project options

For more startup tweaks, the user could adopt the startup package.

As a package developer, in your code you can retrieve options by using getOption(), whose second argument is a fallback for when the option hasn't been set by the user.
Note that an option can be any R object.

options(blabla.foo = TRUE)
if (isTRUE(getOption("blabla.foo", FALSE))) {
  message("foo!")
}
## foo!

options(blabla.bar = mean)
getOption("blabla.bar")(c(1:7))
## [1] 4

The use of options in the .Rprofile startup file is great for workflow packages like usethis, blogdown, etc., but shouldn't be used for, say, arguments influencing the results of a statistical function.

Using environment variables

Environment variables, found via Sys.getenv() rather than getOption(), are often used for storing secrets (like GITHUB_PAT for the gh package) or the path to secrets on disk (like TWITTER_PAT for rtweet), or not secrets (e.g. the browser to use for chromote).

Similar to using options() in the console or at the top of a script, the user could use Sys.setenv().
Obviously, secrets should not be written at the top of a script that's public.
To make environment variables persistent, they need to be stored in a startup file, .Renviron.
.Renviron does not contain R code like .Rprofile does, but rather key-value pairs that are only read via Sys.getenv().

As a package developer, you probably want to at least document how to set persistent variables or provide a link to such documentation; you could even provide helper functions like rtweet does.
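
Such a helper could look roughly like the following sketch (a hypothetical function, not rtweet's actual implementation; note that a CRAN package should ask for the user's confirmation before writing to their home directory):

# Hypothetical helper: persist a key-value pair in the user's .Renviron
# (with their permission!) so it survives across R sessions
set_renviron_var <- function(key, value) {
  renviron <- file.path(Sys.getenv("HOME"), ".Renviron")
  cat(sprintf('%s="%s"', key, value),
      file = renviron, sep = "\n", append = TRUE)
  # also set the variable for the current session
  do.call(Sys.setenv, stats::setNames(list(value), key))
  invisible(value)
}

# set_renviron_var("SUPERSECRETKEY", "s3cr3t")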

Using credential stores for secrets

Although API keys, say, are often stored in .Renviron, they could also be stored in a standard and more secure location depending on the operating system.
The keyring package allows you to interact with such credential stores.
You could either take it on as a dependency like e.g. gh does, or recommend that users of your package use keyring and add a line like

Sys.setenv(SUPERSECRETKEY = keyring::key_get("myservice"))  

in their scripts.

Using a config file

The batchtools package expects its users to set up a config file somewhere if they don't want to use the defaults.
That somewhere can be one of several locations, as explained in the batchtools::findConfFile() manual page.
Two of the possibilities are rappdirs::user_config_dir("batchtools", expand = FALSE) and rappdirs::site_config_dir("batchtools"), which refer to standard locations that differ depending on the operating system.

The golem package offers its users the possibility to use a config file based on the config package.

A good default experience

Obviously, on top of letting users set their own preferences, you probably want your package functions to have sensible defaults. 😁

Asking or guessing?

For basic information such as username, email, GitHub username, the whoami package does pretty well.

whoami::whoami()
##                 username                 fullname            email_address
##                 "maelle"          "Maëlle Salmon" "maelle.salmon@yahoo.se"
##              gh_username
##                 "maelle"

whoami::email_address()
## [1] "maelle.salmon@yahoo.se"

In particular, for the email address, if the R environment variable EMAIL isn't set, whoami uses a call to git to find Git's global configuration.
Similarly, the gert package can find and return Git's preferences via gert::git_config_global()2.

In these cases where packages guess something, their guessing is based on the use of standard locations for such information on different operating systems.
Unsurprisingly, in the next section, we'll recommend using such standard locations when caching data.

Not so temporary files3

To quote the Android developers guide again, "Persist as much relevant and fresh data as possible."

A package that exemplifies doing so is getlandsat, which downloads "Landsat 8 data from AWS public data sets" from the web.
The first time the user downloads an image, the result is cached, so next time no query needs to be made.
A very nice aspect of getlandsat is that it provides cache management functions:

library("getlandsat")
# list files in cache
lsat_cache_list()
## [1] "/home/maelle/.cache/landsat-pds/L8/001/002/LC80010022016230LGN00/LC80010022016230LGN00_B3.TIF"
## [2] "/home/maelle/.cache/landsat-pds/L8/001/002/LC80010022016230LGN00/LC80010022016230LGN00_B4.TIF"
## [3] "/home/maelle/.cache/landsat-pds/L8/001/002/LC80010022016230LGN00/LC80010022016230LGN00_B7.TIF"

# List info for single files
lsat_cache_details(files = lsat_cache_list()[1])
##
##   directory: /home/maelle/.cache/landsat-pds
##
##   file: /L8/001/002/LC80010022016230LGN00/LC80010022016230LGN00_B3.TIF
##   size: 64.624 mb

lsat_cache_details(files = lsat_cache_list()[2])
##
##   directory: /home/maelle/.cache/landsat-pds
##
##   file: /L8/001/002/LC80010022016230LGN00/LC80010022016230LGN00_B4.TIF
##   size: 65.36 mb

# List info for all files
lsat_cache_details()
##
##   directory: /home/maelle/.cache/landsat-pds
##
##   file: /L8/001/002/LC80010022016230LGN00/LC80010022016230LGN00_B3.TIF
##   size: 64.624 mb
##
##   file: /L8/001/002/LC80010022016230LGN00/LC80010022016230LGN00_B4.TIF
##   size: 65.36 mb
##
##   file: /L8/001/002/LC80010022016230LGN00/LC80010022016230LGN00_B7.TIF
##   size: 62.974 mb

# delete files by name in cache
# lsat_cache_delete(files = lsat_cache_list()[1])

# delete all files in cache
# lsat_cache_delete_all()

getlandsat uses the rappdirs package we mentioned earlier.

lsat_path <- function() rappdirs::user_cache_dir("landsat-pds")  

When using rappdirs, keep its caveats in mind.

If you hesitate to use e.g. rappdirs::user_cache_dir() vs. rappdirs::user_data_dir(), a GitHub code search can help you see how other packages made that choice.

rappdirs or not

To use an app directory from within your package you can use rappdirs as mentioned earlier, but there are also other tools.

  • Package developers might also like the hoardr package that basically creates an R6 object building on rappdirs with a few more methods (directory creation, deletion).


  • Some package authors "roll their own" like Henrik Bengtsson in R.cache.
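
R itself is also gaining a built-in helper in R devel, tools::R_user_dir(), the "exciting similar feature" mentioned in the conclusion and in footnote 4; a sketch assuming the R devel interface:

# R devel (to become R 4.0.0)
tools::R_user_dir("mypackage", which = "cache")  # also "data" or "config"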


More or less temporary solutions

This section presents solutions for caching results very temporarily, or less temporarily.

Caching results within an R session

To cache results within an R session, you could use a temporary directory for data.
For any function call you could use memoise, which supports, well, memoization, which is best explained with an example.

time <- memoise::memoise(Sys.time)
time()
## [1] "2020-03-12 11:03:10 CET"

Sys.sleep(1)
time()
## [1] "2020-03-12 11:03:10 CET"

Only the first call to time() actually calls Sys.time(); after that, the result is saved for the entire session unless memoise::forget() is called.
It is great for speeding up code, and for not abusing internet resources, which is why the polite package wraps memoise.

Providing a ready-to-use dataset in a non-CRAN package

If your package depends on the use of a huge dataset, the same for all users, that is by definition too huge for CRAN, you can use a setup like the one presented by Brooke Anderson and Dirk Eddelbuettel, in which the data is packaged up in a separate package not on CRAN, which the user installs, thereby saving the data on disk somewhere where you can find it easily.5

Conclusion

In this blog post we presented ways of saving configuration options and data in a not so temporary way in R packages.
We mentioned R startup files (options in .Rprofile and secrets in .Renviron, the startup package); the rappdirs and hoardr packages as well as an exciting similar feature in R devel; the keyring package.
Writing in the user home directory can be viewed as invasive (and can trigger CRAN archival), hence there is a need for a good package design (asking for confirmation; providing cache management functions like getlandsat does) and documentation for transparency.
Do you use any form of caching on disk with a default location in one of your packages?
Do you know where your rhub email token lives?6 😉

Many thanks to Christophe Dervieux for useful feedback on this post!


  1. Note that in tests usethis suppresses the chatty behaviour by the use of withr::local_options(list(usethis.quiet = FALSE)).
    [return]
  2. The gert package uses libgit2, not Git directly.
    [return]
  3. We're using the very good email subject by Roy Mendelssohn on R-pkg-devel.
    [return]
  4. There's actually an R package called backports which provides backports of functions which have been introduced in one of the base packages in R version 3.0.1 or later, maybe it'll provide backports for tools::R_user_dir()?
    [return]
  5. If your package has a helper for downloading and saving the dataset locally, and you don't control the dataset source (contrary to the aforementioned approach), you might want to register several URLs for that content, as explained in the README of the conceptual contenturi package.
    [return]
  6. In file.path(rappdirs::user_data_dir("rhub", "rhub"), "validated_emails.csv"), /home/maelle/.local/share/rhub/validated_emails.csv in my case.
    [return]

To leave a comment for the author, please follow the link and comment on their blog: Posts on R-hub blog.


one or two?

Posted: 11 Mar 2020 04:20 PM PDT

[This article was first published on R – Xi'an's Og, and kindly contributed to R-bloggers].

A superposition of two random walks from The Riddler:

Starting from zero, a random walk is produced by choosing moves between ±1 and ±2 at each step. If the choice between the two is made so as to maximise the probability of ending up positive after 100 steps, what is this probability?

Although the optimal path is not necessarily made of moves that optimise the probability of ending up positive after the remaining steps, I chose to follow a dynamic programming approach, picking between ±1 and ±2 at each step based on that probability:

bs=matrix(0,405,101) # best strategy with value i-203 at time j-1
bs[204:405,101]=1
for (t in 100:1){
  tt=2*t
  bs[203+(-tt:tt),t]=.5*apply(cbind(
     bs[204+(-tt:tt),t+1]+bs[202+(-tt:tt),t+1],
     bs[201+(-tt:tt),t+1]+bs[205+(-tt:tt),t+1]),1,max)}

resulting in the probability

> bs[203,1]
[1] 0.6403174

Just checking that a simple strategy of picking ±1 when the walk is positive and ±2 at zero or below leads to the same value

T=1e5 # number of simulated walks (value assumed; T is not set in the original post)
ga=rep(0,T)
for(v in 1:100) ga=ga+(1+(ga<1))*sample(c(-1,1),T,rep=TRUE)

or sort of

> mean(ga>0)
[1] 0.6403494

With highly similar probabilities when switching at ga<2

> mean(ga>0)
[1] 0.6403183

or ga<0

> mean(ga>0)
[1] 0.6403008

and too little difference to spot a significant improvement between the three boundaries.

To leave a comment for the author, please follow the link and comment on their blog: R – Xi'an's Og.


AsioHeaders 1.12.2-1

Posted: 11 Mar 2020 03:48 PM PDT

[This article was first published on Thinking inside the box, and kindly contributed to R-bloggers].

An updated minor version of the AsioHeaders package arrived on CRAN today. Asio provides a cross-platform C++ library for network and low-level I/O programming. It is also included in Boost – but requires linking when used as part of Boost. This standalone version of Asio is a header-only C++ library which can be used without linking (just like our BH package with parts of Boost).

This release corresponds to a minor upstream update, and is only the second update ever. It may help overcome one sanitizer warning which David Hall brought to my attention. We tested this version against all reverse depends (which was easy enough as there are only three). The NEWS entry follows.

Changes in version 1.12.2-1 (2020-03-11)

  • Upgraded to Asio 1.12.2 (Dirk in #4 fixing #3)

Via CRANberries, there is a diffstat report relative to the previous release.

Comments and suggestions about AsioHeaders are welcome via the issue tracker at the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

To leave a comment for the author, please follow the link and comment on their blog: Thinking inside the box .
