[R-bloggers] nanotime 0.2.3 (and 2 more aRticles)

nanotime 0.2.3
RcppAPT 0.0.5
Using R’s set.seed() to set seeds for use in C/C++ (including Rcpp)

Posted: 30 Sep 2018 08:14 AM PDT

(This article was first published on Thinking inside the box , and kindly contributed to R-bloggers)

A minor maintenance release of the nanotime package for working with nanosecond timestamps just arrived on CRAN.

nanotime uses the RcppCCTZ package for (efficient) high(er) resolution time parsing and formatting up to nanosecond resolution, and the bit64 package for the actual integer64 arithmetic. Initially implemented using the S3 system, it now uses a more rigorous S4-based approach thanks to a rewrite by Leonardo Silvestri.

This release disables some tests on the Slowlaris platform we are asked to conform to (which is a good thing as wider variety of test platforms widens test converage) yet have no real access to (which is bad thing, obviously) beyind what the helpful rhub service offers. We also updated the Travis setup. No code changes.

Changes in version 0.2.3 (2018-09-30)

Skip some tests on Solaris which seems borked with timezones. As we have no real, no fixed possible (Dirk in #42).

Update Travis setup

Once this updates on the next hourly cron iteration, we also have a diff to the previous version thanks to CRANberries. More details and examples are at the nanotime page; code, issue tickets etc at the GitHub repository.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

To leave a comment for the author, please follow the link and comment on their blog: Thinking inside the box .

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

This posting includes an audio/video/photo media file: Download Now

RcppAPT 0.0.5

Posted: 29 Sep 2018 05:08 PM PDT

(This article was first published on Thinking inside the box , and kindly contributed to R-bloggers)

A new version of RcppAPT – our interface from R to the C++ library behind the awesome apt, apt-get, apt-cache, … commands and their cache powering Debian, Ubuntu and the like – is now on CRAN.

This version is a bit of experiment. I had asked on the r-package-devel and r-devel list how I could suppress builds on macOS. As it does not have the required libapt-pkg-dev library to support the apt, builds always failed. CRAN managed to not try on Solaris or Fedora, but somewhat macOS would fail. Each. And. Every. Time. Sadly, nobody proposed a working solution.

So I got tired of this. Now we detect where we build, and if we can infer that it is not a Debian or Ubuntu (or derived system) and no libapt-pkg-dev is found, we no longer fail. Rather, we just set a #define and at compile-time switch to essentially empty code. Et voilà: no more build errors.

And as before, if you want to use the package to query the system packaging information, build it on system using apt and with its libapt-pkg-dev installed.

A few other cleanups were made too.

Changes in version 0.0.5 (2017-09-29)

NAMESPACE now sets symbol registration

configure checks for suitable system, no longer errors if none found, but sets good/bad define for the build

Existing C++ code is now conditional on having a 'good' build system, or else alternate code is used (which succeeds everywhere)

Added suitable() returning a boolean with configure result

Tests are conditional on suitable() to test good builds

The Travis setup was updated

The vignette was updated and expanded

Courtesy of CRANberries, there is also a diffstat report for this release.

A bit more information about the package is available here as well as as the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

To leave a comment for the author, please follow the link and comment on their blog: Thinking inside the box .

This posting includes an audio/video/photo media file: Download Now

Using R’s set.seed() to set seeds for use in C/C++ (including Rcpp)

Posted: 29 Sep 2018 05:00 PM PDT

(This article was first published on R on roryverse, and kindly contributed to R-bloggers)

In native R, the user sets the seed for random number generation (RNG) with set.seed(). Random number generators exist in C and C++ too; these need their own seeds, which are not obviously settable by set.seed(). Good news! It can be done.

pacman::p_load(inline, purrr)

`rbernoulli`

Base R (or technically the stats package) provides no rbernoulli(). It's a pretty gaping hole in the pantheon of rbeta(), rbinom(), rcauchy, rchisq(), rexp(), rf(), rgamma(), etc. Thankfully, Hadley Wickham noticed this and gave us purrr::rbernoulli().

set.seed(1)  rbernoulli(5, 0.7)  #> [1] FALSE  TRUE  TRUE  TRUE FALSE  set.seed(1)  rbernoulli(5, 0.7)  #> [1] FALSE  TRUE  TRUE  TRUE FALSE

So it seems like Hadley managed to get set.seed() to work with rbernoulli(). How did he do this? Let's take a closer look at purrr::rbernoulli().

purrr::rbernoulli  #> function (n, p = 0.5)   #> {  #>     stats::runif(n) > (1 - p)  #> }  #>   #>

Ah, it seems Hadley just wrapped runif(); hence, because set.seed() works with runif(), it works with his implementation of purrr::rbernoulli().

C++ RNG

The C++ standard library provides the header file, which includes Bernoulli RNG. Let's give that a whirl.

cpp_rbernoulli <- rcpp(c(n = "integer", p = "numeric", seed = "integer"), '                         int n_ = as(n), seed_ = as(seed);                         double p_ = as(p);                         std::default_random_engine generator(seed_);                         std::bernoulli_distribution distribution(p_);                         IntegerVector out(n_);                         for (std::size_t i = 0; i != n_; ++i) {                           out[i] = distribution(generator);                         }                         return out;                         ', includes = "#include ")

cpp_rbernoulli(6, 0.7, seed = 1)  #> [1] 1 0 1 1 0 1  cpp_rbernoulli(6, 0.7, seed = 1)  #> [1] 1 0 1 1 0 1  cpp_rbernoulli(6, 0.7, seed = 2)  #> [1] 1 0 1 0 1 1

OK, so now we have cpp_rbernoulli() which is working, but the user has to pass the seed as an argument of the function, there's no option to use set.seed().

`get_seed()`

If only there was a get_seed() function that we could use. Well, here it is!

get_seed <- function() {    sample.int(.Machine$integer.max, 1)  }

This gets a positive number in the unsigned 32-bit integer range (which is always a safe bet for a seed) and it is completely determined by set.seed(). Therefore, it's fine to use as a seed itself. Let's take a look.

set.seed(1)  replicate(6, get_seed())  #> [1]  570175513  799129990 1230193230 1950361378  433108649 1929277158  set.seed(1)  replicate(6, get_seed())  #> [1]  570175513  799129990 1230193230 1950361378  433108649 1929277158  set.seed(2)  replicate(6, get_seed())  #> [1]  397031630 1508336757 1231208929  360888751 2026879546 2026097046  set.seed(2)  replicate(6, get_seed())  #> [1]  397031630 1508336757 1231208929  360888751 2026879546 2026097046

So as we can see, setting a seed via set.seed() determines the seeds that subsequently come out of get_seed(), so all is well with the world. get_seed() can now be used to create a version of cpp_rbernoulli() which uses set.seed(). For the sake of inflating my own ego, I'll name this version after myself.

rorybernoulli <- function(n, p) {    cpp_rbernoulli(n, p, get_seed())  }

Let's check that it's in working order.

set.seed(1)  rorybernoulli(6, 0.7)  #> [1] 1 1 0 0 1 0  set.seed(1)  rorybernoulli(6, 0.7)  #> [1] 1 1 0 0 1 0  set.seed(2)  rorybernoulli(6, 0.7)  #> [1] 0 1 1 1 1 1  set.seed(2)  rorybernoulli(6, 0.7)  #> [1] 0 1 1 1 1 1

Everything is awesome.

Benchmarking

Lastly, let's compare the two Bernoulli RNGs that we have now by asking them both to give us a million Bernoulli random numbers with p = 0.5.

bench::mark(purrr::rbernoulli(1e6, p = 0.5),              rorybernoulli(1e6, p = 0.5),              check = FALSE)  #> # A tibble: 2 x 10  #>   expression     min    mean  median    max `itr/sec` mem_alloc  n_gc n_itr  #>                      #> 1 purrr::rb…  26.9ms 29.45ms 28.57ms 32.7ms      34.0   11.45MB     4    13  #> 2 roryberno…  8.55ms  9.61ms  9.38ms 12.7ms     104.     3.82MB     4    46  #> # ... with 1 more variable: total_time

Wow, rorybernoulli() is three times faster! I wasn't expecting that. Perhaps it's because there's a quicker way of generating a Bernoulli random number than by going through a uniform random number (as purrr::rbernoulli() does). It's also three times as efficient with memory, probably related to the time speedup. The point of me writing this post was to share this get_seed() thing with people so that the can use set.seed() with Rcpp and the like; purrr::rbernoulli() was just a cool example of a non-base RNG that popped into my head. Maybe I should submit a pull request to purrr!

To leave a comment for the author, please follow the link and comment on their blog: R on roryverse.

This posting includes an audio/video/photo media file: Download Now

THE BENEFIT

Search This Blog