[R-bloggers] How GPL makes me leave R for Python :-( (and 4 more aRticles)



How GPL makes me leave R for Python :-(

Posted: 30 Jan 2019 10:08 PM PST

(This article was first published on R-posts.com, and kindly contributed to R-bloggers)


Being a data scientist in a startup I can program with several languages, but often R is a natural choice.
Recently I wanted my company to build a product based on R. It simply seemed like a perfect fit.

But this turned out to be a slippery slope into the open-source code licensing field, which I wasn't really aware of before.

Bottom line: legal advice was not to use R!

Was it a single lawyer? No. The company was willing to "play along" with me, and we had a consultation with 4 different software lawyers, one after the other.
What is the issue? R is licensed as GPL 2, and most R packages are also GPL (whether 2 or 3).

GPL is not a permissive license. It is categorized as "strongly protective".

In layman terms, if you build your work on a GPL program it may force you to license your product with a GPL license, too. In other words – it restrains you from keeping your code proprietary.

Now you say – "This must be wrong", and "You just don't understand the license and its meaning", right? You may also mention that Microsoft and other big companies are using R, and provide R services.

Well, maybe. I do believe there are ways to make your code proprietary, legally. But, when your software lawyers advise to "make an effort to avoid using this program" you do not brush them off 🙁


Now, for some details.

As a private company, our code needs to be proprietary. Our core is not services, but the software itself. We need to avoid handing our source code to a customer.

The program itself will be installed on a customer's server. Most of our customers have sensitive data and a SAAS model (or a connection to the internet) is out of the question. Can we use R?

The R Core Team addressed the question "Can I use R for commercial purposes?". But, as lawyers told us, the way it is addressed does not solve much. Any GPL program can be used for commercial purposes. You can offer your services installing the software, or sell a visualization you've prepared with ggplot2. But, it does not answer the question – can I write a program in R, and have it licensed with a non-GPL license (or simply – a commercial license)?

The key question we were asked was whether our work is a "derivative work" of R. Now, R is an interpreted programming language. You can write your code in Notepad and it will run perfectly. Logic says that if you do not modify the original software (R) and you do not copy any of its source code, you did not create a derivative work.

As a matter of fact, when you read the FAQ of the GPL license it almost seems that indeed there is no problem. Here is a paragraph from the Free Software Foundation https://www.gnu.org/licenses/gpl-faq.html#IfInterpreterIsGPL:  

If a programming language interpreter is released under the GPL, does that mean programs written to be interpreted by it must be under GPL-compatible licenses?

When the interpreter just interprets a language, the answer is no. The interpreted program, to the interpreter, is just data; a free software license like the GPL, based on copyright law, cannot limit what data you use the interpreter on. You can run it on any data (interpreted program), any way you like, and there are no requirements about licensing that data to anyone.

Problem solved? Not quite. The next paragraph shuffles the cards:

However, when the interpreter is extended to provide "bindings" to other facilities (often, but not necessarily, libraries), the interpreted program is effectively linked to the facilities it uses through these bindings. So if these facilities are released under the GPL, the interpreted program that uses them must be released in a GPL-compatible way. The JNI or Java Native Interface is an example of such a binding mechanism; libraries that are accessed in this way are linked dynamically with the Java programs that call them. These libraries are also linked with the interpreter. If the interpreter is linked statically with these libraries, or if it is designed to link dynamically with these specific libraries, then it too needs to be released in a GPL-compatible way.

Another similar and very common case is to provide libraries with the interpreter which are themselves interpreted. For instance, Perl comes with many Perl modules, and a Java implementation comes with many Java classes. These libraries and the programs that call them are always dynamically linked together.

A consequence is that if you choose to use GPLed Perl modules or Java classes in your program, you must release the program in a GPL-compatible way, regardless of the license used in the Perl or Java interpreter that the combined Perl or Java program will run on.

This is commonly interpreted as "You can use R, as long as you don't call any library".

Now, can you think of using R without, say, the Tidyverse package? Tidyverse is a GPL library. And if you want to create a Shiny web app – you still use the shiny library (also GPL). Even if you purchase a Shiny Server Pro commercial license, that still does not resolve the shiny library itself being licensed as GPL.

Furthermore, we often use quite a lot of R libraries – and almost all are GPL. Same goes for a shiny app, in which you are likely to use many GPL packages to make your product look and behave as you want it to.

Is it legal to use R after all?

I think it is. The term "library" may be the cause of the confusion.

 

As Perl is mentioned specifically in the GPL FAQ quoted above, Perl addressed the issue of GPL licensed interpreter on proprietary scripts head on (https://dev.perl.org/licenses/ ): "my interpretation of the GNU General Public License is that no Perl script falls under the terms of the GPL unless you explicitly put said script under the terms of the GPL yourself.

Furthermore, any object code linked with perl does not automatically fall under the terms of the GPL, provided such object code only adds definitions of subroutines and variables, and does not otherwise impair the resulting interpreter from executing any standard Perl script"

There may also be a hidden explanation by which most libraries are fine to use. As said above, it is possible the confusion is caused by the use of the term "library" in different ways.

Linking/binding is a technical term for what occurs when compiling software together. This is not what happens with most R packages, as may be understood when reading the following question and answer: Does an Rcpp-dependent package require a GPL license?

The question explains why (due to GPL) one should NOT use the Rcpp R library. Can we infer from it that it IS ok to use most other libraries?

"This is not a legal advice"

As we've seen, what is and is not legal to do with R, being GPL, is far from being clear.

Everything that is written on the topic is also marked as "not legal advice". While this may not be surprising, one has a hard time convincing a lawyer to be permissive when the software owners are not clear about it. For example, the FAQ "Can I use R for commercial purposes?" mentioned above begins with "R is released under the GNU General Public License (GPL), version 2. If you have any questions regarding the legality of using R in any particular situation you should bring it up with your legal counsel" and ends with "None of the discussion in this section constitutes legal advice. The R Core Team does not provide legal advice under any circumstances."

 

In between, the information is not very decisive either. So at the end of the day, the actual legal situation remains unclear.

Another thing one of the software lawyers told us is that investors do not like GPL. In other words, even if it turns out to be legal to use R with its libraries, a venture capital investor may be reluctant. If true, this may cause delays and may also require additional work convincing the potential investor that what you are doing is indeed flawless. Hence, the lawyers told us, it is best if you can find an alternative that is not GPL at all.

 

What makes Python better?

Most of the "R vs. Python" articles are pure junk, IMHO. They express nonsense commonly written in the spirit of "Python is a general-purpose language with a readable syntax. R, however, is built by statisticians and encompasses their specific language." Far away from the reality as I see it.

 

Image: Most Common Licenses for Python Packages (source: https://snyk.io/blog/over-10-of-python-packages-on-pypi-are-distributed-without-any-license/)

But Python has a permissive license. You can distribute it, you can modify it, and you do not have to worry your code will become open-source, too. This truly is a great advantage.

Is there anything in between a permissive license and a GPL?

Yes there is.

For example, there is the Lesser GPL (LGPL). As described in Wikipedia: "The license allows developers and companies to use and integrate a software component released under the LGPL into their own (even proprietary) software without being required by the terms of a strong copyleft license to release the source code of their own components. However, any developer who modifies an LGPL-covered component is required to make their modified version available under the same LGPL license." Isn't this exactly what the R choice of a license was aiming at?

Others use an exception. Some GPL-licensed JavaScript code, for example, adds the following exception: "As a special exception to GPL, any HTML file which merely makes function calls to this code, and for that purpose includes it by reference shall be deemed a separate work for copyright law purposes. In addition, the copyright holders of this code give you permission to combine this code with free software libraries that are released under the GNU LGPL. You may copy and distribute such a system following the terms of the GNU GPL for this code and the LGPL for the libraries. If you modify this code, you may extend this exception to your version of the code, but you are not obligated to do so. If you do not wish to do so, delete this exception statement from your version."

R is not LGPL. R has no written exceptions.

 

The fact that R and most of its libraries use a GPL license is a problem. At the very least it is not clear if it is really legal to use R to write proprietary code.

Even if it is legal, Python still has the advantage of a permissive license, which means "no questions asked" by potential customers and investors.

 

It would be good if the R core team, as well as people releasing packages, were clearer about the proper use of the software, as they see it. They could take Perl as an example.

 

It would be even better if the license itself changed: at the very least by adding an exception, or by moving to the LGPL or (best) a permissive license.


To leave a comment for the author, please follow the link and comment on their blog: R-posts.com.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

This posting includes an audio/video/photo media file: Download Now

Book review: Beyond Spreadsheets with R

Posted: 30 Jan 2019 04:00 PM PST

(This article was first published on Shirin's playgRound, and kindly contributed to R-bloggers)

Disclaimer: Manning publications gave me the ebook version of Beyond Spreadsheets with R – A beginner's guide to R and RStudio by Dr. Jonathan Carroll free of charge.

Beyond Spreadsheets with R shows you how to take raw data and transform it for use in computations, tables, graphs, and more. You'll build on simple programming techniques like loops and conditionals to create your own custom functions. You'll come away with a toolkit of strategies for analyzing and visualizing data of all sorts using R and RStudio.

Aimed at

This book covers all the very basics of getting started with R, so it is useful to beginners but probably won't offer much for users who are already familiar with R. However, it does hint at some advanced functions, like defining specific function calls for custom classes.

What stood out for me in the book (both positive and negative)

  • It introduced version control right away in the first chapter. Coming from an academic background where such things as Github and virtual machines were not widely used (at least not in the biology department), I learned about these invaluable tools way too late. Thus, I would always encourage students to start using them as early as possible. It will definitely make your analyses more reproducible, failproof and comprehensible. And it will make it easier to collaborate and keep track of different projects from past and present!

  • Great recap of the very basics, like data types. Even though you might have been working with R for a while, I still find it useful to go over some basics once in a while – often I find some minor detail that I hadn't known before or had forgotten again (e.g. that you shouldn't use T and F instead of TRUE and FALSE because T and F can be variable names and in the worst case assigned to the opposite logical).
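A quick illustration of that pitfall (my own example, not from the book):

```r
# TRUE and FALSE are reserved words, but T and F are ordinary variables
# that merely default to TRUE and FALSE -- so they can be silently reassigned:
T <- FALSE
isTRUE(T)  # now FALSE: any code relying on T being TRUE breaks quietly
# TRUE <- FALSE, by contrast, throws an error, because TRUE is reserved
rm(T)      # removes the local binding; T again resolves to base::T, i.e. TRUE
```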

  • The book also includes a short section on debugging and testing, which I think is valuable knowledge that most introductions to R skip. Of course, you won't get in-depth coverage of these two topics, but at least you will have heard about them early on.

  • It introduced the rex package, which I hadn't known before. Looks very convenient for working with regular expressions!

  • The book introduces tibble and dplyr and also goes into Non-Standard Evaluation. It shows a couple of ways to work around that using tidy evaluation.

This is what is shown:

library(dplyr)
library(rlang)

head(mutate(mtcars, ratio = mpg / cyl))
##    mpg cyl disp  hp drat    wt  qsec vs am gear carb    ratio
## 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4 3.500000
## 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4 3.500000
## 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1 5.700000
## 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1 3.566667
## 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2 2.337500
## 6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1 3.016667
column_to_divide <- "cyl"
head(mutate(mtcars, ratio = mpg / !!sym(column_to_divide)))
##    mpg cyl disp  hp drat    wt  qsec vs am gear carb    ratio
## 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4 3.500000
## 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4 3.500000
## 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1 5.700000
## 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1 3.566667
## 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2 2.337500
## 6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1 3.016667

This snippet about non-standard evaluation in the book is very short. Users who have worked with dplyr and functions before might wonder why the book doesn't mention the standard evaluation form of the dplyr verbs (like mutate_). This is because those verbs are now deprecated. Instead, if you want to write functions with dplyr, use tidy evaluation.

dplyr used to offer twin versions of each verb suffixed with an underscore. These versions had standard evaluation (SE) semantics: rather than taking arguments by code, like NSE verbs, they took arguments by value. Their purpose was to make it possible to program with dplyr. However, dplyr now uses tidy evaluation semantics. NSE verbs still capture their arguments, but you can now unquote parts of these arguments. This offers full programmability with NSE verbs. Thus, the underscored versions are now superfluous.

This is how it would work:

  1. You write your function with arguments as you would normally (the input to column_to_divide will be unquoted).
  2. You use enquo to "capture the expression supplied as argument by the user of the current function" (?enquo).
  3. You feed this object to your dplyr function (here mutate) and use !! "to say that you want to unquote an input so that it's evaluated".
my_mutate_2 <- function(df, column_to_divide) {
  quo_column_to_divide <- enquo(column_to_divide)
  df %>%
    mutate(ratio = mpg / !!quo_column_to_divide)
}

my_mutate_2(mtcars, cyl) %>%
  select(mpg, cyl, ratio) %>%
  head()
##    mpg cyl    ratio
## 1 21.0   6 3.500000
## 2 21.0   6 3.500000
## 3 22.8   4 5.700000
## 4 21.4   6 3.566667
## 5 18.7   8 2.337500
## 6 18.1   6 3.016667
  • Weirdly (to me), the book introduces dplyr and pipe operators, but when Jonathan talks about replacing NAs, he only mentions the base R version:
d_NA <- data.frame(x = c(7, NA, 9, NA), y = 1:4)
d_NA[is.na(d_NA)] <- 0
d_NA
##   x y
## 1 7 1
## 2 0 2
## 3 9 3
## 4 0 4

But he doesn't mention the tidy way to do this:

d_NA <- data.frame(x = c(7, NA, 9, NA), y = 1:4)  # recreate: it was overwritten above
d_NA %>%
  tidyr::replace_na(replace = list(x = 0))         # replace_na() needs the replace list
##   x y
## 1 7 1
## 2 0 2
## 3 9 3
## 4 0 4
  • Similar with aggregate. This is the base R version that is shown:
aggregate(formula = . ~ Species, data = iris, FUN = mean)
##      Species Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1     setosa        5.006       3.428        1.462       0.246
## 2 versicolor        5.936       2.770        4.260       1.326
## 3  virginica        6.588       2.974        5.552       2.026

But he doesn't show the tidy way. I guess this is because at this point in the book the tidy way would be a bit too complicated, due to the "wrong" data format (wide vs. long). He does introduce the tidy principles of data analysis in the next chapters.

library(tidyverse)
iris %>%
  gather(x, y, Sepal.Length:Petal.Width) %>%
  group_by(Species, x) %>%
  summarise(mean = mean(y))
## # A tibble: 12 x 3
## # Groups:   Species [?]
##    Species    x             mean
##    <fct>      <chr>        <dbl>
##  1 setosa     Petal.Length 1.46
##  2 setosa     Petal.Width  0.246
##  3 setosa     Sepal.Length 5.01
##  4 setosa     Sepal.Width  3.43
##  5 versicolor Petal.Length 4.26
##  6 versicolor Petal.Width  1.33
##  7 versicolor Sepal.Length 5.94
##  8 versicolor Sepal.Width  2.77
##  9 virginica  Petal.Length 5.55
## 10 virginica  Petal.Width  2.03
## 11 virginica  Sepal.Length 6.59
## 12 virginica  Sepal.Width  2.97
  • It covers web scraping and exporting data to .RData, .rds and how to use dput() when asking StackOverflow questions.
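As a small illustration of the dput() workflow mentioned above (my own example, not from the book):

```r
# dput() prints an object as the R code that recreates it -- handy for
# pasting a fully reproducible data snippet into a StackOverflow question:
x <- data.frame(a = 1:2, b = c(3.5, 4.5))
dput(x)
# prints a structure(...) call; evaluating that call rebuilds x exactly
```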

  • In the part about looping, the book mainly focuses on using the purrr package. You can find an example for using purrr in data analysis on my blog.

  • The book introduces package development, which is very useful, even if you just want to have a place where all your self-made functions "live". 🙂 Moreover, it also goes into Unit Testing with testthat!

Verdict

Even though I am not really the target audience for a book like this, since I am not a beginner any more (I basically skimmed through a lot of the very basic stuff), it still offered a lot of valuable information! I particularly liked that the book introduces very important parts of learning R for practical and modern applications, such as using the tidyverse, version control, debugging, how to ask proper questions on StackOverflow and basic package development, that don't usually get mentioned in beginner's books.


sessionInfo()
## R version 3.5.2 (2018-12-20)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.2
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base
##
## other attached packages:
##  [1] forcats_0.3.0   stringr_1.3.1   purrr_0.2.5     readr_1.3.1
##  [5] tidyr_0.8.2     tibble_2.0.1    ggplot2_3.1.0   tidyverse_1.2.1
##  [9] bindrcpp_0.2.2  rlang_0.3.1     dplyr_0.7.8
##
## loaded via a namespace (and not attached):
##  [1] tidyselect_0.2.5 xfun_0.4         haven_2.0.0      lattice_0.20-38
##  [5] colorspace_1.4-0 generics_0.0.2   htmltools_0.3.6  yaml_2.2.0
##  [9] utf8_1.1.4       pillar_1.3.1     glue_1.3.0       withr_2.1.2
## [13] modelr_0.1.2     readxl_1.2.0     bindr_0.1.1      plyr_1.8.4
## [17] munsell_0.5.0    blogdown_0.10    gtable_0.2.0     cellranger_1.1.0
## [21] rvest_0.3.2      evaluate_0.12    knitr_1.21       fansi_0.4.0
## [25] broom_0.5.1      Rcpp_1.0.0       scales_1.0.0     backports_1.1.3
## [29] jsonlite_1.6     hms_0.4.2        digest_0.6.18    stringi_1.2.4
## [33] bookdown_0.9     grid_3.5.2       cli_1.0.1        tools_3.5.2
## [37] magrittr_1.5     lazyeval_0.2.1   crayon_1.3.4     pkgconfig_2.0.2
## [41] xml2_1.2.0       lubridate_1.7.4  assertthat_0.2.0 rmarkdown_1.11
## [45] httr_1.4.0       rstudioapi_0.9.0 R6_2.3.0         nlme_3.1-137
## [49] compiler_3.5.2

To leave a comment for the author, please follow the link and comment on their blog: Shirin's playgRound.



missing digit in a 114 digit number [a Riddler’s riddle]

Posted: 30 Jan 2019 03:19 PM PST

(This article was first published on R – Xi'an's Og, and kindly contributed to R-bloggers)

A puzzling riddle from The Riddler (as Le Monde had a painful geometry riddle this week): this number with 114 digits

530,131,801,762,787,739,802,889,792,754,109,70?,139,358,547,710,066,257,652,050,346,294,484,433,323,974,747,960,297,803,292,989,236,183,040,000,000,000

is missing one digit and is a product of some of the integers between 2 and 99. By comparison, 76! and 77! have 112 and 114 digits, respectively, while 99! has 156 digits. Using WolframAlpha's on-line prime factor decomposition code, I found that only 6 is a possible solution, as any other digit between 0 and 9 produced a large prime number in the prime decomposition.

However, I thought about it anew while swimming early the next morning [my current substitute for morning runs] and reasoned that there was no need to call a formal calculator: it is reasonably easy to check that this humongous number has to be divisible by 9 = 3×3, for otherwise there are not enough terms left to reach 114 digits (checked with lfactorial(); more precisely, 3³³×33! has 53 digits and 99!/(3³³×33!) has 104 digits, fewer than 114). Divisibility by 9 means the sum of all digits is divisible by 9, which leads to 6 as the unique solution.
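The digit-sum argument is easy to verify in a few lines of R (the two strings below are the riddle's digits before and after the missing position):

```r
# A number divisible by 9 has a digit sum divisible by 9, so the missing
# digit d must satisfy (sum of known digits + d) %% 9 == 0.
known <- paste0(
  "53013180176278773980288979275410970",
  "139358547710066257652050346294484433323974747960297803292989236183040000000000"
)
digit_sum <- sum(as.integer(strsplit(known, "")[[1]]))
missing <- (9 - digit_sum %% 9) %% 9
missing  # 6
```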

 

To leave a comment for the author, please follow the link and comment on their blog: R – Xi'an's Og.


R Markdown Template for Business Reports

Posted: 30 Jan 2019 01:31 AM PST

(This article was first published on INWT-Blog-RBloggers, and kindly contributed to R-bloggers)

In this post I'd like to introduce the R Markdown template for business reports by INWTlab. It's been my aim to have a nice and clean template that is easy to customize in colors, cover and logo. I know there are quite a few templates available, but I was missing one to be used in a corporate environment. That is, I want to have a logo included and a cover page that can also be styled in a corporate design. In many companies nowadays MS Word is still THE reporting tool, so the overall look and feel is loosely oriented at MS Word defaults. In addition, this template can be extended by hacking the tex file defs.tex. In my eyes a tex file is way easier to hack – especially for not so experienced latex users – than style files (.sty).

Installation

You can install the package from the INWTlab Github Repository ireports. If you haven't installed devtools yet, please do that first.

# install.packages("devtools")
devtools::install_github("INWTlab/ireports")

Once the package is installed, you can already use the template in its default version by selecting R Markdown… > From Template > INWTlab Business Report and then clicking OK. As a result a new Rmd script pops up and you can knit it right away. This is what the default report looks like:

Usage

Text & Graphics

You can write text and add graphics just like you would in any other R Markdown document.

Tables

In order to generate tables in MS Word style you can use the example code snippet below. It renders a table from the iris data set.

tab <- xtable(head(iris, n = 20), align = "|C|C|C|C|C|C|")
addtorow <- list()
addtorow$pos <- list()
addtorow$pos[[1]] <- -1
addtorow$command <- c("\\rowcolor{colorTwo}")
print(tab,
      include.rownames = FALSE,
      tabular.environment = "tabularx",
      width = "\\textwidth",
      comment = FALSE,
      add.to.row = addtorow)

This code snippet consists of three parts:

  1. The table creation. With xtable() I create an object tab of class xtable. Make sure to edit the align option as the number of columns changes.
  2. The coloring of the header row. I color the header via a latex code snippet which is placed in the header of the latex table code. The add.to.row option in print() enables me to put a code snippet into the latex code wherever I like by specifying a command and a position in a separate list, here called addtorow.
  3. The table printing itself.

So, when you create a new table, all you have to do is to create the xtable object tab in the first row, making sure that the align has the right number of columns specified.
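One way to keep the align string in sync with the data automatically (a sketch; the C column type is defined in the template's defs.tex):

```r
# Build the align string from the data itself: it needs ncol(df) + 1
# entries, the extra one being for xtable's row-name column.
df <- head(mtcars)
align_str <- paste0("|", strrep("C|", ncol(df) + 1))
align_str  # "|C|C|C|C|C|C|C|C|C|C|C|C|"
# then, as above: tab <- xtable::xtable(df, align = align_str)
```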

Customization

Logo and Cover

When you create a new R Markdown file from the template, RStudio creates a new folder in your working directory (check with getwd()). You can overwrite logo.png and cover.png in this folder with your own files. Keep in mind to either name your files exactly like mine or to correct the file names in the yaml header. The recommended dimension for the logo is 350 x 130 pixels; the cover should be 1654 x 2339 pixels. If you want to use other dimensions, you need to hack the tex file defs.tex in order to make it look nice.

Colors

There are currently two colors in use, namely iblue and igray. If you'd like to adopt your corporate colors, change the hexadecimal color values for iblue and igray in the yaml header.

Collaboration

You're welcome to help improve this template, just open an issue on GitHub or send a pull request.

To leave a comment for the author, please follow the link and comment on their blog: INWT-Blog-RBloggers.



December 2018: "Top 40" New CRAN Packages

Posted: 29 Jan 2019 04:00 PM PST

(This article was first published on R Views, and kindly contributed to R-bloggers)

By my count, 157 new packages stuck to CRAN in December. Below are my "Top 40" picks in ten categories: Computational Methods, Data, Finance, Machine Learning, Medicine, Science, Statistics, Time Series, Utilities and Visualization. This is the first time I have used the Medicine category. I am pleased that a few packages that appear to have clinical use made the cut. Also noteworthy in this month's selection are the inclusion of four packages from the Microsoft Azure team (stuffing 41 packages into the "Top 40"), and some eclectic, but fascinating packages in the Science section.

Computational Methods

ar.matrix v0.1.0: Provides functions that use precision matrices and Choleski factorization to simulate auto-regressive data. The README offers examples.
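The underlying idea can be sketched in base R (my own illustration, not the ar.matrix API): for a precision matrix Q with upper Cholesky factor R (so Q = t(R) %*% R), backsolve(R, z) with z ~ N(0, I) yields a draw with covariance solve(Q):

```r
# Simulate an AR(1)-style vector from its tridiagonal precision matrix.
set.seed(42)
n   <- 5
rho <- 0.7
Q <- diag(c(1, rep(1 + rho^2, n - 2), 1))  # AR(1) precision matrix
Q[cbind(1:(n - 1), 2:n)] <- -rho           # super-diagonal
Q[cbind(2:n, 1:(n - 1))] <- -rho           # sub-diagonal
R <- chol(Q)                               # upper-triangular Cholesky factor
x <- backsolve(R, rnorm(n))                # draw with covariance solve(Q)
length(x)  # 5
```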

mvp v1.0-2: Provides functions for the fast symbolic manipulation of polynomials. See the vignette and this R Journal paper for details on how to create this image of the Rosenbrock function.

pomdp v0.9.1: Provides an interface to pomdp-solve, a solver for Partially Observable Markov Decision Processes (POMDP). See the vignette for examples.

Data

dbparser v1.0.0: Provides a tool for parsing the DrugBank XML database. The vignette shows how to get started.

rdhs v0.6.1: Implements a client querying the DHS API to download and manipulate survey datasets and metadata. There are introductions to using rdhs and the rdhs client, an extended example about Anemia prevalence, and vignettes on Country Codes, Interacting with the geojson API results, and Testing.

Finance

optionstrat v1.0.0: Implements the Black-Scholes-Merton option pricing model to calculate key option analytics and graphical analysis of various option strategies. See the vignette.

riskParityPortfolio v0.1.1: Provides functions to design risk parity portfolios for financial investment. In addition to the vanilla formulation, where the risk contributions are perfectly equalized, many other formulations are considered that allow for box constraints and short selling. The package is based on the papers: Feng and Palomar (2015), Spinu (2013), and Griveau-Billion et al.(2013). See the vignette for an example.

Machine Learning

BTM v0.2: Provides functions to find Biterm topics in collections of short texts. In contrast to topic models, which analyze word-document co-occurrence, biterms consist of two words co-occurring in the same short text window.

ParBayesianOptimization v0.0.1: Provides a framework for optimizing Bayesian hyperparameters according to the methods described in Snoek et al. (2012). There are vignettes on standard and advanced features.

Medicine

LUCIDus v0.9.0: Implements the LUCID method to jointly estimate latent unknown clusters/subgroups with integrated data. See the vignette for details.

metaRMST v1.0.0: Provides functions that use individual patient-level data to produce a multivariate meta-analysis of randomized controlled trials with the difference in restricted mean survival times (RMSTD).

webddx v0.1.0: Implements a differential-diagnosis generating tool. Given a list of symptoms, the function query_fz queries the FindZebra website and returns a differential-diagnosis list.

Science

bioRad v0.4.0: Provides functions to extract, visualize, and summarize aerial movements of birds and insects from weather radar data. There is an Introduction and a vignette on Exercises.

pmd v0.1.1: Implements the paired mass distance analysis proposed in Yu, Olkowicz and Pawliszyn (2018) for gas/liquid chromatography–mass spectrometry. See the vignette for an introduction.

tabula v1.0.0: Provides functions to examine archaeological count data and includes several measures of diversity. There are vignettes on Diversity Measures, Matrix Classes, and Matrix Seriation. This last vignette includes an example reproducing the results of Peeples and Schachner (2012).

traitdataform v0.5.2: Provides functions to assist with handling ecological trait data and applying the Ecological Trait-Data Standard terminology described in Schneider et al. (2018).

waterquality v0.2.2: Implements over 45 algorithms to develop water quality indices from satellite reflectance imagery. The vignette introduces the package.

Statistics

areal v0.1.2: Implements areal weighted interpolation with support for multiple variables in a workflow that is compatible with the tidyverse and sf frameworks. There are vignettes on Areal Interpolation, Weighted Areal Interpolation, and Preparing Data for Interpolation.

FLAME v1.0.0: Implements the Fast Large-scale Almost Matching Exactly algorithm of Roy et al. (2017) for causal inference. Look at the README to get started.

mistr v0.0.1: Offers a computational framework for mixture distributions with a focus on composite models. There is an Introduction and a vignette on Extensions.

mlergm v0.1: Provides functions to estimate exponential-family random graph models for multilevel network data, assuming the multilevel structure is observed. There is a Tutorial.

MTLR v0.1.0: Implements the Multi-Task Logistic Regression (MTLR) proposed by Yu et al. (2011). See the vignette.

multiRDPG v1.0.1: Provides functions to fit the Multiple Random Dot Product Graph Model and to perform a test for whether two networks come from the same distribution. See Nielsen and Witten (2018) for details.

ocp v0.1.0: Implements the Bayesian online changepoint detection method of Adams and MacKay (2007) for univariate or multivariate data. Gaussian and Poisson probability models are implemented. The vignette provides an introduction.
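A rough sketch of detecting a single mean shift might look like the following (untested; it assumes onlineCPD() is the package's entry point for a numeric series and that the fitted object stores detected changepoints in a changepoint_lists element):

```r
library(ocp)

set.seed(1)
# Univariate series with a mean shift at t = 101
y <- c(rnorm(100, mean = 0), rnorm(100, mean = 5))

fit <- onlineCPD(y)
fit$changepoint_lists   # candidate changepoint locations
```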

probably v0.0.1: Provides tools for post-processing class probability estimates. See the vignettes Where does probability fit in? and Equivocal Zones.
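An equivocal zone flags predictions whose class probability sits too close to the decision threshold to be trusted. A minimal sketch using make_two_class_pred() (untested; the probabilities here are hypothetical):

```r
library(probably)

# Hypothetical class probabilities for a two-class outcome
probs <- c(0.95, 0.55, 0.48, 0.05)

pred <- make_two_class_pred(probs, levels = c("yes", "no"),
                            threshold = 0.5, buffer = 0.1)
pred  # estimates within 0.5 +/- 0.1 are reported as equivocal
```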

smurf v1.0.0: Implements the SMuRF algorithm of Devriendt et al. (2018) to fit generalized linear models (GLMs) with multiple types of predictors via regularized maximum likelihood. See the package Introduction.

subtee v0.3-4: Provides functions for naive and adjusted treatment effect estimation for subgroups. Proposes model averaging (Bornkamp et al., 2016) and bagging (Rosenkranz, 2016) to address the problem of selection bias in treatment effect estimation for subgroups. There is an Introduction and vignettes for the plot and subbuild functions.

xspliner v0.0.2: Provides functions to assist model building using surrogate black-box models to train interpretable, spline-based additive models. There are vignettes on Basic Theory and Usage, Automation, Classification, Use Cases, Graphics, Extra Information, and the xspliner Environment.

Time Series

mfbvar v0.4.0: Provides functions for estimating mixed-frequency Bayesian vector autoregressive (VAR) models with Minnesota or steady-state priors, such as those used by Schorfheide and Song (2015) or by Ankargren, Unosson and Yang (2018). Look at the GitHub page for an example.

NTS v1.0.0: Provides functions to simulate, estimate, predict, and identify models for nonlinear time series.

Utilities

AzureContainers v1.0.0: Implements an interface to container functionality in Microsoft's Azure cloud that enables users to manage the Azure Container Instance, Azure Container Registry, and Azure Kubernetes Service. There are vignettes on Plumber model deployment and Machine Learning server model deployment.

AzureRMR v1.0.0: Implements a lightweight interface to the Azure Resource Manager REST API. The package exposes classes and methods for OAuth authentication and for working with subscriptions and resource groups. There is an Introduction and a vignette on Extending AzureRMR.

AzureStor v1.0.0: Provides tools to manage storage in Microsoft's Azure cloud. See the Introduction.

AzureVM v1.0.0: Implements tools for working with virtual machines and clusters of virtual machines in Microsoft's Azure cloud. See the Introduction.

cliapp v0.1.0: Provides functions that facilitate creating rich command line applications with colors, headings, lists, alerts, progress bars, and custom CSS-based themes. See the README for examples.

projects v0.1.0: Provides a project infrastructure with a focus on manuscript creation. See the README for the conceptual framework and an introduction to the package.

remedy v0.1.0: Implements an RStudio Addin offering shortcuts for writing in Markdown.

solartime v0.0.1: Provides functions for computing sun position and times of sunrise and sunset. The vignette offers an overview.

Visualization

easyalluvial v0.1.8: Provides functions that simplify building alluvial plots, which visualize categorical data over multiple dimensions as flows. See Rosvall and Bergstrom (2010). See the README for details.
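A quick sketch on a few categorical columns of mtcars (untested; it assumes alluvial_wide() accepts a data frame in wide format, one row per observation, and returns a ggplot object):

```r
library(easyalluvial)

# A few categorical variables from mtcars, one row per car
df <- data.frame(cyl  = factor(mtcars$cyl),
                 gear = factor(mtcars$gear),
                 am   = factor(mtcars$am))

p <- alluvial_wide(df)  # one flow per combination of categories
p
```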

spatialwidget v0.2: Provides functions for converting R objects, such as simple features, into structures suitable for use in htmlwidgets mapping libraries. See the vignette for details.

transformr v0.1.1: Provides an extensive framework for manipulating the shapes of polygons and paths and can be seen as the spatial brother to the tweenr package. See the README for details.

To leave a comment for the author, please follow the link and comment on their blog: R Views.
