[R-bloggers] another surmortality graph (and 11 more aRticles)


another surmortality graph

Posted: 27 Apr 2020 11:20 AM PDT

[This article was first published on R – Xi'an's Og, and kindly contributed to R-bloggers].

Another graph showing the recent peak in daily deaths throughout France, as recorded by INSEE and plotted by Baptiste Coulmont from the Paris 8 Sociology Department, and further discussed by Arthur Charpentier on Freakonometrics. Allowing for a few days' lag in reporting, this brings an objective perspective on the impact of the epidemic (and of the quarantine) compared with the other years since 2001, without requiring tests or even surveys. (The huge peak in August 2003 was a heat wave that decimated elderly citizens throughout France.)


W is for Write and Read Data – Fast

Posted: 27 Apr 2020 07:00 AM PDT

[This article was first published on Deeply Trivial, and kindly contributed to R-bloggers].

Once again, I'm dipping outside of the tidyverse, but this package and its functions have been really useful in getting data quickly in (and out) of R.

For work, I have to pull in data from a few different sources, and manipulate and work with them to produce the final dataset that I use for much of my analysis. So that I don't have to repeat all of that joining, recoding, and calculating each time, I created a final merged dataset as a CSV file that I can load when I need to continue my analysis. The problem is that the most recent version of that file, which contains more than 13 million records, was so large that writing it (and subsequently reading it back in) took forever and sometimes timed out.

That's when I discovered the data.table library and its fread and fwrite functions. The tidyverse is great for working with CSV files, but a lot of its memory and loading time goes into formatting. fread and fwrite are leaner and get the job done a bit faster. For regular-sized CSV files (like my reads2019 set), the time difference is pretty minimal. But for a 5GB datafile, it makes a huge difference.

library(tidyverse)
## -- Attaching packages ------------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.2.1     v purrr   0.3.3
## v tibble  2.1.3     v dplyr   0.8.3
## v tidyr   1.0.0     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.4.0
## -- Conflicts ---------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
system.time(reads2019 <- read_csv("~/Downloads/Blogging A to Z/SaraReads2019_allchanges.csv",
                                  col_names = TRUE))
## Parsed with column specification:
## cols(
##   Title = col_character(),
##   Pages = col_double(),
##   date_started = col_character(),
##   date_read = col_character(),
##   Book.ID = col_double(),
##   Author = col_character(),
##   AdditionalAuthors = col_character(),
##   AverageRating = col_double(),
##   OriginalPublicationYear = col_double(),
##   read_time = col_double(),
##   MyRating = col_double(),
##   Gender = col_double(),
##   Fiction = col_double(),
##   Childrens = col_double(),
##   Fantasy = col_double(),
##   SciFi = col_double(),
##   Mystery = col_double(),
##   SelfHelp = col_double()
## )
##    user  system elapsed 
##    0.00    0.10    0.14
rm(reads2019)

library(data.table)
## 
## Attaching package: 'data.table'
## 
## The following objects are masked from 'package:dplyr':
## 
##     between, first, last
## 
## The following object is masked from 'package:purrr':
## 
##     transpose
system.time(reads2019 <- fread("~/Downloads/Blogging A to Z/SaraReads2019_allchanges.csv"))
##    user  system elapsed 
##       0       0       0

But let's look at how long it took to read my work datafile. Here's the output from system.time.

read_csv:
##    user  system elapsed 
##   61.14   11.72   90.56 

fread:
##    user  system elapsed 
##   57.97   16.40   57.19 

But the real win is in how quickly this package writes CSV data. Using a package called wakefield, I'll randomly generate 10,000,000 records of survey data, then see how long it takes to write the data to a file using both write_csv and fwrite.

library(wakefield)
## Warning: package 'wakefield' was built under R version 3.6.3
## 
## Attaching package: 'wakefield'
## 
## The following objects are masked from 'package:data.table':
## 
##     hour, minute, month, second, year
## 
## The following object is masked from 'package:dplyr':
## 
##     id
set.seed(42)

reallybigshew <- r_data_frame(n = 10000000,
                              id,
                              race,
                              age,
                              smokes,
                              marital,
                              Start = hour,
                              End = hour,
                              iq,
                              height,
                              died)


system.time(write_csv(reallybigshew, "~/Downloads/Blogging A to Z/bigdata1.csv"))
##    user  system elapsed 
##  134.22    2.52  137.80
system.time(fwrite(reallybigshew, "~/Downloads/Blogging A to Z/bigdata2.csv"))
##    user  system elapsed 
##    8.65    0.32    2.77


R is everywhere

Posted: 27 Apr 2020 01:07 AM PDT

[This article was first published on Quantargo Blog, and kindly contributed to R-bloggers].

R is everywhere

  • Learn what R is all about
  • Get an overview of why R is useful
  • Submit your first code exercise

Introduction to R

The most powerful statistical computing language on the planet.

Norman Nie, Founder of SPSS

R is a programming language and environment for working with data. It is loved by statisticians and data scientists for its expressive code syntax and plentiful external libraries and tools, and it works on all major operating systems.

It is the Swiss army knife for data analysis and statistical computing (and you can make some pretty charts, too!). The R language is easily extensible with packages written by a large and growing community of developers around the world. You can find it pretty much anywhere—it is used by academic institutions, start-ups, international corporations and many more.

This is also reflected in its adoption: both downloads and the number of available packages have increased strongly over the years.

In 2020 R celebrates its 20th birthday with the release of version 4.0. And yes, it's free and open source 😀

Quiz: R Facts

Which of the following statements about R are correct?
Start Exercise

Why Use R?

R is a popular language for solving data analysis problems and is also used by people who do not traditionally consider themselves programmers. When creating charts and visualizations with R, you will find that you have far greater creative possibilities than in graphical applications such as Excel.

Here are some of the features R is most famous for:

Visualization: Creating beautiful graphs and visualizations is one of its biggest strengths. The core language already provides a rich set of tools used for plotting charts and for all kinds of graphics. The sky's the limit.

Reproducibility: Unlike spreadsheet software, R code is not coupled to specific datasets and can easily be reused across different projects, even when datasets exceed 1 million rows. Easily build reusable reports and automatically generate new versions as the data changes.

Advanced modelling: R provides the biggest and most powerful code base for data analysis in the world. The richness and depth of available statistical models is unparalleled and growing by the day, thanks to the huge community of open source package developers and contributors.

Automation: R code can also be used to automate reports or to perform data transformations and model computations. It can also be integrated in automated production workflows, cloud computing environments and modern database systems.

Quiz: Using R

What are the main reasons to use R compared to spreadsheet software?
Start Exercise

You R in Good Company

R is the de facto standard for statistical computing at academic institutions and companies around the world. Its great support for literate programming (code that can be combined with human-readable text) enables researchers and data scientists to create publication-ready reports which are easy to reproduce for reviewers.

The language has seen wide adoption across industries; see some examples below:

Information Technology

Pharma: Merck, Genentech (Roche), Novartis, Pfizer

Newspapers: The Economist, The New York Times, Financial Times

Finance

  • Banks: Bank of America, J.P. Morgan, Goldman Sachs, Credit Suisse, UBS, Deutsche Bank
  • Insurers: Lloyd's, Allianz

See also the R Consortium page for further information about industrial partners and initiatives.

Building Blocks

The R language consists of three fundamental building blocks, which we will have a look at in the following chapters:

  • Objects: Everything that exists is an object
  • Functions: Everything that happens is a function call
  • Interfaces: R connects well with many statistical algorithms and libraries

The most important object type in R is the vector. Vectors form the basis for (almost) all R data structures. Being strongly vector-oriented makes R a very expressive and powerful language.

Functions and operators make it easy to work with vectors and compute results.
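As a minimal illustration of this idea (my own example, not part of the original course material), a function or operator applied to a vector works on all of its elements at once:

# a numeric vector of five temperatures
temps <- c(21.3, 19.8, 24.1, 22.7, 20.5)

temps - mean(temps)    # vectorized: deviations from the mean
round(sqrt(temps), 2)  # functions apply element-wise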

One of R's greatest strengths is its flexibility in integrating new algorithms and building interfaces around them. R's package ecosystem lets you choose from thousands of open source models and libraries. The main package repository, called CRAN, hosts these packages and allows you to easily install and use them in your code.
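For example, installing and loading a CRAN package takes two calls (a generic sketch; ggplot2 stands in for any CRAN package):

install.packages("ggplot2")  # download and install from CRAN
library(ggplot2)             # load the package into your session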

Exercise: Submit your first code

This course has code exercises to help you learn and quickly explore new concepts. After entering code in the editor, hit the "Submit" button to execute it. The editor will give you feedback on your submission and display any output below the editor. If you need some additional help, use the "Get Hint" button.

To finish your first exercise, press the "Submit" button.

Start Exercise


Essential list of useful R packages for data scientists

Posted: 26 Apr 2020 11:40 PM PDT

[This article was first published on R – TomazTsql, and kindly contributed to R-bloggers].

I have written a couple of blog posts on R packages (here | here), and this post is a compilation of the most needed packages for data science, statistical work and everyday use of R.

Thousands of R packages are available on CRAN (with all its mirror sites), on GitHub and in individual developers' repositories.

Many useful functions are spread across many different R packages, and the same functionality often appears in several of them, so the choice ultimately boils down to user preference and the work at hand. From the perspective of a statistician and data scientist, I will cover the essential and major packages in sections. By no means is this a definitive list; it is only a personal preference.

1. Loading and importing data

Loading and reading data into the R environment is most likely one of the first steps, if not the most important one. Data is the fuel.

I break this into further subsections: reading data from binary files, from ODBC drivers and from SQL databases.


1.1. Importing from binary files

# Reading from SAS and SPSS
install.packages("Hmisc", dependencies = TRUE)

# Reading from Stata, Systat and Weka
install.packages("foreign", dependencies = TRUE)

# Reading from KNIME
install.packages(c("protr", "foreign"), dependencies = TRUE)

# Reading from Excel
install.packages(c("readxl", "xlsx"), dependencies = TRUE)

# Reading from TXT, CSV
install.packages(c("csv", "readr", "tidyverse"), dependencies = TRUE)

# Reading from JSON
install.packages(c("jsonlite", "rjson", "RJSONIO", "jsonvalidate"), dependencies = TRUE)

# Reading from AVRO
install.packages("sparkavro", dependencies = TRUE)

# Reading from Parquet files
install.packages("arrow", dependencies = TRUE)
devtools::install_github("apache/arrow/r")

# Reading from XML
install.packages("XML", dependencies = TRUE)

1.2. Importing from ODBC

This will cover most of the work with ODBC drivers:

install.packages(c("odbc", "RODBC"), dependencies = TRUE)


1.3. Importing from SQL Databases

Accessing a SQL database through a dedicated package can bring great benefits when pulling data from the database into an R data frame. In addition, I have added some useful R packages that make querying data in R much easier (RSQL) or even let you write SQL statements directly against data frames (sqldf), among other great features.

# Microsoft SQL Server
install.packages(c("mssqlR", "RODBC"), dependencies = TRUE)

# MySQL
install.packages(c("RMySQL", "dbConnect"), dependencies = TRUE)

# PostgreSQL
install.packages(c("postGIStools", "RPostgreSQL"), dependencies = TRUE)

# Oracle
install.packages(c("ODBC"), dependencies = TRUE)

# Amazon Redshift
install.packages(c("RRedshiftSQL"), dependencies = TRUE)

# SQLite
install.packages(c("RSQLite", "sqliter", "dbflobr"), dependencies = TRUE)

# General SQL packages
install.packages(c("RSQL", "sqldf", "poplite", "queryparser"), dependencies = TRUE)
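As a quick illustration of the sqldf idea (my own minimal example, not from the original post), you can run plain SQL directly against an in-memory data frame:

library(sqldf)

# query the built-in mtcars data frame with ordinary SQL
sqldf("SELECT cyl, COUNT(*) AS n, AVG(mpg) AS avg_mpg
       FROM mtcars
       GROUP BY cyl")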

2. Manipulating Data

Data engineering, copying, wrangling and manipulation are the very next tasks in the journey.

2.1. Cleaning data

Data cleaning is essential for handling outliers, NULL and N/A values and wrong values, for doing imputation or replacement, for checking frequencies and descriptive statistics, and for applying univariate, bivariate and multivariate statistical analyses to tackle these issues. The list is by no means complete, but it is a good starting point:

install.packages(c("janitor", "outliers", "missForest", "frequency", "Amelia",
                   "diffobj", "mice", "VIM", "Bioconductor", "mi",
                   "wrangle"), dependencies = TRUE)

2.2. Dealing with R data types and formats

Working with correct data types and knowing your way around formatting your dataset is easily overlooked, yet important. A list of must-have packages:

install.packages(c("stringr", "lubridate", "glue",
                   "scales", "hablar", "readr"), dependencies = TRUE)

2.3. Wrangling, subsetting and aggregating data

There are many packages available for wrangling, engineering and aggregating data; in particular, the {base} R package should not be overlooked, since it offers a lot of great and powerful features. The following, though, are the packages most widely used in the R community for maneuvering data:

install.packages(c("dplyr", "tidyverse", "purrr", "magrittr",
                   "data.table", "plyr", "tidyr", "tibble",
                   "reshape2"), dependencies = TRUE)

3. Statistical tests and Sampling Data

3.1. Statistical tests

Many of the statistical tests (Shapiro, t-test, Wilcoxon, equality, …) are available in the base and stats packages that ship with the R engine. This is great, because R is primarily a statistical language and many of the tests are already included. In addition, here are packages I have used:

install.packages(c("stats", "ggpubr", "lme4", "MASS", "car"),
                 dependencies = TRUE)

3.2. Data Sampling

Data sampling, working with samples and populations, working with inference and weights, and the different types of statistical data sampling can all be found in these brilliant packages, including some that are great for survey data.

install.packages(c("sampling", "icarus", "sampler", "SamplingStrata",
                   "survey", "laeken", "stratification", "simPop"),
                 dependencies = TRUE)

4. Statistical Analysis

Depending on the type of variable, the type of analysis and the results a statistician wants to get, here is a list of packages that should be part of the daily R environment when it comes to statistical analysis.

4.1. Regression Analysis

Frankly, one of the most important kinds of analysis:

install.packages(c("stats", "Lars", "caret", "survival", "gam", "glmnet",
                   "quantreg", "sgd", "BLR", "MASS", "car", "mlogit", "earth",
                   "faraway", "nortest", "lmtest", "nlme", "splines",
                   "sem", "WLS", "OLS", "pls", "2SLS", "3SLS", "tree", "rpart"),
                 dependencies = TRUE)

4.2. Analysis of variance

Distribution and data dispersion are core to understanding the data. Many of the tests for variance are already built into the R engine (package stats), but here are some more that might be useful for analyzing variance.

install.packages(c("caret", "rio", "car", "MASS", "FuzzyNumbers",
                   "stats", "ez"), dependencies = TRUE)

4.3. Multivariate analysis

Analysis using more than two variables is considered multivariate analysis. Excluding regression analysis and analysis of variance (already introduced in sections 4.1. and 4.2.), this covers statistical analysis of many variables, such as factor analysis, principal component analysis, canonical analysis, discriminant analysis and others:

install.packages(c("psych", "CCA", "CCP", "MASS", "icapca", "gvlma", "smacof",
                   "MVN", "rpca", "gpca", "EFA.MRFA", "MFAg", "MVar", "fabMix",
                   "fad", "spBFA", "cate", "mnlfa", "CSFA", "GFA", "lmds", "SPCALDA",
                   "semds", "superMDS", "vcd", "vcdExtra"),
                 dependencies = TRUE)

4.4. Classification and Clustering

Based on the different types of clustering and classification, there are many packages covering both. Some of the essential packages for clustering:

install.packages(c("fpc", "cluster", "treeClust", "e1071", "NbClust", "skmeans",
                   "kml", "compHclust", "protoclust", "pvclust", "genie", "tclust",
                   "ClusterR", "dbscan", "CEC", "GMCM", "EMCluster", "randomLCA",
                   "MOCCA", "factoextra", "poLCA"), dependencies = TRUE)

and for classification:

install.packages(c("tree", "e1071"))

4.5. Analysis of Time-series

Analysing time series and time-series data is easier with the following packages:

install.packages(c("ts", "zoo", "xts", "timeSeries", "tsModel", "TSMining",
                   "TSA", "fma", "fpp2", "fpp3", "tsfa", "TSdist", "TSclust", "feasts",
                   "MTS", "dse", "sazedR", "kza", "fable", "forecast", "tseries",
                   "nnfor", "quantmod"), dependencies = TRUE)

4.6. Network analysis

Analyzing networks is also part of statistical analysis. Some of the relevant packages:

install.packages(c("fastnet", "tsna", "sna", "networkR", "InteractiveIGraph",
                   "SemNeT", "igraph", "NetworkToolbox", "dyads",
                   "staTools", "CINNA"), dependencies = TRUE)

4.7. Analysis of text

Besides open text, one can analyse any kind of text, including word corpora, semantics and much more. A couple of packages to start with:

install.packages(c("tm", "tau", "koRpus", "lexicon", "sylly", "textir",
                   "textmineR", "MediaNews", "lsa", "SemNeT", "ngram", "ngramrr",
                   "corpustools", "udpipe", "textstem", "tidytext", "text2vec"),
                 dependencies = TRUE)

5. Machine Learning

R has a variety of good machine learning packages that are powerful and cover the full machine learning cycle. I break this down into its natural sections.

5.1. Building and validating  the models

Once you build one or more models and compare their results, it is also important to validate the models against a test set or other datasets. Here are powerful packages for model validation.

install.packages(c("tree", "e1071", "crossval", "caret", "rpart", "bcv",
                   "klaR", "EnsembleCV", "gencve", "cvAUC", "CVThresh",
                   "cvTools", "dcv", "cvms", "blockCV"), dependencies = TRUE)

5.2. Random forests packages

Some of the most useful packages for random forests and tree-based ensembles:

install.packages(c("randomForest", "grf", "ipred", "party", "randomForestSRC",
                   "grf", "BART", "Boruta", "LTRCtrees", "REEMtree", "refr",
                   "binomialRF", "superml"), dependencies = TRUE)

5.3. Regression-type (regression, boosting, gradient descent) algorithm packages

There are many regression-type machine learning algorithms, with additional boosting or gradient variants. Some very usable packages:

install.packages(c("earth", "gbm", "GAMBoost", "GMMBoost", "bst", "superml",
                   "sboost"), dependencies = TRUE)

5.4. Classification algorithms

Classification problems are covered by many packages, and many of these are also great for machine learning cases. A handful:

install.packages(c("rpart", "tree", "C50", "RWeka", "klaR", "e1071",
                   "kernlab", "svmpath", "superml", "sboost"),
                 dependencies = TRUE)

5.5. Neural networks

There are many types of neural networks, and different packages cover all of them. Here are a couple of very useful R packages for tackling neural networks.

install.packages(c("nnet", "gnn", "rnn", "spnn", "brnn", "RSNNS", "AMORE",
                   "simpleNeural", "ANN2", "yap", "yager", "deep", "neuralnet",
                   "nnfor", "TeachNet"), dependencies = TRUE)

5.6. Deep Learning

R has embraced deep learning, and many of the powerful SDKs and packages have been ported to R, making it very usable for R developers and the R machine learning community.

install.packages(c("deepnet", "RcppDL", "tensorflow", "h2o", "kerasR",
                   "deepNN", "Buddle", "automl"), dependencies = TRUE)

5.7. Reinforcement Learning

Reinforcement learning is gaining popularity, and more and more packages are being developed in R as well. Some of the very useful packages:

devtools::install_github("nproellochs/ReinforcementLearning")
install.packages(c("RLT", "ReinforcementLearning", "MDPtoolbox"),
                 dependencies = TRUE)

5.8. Model interpretability and explainability

The results of machine learning models can be a black box. Many packages aim to turn the black box into more of a "glass box", making the models more understandable, interpretable and explainable. Here are very powerful packages that do just that for many different machine learning algorithms.

install.packages(c("lime", "localModel", "iml", "EIX", "flashlight",
                   "interpret", "outliertree", "breakDown"),
                 dependencies = TRUE)

6. Visualisation

Visualisation of the data is not only the final step to understanding the data; it can also bring clarity to interpretation and help build a mental model around the data. A couple of packages that will help boost the visualisation:

install.packages(c("ggvis", "htmlwidgets", "maps", "sunburstR", "lattice",
                   "predict3d", "rgl", "rglwidget", "plot3Drgl", "ggmap", "ggplot2",
                   "plotly", "RColorBrewer", "dygraphs", "canvasXpress", "qgraph",
                   "moveVis", "ggcharts", "igraph", "visNetwork", "visreg", "VIM",
                   "sjPlot", "plotKML", "squash", "statVisual", "mlr3viz", "klaR",
                   "DiagrammeR", "pavo", "rasterVis", "timelineR", "DataViz", "d3r",
                   "d3heatmap", "dashboard", "highcharter",
                   "rbokeh"), dependencies = TRUE)

7. Web Scraping

Many R packages are specifically designed to scrape (harvest) data from a particular website, API or archive. Here are just a couple of very generic ones:

install.packages(c("rvest", "Rcrawler", "ralger", "scrapeR"),
                 dependencies = TRUE)

8. Documents and books organisation

To organise your documents (files, code, packages, diagrams, pictures) into a readable document, dashboard or book view, there are a couple of packages for this purpose:

install.packages(c("devtools", "usethis", "roxygen2", "knitr",
                   "rmarkdown", "flexdashboard", "shiny",
                   "xtable", "httr", "profvis"), dependencies = TRUE)

Wrap up

The R script for loading and installing the packages is available on GitHub. Make sure to check the GitHub repository for the latest list updates. And as always, feel free to fork the code or commit updates, add essential packages to the list, comment, improve, and agree or disagree.

You can also run the following command to install all of the packages in a single run:

install.packages(c("Hmisc", "foreign", "protr", "readxl", "xlsx",
                   "csv", "readr", "tidyverse", "jsonlite", "rjson",
                   "RJSONIO", "jsonvalidate", "sparkavro", "arrow", "feather",
                   "XML", "odbc", "RODBC", "mssqlR", "RMySQL",
                   "dbConnect", "postGIStools", "RPostgreSQL", "ODBC",
                   "RSQLite", "sqliter", "dbflobr", "RSQL", "sqldf",
                   "poplite", "queryparser", "influxdbr", "janitor", "outliers",
                   "missForest", "frequency", "Amelia", "diffobj", "mice",
                   "VIM", "Bioconductor", "mi", "wrangle", "mitools",
                   "stringr", "lubridate", "glue", "scales", "hablar",
                   "dplyr", "purrr", "magrittr", "data.table", "plyr",
                   "tidyr", "tibble", "reshape2", "stats", "Lars",
                   "caret", "survival", "gam", "glmnet", "quantreg",
                   "sgd", "BLR", "MASS", "car", "mlogit", "RRedshiftSQL",
                   "earth", "faraway", "nortest", "lmtest", "nlme",
                   "splines", "sem", "WLS", "OLS", "pls",
                   "2SLS", "3SLS", "tree", "rpart", "rio",
                   "FuzzyNumbers", "ez", "psych", "CCA", "CCP",
                   "icapca", "gvlma", "smacof", "MVN", "rpca",
                   "gpca", "EFA.MRFA", "MFAg", "MVar", "fabMix",
                   "fad", "spBFA", "cate", "mnlfa", "CSFA",
                   "GFA", "lmds", "SPCALDA", "semds", "superMDS",
                   "vcd", "vcdExtra", "ks", "rrcov", "eRm",
                   "MNP", "bayesm", "ltm", "fpc", "cluster",
                   "treeClust", "e1071", "NbClust", "skmeans", "kml",
                   "compHclust", "protoclust", "pvclust", "genie", "tclust",
                   "ClusterR", "dbscan", "CEC", "GMCM", "EMCluster",
                   "randomLCA", "MOCCA", "factoextra", "poLCA", "ts",
                   "zoo", "xts", "timeSeries", "tsModel", "TSMining",
                   "TSA", "fma", "fpp2", "fpp3", "tsfa",
                   "TSdist", "TSclust", "feasts", "MTS", "dse",
                   "sazedR", "kza", "fable", "forecast", "tseries",
                   "nnfor", "quantmod", "fastnet", "tsna", "sna",
                   "networkR", "InteractiveIGraph", "SemNeT", "igraph",
                   "dyads", "staTools", "CINNA", "tm", "tau", "NetworkToolbox",
                   "koRpus", "lexicon", "sylly", "textir", "textmineR",
                   "MediaNews", "lsa", "ngram", "ngramrr", "corpustools",
                   "udpipe", "textstem", "tidytext", "text2vec", "crossval",
                   "bcv", "klaR", "EnsembleCV", "gencve", "cvAUC",
                   "CVThresh", "cvTools", "dcv", "cvms", "blockCV",
                   "randomForest", "grf", "ipred", "party", "randomForestSRC",
                   "BART", "Boruta", "LTRCtrees", "REEMtree", "refr",
                   "binomialRF", "superml", "gbm", "GAMBoost", "GMMBoost",
                   "bst", "sboost", "C50", "RWeka", "klaR",
                   "kernlab", "svmpath", "nnet", "gnn", "rnn",
                   "spnn", "brnn", "RSNNS", "AMORE", "simpleNeural",
                   "ANN2", "yap", "yager", "deep", "neuralnet",
                   "TeachNet", "deepnet", "RcppDL", "tensorflow", "h2o",
                   "kerasR", "deepNN", "Buddle", "automl", "RLT",
                   "ReinforcementLearning", "MDPtoolbox", "lime", "localModel",
                   "iml", "EIX", "flashlight", "interpret", "outliertree",
                   "dockerfiler", "azuremlsdk", "sparklyr", "cloudml", "ggvis",
                   "htmlwidgets", "maps", "sunburstR", "lattice", "predict3d",
                   "rgl", "rglwidget", "plot3Drgl", "ggmap", "ggplot2",
                   "plotly", "RColorBrewer", "dygraphs", "canvasXpress", "qgraph",
                   "moveVis", "ggcharts", "visNetwork", "visreg", "sjPlot",
                   "plotKML", "squash", "statVisual", "mlr3viz", "DiagrammeR",
                   "pavo", "rasterVis", "timelineR", "DataViz", "d3r", "breakDown",
                   "d3heatmap", "dashboard", "highcharter", "rbokeh", "rvest",
                   "Rcrawler", "ralger", "scrapeR", "devtools", "usethis",
                   "roxygen2", "knitr", "rmarkdown", "flexdashboard", "shiny",
                   "xtable", "httr", "profvis"), dependencies = TRUE)


Happy R-ing. 🙂



#26: Upgrading to R 4.0.0

Posted: 26 Apr 2020 05:18 PM PDT

[This article was first published on Thinking inside the box, and kindly contributed to R-bloggers].

Welcome to the 26th post in the rationally regularized R revelations series, or R4 for short.

R 4.0.0 was released two days ago, and a casual glance at some social media conversations appears to suggest quite a bit of confusion, almost certainly some misunderstandings, and possibly also a fair amount of fear, uncertainty, and doubt about the process. So I thought I could show how I upgrade my own main workstation, live and in colour, without a safety net. (Almost: I did upgrade my laptop yesterday, which went swimmingly, if more slowly.) So here is a fresh video about upgrading to R 4.0.0, with some support slides as usual:

The slides used in the video are at this link.

A few quick follow-ups on the 'live' nature of this. The pbdZMQ package did in fact install smoothly once the (Ubuntu) -dev packages for ZeroMQ were (re-)installed; IRkernel then followed. Bioconductor completed once I realized that GOSemSim needed the annotation package GO.db to be updated, which allowed MNF to install. So the only real bug was the circular dependency between pkgload and testthat. Overall, not bad at all for a quick afternoon session!

And as mentioned, if you are interested and have questions concerning the use of R on a .deb-based system like Debian or Ubuntu (or Mint or …), the r-sig-debian list is a very good and friendly place to ask them.

If you like this or other open-source work I do, you can now sponsor me at GitHub. For the first year, GitHub will match your contributions.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.


Get all your packages back on R 4.0.0

Posted: 26 Apr 2020 05:00 PM PDT

[This article was first published on Johannes B. Gruber on Johannes B. Gruber, and kindly contributed to R-bloggers].

R 4.0.0 was released on 2020-04-24.
Among the many news items, two stand out for me:
First, R now uses stringsAsFactors = FALSE by default, which is especially welcome when reading in data (e.g., via read.csv) and when constructing data.frames.
The second item that caught my eye was that all packages need to be reinstalled under the new version.
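A quick illustration of the first change (my own minimal example, not from the original post):

# under R 4.0.0, character columns stay character by default
df <- data.frame(x = c("a", "b"))
class(df$x)
## [1] "character"   # R 3.6.x returned "factor" by default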

This can be rather cumbersome if you have collected a large number of packages on your machine while using R 3.6.x and you don't want to spend the next weeks running into Error in library(x) : there is no package called 'x' errors.
But there is an easy way to solve this.

After you made the update, first get your old packages:

old_packages <- installed.packages(lib.loc = "/home/johannes/R/x86_64-pc-linux-gnu-library/3.6/")
head(old_packages[, 1])
##       abind     acepack        ade4         AER   animation   anomalize 
##     "abind"   "acepack"      "ade4"       "AER" "animation" "anomalize"

lib.loc should be the location where you installed your packages before updating to R 4.0.0.
If unsure, you can call .libPaths().
The first path is your new lib.loc, and the previous one should look the same except ending in 3.6.
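For example, the output might look like this (paths are illustrative, not from the original post):

.libPaths()
## [1] "/home/johannes/R/x86_64-pc-linux-gnu-library/4.0"
## [2] "/usr/lib/R/library"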

Then you can find the packages previously installed but currently missing:

new_packages <- installed.packages()
missing_df <- as.data.frame(old_packages[
  !old_packages[, "Package"] %in% new_packages[, "Package"],
])

missing_df now contains all packages you had previously installed that are not present now.
In an intermediate step you might want to clean up this list a bit, as you might not want all former packages back (I just used write.csv to export it, annotated the list and read it back in with read.csv).

Once this is done, you can install your packages back:

install.packages(missing_df$Package)

This can run for a while…

Once the installations are done, you can check the missing packages again:

missing_df <- as.data.frame(old_packages[
  !old_packages[, 1] %in% installed.packages()[, 1],
])

If you've got all your packages back, missing_df should have zero rows.
If not, you might have had some packages which are not currently on CRAN.
For me those are usually packages only available on GitHub so far.
I used a nice little piece of code I found in the available package to find the repositories of these packages:

library(dplyr, warn.conflicts = FALSE)

on_gh <- function(pkg) {
  repo <- jsonlite::fromJSON(paste0("http://rpkg-api.gepuro.net/rpkg?q=", pkg))
  repo[basename(repo$pkg_name) == pkg, ]
}

gh_pkgs <- lapply(c("quanteda.classifiers", "emo"), on_gh) %>%
  bind_rows()

as_tibble(gh_pkgs)
## # A tibble: 2 x 3
##   pkg_name             title                            url                       
##   <chr>                <chr>                            <chr>                     
## 1 quanteda/quanteda.…  quanteda textmodel extensions …  https://github.com/quante…
## 2 hadley/emo           Easily insert emoji into R and…  https://github.com/hadley…

Check if this grabbed the correct ones, then you can install them using remotes::install_github(gh_pkgs$pkg_name).

For me, that was it.
Your mileage may vary if some of your packages were removed from CRAN in the meantime or if you use other repos (e.g., Bioconductor).


Updating to 4.0.0 on MacOS

Posted: 26 Apr 2020 05:00 PM PDT

[This article was first published on Posts on R Lover ! a programmer, and kindly contributed to R-bloggers].

Mixed emotions

Wow! Has it been a year? Another major update from The R Foundation (the recent 4.0.0 release in April). I'm always happy to see the continuing progress and the combination of new features and bug fixes, but I also dread the upgrade because it means I have to address the issue of what to do about the burgeoning number of packages (libraries) I have installed. I wrote a fairly comprehensive post about it last year. I just took the plunge this year and almost everything seems to still work. Vindication!

The details are here in the old post, but since this is timely I republish the basics.

I'm aware that there are full-fledged package managers like packrat and checkpoint, and even a package designed to manage the upgrade for you on Windows, but I'm a Mac user, wanted to do things my own way, and don't need that level of sophistication.

So I set out to do the following:

  1. Capture a list of everything I had installed under R 3.6.x and, very importantly, as much as I could about where I got each package, e.g. CRAN or GitHub or ???
  2. Keep a copy for my own edification and potential future use.
  3. Do a clean R 4.0.0 install and not copy any library directories manually or create symlinks or any other thing at the OS level.
  4. Use the list I produced in #1 above mainly to download and install the exact same packages if I can find them.
  5. Make the process mainly scripted and automatic and available again for the future – it worked this year, let's hope it works again next.

Before you upgrade!

Let's load tidyverse to have access to all its various functions and features, and then build a dataframe called allmypackages with the basic information about the packages I currently have installed under R 3.6.3.

Note – I'm writing this after already upgrading, so there will be a few inconsistencies in the output

  • This could just as easily be a tibble but I chose as.data.frame
  • I am deliberately removing base packages from the dataframe by filter
  • I am eliminating columns I really don't care about with select
require(tidyverse)

allmypackages <- as.data.frame(installed.packages())

allmypackages <- allmypackages %>%
  filter(Priority != "base" | is.na(Priority)) %>%
  select(-c(Enhances:MD5sum, LinkingTo:Suggests)) %>%
  droplevels()

str(allmypackages)

A function to do the hard work

As I mentioned above, the Stack Overflow post was a good start, but I wanted more information from the function. Rather than TRUE/FALSE for whether it is GitHub, I would like as much information as possible about where I got the package. The package_source function will be applied to the Package column for each row of our dataframe. For example, as.character(packageDescription("ggplot2")$Repository) will return "CRAN", and as.character(packageDescription("CHAID")$Repository) will yield "R-Forge". For GitHub packages the result is character(0), which has a length of zero. So we'll test with an if else clause. If we get an answer like "CRAN", we'll just return it. If not, we'll see if there is a GitHub repo listed with as.character(packageDescription(pkg)$GithubRepo), as well as a GitHub username with as.character(packageDescription(pkg)$GithubUsername). If they exist, we'll concatenate and return them. If not, we'll return "Other". Besides being good defensive programming, this may catch a package you have built for yourself, as is the case for me.

package_source <- function(pkg) {
  x <- as.character(packageDescription(pkg)$Repository)
  if (length(x) == 0) {
    y <- as.character(packageDescription(pkg)$GithubRepo)
    z <- as.character(packageDescription(pkg)$GithubUsername)
    if (length(y) == 0) {
      return("Other")
    } else {
      return(str_c("GitHub repo = ", z, "/", y))
    }
  } else {
    return(x)
  }
}

# show the first 60 as an example
head(sapply(allmypackages$Package, package_source), 60)

What's in your libraries?

Now that we have the package_source function, we can add a column to our data frame and do a little looking around.

allmypackages$whereat <- sapply(allmypackages$Package, package_source)
str(allmypackages)

table(allmypackages$whereat)

allmypackages %>%
  filter(whereat == "Other") %>%
  select(Package, Version)

And just to be on the safe side, we'll also write a copy out as a CSV file so we have it around in case we ever need to refer back to it.

write.csv(allmypackages, "mypackagelistApril2020.csv")

Go ahead and install R 4.0.0

At this point we have what we need, so go ahead and download and install R
4.0.0. At the end of the installation process you'll have a pristine copy with a
new (mostly empty) library directory (on my system it's
/Library/Frameworks/R.framework/Versions/4.0/). When next you restart R and R
Studio you'll see a clean new version. Let's make use of our data frame to
automate most of the process of getting nice clean copies of the libraries we
want.

We'll start by getting the entire tidyverse since we need several parts and
because installing it will trigger the installation of quite a few dependencies
and bootstrap our work.

# post upgrade with output suppressed
install.packages("tidyverse")
library(tidyverse)

Now we have R 4.0.0 and some additional packages. Let's see what we can do. First let's create two dataframes, one with our old list and one with what we have right now. Then we can use anti_join to make a dataframe, thediff, that lists the differences. We can use filter and pull to generate a vector of just the packages on CRAN that we want to install.

oldpackages <- read.csv("mypackagelistApril2020.csv")

allmypackages <- as.data.frame(installed.packages())
allmypackages <- allmypackages %>%
  filter(Priority != "base" | is.na(Priority)) %>%
  select(-c(Enhances:MD5sum, LinkingTo:Suggests))

thediff <- anti_join(oldpackages, allmypackages, by = "Package")
thediff <- droplevels(thediff)

thediff %>%
  filter(whereat == "CRAN") %>%
  pull(Package) %>%
  as.character

Just do it!

Now that you have a nice automated list of every CRAN package, you can give it a final look and see if there is anything else you'd like to filter out. Once you are sure the list is right, one final pipe will set the process in motion.

thediff %>%
  filter(whereat == "CRAN") %>%
  pull(Package) %>%
  as.character %>%
  install.packages

Depending on the speed of your network connection and the number of packages you have, that will run for a few minutes.

That takes care of our CRAN packages. What about GitHub? Here's another chance to review what you have and whether you still want or need these packages. You can automate the process and once again feed the right vector to devtools::install_github().

# Manual peek
thediff %>%
  filter(str_detect(whereat, "GitHub repo")) %>%
  select(Package, Version, NeedsCompilation, whereat)

# if you want to automate
thediff %>%
  filter(str_detect(whereat, "GitHub repo")) %>%
  pull(whereat) %>%
  as.character %>%
  str_remove("GitHub repo = ") %>%
  devtools::install_github()

Same with the one package I get from R-Forge…

allmypackages %>%
  filter(str_detect(whereat, "R-Forge")) %>%
  select(Package, Version, NeedsCompilation, whereat)

install.packages("CHAID", repos = "http://R-Forge.R-project.org")

At the end of this process you should have a nice clean R install that has all
the packages you choose to maintain as well as a detailed listing of what those
are.

Done

Hope you enjoyed the post. Comments are always welcome. Especially please let me know if you actually use the tools and find them useful.

Chuck

CC BY-SA 4.0

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License


An adventure in downloading books

Posted: 26 Apr 2020 05:00 PM PDT

[This article was first published on Anindya Mozumdar, and kindly contributed to R-bloggers].

Earlier today, I noticed a tweet from well-known R community member Jozef Hajnala. The tweet was about Springer releasing around 65 books related to data science and machine learning as free PDF downloads. Following the link in his tweet, I learned that Springer has released 408 books in total, of which 65 are related to the field of data science. The author of the blog post did a nice job of providing links to the Springer website for each of these books. While browsing through a couple of the links, it appeared to me that the links are all well structured, and it would be worth a try to write an R script to download all of the books.

My first impulse was to use the rvest package. However, I found it hard to scrape the page on the "Towards Data Science" website, as it is probably generated using JavaScript rather than simple HTML. After a few minutes of research, I discovered the Rcrawler package, which has some functions that suit my needs. While I have heard of headless browsers before, this was my first experience using one. Rcrawler itself installs PhantomJS, with which one can mimic 'visiting' a web page from code. The LinkExtractor function from Rcrawler is a nice function that gives you the internal and external links present in a page. It also provides some general information on the page, which was useful for extracting the name of each book.

Given the well-structured pages on the Springer website, it took only some simple string manipulation to find a way to generate the link to the actual PDF of each book. After that, it was a simple call to the R function download.file. As a result of this exercise, I also learned two new things (see the short illustration after the list):

  • Using a regular expression to remove the last 2 characters of a string.
  • The 'wb' mode in download.file. In my initial experiments, I was facing some issues with the downloaded PDFs, which were solved by using this mode.
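A minimal standalone illustration of both points (my own sketch, not code from the post):

# 1. A regex anchored at the end of the string drops the last two characters:
gsub(".{2}$", "", "12345")   # returns "123"

# 2. mode = "wb" writes the download in binary mode, which keeps PDFs intact;
#    the URL and file name here are placeholders:
# download.file("https://example.com/book.pdf", "book.pdf", mode = "wb")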

Overall, an hour of effort based on a tweet, and I learned a few things. I will most likely not have the time to read most (or any) of these books, but at least it helped me learn some new stuff in R. Time well spent.

library(Rcrawler)

install_browser() # One time only

br <- run_browser()
page <- LinkExtractor(url = "https://towardsdatascience.com/springer-has-released-65-machine-learning-and-data-books-for-free-961f8181f189",
                      Browser = br, ExternalLInks = TRUE)

el <- page$ExternalLinks
sprlnks <- el[grep("springer", el, fixed = TRUE)]

for (sprlnk in sprlnks) {
  spr_page <- LinkExtractor(sprlnk)
  il <- spr_page$InternalLinks
  ttl <- spr_page$Info$Title
  ttl <- trimws(strsplit(ttl, "|", fixed = TRUE)[[1]][1])
  chapter_link <- il[grep("chapter", il, fixed = TRUE)][1]
  chp_splits <- strsplit(chapter_link, "/", fixed = TRUE)
  n <- length(chp_splits[[1]])
  suff <- chp_splits[[1]][n]
  suff <- gsub(".{2}$", "", suff)
  pref <- chp_splits[[1]][n - 1]
  final_url <- paste0("https://link.springer.com/content/pdf/", pref, "/",
                      suff, ".pdf")
  print(final_url)
  download.file(final_url, paste0(ttl, ".pdf"), mode = "wb")
  Sys.sleep(5)
}

stop_browser(br)


ChemoSpecUtils Update

Posted: 26 Apr 2020 05:00 PM PDT

[This article was first published on R on Chemometrics & Spectroscopy using R, and kindly contributed to R-bloggers].

ChemoSpecUtils, a package that supports the common needs of ChemoSpec and ChemoSpec2D, has been updated to fix an unfortunate distance calculation error in version 0.4.38, released in January of this year. From the NEWS file for version 0.4.51:

  • Function rowDist, which supports a number of functions, was overhauled to address confusion in the documentation, and in my head, about distances vs. similarities. Also, different definitions found in the literature were documented more clearly. The Minkowski distance option was removed (ask if you want it back), code was cleaned up, documentation greatly improved, an example was added and unit tests were added. Plot scales were also corrected as necessary. Depending upon which distance option is chosen, this change affects hcaSpectra, plotSpectraDist, sampleDist and hcaScores in package ChemoSpec as well as hats_alignSpectra2D and hcaScores in package ChemoSpec2D.

This brings to mind a Karl Broman quote I think about frequently:

"Open source means everyone can see my stupid mistakes.
Version control means everyone can see every stupid mistake I've ever made."

Karl Broman

Karl Broman quote source


Proofs without Words using gganimate

Posted: 25 Apr 2020 05:00 PM PDT

[This article was first published on R on Notes of a Dabbler, and kindly contributed to R-bloggers].

I recently watched the two-part workshop (part 1, part 2) on ggplot2 and extensions given by Thomas Lin Pedersen. First off, it was really nice of Thomas to give the nearly four-hour workshop for the benefit of the community. I personally learnt a lot from it. I wanted to try out the gganimate extension that was covered during the workshop.

There are several resources on the web that show animations/illustrations of proofs of mathematical identities and theorems without words (or close to it). I wanted to take a few of those examples and use gganimate to recreate the illustration. This was a fun way for me to try out gganimate.

Example 1:

This example is taken from AoPS Online and the result is that the sum of the first \(n\) odd numbers equals \(n^2\): \[ 1 + 3 + 5 + \ldots + (2n - 1) = n^2 \] The gganimate version of the proof (using the method in AoPS Online) is shown below (R code, html file).
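
The finished animation lives at the links above; as a flavor of how gganimate can drive such a proof, here is a minimal standalone sketch (my own illustration, not the post's code) that reveals the \(n \times n\) square one L-shaped gnomon of \(2k - 1\) tiles at a time:

library(ggplot2)
library(gganimate)

n <- 6
tiles <- expand.grid(x = 1:n, y = 1:n)
tiles$k <- pmax(tiles$x, tiles$y) # k-th L-shaped gnomon has 2k - 1 tiles

p <- ggplot(tiles, aes(x, y, fill = factor(k))) +
  geom_tile(colour = "white", show.legend = FALSE) +
  coord_equal() +
  theme_void() +
  transition_states(k, wrap = FALSE) + # reveal one gnomon per state
  shadow_mark() # keep earlier gnomons on screen

animate(p, nframes = 60, fps = 10)

After the last state, all \(1 + 3 + \ldots + (2n - 1)\) tiles fill the \(n \times n\) square.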

Example 2:

This example is also taken from AoPS Online and the result is:

\[ 1^3 + 2^3 + \ldots + (n-1)^3 + n^3 = (1 + 2 + \ldots + n)^2 \] The gganimate version of the proof (using the method in AoPS Online) is shown below (R code, html file):
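
For reference, the identity amounts to the standard closed form for the sum of cubes: since \(1 + 2 + \ldots + n = n(n+1)/2\),

\[ 1^3 + 2^3 + \ldots + n^3 = \left( \frac{n(n+1)}{2} \right)^2 = \frac{n^2(n+1)^2}{4}, \]

which can be verified by induction on \(n\).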

Example 3:

This example from AoPS Online illustrates the result

\[ \frac{1}{2^2} + \frac{1}{2^4} + \frac{1}{2^6} + \frac{1}{2^8} + \ldots = \frac{1}{3} \] The gganimate version of the proof (using the method in AoPS Online) is shown below (R code, html file):
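
As a quick algebraic check (independent of the picture), the left-hand side is a geometric series with first term and common ratio \(1/4\):

\[ \sum_{k=1}^{\infty} \frac{1}{2^{2k}} = \sum_{k=1}^{\infty} \left( \frac{1}{4} \right)^k = \frac{1/4}{1 - 1/4} = \frac{1}{3}. \]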

Example 4:

According to Pythagoras' theorem, \[ a^2 + b^2 = c^2 \] where \(a\), \(b\), \(c\) are the sides of a right-angled triangle (with \(c\) being the side opposite the \(90^\circ\) angle).

There was an illustration of the proof of Pythagoras' theorem in a video from echalk.

The gganimate version of the proof is shown below (R code, html file).

In summary, it was great to use gganimate for these animations, since it does all the magic of making the transitions work nicely.


To leave a comment for the author, please follow the link and comment on their blog: R on Notes of a Dabbler.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.


A package to download free Springer books during Covid-19 quarantine

Posted: 25 Apr 2020 05:00 PM PDT

[This article was first published on R on Stats and R, and kindly contributed to R-bloggers]. (You can report issues about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Introduction

You have probably already seen that Springer released about 500 books for free following the COVID-19 pandemic. According to Springer, these textbooks will be available free of charge until at least the end of July.

Following this announcement, I already downloaded a couple of statistics and R programming textbooks from their website and I will probably download a few more in the coming weeks.

In this article, I present a package that saved me a lot of time and which may be of interest to many of us: the {springerQuarantineBooksR} package, developed by Renan Xavier Cortes.1

This package allows you to easily download all (or a selection of) Springer books made available free of charge during the COVID-19 quarantine.

With this large collection of high-quality resources, together with my collection of top R resources about the Coronavirus, we have no excuse not to read and learn during this quarantine.

Without further ado, here is how the package works in practice.

Installation

After having installed the {devtools} package, you can install the {springerQuarantineBooksR} package from GitHub with:

# install.packages("devtools")
devtools::install_github("renanxcortes/springerQuarantineBooksR")
library(springerQuarantineBooksR)

Download all books at once

First, set the path where you would like to save all books with the setwd() function, then download all of them at once with the download_springer_book_files() function. Note that it takes several minutes, since all books combined amount to almost 8 GB.

setwd("path_of_your_choice") # where you want to save the books
download_springer_book_files(parallel = TRUE)

You will find all downloaded books (in PDF format) in a folder named "springer_quarantine_books", organized by category.2
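
Once the download finishes, a quick way to inspect what arrived is base R's list.files() (a sketch; it assumes the default folder name mentioned above):

# List the downloaded PDFs across the per-category subfolders
pdfs <- list.files("springer_quarantine_books",
                   pattern = "\\.pdf$", recursive = TRUE)
length(pdfs) # how many books were downloaded
head(pdfs)   # a few file paths, prefixed by their category folder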

Create a table of Springer books

You can load into an R session a table containing all the titles made available by Springer, with the download_springer_table() function:

springer_table <- download_springer_table()
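
Before styling it, you can take a quick look at the table with base functions (the column names shown here, such as book_title, are the ones used in the filtering examples below):

dim(springer_table)             # number of titles and columns
names(springer_table)           # available columns
head(springer_table$book_title) # a few of the free titles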

This table can then be improved with the {DT} package to:

  • allow searching a book by its title or author
  • allow downloading the list of available books, and
  • make the Springer links clickable for instance
# install.packages("DT")
library(DT)

# Turn the link column into a clickable HTML anchor
springer_table$open_url <- paste0(
  '<a href="', springer_table$open_url, '" target="_blank">', # opening HTML tag
  "SpringerLink", # text to display
  "</a>" # closing HTML tag
)

datatable(springer_table,
  rownames = FALSE, # remove row numbers
  filter = "top", # add filter on top of columns
  extensions = "Buttons", # add download buttons
  options = list(
    autoWidth = TRUE,
    dom = "Blfrtip", # location of the download buttons
    buttons = c("copy", "csv", "excel", "pdf", "print"), # download buttons
    pageLength = 5, # show first 5 entries, default is 10
    order = list(0, "asc") # order the title column by ascending order
  ),
  escape = FALSE # make URLs clickable
)

Download only specific books

By title

Now, say that you are interested in downloading only one specific book and you know its title. For instance, suppose you want to download the book entitled "All of Statistics":

download_springer_book_files(springer_books_titles = "All of Statistics")

If you are interested in downloading all books with the word "Statistics" in the title, you can run:

springer_table <- download_springer_table()

library(dplyr)
library(stringr) # for str_detect()

specific_titles_list <- springer_table %>%
  filter(str_detect(
    book_title, # look for a pattern in the book_title column
    "Statistics" # specify the title
  )) %>%
  pull(book_title)

download_springer_book_files(springer_books_titles = specific_titles_list)

By author

If you want to download all books from a specific author, you can run:

springer_table <- download_springer_table()

# library(dplyr)
# library(stringr)

specific_titles_list <- springer_table %>%
  filter(str_detect(
    author, # look for a pattern in the author column
    "John Hunt" # specify the author
  )) %>%
  pull(book_title)

download_springer_book_files(springer_books_titles = specific_titles_list)

By subject

You can also download all books covering a specific subject:

springer_table <- download_springer_table()

# library(dplyr)
# library(stringr)

specific_titles_list <- springer_table %>%
  filter(str_detect(
    subject_classification, # look for a pattern in the subject_classification column
    "Statistics" # specify the subject
  )) %>%
  pull(book_title)

download_springer_book_files(springer_books_titles = specific_titles_list)
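
These filters can of course be combined. For instance, here is a sketch (with an arbitrary, hypothetical title pattern) restricting both the subject and the title:

# Sketch: statistics books whose title mentions regression (hypothetical pattern)
specific_titles_list <- springer_table %>%
  filter(
    str_detect(subject_classification, "Statistics"),
    str_detect(book_title, "Regression")
  ) %>%
  pull(book_title)

download_springer_book_files(springer_books_titles = specific_titles_list)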

Acknowledgments

I would like to thank:

  • Renan Xavier Cortes (and all contributors) for providing this package
  • The springer_free_books project, which served as inspiration for the {springerQuarantineBooksR} package
  • And last but not least, Springer, which offers many of its excellent books for free!

Thanks for reading. I hope this article will help you download and read more of the high-quality material made available by Springer during this Covid-19 quarantine.

As always, if you have a question or a suggestion related to the topic covered in this article, please add it as a comment so other readers can benefit from the discussion.

Get updates every time a new article is published by subscribing to this blog.


  1. I thank the author for allowing me to present his package in a blog post.

  2. Note that you can change the folder name by specifying the argument destination_folder = "folder_name".

To leave a comment for the author, please follow the link and comment on their blog: R on Stats and R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

