[R-bloggers] Guide to a high-performance, powerful R installation (and 7 more aRticles)

Guide to a high-performance, powerful R installation

Posted: 31 Aug 2018 01:08 PM PDT

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

R is an amazing language with 25 years of development behind it, but you can make the most of R with additional components. An IDE makes developing in R more convenient; packages extend R's capabilities; and multi-threaded libraries make computations faster.

Since these additional components aren't included on the official R website, getting the ideal R environment set up can be a bit tricky. Fortunately, there's a handy R installation guide by Mauricio Vargas that explains how to get everything you need set up on Windows, Mac and Ubuntu Linux. On each platform, the guide describes how to install:

  • The R language engine
  • The RStudio IDE
  • The tidyverse suite of packages
  • Multi-threaded math libraries (BLAS). On Windows, Mauricio recommends Microsoft R Open ("what made my R and Windows experience amazing"). For Mac and Linux he suggests installing OpenBLAS, but I'll add that Microsoft R Open provides BLAS acceleration on those platforms as well. It's easy to configure RStudio to use Microsoft R Open, too. (A quick way to check which BLAS your session is using is sketched below.)
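
Once everything is installed, you can check which BLAS your R session is actually linked against. Here's a minimal sketch: sessionInfo() reports the BLAS and LAPACK libraries in R 3.4 and later, and the timing below is just a rough benchmark whose numbers will vary by machine.

# which BLAS/LAPACK is this R session using? (reported by R 3.4 and later)
sessionInfo()

# rough benchmark: matrix multiplication is dramatically faster with a
# multi-threaded BLAS than with R's single-threaded reference BLAS
m <- matrix(rnorm(2000 * 2000), nrow = 2000)
system.time(m %*% m)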

Find all the details in the installation guide, linked below.

DataCamp Community: How to install R on Windows, Mac OS X and Ubuntu

To leave a comment for the author, please follow the link and comment on their blog: Revolutions.

Datazar Desktop v1.0.0 Released

Posted: 31 Aug 2018 07:35 AM PDT

(This article was first published on R Language in Datazar Blog on Medium, and kindly contributed to R-bloggers)

Today we're releasing Datazar's desktop application for Mac. This is the debut of the desktop app and we're thrilled to finally get it in the hands of everyone. The program gives you the might of the cloud while providing an experience that can only be achieved via installed applications.

Create R and Python notebooks, scripts, and consoles and get results so fast you can't even tell you're using the cloud. The app also includes Datazar Paper, which is our very own interactive and reproducible technical/scientific paper.

We'll continually release updates, so if there is something you want added (or upgraded), tweet at us @datazarhq or email support@datazar.com and we'll add it to the feature list.

Download the desktop app here: https://www.datazar.com/desktop


Datazar Desktop v1.0.0 Released was originally published in Datazar Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.

To leave a comment for the author, please follow the link and comment on their blog: R Language in Datazar Blog on Medium.

New Course: Marketing Analytics in R: Choice Modeling

Posted: 31 Aug 2018 07:31 AM PDT

(This article was first published on DataCamp Community - r programming, and kindly contributed to R-bloggers)

Here is the course link.

Course Description

People make choices every day. They choose products like orange juice or a car, decide who to vote for, and choose how to get to work. Marketers, retailers, product designers, political scientists, transportation planners, sociologists, and many others want to understand what drives these choices. Choice models predict what people will choose as a function of the features of the options available and can be used to make important product design decisions. This course will teach you how to organize choice data, estimate choice models in R, and present your findings. This course covers both analyses of observed real-world choices and the survey-based approach called conjoint analysis.

Chapter 1: Quickstart Guide (Free)

Our goal for this chapter is to get you through the entire choice modeling process as quickly as possible, so that you get a broad understanding of what we can do with choice models and how the choice modeling process works. The main idea here is that we can use a choice model to understand how customers' product choices depend on the features of those products. Do sports car buyers prefer manual transmissions to automatic? By how much? In order to give you an overview, we will skip over many of the details. In later chapters, we will go back and cover important issues in preparing data, specifying and interpreting models, and reporting your findings, so that you are fully prepared to use these methods with your own choice data.

Chapter 2: Managing and Summarizing Choice Data

There are many different places to get choice data and different ways it can be formatted. In this chapter, we will take data that is provided in several alternative formats and learn how to get it into shape for choice modeling. We will also discuss how you can build a survey to collect your own choice data.

Chapter 3: Building Choice Models

In this chapter, we take a deeper dive into estimating choice models. To give you a foundation for thinking about choice models, we will focus on how the multinomial logit model converts the product features into a prediction for what the decision maker will choose. This will give you a framework for making decisions about which features to include in your model.
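
The course description doesn't name its tooling here, but as a flavour of what a multinomial logit fit can look like in R, here is a minimal sketch using the mlogit package. The data frame sportscar_df and its columns are hypothetical stand-ins, not the course's own data.

library(mlogit)

# hypothetical long-format choice data: one row per alternative per choice
# occasion, with a logical "choice" column marking the selected alternative
sportscar_long <- mlogit.data(sportscar_df, choice = "choice",
                              shape = "long", alt.var = "alt")

# multinomial logit: choice probability as a function of product features
m <- mlogit(choice ~ price + trans, data = sportscar_long)
summary(m)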

Chapter 4: Hierarchical Choice Models

Different people have different tastes and preferences. This seems intuitively obvious, but there is also extensive research in marketing showing that this is true. This chapter covers choice models where we assume that different decision makers have different preferences that influence their choices. When our models recognize that different consumers have different preferences, they tend to make larger share predictions for niche products that appeal to a subset of consumers. Hierarchical models are used in most commercial choice modeling applications, so it is important to understand how they work.

Prerequisites

To leave a comment for the author, please follow the link and comment on their blog: DataCamp Community - r programming.

New Course: Bayesian Modeling with RJAGS

Posted: 31 Aug 2018 07:28 AM PDT

(This article was first published on DataCamp Community - r programming, and kindly contributed to R-bloggers)

Here is the course link.

Course Description

The Bayesian approach to statistics and machine learning is logical, flexible, and intuitive. In this course, you will engineer and analyze a family of foundational, generalizable Bayesian models. These range in scope from fundamental one-parameter models to intermediate multivariate & generalized linear regression models. The popularity of such Bayesian models has grown along with the availability of computing resources required for their implementation. You will utilize one of these resources – the rjags package in R. Combining the power of R with the JAGS (Just Another Gibbs Sampler) engine, rjags provides a framework for Bayesian modeling, inference, and prediction.

Chapter 1: Introduction to Bayesian Modeling (Free)

Bayesian models combine prior insights with insights from observed data to form updated, posterior insights about a parameter. In this chapter, you will review these Bayesian concepts in the context of the foundational Beta-Binomial model for a proportion parameter. You will also learn how to use the rjags package to define, compile, and simulate this model in R.
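
For a flavour of that define-compile-simulate workflow, here is a minimal Beta-Binomial sketch with rjags. The data values and the Beta(2, 2) prior are illustrative assumptions, not the course's own example.

library(rjags)

# define: likelihood and prior in JAGS syntax
model_string <- "model{
  X ~ dbin(p, n)    # X successes in n trials
  p ~ dbeta(a, b)   # Beta prior on the proportion p
}"

# compile: supply the observed data and prior parameters
jags_model <- jags.model(textConnection(model_string),
                         data = list(X = 6, n = 10, a = 2, b = 2))

# simulate: draw from the posterior of p
post_sim <- coda.samples(jags_model, variable.names = "p", n.iter = 10000)
summary(post_sim)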

Chapter 2: Bayesian Models & Markov Chains

The two-parameter Normal-Normal Bayesian model provides a simple foundation for Normal regression models. In this chapter, you will engineer the Normal-Normal model and define, compile, and simulate it using rjags. You will also explore the magic of the Markov chain mechanics behind rjags simulation.

Chapter 3: Bayesian Inference & Prediction

In this chapter, you will extend the Normal-Normal model to a simple Bayesian regression model. Within this context, you will explore how to use rjags simulation output to conduct posterior inference. Specifically, you will construct posterior estimates of regression parameters using posterior means & credible intervals, you will test hypotheses using posterior probabilities, and you will construct posterior predictive distributions for new observations.
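
Continuing the illustrative sketch above (any coda.samples() output works the same way; for a regression model the columns would be the regression coefficients), all three tasks reduce to summaries of the simulated draws:

# flatten the MCMC draws into a matrix, one column per parameter
post <- as.matrix(post_sim)

mean(post[, "p"])                        # posterior mean estimate
quantile(post[, "p"], c(0.025, 0.975))   # 95% credible interval
mean(post[, "p"] > 0.5)                  # posterior probability that p > 0.5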

Chapter 4: Multivariate & Generalized Linear Models

In this final chapter, you will generalize the simple Normal regression model for application in broader contexts. You will incorporate categorical predictors, engineer a multivariate regression model with two predictors, and finally extend this methodology to Poisson multivariate regression models for count variables.
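
As a flavour of that final extension, here is an illustrative JAGS model string for a Poisson regression; the variable names and vague priors are assumptions for the sketch, not the course's own specification.

poisson_model <- "model{
  for (i in 1:n) {
    Y[i] ~ dpois(lambda[i])           # Poisson likelihood for count data
    log(lambda[i]) <- a + b * X[i]    # log link to a linear predictor
  }
  a ~ dnorm(0, 0.01)   # vague Normal priors (JAGS uses precision, not variance)
  b ~ dnorm(0, 0.01)
}"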

Prerequisites

To leave a comment for the author, please follow the link and comment on their blog: DataCamp Community - r programming.

New Course: Building Dashboards with flexdashboard

Posted: 31 Aug 2018 07:25 AM PDT

(This article was first published on DataCamp Community - r programming, and kindly contributed to R-bloggers)

Here is the course link.

Course Description

Communication is a key part of the data science process. Dashboards are a popular way to present data in a cohesive visual display. In this course you'll learn how to assemble your results into a polished dashboard using the flexdashboard package. This can be as simple as adding a few lines of R Markdown to your existing code, or as rich as a fully interactive Shiny-powered experience. You will learn about the spectrum of dashboard creation tools available in R and complete this course with the ability to produce a professional quality dashboard.

Chapter 1: Dashboard Layouts (Free)

In this chapter you will learn how R Markdown and the flexdashboard package are used to create a dashboard, and how to customize the layout of components on your dashboard.
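
As a sense of scale, a complete (if minimal) flexdashboard is just an R Markdown file whose output format is flex_dashboard. The title and plot below are placeholders, not the course's own example.

---
title: "My dashboard"
output: flexdashboard::flex_dashboard
---

Column
-----------------------------------------------------------------------

### Chart A

```{r}
# any chunk output becomes a dashboard component
plot(pressure)
```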

Chapter 2: Data Visualization for Dashboards

This chapter will introduce the many options for including data visualizations in your dashboard. You'll learn about how to optimize your plots for display on the web.

Chapter 3: Dashboard Components

In this chapter you will learn about other components that will allow you to create a complete dashboard. This includes ways to present everything from a single value to a complete dataset.

Chapter 4: Adding Interactivity with Shiny

This chapter will demonstrate how you can use Shiny to make your dashboard interactive. You'll keep working with the San Francisco bike sharing data and build a dashboard for exploring this data set.

Prerequisites

To leave a comment for the author, please follow the link and comment on their blog: DataCamp Community - r programming.

Smoke from a distant fire

Posted: 31 Aug 2018 07:08 AM PDT

(This article was first published on Bayes Ball, and kindly contributed to R-bloggers)

Forest fires and air quality

August 31, 2018

It was recently announced that during 2018, British Columbia has seen the most extensive forest fire season on record. As I write this (2018-08-31) there are currently 442 wildfires burning in British Columbia. These fires have a significant impact on people's lives - many areas are under evacuation order and evacuation alert, and there are reports that homes have been destroyed by the blazes.

The fires also create a significant amount of smoke, which has been pushed great distances by the shifting winds. This includes the large population centres of Vancouver and Victoria in British Columbia, as well as the Seattle metropolitan region and elsewhere in Washington. (Clifford Mass, Professor of Atmospheric Sciences at the University of Washington in Seattle, has written extensively about the smoke events in the region; see for example Western Washington Smoke: Darkest Before the Dawn from 2018-08-22.)

The Province of British Columbia has many air quality monitoring stations around the province, and makes the data available. The measure most used for monitoring the effects on health is PM2.5 (written PM25 in the data), for fine particles with a diameter of 2.5 microns (millionths of a metre) or less. The B.C. government has a Current Particulate Matter map that colour codes the one hour average measures for all the testing stations around the province.

The data file and a simple plot

The DataBC Catalogue provides access to air quality data. There's "verified" data to the end of 2017, and "unverified" data for the past 30 days. Since we want to see what happened this month, it's the latter we want. (The page with the links to the raw files is here.)

The files are arranged by particulate or gas type; there's a table for ozone and another for sulphur dioxide, and others for the particulate matter. Note that the data are made available under the Province of B.C.'s Open Data license, and are in nice tidy form. And the date format is ISO 8601, which makes me happy.

To make sure we've got a reproducible version, I've saved the file I downloaded early this morning to my Google Drive. The link to the folder is here.

For the first plot, let's look at the PM2.5 level for my hometown of Victoria, B.C. The code below loads the R packages we'll use, reads the data, and generates the plot.

# tidyverse packages
library(tidyverse)
library(glue)

PM25_data <- readr::read_csv("PM25_2018-08-31.csv")

PM25_data %>%
  filter(STATION_NAME == "Victoria Topaz") %>%
  ggplot() +
  geom_line(aes(x = DATE_PST, y = REPORTED_VALUE)) +
  labs(x = "date",
       title = glue("Air quality: Victoria Topaz"),
       subtitle = "one hour average, µg/m3 of PM2.5",
       caption = "data: B.C. Ministry of Environment and Climate Change Strategy")

There are 61 air quality monitoring stations around British Columbia. It would be interesting to see how the air quality was in other parts of the region - and since over half (54% in 2017) of the province's population lives in the Vancouver Census Metropolitan Area (CMA), let's plot the air quality there. There are multiple stations in the Vancouver CMA, so I chose the one at Burnaby South; it's fairly central in the region. We can run this line of code to see a listing of all 61 stations (but we won't show the output here…):

# list all the air quality stations for which there is PM2.5 data
unique(PM25_data$STATION_NAME)

And since we're going to be doing this often, let's wrap the code that filters for the location we want and runs the plot in a function. Note that we'll create a new variable station_name, so all we need to do to change the plot is assign the name of the station we want, and off we go. Not only does this simplify our lives now, but it is all but essential for a Shiny application.


# the air quality plot
PM25_plot <- function(datafile, station_name){
  datafile %>%
    filter(STATION_NAME == station_name) %>%
    ggplot() +
    geom_line(aes(x = DATE_PST, y = REPORTED_VALUE)) +
    labs(x = "date",
         title = glue("Air quality: ", station_name),
         subtitle = "one hour average, µg/m3 of PM2.5",
         caption = "data: B.C. Ministry of Environment and Climate Change Strategy")
}

Now that we've got the function, the code to create the plot for the Burnaby South station is significantly simplified: assign the station name, and call the function.

# our Burnaby plot
station_name <- "Burnaby South"

PM25_plot(PM25_data, station_name)

And what about the towns that are the closest to the fires? While there are fires burning across the province, the fires that are burning the forests of the Nechako Plateau have understandably received a lot of attention. You may have seen the news stories and images from Prince George like this and this, or the images of the smoke plume from the NASA Worldview site.

Prince George is east of many major fires, downwind of the prevailing westerly winds. So what has the air quality in Prince George been like?

station_name <- "Prince George Plaza 400"
PM25_plot(PM25_data, station_name)

Or still closer to the fires, the town of Burns Lake.

station_name <- "Burns Lake Fire Centre"
PM25_plot(PM25_data, station_name)

The town of Smithers is west of the fires that are burning on the Nechako Plateau and producing all the smoke experienced in Burns Lake and Prince George. The residents of Smithers have had a very different experience, only seeing smoke in the sky when the winds shifted to become easterly.

station_name <- "Smithers St Josephs" 
PM25_plot(PM25_data, station_name)


Multiple stations in one plot

You may have noticed that the Y axis on the plots can be quite different - for example, Victoria reaches 300, Smithers gets to 400, Prince George is double that at 800, and Burns Lake is more than double again. There are two ways we can compare multiple stations: a single plot, or faceted plots.

A line plot with four stations

station_name <- c("Burns Lake Fire Centre", "Prince George Plaza 400", 
                  "Smithers St Josephs", "Victoria Topaz")

PM25_data %>%
  filter(STATION_NAME %in% station_name) %>%
  ggplot() +
  geom_line(aes(x = DATE_PST, y = REPORTED_VALUE, colour = STATION_NAME)) +
  labs(x = "date",
       title = glue("Air quality: Burns Lake, Prince George, Smithers, Victoria"),
       subtitle = "one hour average, µg/m3 of PM2.5",
       caption = "data: B.C. Ministry of Environment and Climate Change Strategy")

With four complex lines as we have here, it can be hard to discern which line is which. 

Use facets to plot the four stations separately

Facets give us another way to view the comparisons. In the first version, stacking the facets vertically emphasizes comparisons along the X axis - that is, over time. In this way, we can see that the four locations have had smoke events that have occurred at different times.

station_name <- c("Burns Lake Fire Centre", "Prince George Plaza 400", 
                  "Smithers St Josephs", "Victoria Topaz")


PM25_data %>%
  filter(STATION_NAME %in% station_name) %>%
  ggplot() +
  geom_line(aes(x = DATE_PST, y = REPORTED_VALUE)) +
  facet_grid(STATION_NAME ~ .) +
  labs(title = glue("Air quality: Burns Lake, Prince George, Smithers, Victoria"),
       subtitle = "one hour average, µg/m3 of PM2.5",
       caption = "data: B.C. Ministry of Environment and Climate Change Strategy") +
  theme(axis.text.x = element_text(size = rel(0.75), angle = 90),
        axis.title = element_blank())



In the second version, the facets are placed horizontally, making comparisons on the Y axis clear. The smoke events in the four locations have been of very different magnitudes.

PM25_data %>%
  filter(STATION_NAME %in% station_name) %>%
  ggplot() +
  geom_line(aes(x = DATE_PST, y = REPORTED_VALUE)) +
  facet_grid(. ~ STATION_NAME) +
  labs(title = glue("Air quality: Burns Lake, Prince George, Smithers, Victoria"),
       subtitle = "one hour average, µg/m3 of PM2.5",
       caption = "data: B.C. Ministry of Environment and Climate Change Strategy") +
  theme(axis.text.x = element_text(size = rel(0.75), angle = 90),
        axis.title = element_blank())

These two plots show not only that Burns Lake and Prince George have had the most extreme smoke events, but that they have had sustained periods of poor air quality through the whole month. While the most extreme event in Smithers exceeds that of Victoria, there hasn't been a prolonged period of smoke in the air like the other three locations.

-30-

To leave a comment for the author, please follow the link and comment on their blog: Bayes Ball.

Applying the “language hacking” approach to learning (and teaching) R and Python

Posted: 31 Aug 2018 06:03 AM PDT

(This article was first published on RBlog – Mango Solutions, and kindly contributed to R-bloggers)

Just after starting at Mango I made the decision to start learning Italian. I have always been interested in learning languages and I was really keen to go back to Italy, so I thought it would be something fun to do out of work. It turned out to have a much greater impact on my work than I expected - and not just because of project work based in Italy.

In the early stages of my learning, I read "Fluent in 3 Months" by Benny Lewis. If you have never heard the name before, Benny Lewis is a polyglot (someone who speaks multiple languages) who left school unable to speak anything other than English. After six months living in Spain and failing to learn the language, he switched his approach. In a matter of weeks, he was speaking Spanish with natives. Now he can speak a dozen languages to varying degrees. And this really got me thinking. Could his approach be applied to R and Python? Could we get more people engaged in the languages more quickly?

The "Language Hacking" Approach

To start let's consider the approach that Benny Lewis advocates for learning languages.

Part of the approach is to speak the language from the start, not a few days or weeks in, but on the first day. And to continue to speak the language every day. It's a simple but powerful idea. You arrange to have a conversation with a native speaker of the language and prepare a few sentences to have a basic conversation. It doesn't have to be long, it can literally be three or four sentences, but you are using the language from the start.

Combined with speaking from the start is the idea of "language hacking". This is really what makes the technique powerful: the idea that you don't need to know everything to be able to use a language. Think about that conversation you are having on day one. You won't know how to conjugate verbs, or all the rules of sentence structure, or all the vocabulary, but you can certainly use a phrase book to find out how to ask "how are you?" and "what is your name?" and respond "I am well" and "my name is Aimee".

The fundamental concept of the approach that Benny Lewis proposes is to learn what you need to communicate right away. Don't spend months learning grammar and rules and hope that this will be enough to get by, just start to speak.

There are of course challenges to this approach; the biggest is the knowledge that you will make mistakes. Fear of making a mistake is typically the main blocker to language learning, but the reality is that nothing bad will happen if you try and get it wrong: generally the people you are talking to will helpfully point you in the right direction and you will learn from it. Once you get over this fear you can very quickly learn a language.

Language Hacking for R and Python

At the time that I read Benny Lewis' book, I was just starting to teach more and I was interested in whether it would be possible to teach R (and Python) this way. But what does language hacking mean for programming for data science?

The answer is simple: it means the same thing. If you want to "hack" learning R and Python for data science, you focus on learning the code that you need to do what you need to do. Don't worry about the details of programming, put aside the ins and outs of functional or object-oriented programming, forget the technical language. Just focus on getting things done.

For data scientists that typically means starting with a basic workflow. Your first "conversation" will typically be more along the lines of loading some data and generating some summaries. Let's think about that example for a moment.

Suppose that I am going to read the iris data from a csv file and find the mean sepal length. How much code does that take? Three or four lines. Do we need to spend hours or possibly days studying the rules of the language first or can we simply jump in with those lines? We can, and should, jump straight in and teach those three or four lines right at the start. Put yourself in the shoes of the learner. If after just minutes of learning you can see a result that is meaningful and useful – the chances are you will keep going and you will want to learn more. What's more is that you will start to experiment with that code. You will see how you can make changes to the code to find the mean of a different column, or maybe you will think about finding other summaries. You are no longer just learning rules of a language to be implemented, you are actively living the language.
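
As a minimal sketch of that first "conversation" (assuming an iris.csv file in the working directory, with the column names of R's built-in iris data):

# the whole first "conversation": read the data, find the mean sepal length
library(readr)
library(dplyr)

iris_data <- read_csv("iris.csv")
summarise(iris_data, mean_sepal_length = mean(Sepal.Length))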

You can very quickly build these "conversations" up to include grouping, performing common manipulation tasks and creating visualisations. In no time at all, you will be doing analytics, modelling and machine learning.
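
Continuing the hypothetical iris example above, the grouped-summary "conversation" is only one verb beyond the first one:

# group by species, then summarise within each group
iris_data %>%
  group_by(Species) %>%
  summarise(mean_sepal_length = mean(Sepal.Length))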

At Mango, we switched our R training to this approach around 3 years ago and we haven't looked back. Our trainers no longer teach programming the traditional way. They are all taught to teach the hacking approach, and they all come back from teaching with the same success stories. It took a little longer to convince our Python team that they should make the same change, but it is now the approach we take with all of our training. From a personal perspective, after years of avoiding Python because I didn't want to spend weeks learning to program, I was the first tester for the Python version of our training. Now I am comfortable running some of my common analyses in Python, and whilst I still make mistakes and it takes me a bit longer than when I write R, I have finally got the confidence to consider Python as a solution as well as R, and I can talk more confidently to my Python-using colleagues.

In practical terms, I would strongly recommend focusing on the tidyverse for R and pandas for Python, with seaborn for graphics. These packages have been designed to make the tasks that we perform regularly with data easy and accessible, so if we are trying to hack our approach to learning and be able to use the languages quickly, why would we use anything else?

But What About Grammar?

You can get a long way in a language without the need to learn lots of grammar. Think about how you learned your native language: I don't remember being taught grammar when I started to speak, but I could still communicate effectively. My friends are not actively teaching their pre-school aged children grammar, but those children can communicate, and whilst it is not always in the best way, they can get their message across. But eventually, to really master a language, you do need to get to grips with the grammar.

So, to those of you who are passionate about the detail of R or Python, who like the "best" way to do things, who want to promote programming paradigms and philosophies: don't worry. There is still a place for this. It just doesn't come first, and it isn't necessary for everyone.

If all I want to do with Python is import data, and run some analytics, then I really don't need to worry about more than what I have achieved through language hacking. If I want to master it and be able to produce tools that are used by a wider community, then I do need to know more. The good news is that this is much easier for a programming language than a spoken one. There are rarely exceptions to rules for a start, and you don't have to learn an endless stream of tenses!

But we can do more to make even this accessible to learners. We can help them to understand the practical applications. We can focus on immediate needs rather than eventualities. We can provide constructive feedback that helps learners to develop their skills.

By making even the detail of a language interesting and accessible we ultimately end up with greater numbers of people who can speak the language and contribute to its success. But we must start with practical code that achieves a specific goal and leave the grammar for later.

To leave a comment for the author, please follow the link and comment on their blog: RBlog – Mango Solutions.

How to Cite Packages

Posted: 30 Aug 2018 05:00 PM PDT

(This article was first published on Dominique Makowski, and kindly contributed to R-bloggers)

Citing the packages, modules and software you used for your analysis is important, both from a reproducibility perspective and as a way of acknowledging the work and time that people spent creating tools for others (sometimes at the expense of their own research). On the reproducibility point: statistical routines are often implemented in different ways by different packages, which could explain slight discrepancies in the results. Saying "I did this using this function from that package version 1.2.3" is a way of protecting yourself by being clear about what you have found doing what you have done.

  • That's great, but how to actually cite them?
  • I used about 100 packages, should I cite them all?

What should I cite?

Ideally, you should indeed cite all the packages that you used. However, that's not always practical. Therefore, I would recommend the following:

  1. Cite the main / important packages in the manuscript

This should be done for the packages that were central to your specific analysis (i.e., that got you the results that you reported) rather than data manipulation tools (even though these are just as important).

For example:

Statistics were done using R 3.5.0 (R Core Team, 2018), the rstanarm (v2.13.1; Gabry & Goodrich, 2016) and the psycho (v0.3.4; Makowski, 2018) packages. The full reproducible code is available in Supplementary Materials.

  2. Present everything in Supplementary Materials

Then, in Supplementary Materials, you show the packages and functions you used. Moreover, in R, you can list (usually at the end) every package used and its version with the sessionInfo() function.
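
A minimal sketch of that (the output file name is just an example):

# print the R version and all attached packages with their versions
sessionInfo()

# or save the same report to a text file for Supplementary Materials
writeLines(capture.output(sessionInfo()), "session_info.txt")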

How should I cite it?

Finding the right citation information is sometimes complicated. In R, this process is made quite easy: you simply run citation("packagename"). For instance, citation("dplyr"):

To cite 'dplyr' in publications use:

  Hadley Wickham, Romain François, Lionel Henry and Kirill Müller (2018).
  dplyr: A Grammar of Data Manipulation. R package version 0.7.6.
  https://CRAN.R-project.org/package=dplyr

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {dplyr: A Grammar of Data Manipulation},
    author = {Hadley Wickham and Romain François and Lionel Henry and Kirill Müller},
    year = {2018},
    note = {R package version 0.7.6},
    url = {https://CRAN.R-project.org/package=dplyr},
  }

For other languages, such as Python or Julia, it might be a little trickier, but a quick search on Google (or GitHub) should provide you with all the necessary information (version, authors, date). It's better to have a slightly incomplete citation than no citation at all.

Previous blogposts

To leave a comment for the author, please follow the link and comment on their blog: Dominique Makowski.
