[R-bloggers] The First Programming Design Pattern in pxWorks (and 4 more aRticles)

The First Programming Design Pattern in pxWorks

Posted: 18 Oct 2020 04:44 AM PDT

[This article was first published on gtdir, and kindly contributed to R-bloggers.]

First of all, we need to explain a few things in more detail.

(Re)Introduction

pxWorks is an open source programming platform that, among other things, enables you to:

  • Implement data-mining programming logic in a clear fashion by modelling code around the flow of data.
  • Use a mixture of scripting languages in the same project, seamlessly and without introducing any intermediary code or extra packages. (Our examples will be mostly written in R.)
  • Delegate code writing with an assurance that the written code will be transparent and easy to follow and to debug.
  • Create prototypes quickly and easily.

Programming Logic (Control Flow) in pxWorks

Any computer program can be represented by a graph. In pxWorks, graph nodes represent operations and graph edges represent the direction of control flow.

To enable programming loops without making logic complicated, the platform uses just two types of connections: unconditional and conditional.

The simplest program is the one that uses unconditional connections. Such connections are represented on the canvas by grey lines. The graph with only unconditional connections represents a simple program in which each node that has inputs waits until all the code blocks associated with connected inputs have been processed.

Nodes that have conditional inputs make it possible to introduce loops into the control flow. Conditional connections are represented by magenta lines. Nodes that have both unconditional and conditional input connections wait for their turn to execute based on the following rule: either all the unconditionally connected nodes have been (re)calculated, or at least one conditionally connected node has been (re)calculated and has generated an input file.

So in the first case, with unconditional links, the triggering of the code takes place regardless of whether an input file is generated for the dependent block (hence the execution is unconditional).

In the second case, with conditional links, the triggering of the code takes place only on condition that an input file has been generated after running the earlier block.

Even more details on this subject can be found here.

The First Design Pattern: Heartbeat

Before proceeding any further, you might want to get the example file here. (To run the example, you will need to unzip it and open in pxWorks.)

The first and simplest use case might be periodic retrieval (and processing) of some data using R, Python, Julia, etc., or any mixture of these. We will use R.

To implement this design pattern we need a block that will initiate the control flow, let's call it 'init,' and a heartbeat block, which is simply a script that generates an output file and passes the control flow back to its own input socket. There is no need to generate the file every time, but for simplicity, we will keep regenerating it every time the script is run.
The heartbeat output can be linked to any number of blocks that need to run after the heartbeat block. For demonstration purposes, rather than complicating the program with actual data retrieval and processing, we will simply generate random numbers and plot them. To see the plot after running the script, click the "graph" icon in the main menu; pxWorks should open a new window displaying the latest generated plot.
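
As a rough illustration, the two blocks' scripts might look like the minimal R sketch below. The file names, the pause between beats, and the way sockets map to files are assumptions made for illustration; the downloadable example contains the actual wiring.

  # heartbeat.R -- sketch of the heartbeat block
  Sys.sleep(5)                                      # pause between beats
  # Regenerating this file passes control back to the block's own input
  # socket via the conditional connection, closing the loop.
  writeLines(as.character(Sys.time()), "beat.txt")

  # plot.R -- sketch of a dependent block triggered by each beat
  x <- rnorm(100)                                   # stand-in for retrieved data
  png("plot.png")
  plot(x, type = "l", main = "Latest data")
  dev.off()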

So the heartbeat block will keep running perpetually and will trigger scripts in dependent blocks.

To stop the heartbeat block, the generated file must be deleted and the script must stop generating the file.

In further posts, we will demonstrate other design patterns we use in our data analysis workflow. This first example already shows how simple it is to introduce programming logic using just two types of connections to model the control flow rather than multiple types of blocks as done in some other platforms.

Things become so much simpler. Instead of thinking about programming architecture, one becomes free to think about the data as programming complexity vanishes.

###


Video: How to Scale Shiny Dashboards

Posted: 18 Oct 2020 02:15 AM PDT

[This article was first published on r – Appsilon Data Science | End to End Data Science Solutions, and kindly contributed to R-bloggers.]


This presentation was part of a joint virtual webinar from Appsilon and RStudio entitled "Enabling Remote Data Science Teams". You can find a direct link to the presentation here.

How to Scale a Shiny App to Hundreds of Users

In this video, Appsilon's VP of the Board & Co-Founder Damian Rodziewicz explains best practices for scaling Shiny applications in production. Damian covers three of the areas that Appsilon focuses on to scale Shiny applications: Leveraging the Frontend, Extracting Computations, and creating a stable and scalable Architecture.

R Shiny applications are fast by default but can become extremely slow if they are not properly built, especially when there are tens or hundreds of people using them. Having best practices in mind from the beginning of the project can save you a lot of trouble down the line. 

Learn More: Why You Should Use R Shiny For Enterprise Application Development

Vertical and Horizontal Scaling

If you intend to scale your Shiny app, there are two concepts we need to explore: Vertical Scaling and Horizontal Scaling.

Horizontal and vertical scaling

It's best to start with proper vertical scaling: make sure the application is fast and robust while running on a single machine. Then you can add as many machines as you want in an efficient way (horizontal scaling). With this in mind, let's return to our three previously mentioned areas: Leveraging Frontend, Extracting Computations, and Setting the Architecture.

Pillars of making Shiny apps faster

Below is a quick rundown of each area, but please reference the video presentation for a full explanation. Above all, it's important to Make the Shiny Layer Thin. This means that Shiny should only be doing the work that it's best at – creating an interface between R and your browser.  The rest of the work (such as interactivity or long computations) should be offloaded to the browser or handled by the database, etc.

Leverage Frontend 

  • Render inputs in UI and update them in Server – failing to do so requires re-rendering entire widgets, which slows the application down (see the sketch after this list).
  • Run inline JavaScript – the package shinyjs allows you to do this. It's best used to make some quick toggles.
  • Set all actions in JavaScript – handle things like button clicks with JavaScript, not with Shiny.
  • Learn more about leveraging frontend in Shiny here.
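
To illustrate the first point, here is a minimal sketch (the input name and choices are invented for the example): the widget is created once in the UI, and only its choices are updated from the server, instead of re-rendering the whole widget.

  library(shiny)

  ui <- fluidPage(
    selectInput("city", "City", choices = NULL),   # rendered once, in the UI
    plotOutput("plot")
  )

  server <- function(input, output, session) {
    # Only the choices change here -- the widget itself is not re-rendered
    updateSelectInput(session, "city", choices = c("Warsaw", "Krakow"))
    output$plot <- renderPlot(plot(rnorm(100)))
  }

  shinyApp(ui, server)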

Extract Computations 

  • Remote API – the Plumber library is excellent for doing this. You rarely need the entire dataset when using the application, so why not filter it down first and load only what you need, when you need it? This logic is easily wrapped into a simple API (a sketch follows this list).
  • Use a database – loading large files in memory isn't scalable for tens/hundreds/thousands of users. Using a database can dramatically improve the performance of Shiny apps. 
  • Are heavy calculations freezing your Shiny app? Appsilon is developing the shiny.worker package to address this problem. shiny.worker is currently still under development, but it can be made available to clients and non-profit organizations on request.
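
As an example of such an API, a filtering endpoint in plumber might look like the following minimal sketch; the file names, dataset, and column are assumptions made for illustration.

  # api.R -- a plumber endpoint that filters data server-side
  library(plumber)

  #* Return only the rows for one district
  #* @param district
  #* @get /apartments
  function(district = "") {
    df <- read.csv("apartments.csv")   # assumption: data stored as a CSV
    df[df$district == district, ]
  }

  # Run with: plumber::pr("api.R") |> plumber::pr_run(port = 8000)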

Architecture

  • RStudio Connect and Shiny Server Open Source allow you to deploy applications quickly.
  • We use Ansible to provision the whole infrastructure: install requirements and RStudio Connect, and deploy the application.
  • Learn more about the options for deploying Shiny apps here.

Learn more

Appsilon is an RStudio Full Service Certified Partner. We are global leaders in Shiny and we specialize in advanced enterprise Shiny apps for Fortune 500 companies. Reach out to us at hello@appsilon.com.


Hack: The ‘[‘ in R lists

Posted: 17 Oct 2020 08:43 PM PDT

[This article was first published on R – Predictive Hacks, and kindly contributed to R-bloggers.]

Assume that you have a list and you want to get the n-th element of each component, or more generally to subset the list. You can use the command sapply(mylist, "[", c(1,2,3,...)).

Let's see this in practice.

  mylist <- list(id <- 1:10,
                 gender <- c("m","m","m","f","f","f","m","f","f","f"),
                 amt <- c(5,20,30,10,20,50,5,20,10,30))

  mylist

Output:

  > mylist
  [[1]]
   [1]  1  2  3  4  5  6  7  8  9 10

  [[2]]
   [1] "m" "m" "m" "f" "f" "f" "m" "f" "f" "f"

  [[3]]
   [1]  5 20 30 10 20 50  5 20 10 30

Let's say that we want to get the 3rd and 6th element of the list:

  sapply(mylist, "[", c(3,6))     

Output:

       [,1] [,2] [,3]
  [1,] "3"  "m"  "30"
  [2,] "6"  "f"  "50"
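
Note that sapply() simplifies the result to a matrix, and because one component of the list is character, everything is coerced to character. To keep each component's original type, the same subsetting can be done with lapply():

  # Same subsetting, but the result stays a list with original types
  lapply(mylist, "[", c(3,6))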


BASIC XAI with DALEX — Part 1: Introduction

Posted: 17 Oct 2020 08:24 PM PDT

[This article was first published on R in ResponsibleML on Medium, and kindly contributed to R-bloggers.]

BASIC XAI

Introduction to model exploration with code examples for R and Python

By Anna Kozak

Hello!

Welcome to "BASIC XAI with DALEX" series.

In this post, we will take a closer look at some algorithms used in explainable artificial intelligence. You will find here an introduction to methods of global and local model evaluation. Each description will include a technical introduction, example analysis, and code in R and Python.

So, shall we start?

First — why should I use XAI?

Nowadays, the quick and dirty approach to developing a predictive model is to try a large number of different ML algorithms and choose the single result that maximizes some validation criterion. This often results in complex models called black boxes. Why? Sometimes these flexible algorithms find models with greater predictive power, sometimes they detect tricky relationships between variables, and sometimes all models perform similarly but the more complex ones get selected more often.

But there is a price to pay in this quick and dirty scheme. When we choose complex yet flexible models, we often lose interpretability. To understand the decisions made by a trained model, algorithms and tools are being developed that help human experts understand how models work. There are plenty of methods developed under the explainable artificial intelligence (XAI) umbrella that can be used to explain or explore complex models.

Second — which to choose: global vs local?

A growing number of tools for explanation are emerging because different stakeholders have different needs.

Global explanations are those that describe model behavior on the whole data set. This allows us to deduce how the model behaves generally/usually/on average.

Local explanations, on the other hand, refer to a single prediction, to a specific client/property/patient on which the model operates. Usually, local explanations show which variables contribute to the model prediction, and how.

These differences are shown in the XAI pyramid below. The left part of the pyramid corresponds to the assessment of a single observation and the right part to the whole model. We can ask various questions about the model. On the left are questions related to a specific prediction. On the right are questions about the model in general.

From the top, we start with more general questions that can be answered with a single number or a few numbers, like the predictive performance of the model (which can be summarised with a single number such as AUC or RMSE), or the prediction value for a single observation (a single number). The following levels refer to more and more specific methods, which we will discuss in this BASIC XAI series.

Biecek, P. and Burzykowski, T. Explanatory Model Analysis

Third — let's get a model in R and Python

In this example, we will use the apartments dataset (collected in Warsaw, available in the DALEX package in R and Python). The data set describes 1,000 apartments with six variables: surface, floor, no.rooms, construction.year, m2.price, and district. We will create a model that predicts the price of an apartment, so let's start with a black-box regression model — a random forest. The package that we will use in these examples is DALEX.

Below we have the code in Python and R, which allows us to transform the data, build a model, and create an explainer. The explainer is an object/adapter that wraps the model and creates a uniform structure and interface for operations on it.
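
The R version might look like the following minimal sketch; the dataset and the explain() function come from DALEX as described above, while the choice of the ranger package as the random forest engine is an assumption made for illustration.

  # Fit a black-box random forest on apartments and wrap it in an explainer
  # (ranger as the random forest engine is an assumption)
  library(DALEX)
  library(ranger)

  model <- ranger(m2.price ~ ., data = apartments)

  explainer <- explain(model,
                       data = apartments[, setdiff(colnames(apartments), "m2.price")],
                       y = apartments$m2.price,    # target variable
                       label = "apartments random forest")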

If you wish, you can use the ready-made objects we have prepared; you can find them here.

In the next part, we will learn about a method for global variable importance — Permutational Variable Importance.

Many thanks to Przemyslaw Biecek and Jakub Wiśniewski for their support on this blog.

If you are interested in other posts about explainable, fair, and responsible ML, follow #ResponsibleML on Medium.



Hack: The “count(case when … else … end)” in dplyr

Posted: 17 Oct 2020 08:13 PM PDT

[This article was first published on R – Predictive Hacks, and kindly contributed to R-bloggers.]

When I run queries in SQL (or even HiveQL, Spark SQL, and so on), it is quite common to use the syntax count(case when ... else ... end). Today, I will show you how to run this type of command in dplyr.

Let's start:

  library(sqldf)
  library(dplyr)

  df <- data.frame(id = 1:10,
                   gender = c("m","m","m","f","f","f","m","f","f","f"),
                   amt = c(5,20,30,10,20,50,5,20,10,30))

  df

Let's get the count and the sum per gender in different columns in SQL.

  sqldf("select count(case when gender='m' then id else null end) as male_cnt,                count(case when gender='f' then id else null end) as female_cnt,                sum(case when gender='m' then amt else 0 end) as male_amt,                sum(case when gender='f' then amt else 0 end) as female_amt                from df")     

Output:

    male_cnt female_cnt male_amt female_amt
  1        4          6       60        140

Let's get the same output in dplyr. We will need to subset each column on the gender condition inside summarise().

  df%>%summarise(male_cnt=length(id[gender=="m"]),                 female_cnt=length(id[gender=="f"]),                 male_amt=sum(amt[gender=="m"]),                 female_amt=sum(amt[gender=="f"])                 )     

Output:

    male_cnt female_cnt male_amt female_amt
  1        4          6       60        140
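
As a side note, logical vectors can be summed directly in R, so the counts can also be written without subsetting id; this equivalent spelling is arguably closer to the SQL intent:

  # Equivalent dplyr spelling: sum() over a logical vector counts the TRUEs
  df %>% summarise(male_cnt   = sum(gender == "m"),
                   female_cnt = sum(gender == "f"),
                   male_amt   = sum(amt[gender == "m"]),
                   female_amt = sum(amt[gender == "f"]))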

