[R-bloggers] ANOVA vs Multiple Comparisons (and 8 more aRticles) |
- ANOVA vs Multiple Comparisons
- The Shift and Balance Fallacies
- Benford’s law meets IPL, Intl. T20 and ODI cricket
- 10 Must-Know Tidyverse Features!
- My year in R
- 2 Months in 2 Minutes – rOpenSci News, October 2020
- Overengineering in ML – business life is not a Kaggle competition
- Zoom talk on “Organising exams in R” from the Grenoble R user group
- Climate Change & AI for GOOD | Online Open Forum Oct 15th
Posted: 15 Oct 2020 05:56 AM PDT
[This article was first published on R – Predictive Hacks, and kindly contributed to R-bloggers]. (You can report issues about the content on this page here.) Want to share your content on R-bloggers? Click here if you have a blog, or here if you don't.

When we run an ANOVA, we analyze the differences among group means in a sample. In its simplest form, ANOVA provides a statistical test of whether two or more population means are equal, and therefore generalizes the t-test beyond two means.

ANOVA Null and Alternative Hypothesis

The null hypothesis in ANOVA is that there is no difference between the means, and the alternative is that the means are not all equal.

\(H_0: \mu _1= \mu _2=…= \mu _K \)

This means that when we are dealing with many groups, we cannot compare them pairwise. We can simply answer whether the group means can be considered equal or not.

Tukey's HSD

What if we want to compare all the groups pairwise? In this case, we can apply Tukey's HSD, which is a single-step multiple comparison procedure and statistical test. It can be used to find means that are significantly different from each other.

Example of ANOVA vs Tukey's HSD

Let's assume that we are dealing with the following 4 groups of 100 observations each: groups a and b drawn from N(10, 5), and groups c and d drawn from N(11, 6), as generated in the code below.
Clearly, we expect the ANOVA to reject the Null Hypothesis, but we would also like to know that Group a and Group b are not statistically different, and likewise for Group c and Group d. Let's work in R:

```r
library(multcomp)
library(tidyverse)

# Create the four groups
set.seed(10)
df1 <- data.frame(Var="a", Value=rnorm(100,10,5))
df2 <- data.frame(Var="b", Value=rnorm(100,10,5))
df3 <- data.frame(Var="c", Value=rnorm(100,11,6))
df4 <- data.frame(Var="d", Value=rnorm(100,11,6))

# merge them into one data frame
df <- rbind(df1,df2,df3,df4)

# convert Var to a factor
df$Var <- as.factor(df$Var)

df %>% ggplot(aes(x=Value, fill=Var)) + geom_density(alpha=0.5)
```

ANOVA

```r
# ANOVA
model1 <- lm(Value~Var, data=df)
anova(model1)
```

Output:

```
Analysis of Variance Table

Response: Value
           Df  Sum Sq Mean Sq F value    Pr(>F)    
Var         3   565.7 188.565   6.351 0.0003257 ***
Residuals 396 11757.5  29.691                      
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

Clearly, we reject the null hypothesis, since the p-value is 0.0003257.

Tukey's HSD

Let's apply the Tukey HSD test to compare all the means.

```r
# Tukey multiple comparisons
summary(glht(model1, mcp(Var="Tukey")))
```

Output:

```
Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts

Fit: lm(formula = Value ~ Var, data = df)

Linear Hypotheses:
           Estimate Std. Error t value Pr(>|t|)   
b - a == 0   0.2079     0.7706   0.270  0.99312   
c - a == 0   1.8553     0.7706   2.408  0.07727 . 
d - a == 0   2.8758     0.7706   3.732  0.00129 **
c - b == 0   1.6473     0.7706   2.138  0.14298   
d - b == 0   2.6678     0.7706   3.462  0.00329 **
d - c == 0   1.0205     0.7706   1.324  0.54795   
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)
```

As we can see from the output above, the differences c vs a and c vs b were not found to be statistically significant, although those groups come from different distributions.
The reason for that is the multiple-comparison adjustment: the unadjusted t-tests below do find these differences significant.

t-test a vs c

```r
t.test(df%>%filter(Var=="a")%>%pull(), df%>%filter(Var=="c")%>%pull())
```

Output:

```
	Welch Two Sample t-test

data:  df %>% filter(Var == "a") %>% pull() and df %>% filter(Var == "c") %>% pull()
t = -2.4743, df = 189.47, p-value = 0.01423
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.3343125 -0.3761991
sample estimates:
mean of x mean of y 
 9.317255 11.172511 
```

t-test b vs c

```r
t.test(df%>%filter(Var=="b")%>%pull(), df%>%filter(Var=="c")%>%pull())
```

Output:

```
	Welch Two Sample t-test

data:  df %>% filter(Var == "b") %>% pull() and df %>% filter(Var == "c") %>% pull()
t = -2.1711, df = 191.53, p-value = 0.03115
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.1439117 -0.1507362
sample estimates:
mean of x mean of y 
 9.525187 11.172511 
```

As we can see from above, the difference in means was, in both cases, found to be statistically significant, if we ignore the multiple comparisons.

Discussion

When we are dealing with multiple comparisons and want to apply pairwise comparisons, Tukey's HSD is a good option. Another approach is to apply p-value adjustments. You can also have a look at how to account for multiple comparisons in A/B/n testing.

To leave a comment for the author, please follow the link and comment on their blog: R – Predictive Hacks. R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job. The post ANOVA vs Multiple Comparisons first appeared on R-bloggers.
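The p-value adjustment approach mentioned in the discussion can be sketched with base R's pairwise.t.test(), which corrects every pairwise comparison with a method such as Holm or Bonferroni (a sketch re-using the same simulated groups; the choice of adjustment method is illustrative):

```r
set.seed(10)
df <- data.frame(
  Var   = rep(c("a", "b", "c", "d"), each = 100),
  Value = c(rnorm(100, 10, 5), rnorm(100, 10, 5),
            rnorm(100, 11, 6), rnorm(100, 11, 6))
)

# All pairwise t-tests with Holm-adjusted p-values
pairwise.t.test(df$Value, df$Var, p.adjust.method = "holm")

# Or adjust a vector of raw p-values directly
p.adjust(c(0.014, 0.031), method = "bonferroni")
```

Compared with the raw t-tests, the adjusted p-values for a vs c and b vs c are pushed upward, mirroring the behaviour of Tukey's HSD.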
The Shift and Balance Fallacies Posted: 15 Oct 2020 01:06 AM PDT
[This article was first published on R – Win Vector LLC, and kindly contributed to R-bloggers].

Two related fallacies I see in machine learning practice are the shift and balance fallacies (for an earlier simple fallacy, please see here). They involve thinking logistic regression has a simpler structure than it actually does, and also thinking logistic regression is less powerful than it actually is. The fallacies are somewhat opposite: the first is that shifting or re-weighting data doesn't change much; the second is that re-balancing is a necessary pre-processing step. As the two ideas seem to contradict each other, it would be odd if they were both true. In fact, we are closer to both being false.

The shift fallacy

The shift fallacy is as follows: we fit two models, one on the data as-is and one with re-weighted data, expecting only the intercept term to differ between them. This is easy to disprove in R.

```r
library(wrapr)

# build our example data
# modeling y as a function of x1 and x2 (plus intercept)
d <- wrapr::build_frame(
  "x1" , "x2", "y" |
  0    , 0   , 0   |
  0    , 0   , 0   |
  0    , 1   , 1   |
  1    , 0   , 0   |
  1    , 0   , 0   |
  1    , 0   , 1   |
  1    , 1   , 0   )

knitr::kable(d)
```
First we fit the model with each data-row having the same weight.

```r
m <- glm(
  y ~ x1 + x2,
  data = d,
  family = binomial())

m$coefficients
## (Intercept)          x1          x2 
##  -1.2055937  -0.3129307   1.3620590
```

Now we build a balanced weighting. We are up-sampling both classes so we don't have any fractional weights (fractional weights are fine, but they trigger a warning in glm()).

```r
w <- ifelse(d$y == 1, sum(1 - d$y), sum(d$y))
w
## [1] 2 2 5 2 2 5 2

# confirm prevalence is 0.5 under this weighting
sum(w * d$y) / sum(w)
## [1] 0.5
```

Now we fit the model for the balanced data situation.

```r
m_shift <- glm(
  y ~ x1 + x2,
  data = d,
  family = binomial(),
  weights = w)

m_shift$coefficients
## (Intercept)          x1          x2 
##  -0.5512784   0.1168985   1.4347723
```

Notice that all of the coefficients changed, not just the intercept term. We have thus demonstrated the shift fallacy.

The balance fallacy

An additional point: the simple model without re-weighting is the better model on this training data. There appears to be an industry belief that to work with unbalanced classes one must re-balance the data. In fact, moving to "balanced data" doesn't magically improve model quality; what it does is help hide some of the bad consequences of using classification rules instead of probability models (please see here for some discussion). For instance, our original model has the following statistical deviance (lower is better):

```r
deviance <- function(prediction, truth) {
  -2 * sum(truth * log(prediction) + (1 - truth) * log(1 - prediction))
}

deviance(
  prediction = predict(m, newdata = d, type = 'response'),
  truth = d$y)
## [1] 7.745254
```

And our balanced model has a worse deviance.

```r
deviance(
  prediction = predict(m_shift, newdata = d, type = 'response'),
  truth = d$y)
## [1] 9.004022
```

Part of the issue is that the balanced model is scaled wrong. Its average prediction is, by design, inflated.
```r
mean(predict(m_shift, newdata = d, type = 'response'))
## [1] 0.4784371
```

Whereas the original model averages to the same value as the average of the truth values (a property of logistic regression).

```r
mean(predict(m, newdata = d, type = 'response'))
## [1] 0.2857143

mean(d$y)
## [1] 0.2857143
```

So let's adjust the balanced predictions back to the correct expected value (essentially Platt scaling).

```r
d$balanced_pred <- predict(m_shift, newdata = d, type = 'link')

m_scale <- glm(
  y ~ balanced_pred,
  data = d,
  family = binomial())

corrected_balanced_pred <- predict(m_scale, newdata = d, type = 'response')

mean(corrected_balanced_pred)
## [1] 0.2857143
```

We now have a prediction with the correct expected value. However, notice this deviance is still larger than that of the simple un-weighted original model.

```r
deviance(
  prediction = corrected_balanced_pred,
  truth = d$y)
## [1] 7.803104
```

Our opinion is: re-weighting or re-sampling data for a logistic regression is pointless. The fitting procedure deals with un-balanced data quite well and doesn't need any attempt at help. We think this sort of re-weighting and re-sampling introduces complexity, the possibility of data leaks with up-sampling, and a loss of statistical efficiency with down-sampling. Likely the re-sampling fallacy is driven by a need to move model scores to near the 0.5 threshold assumed by naive classification rules.

Conclusion

Some tools, such as logistic regression, work best on training data that accurately represents the distributional facts of the problem, and do not require artificially balanced training data. Also, re-balancing training data is a bit more involved than one might think, as we see more than just the intercept term changes when we re-balance data. Take logistic regression as the entry-level probability model for classification problems.
If it doesn't need data re-balancing, then any other tool claiming to be universally better than it should also not need artificial re-balancing (though if such tools are internally using classification-rule metrics, some hyper-parameters or internal procedures may need to be adjusted). Prevalence re-balancing works around mere operational issues, such as using classification rules (instead of probability models) or sub-optimal metrics (such as accuracy). These operational issues are better corrected directly than worked around. A lot of the complexity we see in modern machine learning pipelines is patches patching unwanted effects of previous patches.

(The source for this article can be found here, and a rendering of it here.)

To leave a comment for the author, please follow the link and comment on their blog: R – Win Vector LLC.
Benford’s law meets IPL, Intl. T20 and ODI cricket Posted: 15 Oct 2020 12:28 AM PDT
[This article was first published on R – Giga thoughts …, and kindly contributed to R-bloggers].

"To grasp how different a million is from a billion, think about it like this: A million seconds is a little under two weeks; a billion seconds is about thirty-two years."

"One of the pleasures of looking at the world through mathematical eyes is that you can see certain patterns that would otherwise be hidden."

Steven Strogatz, Prof at Cornell University

Introduction

Within the last two weeks, I was introduced to Benford's Law by 2 of my friends. Initially, I looked it up on Google and was quite intrigued by the law. Subsequently, another friend asked me to check out the 'Digits' episode from the "Connected" series on Netflix by Latif Nasser, which I strongly recommend you watch.

Benford's Law, also called the Newcomb–Benford law, the law of anomalous numbers, or the First Digit Law, states that, when dealing with quantities obtained from Nature, the frequency of appearance of each digit in the first significant place is logarithmic. For example, in sets that obey the law, the number 1 appears as the leading significant digit about 30.1% of the time, the number 2 about 17.6%, the number 3 about 12.5%, all the way down to the number 9 at 4.6%. This interesting logarithmic pattern is observed in most natural datasets, from population densities, river lengths and heights of skyscrapers to tax returns. What is really curious about this law is that when we measure the lengths of rivers, the law holds regardless of the units used: the lengths obey the law whether we measure in meters, feet or miles. There is something almost mystical about this law.
The law has also been used widely to detect financial fraud, manipulations in tax statements, bots on Twitter, fake accounts in social networks, image manipulation, etc. In this age of deep fakes, the ability to detect fake images will assume paramount importance. While deviations from Benford's Law do not always signify fraud, to a large extent they point to an aberration. Prof Nigrini, of Cape Town, used this law to identify financial discrepancies in Enron's financial statements, resulting in the infamous scandal. Also, the 2009 Iranian election was found to be fraudulent as the first-digit percentages did not conform to those specified by Benford's Law. While it cannot be said with absolute certainty, marked deviations from Benford's law could possibly indicate that there has been manipulation of natural processes. Possibly Benford's law could be used to detect large-scale match-fixing in cricket tournaments. However, we cannot look at this in isolation, and other statistical and forensic methods may be required to determine if there is fraud. Here is an interesting paper, Promises and perils of Benford's law.

A set of numbers is said to satisfy Benford's law if the leading digit d (d ∈ {1, …, 9}) occurs with probability

\(P(d) = \log_{10}\left(1 + \frac{1}{d}\right)\)
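These first-digit probabilities, \(P(d) = \log_{10}(1 + 1/d)\), are easy to compute directly (a minimal sketch in base R):

```r
# Expected Benford frequency of each leading digit d = 1..9
d <- 1:9
benford_expected <- log10(1 + 1 / d)
round(benford_expected, 3)
## [1] 0.301 0.176 0.125 0.097 0.079 0.067 0.058 0.051 0.046
```

The probabilities sum to 1, since the product of the terms (1 + 1/d) telescopes to 10.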
This law also works for numbers in other bases, in base b >= 2. Interestingly, this law also applies to sports, for example the number of points scored in basketball. I was curious to see if it applied to cricket. Previously, using my R package yorkr, I had already converted all T20 and ODI data from Cricsheet, which is available at yorkrData2020. I wanted to check if Benford's Law worked on the runs scored, or deliveries faced, by batsmen at team level or at tournament level (IPL, Intl. T20 or ODI). Thankfully, R has a package, benford.analysis, to check data behaviour in accordance with Benford's Law, and I have used this package in my post. This post is also available on RPubs as Benford's Law meets IPL, Intl. T20 and ODI.

```r
library(data.table)
library(reshape2)
library(dplyr)
library(benford.analysis)
library(yorkr)
```

In this post, I have randomly checked data against Benford's law. The fully converted dataset is available in yorkrData2020, which I have included above. You can try it on any dataset, including ODI (men, women), Intl. T20 (men, women), IPL, BBL, PSL, NTB and WBB.

1. Check the runs distribution by Royal Challengers Bangalore

We can see the behaviour is as expected under Benford's law, with minor deviations.

```r
load("/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/ipl/iplBattingBowlingDetails/Royal Challengers Bangalore-BattingDetails.RData")
rcbRunsTrends = benford(battingDetails$runs, number.of.digits = 1, discrete = T, sign = "positive")
rcbRunsTrends
```

```
## Benford object:
##
## Data: battingDetails$runs
## Number of observations used = 1205
## Number of obs. for second order = 99
## First digits analysed = 1
##
## Mantissa:
##
##    Statistic  Value
##         Mean  0.458
##          Var  0.091
##  Ex.Kurtosis -1.213
##     Skewness -0.025
##
## The 5 largest deviations:
##
##   digits absolute.diff
## 1      1         14.26
## 2      7         13.88
## 3      9          8.14
## 4      6          5.33
## 5      4          4.78
##
## Stats:
##
## 	Pearson's Chi-squared test
##
## data:  battingDetails$runs
## X-squared = 5.2091, df = 8, p-value = 0.735
##
## 	Mantissa Arc Test
##
## data:  battingDetails$runs
## L2 = 0.0022852, df = 2, p-value = 0.06369
##
## Mean Absolute Deviation (MAD): 0.004941381
## MAD Conformity - Nigrini (2012): Close conformity
## Distortion Factor: -18.8725
##
## Remember: Real data will never conform perfectly to Benford's Law. You should not focus on p-values!
```

1a. Plot trends

Note: The Digits Distribution plot is the plot of interest. The second-order Digits Distribution is a relatively new test, based on sorting the data and plotting the differences. That test can be applied to any data set, and nonconformity usually signals an unusual issue related to data integrity. Benford's Law itself applies only to the first Digits Distribution plot. For a deeper analysis, the other plots, besides other statistical tests, may be required. There are other approaches to detecting anomalies; an easy way is to start with Benford's law and progressively dig deeper.

```r
plot(rcbRunsTrends)
```
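The same first-digit comparison can also be reproduced without the benford.analysis package (a base-R sketch; the runs vector below is simulated stand-in data, not the actual battingDetails):

```r
set.seed(42)

# Leading digit of each positive value
first_digit <- function(x) {
  x <- x[x > 0]  # Benford's law applies to positive values
  as.integer(substr(format(x, scientific = FALSE, trim = TRUE), 1, 1))
}

runs <- round(rlnorm(1000, meanlog = 3, sdlog = 1))  # stand-in for batting scores

obs      <- table(factor(first_digit(runs), levels = 1:9))
expected <- log10(1 + 1 / (1:9))

# Chi-squared goodness-of-fit against the Benford probabilities
chisq.test(obs, p = expected)
```

This is essentially what the package's Pearson chi-squared statistic reports, without the extra mantissa and deviation diagnostics.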
2. Check the 'balls played' distribution by Royal Challengers Bangalore

```r
load("/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/ipl/iplBattingBowlingDetails/Royal Challengers Bangalore-BattingDetails.RData")
rcbBallsPlayedTrends = benford(battingDetails$ballsPlayed, number.of.digits = 1, discrete = T, sign = "positive")
plot(rcbBallsPlayedTrends)
```
3. Check the runs distribution by Chennai Super Kings

The trend seems to deviate from the expected behaviour to some extent for the digits 5 and 7.

```r
load("/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/ipl/iplBattingBowlingDetails/Chennai Super Kings-BattingDetails.RData")
cskRunsTrends = benford(battingDetails$runs, number.of.digits = 1, discrete = T, sign = "positive")
cskRunsTrends
```

```
## Benford object:
##
## Data: battingDetails$runs
## Number of observations used = 1054
## Number of obs. for second order = 94
## First digits analysed = 1
##
## Mantissa:
##
##    Statistic  Value
##         Mean  0.466
##          Var  0.081
##  Ex.Kurtosis -1.100
##     Skewness -0.054
##
## The 5 largest deviations:
##
##   digits absolute.diff
## 1      5         27.54
## 2      2         18.40
## 3      1         17.29
## 4      9         14.23
## 5      7         14.12
##
## Stats:
##
## 	Pearson's Chi-squared test
##
## data:  battingDetails$runs
## X-squared = 22.862, df = 8, p-value = 0.003545
##
## 	Mantissa Arc Test
##
## data:  battingDetails$runs
## L2 = 0.002376, df = 2, p-value = 0.08173
##
## Mean Absolute Deviation (MAD): 0.01309597
## MAD Conformity - Nigrini (2012): Marginally acceptable conformity
## Distortion Factor: -17.90664
##
## Remember: Real data will never conform perfectly to Benford's Law. You should not focus on p-values!
```

3a. Plot the trends

```r
plot(cskRunsTrends)
```
3b. Check details of suspicious behaviour

Interestingly, the package benford.analysis has functions to get the details of the data which result in the deviation. The getSuspects() function returns records which are 'suspicious'. We probably need to take aberrations with a pinch of salt; a look at the statistical distribution and other investigation would need to be carried out to determine the cause.

```r
suspects <- getSuspects(cskRunsTrends, battingDetails)
suspects
```

```
##              batsman ballsPlayed fours sixes runs strikeRate        bowler
##   1:       JA Morkel          18     1     2   29     161.11        nobody
##   2:        MS Dhoni          31     2     0   23      74.19 Shahid Afridi
##   3:        MS Dhoni          22     1     1   22     100.00  Shoaib Ahmed
##   4:        SK Raina          19     1     2   25     131.58        nobody
##   5:        MS Dhoni          37     6     1   58     156.76        nobody
##  ---
## 311:        MS Dhoni          12     3     1   25     208.33        nobody
## 312:        SK Raina          43     5     2   54     125.58        nobody
## 313: Harbhajan Singh           8     0     0    2      25.00        nobody
## 314:        SK Raina          13     4     0   22     169.23        nobody
## 315:       AT Rayudu          21     2     0   25     119.05        nobody
##      wicketFielder        wicketKind wicketPlayerOut       date
##   1:        nobody            notOut          notOut 2008-05-06
##   2: Shahid Afridi            caught        MS Dhoni 2008-05-06
##   3:  Shoaib Ahmed            caught        MS Dhoni 2009-04-27
##   4:        nobody caught and bowled        SK Raina 2009-04-27
##   5:        nobody            notOut          notOut 2009-05-04
##  ---
## 311:        nobody            notOut          notOut 2018-04-22
## 312:        nobody            notOut          notOut 2018-04-22
## 313:        nobody            notOut          notOut 2018-05-22
## 314:        nobody            bowled        SK Raina 2018-05-22
## 315:        nobody            notOut          notOut 2019-04-17
##                                           venue          opposition
##   1:           MA Chidambaram Stadium, Chepauk     Deccan Chargers
##   2:           MA Chidambaram Stadium, Chepauk     Deccan Chargers
##   3:                                 Kingsmead     Deccan Chargers
##   4:                                 Kingsmead     Deccan Chargers
##   5:                              Buffalo Park     Deccan Chargers
##  ---
## 311: Rajiv Gandhi International Stadium, Uppal Sunrisers Hyderabad
## 312: Rajiv Gandhi International Stadium, Uppal Sunrisers Hyderabad
## 313:                          Wankhede Stadium Sunrisers Hyderabad
## 314:                          Wankhede Stadium Sunrisers Hyderabad
## 315: Rajiv Gandhi International Stadium, Uppal Sunrisers Hyderabad
##                   winner result
##   1:     Deccan Chargers     NA
##   2:     Deccan Chargers     NA
##   3:     Deccan Chargers     NA
##   4:     Deccan Chargers     NA
##   5: Chennai Super Kings     NA
##  ---
## 311: Chennai Super Kings     NA
## 312: Chennai Super Kings     NA
## 313: Chennai Super Kings     NA
## 314: Chennai Super Kings     NA
## 315: Sunrisers Hyderabad     NA
```

4. Check the runs distribution in all of the Indian Premier League (IPL)

```r
battingDF <- NULL
teams <- c("Chennai Super Kings","Deccan Chargers","Delhi Daredevils",
           "Kings XI Punjab","Kochi Tuskers Kerala","Kolkata Knight Riders",
           "Mumbai Indians","Pune Warriors","Rajasthan Royals",
           "Royal Challengers Bangalore","Sunrisers Hyderabad","Gujarat Lions",
           "Rising Pune Supergiants")

setwd("/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/ipl/iplBattingBowlingDetails")

for(team in teams){
  battingDetails <- NULL
  val <- paste(team,"-BattingDetails.RData",sep="")
  print(val)
  tryCatch(load(val),
           error = function(e) {
             print("No data1")
             setNext=TRUE
           })
  details <- battingDetails
  battingDF <- rbind(battingDF,details)
}
## [1] "Chennai Super Kings-BattingDetails.RData"
## [1] "Deccan Chargers-BattingDetails.RData"
## [1] "Delhi Daredevils-BattingDetails.RData"
## [1] "Kings XI Punjab-BattingDetails.RData"
## [1] "Kochi Tuskers Kerala-BattingDetails.RData"
## [1] "Kolkata Knight Riders-BattingDetails.RData"
## [1] "Mumbai Indians-BattingDetails.RData"
## [1] "Pune Warriors-BattingDetails.RData"
## [1] "Rajasthan Royals-BattingDetails.RData"
## [1] "Royal Challengers Bangalore-BattingDetails.RData"
## [1] "Sunrisers Hyderabad-BattingDetails.RData"
## [1] "Gujarat Lions-BattingDetails.RData"
## [1] "Rising Pune Supergiants-BattingDetails.RData"

trends = benford(battingDF$runs, number.of.digits = 1, discrete = T, sign = "positive")
trends
```

```
## Benford object:
##
## Data: battingDF$runs
## Number of observations used = 10129
## Number of obs. for second order = 123
## First digits analysed = 1
##
## Mantissa:
##
##    Statistic   Value
##         Mean  0.4521
##          Var  0.0856
##  Ex.Kurtosis -1.1570
##     Skewness -0.0033
##
## The 5 largest deviations:
##
##   digits absolute.diff
## 1      2        159.37
## 2      9        121.48
## 3      7         93.40
## 4      8         83.12
## 5      1         61.87
##
## Stats:
##
## 	Pearson's Chi-squared test
##
## data:  battingDF$runs
## X-squared = 78.166, df = 8, p-value = 1.143e-13
##
## 	Mantissa Arc Test
##
## data:  battingDF$runs
## L2 = 5.8237e-05, df = 2, p-value = 0.5544
##
## Mean Absolute Deviation (MAD): 0.006627966
## MAD Conformity - Nigrini (2012): Acceptable conformity
## Distortion Factor: -20.90333
##
## Remember: Real data will never conform perfectly to Benford's Law. You should not focus on p-values!
```

4b. Plot trends for all of IPL

We can see that the trend follows Benford's curve quite closely for all of IPL.

```r
plot(trends)
```
5. Check Benford's law in India matches

```r
setwd("/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/t20/t20BattingBowlingDetails")
load("India-BattingDetails.RData")

indiaTrends = benford(battingDetails$runs, number.of.digits = 1, discrete = T, sign = "positive")
plot(indiaTrends)
```
6. Check Benford's law in all of Intl. T20

```r
setwd("/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/t20/t20BattingBowlingDetails")

teams <- c("Australia","India","Pakistan","West Indies","Sri Lanka","England",
           "Bangladesh","Netherlands","Scotland","Afghanistan","Zimbabwe",
           "Ireland","New Zealand","South Africa","Canada","Bermuda","Kenya",
           "Hong Kong","Nepal","Oman","Papua New Guinea","United Arab Emirates",
           "Namibia","Cayman Islands","Singapore","United States of America",
           "Bhutan","Maldives","Botswana","Nigeria","Denmark","Germany","Jersey",
           "Norway","Qatar","Malaysia","Vanuatu","Thailand")

for(team in teams){
  battingDetails <- NULL
  val <- paste(team,"-BattingDetails.RData",sep="")
  print(val)
  tryCatch(load(val),
           error = function(e) {
             print("No data1")
             setNext=TRUE
           })
  details <- battingDetails
  battingDF <- rbind(battingDF,details)
}

intlT20Trends = benford(battingDF$runs, number.of.digits = 1, discrete = T, sign = "positive")
intlT20Trends
```

```
## Benford object:
##
## Data: battingDF$runs
## Number of observations used = 21833
## Number of obs. for second order = 131
## First digits analysed = 1
##
## Mantissa:
##
##    Statistic  Value
##         Mean  0.447
##          Var  0.085
##  Ex.Kurtosis -1.158
##     Skewness  0.018
##
## The 5 largest deviations:
##
##   digits absolute.diff
## 1      2        361.40
## 2      9        276.02
## 3      1        264.61
## 4      7        210.14
## 5      8        198.81
##
## Stats:
##
## 	Pearson's Chi-squared test
##
## data:  battingDF$runs
## X-squared = 202.29, df = 8, p-value < 2.2e-16
##
## 	Mantissa Arc Test
##
## data:  battingDF$runs
## L2 = 5.3983e-06, df = 2, p-value = 0.8888
##
## Mean Absolute Deviation (MAD): 0.007821098
## MAD Conformity - Nigrini (2012): Acceptable conformity
## Distortion Factor: -24.11086
##
## Remember: Real data will never conform perfectly to Benford's Law. You should not focus on p-values!
```

6a. Plot trends

```r
plot(intlT20Trends)
```
7. Check Benford's law in ODI

This plot also nicely follows Benford's predicted curve.

```r
setwd("/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/odi/odiBattingBowlingDetails")

teams <- c("Australia","India","Pakistan","West Indies","Sri Lanka","England",
           "Bangladesh","Netherlands","Scotland","Afghanistan","Zimbabwe",
           "Ireland","New Zealand","South Africa","Canada","Bermuda","Kenya",
           "Hong Kong","Nepal","Oman","Papua New Guinea","United Arab Emirates",
           "Namibia","Cayman Islands","Singapore","United States of America",
           "Bhutan","Maldives","Botswana","Nigeria","Denmark","Germany","Jersey",
           "Norway","Qatar","Malaysia","Vanuatu","Thailand")

battingDF <- NULL
for(team in teams){
  battingDetails <- NULL
  val <- paste(team,"-BattingDetails.RData",sep="")
  print(val)
  tryCatch(load(val),
           error = function(e) {
             print("No data1")
             setNext=TRUE
           })
  details <- battingDetails
  battingDF <- rbind(battingDF,details)
}

odiTrends = benford(battingDF$runs, number.of.digits = 1, discrete = T, sign = "positive")
odiTrends
```

```
## Benford object:
##
## Data: battingDF$runs
## Number of observations used = 23766
## Number of obs. for second order = 179
## First digits analysed = 1
##
## Mantissa:
##
##    Statistic  Value
##         Mean  0.468
##          Var  0.089
##  Ex.Kurtosis -1.204
##     Skewness -0.069
##
## The 5 largest deviations:
##
##   digits absolute.diff
## 1      5        240.18
## 2      4        190.84
## 3      9        177.47
## 4      8        157.69
## 5      1         66.28
##
## Stats:
##
## 	Pearson's Chi-squared test
##
## data:  battingDF$runs
## X-squared = 100.07, df = 8, p-value < 2.2e-16
##
## 	Mantissa Arc Test
##
## data:  battingDF$runs
## L2 = 0.002845, df = 2, p-value < 2.2e-16
##
## Mean Absolute Deviation (MAD): 0.004609365
## MAD Conformity - Nigrini (2012): Close conformity
## Distortion Factor: -14.92332
##
## Remember: Real data will never conform perfectly to Benford's Law. You should not focus on p-values!
```

7a. Plot trends

```r
plot(odiTrends)
```
The data for other formats is available at yorkrData2020. Feel free to try it out yourself.

Conclusion

Maths rules our lives more than we are aware, more than we like to admit. It is there in all of nature. Whether it is the recursive patterns of Mandelbrot sets, the intrinsic notion of beauty in the golden ratio, the murmuration of swallows, the synchronous blinking of fireflies, or the near-universality of Benford's law on natural datasets, mathematics governs us. Isn't it strange that, while we humans pride ourselves on free will, the runs scored by batsmen in particular formats conform to Benford's rule for the first digits? It almost looks like the runs that will be scored are, to an extent, predetermined to fall within specified ranges obeying Benford's law. So much for choice. Something to be pondered over!
To leave a comment for the author, please follow the link and comment on their blog: R – Giga thoughts …. | ||||||||||||||||||||||||
10 Must-Know Tidyverse Features! Posted: 14 Oct 2020 01:00 PM PDT
[This article was first published on business-science.io, and kindly contributed to R-bloggers].

R Tutorials Update

Interested in more R tutorials? Learn more R tips:
Register for our blog to get new articles as we release them.

Tidyverse Updates

There is no doubt that the tidyverse keeps evolving, and this means that there are new methods available in its core packages. It's incumbent on any analyst to stay up to date with new methods. This post covers ten examples of approaches to common data tasks that are better served by the latest tidyverse releases.

First let's load our packages and data. The dataset presents several observations of anatomical parts of penguins of different species, sexes and locations, and the year that the measurements were taken.

1. Selecting columns
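As a sketch of this style of selection (assuming the palmerpenguins::penguins data frame; the particular helper choices are illustrative):

```r
library(dplyr)
library(palmerpenguins)

# Select by exact name, by prefix, and by predicate in one call
penguins %>%
  select(species, starts_with("bill"), where(is.numeric))
```

The helpers compose, so a single select() call can mix names, patterns and type predicates.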
A full set of tidyselect helper functions can be found in the documentation here.

2. Reordering columns
Similar to selecting columns, reordering them now has first-class support.
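A sketch of column reordering with dplyr's relocate() verb (the column names assume the penguins data):

```r
library(dplyr)
library(palmerpenguins)

# Move sex and year to the front, then move island to sit after species
penguins %>%
  relocate(sex, year) %>%
  relocate(island, .after = species)
```

By default relocate() moves columns to the front; .before and .after give precise placement.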
3. Controlling mutated column locations

Note that mutate() now lets you control where a new column lands, instead of always appending it as the last column.

4. Transforming from wide to long

The pivot_longer() function makes it easy to reshape wide data into long format.

5. Transforming from long to wide

It's just as easy to move back from long to wide with pivot_wider().

6. Running group statistics across multiple columns
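The group-statistics idea above can be sketched with dplyr's across() (assuming the penguins data; the summary choice is illustrative):

```r
library(dplyr)
library(palmerpenguins)

# Mean of every millimetre-measurement column, by species,
# naming the outputs via the .names template
penguins %>%
  group_by(species) %>%
  summarise(
    across(ends_with("_mm"), ~ mean(.x, na.rm = TRUE), .names = "mean_{.col}"),
    .groups = "drop"
  )
```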
7. Control output column names when summarising columns

The names of the columns produced when summarising can be controlled explicitly.

8. Running models across subsets

If we wanted to run a model for each species, we could fit one within each group and keep the fitted objects in a list-column. It's not usually that useful to keep model objects in a dataframe, but we could use other tidy-oriented packages to summarize the statistics of the models and return them all as nicely integrated dataframes.

9. Nesting data

Often we have to work with subsets, and it can be useful to apply a common function across all subsets of the data. For example, maybe we want to take a look at our different species of penguins and make some different graphs of them. Grouping based on subsets previously required a somewhat awkward combination of steps; the new function nest_by() makes this simple. The nested data will be stored in a column called data.

10. Graphing across subsets

Armed with nested data, we can easily display the different scatter plots to show, for example, that our penguins exemplify Simpson's Paradox.

Author: Jim Gruman, Data Analytics Leader

Serving enterprise needs with innovators in mobile power, decision intelligence, and product management, Jim can be found at https://jimgruman.netlify.app.
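The nesting and per-subset graphing workflow described above can be sketched as follows (assuming the penguins data; nest_by() stores each species' rows in a list-column named data):

```r
library(dplyr)
library(ggplot2)
library(palmerpenguins)

# One scatter plot per species, built from the nested subsets
plots <- penguins %>%
  nest_by(species) %>%
  mutate(plot = list(
    ggplot(data, aes(bill_length_mm, bill_depth_mm)) +
      geom_point() +
      ggtitle(species)
  ))

plots$plot[[1]]  # display the first species' plot
```

Because nest_by() returns a rowwise data frame, the mutate() runs once per species, and referencing data inside it yields that species' own tibble.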
To leave a comment for the author, please follow the link and comment on their blog: business-science.io. R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job. Want to share your content on R-bloggers? click here if you have a blog, or here if you don't. The post 10 Must-Know Tidyverse Features! first appeared on R-bloggers. This posting includes an audio/video/photo media file: Download Now
My year in R Posted: 14 Oct 2020 11:00 AM PDT
[This article was first published on R on Amit Levinson, and kindly contributed to R-bloggers]. (You can report issues about the content on this page here) Want to share your content on R-bloggers? click here if you have a blog, or here if you don't. Learning R for a little over a year now was, and still is, a great experience. But a year isn't a lot, so why make a blog post about it? I believe that pausing what one is doing and periodically evaluating whether this pursuit is the right direction is a healthy process. Doing so can help you acknowledge your accomplishments and think about where you're heading, and I'm glad I have the opportunity to do it in the following post.

The Journey

I was wondering how to summarize my past year: a list of resources? A story? A listicle? I decided to go with an item list, somewhat chronologically ordered, that I believe captures my experience. Of course, like many other things in life, the timeline discussed isn't completely rigid, as some items I did concurrently or jumped back and forth between.

1. Hearing about R

I first heard of R when it was used in a hierarchical linear models workshop I attended. The workshop focused more on the statistics part of the analysis, so we didn't go in depth into the code. Subsequently I heard about R twice more: once from a friend studying psychology, Yarden Ashur, and once from my sister, Maayan Levinson, a statistician with the CBS. I'll admit it took me some time to pick it up, but eventually I did.

2. First learning steps

My friend from psychology also told me there's a recorded R course for psychology available on Moodle (a platform for online course information) I could use freely. The course was led by Yoav Kessler and proved to be a fantastic introduction. I followed along with the course and did the different assignments until we reached ggplot, a library for plotting in R.

3.
Joining the TidyTuesday community

By the time we reached ggplot in the psychology course I was already somewhat familiar with Twitter, where #TidyTuesday mostly takes place. #TidyTuesday is an amazing project where every week a new dataset is published for the #rstats community to analyze, visualize and post their results on Twitter. My excitement and motivation to participate were extremely high: so many professional and experienced R users working on the same dataset, conjuring amazing visualizations and posting their code for others to explore (and all this for free)?! I was blown away. So I followed along on Twitter for a week or two until I said, OK, let's give it a try. It was week 38 of 2019 and we were working on visualizing national parks. I wasn't really sure what to do, so I did a minimal exploration of the data and noticed an interesting increase of visitors in national parks across years, which seemed intuitive and perfect for a first visualization. Following the basic area graph I made, I remembered a visualization from a week earlier by Ariane Aumaitre using roller-coaster icons in her graph. Knowing nothing about how to integrate icons, I adapted her code into my visualization to create a nice scenery for the mountain the data displayed (see tweet below). I was pretty satisfied at the time and the feedback from the #rstats community was incredible – I was hooked on the project.
I believe the project was a fantastic introduction to continuously analyzing and visualizing data in R. Participating in the project provides a safe, motivating and rich setting to practice and learn R. Additionally, I didn't have anything that 'forced' me to learn R, so knowing that every week I had a new dataset to analyze and visualize along with others provided me with a sense of routine and commitment.

4. Opening a GitHub account

Following the first visualization for #TidyTuesday I wanted to share the code I wrote. At the time I was only using GitHub to read code written by others. Using the Happy git with r guide I was able to properly upload my code and synchronize future work. Since then, using GitHub taught me so much: reading others' code and discovering new functions; organizing my own code so others can easily read it and thus 'forcing me' to clean it once I finished a project; and having a place to host all my efforts. I sincerely believe that opening a GitHub account to share everything I did was an important and pivotal moment in learning R. Although I still have so much more to learn when it comes to cleaning code and project management, a lot of what I know now is attributed to having GitHub repositories and code as accessible as possible for others to explore and learn.

5. Visualizing things I was interested in

As I was participating in #TidyTuesday, Eliud Kipchoge broke (unofficially) the two-hour marathon barrier. I found this amazing and wanted to visualize the comparison between the new record and older ones. I manually copied the marathon record values from Wikipedia and used them to plot running icons representing the different records. It wasn't an aesthetic plot but it was definitely rewarding. I've since improved the visualization by making it reproducible and eventually wrote a blog post explaining the process of how I made it.
Similarly, a month or two later I plotted bomb shelter locations around my house amidst missiles fired towards Israel, all in R while using Google maps. I finally took an opportunity to make a visualization that related to my daily routine.

6. Continuous learning

Well, it's kind of redundant to say this, as we're always learning, but it is important: after I joined the #TidyTuesday community, I started again to actively learn about visualizations and data wrangling, in addition to solidifying my basic knowledge of R. For this I relied on the following sources:
7. Making my own website

A month or two into learning R I noticed people had their own websites, made in R. Again I was fascinated by how this was possible: not only can I wrangle data and beautifully visualize it, but I can also build my own website? And for $10 I can use my own domain? This was crazy!
Nearing mid-January (4± months into R) I decided it was time to open my own website. I had a few things already made (Eliud Kipchoge's record and the bomb shelters around my house) and also wanted a place for others to learn more about me. I scrolled through and followed along with the blogdown book for creating websites with R, viewed some of Allison Hill's blogdown workshops and other resources. Eventually I was set up and had my website live: done in R, hosted for free on Netlify and GitHub with an elegant Hugo Academic theme, and my own domain for only $10! I was amazed at how easy and rewarding this was. I mean, I had no knowledge (and still don't) of HTML, CSS or anything else needed to build a website, and here I conjured one, and pretty easily! I highly recommend creating a website. Even if you're not an R user, I think a personal website is a great motivator for writing blogs, a platform for others to learn more about you, and not so difficult a thing to do today. Opening a website has definitely motivated me to learn much more by writing about it (here's a great talk by David Robinson on The unreasonable effectiveness of public work).

8. Giving a talk about R

During Passover (April) 2020, the Israel-2050 fellows group sent out a call inviting individuals to talk about anything they wanted. I decided to take the opportunity and give a talk there, and following that, to a group of friends of mine that meets periodically with someone presenting something. Although I was only ~7 months into learning R, I wanted to share its amazing abilities for wrangling and visualizing data, the extreme difference of using it compared to the SPSS I had learned, and how it helped me explore intriguing questions I had. So I sat down, wrote an outline, and made a presentation using the {xaringan} package. You can find the slides here. The talk was great (I think) and some of the participants even followed up inquiring about resources to get started, how they could do this in R, etc.
However, more importantly, making and giving the talk forced me to think about what it is in R that I like. Organizing these thoughts and communicating them in a way that is appealing to the audience was a fantastic opportunity to stop and think about exactly that: why I like working in R and why they should join in. My first talk about R (use your keyboard arrows to scroll through it).

9. Integrating R into my daily work

Using R as a research assistant – I was very fortunate that the researcher I work for, Dr. Jennifer Oser, was (and still is) very supportive of integrating R into our daily work. I remember, as we started analyzing our data and trying to make sense of it, debating whether to open SPSS, Excel or R. Luckily, I knew how to do some of what we wanted in R, so I turned to that. I believe we've greatly progressed since, so much so that I find it absurd to use anything else now. If you can integrate R into your daily work it's definitely a bonus; I know I learned a lot (I mean a lot) about rmarkdown and version control once I started using R in my research assistant position. Integrating R into my thesis – The reason I initially started learning R was so that I could analyze my thesis' findings and finish my MA with a new skill. No one forced me to use R, and I'm sure I could have done OK with SPSS (or maybe not?), but I was keen on using R in my thesis; it was an exciting and challenging experience. Prior to my thesis I had mostly done visualizations and descriptive reports, so it was great working on regression models, reliability and other forms of reports. I also learned more about version control, reusing the same functions I wrote for the pilot study in my main analysis, and so forth. I couldn't imagine producing SPSS tables and integrating them every time into a separate text document; plus, it was very rewarding trying to automate the process as much as possible.

10.
Blog, and then blog some more

I imagine you've heard this saying a lot, but I definitely agree with it: if you like it then you should blog about it. For example, to learn what term frequency-inverse document frequency (tf-idf) was, I implemented it by analyzing the tf-idf of 4 books by political theorists I like. At one point I wanted to learn more about Bernoulli trials, so I explored the uncertainty in the Israeli lottery. Alternatively, write about a challenge you faced and how you solved it. In another example, I wrote about presenting a static summary of categorical variables from my thesis pilot survey (found here). Don't write for others to click on your website; rather, write for yourself, to learn or communicate something you want to share with your future self and the world, no matter who reads it.

Summary

So this was a not-so-short recap of my last year, which I hope was of value. A lot of the above is owed to the amazing R community: anyone and everyone who blogs, shares their code, interacts about R on social media and was forthcoming. I'm very grateful to the many people I've reached out to with random questions, wanting to join their course or inquiring about further reading. It's interesting to think back about something you've done and whether and how you would have done it differently. As to the latter, I'm not sure, and I'm kind of glad that it happened the way it did. I think my main takeaways are:
If you're looking for a place to start, Oscar Baruffa compiled a fantastic resource aggregating ~100 books about R (most are free).

What's next for me

Great question! Honestly, I don't know. I hope to finish my thesis soon and search for a job that'll require me to work with R and visualize data. In addition, I'll probably also try to learn some Tableau and improve my SQL skills, as they are somewhat sought after in various jobs I looked at. As to R, I hope to learn some new concepts and statistical analyses; incorporate more #TidyTuesdays into my weekly routine; and analyze some data I have waiting around for a blog post. Of course everything is flexible, and in that case I really don't know what's waiting, but I'm definitely excited about it! To leave a comment for the author, please follow the link and comment on their blog: R on Amit Levinson. The post My year in R first appeared on R-bloggers.
2 Months in 2 Minutes – rOpenSci News, October 2020 Posted: 14 Oct 2020 11:00 AM PDT
[This article was first published on rOpenSci - open tools for open science, and kindly contributed to R-bloggers].
Overengineering in ML – business life is not a Kaggle competition Posted: 14 Oct 2020 03:41 AM PDT
[This article was first published on That's so Random, and kindly contributed to R-bloggers]. "Overengineering is the act of designing a product to be more robust or have more features than often necessary for its intended use, or for a process to be unnecessarily complex or inefficient." This is how the Wikipedia page on overengineering starts. It is the diligent engineer who wants to make sure that every possible feature is incorporated in the product who creates an overengineered product. We find overengineering in real-world products as well as in software, and it is a relevant concept in data science too. First of all, because software engineering is very much a part of data science: we should be careful not to create dashboards, reports and other products that are too complex and contain more information than the user can stomach. But maybe there is a second, more subtle lesson in overengineering for data scientists. We might create machine learning models that predict too well. Sounds funny? Let me explain what I mean by it. In machine learning, theoretically at least, there is an optimal model given the available data in the train set. It is the one that gives the best predictions on new data, the one that has just the right level of complexity. It is not so simple that it misses predictive relationships between features and target (underfitting), but not so complex that it incorporates random noise in the train set (overfitting). The gold standard within machine learning is to hold out part of the train set to represent new data, to gauge where on the bias-variance continuum the predictor is: either by using a test set, by using cross-validation, or, ideally, both.
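As an illustrative sketch of that holdout setup (not code from the original post; it assumes the rsample package from tidymodels and uses the built-in mtcars data):

```r
library(rsample)

set.seed(42)

# hold out 20% of the rows as a test set
split <- initial_split(mtcars, prop = 0.8)
train <- training(split)
test  <- testing(split)

# 5-fold cross-validation on the remaining training data
folds <- vfold_cv(train, v = 5)
```

Candidate models are compared on the cross-validation folds; the test set is touched only once, to estimate the chosen model's performance on genuinely new data.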
Machine learning competitions, like the ones on Kaggle, challenge data scientists to find the model that is as close to the theoretical optimum as possible. Since different models and machine learning algorithms typically excel in different areas, the optimal result is often attained by combining them in what is called an ensemble. Not seldom are ML competitions won by multiple contestants who joined forces and combined their models into one big super model. In the ML competition context, there is no such thing as "predicting too well"; predicting as well as you can is the sheer goal of these competitions. However, in my opinion this is not the case in real-world applications. There the objective is (or maybe should be) creating as much business value as possible. With this goal in mind we should realize that optimizing machine learning models comes with costs. Obviously, there is the salary of the data scientist(s) involved. The closer you come to the optimal model, the more you'll need to scrape for improvement; most likely, there will be diminishing returns on the time spent as the project progresses in terms of gained prediction accuracy. But costs can also lie in the complexity of the implementation. I don't mean the model complexity here, but the complexity of the product as a whole. The amount of code written might increase sharply when more complex features are introduced, or a more involved model might require the training to run on multiple cores or increase the training time by, say, fivefold. Making your product more complex makes it more vulnerable to bugs and more difficult to maintain in the future. Although the predictions of a more complex model might be (slightly) better, its business value might actually be lower than a simpler solution's, because of this vulnerability. The strange-sounding statement in the introduction of this blog, "We might create machine learning models that predict too well", might make more sense now.
Too much time and money can be invested, creating a product that is too complex and performs too well for the business needs it serves. In other words, we are overengineering the machine learning solution.

Fighting overengineering

There are at least two ways to help you not overengineer a machine learning product. First, build the product incrementally. Probably no surprise coming from a proponent of working in an agile way: I think starting small and simple is the way to go. If the predictions are not up to par with the business requirements, see where the biggest improvement can be made in the least amount of time, adding the least amount of complexity to the product. Then assess again and start another cycle if needed, until you arrive at a solution that is just good enough for the business need. We could call this Occam's model: the simplest possible solution that fulfills the requirements. Secondly, realise that the call on whether the predictions are good enough to meet business needs is a business decision, not a data science choice. If you have someone on your team who is responsible for allocation of resources, planning, etc. (PO, manager, business lead, whatever they are called), it should be predominantly their call whether there is need for further improvement. The question these people ask data scientists is too often "Is the model good enough already?", where it should be "What is the current performance of the model?". As a data scientist, in the midst of optimisation, you might not be the best judge of good enough; our ideas for further optimisation and general perfectionism could cloud our judgement. Rather, we should make it our job to inform the business people as best as we can about the current performance, and leave the final call to them. To leave a comment for the author, please follow the link and comment on their blog: That's so Random.
The post Overengineering in ML – business life is not a Kaggle competition first appeared on R-bloggers.
Zoom talk on “Organising exams in R” from the Grenoble R user group Posted: 14 Oct 2020 01:55 AM PDT
[This article was first published on R-posts.com, and kindly contributed to R-bloggers]. Due to the current health situation, the Grenoble (France) R user group has decided to switch 100% online. The advantage of this is that our talks will now be open to anybody around the globe. The next talk will be on October 22nd, 2020 at 5PM (FR): Organising exams in R
Link to the event: https://www.eventbrite.com/e/organising-exams-in-r-tickets-125308530187 Link to Zoom: https://us04web.zoom.us/j/76885441433?pwd=bUhvejdUb2sxa29saEk5M3NlMldBdz09 Link to the Grenoble R user group 2020/2021 calendar: https://r-in-grenoble.github.io/sessions.html Hope to see you there! Zoom talk on "Organising exams in R" from the Grenoble R user group was first posted on October 14, 2020 at 2:55 pm. ©2020 "R-posts.com". Use of this feed is for personal non-commercial use only. To leave a comment for the author, please follow the link and comment on their blog: R-posts.com. The post Zoom talk on "Organising exams in R" from the Grenoble R user group first appeared on R-bloggers.
Climate Change & AI for GOOD | Online Open Forum Oct 15th Posted: 14 Oct 2020 01:55 AM PDT
[This article was first published on R-posts.com, and kindly contributed to R-bloggers]. Join Data Natives for a discussion on how to curb climate change and better protect our environment for the next generation. Get inspired by innovative solutions which use data, machine learning and AI technologies for GOOD. Lubomila Jordanova, founder of Plan A and featured speaker, explains that "the IT sector will use up to 51% of the global energy output in 2030. Let's adjust the digital industry and use Data for Climate Action, because carbon reduction is key to making companies future-proof." When used carefully, AI can help us solve some of the most serious challenges. However, key to that success is measuring impact with the right methods, mindsets, and metrics. The founders of startups that developed innovative solutions to combat humanity's biggest challenge will share their experiences and thoughts: Brittany Salas (Co-Founder at Active Giving) | Peter Sänger (Co-Founder/Executive Managing Director at Green City Solutions GmbH) | Shaheer Hussam (CEO & Co-Founder at Aetlan) | Lubomila Jordanova (Founder at Plan A) | Oliver Arafat (Alibaba Cloud's Senior Solution Architect). Climate Change & AI for GOOD | Online Open Forum Oct 15th was first posted on October 14, 2020 at 2:55 pm. To leave a comment for the author, please follow the link and comment on their blog: R-posts.com.
The post Climate Change & AI for GOOD | Online Open Forum Oct 15th first appeared on R-bloggers.