[R-bloggers] How to Backtest your Crypto Trading Strategies in R (and 5 more aRticles)
- How to Backtest your Crypto Trading Strategies in R
- Recession Forecasting with a Neural Net in R
- #FunDataFriday – The Big Book of R
- High School Swimming State-Off Tournament Texas (2) vs. Florida (3)
- Correcting for confounded variables with GLMs
- Deploying flexdashboard on Github Pages
How to Backtest your Crypto Trading Strategies in R Posted: 04 Sep 2020 03:29 AM PDT
[This article was first published on R – Predictive Hacks, and kindly contributed to R-bloggers.]

A Few Words about Trading Strategies

One of the biggest challenges is to predict the market. Many people have developed their own trading strategies: some are based on Machine Learning and Artificial Intelligence algorithms such as LSTM, XGBoost, and Random Forest; some are based on statistical models such as ARIMA; and others are based on technical analysis. Whatever trading strategy we are about to apply, we need to backtest it, meaning to simulate it and finally to evaluate it. Today, we will provide an example of how you can easily backtest your own trading strategy in R.

Define the Trading Strategy

For illustrative purposes, we define an arbitrary trading strategy which does not make much sense, but which is good to work with as an example. Let's define the rules: when the close price of the cryptocurrency moves X consecutive days in the same direction (i.e. 7 consecutive days "up" or 7 consecutive days "down"), we open a position against the streak: a short position after X consecutive "up" days, and a long position after X consecutive "down" days.
Once we have opened our positions, we use two alerts: Take Profit (TP), which closes the position once the price has moved in our favor by a given percentage, and Stop Loss (SL), which closes it once the price has moved against us by a given percentage.
Every position is closed at the open price. Of course, we can extend this assumption by considering hourly or even per-minute data. Every trade is 1 unit, but we can change this to a multiplier or a fraction. Notice that the 1-unit assumption does not affect our outcome, since we report the ROI, which is the P/L as a percentage.

R Code to Backtest the Trading Strategy

You can have a look at how to get cryptocurrency prices in R and how to count consecutive events in R. Below we build a function which takes as parameters the coin symbol, the number of consecutive days that triggers a signal, the Stop Loss (SL) and Take Profit (TP) percentages, and the start date of the price history.
Notice that the open positions that have not met the SL or TP alert criteria are still "Active"; we return them with an "Active" status, and as "Profit" we return their current profit.

library(tidyverse)
library(crypto)

back_testing <- function(symbol = "BTC", consecutive = 7, SL = 0.1, TP = 0.1, start_date = "20180101") {

  df <- crypto_history(coin = symbol, start_date = start_date)

  df <- df %>%
    mutate(Sign = ifelse(close > lag(close), "up", "down")) %>%
    mutate(Streak = sequence(rle(Sign)$lengths))

  df <- df %>%
    select(symbol, date, open, high, low, close, Sign, Streak) %>%
    na.omit() %>%
    mutate(Signal = case_when(lag(Sign) == "up" & lag(Streak) %% consecutive == 0 ~ 'short',
                              lag(Sign) == "down" & lag(Streak) %% consecutive == 0 ~ 'long',
                              TRUE ~ ""),
           Dummy = TRUE)

  Trades <- df %>%
    filter(Signal != "") %>%
    select(Open_Position_Date = date, Open_Position_Price = open, Dummy, Signal)

  Portfolios <- Trades %>%
    inner_join(df %>% select(-Signal), by = "Dummy") %>%
    filter(date > Open_Position_Date) %>%
    select(-Dummy) %>%
    mutate(Pct_Change = open / Open_Position_Price - 1) %>%
    mutate(Alert = case_when(Signal == 'long' & Pct_Change > TP ~ 'TP',
                             Signal == 'long' & Pct_Change < -SL ~ 'SL',
                             Signal == 'short' & Pct_Change > TP ~ 'SL',
                             Signal == 'short' & Pct_Change < -SL ~ 'TP')) %>%
    group_by(Open_Position_Date) %>%
    mutate(Status = ifelse(sum(!is.na(Alert)) > 0, 'Closed', 'Active'))

  Active <- Portfolios %>%
    filter(Status == 'Active') %>%
    group_by(Open_Position_Date) %>%
    arrange(date) %>%
    slice(n()) %>%
    mutate(Profit = case_when(Signal == 'short' ~ Open_Position_Price - open,
                              Signal == 'long' ~ open - Open_Position_Price)) %>%
    select(symbol, Status, Signal, date, Open_Position_Date, Open_Position_Price, open, Profit)

  Closed <- Portfolios %>%
    filter(Status == 'Closed') %>%
    na.omit() %>%
    group_by(Open_Position_Date) %>%
    arrange(date) %>%
    slice(1) %>%
    mutate(Profit = case_when(Signal == 'short' ~ Open_Position_Price - open,
                              Signal == 'long' ~ open - Open_Position_Price)) %>%
    select(symbol, Status, Signal, date, Open_Position_Date, Open_Position_Price, open, Profit)

  final <- bind_rows(Closed, Active) %>%
    ungroup() %>%
    arrange(date) %>%
    mutate(ROI = Profit / Open_Position_Price,
           Running_ROI = cumsum(Profit) / cumsum(Open_Position_Price))

  return(final)
}

Results of the Backtest

Let's assume that we want to backtest the trading strategy that we described earlier with the following parameters:

ttt <- back_testing(symbol = "BTC", consecutive = 5, SL = 0.1, TP = 0.15, start_date = "20180101")
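The returned data frame has one row per trade. A quick way to inspect the outcome is to look at the last rows and plot the running ROI; this is a minimal sketch of my own (the ttt object comes from the call above, and the plotting choices are mine, not the original post's):

tail(ttt)  # most recent trades, with Status, Profit, ROI and Running_ROI

# Plot the running ROI over time to spot profitable and unprofitable periods
ttt %>%
  ggplot(aes(x = date, y = Running_ROI)) +
  geom_line() +
  labs(title = "Running ROI of the backtested strategy",
       x = "Date", y = "Running ROI")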
Discussion

As we can see, the trading strategy ended with a -2.18% P/L, without taking transaction fees into account. You can also see that in some periods it was profitable (2018) and in others it was not (2019), which is why it is very important to backtest a trading strategy over many different periods. Notice that with some small tweaks this function can backtest hundreds of trading strategies. Instead of this simple trading strategy, the signal could come from an advanced machine learning model; once we define the trading signals, the backtest is the same.
Recession Forecasting with a Neural Net in R Posted: 04 Sep 2020 12:30 AM PDT
[This article was first published on R – Franklin J. Parker, and kindly contributed to R-bloggers.]

I spend quite a bit of time at work trying to understand where we are in the business cycle. That analysis informs our capital market expectations and, by extension, our asset allocation and portfolio risk controls. For years now I have used a trusty old linear regression model, held together with baling wire, duct tape, and love. It's a good model. It predicted the 2020 recession and the 2008 recession (yes, out of sample). Even so, a couple of years ago I began experimenting with a neural net to (a) increase the amount of data I could shove at it and (b) possibly get a better model. Of course, I wouldn't retire my old model just yet; I'd just run them alongside each other. So, here is the framework for forecasting a recession using a neural net in R.

What are We Trying to Predict, Anyway?

The first question we need to answer is what we want our output to be. Typically, markets see recessions coming 3 to 6 months before they officially hit. If I'm going to add value to a portfolio, I need to be rebalancing out of risk assets before everyone else. So, for my purposes, I'd like to have 9 to 12 months of lead time to get ahead of a recession. Ideally, then, instead of getting 1s concurrently with a recession, we would get them 12 months in advance. This, then, will be our output: USREC data with a 12-month lead.

The next challenge is finding the data points which reliably predict recessionary environments. Now let's pause here for a second and acknowledge some danger: we could data-mine this thing and swap indicators in and out until we have an over-fitted model that doesn't work going forward. Personally, I'm not a fan of that. It may be a bird-dog style analysis, pointing the way to potential information, but to me that is too dangerous. I could construct any story to "justify" random inputs to a model. I would rather begin with indicators that (1) have been at least acknowledged and studied by others, and (2) I can build a convincing a priori theory for how they would influence or indicate the business cycle. Then I'll plug them in and see if the model is effective. I'd rather have an imperfect hammer than a brittle one that breaks on first impact.

Inputs

For the sake of this demonstration, I've settled on three inputs: the unemployment rate, the effective federal funds rate, and the 10-year minus 3-month yield curve spread. Each of these has academic research behind it and has worked ex post. What's more, I can build strong justifications for why each of these should both indicate and influence the economy as a whole. Of course there are others, but this is just a demonstration after all!
Now, we won't be using the raw figure of each of these. We are going to convert them into "signal" form. The idea is to boil down the essence of the signal so the neural net has to do as little work as possible. That is, in my opinion, the most effective way to increase your accuracy without increasing model fragility.

The Code – Get and Wrangle Data

We are going to need five libraries for this one:

library(quantmod)
library(tidyverse)
library(nnet)
library(DALEX)
library(breakDown)

Note that DALEX and breakDown are more about visualization and understanding of the model later. Next, let's load our raw data. By the way, you can get these symbols from a search on the St. Louis Fed website; the symbol is immediately after the title. If you haven't been to the FRED database… well, then we can't be friends anymore and I can't invite you to my birthday party.

# Get data
getSymbols( c('USREC', 'UNRATE', 'T10Y3M', 'DFF'), src = 'FRED')

# Note the differing periods of data
USREC %>% head() # Monthly data
           USREC
1854-12-01     1
1855-01-01     0
1855-02-01     0
1855-03-01     0
1855-04-01     0
1855-05-01     0

UNRATE %>% head() # Monthly data
           UNRATE
1948-01-01    3.4
1948-02-01    3.8
1948-03-01    4.0
1948-04-01    3.9
1948-05-01    3.5
1948-06-01    3.6

DFF %>% head() # Daily data
            DFF
1954-07-01 1.13
1954-07-02 1.25
1954-07-03 1.25
1954-07-04 1.25
1954-07-05 0.88
1954-07-06 0.25

T10Y3M %>% head() # Daily data (sort of)
           T10Y3M
1982-01-04   2.32
1982-01-05   2.24
1982-01-06   2.43
1982-01-07   2.46
1982-01-08   2.50
1982-01-11   2.32

Our data is not aligned, so we have a touch of work to do to wrangle it, then signal-fy it. Though the other data starts much earlier, the yield curve data doesn't start until 1982, so we'll cut everything before then.

# Wrangle Data, Convert to Signals
# Get all data into monthly format, starting in 1982
recessions <- USREC
unemployment <- UNRATE
fedfunds <- to.monthly(DFF)$DFF.Open
yieldcurve <- to.monthly(T10Y3M)$T10Y3M.Open

Again, I won't cover all of the theory behind the signalification of our data, but just know that this isn't me making random stuff up. (You should never trust anyone who says that, by the way.) The other important component of these signals is how they shift through time. Because the nnet package doesn't look backward at the data (it only takes in this moment), we need to add inputs that look backward. I'm doing that by lagging each signal by 3 months and 6 months.

# Unemployment signal: 12-month simple moving average minus
# the headline unemployment rate.
signal_unemployment <- SMA(unemployment, n = 12) - unemployment
signal_unemployment_lag3 <- lag(signal_unemployment, n = 3)
signal_unemployment_lag6 <- lag(signal_unemployment, n = 6)

# FedFunds signal: effective federal funds rate minus the
# maximum fedfunds rate over the past 12 months
signal_fedfunds <- fedfunds - rollmax(fedfunds, k = 12, align = 'right')
signal_fedfunds_lag3 <- lag(signal_fedfunds, n = 3)
signal_fedfunds_lag6 <- lag(signal_fedfunds, n = 6)

# Yield curve signal: 10-year US treasury yield minus the 3-month
# US treasury yield.
signal_yieldcurve <- yieldcurve
signal_yieldcurve_lag3 <- lag(signal_yieldcurve, n = 3)
signal_yieldcurve_lag6 <- lag(signal_yieldcurve, n = 6)
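Before moving on, it doesn't hurt to eyeball a signal or two. This quick check is my addition rather than the original post's code, plotting the xts series just built:

# Optional sanity check: do the signals look sensible?
plot(signal_yieldcurve, main = "Yield curve signal (10y minus 3m yield)")
plot(signal_unemployment, main = "Unemployment signal (12m SMA minus headline)")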
Recall that we want a 12-month lead time on our recession indicator, so we are going to shift the USREC data by 12 months for the purposes of training our model.

# Since we want to know if a recession is starting in the next 12 months,
# we need to lag USREC by NEGATIVE 12 months = lead(12)
signal_recessions <- recessions %>%
  coredata() %>%                            # Extract core data
  lead(12) %>%                              # Move it 12 months earlier
  as.xts(., order.by = index(recessions))   # Rebuild as an xts on the original index

If you remember from neural net training 101, a neural net has hyperparameters, i.e. how big the net is and how we manage weight decay. This means we can't simply train a neural net; we have to tune it, too. Which brings us to the really sucky part: we need to divide our data into a train set, a tune (test) set, and an out-of-sample test set. This means tedious "window" work. Ugh.

# Separate data into train, test, and out-of-sample
# Training data will be through Dec 1998
# Testing data will be Jan 1999 through Dec 2012
# Out of sample will be Jan 2013 through July 2020
start_train <- as.Date('1982-08-01')
end_train <- as.Date('1998-12-31')
start_test <- as.Date('1999-01-01')
end_test <- as.Date('2012-12-31')
start_oos <- as.Date('2013-01-01')
end_oos <- as.Date('2020-07-31')

train_data <- data.frame(
  'unemployment' = window(signal_unemployment, start = start_train, end = end_train),
  'unemployment_lag3' = window(signal_unemployment_lag3, start = start_train, end = end_train),
  'unemployment_lag6' = window(signal_unemployment_lag6, start = start_train, end = end_train),
  'fedfunds' = window(signal_fedfunds, start = start_train, end = end_train),
  'fedfunds_lag3' = window(signal_fedfunds_lag3, start = start_train, end = end_train),
  'fedfunds_lag6' = window(signal_fedfunds_lag6, start = start_train, end = end_train),
  'yieldcurve' = window(signal_yieldcurve, start = start_train, end = end_train),
  'yieldcurve_lag3' = window(signal_yieldcurve_lag3, start = start_train, end = end_train),
  'yieldcurve_lag6' = window(signal_yieldcurve_lag6, start = start_train, end = end_train),
  'recessions' = window(signal_recessions, start = start_train, end = end_train)
)

test_data <- data.frame(
  'unemployment' = window(signal_unemployment, start = start_test, end = end_test),
  'unemployment_lag3' = window(signal_unemployment_lag3, start = start_test, end = end_test),
  'unemployment_lag6' = window(signal_unemployment_lag6, start = start_test, end = end_test),
  'fedfunds' = window(signal_fedfunds, start = start_test, end = end_test),
  'fedfunds_lag3' = window(signal_fedfunds_lag3, start = start_test, end = end_test),
  'fedfunds_lag6' = window(signal_fedfunds_lag6, start = start_test, end = end_test),
  'yieldcurve' = window(signal_yieldcurve, start = start_test, end = end_test),
  'yieldcurve_lag3' = window(signal_yieldcurve_lag3, start = start_test, end = end_test),
  'yieldcurve_lag6' = window(signal_yieldcurve_lag6, start = start_test, end = end_test),
  'recessions' = window(signal_recessions, start = start_test, end = end_test)
)

oos_data <- data.frame(
  'unemployment' = window(signal_unemployment, start = start_oos, end = end_oos),
  'unemployment_lag3' = window(signal_unemployment_lag3, start = start_oos, end = end_oos),
  'unemployment_lag6' = window(signal_unemployment_lag6, start = start_oos, end = end_oos),
  'fedfunds' = window(signal_fedfunds, start = start_oos, end = end_oos),
  'fedfunds_lag3' = window(signal_fedfunds_lag3, start = start_oos, end = end_oos),
  'fedfunds_lag6' = window(signal_fedfunds_lag6, start = start_oos, end = end_oos),
  'yieldcurve' = window(signal_yieldcurve, start = start_oos, end = end_oos),
  'yieldcurve_lag3' = window(signal_yieldcurve_lag3, start = start_oos, end = end_oos),
  'yieldcurve_lag6' = window(signal_yieldcurve_lag6, start = start_oos, end = end_oos),
  'recessions' = window(signal_recessions, start = start_oos, end = end_oos)
)

colnames(train_data) <- colnames(test_data) <- colnames(oos_data) <- c(
  'unemployment', 'unemployment_lag3', 'unemployment_lag6',
  'fedfunds', 'fedfunds_lag3', 'fedfunds_lag6',
  'yieldcurve', 'yieldcurve_lag3', 'yieldcurve_lag6',
  'recessions'
)

Comment below if you have a more efficient way of doing this!
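For what it's worth, here is one possibly tidier route. This is my sketch, not the author's code: merge the signals into a single xts object and take one window per period. The index-class conversion is needed because to.monthly() produces a yearmon index while the FRED monthly series use first-of-month Dates:

signals <- list(
  unemployment      = signal_unemployment,
  unemployment_lag3 = signal_unemployment_lag3,
  unemployment_lag6 = signal_unemployment_lag6,
  fedfunds          = signal_fedfunds,
  fedfunds_lag3     = signal_fedfunds_lag3,
  fedfunds_lag6     = signal_fedfunds_lag6,
  yieldcurve        = signal_yieldcurve,
  yieldcurve_lag3   = signal_yieldcurve_lag3,
  yieldcurve_lag6   = signal_yieldcurve_lag6,
  recessions        = signal_recessions
)

# Put every series on a first-of-month Date index so merge() aligns the rows
signals <- lapply(signals, function(x) { index(x) <- as.Date(index(x)); x })
all_data <- do.call(merge, unname(signals))
colnames(all_data) <- names(signals)

train_data <- data.frame(window(all_data, start = start_train, end = end_train))
test_data  <- data.frame(window(all_data, start = start_test,  end = end_test))
oos_data   <- data.frame(window(all_data, start = start_oos,   end = end_oos))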
The Code – Train & Tune Our Neural Net

From here we need to make some educated, but mostly guess-like, decisions. We must specify a minimum and maximum neural net size. I remember reading once that a neural network hidden layer should be no smaller than about 35% of the number of inputs, and no larger than 1.5 times the total number of inputs. I'm open to critique here, but that's what I'll specify for the minimum and maximum.

# Hyperparameters for tuning
# Minimum net size
size_min <- (ncol(train_data) * 0.35) %>% round(0)
# Maximum net size
size_max <- (ncol(train_data) * 1.5) %>% round(0)
size <- seq(size_min, size_max, 1)

The next hyperparameter is weight decay. As far as I know, there is really no rule of thumb for this, other than it should be 0 < w < 1. For illustration, I'm going to sequence it between 0.1 and 1.5.

# Weight decay settings, start at seq(2, 50, 2) then up it.
# This figure will get divided by 100 in the coming code.
w_decay <- seq(10, 150, 2)

Yay! Now the fun part begins! We want the neural network that delivers the best possible Brier score, and we want to output the optimal model and hyperparameters for inspection and use later. Normally I wouldn't log the Brier scores as they go, but I want to see how net size and various weight decays affect our model, just for fun.

best_brier <- 1         # Prime variable
net_size_optimal <- 5   # Prime variable
w_decay_optimal <- 0.2  # Prime variable

# Log evolution of Brier scores; rows are weight decay, columns are net size
brier_evolution <- 0
i <- 1

for (n in size) {
  for (w in w_decay) {
    # Train model
    model_fit <- nnet(
      as.matrix(train_data$recessions) ~ unemployment + unemployment_lag3 +
        unemployment_lag6 + fedfunds + fedfunds_lag3 + fedfunds_lag6 +
        yieldcurve + yieldcurve_lag3 + yieldcurve_lag6,
      data = as.matrix(train_data),
      maxit = 500,
      size = n,
      decay = w/100,
      linout = 1
    )
    # Test model (for hyperparameter tuning), return Brier score
    model_predict <- predict(model_fit, newdata = as.matrix(test_data))
    brier <- mean((model_predict - test_data$recessions)^2)
    # If this is the best model, store the hyperparameters and the model
    if (brier < best_brier) {
      net_size_optimal <- n
      w_decay_optimal <- w/100
      best_brier <- brier
      model <- model_fit
    }
    brier_evolution[i] <- brier
    i <- i + 1
  }
}

And BOOM! We've got ourselves an artificial neural network!

The Code – Results

First, let's see how our hyperparameters affected our Brier score.

# Drop Brier scores into a matrix
brier_evolution <- matrix(
  brier_evolution,
  nrow = length(w_decay),
  ncol = length(size),
  byrow = FALSE
)
rownames(brier_evolution) <- w_decay/100
colnames(brier_evolution) <- size

# View Brier score results
persp(
  x = w_decay/100,
  y = size,
  z = brier_evolution,
  ticktype = 'detailed',
  xlab = 'Weight Decay',
  ylab = 'Net Size',
  zlab = 'Brier Score',
  shade = 0.5,
  main = 'Brier Score, Effect of Net Size and Weight Decay'
)

This yields a perspective plot of the Brier surface. It is so interesting to me how weight decay is, by far, the most influential hyperparameter. Size matters, but not nearly as much.
Also, we should note there is not too much difference between the worst model we tested and the best, as the Brier score only drops from 0.15 to 0.10. Just to put Brier scores into perspective: a score of 0 is perfect prescience, and a score of 0.50 is no better than chance. Tetlock claims that we need a Brier score of 0.30 or better to generate portfolio alpha (this is a firm-wide metric; see the plot from his paper), but it looks like 0.20 and below is really where you want to be. At 0.10, we are on track! Of course, this is in-sample. How did we do out-of-sample?

# Use model Out of Sample, plot results
model_predict_oos <- predict(model, newdata = as.matrix(oos_data))

data.frame(
  'Date' = index(as.xts(oos_data)),
  'Probability' = coredata(model_predict_oos)
) %>%
  ggplot(., aes(x = Date, y = Probability)) +
  geom_line(size = 2, col = 'dodgerblue4') +
  labs(title = 'Probability of Recession in the Next 12 Months') +
  xlab('') +
  ylab('') +
  theme_light() +
  theme(
    axis.text = element_text(size = 12),
    plot.title = element_text(size = 18)
  )

Which looks pretty good! Here we see that the traditional metrics produced a pretty reliable indication that 2020 would see a recession. My trusty-rusty old linear regression model predicted the same, by the way! I am very interested to know our out-of-sample Brier score.

> brier_oos <- mean((model_predict_oos - oos_data$recessions)^2, na.rm = T)
> brier_oos
[1] 0.04061196

0.04 is about as close to prescience as you can get. Honestly, it scares me a little because it is a bit too good. It is also suspicious to me that the model produced negative probabilities of a recession for 2014 through 2016. Before putting this into production I'd really like to know why that is.

What's Driving These Results

Using the DALEX and breakDown packages, we can examine what is driving the results.

# Try to understand what drives the model using the DALEX package
model_explanation <- explain(
  model,
  data = oos_data,
  predict_function = predict,
  y = oos_data$recessions
)

# Break down the most recent observation (note: nrow(), not length(),
# to grab the last row)
bd <- break_down(
  model_explanation,
  oos_data[nrow(oos_data), 1:(ncol(oos_data) - 1)],
  keep_distributions = T
)
plot(bd)

The resulting break-down plot illustrates the largest factors affecting the outcome in the most recent data point. It looks like the model has a standard bias of 0.081 (the intercept), and the most recent 3-month lagged yield curve data is what moved the needle most. All in, the yield curve is the strongest signal in this data point, which doesn't surprise me at all; it is the most well-known and widely studied indicator because it has accurately predicted every recession since nineteen-sixty-something. We can also see how changes in any of the data points affect the output of the model:

profile <- model_profile(
  model_explanation,
  variables = c(
    'unemployment', 'unemployment_lag3', 'unemployment_lag6',
    'fedfunds', 'fedfunds_lag3', 'fedfunds_lag6',
    'yieldcurve', 'yieldcurve_lag3', 'yieldcurve_lag6'
  )
)
plot(profile, geom = 'aggregates')

Again, the yield curve appears to be the most influential variable, as changes there (x-axis) dramatically affect the output (y-axis). That said, unemployment is also a significant factor, except when lagged by 6 months; changes there seem to have no effect. In the end, I'd say this is a productive model. It appears robust across the various inputs, and it has an excellent out-of-sample Brier score for predicting recessions (though 2020 is our only data point for that).
I am concerned that the Brier score is too good and that we have created a fragile model that may not work with future inputs, and I am concerned about the negative probabilities produced in 2013 through 2016 (this may be a function of the fragility inherent in the model; those years carried unprecedented inputs). Ultimately, there is enough here for me to dig into and refine further, but I wouldn't put this exact model into production without addressing my concerns. As always, if you have critiques of anything here, please share them so that I can correct errors and learn!
#FunDataFriday – The Big Book of R Posted: 04 Sep 2020 12:11 AM PDT
[This article was first published on #FunDataFriday – Little Miss Data, and kindly contributed to R-bloggers.]

WHAT IS IT?

The Big Book of R is an open-source web page created by #rstats community member Oscar Baruffa. The page functions as an easy-to-navigate, one-stop shop for available books on the R programming language.
WHY IS IT AWESOME?

It's awesome because it's so easy to browse for new content on our beloved R. Oscar jump-started the website with over 100 books, and since the launch there has been an ongoing stream of contributions from the open-source community. Best of all, most of the books are either free or very low cost.
HOW TO GET STARTED?

It's very simple: go to the website and start browsing for new R content! If you have a favorite book that you don't see listed, please contribute by submitting a pull request or an issue to the GitHub repo.
High School Swimming State-Off Tournament Texas (2) vs. Florida (3) Posted: 03 Sep 2020 11:00 AM PDT
[This article was first published on Swimming + Data Science, and kindly contributed to R-bloggers.]

Welcome to round two of the State-Off. Today we have Texas (2) taking on Florida (3) for the right to compete in the State-Off championships! Don't forget to update your version of SwimmeR, since I like to use the same style of analysis throughout the State-Off.
Getting Results

As I mentioned previously, we'll just pull results for Texas and Florida, and then stick them together.
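The post's actual import code didn't survive extraction; as a rough sketch of the workflow, pulling meet results with SwimmeR generally looks like the following, where the file paths are hypothetical placeholders rather than the author's real sources:

library(SwimmeR)
library(dplyr)

# Hypothetical result files standing in for the real state meet results
TX_results <- swim_parse(read_results("TX_states_2020.pdf"))
FL_results <- swim_parse(read_results("FL_states_2020.pdf"))

# Stick the two states' results together
results <- bind_rows(TX_results, FL_results)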
Scoring the Meet

Here's one of the new SwimmeR functions for scoring meets, with the same scoring setup we've been using for the whole State-Off.
We have our first split result of the State-Off, with the Texas boys and Florida girls each winning. This is a combined affair though, so let's see which state will advance…
And it's Texas, living up to its higher seed!
Swimmers of the Meet

Swimmer of the Meet criteria is the same as it's been for the entire State-Off. First we'll look for athletes who won two events, thereby scoring the maximum possible forty points. We'll also grab the All-American cuts to use as a tiebreaker, in case multiple athletes win two events. It's possible this week we'll have our first multiple Swimmer of the Meet winner – the suspense!
Boys
Joshua Zuchowski is the boys swimmer of the meet. He's a new face for this particular award – in Florida's first-round meet he finished third behind two guys from Illinois. Also, no boy won two events.
Girls
Lillie Nordmann continues her winning ways, being named girls swimmer of the meet for the second meet in a row. While she hasn't written in to answer my questions about swimming outdoors in cold weather, she did tell SwimSwam that she's postponing her collegiate career because of the ongoing pandemic. I don't blame her. As these results attest, she's a champ, and frankly this way she'll be able to focus more on trying to make the Olympic team. Also, it's warmer in Texas than it is in Palo Alto (or nearby San Jose). Just saying. Actually I'm not just saying – I have data on the temperatures in San Jose and Houston from the National Weather Service. Let me show you what I'm talking about, in both Fahrenheit and Celsius (the State-Off has a surprisingly international audience). The plot shows the monthly average temperatures for the two cities.
The DQ Column

A new feature in SwimmeR is the DQ column, which lets us look at disqualifications in the results.
Not surprisingly, most DQs are seen in relays. False starts can and do happen as teams push for every advantage; the line between a fast start and a false start is exceedingly fine. No one is trying to false start, it just happens sometimes. Breaststroke, on the other hand, is a discipline where intentional cheating can and sometimes does prosper. Recall Cameron van der Burgh admitting to taking extra dolphin kicks after winning gold in Rio, or Kosuke Kitajima having "no response" to Aaron Peirsol's accusation (backed up by video) that he took an illegal dolphin kick en route to gold in Beijing. Breaststrokers also seem to struggle with the modern "brush" turn, and ring up a lot of one-hand touches. The Texas results (5A, 6A) actually say what infraction was committed, and it's the usual mix of kicking violations and one-hand touches. Florida doesn't share reasons for DQs, but just from my own experience: kicking and one-hand touches. Must be in the water. Or not, because butterflyers don't seem to suffer from the same issues. Could it be that the butterfly recovery is more difficult to stop short, or to do in an unbalanced fashion such that only one hand makes contact with the wall? I don't know, it's a genuine mystery, but it is nice that my personal observations from officiating are borne out in the data.

Plotting Times With SwimmeR

We might be interested in seeing a distribution of times for a particular event. There's a problem though: swimming times are traditionally reported as minutes:seconds.hundredths, a format R doesn't read as numeric. Converting those strings into seconds makes them plottable (a minimal sketch of such a conversion appears at the end of this post). With that done, we see classic "bell" curves for each state, with a few very fast or very slow (comparatively speaking) swimmers, and a bunch of athletes clumped right in the middle.

In Closing

Texas wins the overall, despite a winning effort from the Florida girls. Next week at Swimming + Data Science we'll have number one seed California (1) vs. number five Pennsylvania (5). We'll also dig further into the new SwimmeR features.
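As promised above, here is a minimal base-R sketch of converting "minutes:seconds.hundredths" strings into numeric seconds for plotting. This helper is my own illustration, not a SwimmeR function:

# Convert times like "1:39.99" (or bare "59.99") to numeric seconds
to_seconds <- function(times) {
  sapply(strsplit(times, ":", fixed = TRUE), function(parts) {
    parts <- as.numeric(parts)
    if (length(parts) == 2) parts[1] * 60 + parts[2] else parts[1]
  })
}

to_seconds(c("1:39.99", "59.99"))
# [1] 99.99 59.99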
Correcting for confounded variables with GLMs Posted: 03 Sep 2020 11:00 AM PDT
[This article was first published on Bluecology blog, and kindly contributed to R-bloggers.]

Correcting for confounded variables with GLMs

General (and generalized) linear models can be useful for analyzing data where the predictors of interest are confounded with other variables. For instance, we used GLMs in a meta-analysis of the rates at which species are shifting their ranges under climate change. We used the GLMs to correct for differences in the ways different studies had measured species ranges, so we could then study the unique effects of ecological variables. You can also think of these GLMs with multiple covariates as 'statistically' (rather than experimentally) controlling for the effects of each variable when looking at the effects of the other variables. In this post I'll demonstrate this application of statistical controls with a simple example (a self-contained sketch of the simulation appears at the end of this post).

Simulate data

We'll start by simulating some data. Let's say the data represent fish abundances sampled in two bioregions. We'll assume the data are Poisson distributed, to account for the fact that abundances are counts and cannot be negative. To simulate the data we will specify an intercept and a habitat effect, then make up the covariate data: we made it so habitat area was on average twice as big in the second bioregion. Next we assemble a 'design matrix', make the 'true' means for our simulation, and draw the fish abundances. We take an exponent of the linear predictor to ensure positive values. If this confuses you so far, you can read more about linear models and then generalized linear models on my other blogs.

Plot simulations

Plotting abundance against habitat area gives no strong evidence of any effect of habitat area on abundance. What if we colour the points by bioregion? It appears the red region (the region with the larger habitat areas) simply has far fewer fish.

Naive GLM

Now let's fit a GLM assuming we don't know about bioregion. Habitat is significant (p < 0.001), but the estimate is in the wrong direction. To see this, we can do a plot with the model's predictions overlaid: the naive model predicts an abundance that decreases with habitat area.

Region adjusted GLM

Ok, so now try including region. Habitat area is significant and positive now; note also the change in the effect size. So there's basically no evidence for the model without bioregion. But notice that while the region effect is approximately correct (-23), a plot of the default predictions makes habitat area appear insignificant. Now replot the predictions, asking for the habitat area effect just in the first bioregion, and we see the trend.

So our GLM has corrected for the absence of fish in bioregion 2 and recovered the positive effect of habitat area on abundance.
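Since the post's code blocks were lost in this digest, here is a minimal self-contained sketch of the analysis described above. The parameter values are my assumptions for illustration, not the author's originals:

# Simulate confounded data: region 2 has larger habitat patches but far
# fewer fish, so habitat and region are confounded
set.seed(42)
n <- 50               # sites per bioregion
beta0 <- 2            # intercept (log scale)
beta_habitat <- 0.8   # true positive effect of habitat area
beta_region <- -4     # bioregion 2 has almost no fish

habitat <- c(runif(n, 0, 1), runif(n, 0, 2))  # region 2: ~2x the habitat area
region <- factor(rep(c("1", "2"), each = n))

# 'True' means (exponent keeps them positive), then Poisson counts
mu <- exp(beta0 + beta_habitat * habitat + beta_region * (region == "2"))
abundance <- rpois(2 * n, mu)

# Naive GLM: ignoring bioregion pushes the habitat estimate the wrong way
m_naive <- glm(abundance ~ habitat, family = poisson)
coef(summary(m_naive))

# Region-adjusted GLM: statistically controls for bioregion and
# recovers a positive habitat effect
m_adj <- glm(abundance ~ habitat + region, family = poisson)
coef(summary(m_adj))

# Compare the two models
AIC(m_naive, m_adj)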
Deploying flexdashboard on Github Pages Posted: 03 Sep 2020 11:00 AM PDT
[This article was first published on Rami Krispin, and kindly contributed to R-bloggers.]

One of Github's coolest features is Github Pages, which enables you to create and deploy websites under the github.com domain for free. The most common usage of Github Pages is the deployment of project documentation, and the R community uses it widely to deploy different R Markdown formats, such as package documentation with pkgdown, blogs with blogdown, books with bookdown, etc. Recently, as I was looking for a way to share Covid19 data, I had a eureka moment, realizing that it should be straightforward to deploy flexdashboard, or any other non-Shiny format of R Markdown, on Github Pages. After a short googling, I came across this example by Phil Batey. Since then, I have started to utilize this feature in some of my packages, for example in dashboards visualizing Covid19 datasets.
This post provides a short tutorial for deploying flexdashboard on Github Pages.

Flexdashboard on Github Pages

The flexdashboard package provides a customized format for building interactive dashboards. It is a simplistic, useful, and fast method for developing a static dashboard that does not require big data or a back-end server (although you can pair flexdashboard with Shiny to create a dynamic dashboard with back-end server support). To deploy flexdashboard on Github Pages, you need little more than a public Github repository and a flexdashboard (R Markdown) file rendered to HTML.
Simple example

The following example demonstrates the deployment process of flexdashboard on Github Pages. We will start by creating a new repository for the deployment (you can also deploy on an existing repository) using the Github web interface. Note that the repository must be public to expose it on a Github page. We will set the name, keep all the checkboxes unchecked, and create the repository. The next step is to create a local folder and sync it with the repository.
Before we sync the new folder with the remote repository, let's create the dashboard itself. We will use the built-in flexdashboard template, available from RStudio's new R Markdown options (top left). Once the template is created, a new file will pop up in the source pane. Now we are ready to render the dashboard! Click Knit, and the dashboard renders; you can see its file name on the dashboard's top right.

The last step of the deployment is to sync the local repo with the remote one. We will init a git workspace in the local folder we created for the project, and commit the folder content using the origin address from the repository page (the exact commands are sketched at the end of this post). Now, if all went well, you should see the content of the local folder on the Github repository.

The last step would be to set the Github Page: go to the repository settings and enable Github Pages. And… it's deployed! https://ramikrispin.github.io/flexdashboard_example/

Note that the page URL is a combination of your Github user name and the repository name, e.g. https://username.github.io/repository_name/.

Resources

Here are some useful resources for deploying flexdashboard on Github Pages.
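Finally, the post's screenshots of the git sync step were lost in this digest; below is a sketch of the commands it describes. The repository name comes from this example, while the user name placeholder, commit message, and the index.html naming convention are my assumptions:

# Render the dashboard first; naming the output index.html lets Github Pages
# serve it at the repository's root URL
# (in R: rmarkdown::render("dashboard.Rmd", output_file = "index.html"))
git init
git add .
git commit -m "deploy flexdashboard"
git remote add origin https://github.com/<username>/flexdashboard_example.git
git push -u origin master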