[R-bloggers] A look at past bear markets and implications for the future (and 8 more aRticles)
- A look at past bear markets and implications for the future
- Free Coupon for our R Video Course: Introduction to Data Science
- A Little Something From Practical Data Science with R Chapter 1
- Online R, Python & Git Training!
- All you need to know on Multiple Factor Analysis …
- U.S. Census Counts Data
- Outlier Days with R and Python
- Flatten the COVID-19 curve
- Vectorising like a (semi)pro
A look at past bear markets and implications for the future

Posted: 16 Mar 2020 11:47 AM PDT [This article was first published on Data based investing, and kindly contributed to R-bloggers.]

The S&P 500 is officially in a bear market, and the crash from high valuation levels has been fast and painful. There is, however, light at the end of the tunnel. In this post I'll demonstrate how the US stock market has behaved during past bear markets and how it has recovered during the ten years after the peak. The reason for choosing ten years as the horizon is that I believe you should not invest in stocks any money you will need within the next ten years. The chance of a positive return increases substantially with time and is almost ninety percent for a ten-year period. The worst annual return for any ten-year period since 1928 has been about negative four percent (sources).

We'll use monthly total return data for the S&P 500 from Shiller, beginning in 1871 and running until the very end of last year. The index has been reconstructed to represent the US stock market for dates before the S&P 500 existed. The reason for going so far back in time is to include as many bear markets as possible. Panics and manias have always existed, and human nature has not changed enough in the past 150 years to make the older data less valid. There has, however, been a substantial change in how quickly information spreads, which lets panic spread faster and may make bear markets shorter and deeper.

First, let's look at the 14 bear markets found in the data in nominal terms, which describes how a portfolio would have developed without taking inflation into account. The horizontal black line indicates the drop needed to reach a bear market, minus 20 percent. A blue color indicates that the return was positive in the 10 years following the peak, i.e. the ending value is higher than the value at the peak; a red color indicates the opposite.

Only two of the fourteen bear markets did not recover within ten years of the initial peak. Not surprisingly, these were the two that peaked in bubble territory, in 1929 and 2000. Notice that the bear markets that peaked in 1919 and 1987 were followed by those very bubbles.

Below is the same plot with real returns, so the returns describe the actual change in purchasing power after inflation. Notice that since bear markets are defined as a twenty percent drop in nominal terms, the real returns might not dip below the black line because of deflation. In real terms, four of the fourteen bear markets did not recover within ten years of the peak. Judging by history, this still leaves an over 70 percent chance of the index being higher ten years later, after inflation. Note that the bear market that peaked in 1968 overlaps heavily with the one that peaked in 1972, so they could be considered the same bear market, which would improve our chances further.

Let's then plot the bear markets in red on top of the index to get a sense of their lengths, from peak to full recovery. The average length of a bear market from peak until recovery has been 3.95 years, and the average fall from peak to trough has taken 1.45 years.
The longest bear market, during the Great Depression of the 1930s, lasted 15.33 years, and the longest fall from peak to trough took 2.75 years.

Lastly, let's look at just the drawdowns. The bear market threshold is again indicated with a black horizontal line. The monthly data runs only until the end of 2019, so the drawdown of early 2020 is missing from the graph. At the time of writing, the index is down 27 percent, with only seven of the historical drawdowns having been as severe. The average drop in a bear market using monthly data has been 33.9 percent, with a maximum of 81.8 percent during the 1930s. Notice again that these are total returns. Drawdowns have been worse during periods of high valuations, as measured by the Shiller CAPE or P/B. The maximum drawdowns also seem to have increased over time, which may be explained by lower valuations at the beginning of the sample and possibly also by people being more connected than ever, which makes panic spread more easily.

To conclude, this bear market has been rough and short so far. Judging by history, however, most bear markets recover fully within ten years. Valuations that are still elevated compared to history may nevertheless keep the index from recovering as much as it did after past bear markets. Be sure to follow me on Twitter for updates about new blog posts like this! The R code used in the analysis can be found here.
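For readers who want to sketch the same kind of analysis themselves, here is a minimal example of flagging drawdowns and bear-market months in R. The input data frame `returns`, with a `date` column and a total-return `index` column, is an assumption standing in for the Shiller data; the post's actual code is linked above.

```r
library(dplyr)

# Hypothetical input: a data frame `returns` with a `date` column and an
# `index` column holding the monthly total-return index level.
dd <- returns %>%
  arrange(date) %>%
  mutate(peak     = cummax(index),          # running all-time high
         drawdown = index / peak - 1,       # drop from that high
         bear     = drawdown <= -0.20)      # bear-market threshold of -20%

dd %>%
  summarise(max_drawdown   = min(drawdown),
            months_in_bear = sum(bear))
```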
Free Coupon for our R Video Course: Introduction to Data Science

Posted: 16 Mar 2020 09:46 AM PDT [This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers.]

For all our remote learners, we are sharing a free coupon code for our R video course Introduction to Data Science. The code is ITDS2020, and it can be used at this URL: https://www.udemy.com/course/introduction-to-data-science/?couponCode=ITDS2020 . Please check it out and share it!
A Little Something From Practical Data Science with R Chapter 1

Posted: 16 Mar 2020 07:39 AM PDT [This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers.]

Here is a small quote from Practical Data Science with R, Chapter 1.
Interested? Please check it out.
Online R, Python & Git Training!

Posted: 16 Mar 2020 07:34 AM PDT [This article was first published on r – Jumping Rivers, and kindly contributed to R-bloggers.]

Hey there! Here at Jumping Rivers, we have the capability to teach you R, Python & Git virtually. For the last three years we have been running online training courses for small groups (and even 1 to 1).

How is it different to an in-person course?

It's the same, but also different! The course content is the same, but the structure is adapted to online training. For example, rather than a single long session, we break the day up over a couple of days and allow regular check-in points. For the courses, we use whereby.com. This provides screen sharing for both instructor and attendees, so none of the interactivity is lost.

What about IT restrictions?

Don't worry! If your current IT security/infrastructure is a problem, we have two solutions:
What is the classroom size?

We have a maximum online classroom size of 12, including the instructor. Attendees will get the opportunity for a follow-up "virtual coding clinic", split into smaller class sizes, to ask about anything related to the course or how they can apply it to their work. If you would like to enquire about virtual training, either email info@jumpingrivers.com or contact us via our website.

The post Online R, Python & Git Training! appeared first on Jumping Rivers.
All you need to know on Multiple Factor Analysis …

Posted: 15 Mar 2020 10:00 PM PDT [This article was first published on François Husson, and kindly contributed to R-bloggers.]

Multiple factor analysis deals with datasets in which the variables are organized into groups, typically because the data come from different sources. The method highlights the structure common to all the groups as well as the specificity of each group. It makes it possible to compare the results of several PCAs or MCAs in a single frame of reference. The groups of variables can be continuous, categorical, or contingency tables.

Implementation with R software

See this video and the audio transcription of this video.

Course videos

Theoretical and practical information on Multiple Factor Analysis is available in these 4 course videos:
Here are the slides and the audio transcription of the course.

Materials

Here is the material used in the videos:
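Beyond the videos and slides, a minimal code sketch may help give a flavor of the implementation. The following uses the FactoMineR package (which the author maintains); the synthetic data and its group structure are invented purely for illustration.

```r
library(FactoMineR)

# Synthetic example: 30 observations with variables organized in three groups,
# in this column order: 4 "sensory" variables, 3 "chemical" variables and
# 2 categorical descriptors. The data and group sizes are made up.
set.seed(42)
dat <- data.frame(
  matrix(rnorm(30 * 7), nrow = 30,
         dimnames = list(NULL, c(paste0("sens", 1:4), paste0("chem", 1:3)))),
  region  = factor(sample(c("north", "south"), 30, replace = TRUE)),
  variety = factor(sample(c("A", "B", "C"), 30, replace = TRUE))
)

res <- MFA(dat,
           group      = c(4, 3, 2),               # number of variables per group
           type       = c("s", "s", "n"),         # "s" = scaled continuous, "n" = categorical
           name.group = c("sensory", "chemical", "descriptors"),
           graph      = FALSE)

summary(res)                  # eigenvalues, group contributions, ...
plot(res, choix = "group")    # how similar the groups' structures are
```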
U.S. Census Counts Data

Posted: 15 Mar 2020 07:12 PM PDT [This article was first published on R on kieranhealy.org, and kindly contributed to R-bloggers.]

As promised previously, I packaged up the U.S. Census data that I pulled together to make the population density and pyramid animations. The package is called uscenpops and it's available to install via GitHub or with

Instead of an animation, let's make the less flashy but, frankly, in all likelihood more useful small-multiple plot seen here. With the package installed, we can produce it as follows:
That's what the dataset looks like. We'll lengthen it, calculate a relative frequency (that we won't use in this particular plot) and add a base value that we'll use for the ribbon boundaries below.
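For readers following along without the post's code chunks, the lengthening step might look roughly like this; the object name `uscenpops` and the `male`/`female` column names are assumptions about the package's data, not the post's actual code.

```r
library(dplyr)
library(tidyr)

# Assumed shape of the data: one row per year and age, with separate
# `male` and `female` count columns.
pop_long <- uscenpops %>%
  pivot_longer(cols = c(male, female), names_to = "sex", values_to = "pop") %>%
  group_by(year) %>%
  mutate(freq = pop / sum(pop),   # relative frequency (unused in this plot)
         base = 0) %>%            # lower bound for the ribbon boundaries
  ungroup()
```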
Next we set up some little vectors of labels and colors, and then make a mini-dataframe of what we'll use as labels in the plot area, rather than using the default strip labels in
As before, the trick to making the pyramid is to set all the values for one category (here, males) to negative numbers.
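To make that trick concrete, here is a toy sketch (not the post's actual uscenpops-based code): negate one sex's counts and hide the sign on the axis labels.

```r
library(dplyr)
library(ggplot2)

# Toy data standing in for the census counts: one row per age and sex
toy <- expand.grid(age = 0:100, sex = c("Male", "Female")) %>%
  mutate(pop = round(2e6 * exp(-age / 40)))

toy %>%
  mutate(pop = ifelse(sex == "Male", -pop, pop)) %>%   # the negative-value trick
  ggplot(aes(x = age, y = pop, fill = sex)) +
  geom_col(width = 1) +
  scale_y_continuous(labels = function(x) scales::comma(abs(x))) +  # hide the sign
  coord_flip() +
  labs(x = "Age", y = "Population", fill = NULL)
```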
The calls to
Outlier Days with R and Python

Posted: 15 Mar 2020 05:00 PM PDT [This article was first published on R Views, and kindly contributed to R-bloggers.]
Welcome to another installment of Reproducible Finance. Today's post will be topical as we look at the historical behavior of the stock market after days of extreme returns, and it will also explore one of my favorite coding themes of 2020 – the power of RMarkdown as an R/Python collaboration tool.

This post originated when Rishi Singh, the founder of tiingo and one of the nicest people I have encountered in this crazy world, sent over a note about recent market volatility along with some Python code for analyzing that volatility. We thought it would be a nice project to post that Python code along with the equivalent R code for reproducing the same results. For me, it's a great opportunity to use

Before we get started, if you're unfamiliar with using R and Python chunks throughout an

Let's get to it. Since we'll be working with R and Python, we start with our usual R setup code chunk to load R packages, but we'll also load the Note that I set my tiingo token twice: first using

Next we will use a Python chunk to load the necessary Python libraries. If you haven't installed these yet, you can open the RStudio terminal and run

Let's get to the substance. The goal today is to look back at the last 43 years of S&P 500 price history and analyze how the market has performed following a day that sees an extreme return. We will also take care with how we define an extreme return, using rolling volatility to normalize percentage moves. We will use the mutual fund

Let's start by passing a URL string from tiingo to the We just created a Python object called

Heading back to R for viewing, we see that the date column is no longer a column – it is the index of the data frame and in

We now have our prices, indexed by date. Let's convert adjusted closing prices to log returns and save the results in a new column called Next, we want to calculate the 3-month rolling standard deviation of these daily log returns, and then divide daily returns by the previous rolling 3-month volatility in order to prevent look-ahead error. We can think of this as normalizing today's return by the previous 3 months' rolling volatility and will label it as

Finally, we eventually want to calculate how the market has performed on the day following a large negative move. To prepare for that, let's create a column of next-day returns using Now, we can filter by the size of the

Finally, let's loop through and see how the mean next-day return changes as we filter on different extreme negative returns, or what we might call drop tolerances. We will label the drop tolerance as

It appears that as the size of the drop gets larger and more negative, the mean bounce back tends to get larger. Let's reproduce these results in R.

First, we import prices using the We can use Now let's We used a

First, we will define a sequence of drop tolerances using the Next, we will create a function called Notice how that function takes two arguments: a drop tolerance and a data frame of returns. Next, we pass our sequence of drop tolerances, stored in a variable called

Have a quick glance up at the results of our Python Alright, let's have some fun and get to visualizing these results with

Here's what happens when we expand the upper bound to a drop tolerance of -2% and make our intervals smaller, moving from 0.25% increments to 0.125% increments. Check out what happens when we expand the lower bound, to a -6% drop tolerance. I did not expect that gap upward when the daily drop passes 5.25%.
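Since the post's own code chunks are not reproduced in this digest, here is a rough, self-contained sketch of the R side of the workflow just described. The input data frame `prices` (daily dates and adjusted closes), the use of the slider package for the rolling window, and applying the drop tolerance to the raw daily return are all assumptions made for illustration; the post's exact definitions may differ.

```r
library(dplyr)
library(slider)
library(purrr)

# Hypothetical input: `prices`, a data frame with columns `date` and
# `adj_close` (daily adjusted closes for an S&P 500 index fund).
returns <- prices %>%
  arrange(date) %>%
  mutate(ret      = log(adj_close / lag(adj_close)),
         # trailing ~3-month (63 trading-day) volatility, lagged to avoid look-ahead
         vol_3m   = lag(slide_dbl(ret, ~ sd(.x, na.rm = TRUE),
                                  .before = 62, .complete = TRUE)),
         scaled   = ret / vol_3m,      # today's return in units of recent volatility
         next_ret = lead(ret))         # the following day's return

# Mean next-day return conditional on a drop at least as bad as `tol`
mean_after_drop <- function(tol, df) {
  df %>%
    filter(ret <= tol) %>%
    summarise(drop_tolerance = tol,
              n_days         = n(),
              mean_next_day  = mean(next_ret, na.rm = TRUE))
}

# Loop over a grid of drop tolerances, e.g. -2% to -6% in 0.25% steps
map_dfr(seq(-0.02, -0.06, by = -0.0025), mean_after_drop, df = returns)
```

The `mean_after_drop()` helper mirrors the two-argument function described in the post, but its name and the raw-return threshold are invented here.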
A quick addendum that I would not have included if I had gotten my act together and finished this four days ago: I'm curious how this last week has compared with other weeks in terms of volatility. I have in mind to visualize weekly return dispersion, and that seemed a mighty tall task, until the brand new

To break up our returns by week, we call We first filter our data with Now we run our From here, we can We can also plot the standard deviation of returns for each week.

That's all for today! Thanks for reading and stay safe out there.
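As a postscript for readers who want to try the weekly-dispersion idea themselves, here is one possible sketch. It reuses the hypothetical `returns` frame from the earlier sketch, and both the start date and the use of lubridate (rather than the brand-new package the post alludes to) are assumptions.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# Group daily returns into calendar weeks and measure weekly dispersion
weekly_disp <- returns %>%
  filter(date >= as.Date("2019-01-01")) %>%            # assumed cutoff
  mutate(week = floor_date(date, unit = "week")) %>%
  group_by(week) %>%
  summarise(sd_ret = sd(ret, na.rm = TRUE), .groups = "drop")

ggplot(weekly_disp, aes(x = week, y = sd_ret)) +
  geom_col() +
  labs(x = NULL, y = "Std. dev. of daily returns",
       title = "Weekly return dispersion (sketch)")
```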
Flatten the COVID-19 curve

Posted: 15 Mar 2020 04:00 PM PDT [This article was first published on Theory meets practice..., and kindly contributed to R-bloggers.]

Abstract

We discuss why the message of flattening the COVID-19 curve is right, but why some of the visualizations used to show the effect are wrong: reducing the basic reproduction number does not just stretch the outbreak, it also reduces the final size of the outbreak.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. The markdown+Rknitr source code of this blog is available under a GNU General Public License (GPL v3) license from github.

Motivation

Current discussions about interventions for the ongoing COVID-19 outbreak talk a lot about flattening the epidemic curve, i.e. slowing down the outbreak dynamics. Because of limited health-care capacity, stretching the outbreak over a longer time period ensures that a larger proportion of those in need of hospital treatment will actually get it. Other advantages of this approach are to win time to find better forms of treatment and, possibly, to eventually develop a vaccine. Visualizations of the flatten-the-curve effect often look like this one, taken from Twitter:
The origin of these illustrations is discussed here. As much as I support the message and the reasons for flattening the curve, some of the visualizations have shortcomings from an infectious disease modelling point of view: they transport the message that the number of individuals which – as a result of the outbreak – will need hospitalization is fixed. Hence, the figure suggests that it's impossible to avoid a certain number of infected (say 40–70% of the population), but that we save lives by stretching out hospital cases over time. Although the conclusion is correct, the premise is IMHO wrong: reducing the basic reproduction number by drastically reducing contacts or by quickly isolating infectious cases also reduces the size of the outbreak. Others, like Ben Bolker, have also pointed out this flaw. We shall use a simple and common mathematical model from infectious disease modelling to illustrate this point. This model is easily implemented in R – showing how is a secondary objective of this post. The R code of this post is available from github.

A word of caution at this point: the numbers and illustrations used in this post address the point of visualization and are not an attempt to generate actual policy advice.

Susceptible-Infectious-Recovered modelling

A well-known mathematical model to describe the dynamics of an infectious disease in a population is the so-called susceptible-infectious-recovered (SIR) compartment model (Kermack and McKendrick 1927). This model assumes that each individual in the population belongs to one of three states:

- Susceptible: the individual has not yet had the disease and can become infected,
- Infectious: the individual is currently infected and can transmit the disease to susceptibles,
- Recovered: the individual has recovered (or died) and no longer takes part in transmission.
It is assumed that at time zero everyone is susceptible except for \(m\) individuals, which are infectious at time zero. Once infected, an individual becomes infectious and then recovers. Mathematically, we shall denote by \(S(t)\), \(I(t)\) and \(R(t)\) the number of susceptible, infectious and recovered individuals in the population at time \(t\). Furthermore, it is assumed that the population consists of a constant number of \(N\) individuals and that at all times \(S(t)+I(t)+R(t)=N\). In other words, the population is closed and does not vary over time.

The dynamics in the number of susceptibles, infectious and recovered are now described using the following deterministic ordinary differential equation (ODE) system:

\[
\begin{aligned}
\frac{dS(t)}{dt} &= -\beta S(t) I(t), \\
\frac{dI(t)}{dt} &= \beta S(t) I(t) - \gamma I(t), \\
\frac{dR(t)}{dt} &= \gamma I(t).
\end{aligned}
\]

What does this mean? It describes the movement of individuals between the three categories, in particular the movements \(S\rightarrow I\) and \(I \rightarrow R\). The most important term in the equation system is \(\beta S(t) I(t)\), and it can be motivated as follows: consider one specific infectious individual at time \(t\); this individual meets any specific other person in the population at a rate of \(\beta\) contacts per time unit (say per day). It is assumed that this rate is the same no matter which other person we talk about (aka homogeneous mixing in the population). Of course this is a very strong simplification, because it ignores, e.g., the distance between two individuals and the fact that you tend to mix more with your peers. But in a large population, the average is a good description. Hence, the rate of contacts with susceptible individuals per time unit is \(\beta S(t)\). Summing these contacts over all infectious individuals at time \(t\) leads to \(\sum_{j=1}^{I(t)}\beta S(t) = \beta I(t) S(t)\). Note that this is a non-linear term consisting of both \(I(t)\) and \(S(t)\).

In the above process description, for ease of exposition, it is assumed that once an infectious individual meets a susceptible person, the disease is always transmitted from the infected to the susceptible person. Hence, the transmission probability does not depend on, e.g., how long the infectious individual has already been infectious. An equivalent way of formulating this statement is to say that each individual has contacts at rate \(\alpha\) with any specific other person, and a proportion \(p\) of these contacts results in an infection. Then \(\beta = \alpha p\) is the rate at which infectious contacts occur.

The second component of the model is the term \(\gamma I(t)\). Again considering one infectious individual, it is assumed that the \(I\rightarrow R\) transition occurs at a constant rate \(\gamma\). This means that individuals are on average \(1/\gamma\) days infectious before they recover from the disease. In other words: the smaller \(\gamma\) is, the longer people are infectious and, hence, the longer they can transmit the disease to others. Note: recovering from an epidemic modelling point of view does not distinguish between individuals who recover by becoming healthy and those who die – what is important is that they no longer contribute to the spread of the disease.

One important quantity which can be derived from the above ODE system is the so-called basic reproduction number, aka \(R_0\), defined (Diekmann, Heesterbeek, and Britton 2013) as the expected number of secondary cases per primary case in a completely susceptible population. It is computed as \(R_0 = N \frac{\beta}{\gamma}\).
This means that if we consider the dynamics of the disease in generation time, i.e. on a time scale where one time unit is the period between the infection of a primary case and the infection of a secondary case, then \(R_0\) denotes the growth factor in the size of the infected population at the beginning of the outbreak. What is special about the beginning of the outbreak? Well, more or less all contacts an infectious individual has will be with susceptible individuals. Once a large part of the population has already been infected, however, \(R_0\) no longer necessarily describes the expected number of cases per primary case. For the COVID-19 outbreak, since little immunity against the disease is assumed to exist, all individuals will be susceptible and, hence, almost all contacts an infectious individual has will be with susceptible persons. At a later stage in the epidemic, due to the depletion of susceptibles, the number of secondary cases per primary case will be lower than \(R_0\), since the population is no longer completely susceptible.

Assuming \(R(0)=0\) and letting \(I(0)=m\), we obtain \(S(0) = N-m\). We can use this initial configuration together with a numerical solver for ODEs as implemented, e.g., in the R package deSolve (Soetaert, Petzoldt, and Setzer 2010). Assuming a hypothetical population of \(N = 1{,}000{,}000\) and a contact rate of \(\beta = 0.00000045\) means that the rate of contact with any given individual is 0.00000045 contacts per day. The choice of \(\gamma = 0.2\) corresponds to an average length of the infectious period of 5 days. Altogether, this leads to an \(R_0\) of 2.25, which roughly corresponds to the \(R_0\) of SARS-CoV-2. We can now solve the ODE system using the above parameters and an initial number of infectious individuals of, say, 10.

Here we have introduced \(s(t) = S(t)/N\) and \(i(t) = I(t)/N\) as, respectively, the proportion of susceptible and infectious individuals in the population. Note that \(I(t)\) is the number of currently infectious persons. Since a person is usually infectious for more than one day, this curve is not equivalent to the number of new infections per day. If interest is in this value, which is typically what is reported by health authorities, it can be computed as

\[
\int_{t-1}^{t} \beta S(u) I(u) \, du .
\]

The epidemic curve of new infections per day is shown below.

Another important quantity of the model is an estimate of how many individuals are ultimately infected by the disease, i.e. \(1-s(\infty)\) in a population where initially everyone is susceptible. This can either be calculated numerically from the above numerical solution, or by numerically solving the following recursive equation (Diekmann, Heesterbeek, and Britton 2013, 15):

\[
s(\infty) = \exp\{-R_0 \, (1 - s(\infty))\}.
\]

We can use the above equation to verify that the larger \(R_0\) is, the larger is the final size of the outbreak. However, despite a value of \(R_0>1\), not the entire population will be infected, because of the depletion of susceptibles. Hence, the exponential-growth interpretation of \(R_0\) is only valid at the beginning of an outbreak.

Reducing \(R_0\)

As we can see from the equation defining \(R_0\) in our simple SIR model, there are two ways to reduce the \(R_0\) of a disease:

- reduce the rate of infectious contacts \(\beta\), e.g. by reducing the number of contacts (social distancing) or the probability that a contact leads to transmission (hygiene), or
- increase \(\gamma\), i.e. shorten the period during which individuals are infectious, e.g. by isolating cases faster.
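Before turning to the interventions: since showing how easily this model can be implemented in R is a stated secondary objective of the post, here is a minimal sketch of solving the baseline SIR system with the deSolve package, using the hypothetical parameters above. It is an illustration only, not the author's original code (which is available from github).

```r
library(deSolve)

# SIR dynamics: dS/dt = -beta*S*I, dI/dt = beta*S*I - gamma*I, dR/dt = gamma*I
sir <- function(t, y, parms) {
  with(as.list(c(y, parms)), {
    dS <- -beta * S * I
    dI <-  beta * S * I - gamma * I
    dR <-  gamma * I
    list(c(dS, dI, dR))
  })
}

N     <- 1e6
parms <- c(beta = 0.00000045, gamma = 0.2)   # R0 = N * beta / gamma = 2.25
y0    <- c(S = N - 10, I = 10, R = 0)        # ten initial infectious individuals
times <- seq(0, 365, by = 1)                 # one year, daily resolution

out <- as.data.frame(ode(y = y0, times = times, func = sir, parms = parms))

out$s <- out$S / N               # proportion susceptible, s(t)
out$i <- out$I / N               # proportion infectious, i(t)
1 - tail(out$s, 1)               # final size of the outbreak, 1 - s(infinity)
```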
For simplicity we shall only be interested in the first case – reducing \(\beta\) – and let's pursue the following simple strategy, where the intervention only depends on time: the contact rate is kept at its original value \(\beta_0\) until time \(t_1\), reduced to \(r_1 \beta_0\) during an intervention period \(t_1 < t \leq t_2\), and then partially lifted to \(r_2 \beta_0\) for \(t > t_2\):

\[
\beta(t) =
\begin{cases}
\beta_0 & \text{for } t \leq t_1, \\
r_1 \beta_0 & \text{for } t_1 < t \leq t_2, \\
r_2 \beta_0 & \text{for } t > t_2,
\end{cases}
\qquad 0 \leq r_1 \leq r_2 \leq 1.
\]

The final size in the two cases: here we used the rather conservative estimate that \(R_0\) can be reduced to 60% of its original value, i.e. to 1.35, for a few weeks; after that period the reduction is to 80% of the original \(R_0\), i.e. to 1.8. Things of course become more optimistic the larger the reduction is. One trap, though, is to reduce \(R_0\) drastically and then lift the measures too much – in this case the outbreak is delayed, but then has almost the same peak and size, only later. The simple analysis in this post shows that the final-size proportion with interventions is several percentage points smaller than without interventions. The larger the interventions, if done right and timed right, the smaller the final size. In other words: the spread of an infectious disease in a population is a dynamic phenomenon. Time matters. The timing of interventions matters. If done correctly, they stretch the outbreak and reduce the final size!

Discussion

The epidemic-model-based approach to flattening the curve shows that the effect of reducing the basic reproduction number is not just to stretch out the outbreak, but also to limit its size. This aspect seems to be ignored in some visualizations of the effect.

The simple SIR model used in this post suffers from a number of limitations: it is a deterministic construction averaging over many stochastic phenomena. However, at the large scale of the outbreak we are now talking about, this simplification appears acceptable. Furthermore, it assumes homogeneous mixing between individuals, which is way too simple. Dividing the population into age groups as well as geographic locations, and modelling the interaction between these groups, would be a more realistic reflection of how the population is shaped. Again, for the purpose of visualizing the flatten-the-curve effect, I think a simple model is OK. More involved modelling covering the establishment of the disease as endemic in the population is beyond this post, as is the effectiveness of case tracing. For more background on the modelling see, for example, the YouTube video about The Mathematics of the Coronavirus Outbreak by my colleague Tom Britton, or the work by Fraser et al. (2004).

It is worth pointing out that mathematical models are only tools to gain insight. They are based on assumptions which are likely to be wrong. The question is whether a violation is crucial or whether the component is still adequately captured by the model. A famous quote says: all models are wrong, but some are useful… Useful in this case is the message: flatten the curve by reducing contacts and by efficient contact tracing.

Literature

Diekmann, Odo, Hans Heesterbeek, and Tom Britton. 2013. Mathematical Tools for Understanding Infectious Disease Dynamics. Princeton University Press.

Fraser, Christophe, Steven Riley, Roy M. Anderson, and Neil M. Ferguson. 2004. "Factors That Make an Infectious Disease Outbreak Controllable." Proceedings of the National Academy of Sciences of the United States of America 101 (16): 6146–51.

Kermack, W. O., and A. G. McKendrick. 1927. "A Contribution to the Mathematical Theory of Epidemics." Proceedings of the Royal Society, Series A 115: 700–721.

Soetaert, Karline, Thomas Petzoldt, and R. Woodrow Setzer. 2010. "Solving Differential Equations in R: Package deSolve." Journal of Statistical Software 33 (9): 1–25.
https://doi.org/10.18637/jss.v033.i09.
Vectorising like a (semi)pro

Posted: 14 Mar 2020 05:00 PM PDT [This article was first published on Data Imaginist, and kindly contributed to R-bloggers.]
R is slow! That is what they keep telling us (they being someone who "knows"

R is a weird thing. Especially for people who have been trained in a classical

This post will take you through the design of a vectorised function. The genesis

The problem

I have a height-map, that is, a matrix of numeric values. You know what? Let's

This is just some simplex noise of course, but it fits our purpose… Anyway, we have a height-map and we want to find the local extrema, that is, the

Vectorised, smecktorised

Now, had you been a trained C programmer you would probably have solved this

We already knew this. We want something vectorised, right? But what is

Shit… To figure this out, we need to be a bit more clear about what we mean with a
We want to talk about 3. Simply implementing this in compiled code would be

Thinking with vectors

R comes with a lot of batteries included. Some of the more high-level function

Going back to our initial problem of finding extrema: what we effectively are

That's a lot of talk; here is the final function: (don't worry, we'll go through it in a bit)

This function takes a matrix and a neighborhood radius and returns a new matrix

Let's go through it:

Here we are simply doing some quick calculations upfront for reuse later. The

Most of the magic happens here, but it is not that apparent. What we do is that

This is where all the magic appears to happen. For each cell in the window, we

If you want to leave now because I'm moving the goal-posts, be my guest. Anyway, what is happening inside the

This is really just wrapping up, even though the actual computations are

Does it work?

I guess that is the million-dollar question, closely followed by "is it

As for the first question, let's have a look.

Lo and behold, it appears as if we succeeded.

Can vectorisation save the world?

No… More to the point, not every problem has a nice vectorised solution. Further,

If you are still not convinced then read through
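For readers who would like a concrete, if simplified, illustration of the windowed-comparison idea described above, here is a small sketch (not the post's actual function) that flags local maxima of a height-map matrix by comparing each cell against its shifted neighbours. The loop runs only over the handful of window offsets, while the comparisons themselves are fully vectorised over the whole matrix.

```r
# Sketch: flag cells of a numeric matrix that are local maxima within a
# (2r+1) x (2r+1) neighbourhood, using vectorised shifts instead of per-cell loops.
local_maxima <- function(m, r = 1) {
  pad <- matrix(-Inf, nrow(m) + 2 * r, ncol(m) + 2 * r)   # pad edges with -Inf
  pad[r + seq_len(nrow(m)), r + seq_len(ncol(m))] <- m
  is_max <- matrix(TRUE, nrow(m), ncol(m))
  for (dx in -r:r) {
    for (dy in -r:r) {
      if (dx == 0 && dy == 0) next
      neighbour <- pad[r + dx + seq_len(nrow(m)), r + dy + seq_len(ncol(m))]
      is_max <- is_max & (m >= neighbour)   # vectorised over every cell at once
    }
  }
  is_max
}

# Example on a small random surface
set.seed(1)
hm <- matrix(rnorm(100), 10, 10)
which(local_maxima(hm, r = 1), arr.ind = TRUE)
```

Local minima can be found the same way by negating the matrix, and the radius `r` controls how local "local" is.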