Using Rmarkdown without re-running long commands
Emily Dolson, October 3, 2015

One potential obstacle to using Rmarkdown with computationally intensive projects is that waiting for the whole thing to run again every time you make a small change to your document is a pain. So here are a few workarounds.

Option 1: Put commands that only need to be run once in an eval=FALSE chunk

It’s usually important to include the commands that download and unpack your data with your script, so that it’s clear which data you were working with. However, you probably never need to run those commands more than once. In this case, the best option is probably to put them inside an R chunk with the `eval` option set to `FALSE`:

```{r, eval=FALSE}
# download huge dataset
```

Note that if someone else downloads and runs your file, they will need to set the `eval` option for that chunk to `TRUE` the first time they knit it, so that the data actually gets downloaded.

Option 2: Use caching

To deal with this very problem, chunks in R Markdown have a `cache` option. With `cache=TRUE`, knitr saves the results of the chunk the first time it runs and reuses them on later knits, re-running the chunk only when its code or chunk options change. By default, editing a comment counts as changing the code, so also set `cache.comments=FALSE` if you want to edit comments without invalidating the cache:

```{r, cache=TRUE, cache.comments=FALSE}
# code that takes forever to run
# I can add this comment later without rerunning the code that takes forever!
```

Option 3: Load previous results conditionally

Caching is all well and good, but what if you were playing around in the console (rather than in your Rmarkdown file) and created some object that took forever to create? Or what if the script that you’re documenting can’t even run on your computer (for instance, maybe you had to run it on the HPCC instead)? In this case, your best bet is to save the object from wherever you created it, and then have your Rmarkdown code look for the file you stored it in before trying to run the code again. This idea was originally suggested in this Stack Overflow post. Here’s an example of how it might work.

First, save the object to a file.
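A minimal sketch of that first step, assuming a hypothetical object called `slow_result` and a hypothetical file name (the computation is only a stand-in for your slow code). `saveRDS()` writes a single R object to disk:

```r
# Run this wherever the slow object was created (console, HPCC job, ...).
# `slow_result` and "slow_result.rds" are hypothetical names.
slow_result <- sum(sqrt(seq_len(1e6)))  # stand-in for the slow computation
saveRDS(slow_result, file = "slow_result.rds")
```

`readRDS()` reads the object back under whatever name you assign it, which makes this pair convenient here; `save()` and `load()` also work, but they restore objects under their original names.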
Then make sure this file is in the current working directory, and add something like this to the part of your Rmarkdown document where you need to use the object:
```{r}
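# A minimal sketch, assuming the hypothetical object `slow_result` was saved
# to the hypothetical file "slow_result.rds". If the file exists, load it;
# otherwise fall back to recomputing (and saving) the object.
if (file.exists("slow_result.rds")) {
  slow_result <- readRDS("slow_result.rds")
} else {
  # Stand-in for the slow code; replace it with your own (or with a stop()
  # message if the code can only run elsewhere, e.g. on the HPCC)
  slow_result <- sum(sqrt(seq_len(1e6)))
  saveRDS(slow_result, "slow_result.rds")
}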
```

By following these approaches, you can make sure that your code is reproducible without needing to wait for it to reproduce itself every time you knit the file!
Seeking help online
Allison Sussman, October 16, 2015

Description: Sometimes (more often than not), no matter what you do, you cannot figure out what is going wrong in your code. So the solution is to reach out to others for help, typically by email or in an online forum. But showing your code is only half the information you need to post. Using `sessionInfo()` will spit out the relevant information about your version of R, your operating system, and the packages you currently have loaded, along with their versions. This information goes a long way for the folks willing to help you.
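A minimal sketch of putting this into practice. `capture.output()` and `writeLines()` are base R functions; the output file name is just a placeholder:

```r
# Print R version, platform/OS, locale, and attached/loaded packages
sessionInfo()

# Capture the same report as text, e.g. to paste into an email or forum post
si_text <- capture.output(sessionInfo())
writeLines(si_text, "session_info.txt")  # hypothetical file name
```

You can then paste the contents of that file right below the code in your post.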
It’s pretty simple, and it will save you and the people helping you a bit of time and trouble.

Debugging R functions
Emily Dolson, 09/23/2015

For the most part, R packages have pretty great documentation on how to use them. Where the documentation is unclear, there are generally lots of people on the internet who have had the same problem and figured out the solution. But occasionally you come across a problem that no one else seems to have had. The internet being as vast as it is, 90% of the time this is an indication that you have made a typo or something. Sometimes, though, you’re actually the first person to have encountered this problem (or to have been sufficiently determined to solve it). So what are you supposed to do? Here are some steps that you can take to debug your code.

Step 1: Try to re-create the problem in the simplest way possible. You probably encountered it while you were doing some very specific thing to your data in the midst of all sorts of complicated transformations and plotting. That means that it could be the result of your data, your transformations, your plotting, an underlying problem with the package you’re using, or any combination thereof. So make a completely new script where you do the thing that isn’t working in as much isolation as possible. Instead of using your actual data, it’s often a good idea to use placeholder data. A data frame composed entirely of 1s is usually sufficient, unless you’re using a function that depends on your data actually having variation. In that case, you can just choose a series of simple placeholder numbers, or get fancy and fill your data.frame with randomly generated data from one of R’s random-number functions.

Okay, so now you’ve created a simpler context to test your problem. Great. One of two things should have happened:

- You are no longer getting the error. Yay! You have something to go off of!
  Start adding the actual complexity of your problem back in gradually and see at what point it breaks.
- You are still having the same problem. This is a sign of a more serious problem. Definitely google any error messages you’re getting, or else a general description of the problem. If you’re not finding anything, then search Stack Overflow specifically with the same criteria. Often, the best way to search Stack Overflow is to start asking a question and look at the list of potentially related questions it suggests. This also puts you in a good position to ask the question if none of the related questions answer it!

Still don’t have a solution? Congratulations! You have probably found an obscure bug. Sometimes the only option is to dive into the code. This is absolutely a measure of last resort, but here are some thoughts on how to do it as painlessly as possible.

Step 2: Get the code. Assuming the thing that’s giving you problems is a function (something that takes input through parentheses and returns output), this is actually pretty straightforward. If you type the name of the function into the console without the parentheses, it will print out all of the code for that function. As an example, we can try this on a common function.
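For instance, here is what that looks like for `sd()` from the stats package (an illustrative choice, not necessarily the function the post originally used). Many functions are S3 generics whose printed code is just a `UseMethod()` call; `methods()` and `getAnywhere()` help you find the code that actually runs. `deparse()` turns a function’s code into a character vector, which also makes it easy to search:

```r
# Typing a function's name without parentheses prints its code
sd

# Generic functions only show a dispatch call...
mean
# ...so list the available methods and look one up
# (getAnywhere() also finds functions a package doesn't export)
methods(mean)
getAnywhere("mean.default")

# deparse() gives the code as text, one line per element, so you can
# search it, e.g. for every line that mentions the na.rm argument
sd_src <- deparse(sd)
grep("na.rm", sd_src, value = TRUE, fixed = TRUE)
```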
This tells us that, under the hood,
So, those were pretty short examples. If you’re lucky enough that the function you’re having a problem with is this simple, you can probably just run it line by line in the console to see what it’s doing and why that’s different from what you expect it to be doing. Would that it were always so simple.

Step 3: Search for relevant words. Odds are, the function you’re dealing with is long and complicated and interacts with lots of other things that you have no desire to take the time to understand. Your console should have some sort of search function (often Ctrl-F) that will let you enter a string of text to search for. If your problem has to do with a specific argument to the function, try searching for that argument’s name. For example, say I’ve got this function that says I can pass it a string of text and it will print that text below the shape it draws on my plot. But when I try to do that (for instance, by typing
(the rest of the function omitted for brevity)

So now I know that little tiny bit of the function is all I really need to worry about to figure out what’s going on. But how?

Step 4: Add print statements. R has a lovely function called `print()`, which displays whatever you pass to it in the console.
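One detail worth knowing before you start: inside a function, R does not automatically print the value of each expression the way the console does, so you need explicit `print()` calls to see intermediate values. A tiny illustration with a made-up function:

```r
# Hypothetical function, for illustration only
double_it <- function(x) {
  x * 2          # evaluated, but nothing is shown: no auto-printing in here
  print(x * 2)   # explicitly printed to the console
  invisible(NULL)
}

double_it(3)     # prints [1] 6
```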
In order to edit the function and add print statements, the best thing to do is open a new script file and copy and paste the code for the function into it from your console. You can give it a new name to more easily run it:
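A minimal sketch of that copy-and-rename step, using `sd()` from the stats package as a stand-in for the function you’re debugging (`sd_debug` is a made-up name). One catch: a copy defined in your script lives in the global environment, so if the original calls unexported helpers from its package, those calls will fail; setting the copy’s environment to the package namespace with `asNamespace()` avoids that.

```r
# Print the original's code so it can be pasted into a new script file
cat(deparse(stats::sd), sep = "\n")

# In that script, define the pasted copy under a new name
sd_debug <- function(x, na.rm = FALSE) {
  sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x), na.rm = na.rm))
}

# Let the copy see the package's internal (unexported) functions
environment(sd_debug) <- asNamespace("stats")

sd_debug(c(1, 2, 3, 4))  # same result as sd(c(1, 2, 3, 4))
```

Because the copy behaves like the original, you can now edit it freely and compare its output against the untouched version.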
In the example above, a first series of print statements might look like this:
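A sketch of what such print statements can look like, using a made-up stand-in: a hypothetical `draw_shape_debug()` whose `my_text` argument is only used when another equally hypothetical argument, `label_shape`, is `TRUE`. The `print()` calls go at the start of the function, inside the branch, and after it, so the printed trail shows which code actually ran:

```r
# Hypothetical debugging copy of a plotting helper; the function name and
# its arguments are invented for illustration. x and y stand in for the
# shape's coordinates and are unused in this sketch.
draw_shape_debug <- function(x, y, my_text = NULL, label_shape = FALSE) {
  print("start of draw_shape_debug")
  print(my_text)                        # is the argument what I expect?
  if (label_shape) {
    print("inside the label_shape branch")
    label <- my_text
  } else {
    label <- NULL                       # my_text is silently ignored here
  }
  print("after the labeling branch")
  invisible(label)
}

draw_shape_debug(1, 1, my_text = "hello")
```

Seeing the first two printed lines but never "inside the label_shape branch" would tell you the text is being dropped because `label_shape` is `FALSE`.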
Let’s say I run
That tells me that the code got to the first print statement, printed the value of the argument (which was exactly what it should have been), and then didn’t get to either of the other two print statements. Well, that explains why setting In this example, we find out that the code that uses
That means that my_text is only used if

That’s one specific example of debugging; what you need to do will vary wildly depending on the problem you’re encountering. However, I think this overall outline of how to approach debugging hard problems generalizes pretty well from bug to bug, so I hope it can be helpful to others!
Spatial Ecology @ MSU: R code compiled by the Zarnetske Spatial & Community Ecology Lab and students in MSU's Spatial Ecology graduate course (FOR870/FW870).