The Prairie in Bloom

With the school year over and nice weather upon us, I’ve had some free time to get out and explore the landscape beyond Williston. I’ve been out on the prairie twice to discover the array of early wildflowers in the grasslands. This is the time of year when I feel like I could actually enjoy this place. Unfortunately, it doesn’t last very long.

It’s said that there are only two seasons on the northern plains – a very long winter and a short summer. Spring and Fall exist, but only for a week. All year round, the wind can blow strong, though there are calm days too. Winter here is tough. We get enough snow to block access to many recreation trails, but not enough to break out skis and snowshoes. To be fair, it’s a bit snowier east of here. Summers are very hot and dry. With no forests to take shade in, hiking in the summer heat can be miserable. And then there are the transition seasons. Fall might be better for hiking. It’s still dry, and the temperatures are more tolerable, but it can be quite windy. Strong winds are just as bad as a strong sun for turning a nice adventure into unpleasantness. Spring is wet. Wet from the snowmelt, and wet from rain. Spring is typically mud season anywhere you go, but here in the Dakota Badlands, the mud is an especially sticky and slick clay that is not only a nuisance to hike through but can be downright dangerous on those badlands slopes. Spring also comes with ticks, though that’s a hazard I can deal with and is not unique to this area.

Continue reading “The Prairie in Bloom”

A Spiritual Moment

My trip to the Redwoods had a profound effect on me. I can’t really explain what it was or why. I’ve been to many amazing and beautiful places but never come back as humbled and rejuvenated as I did from this last trip. Everything about it just put me at ease and in awe. It was a combination of spending time with the kids while experiencing a mature old-growth forest among some of the oldest and tallest trees on the planet. I came back to Williston ready to take on the world and maybe inspired to leave in search of better country.

And then, not even a week after my return:

Continue reading “A Spiritual Moment”

What’s your travel style?

When I think about the way I travel and the way others travel, there are two extremes at the ends of a spectrum: on one side, we fit in everything we can see during our limited time at a destination; on the other side, we stick to a small area and get to know it really well. Where do you typically fit on this spectrum?

When we book trips to major destinations, there is extensive planning involved as well as the expense of the trip. On top of that, there are seemingly infinite places in the world that we would like to visit someday. So, trips to major destinations might end up being or seeming like once-in-a-lifetime experiences. Thus, we have the temptation to take advantage of our limited time at that location and see all of the sights. The advantage is that we don’t miss out on anything. The disadvantage is that we only get to dabble our toes in the proverbial waters. It can sometimes feel superficial.

On the other end of the spectrum, we could stay in one place and explore it deeply. Yes, you don’t get to hit up all of the touristy highlights, but what you get instead is a deep connection to that one local place. Maybe it’s getting to know the food and drink, maybe it’s really diving into the history, or maybe it’s getting to know the plants and animals and geology. I’m not making any judgments regarding any location along this spectrum. Sometimes the purpose of our travel dictates where we put that trip. And sometimes outside circumstances decide that for us.

Continue reading “What’s your travel style?”

What have I been up to?

Sunset on the Prairie

It has been quite some time since I last made a post here, so I’d say I’m a bit overdue for an update. For the past two and a half years, I have been living in northwestern North Dakota. Saying this has been an adjustment is an understatement. North Dakota is considerably flatter than any place I’ve lived, and there are almost no trees here. Winters are extremely harsh and long. We can have weeks where the daytime high does not exceed 0 °F, and the wind will send a chill to your core. Our corner of the state doesn’t get much snow compared to northern Idaho or eastern North Dakota, but when we do get snow, the wind blows it into drifts which can shut down the highways. And winter can persist until May without any sign of spring. Summer has the opposite problem. The wind still blows, but without much shade, the days are hot and dry, except when gnarly thunderstorms roll through dropping large hail and threatening tornadoes. OK, that last part sounds scary, and it is when it happens, but we aren’t technically in Tornado Alley, so those events are relatively rare.

Continue reading “What have I been up to?”

Fourth of July

View on Grandmother Mountain

For many years, I have spent my Fourth of July basking in the part of America that I enjoy the most: its wild and natural beauty. It started in 2011 when I explored the Hobo Cedar Grove for the first time. Then again in 2013 when I hiked Grandmother Mountain. In 2015, I spent the Fourth in the Seven Devils with friends. This year, I returned to Grandmother Mountain for what may be my last visit to one of my favorite peaks in the vicinity of Moscow.

Continue reading “Fourth of July”

Why you’re working from home, Part 2: A Shiny Model

Immediately after I published my last post, I wasn’t content with the manner in which I conveyed the SIR model. Simply posting graphs from scenarios that I ran isn’t exciting. It’s passive, and it doesn’t actively demonstrate for the reader how social distancing does work to reduce infection rates. I wanted something interactive. Something that you, my readers, can play with. So I built the model in Shiny.

Shiny is an R package that makes it easy to build interactive web apps for data visualization. I had never used Shiny before, but after a few hours with the introductory tutorial, I had my own custom application built around the basic SIR model. And I’m ready to share it with you.
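
To give you a sense of what a Shiny app looks like, here is a minimal sketch of an interactive SIR app of this kind. This is not the actual sirapp.r linked below; the layout, input ranges, and variable names here are just illustrative.

# Minimal, illustrative SIR app in Shiny (not the actual sirapp.r)
library(shiny)

ui <- fluidPage(
  titlePanel("Basic SIR Model"),
  sidebarLayout(
    sidebarPanel(
      sliderInput("beta",  "Transmission rate (beta)", min = 0,    max = 2, value = 0.9, step = 0.05),
      sliderInput("gamma", "Removal rate (gamma)",     min = 0.01, max = 1, value = 0.3, step = 0.01),
      numericInput("N",  "Population size",    value = 1e6, min = 1000),
      numericInput("I0", "Initial infections", value = 10,  min = 1)
    ),
    mainPanel(plotOutput("sirPlot"))
  )
)

server <- function(input, output) {
  output$sirPlot <- renderPlot({
    days <- 365
    S <- numeric(days); I <- numeric(days); R <- numeric(days)
    S[1] <- input$N - input$I0; I[1] <- input$I0
    for (t in 1:(days - 1)) {
      newInf <- input$beta * I[t] * S[t] / input$N   # new infections this day
      newRem <- input$gamma * I[t]                   # recoveries/deaths this day
      S[t + 1] <- S[t] - newInf
      I[t + 1] <- I[t] + newInf - newRem
      R[t + 1] <- R[t] + newRem
    }
    matplot(1:days, cbind(S, I, R), type = "l", lty = 1, lwd = 2,
            col = c("blue", "red", "darkgreen"),
            xlab = "Day", ylab = "Individuals")
    legend("right", c("Susceptible", "Infected", "Removed"),
           col = c("blue", "red", "darkgreen"), lty = 1, lwd = 2)
  })
}

shinyApp(ui = ui, server = server)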

As a reminder, here is our basic model, graphically and mathematically with a description of the parameters:

 \frac{dS}{dt} = -\frac{\beta I S}{N} \newline \frac{dI}{dt} = \frac{\beta I S}{N} - \gamma I \newline \frac{dR}{dt} = \gamma I

The parameters are \beta, a composite that includes the probability any one person interacts with another and the probability that successful transfer occurs with an interaction; \gamma, a composite that includes the probability an infected individual recovers or dies (from the disease or naturally); S, I, and R, the number of susceptible, infected, and removed individuals; and N, the total population size which should be equal to S + I + R.

Here’s how you can play with my interactive model. If you are an R user, grab the code here: https://github.com/matthew-singer/ShinyToys. The file you want is called sirapp.r. You will need to install the shiny package, but it’s worth having.

If you are not an R user, you can play with the app which is hosted here: http://mineral2.shinyapps.io/SIRModel. If you are an R user, please download my script from the first link and run it locally because I only get 25 hours of active app time with my shinyapps.io account, and I’d like it to be available for educational purposes to non-science people.
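
If you have never run a Shiny app from a script before, something along these lines should work once the package is installed (assuming sirapp.r ends by producing a Shiny app object, as standalone app scripts typically do):

install.packages("shiny")    # one-time installation of the package
shiny::runApp("sirapp.r")    # launch the app from the downloaded script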

Remember that social distancing acts to reduce the value of \beta. As you play with different scenarios, note what happens to the Infectious curve (red) as \beta increases or decreases. Note the approximate time that infections peak, as well as the number of infections at the peak. How long does it take for the disease to disappear from the population? How does population size affect the response?

Remember that this isn’t an accurate model of COVID-19. It is a basic and generic model of infectious disease spread. However, it is still useful for understanding how our collective and individual behaviors can affect the way diseases spread through a population.

Why you’re working from home: An introduction to epidemiological modeling

The COVID-19 virus is sweeping the world, causing an equally contagious pandemic of fear and confusion. Depending on where you live, you may be ordered to stay home, going out only when necessary, or there may be no restrictions on your life, leaving it up to you to decide how to go about your day during this tumultuous time. Two ideas keep popping up on social media: social distancing and “flattening the curve.” These often come with memes and infographics explaining why staying home and staying away from other people can help control the spread of this epidemic. I thought I would take a different approach. This post discusses the origin of these ideas by exploring where the curve comes from and just how social distancing influences it. I am going to talk about epidemiological modeling, or how we use math to model and predict the spread and eradication of diseases in a population. Bear with me as there will be math, but I am going to try to make this easy to follow for the non-biologists reading this.

What we are about to create is a dynamic state-change model. All that means is that individuals exist in different states, which can change over time with certain probabilities or rates. Let us suppose we have a population of individuals and a new disease gets introduced. We have two states: Infected or Not Infected. Our model is going to explain how people who are not infected become infected and how infected people become not infected. We are not so much concerned with the actual mechanisms of infection or recovery as with the rates at which changes in state occur.

Depending on the disease, we can further break down the Not Infected state. People can be not infected because they have never come in contact with the disease, in which case we can call them Susceptible. But people can also be not infected because they had the disease and recovered. And if recovered individuals are immune to re-infection, they are not susceptible. So instead we will call them Removed because they can no longer get or transmit the disease. This is how many common diseases work, and to our knowledge, how COVID-19 works. Let us draw this out graphically.

Three states in our model: Susceptible (S), Infected (I) and Removed/Recovered (R) as well as the order of transition between each.

In this model, Susceptible individuals can become Infected, and Infected individuals can become Removed or recovered. From here onward, I am going to refer to this state as Removed because recovery isn’t the only way to get into this state. We will discuss that soon enough. It is worth noting that an individual can go straight from Susceptible to Removed as well, and that is one way epidemiologists use this model to estimate the minimum number of people needed to be vaccinated to prevent an epidemic spread of disease. However, we are going to keep our model simple for now and follow the flow through the states as described.

To form our model, we have to make some assumptions, each of which can be relaxed once the basic model is finished. We are going to assume that the population size doesn’t change, and we are going to assume that everyone is moving about randomly such that any individual has the same probability of coming in contact with any other individual. Thus, there is no geographical limitation. Finally, we’re going to assume that our population is isolated from all other populations.

So we have a population with N individuals. And if nobody is infected, then everybody is Susceptible. But as long as nobody is infected, there is nobody in the population to spread the disease. As soon as one or more individuals become infected, they can spread the disease to a susceptible individual, turning them into an infected individual. The rate at which this happens depends on the number of infected individuals (I) in the population and the probability that an encounter between infected and susceptible individuals results in the successful transfer of the disease. This is called transmissibility and will be represented as part of the \beta term. \beta also includes the rate of encounters among people in general. Susceptible people turn into Infected people at the rate of \beta I.

Infected people can also recover or die from the disease, effectively removing them from the model. This happens with probability \gamma, and the rate at which infected individuals become removed can be modeled as \gamma I. Here is our diagram again with the transition rates.

Our model with the transition rates.

So we are at time t, and we want to know the number of people in each state at the next time step, t+1. For most diseases, we model time steps as days.

S_{t+1} = S_{t} - \beta I_{t} \frac{S_{t}}{N}

Because infected individuals can only infect susceptible individuals, we expect the population of susceptible individuals to decrease each day. We are multiplying the transition rate by \dfrac{S}{N} because we need to account for the probability that an infected individual contacts a susceptible individual. We can re-write this equation to give us the change in state S as follows:

S_{t+1} - S_{t}  = -\frac{\beta I S}{N}

or

 \frac{dS}{dt} = -\frac{\beta I S}{N}

Now let’s look at how the number of infected and removed individuals will change. Infected individuals should grow as susceptible individuals become infected, but we’ll also have to subtract the number of infected individuals that become removed and add them to the removed state.

 \frac{dI}{dt} = \frac{\beta I S}{N} - \gamma I \newline \frac{dR}{dt} = \gamma I

Now we have three equations describing the change in population in each state. Before we look at what this means, I want to pay special attention to the second equation, the one that describes the change in infected individuals. This is a classic birth-death model in which the first term represents “births” as new people get infected and the second term represents “deaths” as people die or recover from the disease and get removed from the model. Some information describing the spread of COVID-19 makes reference to a term called R_0 (pronounced “R-naught”). This is the basic reproduction number: the average number of people to whom an infected individual transmits the virus. R_0 comes from this model:

 R_0 = \frac{\beta}{\gamma}

When R_0 > 1, the disease is spreading. When R_0 < 1, the disease is in decline.
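
For example, with the parameter values used in the scenario below (\beta = 0.9 and \gamma = 0.3), R_0 = 0.9 / 0.3 = 3: in a fully susceptible population, each infected person infects three others on average, so the outbreak grows.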

Let’s look at how this model can be applied to a population. Suppose we live in a city of 1 million people. Ten of them have confirmed cases of the disease. I have set \beta = 0.9 and \gamma = 0.3. The curves in the graph below represent the total number of individuals in each state on each day.

Note that the number of infected cases rises exponentially until it hits a critical point. This is where there are so many infected or recovered individuals that the disease has a hard time finding susceptible individuals to spread to. Eventually, the number of infected individuals declines and the disease is eradicated from the population. Not everyone was infected – a small minority of lucky individuals made it through the epidemic without catching the disease. It’s also worth noting that it took 26 days from only 10 infections to reach a peak infection of 338,660, or just over 1/3 of the total population infected at one time. Now imagine that 20% of infected individuals required hospitalization. That’s about 70,000 people. Cities of 1 million people do not have 70,000 hospital beds.

Now, these parameters that I have chosen are simply hypothetical and do not represent the actual parameters of COVID-19. I’m not even sure what those parameters are, but this model assumes that 30% of infected patients will recover in one day, when COVID-19 recovery times are more like 10-14 days from the time symptoms present. I could show that by reducing \gamma to 0.03 to suggest that 30% of infected patients will recover after 10 days. Doing so exaggerates the lag in the removed growth curve, but also highlights that the infection would spread to be even more prevalent.

Results of the model with beta=0.9 and gamma=0.03

In this scenario, we reach peak infection at 24 days with a total of 859,948 or 86% of the population infected. Nobody gets spared from the disease.

Social Distancing – Why you’re working from home.

When we say “flatten the curve,” we’re talking about the red curve in each of these graphs: the number of infected individuals over time. How does social distancing do this? Well, remember that our parameter \beta represents not just the transmissibility, but also the probability of encounters occurring? If we stay home, we reduce the probability of interacting with other people, and thus reduce the parameter \beta. Let’s go back to our first example, where \gamma = 0.3. We hit peak infection of 338,660 on day 26. Let’s suppose social distancing measures were put in place forcing us to stay home as much as possible, reducing \beta to 0.4.

Flattening the Curve

In this scenario, we have “flattened” that red curve. The peak infection takes place on day 102 with a peak infected population of 35,253. If 20% of infected individuals required hospitalization, we’d only need 7,000 hospital beds, which might be realistic for a city of 1 million. The other benefit here is that over half of the population never gets sick. Now again, this supposes that 30% of infected individuals on one day can recover by the next. But it does show how social distancing can work to reduce the strain on healthcare professionals as well as reduce the number of cases of infection in general.

The one cost to social distancing and “flattening the curve” is that it delays the peak infection rate of the disease and delays its eventual eradication. If we prematurely go back to business as usual, as Donald Trump has expressed a desire to do, cases of COVID-19 will rise faster than we’re seeing today, and our healthcare system will be overrun. Not only will COVID-19 patients not get the treatment they need to avoid a fatal outcome, but they’ll be using resources that non-COVID-19 patients need for their survival as well. Car crashes, heart attacks, strokes, cancer, other infections and diseases – these will continue to occur, and people won’t get the care they need.

Nobody really knows how long we’ll have to practice social distancing. In my toy model, the peak infection occurred 102 days after the initial 10-person infection. That’s over three months. And my model doesn’t consider the lag time for recovery, which will not only require more stringent social distancing action, but will push the time to peak infection back even more. It hasn’t been two months since the first case appeared in the United States, and some communities are just now seeing their first cases. In my toy model, the disease is effectively eradicated from the population by day 244. But at day 210, there are still 10 infected individuals. If social distancing restrictions were lifted at that point, the disease would resurge, slightly worse than the original surge with social distancing in place, but not nearly as bad as if no social distancing had been put in place at all.

The real-world data with respect to COVID-19 is messy. For one, we don’t really have good estimates of \beta under normal conditions or under social distancing. In the US, social distancing measures aren’t unified, and they have been put in place at different points of the epidemic spread for different communities. But the biggest factor making this difficult to predict is the actual number of cases out there. We can guess by accounting for the two-week lag from initial exposure to the time symptoms appear. But we have also been slow to roll out any kind of extensive testing. Most patients are told to go home and self-quarantine as though they had the disease, with only a small portion of cases getting tested and confirmed positive. Combine that with the number of people who aren’t taking the disease or social distancing measures seriously, and we can only estimate the worst and best case scenarios and update model parameters as more data comes in. But we cannot relax our social distancing measures until the bulk of the epidemic has passed, and that could be months, if not a year, from now.


The model I presented today is called the SIR model for the three states and transitions it describes. It’s a simple but useful model for understanding the spread of infectious disease. But variants exist to cover the gamut of disease behavior. If there is no lasting immunity after recovery, the model becomes an SIS model, in which Susceptible individuals become Infected and then Susceptible again. If a disease is incurable and carried for life, the R portion of the model is modified to only include natural death. We can even account for the spread of pathogens from the deceased to the living, as happened when Ebola broke out in Africa. For diseases like COVID-19 which have age-structured mortality and/or susceptibility, age structure in the population can be built into the model. And of course no population is an island. Our hypothetical city of 1 million has people coming and going from other cities, and we can link models in a metapopulation with migration rates between each subpopulation. We can even include geographic probabilities of encounters to show that a person on one side of a city is unlikely to come in contact with a person on the opposite end. While these complicate the math a bit, they can give us finer-scale predictions as to how and where a disease might spread.

These epidemiological models are also used to understand vaccinations. Vaccines are one way of transitioning from susceptible to removed without traveling through the infectious state. From these equations, we can predict the proportion of a population that needs to be vaccinated to prevent the spread of a disease and an eventual epidemic from forming. This way, the portion of the population that can’t get vaccinated may also stay safe. Remember when measles was on the rise again after a wave of anti-vaccination sentiment spread rampantly? It’s because vaccination rates fell below the numbers required to stave off the growth of measles.

You can learn more about epidemiological modelling here, but it gets a bit technical and math dense. The point of this post was to show the origins of the ideas behind “flattening the curve” and social distancing. It does work, but it requires us all to buy in and participate. And we’re in it for the long haul. If you still don’t understand, I’ll be happy to try and explain it over a chat.

If you want to play with your own SIR model, here’s some R code to get you started:

#===========================
# SIR Model of Infectious Disease
# S, I, R: initial numbers of susceptible, infected, and removed individuals
# beta:    transmission rate; gamma: removal (recovery/death) rate
# T:       number of days to simulate

SIR <- function(S, I, R, beta, gamma, T = 100){
  i = 1
  N = S + I + R   # total population size (constant)
  
  while(i < T){
    
    # daily change in each state
    dS = -1*beta*S[i]*I[i]/N
    dI = beta*S[i]*I[i]/N - gamma*I[i]
    dR = gamma*I[i]
    
    # append the next day's counts
    S[i+1] = S[i] + dS
    I[i+1] = I[i] + dI
    R[i+1] = R[i] + dR
    
    i = i+1
  }
  
  # one row per day
  return(data.frame(Time=1:T, S, I, R))
}
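
For example, here is one way to run the two scenarios from this post (a city of 1 million with 10 initial infections, first with \beta = 0.9 and then with the “flattened” \beta = 0.4, both with \gamma = 0.3); the plotting code is just one option among many:

# First scenario from the post: beta = 0.9, gamma = 0.3
out1 <- SIR(S = 999990, I = 10, R = 0, beta = 0.9, gamma = 0.3, T = 300)
# "Flattened" scenario: social distancing reduces beta to 0.4
out2 <- SIR(S = 999990, I = 10, R = 0, beta = 0.4, gamma = 0.3, T = 300)

# Peak number of infected and the day it occurs in each scenario
max(out1$I); which.max(out1$I)
max(out2$I); which.max(out2$I)

# Compare the infection curves
plot(out1$Time, out1$I, type = "l", col = "red", lwd = 2,
     xlab = "Day", ylab = "Infected individuals")
lines(out2$Time, out2$I, col = "darkred", lty = 2, lwd = 2)
legend("topright", c("beta = 0.9", "beta = 0.4"),
       col = c("red", "darkred"), lty = c(1, 2), lwd = 2)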

Sand Mountain Trail

I don’t get out hiking or geocaching often these days. With geocaching, it makes sense: I’ve found nearly all of the geocaches close to home and town, forcing me to travel farther just to make a find. But when it comes to hiking, I have less of an excuse. I don’t live in Moscow. I live near Deary, 25 miles east, which puts me 25 miles closer to the mountains. It puts me at the edge of the mountains, the foothills if you prefer. There are hiking trails all around. The closest is Spud Hill, which has an amazing view from the top. Then there’s the Potlatch River loop with great opportunities for flora and fauna sightings. There are more trails yet back in the Vassar Meadows area and up near Palouse Divide. I don’t have to go very far to get a nature fix. And yet, what keeps me from going out is more of a psychological barrier than a physical or economic one.

Continue reading “Sand Mountain Trail”

Geocaching

In case you weren’t aware, Geocaching is one of my hobbies turned obsession that fills my life with joy. Geocaching is a game in which people hide containers and post the coordinates on the web for others to enter into a GPS and go out and find. The game began in May of 2000. On May 2, the US Government turned off Selective Availability, the intentional degradation of civilian GPS signals. This increased the accuracy of commercial GPS receivers from around 100m down to 10m. The next day, Dave Ulmer hid a stash in the woods outside of Portland, OR and posted the coordinates to a listserv. A few days later, it was found, and it didn’t take long for this idea to catch on. Within that first year, several hundred geocaches had been hidden worldwide with their coordinates posted for others to find. The largest repository of geocaches is hosted at geocaching.com. It’s free to play, and today, the game doesn’t even require that you have a dedicated GPS receiver since smartphones can receive signals from the GPS satellites.

I was formally introduced to the game in 2007 when I created a lesson on GPS orienteering for a summer camp at the Max McGraw Wildlife Foundation. The activities culminated with me hiding containers around the property and having the kids wander around with GPS receivers to find them. But it wasn’t until 2011, when I was able to buy my own GPS receiver, that I really started geocaching. Since then, I’ve accumulated 1663 finds, mostly around the inland northwest. It’s the perfect hobby, as it pairs well with my love of hiking and travelling. I’ve explored so many unique places that I would have overlooked if geocaching hadn’t brought me there.

A map of my geocache finds in the northwest.

Geocaching is a spatial game that accumulates a lot of data. Geocaching.com aggregates personal statistics for each user – you can see mine here. However, as fun as it is to analyze my own caching behavior, I’m also interested in larger questions about the game. For example, which places are more active in the game than others? While the game isn’t about the numbers, the numbers can tell us a lot about the game. Defining what makes a place geocaching-friendly isn’t easy, and there are a lot of variables to consider. Unfortunately, I don’t have access (at least not easily) to the full data hosted at geocaching.com. So I have chosen the number of geocaches as a proxy for how active an area might be. And my unit of area is going to be the state, because I can easily grab the total number of active geocache hides in each state from regional searches on the website.

What states or regions of the country are most active in geocaching? Here is my data. The number of caches was collected manually using regional searches on the evening of June 12. The state area and population size were gathered from Wikipedia, and population is a 2018 estimate. Perhaps we’ll do this again when the 2020 census numbers get released.

State    Area (sq mi)    Geocaches    Cache Density (caches/sq mi)    Population (2018 est.)    Population Density (people/sq mi)
Alabama 52420.07 14826 0.28 4887871 93.24
Alaska 665384.04 7591 0.01 737438 1.11
Arizona 113990.30 39083 0.34 7171646 62.91
Arkansas 53178.55 12014 0.23 3013825 56.67
California 163696.32 132475 0.81 39557045 241.65
Colorado 104093.67 26075 0.25 5695564 54.72
Connecticut 5543.41 8008 1.44 3572665 644.49
Delaware 2488.72 2890 1.16 967171 388.62
District Of Columbia 68.34 248 3.63 702455 10278.83
Florida 65757.70 41034 0.62 21299325 323.91
Georgia 59425.15 14254 0.24 10519475 177.02
Hawaii 10931.72 2442 0.22 1420491 129.94
Idaho 83568.95 18557 0.22 1754208 20.99
Illinois 57913.55 30905 0.53 12741080 220.00
Indiana 36419.55 21611 0.59 6691878 183.74
Iowa 56272.81 21753 0.39 3156145 56.09
Kansas 82278.36 11676 0.14 2911505 35.39
Kentucky 40407.80 16920 0.42 4468402 110.58
Louisiana 52378.13 5663 0.11 4659978 88.97
Maine 35379.74 9388 0.27 1338404 37.83
Maryland 12405.93 12116 0.98 6042718 487.08
Massachusetts 10554.39 16871 1.60 6902149 653.96
Michigan 96713.51 34800 0.36 9995915 103.36
Minnesota 86935.83 29094 0.33 5611179 64.54
Mississippi 48431.78 7136 0.15 2986530 61.66
Missouri 69706.99 13898 0.20 6126452 87.89
Montana 147039.71 8059 0.05 1062305 7.22
Nebraska 77347.81 9602 0.12 1929268 24.94
Nevada 110571.82 22599 0.20 3034392 27.44
New Hampshire 9349.16 11436 1.22 1356458 145.09
New Jersey 8722.58 14081 1.61 8908520 1021.32
New Mexico 121590.30 17722 0.15 2095428 17.23
New York 54554.98 32343 0.59 19542209 358.21
North Carolina 53819.16 22174 0.41 10383620 192.94
North Dakota 70698.32 3020 0.04 760077 10.75
Ohio 44825.58 28631 0.64 11689442 260.78
Oklahoma 69898.87 14428 0.21 3943079 56.41
Oregon 98378.54 31066 0.32 4190713 42.60
Pennsylvania 46054.35 39347 0.85 12807060 278.09
Rhode Island 1544.89 3613 2.34 1057315 684.40
South Carolina 32020.49 5701 0.18 5084127 158.78
South Dakota 77115.68 9354 0.12 882235 11.44
Tennessee 42144.25 18136 0.43 6770010 160.64
Texas 268596.46 67902 0.25 28701845 106.86
Utah 84896.88 29369 0.35 3161105 37.23
Vermont 9616.36 4988 0.52 626299 65.13
Virginia 42774.93 15162 0.35 8517685 199.13
Washington 71297.95 29634 0.42 7535591 105.69
West Virginia 24230.04 5708 0.24 1805832 74.53
Wisconsin 65496.38 29874 0.46 5813568 88.76
Wyoming 97813.01 5772 0.06 577737 5.91

There is tremendous variation in the number of geocaches hidden in each state. Excluding the District of Columbia, with only 248 active hides, the state with the fewest geocaches is Hawaii at 2,442, and on the mainland, it’s Delaware at only 2,890. Meanwhile, California leads the way with the most geocaches hidden at 132,475. That’s quite a spread, though the median number of hides per state is 14,826.

The first question that comes to mind is whether the number of hides is limited by the size of the state. After all, California is a huge state. Delaware and Hawaii are pretty small. And given that people who run Geocaching.com have set a rule that geocaches must not be placed within 0.1 mile of another geocache, there is a limit to the number of caches that can be hidden in a finite area.

The number of geocaches in each state as a function of the size of the state. Both axes are log transformed.

There is certainly a trend here. Larger states, on average, have more geocaches, though the larger the area, the larger the variation in hide count. Without that D.C. outlier, we still have a positive slope, though it’s less steep. To gauge just how much area influences the number of caches, we should look at the density, or the number of caches placed per square mile. If the number of hides simply scaled with available area, density would show no relationship with area.

Cache Density as a function of state size. Axes are log transformed.

What we see is a negative association between the density of caches placed and the area of the state. So even though smaller states have fewer caches overall, they are more densely packed with geocaches. In some ways, this makes sense. The large states of the western half of the country have a lot of open land, public and private. Some of this land, including national parks and designated wilderness areas, is off limits to physical geocaches. Some of this area is just difficult to get to. And some of it is private property – big ranches and farms where the public isn’t allowed. When you look at states like Colorado, Montana, and California, you’ll notice that geocaches are densely packed into cities with fewer caches in rural areas. Still, in popular hiking areas, there are a lot of geocaches hidden along the trails.

But small states have rural places, too. So wouldn’t this affect the density of geocaches? Well, yes. But maybe these small states have less rural land than the larger states, and their rural land is broken up into smaller parcels with more public rights-of-way in which to place a geocache. Or maybe it’s not about the land at all, but about the people living there. Perhaps more people just means more geocachers, which means more geocaches being hidden in a given area.

Cache Density as a function of Population Density by state. Both axes are log transformed.

Here we see our tightest trend. The geocache density appears to be explained rather well by population density. This would also explain why urban areas see so many more caches than rural areas, even within a state. There are still differences among urban centers as to the density of caches, and that may also be explained by population size. Or maybe geocaching is more popular in cities with a stronger bent toward an outdoor lifestyle. Denver, Salt Lake City, and Seattle are all among the densest cities when it comes to geocaches. Perhaps I will find a way to aggregate such data for analysis. But at the state level, the number of people per square mile nicely explains the number of caches hidden per square mile.
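
For anyone curious how such an analysis might look in R, here is a rough sketch. It assumes the table above has been read into a data frame called geo with columns Area, Geocaches, and Population; the object and column names are just illustrative, not the script I actually used.

# Derived densities (caches and people per square mile)
geo$CacheDensity <- geo$Geocaches / geo$Area
geo$PopDensity   <- geo$Population / geo$Area

# Cache density vs. population density on log-log axes
plot(CacheDensity ~ PopDensity, data = geo, log = "xy",
     xlab = "People per square mile", ylab = "Caches per square mile")

# Slope and fit of the log-log trend
fit <- lm(log10(CacheDensity) ~ log10(PopDensity), data = geo)
summary(fit)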

Let’s look at some maps.

Number of Caches hidden in each state.
Density of caches in each state.
Population density of each state.

If you’re an avid geocacher and you want an active geocaching community, where should you live? Well, if we define an active geocaching community solely by the number of geocaches placed, it appears that the northeast coast is the place to be. While these states are small and each has a relatively small number of geocaches hidden, collectively they form the densest region for geocache placement. And this seems to be driven by population density.

Of course, there are more variables to consider. The number of geocaches doesn’t always represent the quality of geocaches. Many people hide film canisters in lamp post skirts in a parking lot. The first time I saw one of these, it was neat. But after a hundred of them, it gets old. Many are placed for the sake of being placed, rather than bringing people to a unique and special area. Judging the quality of hides from numbers is itself a difficult task. Geocaching.com does have a system by which premium members can award favorite points to deserving caches, and this might be one method by which we can estimate the quality of a hide. The number of unique geocachers actively hiding and searching in an area will also determine how active a community is with the game. Lewiston and Clarkston once had over 400 hides in a four-mile radius. Over half of them were owned by three prolific geocachers who have since archived their hides and left the area. Gathering and aggregating data on users is beyond my ability at the moment.

And power trails can skew the numbers. These are caches placed 0.1 mile apart along a road or trail for the sole purpose of enhancing find counts. They are typically not quality hides, though a few trails on rural roads do take you into some scenic locations. The famous ET Highway in Nevada boasts over 2000 geocaches. And one prolific hider made power trails all over northern Nevada with over 20,000 hides. They have since been archived, and the state’s hide count dropped considerably.

On the other side of the distribution, North Dakota might be the worst state for a geocacher. With only 3,020 caches to find, you’ll quickly be driving long distances if you want to stay active in the game. Montana and Wyoming aren’t much better. Alaska has the lowest density, and rightly so. Of the 7,591 caches spread amongst this large state, most are concentrated around Anchorage and the Kenai Peninsula. If you live in this area, there are enough caches to keep you busy for some time. If you live in the small, isolated villages further north, you may quickly find yourself out of a hobby, and even your own hides may only get occasional finds through the years.

This analysis isn’t perfect, but it does give us some insight into where geocaches are hidden in the United States, and which states might be better to live in if geocaching is a major part of your life. But don’t read too much into it. Despite Montana’s low ranking on cache density, Missoula, Helena, and Great Falls are all great cities for geocachers, as are Boise, Spokane, and Coeur d’Alene. I wonder how, at the city level, they compare with comparable cities in other regions, and how they stack up against their larger cousins. That’s a project for another day.

Adventures in Sourdough

When life prevents you from going out and adventuring, you make your own adventures at home. My latest adventure is making sourdough. Now, I could go out and obtain or buy a starter from a local bakery, but what’s the fun of that? It’s so easy to start my own from scratch, and now I have one that I can truly call my own. My guide for making the starter and baking my first batch of bread comes from King Arthur Flour’s Sourdough Baking Guide.

This all started about 3 weeks ago. I had some whole wheat flour sitting around and wasn’t sure how well it would start, given its age. But I mixed up 1 cup of flour and a half-cup of water and let it sit overnight, and sure enough, there wasn’t much activity. But that’s normal. I fed it, and on day two, there were bubbles. Success. So I kept feeding it. Third day, more bubbles, and a ripe odor. Fourth day, still going strong. It was time for two feedings a day. But my starter was growing big. Oops, I had misread the instructions and thrown out half a cup each day instead of saving half a cup. So I switched to the proper routine, keeping half a cup, feeding it, and discarding the rest. Activity stopped. It no longer tasted tangy, nor smelled of fermentation. I kept feeding it anyway, hoping it would recover. It didn’t. It wouldn’t rise. Was it too cold? Did I screw something up by not properly feeding it originally? Probably not, but I also wanted to ensure success. So I dumped it and began again.

Adventures in Sourdough, take 2.

When baking, it’s important to get your flour to water ratios right or the dough won’t have the right consistency. I had been measuring my flour and water and starter by volume, which is not really an accurate gauge of how much material I was working with. So I ordered myself a kitchen scale. It arrived and this time I was determined to be more precise. Now I doubt that measuring by volume caused my previous attempt to flop – yeast and bacteria are both hardy creatures, and altering the moisture content can favor one over the other to control the flavor and intensity of your starter. But I was determined to be a bit more precise and deliberate in my measurements.

I weighed out 113g of whole wheat flour, and 113g of water, mixed it together and let it sit overnight. Again, no activity, but with a feeding of 113g of starter, 113g of flour, and 113g of water and another 24 hours of rest, the starter was on track just as before. I also decided to incubate the starter in the oven with the light on for a few hours, just in case room temperature on top of the fridge wasn’t in the ideal range. By day 4, it was starting to rise, and by day 6, it was consistently doubling in volume. In fact, it liked that environment so much that one day it overflowed the jar. I guess it’s a little too warm in the oven.

Friday night, I came home from game night and fed the starter, but instead of keeping the 113g, I expanded each ingredient to 200g so I’d have a total of 600g of starter in the morning – 113 I’d take out to continue culturing, and the rest I’d use to make bread. I followed the Naturally Leavened Sourdough recipe.

To me, breadmaking is zen. There’s a certain satisfaction that comes with mixing the flour and water, kneading, watching it rise, shaping the dough, and tasting the fruits of your labor. I love making pizza for the same reasons. But pizza is a cakewalk compared with bread. It doesn’t require as much kneading, only rises once, and you’re stretching it out to put in the oven with the intent of it cooking flat. With bread, you want a rise, and I was pleasantly surprised when my dough rose during each of the resting periods. My natural yeast culture was alive and well and doing its job.

But there is still much to improve. I shaped the bread in bowls because I don’t own a proper bread shaping basket. I mean, a proper basket isn’t really necessary. But in the past, I would shape my loaves and let them rest on the counter, and without the support of the bowl, they would eventually spread out as the gluten relaxed, and when it was time to put them in the oven, they wouldn’t rise upward nearly as much as I wanted them to.

When shaping a loaf in a bowl, the bowl is often lined with a floured cloth. I grabbed the least-textured cloth towels we had, floured them and lined the bowls before dropping the dough in. At the end of the rise, I dumped the dough out, but it stuck to the towels. I mean, really stuck. I’m still trying to get dough out of them. Grr. The loaves deflated and I was pissed. Bummed. I wasn’t sure what to do. So I tried just greasing the bowls and dropping the dough in after reshaping them. I gave them another 2-hour rise and they came out just fine. Unfortunately, they deflated a bit upon scoring and the loaves didn’t puff up so much as out during baking.

My first sourdough loaves.

Now, let me just say that the bread came out of the oven smelling like heaven. After cooling, I couldn’t resist a taste. By all accounts, I think I had success. I had bread, and it tasted great. So it didn’t balloon up all big. So it’s not quite shaped right for any kind of sandwiches. Toast it up with some butter and it goes well with soup, or just on its own. In fact, even though these loaves weren’t ideal for sandwiches, I still stuffed cheese between two slices and grilled them up for the best damn toasted cheese sandwich I have ever made. God, it was heavenly. I made that for breakfast yesterday with two eggs on the side. I had another one this morning for an early lunch.

Making a grilled cheese with my sourdough.

My starter is now living in the fridge, where I only have to feed it once a week. I don’t think I’ll be making bread on the regular, maybe every 2 weeks or so, though there are some recipes I’d like to try that use the discard from each feeding. For my next loaf, I may go with a no-knead recipe in which the gluten develops slowly through a process called autolysis. I also need to work on getting that bread to hold its shape and rise properly in the oven. Some sources suggest the problem stems from letting the dough over-proof during the shaping stage. It rose too much and thus deflated when I removed it from the bowl and scored it. I might also shape it differently, or just bake it in a sandwich loaf pan, so I have bread that I can make sandwiches with. For now, I have made my first steps into the world of sourdough baking, and I’m eagerly looking forward to mastering the art of breadmaking.