Abstract
Long-term daily precipitation records from the US Historical Climatology Network (USHCN) are processed and summarized using R. Below is R code which examines the data from Livermore, CA, the closest available USHCN station to my present home in Fremont, CA.
Introduction
Long-term daily precipitation (rain/snow, and also temperature) records for the United States are available from the United States Historical Climatology Network - USHCN. These are observations of precipitation for typically about 100 years or longer. The USHCN stations are a subset of the larger/denser (but shorter) observation network from the Global Historical Climatology Network - GHCN
I obtained the data for California and other relevant data from here.
R code
Identify the station nearest to your location. For me its Livermore, CA.
rm(list = ls())
workDir <- "C:/Rstuff/DATA/ushcn/"
setwd(workDir)
# id of my nearest station (Livermore, CA), identified from
# 'ushcn-stations.txt'
myId <- "044997"
Following is the format of the data.
# format of the data, from 'data_format.txt'# .... (Each record in a file contains one month of daily data.)# # Variable Columns Type COOP ID 1-6 Character YEAR 7-10 Integer MONTH# 11-12 Integer ELEMENT 13-16 Character VALUE1 17-21 Integer MFLAG1 22# Character QFLAG1 23 Character SFLAG1 24 Character VALUE2 25-29 Integer# MFLAG2 30 Character QFLAG2 31 Character SFLAG2 32 Character . . .
Process precipitation data for your state and extract a subset of data corresponding to your station.
# read data for all stations in the state
allData <- readLines("state04_CA.txt")
# extract station ids from the data
idData <- substr(allData, 1, 6)
# create a new data frame, with ids as the first column of the frame
newData <- data.frame(idData, allData, stringsAsFactors = FALSE)
# extract data corresp to your nearest station
myData <- subset(newData, idData == myId)
myData <- myData[, 2] #throw away the previously added first column
Below function used later to determine the number of days in a month, including leap years.
# function to computes days in a month: input is year and month this is
# the best I could do without downloading external R libraries
FnDaysInMonth <- function(yr, mo) {
date1 <- paste(yr, mo, "01", sep = "-") #current month, day 1
mo2 <- ifelse(mo < 12, mo + 1, 1)
yr2 <- ifelse(mo < 12, yr, yr + 1)
date2 <- paste(yr2, mo2, "01", sep = "-") #next month, day 1
return(as.numeric(difftime(as.Date(date2), as.Date(date1))))
}
Output file to store the data and read it back again for plotting
outFile <- file(paste(myId, ".txt", sep = ""), "wt")
Each line of the data file contains data corresponding to all the days in the month. Discard temperature records and read only “PRCP”. Also, check for the data quality flags.
## each line is 1 month of data
for (eachLine in 1:length(myData)) {
yrVar <- as.numeric(substr(myData[eachLine], 7, 10))
moVar <- as.numeric(substr(myData[eachLine], 11, 12))
metVar <- substr(myData[eachLine], 13, 16)
# only extract precipitation info
if (metVar == "PRCP") {
## for each day of the month , check the data flags and get the data
for (eachDay in 1:FnDaysInMonth(yrVar, moVar)) {
dayOffset <- 17 + ((eachDay - 1) * 8)
metVal <- as.numeric(substr(myData[eachLine], dayOffset, dayOffset +
4))
mflag <- substr(myData[eachLine], dayOffset + 5, dayOffset + 5) #is irrelevant
qflag <- substr(myData[eachLine], dayOffset + 6, dayOffset + 6) #should be blank
sflag <- substr(myData[eachLine], dayOffset + 7, dayOffset + 7) #should not be blank
# write to ouput
if (qflag == " " & sflag != " ") {
writeLines(paste(yrVar, moVar, eachDay, metVal, sep = ","),
outFile)
}
}
}
}
close(outFile)
Read back data for summary graphs
prcp <- read.csv(paste(myId, ".txt", sep = ""), header = FALSE, sep = ",",
as.is = TRUE)
colnames(prcp) <- c("yr", "mo", "day", "val")
prcp$val <- prcp$val/100 #convert hundredths of inches to inches
Graphs …
# yearly total
yrtot <- aggregate(val ~ yr, data = prcp, FUN = sum)
png(filename = "fig1.png")
plot(yrtot$val ~ yrtot$yr, type = "h", main = "Annual Precipitation Total (inches), Livermore, CA",
ylab = "inches/year", xlab = "year")
garbage <- dev.off()
# monthly total
montot <- aggregate(val ~ yr + mo, data = prcp, FUN = sum)
png(filename = "fig2.png")
boxplot(montot$val ~ montot$mo, range = 0, main = "Monthly Precipitation Total (inches), Livermore, CA",
ylab = "inches/month", xlab = "calendar month")
garbage <- dev.off()
# number of rainy days per month, rainy day of rain amount > 0.01 inches
prcp$val <- ifelse(prcp$val <= 0.01, 0, 1)
raindays <- aggregate(val ~ yr + mo, data = prcp, FUN = sum)
png(filename = "fig3.png")
boxplot(raindays$val ~ raindays$mo, range = 0, main = "Monthly Rainy Days, Livermore, CA",
ylab = "days/month", xlab = "calendar month")
garbage <- dev.off()
Summary
This code, when modified slightly, could also be used to read GHCN daily precipitation data.
R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series,ecdf, trading) and more...