Posts

I am happy to announce that version 1.0 of the PSAboot package has been released to CRAN. This package implements bootstrapping for propensity score analysis. Unlike typical bootstrap implementations such as boot, it allows for separate sampling specifications for treatment and control units. For example, in the case where the ratio of treatment-to-control units is large, one can bootstrap only the control units while always using all available treatment units.
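The sampling idea can be sketched in a few lines of base R. This is only an illustration of resampling one group while keeping the other whole, not the PSAboot API; the helper name `boot_controls_only` and the toy treatment vector are assumptions made up for this example.

```r
# Hypothetical helper (not part of PSAboot): build the row indices for one
# bootstrap sample that keeps every treatment unit and resamples only the
# control units with replacement.
boot_controls_only <- function(treat) {
  treated_rows <- which(treat == 1)
  control_rows <- which(treat == 0)
  # Index through sample.int() so a single control row is handled correctly
  resampled <- control_rows[sample.int(length(control_rows), replace = TRUE)]
  c(treated_rows, resampled)
}

set.seed(2112)
treat <- c(rep(1, 20), rep(0, 5))   # toy data: 20 treated, 5 control units
idx <- boot_controls_only(treat)    # row indices for one bootstrap replicate
```

Each replicate would then be analyzed with rows `idx` of the full dataset; repeating the draw many times gives the bootstrap distribution.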

CONTINUE READING

I am about to head home from my fifth time attending the North East Association for Institutional Research (NEAIR), this year in Newport, RI, which was just fantastic. Really great people, interesting talks, and good food. I again taught an Introduction to R and LaTeX for Institutional Research pre-conference workshop and also gave a talk on Propensity Score Analysis for Institutional Research, which was a brief version of a workshop I taught at the 2013 useR!

CONTINUE READING

I finally had an opportunity to play with Shiny, and I am very impressed. I have created a GitHub project, so head over there for the source code. There are a number of ways to distribute Shiny apps. If you are running R (and most likely you are if you are reading this), you can download and run Shiny apps using the runApp (if already downloaded), runGitHub, runGist, or runUrl functions.

CONTINUE READING

Frequently I need to recode a date column to quarters. For example, at Excelsior College we have continuous enrollment, so we report new enrollments per quarter. To complicate things a bit, our fiscal year starts in July, so July, August, and September represent the first quarter, while January, February, and March are actually the third quarter. But sometimes we do need to report based upon calendar years (i.e. where January is in the first quarter).
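One way to handle both cases is to shift the month number by the month that opens the year before dividing into blocks of three. The sketch below is a minimal base R version under that assumption; the function name `date_to_quarter` and its `first_month` argument are hypothetical and not necessarily what the full post uses.

```r
# Hypothetical helper: map dates to quarters 1-4, where `first_month` is the
# month that opens the first quarter (1 = calendar year, 7 = July fiscal year).
date_to_quarter <- function(dates, first_month = 1) {
  month <- as.integer(format(as.Date(dates), "%m"))
  # Months elapsed since the start of the (fiscal) year, in the range 0-11
  shifted <- (month - first_month) %% 12
  shifted %/% 3 + 1
}

dates <- as.Date(c("2013-07-15", "2013-10-01", "2014-01-31", "2014-06-30"))
fiscal   <- date_to_quarter(dates, first_month = 7)  # 1 2 3 4
calendar <- date_to_quarter(dates)                   # 3 4 1 2
```

With `first_month = 7`, July lands in quarter 1 and January in quarter 3, matching the fiscal-year description above; the default gives ordinary calendar quarters.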

CONTINUE READING

When I went to school we were always taught the “i before e, except after c” rule for spelling. But how accurate is this rule? Kevin Marks tweeted today the following: »@uberfacts: There are 923 words in the English language that break the “I before E” rule. Only 44 words actually follow that rule.« Science — Kevin Marks (@kevinmarks) March 25, 2013 Not sure where he came up with that result, but seems simple enough to verify.
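A rough check can be written with two regular expressions. The sketch below treats the rule purely as a spelling pattern ("ie" unless the pair follows "c", "ei" after "c") and runs it on a tiny hand-picked vector, which is an assumption for illustration only; a real check would need an actual word list, and the counts here say nothing about the figures in the tweet.

```r
# Toy word list, hand-picked for illustration (not a real dictionary)
words <- c("believe", "receive", "science", "weird", "friend", "ceiling", "table")

# Follows the rule: "cei", or "ie" that is not directly preceded by "c"
follows <- grepl("cei", words) | grepl("(^|[^c])ie", words)

# Breaks the rule: "cie", or "ei" that is not directly preceded by "c"
breaks <- grepl("cie", words) | grepl("(^|[^c])ei", words)

data.frame(word = words, follows = follows, breaks = breaks)
```

A word with several i/e pairs can be flagged as both following and breaking the rule, and words such as "table" match neither pattern, so the two counts need not add up to the size of the word list.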

CONTINUE READING

There are many situations in R where you have a list of vectors that you need to convert to a data.frame. This question has been addressed over at StackOverflow and it turns out there are many different approaches to completing this task. Since I encounter this situation relatively frequently, I wanted my own S3 method for as.data.frame that takes a list as its parameter. I should note that it only works with atomic vectors (i.

CONTINUE READING
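For the list-to-data-frame post above, one of the many possible approaches is to bind the vectors together as rows. This is a minimal sketch, not the S3 method described in the post; the helper name `list_to_df` is hypothetical, and it assumes every element is an atomic vector of the same length.

```r
# Hypothetical helper: turn a list of equal-length atomic vectors into a
# data frame with one row per list element.
list_to_df <- function(x) {
  stopifnot(all(vapply(x, is.atomic, logical(1))))   # atomic vectors only
  stopifnot(length(unique(lengths(x))) == 1)         # all the same length
  df <- as.data.frame(do.call(rbind, x), stringsAsFactors = FALSE)
  rownames(df) <- NULL
  df
}

scores <- list(c(a = 1, b = 2), c(a = 3, b = 4), c(a = 5, b = 6))
df <- list_to_df(scores)   # 3 rows, columns "a" and "b"
```

Because `rbind` builds a matrix first, vectors of mixed types are coerced to a common type, which is one reason to restrict the helper to homogeneous atomic vectors.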

I posted a question over on StackOverflow on an efficient way of comparing two data frames with the same column structure, but with different rows. What I would like to end up with is an n x m logical matrix where n and m are the number of rows in the first and second data frames, respectively; and the value at the *i*th row and *j*th column indicates whether all the values in row i of the first data frame are equal to those in row j of the second.
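To make the target concrete, here is a straightforward (and deliberately not efficient) base R sketch that produces such a matrix. The function name `rows_equal` is hypothetical, the example data are made up, and the sketch assumes non-factor columns and treats `NA` as unequal.

```r
# Hypothetical helper: n x m logical matrix whose [i, j] entry is TRUE when
# row i of df1 matches row j of df2 in every column.
rows_equal <- function(df1, df2) {
  stopifnot(identical(names(df1), names(df2)))
  same_row <- function(i, j) {
    # Compare column by column; isTRUE() makes NA comparisons count as FALSE
    all(mapply(function(a, b) isTRUE(a[i] == b[j]), df1, df2))
  }
  outer(seq_len(nrow(df1)), seq_len(nrow(df2)), Vectorize(same_row))
}

df1 <- data.frame(x = c(1, 2, 3), y = c("a", "b", "c"), stringsAsFactors = FALSE)
df2 <- data.frame(x = c(2, 3),    y = c("b", "z"),      stringsAsFactors = FALSE)
m <- rows_equal(df1, df2)   # 3 x 2 matrix; only m[2, 1] is TRUE
```

This performs n x m row comparisons one at a time, so it is a reference for the desired output rather than an answer to the efficiency question.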

CONTINUE READING

Version 1.0 of sqlutils has been released to CRAN. The sqlutils package is designed to manage a library of SQL files. This package grew out of the needs of an Office of Institutional Research where the vast majority of analysis is conducted on data from our Student Information System (SIS) which is stored in an Oracle database. A lot of our analyses and reports are derived from the same types of datasets but from easily extracted parameters (e.

CONTINUE READING

I recently taught a very basic introduction to SQL workshop and needed a way to have participants interact with SQL statements. Obviously there are lots of tools to interface with a database, but since we are all R users I thought it would be nice to be able to interact without leaving R. Although this interface is fairly basic, the fact that we can type in a SQL statement and get the results as an R data frame provides all the advantages of having data in R.

CONTINUE READING

One issue I continually encounter when starting to work with a new dataset is that of the codebook. In general, I prefer to load a codebook into R like any other data source, specifically as a data frame. Ideally, one data frame provides the variable names with descriptions and any other metadata available, and a separate list of named vectors can be used to recode factors.
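That layout might look something like the following. Everything here is made up for illustration: the variable names, the codes and labels, and the helper `apply_mapping` are assumptions, not the structure used in the full post.

```r
# Codebook as a data frame: one row per variable, with its description
codebook <- data.frame(
  variable    = c("gender", "status"),
  description = c("Self-reported gender", "Enrollment status"),
  stringsAsFactors = FALSE
)

# Separate list of named vectors: names are raw codes, values are labels
mappings <- list(
  gender = c("1" = "Female", "2" = "Male"),
  status = c("F" = "Full-time", "P" = "Part-time")
)

# Hypothetical helper: recode a raw column into a labeled factor
apply_mapping <- function(x, mapping) {
  factor(as.character(x), levels = names(mapping), labels = unname(mapping))
}

raw <- data.frame(gender = c(1, 2, 2), status = c("P", "F", "P"),
                  stringsAsFactors = FALSE)
recoded <- raw
for (v in names(mappings)) {
  recoded[[v]] <- apply_mapping(raw[[v]], mappings[[v]])
}
```

Keeping the mappings separate from the descriptive data frame means the same loop can recode any dataset whose column names match the names in `mappings`.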

CONTINUE READING