Introduction to the Urban Institute Education Data API

The Urban Institute has released a public API that pulls and pre-processes education data on institutions from a variety of sources, including but not limited to the Department of Education. We used its R package to explore the relationship between applicant and enrollment volume.

API Documentation: https://ed-data-portal.urban.org/documentation/

To install the R package, you must have the devtools package installed.

if(!('educationdata' %in% installed.packages()[,"Package"])) devtools::install_github('UrbanInstitute/education-data-package-r')

Analysis of Applicants and Enrollments at Universities

Among the data accessible through the API, university admissions data stands out to me as especially timely. Given recent reporting on racial discrimination in university admissions (see: https://www.washingtonpost.com/local/education/internal-harvard-study-suggested-asian-americans-would-benefit-from-academics-only-admissions/2018/06/15/7a07202e-7021-11e8-bf86-a2351b5ece99_story.html?utm_term=.6f10c8bfdbb8), new education-focused researchers might find this data a useful starting point for evaluating a university's propensity to admit students based on their gender or race.

I do not explore that topic here; doing so would require much more granular data than this. I am, however, very interested in the relationship between the volume of applicants and enrollments at universities. I understand that universities are starting to drive up application volume, and graduating high school students are obliging by sending out more applications per person than ever before (see: https://www.usnews.com/education/blogs/college-admissions-playbook/2015/09/09/what-rising-college-application-volume-means-for-the-class-of-2020). I look into this trend to find out which universities are the ‘most competitive’ in terms of applicants per enrollee, and where competitiveness is rising or falling.

suppressMessages({
  library(educationdata)
  library(dplyr)
  library(tidyr)
  library(ggplot2)
  library(scales)
  library(plotly)
  library(knitr)
  library(kableExtra)
})

We call the API through the R package, requesting the admissions-enrollment data as well as the university directory, which supplies complementary institution-level details such as the school's name and control type.

admissions <- suppressMessages(educationdata::get_education_data(level = "college-university",
                                                                 source = 'ipeds',
                                                                 topic = 'admissions-enrollment',
                                                                 filters = list(year = 2001:2015),
                                                                 add_labels = TRUE))

dir <- suppressMessages(educationdata::get_education_data(level = "college-university",
                                                          source = 'ipeds',
                                                          topic = 'directory',
                                                          filters = list(year = 2001:2015),
                                                          add_labels = TRUE))

print("Import Complete")
## [1] "Import Complete"

The package does not let us specify in the filters that we only want totals rather than breakdowns by sex or by full-time/part-time status (ftpt), so we keep the ‘Total’ rows after the fact. (As an aside, ‘sex’ is a bit outdated and should be considered for revision by the source, the National Center for Education Statistics Integrated Postsecondary Education Data System.)

admissions <- admissions[admissions$sex == 'Total' & admissions$ftpt == 'Total',]
admissions <- admissions %>% select(-sex, -ftpt)
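To see why this filter matters: if each school-year appears once for every combination of sex and ftpt, including the ‘Total’ categories, then summing every row counts the same students several times. Below is a minimal base-R sketch on a made-up toy data frame. The row layout (one row per sex/ftpt pair, with ‘Total’ marking the aggregates) is my assumption about what add_labels = TRUE returns, and the counts are invented.

```r
# Hypothetical rows for one made-up school and year: every sex x ftpt
# combination, including the 'Total' aggregate categories (assumed layout).
toy <- expand.grid(sex  = c("Male", "Female", "Total"),
                   ftpt = c("Full-time", "Part-time", "Total"),
                   stringsAsFactors = FALSE)
toy$unitid <- 1L
toy$year   <- 2015L
# Invented counts that are internally consistent (100 students in total).
toy$number_enrolled <- c(40, 50, 90, 5, 5, 10, 45, 55, 100)

# Summing every row counts each student four times.
naive_total <- sum(toy$number_enrolled)  # 400

# Keep only the grand-total row, then drop the breakdown columns.
totals <- toy[toy$sex == "Total" & toy$ftpt == "Total", ]
totals <- totals[, setdiff(names(totals), c("sex", "ftpt"))]
sum(totals$number_enrolled)  # 100
```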

We plot the number of applicants against the number of enrollees at each school, animated over time. I like to use plot_ly for these kinds of charts because its animation lets you pause on each frame for a longer look; with so many data points, it's also nice to be able to hover and see the names of the schools. We see a sensible positive correlation between the number of applicants and enrollments, and private schools tend to have a higher ratio of applicants to enrollees than public schools.

admissions %>%
  filter(!is.na(number_applied), number_enrolled > 0, number_applied > 0) %>%
  inner_join(dir) %>%
  plot_ly(
    x = ~number_enrolled, 
    y = ~number_applied, 
    color = ~inst_control, 
    frame = ~year, 
    text = ~inst_name, 
    hoverinfo = "text",
    type = 'scatter',
    mode = 'markers',
    opacity = 0.8
  )
## Joining, by = c("year", "unitid")

We look at the distribution of the applicant-to-enrollee ratio by year to see what a typical (median) ratio is and how many outliers there are.

admissions %>%
  filter(!is.na(number_applied), number_enrolled > 0, number_applied > 0) %>%
  inner_join(dir) %>%
  mutate(applied_enrolled_ratio = number_applied/number_enrolled) %>%
  ggplot(aes(x = factor(year), y = applied_enrolled_ratio)) +
  geom_boxplot() +
  theme(legend.position = 'bottom')
## Joining, by = c("year", "unitid")
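A box plot marks the median rather than the mean, and by default ggplot2's geom_boxplot() draws points lying more than 1.5 times the interquartile range beyond the quartiles as outliers. As a rough numeric companion to the chart, here is a base-R sketch that computes each year's median ratio and counts the points outside those fences. The schools and counts are made up purely for illustration.

```r
# Made-up applicant and enrollee counts for six hypothetical schools in each
# of two years; only the computation matters here, not the numbers.
toy <- data.frame(
  year            = rep(c(2001L, 2002L), each = 6),
  number_applied  = c(500, 800, 1200, 900, 700, 2000,
                      600, 850, 1300, 950, 720, 5000),
  number_enrolled = c(100, 200, 300, 150, 140, 20,
                      110, 210, 290, 160, 150, 50)
)
toy$applied_enrolled_ratio <- toy$number_applied / toy$number_enrolled

# Median plus a Tukey-style outlier count (beyond 1.5 * IQR from the quartiles).
summarize_year <- function(r) {
  q <- quantile(r, c(0.25, 0.75), names = FALSE)
  fence <- 1.5 * (q[2] - q[1])
  c(median     = median(r),
    n_outliers = sum(r < q[1] - fence | r > q[2] + fence))
}

# One row per year, named by year.
by_year <- do.call(rbind, lapply(split(toy$applied_enrolled_ratio, toy$year),
                                 summarize_year))
by_year
```

In each toy year, the single tiny school (20 or 50 enrollees) is the one flagged, which mirrors the pattern in the real outliers.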

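Looking at the ten most extreme ratios, we see these are typically smaller and/or technical schools. Most report very few enrollees (often in the single digits), and such a small denominator inflates the ratio.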

admissions %>%
  filter(!is.na(number_applied), number_enrolled > 0, number_applied > 0) %>%
  inner_join(dir %>% select(unitid, year, inst_name, inst_control)) %>%
  mutate(applied_enrolled_ratio = number_applied/number_enrolled) %>%
  arrange(desc(applied_enrolled_ratio)) %>%
  head(n = 10) %>%
  kable('html') %>%
  kable_styling()
## Joining, by = c("year", "unitid")
| year | unitid | number_applied | number_admitted | number_enrolled | inst_name | inst_control | applied_enrolled_ratio |
|---|---|---|---|---|---|---|---|
| 2004 | 188669 | 1289 | 422 | 9 | THE AILEY SCHOOL | Private not-for-profit | 143.22222 |
| 2012 | 477039 | 359 | 124 | 3 | West Coast University-Dallas | Private for-profit | 119.66667 |
| 2012 | 233499 | 1631 | 1631 | 14 | Saint Pauls College | Private not-for-profit | 116.50000 |
| 2005 | 413839 | 270 | 134 | 3 | ITT Technical Institute | Private for-profit | 90.00000 |
| 2015 | 484844 | 9032 | 190 | 101 | Minerva Schools at Keck Graduate Institute | Private not-for-profit | 89.42574 |
| 2003 | 404338 | 2100 | 1461 | 25 | SCHILLER INTERNATIONAL UNIVERSITY | Private for-profit | 84.00000 |
| 2005 | 437051 | 361 | 193 | 5 | ITT Technical Institute | Private for-profit | 72.20000 |
| 2001 | 210304 | 4174 | 346 | 58 | WARNER PACIFIC COLLEGE | Private not-for-profit | 71.96552 |
| 2010 | 414878 | 71 | 54 | 1 | Trine University-Fort Wayne Regional Campus | Private not-for-profit | 71.00000 |
| 2008 | 449898 | 61 | 1 | 1 | South University-Tampa | Private for-profit | 61.00000 |

I wondered whether larger schools have an outsized ratio of applicants to enrollments, since the popularity of a brand name may come to outstrip a school's actual size. The data suggest otherwise: smaller schools, and particularly private schools, have higher ratios of applicants to enrollments. This may be due to broader brand recognition of private universities, which compete for students all over the country, whereas public schools tend to attract more in-state applicants because of in-state tuition.

admissions %>%
  filter(!is.na(number_applied), number_enrolled > 0, number_applied > 0) %>%
  inner_join(dir) %>%
  mutate(applied_enrolled_ratio = number_applied/number_enrolled) %>%
  plot_ly(
    x = ~number_enrolled, 
    y = ~applied_enrolled_ratio, 
    color = ~inst_control, 
    frame = ~year, 
    text = ~inst_name, 
    hoverinfo = "text",
    type = 'scatter',
    mode = 'markers',
    opacity = 0.8
  )
## Joining, by = c("year", "unitid")
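To put a number on the visual impression that smaller schools have higher ratios, one option (my own addition, not part of the analysis above) is a Spearman rank correlation between enrollment and the ratio within each control type. I use ranks because, as the outlier table shows, a handful of tiny schools have extreme ratios that would dominate a Pearson correlation. The data below are invented; on the real data you would apply the same function to the joined admissions and directory frame.

```r
# Invented enrollment and applicant counts for four hypothetical schools
# per control type.
toy <- data.frame(
  inst_control    = rep(c("Private", "Public"), each = 4),
  number_enrolled = c(50, 100, 200, 400, 300, 600, 1200, 2400),
  number_applied  = c(1000, 1500, 2000, 2400, 1500, 2400, 4200, 10080)
)
toy$applied_enrolled_ratio <- toy$number_applied / toy$number_enrolled

# Rank correlation between size and ratio, computed separately per group;
# a negative value means larger schools tend to have lower ratios.
rho <- sapply(split(toy, toy$inst_control), function(d)
  cor(d$number_enrolled, d$applied_enrolled_ratio, method = "spearman"))
rho
```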

Given how steeply tuition costs have been rising, it's important to understand whether we see a supply-side increase in the number of students accepted. In theory, given the high marginal revenue from each additional student, we should see enrollments expand substantially. The increase, however, has been modest at private institutions and larger at public institutions. Perhaps public universities are able to capture more students who are seeking lower tuition, while private universities are content to flex their brand for the highest-paying students.

admissions %>%
  filter(!is.na(number_applied), number_enrolled > 0, number_applied > 0) %>%
  inner_join(dir) %>%
  group_by(inst_control, year) %>%
  summarize(total_enrollees = sum(number_enrolled)) %>%
  mutate(first_total_enrollees = first(total_enrollees, order_by = year),
         total_enrollees_index = 100*total_enrollees/first_total_enrollees) %>%
  ggplot(aes(x = year, y = total_enrollees_index, col = inst_control)) +
  geom_line() +
  theme(legend.position = 'bottom') +
  labs(y = 'Enrollment (Indexed to 2001)')
## Joining, by = c("year", "unitid")
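The index divides each year's total by the same group's 2001 total and multiplies by 100, which is what first(total_enrollees, order_by = year) sets up. Here is the same calculation as a base-R sketch on made-up totals: ave() applies the function within each inst_control group, and the rows are sorted by year first so that x[1] is the base year. Keep in mind that these are sums over whichever schools reported in a given year, so schools entering or leaving the data also move the index.

```r
# Made-up yearly enrollment totals for two control types.
toy <- data.frame(
  inst_control    = rep(c("Public", "Private not-for-profit"), each = 3),
  year            = rep(2001:2003, times = 2),
  total_enrollees = c(1000, 1100, 1250, 400, 404, 410)
)

# Sort so the first row within each group is the base year (2001).
toy <- toy[order(toy$inst_control, toy$year), ]

# Index each group's series to its own base-year value (base year = 100).
toy$total_enrollees_index <- ave(toy$total_enrollees, toy$inst_control,
                                 FUN = function(x) 100 * x / x[1])
toy
```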

As long as private universities continue to get more competitive, with more applicants per enrollee, there is little pressure on these schools to offer more affordable tuition.

admissions %>%
  filter(!is.na(number_applied), number_enrolled > 0, number_applied > 0) %>%
  inner_join(dir) %>%
  group_by(inst_control, year) %>%
  summarize(total_enrollments = sum(number_enrolled),
            total_applicants = sum(number_applied)) %>%
  mutate(total_applicants_per_enrollments = total_applicants/total_enrollments,
         first_applicants_per_enrollments = first(total_applicants_per_enrollments, order_by = year),
         total_applicants_per_enrollments_index = 100*total_applicants_per_enrollments/first_applicants_per_enrollments) %>%
  ggplot(aes(x = year, y = total_applicants_per_enrollments_index, col = inst_control)) +
  geom_line() +
  theme(legend.position = 'bottom') +
  labs(y = 'Applicants per Enrollment (Indexed to 2001)')
## Joining, by = c("year", "unitid")
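Note that this chart uses the ratio of summed applicants to summed enrollments within each control type, not the average of each school's ratio. The two can differ a lot: the pooled ratio is effectively an enrollment-weighted mean, so large schools dominate it, while a simple mean would be pulled up by tiny schools like those in the outlier table. A toy illustration with two made-up schools:

```r
# Two hypothetical schools: one large, one tiny (made-up counts).
number_applied  <- c(20000, 300)
number_enrolled <- c(4000, 10)
school_ratios   <- number_applied / number_enrolled   # 5 and 30

ratio_of_sums  <- sum(number_applied) / sum(number_enrolled)  # about 5.06
mean_of_ratios <- mean(school_ratios)                          # 17.5

# The pooled ratio equals the enrollment-weighted mean of the school ratios.
weighted <- weighted.mean(school_ratios, w = number_enrolled)
all.equal(ratio_of_sums, weighted)  # TRUE
```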

If I were a local or state policy advisor, I would recommend a drastic expansion of public universities, and even a lower barrier for out-of-state applicants, to bring in talented people who can't get past the competitive and financial barriers of private schools. That, in turn, could attract private-sector expansion in high-growth industries (see: https://www.bloomberg.com/view/articles/2018-03-06/how-universities-make-cities-great).