The Urban Institute released a public API that pulls and pre-processes data from various sources of education institution data, including but not limited to the Department of Education. We used their R package to explore the relationship between applicant and enrollment volume.
API Documentation: https://ed-data-portal.urban.org/documentation/
To install the R package, you must have the devtools package installed.
if(!('educationdata' %in% installed.packages()[,"Package"])) devtools::install_github('UrbanInstitute/education-data-package-r')
Analysis of Applicants, Enrollments in Universities
Among the data accessible through the API, university admission data stands out to me as a special dataset at this moment in time. With recent reporting of racial discrimination in university application acceptance (see: https://www.washingtonpost.com/local/education/internal-harvard-study-suggested-asian-americans-would-benefit-from-academics-only-admissions/2018/06/15/7a07202e-7021-11e8-bf86-a2351b5ece99_story.html?utm_term=.6f10c8bfdbb8), new education-focused researchers might find this data useful as a starting point in evaluating a university’s propensity for admitting students based on their gender or race.
I do not explore this topic; to do so would require much more granular data than this. I am, however, very interested in the relationship between the volume of applicants and enrollments at universities. I understand that universities are starting to drive application volume, and graduating high schoolers are obliging by sending out more applications per person than ever before (see: https://www.usnews.com/education/blogs/college-admissions-playbook/2015/09/09/what-rising-college-application-volume-means-for-the-class-of-2020). I look into this trend to find out which universities are the ‘most competitive’ in terms of applicants per enrollment, and where competitiveness is rising or falling.
suppressMessages({
library(educationdata)
library(dplyr)
library(tidyr)
library(ggplot2)
library(scales)
library(plotly)
library(knitr)
library(kableExtra)
})
## Warning: package 'ggplot2' was built under R version 3.5.1
## Warning: package 'plotly' was built under R version 3.5.1
## Warning: package 'kableExtra' was built under R version 3.5.1
We call the API through the R package. We’re looking for admissions-enrollment data, as well as the university directory to retrieve complementary data.
admissions <- suppressMessages(educationdata::get_education_data(level = "college-university",
source = 'ipeds',
topic = 'admissions-enrollment',
filters = list(year = 2001:2015),
add_labels = TRUE))
dir <- suppressMessages(educationdata::get_education_data(level = "college-university",
source = 'ipeds',
topic = 'directory',
filters = list(year = 2001:2015),
add_labels = TRUE))
print("Import Complete")
## [1] "Import Complete"
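The package does not allow us to specify in the filters that we’re not interested in breakdowns by sex or race, so after the download we keep only the 'Total' rows for sex and for full-time/part-time status (ftpt) and then drop those two columns (also - ‘sex’ is a bit outdated and should be considered for revision by the source, the National Center for Education Statistics Integrated Postsecondary Education Data System).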
admissions <- admissions[admissions$sex == 'Total' & admissions$ftpt == 'Total',]
admissions <- admissions %>% select(-sex, -ftpt)
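One caveat with bracket subsetting is worth a quick illustration: when the logical condition evaluates to NA, base R returns a row filled with NA instead of dropping it, whereas dplyr's filter() drops such rows. The sketch below uses a made-up data frame (the `toy` values are hypothetical, not API output) to show the difference; whether the real download contains missing values in sex or ftpt is not something I have checked.

```r
library(dplyr)

# Hypothetical rows standing in for the API output; all values are made up.
toy <- data.frame(
  unitid = c(1, 1, 2, 3),
  sex    = c("Total", "Male", NA, "Total"),
  ftpt   = c("Total", "Total", "Total", "Total"),
  number_applied = c(100, 40, 70, 55),
  stringsAsFactors = FALSE
)

# Base R: the NA comparison produces an all-NA row in the result.
base_subset <- toy[toy$sex == "Total" & toy$ftpt == "Total", ]

# dplyr: rows where the condition is NA are dropped.
dplyr_subset <- toy %>%
  filter(sex == "Total", ftpt == "Total") %>%
  select(-sex, -ftpt)

nrow(base_subset)
nrow(dplyr_subset)
```

If the two row counts differ on the real data, the filter() version is probably the one you want.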
We plot the number of applicants against the number of enrollees at each school, animated over time. I like to use plot_ly for these kinds of charts, as its animation lets you pause on each frame to take a longer look at the chart; in addition, with so many data points, it’s nice to be able to hover and see the names of the schools. We see a general and sensible correlation between the number of applicants and enrollments; generally, private schools tend to have a higher ratio of applicants to enrollees than public schools.
admissions %>%
filter(!is.na(number_applied), number_enrolled > 0, number_applied > 0) %>%
inner_join(dir) %>%
plot_ly(
x = ~number_enrolled,
y = ~number_applied,
color = ~inst_control,
frame = ~year,
text = ~inst_name,
hoverinfo = "text",
type = 'scatter',
mode = 'markers',
opacity = 0.8
)
## Joining, by = c("year", "unitid")
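The message above shows dplyr inferring the join keys from the shared column names. A sketch of the same kind of join with the keys spelled out, on two small made-up frames (`adm_toy` and `dir_toy` are hypothetical stand-ins for the real tables), might look like this:

```r
library(dplyr)

# Hypothetical stand-ins for the admissions and directory tables.
adm_toy <- data.frame(
  year = c(2001, 2002, 2001),
  unitid = c(10, 10, 20),
  number_applied = c(500, 550, 300),
  number_enrolled = c(100, 110, 90)
)
dir_toy <- data.frame(
  year = c(2001, 2002),
  unitid = c(10, 10),
  inst_name = c("Example College", "Example College"),
  inst_control = c("Public", "Public"),
  stringsAsFactors = FALSE
)

# Naming the keys keeps the join stable if the tables later share
# another column; selecting first keeps only the columns we need.
joined <- adm_toy %>%
  inner_join(dir_toy %>% select(unitid, year, inst_name, inst_control),
             by = c("year", "unitid"))

joined
```

Because this is an inner join, the school with no directory entry (unitid 20 in the toy data) drops out of the result, which is worth keeping in mind when reading the charts.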
We look at the distribution of the ratios by year to check what the typical (median) ratio is and how many outliers there are.
admissions %>%
filter(!is.na(number_applied), number_enrolled > 0, number_applied > 0) %>%
inner_join(dir) %>%
mutate(applied_enrolled_ratio = number_applied/number_enrolled) %>%
ggplot(aes(x = factor(year), y = applied_enrolled_ratio)) +
geom_boxplot() +
theme(legend.position = 'bottom')
## Joining, by = c("year", "unitid")
Looking at the ten most extreme outliers, we see these are typically small and/or technical schools, where a handful of enrollees is enough to make the ratio very large.
admissions %>%
filter(!is.na(number_applied), number_enrolled > 0, number_applied > 0) %>%
inner_join(dir %>% select(unitid, year, inst_name, inst_control)) %>%
mutate(applied_enrolled_ratio = number_applied/number_enrolled) %>%
arrange(desc(applied_enrolled_ratio)) %>%
head(n = 10) %>%
kable('html') %>%
kable_styling()
## Joining, by = c("year", "unitid")
year | unitid | number_applied | number_admitted | number_enrolled | inst_name | inst_control | applied_enrolled_ratio |
---|---|---|---|---|---|---|---|
2004 | 188669 | 1289 | 422 | 9 | THE AILEY SCHOOL | Private not-for-profit | 143.22222 |
2012 | 477039 | 359 | 124 | 3 | West Coast University-Dallas | Private for-profit | 119.66667 |
2012 | 233499 | 1631 | 1631 | 14 | Saint Pauls College | Private not-for-profit | 116.50000 |
2005 | 413839 | 270 | 134 | 3 | ITT Technical Institute | Private for-profit | 90.00000 |
2015 | 484844 | 9032 | 190 | 101 | Minerva Schools at Keck Graduate Institute | Private not-for-profit | 89.42574 |
2003 | 404338 | 2100 | 1461 | 25 | SCHILLER INTERNATIONAL UNIVERSITY | Private for-profit | 84.00000 |
2005 | 437051 | 361 | 193 | 5 | ITT Technical Institute | Private for-profit | 72.20000 |
2001 | 210304 | 4174 | 346 | 58 | WARNER PACIFIC COLLEGE | Private not-for-profit | 71.96552 |
2010 | 414878 | 71 | 54 | 1 | Trine University-Fort Wayne Regional Campus | Private not-for-profit | 71.00000 |
2008 | 449898 | 61 | 1 | 1 | South University-Tampa | Private for-profit | 61.00000 |
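Since most of these ratios come from schools with only a few enrollees, one possible way to tame the outliers is to require a minimum number of enrollees before computing the ratio. The sketch below reuses four rows from the table above; the cutoff `min_enrolled` is a hypothetical value picked for illustration, not something used in the analysis.

```r
library(dplyr)

# Four rows copied from the table above.
top_rows <- data.frame(
  inst_name = c("THE AILEY SCHOOL", "West Coast University-Dallas",
                "Minerva Schools at Keck Graduate Institute",
                "WARNER PACIFIC COLLEGE"),
  number_applied  = c(1289, 359, 9032, 4174),
  number_enrolled = c(9, 3, 101, 58),
  stringsAsFactors = FALSE
)

min_enrolled <- 50  # hypothetical cutoff, chosen only for this example

# Compute the ratio, then keep schools with enough enrollees
# for the ratio to be reasonably stable.
stable <- top_rows %>%
  mutate(applied_enrolled_ratio = number_applied / number_enrolled) %>%
  filter(number_enrolled >= min_enrolled) %>%
  arrange(desc(applied_enrolled_ratio))

stable
```

With this cutoff only the two larger schools remain, and their ratios match the table.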
I wondered if larger schools have an outsized ratio of applicants compared to their enrollments, as the popularity of the brand name may come to outstrip its actual size. This is not true, however; smaller schools - and particularly private schools - have higher ratios of applicants to enrollments. This may be due to the broader brand recognition of private universities, which compete for students all over the country, whereas public schools tend to attract more in-state applicants due to in-state tuition.
admissions %>%
filter(!is.na(number_applied), number_enrolled > 0, number_applied > 0) %>%
inner_join(dir) %>%
mutate(applied_enrolled_ratio = number_applied/number_enrolled) %>%
plot_ly(
x = ~number_enrolled,
y = ~applied_enrolled_ratio,
color = ~inst_control,
frame = ~year,
text = ~inst_name,
hoverinfo = "text",
type = 'scatter',
mode = 'markers',
opacity = 0.8
)
## Joining, by = c("year", "unitid")
Given how steeply tuition costs have been rising, it’s important to understand whether we see a supply-side bump in accepting students. Theoretically, given the high marginal revenue of each student, we should see a large expansion of enrollments. The increase, however, has been modest at private institutions and higher at public institutions. Perhaps public universities are able to capture more students who are seeking lower tuition, while private universities are content to flex their brand for the highest-paying students.
admissions %>%
filter(!is.na(number_applied), number_enrolled > 0, number_applied > 0) %>%
inner_join(dir) %>%
group_by(inst_control, year) %>%
summarize(total_enrollees = sum(number_enrolled)) %>%
mutate(first_total_enrollees = first(total_enrollees, order_by = year),
total_enrollees_index = 100*total_enrollees/first_total_enrollees) %>%
ggplot(aes(x = year, y = total_enrollees_index, col = inst_control)) +
geom_line() +
theme(legend.position = 'bottom') +
labs(y = 'Enrollment (Indexed to 2001)')
## Joining, by = c("year", "unitid")
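The indexing step can be checked on a small example. The sketch below uses made-up totals (the `toy` frame is hypothetical) with the rows deliberately out of order, to show that first(..., order_by = year) takes each group's earliest year as the base regardless of row order.

```r
library(dplyr)

# Hypothetical yearly totals; the Public rows are intentionally unsorted.
toy <- data.frame(
  inst_control = c("Public", "Public", "Public",
                   "Private", "Private", "Private"),
  year = c(2003, 2001, 2002, 2001, 2002, 2003),
  total_enrollees = c(1200, 1000, 1100, 500, 510, 525),
  stringsAsFactors = FALSE
)

# Within each control type, divide by the value from the earliest year
# so that every series starts at 100.
indexed <- toy %>%
  group_by(inst_control) %>%
  mutate(first_total_enrollees = first(total_enrollees, order_by = year),
         total_enrollees_index = 100 * total_enrollees / first_total_enrollees) %>%
  ungroup() %>%
  arrange(inst_control, year)

indexed
```

An index of 110 then reads as 10% growth over the base year, which makes series of very different sizes comparable on one axis.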
As long as private universities continue to get more competitive, there’s no chance of more affordable tuition rates coming from these schools.
admissions %>%
filter(!is.na(number_applied), number_enrolled > 0, number_applied > 0) %>%
inner_join(dir) %>%
group_by(inst_control, year) %>%
summarize(total_enrollments = sum(number_enrolled),
total_applicants = sum(number_applied)) %>%
mutate(total_applicants_per_enrollments = total_applicants/total_enrollments,
first_applicants_per_enrollments = first(total_applicants_per_enrollments, order_by = year),
total_applicants_per_enrollments_index = 100*total_applicants_per_enrollments/first_applicants_per_enrollments) %>%
ggplot(aes(x = year, y = total_applicants_per_enrollments_index, col = inst_control)) +
geom_line() +
theme(legend.position = 'bottom') +
labs(y = 'Applicants per Enrollment (Indexed to 2001)')
## Joining, by = c("year", "unitid")
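Note that the code above divides total applicants by total enrollments within each group, which is not the same as averaging each school's own ratio. A small sketch with two made-up schools (the `toy` values are hypothetical) shows how far apart the two measures can be when a tiny school has an extreme ratio:

```r
library(dplyr)

# Hypothetical schools: one large, one tiny with an extreme ratio.
toy <- data.frame(
  inst_name = c("Large school", "Tiny school"),
  number_applied = c(20000, 300),
  number_enrolled = c(5000, 3),
  stringsAsFactors = FALSE
)

comparison <- toy %>%
  summarize(
    # Weighted by size: large schools dominate.
    ratio_of_totals = sum(number_applied) / sum(number_enrolled),
    # Unweighted: every school counts equally, so outliers dominate.
    mean_of_ratios  = mean(number_applied / number_enrolled)
  )

comparison
```

The ratio of totals is the more robust choice for a sector-level trend, since it is barely moved by the small-denominator outliers seen earlier.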
If I were a local or state policy advisor, I would recommend a drastic expansion of public universities - and even lowering the barrier for out-of-state applicants - to bring in talented people who can’t get past the competitive and financial barriers of private schools, thus attracting private sector expansion in high-growth industries (see: https://www.bloomberg.com/view/articles/2018-03-06/how-universities-make-cities-great).