# Impact of Predictive Learning Analytics on Course Awarding Gap - Supporting Data


Posted on 21.04.2021, 14:12 by Martin Hlosta, Christothea Herodotou, Miriam Fernandez, Vaclav Bayer

**GENERAL INFORMATION**

The dataset is the supporting data for the research findings of the paper accepted at the AIED 2021 conference: http://oro.open.ac.uk/76042/

**SHARING/ACCESS INFORMATION**

Links to publications that cite or use the data:

*Hlosta, Martin; Herodotou, Christothea; Fernandez, Miriam and Bayer, Vaclav. Impact of Predictive Learning Analytics on Course Awarding Gap of Disadvantaged Students in STEM. In: 22nd International Conference on Artificial Intelligence in Education, AIED 2021, Springer.*

Was data derived from another source?

*Yes, the data were derived from internal Open University (OU) data.*

Recommended citation for this dataset:

*Hlosta, Martin; Herodotou, Christothea; Fernandez, Miriam and Bayer, Vaclav. Impact of Predictive Learning Analytics on Course Awarding Gap of Disadvantaged Students in STEM. In: 22nd International Conference on Artificial Intelligence in Education, AIED 2021, Springer.*

**DATA & FILE OVERVIEW**

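The dataset contains the coefficients of the logistic and linear regressions used to model three student outcomes in three STEM courses: (1) completion, (2) passing and (3) overall score. Completion and passing are dichotomous outcomes modelled with logistic regression; overall score is modelled with linear regression. The results are split into four tabs: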

1. Regression Betas

Beta coefficients (B) and standard errors (SE) of each variable, for each student outcome:

- completion: comp_B, comp_SE

- passing: pass_B, pass_SE

- overall score: score_B, score_SE
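As a sketch of how the B and SE columns can be read together: under the usual large-sample (Wald) normal approximation, each coefficient yields a z-statistic and a 95% confidence interval, and for the two logistic outcomes (completion, passing) exp(B) is an odds ratio. The function name `summarise_coefficient` is hypothetical, and the input values are placeholders, not taken from the dataset.

```python
import math
from statistics import NormalDist


def summarise_coefficient(b: float, se: float, logistic: bool, level: float = 0.95) -> dict:
    """Wald z-statistic, two-sided p-value and confidence interval for one coefficient.

    For logistic outcomes (comp_*, pass_*) the interval is also reported on the
    odds-ratio scale by exponentiating; for the linear outcome (score_*) B is
    already in score units, so no odds ratio is returned.
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    z = b / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    lo, hi = b - z_crit * se, b + z_crit * se
    out = {"z": z, "p": p, "ci": (lo, hi)}
    if logistic:
        out["odds_ratio"] = math.exp(b)
        out["or_ci"] = (math.exp(lo), math.exp(hi))
    return out


# Hypothetical values for illustration only (not from the dataset).
print(summarise_coefficient(b=0.40, se=0.10, logistic=True))
```

The normal approximation is a reasonable default for large samples; it is not necessarily how the authors computed significance.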

2. LogReg Marginal Effects

The marginal effects for the two dichotomous outcomes from the previous tab (completion and passing). More information about marginal effects: https://www.statisticshowto.com/marginal-effects/
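One common definition of the marginal effect of a binary predictor in a logistic model is the average marginal effect (AME): switch the dummy on and off for every student, keep the other covariates as observed, and average the change in predicted probability. This README does not say which variant (average effect or effect at the mean) was used for this tab, so the following is only a sketch of the AME technique; the function names, coefficients and student rows are hypothetical.

```python
import math


def logistic(x: float) -> float:
    """Inverse logit: maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def average_marginal_effect(intercept, betas, rows, dummy):
    """AME of a 0/1 dummy: mean of P(y=1 | dummy=1) - P(y=1 | dummy=0) over rows.

    betas: mapping variable name -> coefficient (log-odds scale)
    rows: list of mappings variable name -> value, one per student
    dummy: name of the binary variable whose effect is computed
    """
    total = 0.0
    for row in rows:
        # Linear predictor without the dummy's own contribution.
        eta = intercept + sum(betas[k] * v for k, v in row.items() if k != dummy)
        total += logistic(eta + betas[dummy]) - logistic(eta)
    return total / len(rows)


# Hypothetical model and students, for illustration only.
betas = {"group_INT": 0.3, "is_new": -0.5}
rows = [{"group_INT": 0, "is_new": 1}, {"group_INT": 1, "is_new": 0}]
print(round(average_marginal_effect(0.2, betas, rows, "group_INT"), 4))
```

The result is on the probability scale, which is why marginal effects are often easier to interpret than the log-odds betas.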

3. Reg_BAME

The regression coefficients, as in the first tab and for the same outcomes (completion, passing and overall score), but estimated separately for students identified as BAME and students not identified as BAME. These regressions do not contain the BAME coefficient, because within each subgroup it would be constant.

4. Reg_IMD

As for BAME (tab 3), these are the regression coefficients disaggregated by IMD quintile. IMD_Missing is a special category for students without an IMD value, i.e. international students.
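Because the Reg_BAME and IMD tabs come from regressions fitted on disjoint groups of students, one rough way to ask whether a coefficient (for example group_INT) differs between two groups is a z-test on the difference of the two estimates, treating them as independent. This is a sketch of a standard approximation, not an analysis reported with the dataset; the function name and numbers are hypothetical.

```python
import math
from statistics import NormalDist


def coefficient_difference_test(b1: float, se1: float, b2: float, se2: float) -> tuple[float, float]:
    """z-statistic and two-sided p-value for H0: b1 == b2.

    Assumes the two estimates come from independent samples (separate
    subgroup regressions), so the SE of the difference is sqrt(se1**2 + se2**2).
    """
    z = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical group_INT coefficients for two subgroups, illustration only.
z, p = coefficient_difference_test(b1=0.50, se1=0.20, b2=0.10, se2=0.15)
print(f"z = {z:.2f}, p = {p:.3f}")
```

For logistic models, comparing raw coefficients across separately fitted groups has known caveats (differences in residual variance), so comparing marginal effects may be preferable.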

**Regression coefficient variables**

The variables entering the regressions fall into three categories, plus the intercept:

(1) Student level

- age - banded into age_<21, age_[25-29], age_[30-39], age_>60, age_MISSING (reference category: age_[21-24])

- gender - gender_F (reference category: gender_M)

- an indicator of linked qualification - linked_qual (reference category: linked_qual=False)

- declared disability - disability (reference category: disability=False)

- caring responsibility - carer_NO, carer_YES (reference category: carer_MISSING)

- flag indicating whether the student is new to the OU - is_new (reference category: is_new=False)

- highest previous education - ed_NoFormal, ed_HE_Qual, ed_PostGrad (reference category: ed_A Level/Equivalent)

- average previous score - discretised into prev_score_LOW, prev_score_MOD, prev_score_VERY_HIGH (avg.prev.score=MISSING, i.e. the student did not study any previous course). The scores are banded into quartiles (LOW, MOD, HIGH, VERY_HIGH) independently for each course, i.e. the specific threshold values vary between courses, as the courses usually have different distributions of average previous score.

- number of other credits studied - banded as credits_other_[1-60], credits_other_>=61 (reference category: credits_other=0)

- number of previous attempts at the course - prev_attempt_=1, prev_attempt_>1 (reference category: prev_attempt_0)

- IMD (Index of Multiple Deprivation) - banded into quintiles, i.e. imd_<=20, imd_[21-40], imd_[61-80], imd_>=81, imd_MISSING (reference category: imd_[41-60])

- whether the student is identified as BAME - BAME_YES (reference category: BAME_NO)

- membership in the intervention group - group_INT (reference category: group_INT=0)
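The per-course quartile banding of average previous score described above can be sketched as follows. This assumes the cut points are the within-course quartiles returned by Python's `statistics.quantiles` (default "exclusive" method), with ties at a cut point going to the lower band; the thresholding method actually used is not documented here, and the function name and scores are hypothetical.

```python
from statistics import quantiles

LABELS = ["LOW", "MOD", "HIGH", "VERY_HIGH"]


def band_previous_scores(scores_by_course):
    """Map each course's list of average previous scores to quartile labels.

    Thresholds are computed within each course, so the same score can fall in
    different bands on different courses. None means no previous course and
    maps to 'MISSING'.
    """
    banded = {}
    for course, scores in scores_by_course.items():
        observed = [s for s in scores if s is not None]
        q1, q2, q3 = quantiles(observed, n=4)  # within-course quartile cut points
        labels = []
        for s in scores:
            if s is None:
                labels.append("MISSING")
            else:
                # Number of cut points the score strictly exceeds (0..3).
                idx = sum(s > q for q in (q1, q2, q3))
                labels.append(LABELS[idx])
        banded[course] = labels
    return banded


# Hypothetical scores for two courses.
example = {
    "course_1": [40, 55, 60, 70, 85, None],
    "course_2": [70, 75, 80, 90, 95],
}
print(band_previous_scores(example))
```

In this toy example a score of 70 lands in HIGH on course_1 but in LOW on course_2, which is the effect of banding independently for each course.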

(2) Teacher level

- number of students the teacher is responsible for - stud_in_group

- average pass rate of the teacher's students in the previous years they taught - tut_pr_pass_LOW, tut_pr_pass_HIGH, tut_pr_pass_VERY_HIGH, tut_pr_pass_MISSING (reference category: tut_pr_pass_MOD). The pass rates are banded into quartiles (LOW, MOD, HIGH, VERY_HIGH) independently for each course, i.e. the specific threshold values vary between courses, as the courses usually have different pass rates.
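Each banded variable enters the regressions as a set of 0/1 dummies with one level omitted as the reference category, so each coefficient is read relative to that level (for example, tut_pr_pass_LOW relative to tut_pr_pass_MOD). A minimal standard-library sketch of this reference-category (treatment) encoding follows; the helper name `dummy_encode` is hypothetical, and the authors' own preprocessing code is not part of this record.

```python
def dummy_encode(values, prefix, levels, reference):
    """Treatment-code a categorical variable, dropping the reference level.

    Returns one dict per observation with a 0/1 column for every level except
    `reference`; an observation at the reference level is all zeros.
    """
    if reference not in levels:
        raise ValueError(f"reference {reference!r} not in levels")
    kept = [lvl for lvl in levels if lvl != reference]
    rows = []
    for v in values:
        if v not in levels:
            raise ValueError(f"unknown level {v!r}")
        rows.append({f"{prefix}_{lvl}": int(v == lvl) for lvl in kept})
    return rows


levels = ["LOW", "MOD", "HIGH", "VERY_HIGH", "MISSING"]
encoded = dummy_encode(["MOD", "LOW", "MISSING"], "tut_pr_pass", levels, reference="MOD")
for row in encoded:
    print(row)
```

Dropping one level avoids perfect collinearity with the intercept; the same scheme applies to the course dummies (course_1, course_2 with course_3 as reference).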

(3) Course level

- course, dummy-encoded as course_1, course_2 (reference category: course_3)

(4) Intercept
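Together, the intercept and the coefficients define each model's linear predictor. The intercept is the predicted value when every dummy is 0 (all variables at their reference categories) and stud_in_group is 0; each active dummy adds its B. For the logistic outcomes the predictor is on the log-odds scale and the inverse logit turns it into a probability; for overall score it is the predicted score directly. A sketch with a hypothetical function name and coefficient values (the real ones are in the Regression Betas tab):

```python
import math


def predict(intercept, betas, student, logistic):
    """Linear predictor intercept + sum(B * x); inverse logit for logistic models.

    betas: variable name -> coefficient; student: variable name -> value.
    Variables absent from `student` are treated as 0 (reference category).
    """
    eta = intercept + sum(b * student.get(name, 0) for name, b in betas.items())
    return 1.0 / (1.0 + math.exp(-eta)) if logistic else eta


# Hypothetical coefficients, for illustration only.
pass_betas = {"group_INT": 0.3, "prev_score_LOW": -0.8, "stud_in_group": -0.01}
student = {"group_INT": 1, "prev_score_LOW": 1, "stud_in_group": 20}
print(round(predict(0.5, pass_betas, student, logistic=True), 3))
```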