# Update of the Matching Model: Transforming Data

We approach this project in three steps: (1) transforming the data, (2) building the model, and (3) running the solution. The first step is to transform the data we received from OAA. OAA provided two initial resources:

1. An Excel file containing a matrix with students as rows and majors plus concerns as columns. There are 1666 students in total: 51 are pre-matched to specific advisors and do not go through the algorithm, and the remaining 1615 are “true” students who will be matched with advisors by the matching model. The columns cover 36 majors and 4 concerns: whether students identify themselves as transfer students, neurodiverse, student-athletes, or first-generation. Each student chooses at least 2 and at most 5 of the 36 majors, and the chosen majors are ranked from “1st major” to “5th major”. The result is a 1615 x 40 matrix.
2. An Excel file containing detailed information about advisors. Each of the 343 advisors lists (a) majors of interest (at least 1 and at most 5), (b) demand, i.e., the maximum number of students they will accept, and (c) their attitude toward each of the four concerns, i.e., their willingness to receive transfer, neurodiverse, student-athlete, and first-generation students. This file is therefore also a matrix, with advisors as rows and academic interests as columns: 343 advisors, 36 majors, and 4 concerns. The result is a 343 x 40 matrix.

We call the student matrix the S matrix and the advisor matrix the A matrix. Since our ultimate goal is to maximize the satisfaction level of the matching between students and advisors, we must first define “the satisfaction level”. We use pij to represent the satisfaction level when student i is matched to advisor j. Specifically, pij takes into account the majors and concerns that the student and the advisor have in common, so that the algorithm takes a global perspective. The idea is to first assign values separately in the S matrix (each student's preference for each major, plus their concerns) and in the A matrix (each advisor's preference for each major, plus their attitude toward each concern), and then use matrix multiplication to get the final pij between students and advisors. (The S matrix and the transpose of the A matrix can be multiplied because they share the same 40 major and concern columns.)
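Written out, the matrix multiplication described above amounts to the following. The entry symbols $S_{ik}$ and $A_{jk}$ are notation assumed here for illustration (the value in column $k$ of student $i$'s row and of advisor $j$'s row); they do not appear in the original files.

$$
p_{ij} \;=\; \sum_{k=1}^{40} S_{ik}\,A_{jk},
\qquad
P \;=\; S\,A^{\mathsf{T}},
\qquad
S \in \mathbb{R}^{1615 \times 40},\;
A \in \mathbb{R}^{343 \times 40},\;
P \in \mathbb{R}^{1615 \times 343}.
$$

Columns $k = 1, \dots, 36$ correspond to majors and $k = 37, \dots, 40$ to the four concerns, assuming the majors come first in the column order.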

First, in the S matrix, each student chooses at least 2 and at most 5 majors, and the corresponding values are: 1st major = 2.5, 2nd major = 2.0, 3rd major = 1.5, 4th major = 1.0, 5th major = 0.5. Majors the student did not choose get 0. Then, for the questions on whether they identify themselves as transfer students, neurodiverse, student-athletes, or first-generation, the value is 1 if they answer “Yes” or “Unsure” and 0 if they answer “No”.
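The student-side encoding can be sketched as a small function. This is a minimal illustration, not the project's actual code: the function name `encode_student`, the five-major catalogue, and the order of the concern answers are all assumptions made for the example.

```python
# Weights by major rank (1st to 5th), as listed in the text.
RANK_WEIGHTS = [2.5, 2.0, 1.5, 1.0, 0.5]


def encode_student(ranked_majors, concern_answers, all_majors):
    """Return one row of the S matrix: major weights, then concern flags.

    Hypothetical helper for illustration only.
    """
    if not 2 <= len(ranked_majors) <= 5:
        raise ValueError("a student ranks between 2 and 5 majors")
    row = [0.0] * len(all_majors)
    for rank, major in enumerate(ranked_majors):
        row[all_majors.index(major)] = RANK_WEIGHTS[rank]
    # "Yes" and "Unsure" both count as 1; "No" counts as 0.
    row += [1.0 if answer in ("Yes", "Unsure") else 0.0
            for answer in concern_answers]
    return row


# Assumed toy catalogue of five majors (the real data has 36).
MAJORS = ["Art", "Biology", "Computer Science", "History", "Math"]

student_row = encode_student(
    ["Math", "Art", "Biology", "History"],  # ranked 1st to 4th
    ["Yes", "No", "No", "No"],              # assumed concern answers
    MAJORS,
)
print(student_row)
# [2.0, 1.5, 0.0, 1.0, 2.5, 1.0, 0.0, 0.0, 0.0]
```

Unchosen majors stay at 0, so they contribute nothing when the row is later multiplied against an advisor's row.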

Second, in the A matrix, an advisor gets a value of 2 for their home department; for instance, a professor in the math department has a value of 2 for the math major. Each other major the advisor chooses gets a value of 1. For the 4 concerns, advisors can respond “1”, “2”, or “3”. Response “1” means the advisor does not want to be matched with students who have that concern, response “2” means a neutral attitude, and response “3” means the advisor is willing to be matched with students who have that concern. Accordingly, response “1” gets a value of -100 (a large negative value to prevent matching), response “2” gets 0.25, and response “3” gets 1.
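The advisor-side encoding can be sketched the same way. Again this is only an illustration under stated assumptions: `encode_advisor`, the five-major catalogue, and the sample responses are hypothetical, and the home department is passed separately from the other chosen majors.

```python
# Values for advisor responses "1", "2", "3" to each concern, as in the text.
CONCERN_SCORES = {1: -100.0, 2: 0.25, 3: 1.0}


def encode_advisor(home_major, other_majors, concern_responses, all_majors):
    """Return one row of the A matrix: major values, then concern values.

    Hypothetical helper for illustration only.
    """
    row = [0.0] * len(all_majors)
    for major in other_majors:
        row[all_majors.index(major)] = 1.0
    # The home department outranks any other chosen major.
    row[all_majors.index(home_major)] = 2.0
    row += [CONCERN_SCORES[response] for response in concern_responses]
    return row


# Assumed toy catalogue of five majors (the real data has 36).
MAJORS = ["Art", "Biology", "Computer Science", "History", "Math"]

advisor_row = encode_advisor(
    "Math",                           # assumed home department
    ["Computer Science", "Biology"],  # other majors of interest
    [2, 2, 2, 3],                     # assumed responses to the 4 concerns
    MAJORS,
)
print(advisor_row)
# [0.0, 1.0, 1.0, 0.0, 2.0, 0.25, 0.25, 0.25, 1.0]
```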

For example, consider a student who chooses Math, Art, Biology, and History, and an advisor who chooses Math, Computer Science, and Biology. (Figures: the Excel rows and the corresponding S matrix and A matrix rows for this student and advisor.)

The pij value, i.e., the satisfaction level, between student 1 and advisor 1 is therefore (2×0) + (1.5×1) + (0×1) + (1×0) + (2.5×2) + (1×0.25) + (0×0.25) + (0×0.25) + (0×1) = 6.75, where the first factor in each product is the student's value and the second is the advisor's.
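The arithmetic in this example is a dot product of the student's row and the advisor's row. The two vectors below are read off the factors in the sum above (student values first, advisor values second); the column order Art, Biology, Computer Science, History, Math followed by the four concerns is inferred from that sum, not taken from the original files.

```python
# Student 1: Math 1st (2.5), Art 2nd (2.0), Biology 3rd (1.5), History 4th (1.0),
# and a "Yes"/"Unsure" answer to the first concern only.
student = [2.0, 1.5, 0.0, 1.0, 2.5, 1.0, 0.0, 0.0, 0.0]

# Advisor 1: Math valued at 2, Computer Science and Biology at 1,
# response "2" (0.25) to the first three concerns and "3" (1) to the last.
advisor = [0.0, 1.0, 1.0, 0.0, 2.0, 0.25, 0.25, 0.25, 1.0]

# pij is the sum of element-wise products over all columns.
p = sum(s * a for s, a in zip(student, advisor))
print(p)  # 6.75
```

Only columns where both sides are nonzero contribute: Biology (1.5), Math (5.0), and the first concern (0.25).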

Therefore, by converting the information into these values, we obtain the S matrix and the A matrix. We then take the transpose of the A matrix (the T matrix), a 40 x 343 matrix, and multiply the S matrix by the T matrix. The final P matrix, with students as rows and advisors as columns, is a 1615 x 343 matrix. Each cell of the P matrix contains the pij value between one student and one advisor. We can thus easily evaluate the satisfaction level between each advisor and student and then select the best match. The highest possible pij value is 14 and the lowest possible pij value is -400.
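The multiplication step can be sketched in plain Python on toy data. The helper name `satisfaction_matrix` and the six-major layout are assumptions for illustration; with NumPy arrays the same product would be `S @ A.T`. The two advisor rows are chosen to reproduce the stated bounds: 14 comes from five shared majors with the student's first choice in the advisor's home department (2.5×2 + 2.0 + 1.5 + 1.0 + 0.5 = 10) plus four concerns met with response “3” (4), and -400 comes from no shared majors and response “1” to four concerns the student has.

```python
def satisfaction_matrix(S, A):
    """Return P where P[i][j] is the dot product of S[i] and A[j].

    This is S multiplied by the transpose of A. Hypothetical helper.
    """
    return [[sum(s * a for s, a in zip(s_row, a_row)) for a_row in A]
            for s_row in S]


# Toy data: 6 majors + 4 concerns (the real matrices have 36 + 4 columns).
# One student ranks five majors and answers "Yes" to all four concerns.
S = [[2.5, 2.0, 1.5, 1.0, 0.5, 0.0, 1.0, 1.0, 1.0, 1.0]]
A = [
    # Shares all five majors, home department is the student's 1st major,
    # response "3" to every concern.
    [2.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0],
    # Shares no major, response "1" to every concern.
    [0.0, 0.0, 0.0, 0.0, 0.0, 2.0, -100.0, -100.0, -100.0, -100.0],
]

P = satisfaction_matrix(S, A)  # 1 student x 2 advisors
print(P)  # [[14.0, -400.0]]
```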

After transforming the data, we can move on to the next step: building the model.