Example code - Comprehension
Files used in example
Assigning students to TA's, Take 2
After having used the original program a few times, see Example code - Data structures, I got dissatisfied with it - the randomness did not ensure even distribution of students to the various TA's. The TA's got the right number of students but a TA could get the same student several times in a row.
To address this issue, I rewrote the program (in better accord with the teaching), keeping some of the original ideas and data structures. A main idea is to have a database of which TA's have graded which students. By 'remembering' the previous distributions, you can spread the students more evenly on the TA's. This is simply done by creating an extra file (tab separated) with content like:
TA 1 Student A 3 TA 1 Student B 2 TA 1 Student C 2 TA 1 Student D 3 TA 1 Student E 2 TA 2 Student A 3 TA 2 Student B 3 TA 2 Student C 2 TA 2 Student D 2 TA 2 Student E 2 etc.
The number is the times a TA has graded a student.
Explanation of the idea:
First the database is loaded from file into a dict of dicts data structure (called base). If no database exists, the data strucure is just initialised with 0. The student list and TA list is stil shuffled for the same reasons as last time.
# Pseudo code Iterating through the list of students: First find a list of available TA's - those who's quota is not filled yet. Of those TA's find the one(s) who have the fewest students assigned - to spread the load evenly. Of those TA's find the one(s) who have had that student the least adjusted according the average. Of those TA's assign the student to a random TA. Update and save the database with the new assignments.
"who have had that student the least adjusted according the average" requires some more explanation. Since TA's have different quotas, then a TA with a small quota will always have had a student fewer times than one with a large quota. In order to not automatically select a "small" TA, then a TA's median for all students is subtracted from the number of times this student has been assigned to the TA. The result is a number, which the lower it is, the more under-assigned the student has been to that TA.
#!/usr/bin/python3 import sys import random def usage(message=''): '''A usage message to be displayed when the user gives faulty input''' if message != '': print(message) print("Usage: assign_TA.py [<file TA list> <file student list> <database file>]") print("The list of TAs may contain a maximum of students to assign to the particular TA.") print("It must come after the TA's name whitespace separated.") print("Both TA and student lists can have a # as first char - ignoring the person.") print("The standard filenames are TA_list.txt, student_list.txt and database.txt") print("The database is for keeping track of assignments, so they are shared equally.") print("The database is reset by deleting it. Do it once every season.") sys.exit(1) #display help if requested or run without arguments if len(sys.argv) == 1: TA_file = 'TA_list.txt' student_file = 'student_list.txt' database_file = 'database.txt' elif sys.argv[1] in ('-h', '--help', '-?'): usage() elif len(sys.argv) == 4: TA_file = sys.argv[1] student_file = sys.argv[2] database_file = sys.argv[3] else: usage() # Read TA list TA_dict = dict() TA_index = list() infile = open(TA_file, 'r') for line in infile: tmp = line.split() if len(tmp) == 0 or tmp[0].startswith('#'): continue #if there is a second column this is the max value of students for that TA maximum = 20 try: maximum = int(tmp[-1]) tmp.pop() except: pass if len(tmp) == 0: usage('Number wrong place on line: ' + line) name = ' '.join(tmp) TA_dict[name] = {'max': maximum, 'assigned': []} TA_index.append(name) infile.close() # Read list of students infile = open(student_file, 'r') st_list = list() for line in infile: line = line.strip() if len(line) > 0 and not line.startswith('#'): st_list.append(line) infile.close() # Read database base = dict() try: infile = open(database_file, 'r') for line in infile: tmp = line.split('\t') if tmp[0] not in base: base[tmp[0]] = dict() base[tmp[0]][tmp[1]] = int(tmp[2]) infile.close() except IOError: pass for TA in TA_index: if TA not in base: base[TA] = dict() for student in st_list: if student not in base[TA]: base[TA][student] = 0 # randomize list random.shuffle(TA_index) random.shuffle(st_list) # find TA median typetal = dict() for TA in base: mylist = sorted([ base[TA][x] for x in base[TA] ]) typetal[TA] = mylist[int(len(mylist)/2)] # make assignment for student in st_list: # How many times have a TA had that student if not filled freq = { x:base[x][student] for x in TA_dict.keys() if TA_dict[x]['max'] > len(TA_dict[x]['assigned']) } # What is the fewest assigned students per available TA minassigned = min([ len(TA_dict[x]['assigned']) for x in freq.keys() ]) # Who has fewest assigned freq = { x: freq[x] for x in freq.keys() if len(TA_dict[x]['assigned']) == minassigned } # For each available TA, have they had that student more than average ? freq = { x: (base[x][student] - typetal[x]) for x in freq.keys() } # Sort according to how frequent the available TA's have had the student adjusted for typetal freqsort = sorted(freq.keys(), key=freq.get) # Remove all but the ones who have had the student the fewest times freqsort = [ x for x in freqsort if freq[x] == freq[freqsort[0]] ] # Pick one at random TA = random.choice(freqsort) # Assign TA_dict[TA]['assigned'].append(student) # update database for TA in TA_dict: for student in TA_dict[TA]['assigned']: base[TA][student] += 1 # Save database try: outfile = open(database_file, 'w') for TA in list(base): for student in list(base[TA]): outfile.write(TA + '\t' + student + '\t' + str(base[TA][student]) + '\n') outfile.close() except IOError: print("Can not save database") # Show user for TA in TA_index: print(TA, ': ', sep='', end='') print('; '.join(TA_dict[TA]['assigned']) + "\n")
Assigning students to TA's, Take 3
Having done the above program, I was quite happy. I tested it with a lot of runs to see how it worked. Students were spread fairly evenly across the TA's, I did notice that some TA's have had one student 4 time and another 6 times. I went home, but this was nagging me. It was OK, but not completely even. Thinking about it enlightened me: The reason behind the slight unevenness is the nature of randomness and different quotas. Sometimes a student gets assigned to a "big" TA where it really should have been a "small" TA, simply due to the order the students appear in the student list (even if random).
Time for a new approach - let the TA select which student to take instead of assigning a student to a TA.
Notice that large parts of the program is the same, It is simply the selection method that is changed.
# Pseudo code As long as the student list is not empty: First find a list of available TA's - those who's quota is not filled yet. Of those TA's find the one(s) who have the fewest students assigned - to spread the load evenly. Construct all combinations of TA-student. Remove the combinations where TA has had a student more than the minimum possible. From the leftover combinations pick a random one. Any combination is a TA which has been under-assigned that student.
#!/usr/bin/python3 import sys import random def usage(message=''): '''A usage message to be displayed when the user gives faulty input''' if message != '': print(message) print("Usage: assign_TA.py [<file TA list> <file student list> <database file>]") print("The list of TAs may contain a maximum of students to assign to the particular TA.") print("It must come after the TA's name whitespace separated.") print("Both TA and student lists can have a # as first char - ignoring the person.") print("The standard filenames are TA_list.txt, student_list.txt and database.txt") print("The database is for keeping track of assignments, so they are shared equally.") print("The database is reset by deleting it. Do it once every season.") sys.exit(1) #display help if requested or run without arguments if len(sys.argv) == 1: TA_file = 'TA_list.txt' student_file = 'student_list.txt' database_file = 'database.txt' elif sys.argv[1] in ('-h', '--help', '-?'): usage() elif len(sys.argv) == 4: TA_file = sys.argv[1] student_file = sys.argv[2] database_file = sys.argv[3] else: usage() # Read TA list TA_dict = dict() TA_index = list() infile = open(TA_file, 'r') for line in infile: tmp = line.split() if len(tmp) == 0 or tmp[0].startswith('#'): continue #if there is a second column this is the max value of students for that TA maximum = 20 try: maximum = int(tmp[-1]) tmp.pop() except: pass if len(tmp) == 0: usage('Number wrong place on line: ' + line) name = ' '.join(tmp) TA_dict[name] = {'max': maximum, 'assigned': []} TA_index.append(name) infile.close() # Read list of students infile = open(student_file, 'r') st_list = list() for line in infile: line = line.strip() if len(line) > 0 and not line.startswith('#'): st_list.append(line) infile.close() # Read database base = dict() try: infile = open(database_file, 'r') for line in infile: tmp = line.split('\t') if tmp[0] not in base: base[tmp[0]] = dict() base[tmp[0]][tmp[1]] = int(tmp[2]) infile.close() except IOError: pass for TA in TA_index: if TA not in base: base[TA] = dict() for student in st_list: if student not in base[TA]: base[TA][student] = 0 # make assignment; goal is to spread out TA's evenly on students while len(st_list) > 0: # Find TA's who can still take on students TA_avail = [ TA for TA in TA_index if TA_dict[TA]['max'] > len(TA_dict[TA]['assigned']) ] # Making sure the burden is evenly spread, by removing those with more than minimum assigned minimum = min([ len(TA_dict[TA]['assigned']) for TA in TA_avail ]) TA_avail = [ TA for TA in TA_avail if len(TA_dict[TA]['assigned']) == minimum ] # Make all TA-student combinations combi = { (TA, stud): base[TA][stud] for TA in TA_avail for stud in st_list } # Find the the combinations with fewest assigned minimum = min(list(combi.values())) # Remove combinations greater than the fewest assigned, notice transition to list combi = [ k for k in combi if combi[k] == minimum ] # Pick random combinations until set is empty while len(combi) > 0: # Pick and assign (TA, student) = random.choice(combi) TA_dict[TA]['assigned'].append(student) st_list.remove(student) # Remove combinations with TA and student combi = [ k for k in combi if k[0] != TA and k[1] != student ] # update database for TA in TA_dict: for student in TA_dict[TA]['assigned']: base[TA][student] += 1 # Save database try: outfile = open(database_file, 'w') for TA in list(base): for student in list(base[TA]): outfile.write(TA + '\t' + student + '\t' + str(base[TA][student]) + '\n') outfile.close() except IOError: print("Can not save database") # Show user for TA in TA_index: print(TA, ': ', sep='', end='') print('; '.join(TA_dict[TA]['assigned']) + "\n")
This program spreads the load both random and evenly, both with regards to number of students assigned, and which TA they are assigned to.
Beautiful.