A Smarter Way for Teachers to Record Scores

Background

At McGraw Hill Education, the platform included a scoresheet feature that automatically graded online quizzes, tests, and homework.

Class scoresheet interface: The existing scoresheet graded digital assignments automatically but offered no efficient path for teachers entering paper test scores by hand.

For older students, that worked well. But K5 teachers operated differently — they created tests and quizzes in the platform, then printed them out for children to take by hand. Auto-grading didn't apply to them.

After collecting completed tests, teachers had to manually enter every score into the platform to unlock performance reports. Without that data, they couldn't see which students needed remediation or which skills to revisit.

The entry process was sequential and slow — one question, one student, one click at a time. For a class of 25 with a 10-question test, that's 250 individual inputs before a single report could run.

Scoresheet GIF — The existing workflow: click into each cell manually, enter one score, move to the next. No keyboard navigation, no batch entry, no way to skip a student.

Concepts

The real question wasn't just how to make entry faster — it was what mental model to design for. Teachers grading paper tests already had a workflow. The design needed to match it, not fight it.

An early idea was to let teachers upload a spreadsheet that would auto-populate the scoresheet. Most teachers tracked scores in spreadsheets anyway, so it fit their existing behavior. Stakeholders were on board — but it was out of scope for the initial version.

With spreadsheet upload off the table, three core requirements defined the design problem:

Bulk entry

Score an entire class at once, not student by student.

Apply All shortcut

Fill the maximum score for a question across all students in a single click.

Extra credit support

Allow scores above 100% for extra credit assignments.

Wireframe 1 — Design A: A spreadsheet-style layout where all students and questions were visible at once — fast to scan, but it assumed teachers worked question-by-question across the class rather than student-by-student.

The spreadsheet layout gave teachers a full class view at a glance — easy to scan and compare, and familiar enough to feel immediate. The tradeoff was navigation: with many students or questions, it could get overwhelming to track which row you were on.

A second design took the opposite approach: one student at a time, focused entry, mimicking the physical act of picking up a test paper and working through it.

Wireframe 2 — Design B: A student-focused view that mirrored the physical grading workflow — pick up one test, enter all scores for that student, move to the next.

Design A was faster for getting a class-level overview. Design B matched how teachers actually moved through a stack of papers. Rather than debate it, we built both and tested them with real teachers.

User Testing

Pilot Insight

Catching a Core Issue Early

Before formal testing began, our research team ran a pilot session with a single K5 teacher. Watching her work through Design A, I noticed something immediately: she was fighting the layout. With student names running down the left side, the grid was built for entering one question at a time across the whole class. She kept trying to enter one student's full test at a time instead, which forced her to track horizontally across a row.

"It would be nice if it [the design] was going down vertically here as well, instead of horizontally. It would just [be] looking up and down on here. It's almost harder for me to go up and down and then track horizontally."

Pilot Tester

Pilot test — Having the students' names on the left side (instead of the questions) went against the users' mental model.

Teachers grade paper assignments student-by-student, not question-by-question, and the spreadsheet's axes were reversed from how they actually worked. Based on this, I revised Design A to flip the orientation: questions down the left column and student names across the top, so entering one student's test meant moving straight down a single column.

Methodology

Our Testing Approach

Our research team recruited seven K5 instructors with experience across two major products — Reading Mastery and Reading Wonders — all familiar with the existing online scoresheet.

Each participant tested both designs: the revised spreadsheet layout (Design A) and the student-by-student view (Design B).

To simulate real grading conditions, participants were given PDFs of three pre-scored tests — open on a second screen or printed — and asked to enter scores for those three students into each prototype.

Printer — Participants used printed or PDF copies of scored tests as reference — the same way they'd work at their desks after collecting a class set of papers.

Feedback

Design A

Spreadsheet Design

Keyboard navigation

Tab and Enter made entry fast, but participants didn't discover them until they stumbled on them. The problem was discoverability, not the interaction itself.

Absent / exempt states

Every participant asked for a way to mark students absent or exempt before submitting. What seemed like an edge case turned out to be universal.

Apply button label

Consistently missed on first pass. Once it was explained, everyone wanted it. The concept was right; the label wasn't.

"I do like the format of this and makes it easy to put in and I imagine it would be even faster if I wasn't necessarily looking at a scanned document like toggling back and forth. I think that would be easier."

3rd grade teacher

"So what I liked about it is that I clicked on it. Put a number, enter. Next one: number, enter, number, enter, number, enter. Like, it was, it was quicker. It was a lot easier."

Kindergarten teacher

Scoresheet UT1: Participants moved quickly through the spreadsheet once they found Tab and Enter, but closing that discoverability gap was a fix needed before ship.

"The Apply button...I would think would just be to lock the score before you submit it. But I honestly visually. I just as you can see, I totally skipped over that."

3rd grade teacher

"Yeah, I like that. I just didn't even see that there, where it says 'give max for,' but now that you pointed that out to me, it makes perfect sense."

Kindergarten/4th grade tutor

Scoresheet UT2 — Every participant missed the "Apply" button on first pass. Once it was explained, every participant wanted it — which confirmed it needed a label rethink, not removal.

Design B

Student-by-Student Design

Design B made the "Apply" button clearer and reduced cognitive load. But two consistent issues emerged:

No class overview

Most participants wanted to see the whole class at once — the single-student view felt limiting for gauging overall performance.

Name-hunting friction

Paper tests aren't sorted alphabetically. Teachers had to hunt for each student's name — friction that Design A's visible row labels avoided.

"I honestly almost even like this better from a type A personality perspective, because if I've got Lucas in front of me, then I'm just looking at his score sheet. And I don't feel like I have all the other names and everything around me. But I mean, I do like both."

Kindergarten/4th grade tutor

"It's hard to say. I like the previous one better, that we just submitted...and I think that I liked on that you just start at the top and you just go down to input all the scores."

3rd grade teacher

Scoresheet UT3 — Design B had real fans — the focused view resonated with teachers who liked working one student at a time. But the inability to see the full class at once was a consistent drawback.

Revisions

Five of the seven participants preferred Design A. When asked what would make it better, teachers kept returning to the same four things:

Absent / exempt marking

Ability to exclude specific students before submitting the full class batch.

Score summaries

Displaying overall scores or class averages alongside individual entries.

Color coding

Highlight high and low scores visually to surface struggling students at a glance.

Standards mapping

Show the skill or standard tied to each question to help guide targeted remediation.

Changes Made

What I Updated After Testing

Clearer Apply button

Renamed and restyled so teachers immediately understood its function.

Shortcut discoverability

Added an info icon surfacing Tab and Enter navigation that participants kept missing.

Student exclusions

Teachers can now mark students absent or exempt before submitting the batch.

Clearer CTA

"Submit Assignment" → "Submit Scores" — more accurate to what was actually happening.

Submission indicators

Visual confirmation showing which scores had been successfully submitted row by row.

Scoresheet Final — The revised design: "Apply" relabeled for clarity, a tooltip surfacing keyboard shortcuts that were previously invisible, and a way to mark students absent or exempt before submitting.

The final design after all testing rounds:

Scoresheet Final GIF — The final design. Tab and Enter navigate between cells; the Apply button fills max score for a question across the whole class; submitted scores are visually confirmed row by row.

Insights

The strongest signal from testing wasn't a preference split — it was the emotional response. Teachers weren't just giving feedback on an interface. They were reacting to years of friction finally being acknowledged.

A few things they said during sessions:

"Um, I just want to say thank you for changing this because last year was, it like I...would I would do this on a weekend because it would take so long."

Kindergarten teacher

"A definite improvement over what we've had, I think what we've had not only isn't that easy to use, but I think it looks really outdated, which turns people off."

Reading Wonders (K5 product) curriculum specialist

The Class Scoresheet wasn't shipped. Priorities shifted and resources moved elsewhere, a real and common outcome in product work. But the design was validated: seven teachers tested both concepts, five preferred the revised spreadsheet layout, and the sessions produced a clear picture of what teachers needed.

What this project reinforced: the pilot insight that the axis was wrong is exactly the kind of finding that gets missed when research isn't treated as a real deliverable. Catching it before formal testing meant seven sessions of clean, focused feedback instead of seven sessions spent diagnosing a structural flaw. That's what research is for.
