One of the first tasks that Mick Galbani, ACESpark (AKA Davwin), Shinryu, Pachy, and I undertook as a judge staff for Make a Good Mega Man Level Contest 3 (MaGMML3) was to test out the rubric we'd be using to rate the entries. This was the same rubric we'd used in the judge application process, but now the goal was to determine whether the categories were weighted properly. For this, a larger pool of levels would be needed.
The hosts (Mick and Davwin) selected two levels from each tier of MaGMML1, a game whose contest rubric is universally recognized as pants. If desired, we were allowed to play the MaGMML1 Remastered versions of those levels (same design; different game engine). Even though Remastered hadn't been released yet, the levels in question were in good enough shape to be loaded into a standalone executable. Five levels from MaGMML2's unrated Tier X (which sounds dirtier than it is) were also selected. To avoid bias, we opted to omit any levels made by one of us judges. We also excluded the entry levels from MaGMML2, given that Davwin was a judge for that contest and used a rubric that wasn't too far off from the new one.
We played and scored these levels according to the new rubric—but no reviews; just ratings. I shared my personal rubric breakdown (the one I used for the judge application) with the group, in case they found it useful. Together, we refined the subcategories as follows:
Design - X/35
Introductions - X/5
Challenge design (deliberate, clear, meaningful, fair) - X/5
Challenge progression (↑ complexity/difficulty, challenge arcs, climax) - X/5
Focus (coherent theme, manageable roster, nothing over/underused) - X/5
Architecture (logical, efficient, unobtrusive) - X/5
Player consideration (length, layout, checkpoints, power-ups) - X/5
Ability Balance (abilities shine without destroying the challenge) - X/2
Name (does the level reflect the title) - X/2
Perfectible (no damage w/ buster only, or else with acceptable forced weapon use) - X/1
Fun - X/25
Totally subjective rating - X/10
Recommendable (would you recommend this to someone else) - X/5
Highs (do the best parts boost the level) - X/5
Lows (are the shortcomings forgivable) - X/5
Creativity - X/15
Originality (have I seen anything exactly like this, i.e. sections copied from other levels) - X/5
Novelty (does this offer a new gameplay experience, or does it feel similar to other stages) - X/5
Impressiveness (am I surprised or wowed) - X/5
Aesthetics - X/15
Graphics - X/5
Music/Sound - X/5
Atmosphere/theming - X/5
Functionality - X/10
Stability (flawless construction; no glitches) - X/5
Feasibility (can the player reliably complete each challenge) - X/5
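As a quick sanity check on the weighting, the breakdown above can be written out as a nested dictionary and summed. This is only a minimal sketch: the names `RUBRIC` and `category_max` are made up for illustration and are not part of any contest tooling. The point values are copied straight from the list.

```python
# Subcategory maximums copied from the breakdown above.
# RUBRIC and category_max are illustrative names, not contest tooling.
RUBRIC = {
    "Design": {
        "Introductions": 5,
        "Challenge design": 5,
        "Challenge progression": 5,
        "Focus": 5,
        "Architecture": 5,
        "Player consideration": 5,
        "Ability Balance": 2,
        "Name": 2,
        "Perfectible": 1,
    },
    "Fun": {
        "Totally subjective rating": 10,
        "Recommendable": 5,
        "Highs": 5,
        "Lows": 5,
    },
    "Creativity": {"Originality": 5, "Novelty": 5, "Impressiveness": 5},
    "Aesthetics": {"Graphics": 5, "Music/Sound": 5, "Atmosphere/theming": 5},
    "Functionality": {"Stability": 5, "Feasibility": 5},
}


def category_max(rubric):
    """Return the maximum possible score for each main category."""
    return {name: sum(subs.values()) for name, subs in rubric.items()}


maxima = category_max(RUBRIC)
print(maxima)                # per-category maximums: 35, 25, 15, 15, 10
print(sum(maxima.values()))  # overall maximum: 100
```

Design carries more than a third of the total, and the subjective Fun category a quarter, which is the weighting the test run was meant to evaluate.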
Armed with this updated rubric breakdown, I came up with some scores that should cause anyone who's played these levels to raise an eyebrow. As before, the numbers in parentheses represent the individual subcategory scores. Feel free to skip ahead; I won't stop you.
Level
Design - 21/35 (4, 2, 1, 3, 3, 4, 2, 1, 1)
Fun - 13/25 (6, 1, 1, 5)
Creativity - 7/15 (5, 1, 1)
Aesthetics - 8/15 (2, 3, 3)
Functionality - 9/10 (4, 5)
TOTAL - 58
Napalm Forest & Caves
Design - 22/35 (3, 3, 2, 3, 4, 2, 2, 2, 1)
Fun - 11/25 (5, 1, 2, 3)
Creativity - 4/15 (1, 2, 1)
Aesthetics - 12/15 (3, 5, 4)
Functionality - 8/10 (3, 5)
TOTAL - 58
Glass Man
Design - 24/35 (3, 4, 3, 3, 3, 3, 2, 2, 1)
Fun - 12/25 (5, 2, 2, 3)
Creativity - 8/15 (4, 3, 1)
Aesthetics - 11/15 (4, 4, 3)
Functionality - 10/10 (5, 5)
TOTAL - 65
Mega Man World
Design - 23/35 (5, 2, 2, 3, 4, 2, 2, 2, 1)
Fun - 8/25 (3, 2, 1, 2)
Creativity - 10/15 (3, 4, 3)
Aesthetics - 13/15 (3, 5, 5)
Functionality - 8/10 (4, 4)
TOTAL - 62
City War
Design - 10/35 (2, 1, 1, 2, 1, 1, 0, 2, 0)
Fun - 4/25 (1, 1, 1, 1)
Creativity - 4/15 (2, 1, 1)
Aesthetics - 12/15 (4, 5, 3)
Functionality - 4/10 (1, 3)
TOTAL - 34
Chroma Key
Design - 23/35 (3, 3, 3, 3, 3, 3, 2, 2, 1)
Fun - 16/25 (7, 3, 3, 3)
Creativity - 11/15 (5, 3, 3)
Aesthetics - 12/15 (4, 4, 4)
Functionality - 10/10 (5, 5)
TOTAL - 72
Wily Combo
Design - 22/35 (3, 3, 2, 2, 3, 5, 2, 2, 0)
Fun - 10/25 (4, 1, 1, 4)
Creativity - 2/15 (0, 2, 0)
Aesthetics - 9/15 (2, 5, 2)
Functionality - 9/10 (5, 4)
TOTAL - 52
Thunderclyffe Plant
Design - 25/35 (4, 4, 3, 2, 4, 4, 2, 1, 1)
Fun - 13/25 (5, 2, 1, 5)
Creativity - 6/15 (4, 1, 1)
Aesthetics - 10/15 (4, 4, 2)
Functionality - 10/10 (5, 5)
TOTAL - 64
Research Facility
Design - 29/35 (3, 4, 4, 4, 5, 5, 2, 1, 1)
Fun - 23/25 (8, 5, 5, 5)
Creativity - 11/15 (3, 3, 5)
Aesthetics - 12/15 (4, 5, 3)
Functionality - 8/10 (3, 5)
TOTAL - 83
Midnight Snow
Design - 22/35 (2, 2, 2, 4, 4, 3, 2, 2, 1)
Fun - 12/25 (5, 2, 2, 3)
Creativity - 6/15 (3, 1, 2)
Aesthetics - 12/15 (3, 5, 4)
Functionality - 7/10 (2, 5)
TOTAL - 59
Wily Fortress VR
Design - 20/35 (2, 3, 3, 3, 3, 4, 1, 0, 1)
Fun - 13/25 (6, 3, 3, 1)
Creativity - 10/15 (3, 4, 3)
Aesthetics - 12/15 (5, 4, 3)
Functionality - 9/10 (4, 5)
TOTAL - 64
So Good
Design - 23/35 (3, 3, 2, 4, 4, 3, 2, 1, 1)
Fun - 17/25 (7, 3, 3, 4)
Creativity - 11/15 (4, 4, 3)
Aesthetics - 12/15 (3, 5, 4)
Functionality - 8/10 (3, 5)
TOTAL - 71
Metallic Ocean
Design - 28/35 (4, 3, 4, 5, 4, 3, 2, 2, 1)
Fun - 11/25 (5, 2, 3, 1)
Creativity - 11/15 (4, 4, 3)
Aesthetics - 15/15 (5, 5, 5)
Functionality - 9/10 (5, 4)
TOTAL - 74
Coyote Man
Design - 21/35 (3, 3, 3, 4, 2, 2, 2, 1, 1)
Fun - 9/25 (4, 1, 2, 2)
Creativity - 9/15 (3, 4, 2)
Aesthetics - 12/15 (4, 4, 4)
Functionality - 9/10 (5, 4)
TOTAL - 60
The Quickening 2
Design - 12/35 (0, 1, 1, 4, 2, 1, 1, 2, 0)
Fun - 5/25 (2, 1, 1, 1)
Creativity - 12/15 (4, 5, 3)
Aesthetics - 13/15 (3, 5, 5)
Functionality - 6/10 (5, 1)
TOTAL - 48
Organizing the total scores from lowest to highest (giving more priority to the level with the higher Design score, in the case of a tie), here's where everything placed:
(34) City War
(48) The Quickening 2
(52) Wily Combo
(58) Level
(58) Napalm Forest & Caves
(59) Midnight Snow
(60) Coyote Man
(62) Mega Man World
(64) Wily Fortress VR
(64) Thunderclyffe Plant
(65) Glass Man
(71) So Good
(72) Chroma Key
(74) Metallic Ocean
(83) Research Facility
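The ordering rule (lowest total first, with the higher Design score placing higher on a tie) amounts to sorting on a two-part key. Here is a minimal sketch using a subset of the levels; the variable names are illustrative, and the numbers are copied from the score blocks above.

```python
# (level name, total score, Design score), copied from the score blocks above.
# Only a subset of the levels is included to keep the sketch short.
scores = [
    ("Thunderclyffe Plant", 64, 25),
    ("Glass Man", 65, 24),
    ("City War", 34, 10),
    ("Wily Fortress VR", 64, 20),
    ("Research Facility", 83, 29),
    ("Coyote Man", 60, 21),
]

# Sort ascending by total. On a tie, the higher Design score places higher,
# so it comes later in a lowest-to-highest list.
ranked = sorted(scores, key=lambda entry: (entry[1], entry[2]))

for name, total, design in ranked:
    print(f"({total}) {name}")
```

Wily Fortress VR and Thunderclyffe Plant both total 64, and the tuple key puts Thunderclyffe Plant (Design 25) after Wily Fortress VR (Design 20), matching the list above.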
Some questionable scores, to be sure. But I've seen some bad fan-made levels, so the bar is set pretty low for getting at least a couple points in any given category. Also, I tend to focus more on the things that ruin my Mega Man experience than the things that go above and beyond to make it superb. There are other factors influencing my scores, but these are the ones that most concisely explain how I came up with these numbers.
I have a vague recollection of us judges comparing notes and averaging all our scores to see where these levels would've placed in MaGMML3. Oddly, I can't find any record of this discussion in the Discord logs, so maybe it happened over voice chat. Whatever the case, we did establish that the main rubric was solid enough to use for the contest. Whether we used the more detailed breakdown was up to us.
When it came time to start rating and reviewing MaGMML3 levels, it quickly became apparent to me that the more detailed breakdown needed more tweaking. I found myself agonizing over categories that were too specific to apply to less conventional levels, so broad that they overlapped unfairly with other categories, or simply no longer of great interest to me.
For one thing, my interpretation of Creativity inherently penalized levels such as Wily Combo and Napalm Forest & Caves, which are centered around callbacks to the official Mega Man games, and unfairly rewarded the likes of So Good and Mega Man World for using non–Mega Man assets. For another thing, the Recommendable subcategory of the Fun score was all too close to MaGMML1's wildly subjective Other Person Fun Factor score (i.e., how much do you think other people would enjoy the level).
So, I tinkered with the subcategories until I had something that was (a) comfortably easy to fill out, (b) consistently applicable across all level types, and (c) more accurate to what I was actually looking for in these levels.
Design - X/35
Introductions (clear, appropriate) - X/5
Challenge design (deliberate, meaningful, fair, reasonably perfectible) - X/5
Challenge progression (↑ complexity/difficulty, challenge arcs, climax) - X/5
Focus (coherent vision, manageable roster, nothing over/underused) - X/5
Layout (logical/efficient architecture, obvious pits, safe transitions, sense of direction) - X/5
Player consideration (length, checkpoints, items, disability awareness, niceties) - X/5
Weapon consideration (balance of freedom and challenge, limitations inform the design) - X/3
Name (does the title fit the level) - X/2
Fun - X/25
Totally subjective rating - X/10
Highs (do the best parts boost the level) - X/5
Lows (are the shortcomings forgivable) - X/5
Contest appropriateness (should other people be expected to play this) - X/5
Creativity - X/15
Novelty (does the level offer new experiences, or present old experiences in new ways) - X/5
Potential (does the level adequately explore the potential of its various elements) - X/5
Impressiveness (am I surprised, charmed, impressed, or wowed) - X/5
Aesthetics - X/15
Audio (appropriate, enjoyable, tolerable, music looped and implemented properly) - X/5
Visuals (clear, legible, appealing, unobtrusive, polished) - X/5
Atmosphere (theming, story, overall feel) - X/5
Functionality - X/10
Construction (appropriate structural object use, polished programming, no surprises) - X/5
Feasibility (is the level beatable without risk of getting stuck or crashing the game) - X/5
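To see at a glance what changed between the two breakdowns, the subcategory names can be compared per category. The sketch below is illustrative only: `OLD`, `NEW`, and `diff_names` are hypothetical names, the comparison is purely by name (so a renamed subcategory shows up as one dropped and one added), and the point values are copied from the two lists in this post.

```python
# Subcategory names and maximums, copied from the two breakdowns in this post.
# OLD, NEW, and diff_names are illustrative names, not contest tooling.
OLD = {
    "Design": {"Introductions": 5, "Challenge design": 5,
               "Challenge progression": 5, "Focus": 5, "Architecture": 5,
               "Player consideration": 5, "Ability Balance": 2, "Name": 2,
               "Perfectible": 1},
    "Fun": {"Totally subjective rating": 10, "Recommendable": 5,
            "Highs": 5, "Lows": 5},
    "Creativity": {"Originality": 5, "Novelty": 5, "Impressiveness": 5},
    "Aesthetics": {"Graphics": 5, "Music/Sound": 5, "Atmosphere/theming": 5},
    "Functionality": {"Stability": 5, "Feasibility": 5},
}
NEW = {
    "Design": {"Introductions": 5, "Challenge design": 5,
               "Challenge progression": 5, "Focus": 5, "Layout": 5,
               "Player consideration": 5, "Weapon consideration": 3, "Name": 2},
    "Fun": {"Totally subjective rating": 10, "Highs": 5, "Lows": 5,
            "Contest appropriateness": 5},
    "Creativity": {"Novelty": 5, "Potential": 5, "Impressiveness": 5},
    "Aesthetics": {"Audio": 5, "Visuals": 5, "Atmosphere": 5},
    "Functionality": {"Construction": 5, "Feasibility": 5},
}


def diff_names(old, new):
    """Per category, list subcategory names that were dropped and added."""
    changes = {}
    for category in old:
        dropped = sorted(set(old[category]) - set(new[category]))
        added = sorted(set(new[category]) - set(old[category]))
        changes[category] = {"dropped": dropped, "added": added}
    return changes


changes = diff_names(OLD, NEW)
for category, change in changes.items():
    print(category, change)

# Each main category keeps the same maximum in both versions.
same_weights = all(sum(OLD[c].values()) == sum(NEW[c].values()) for c in OLD)
print(same_weights)
```

The main category weights stay at 35/25/15/15/10; only the way each category is subdivided changes. In Design, for example, the 3-point Weapon consideration takes the place of Ability Balance (2) and Perfectible (1).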
This proved to be a much better breakdown. There were still some oddball levels that threw me for a loop, but the total scores I was assigning finally felt right.
I originally had level layout under Player Consideration, but I realized that all the structural elements should be together in the same category. Proceeding safely and confidently through a level is a function of the architecture, layout, screen transitions, and graphics working together to guide the player. Player Consideration should be reserved for questions such as, "Did you remember that players with color blindness, motion sickness, epilepsy, or a hearing impairment may want to enjoy this level?", not, "Did you remember that players prefer not to die instantly when attempting to exit a room?"
Whereas the old Ability Balance category required me to pick apart the individual usefulness of every single weapon in a roster full of redundancy, and Perfectible required me to determine if it was technically possible to do a no-damage buster-only run (ugh), Weapon Consideration allowed me to step back and compare overall experiences with and without special weapons. Is the player allowed to try different strategies without being excessively rewarded or punished? This was also a way to score the appropriateness and intentionality of what weapons were enabled, disabled, unlockable, or infinite.
Recommendable morphed into Contest Appropriateness—essentially, a measure of how much the level deserves a skip teleporter. My rationale is that if a level is skippable, it's obviously not fun enough for a general audience to put up with whatever problems warranted the skip. Whereas MaGMML1's Other Person Fun Factor was a haphazard guess at what other people might like, this was a safeguard against my personal preferences allowing a level to reach the top tiers without being fully accessible to the average player.
Originality was unsustainable; I was spending entirely too much time analyzing every screen, testing my memory for any identical setups across hundreds upon hundreds of other levels. What I really cared about for Creativity was the overall experience, what the designer created with elements old and new. I replaced Originality with Potential, something that is hugely important to me—it's one thing to come up with new ideas; it's another thing entirely to go anywhere with them.
I wanted Functionality to pertain solely to the technical aspects of level design. When I initially worked out the category breakdowns, the glitchy spring platforms of Magnum Man were foremost in my mind for Feasibility. Realistically, whether the player can reliably complete each challenge is usually more of a Design question—and a question that can only be answered by more repeat testing than my schedule and sanity would ever allow. Thus, I recalibrated the breakdown to focus on easily quantifiable items in the same vein as glitches and collision object mishaps. Does the architecture line up safely across screen transitions? Are boss projectiles destroyed along with the boss? Is the level free of any and all softlock potential? Etc.
I came to realize that some overlap was inevitable. A low Player Consideration score, for instance, goes hand in hand with a low Contest Appropriateness score, because a long level with no checkpoints or power-ups is a prime candidate for a skip teleporter. A high Creativity score is almost guaranteed to secure a good number of Fun points from me. Good screen transitions are highly important to me, and different aspects of their use are covered under both Design and Functionality. And that's okay. The purpose of the rubric is to help the judges translate their complex opinions into a simple, quantifiable, universally applicable format, not to force apart certain elements that are inherently intertwined.
Besides, reorganizing my subcategories was only one part of the scoring process. The plan was always to compare scores at the very end and adjust as necessary, with or without a more detailed breakdown. If there were still problems with my interpretation of the rubric after 170+ levels, I'd have a chance to resolve them.
Fun, right? And this was just the first step in the judge process. Wait until I tell you about playing and writing reviews—after the results are announced, of course.