Exam Scoring Guide

Introduction

One of the primary purposes of the BJCP is to promote, recognize, and advance beer, mead, and cider tasting, evaluation, and communication skills. The BJCP exam is the heart of the program and is a unique process for peer review of prospective judges and continuing education for both the examinees and the graders. The score on this exam, along with experience points, determines the maximum rank attainable for each judge.

The BJCP uses a three-tier exam system consisting of an online entrance exam, a judging exam, and a written proficiency exam. The online exam is scored automatically, and does not require grader participation. Passing the entrance exam qualifies a participant to take the related beer, mead, or cider judging exam which has a practical judging component. Higher-scoring and experienced beer judges may take the written proficiency exam to reach higher ranks. Both the judging exams and the written proficiency exam require graders to assess the exams and assign scores.

See Rank Requirements for more detail about the specific scores and experience points required to reach each rank in the program, and Exam Structure for more detail about how the exam program is structured and what types of exams are used.

Graders most frequently score beer judging exams, but the same principles apply to all graded exams. The grader’s priority is to determine the proper scoring decile (ten point band) for the exam, since those deciles are used for purposes of rank determination. The skill demonstrated on the exam should directly equate to the implied rank based on the score (e.g., scores in the 70s should indicate “Certified” level skills). This concept is explained in more detail in the Exam Grading Guidelines section.

Within those scoring bands, the relative level of score can send a signal (e.g., within the 70-79 band, a 70 would indicate a minimal achievement, a 75 might be encouraging, and a 79 might invite a protest since the next-higher band was just missed – for this reason, scores ending in ‘9’ are discouraged). It may be helpful to equate a score of 71 to “low Certified” and a score of 78 to “high Certified”, for example.

As with judging beer, graders should be careful about being a fault-finder. Higher rankings should be attainable, and the grader should be careful about being too critical. Note that Master level is a score of 90, not 100, so perfection isn’t required. In particular, graders should be careful about relying on a “bottom-up” scoring approach since this generally leads to over-penalizing examinees.

Our goal is to evaluate all exams as objectively as possible. In the past, we relied on practical experience and mentoring to build these skills, but we now have an online grader training course available, as well as a rubric for grading scoresheets. BJCP exam graders are required to complete the grader training course.

Exam Grading Guidelines

When scoring an exam, graders should be comfortable that the examinee has demonstrated skills that relate to the judge level for which the score qualifies, using the following:

  • <60: On the written proficiency exam, little knowledge of brewing and/or styles is conveyed, and major gaps are evident. On the judging exam, the examinee displays weak tasting skills, and the scoresheets will generally have unacceptably low levels of completeness, descriptive information and/or feedback. This examinee will be an Apprentice judge.

  • 60s: The examinee demonstrates a basic grasp of fundamentals on the written proficiency exam, but there may be some significant knowledge gaps. The judging exam demonstrates the minimum acceptable communication and judging skills expected of a Recognized judge.

  • 70s: There can be errors and small gaps in the answers on the written proficiency exam, but depth in answers is not necessary. On the judging exam, at least three of the six exam beers are accurately evaluated. The scoresheets should have reasonably good completeness, descriptive information, and feedback appropriate to the Certified judging level.

  • 80s: The written proficiency exam indicates good knowledge of all subjects. Some errors are allowable, but there are no significant gaps and most of the answers demonstrate depth. On the judging exam, at least four of the six exam beers are accurately evaluated with the high quality scoresheets expected of a National judge.

  • 90s: The written proficiency exam demonstrates excellent knowledge level. There are no significant errors, no knowledge gaps, good depth to answers, and evidence of independent thought. On the judging exam, it should be obvious that the examinee is an experienced beer taster.  At least five of the six exam beers are accurately evaluated and the scoresheets have Master levels of completeness, descriptive information, and feedback.

This Grading Guideline applies for all exams, not just beer judging exams. While the Mead Judge and Cider Judge programs do not currently have rank levels, this does not mean that the exam is Pass/Fail. Scores need to be assigned, and the scores need to equate to equivalent beer judge rank levels. The scores are used for other purposes, such as grader qualification, and may be used in a future rank system.
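
The band-to-level mapping above can be sketched in a few lines of Python. This is only an illustration: the function name implied_level is hypothetical, and it looks at the exam score alone, whereas an actual rank also depends on experience points and on which exams were taken.

```python
def implied_level(score: float) -> str:
    """Return the judge level implied by an exam score (score only).

    Assumption: this mirrors the scoring bands in the guideline and
    ignores experience points, which also limit the attainable rank.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score < 60:
        return "Apprentice"
    if score < 70:
        return "Recognized"
    if score < 80:
        return "Certified"
    if score < 90:
        return "National"
    return "Master"


if __name__ == "__main__":
    for s in (58, 64, 71, 78, 85, 92):
        print(s, implied_level(s))
```

A grader could use a check like this after totaling an exam to confirm that the skills seen on the scoresheets or essays match the level the score implies.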

Scoring Mechanics

Judging Exam

There are six beers, meads, or ciders that are evaluated in a 90-minute time period. Each beer is scored on a 100-point scale, with 20 points allocated to scoring accuracy and 80 points allocated to scoresheet comments.

Scoring Accuracy

The judges’ scores and the consensus scores of the proctors for each beer are entered by the exam director into the Exam Grading Form (EGF). The scoring accuracy is calculated using the variance table in the BJCP Scoresheet Guide. Graders do not have to calculate scoring accuracy, but the baseline scores may be adjusted in consultation with the Exam Director as described in the BJCP Scoresheet Guide.

Scoresheet Comments

The remaining 80 points per beer are equally divided between Perception, Descriptive Ability, Feedback, and Completeness. The BJCP Scoresheet Guide is a detailed rubric to help both judges and graders understand the criteria that determine the quality of a beer scoresheet. Please review that guide prior to beginning the grading assignment.

The 80 points for the evaluation of each beer are assigned as follows:

  1. Perception (20 points/beer): Points should be deducted for missed flaws and errors in aroma, appearance, flavor, and mouthfeel perception. The rubric formed by the proctors’ scoresheets enables the graders to make a correlation between the characteristics identified by the examinees and those noted by the proctors. 
  2. Descriptive Ability (20 points/beer): A beer judge should be able to describe the intensity and characteristics of the aroma, appearance, flavor, and mouthfeel using the proper terminology. The BJCP Style Guidelines serve as a reference for this aspect of the scoresheet.
  3. Feedback (20 points/beer): The brewer should receive useful and constructive feedback explaining how to adjust the recipe or brewing procedure in order to produce a beer that is closer to style. The comments should be constructive and consistent with the characteristics perceived by the examinee as well as with the score assigned to the beer. 
  4. Completeness/Communication (20 points/beer): A complete scoresheet should be well-organized and legible, and should have informative comments that fill all available comment space. Checkboxes for stylistic accuracy, technical merit, and intangibles should also be marked. This aspect of the scoresheet is generally consistent with the level of descriptive information and feedback conveyed by the examinee.

The points awarded for each aspect of the beer should be correlated with the experience levels; i.e., 12-13 would be expected from a Recognized judge, 14-15 from a Certified judge, 16-17 from a National judge and 18-20 from a Master judge. Scoresheets which are indicative of a subpar judging performance generally fall in the 9-11 point range. Record the score for each beer on the EGF.
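
As a rough illustration of the arithmetic above, the following Python sketch totals one exam beer from its five 20-point components and maps an aspect score to the level it suggests. The names beer_score and aspect_level are hypothetical, and the scoring accuracy value is taken as an input because it comes from the variance table in the BJCP Scoresheet Guide, which is not reproduced here.

```python
from typing import Mapping

# The four scoresheet-comment aspects, each worth 20 points per beer.
ASPECTS = ("perception", "descriptive_ability", "feedback", "completeness")


def beer_score(scoring_accuracy: float, aspects: Mapping[str, float]) -> float:
    """Total for one exam beer on the 100-point scale.

    scoring_accuracy is supplied by the caller (0-20); this sketch does
    not attempt to reproduce the variance table.
    """
    if not 0 <= scoring_accuracy <= 20:
        raise ValueError("scoring accuracy must be between 0 and 20")
    total = scoring_accuracy
    for name in ASPECTS:
        points = aspects[name]
        if not 0 <= points <= 20:
            raise ValueError(f"{name} must be between 0 and 20")
        total += points
    return total


def aspect_level(points: float) -> str:
    """Level suggested by a single 20-point aspect score."""
    if points >= 18:
        return "Master"
    if points >= 16:
        return "National"
    if points >= 14:
        return "Certified"
    if points >= 12:
        return "Recognized"
    return "Subpar"  # the guide places subpar sheets around 9-11


if __name__ == "__main__":
    sheet = {"perception": 15, "descriptive_ability": 14,
             "feedback": 15, "completeness": 16}
    print(beer_score(16, sheet))           # 16 + 15 + 14 + 15 + 16 = 76
    print(aspect_level(sheet["feedback"]))
```

In practice the EGF performs these totals; the sketch is only meant to make the point allocation concrete.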

Written Exam

There are six questions to be answered in 90 minutes, with the first question comprising twenty true-false (TF) questions on the BJCP levels, the judging process and judging ethics. These TF questions only impact the exam score if they are answered incorrectly, in which case a one-half point (0.5) deduction is made for each error or omission. The other five questions are essay format, worth 100 points each, and cover beer styles, beer characteristics, and the brewing process. 

The primary reference for grading any aspect of beer styles is the 2015 BJCP Style Guidelines. Before grading the written proficiency exam, read each question, review relevant references, and make a checklist of the key information that should be included in a complete answer. When evaluating examinee responses, consider the accuracy, depth of knowledge demonstrated, and completeness and communication skills, including neatness and organization. Understanding positions of various authorities on controversial subjects is desirable, as is knowledge of commercial and classic examples of the styles. Omissions and incorrect or contradictory information should detract from the score; however, some of these deductions may be offset by including greater depth in other aspects of the response.

The points awarded for each answer should correlate to the implied rank; i.e., 60-69 points would be expected from a Recognized judge, 70-79 from a Certified judge, 80-89 from a National judge and 90-100 from a Master judge. Answers which are indicative of a subpar judging performance generally fall in the 40-60 point range. The score for each answer should be entered on the EGF, and comments for the feedback portion of the Report to Participant Form noted. 
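
The following Python sketch shows one way the written exam pieces could combine. It rests on an assumption that this guide does not state outright: that the overall score is the simple average of the five essay scores, less 0.5 points for each TF error or omission. The function name written_exam_score is hypothetical, and the EGF remains the authority on the actual calculation.

```python
from typing import Sequence


def written_exam_score(essay_scores: Sequence[float], tf_errors: int) -> float:
    """Sketch of a written exam total.

    ASSUMPTION: the total is the mean of the five 100-point essay
    scores minus 0.5 per incorrect or omitted true-false answer.
    """
    if len(essay_scores) != 5:
        raise ValueError("expected exactly five essay scores")
    if not 0 <= tf_errors <= 20:
        raise ValueError("there are only twenty true-false questions")
    for score in essay_scores:
        if not 0 <= score <= 100:
            raise ValueError("each essay score must be between 0 and 100")
    essay_average = sum(essay_scores) / len(essay_scores)
    return essay_average - 0.5 * tf_errors


if __name__ == "__main__":
    # Essays average 74; two TF errors cost one point in total.
    print(written_exam_score([72, 78, 80, 65, 75], tf_errors=2))
```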

Report to Participant (RTP) Forms

The lead grader is responsible for completing the RTP that will be returned to each examinee. The format for the RTP consists of a cover page summarizing and explaining the results followed by additional pages giving tabular feedback on specific beers (Beer Judging Exam) and essay questions (Written Proficiency Exam).

The RTP form is configured so that the header for the second page and any possible additional pages will have the participant number as part of the page header. The sections of the RTP should be completed as follows:

  • SCORES – This section is completed by the Exam Director after the exams have been reviewed, so please leave this section blank.
  • RECOMMENDED STUDY – Indicate the references that should be studied to correct deficiencies on the written proficiency and tasting exams.
  • JUDGING EXAM SUMMARIES – The EGF tabulates the scoring accuracy result and the average scores for each aspect of beer evaluation: perception, descriptive ability, completeness and feedback. The EGF generates tables for each exam in the tabs named “Grids 1-12” and “Grids 13-24.” The grader copies these tables into the RTP, and then additional feedback can be given using the checkboxes or in prose format. This additional feedback is optional and should be kept brief since the summary tables already provide detailed information about the performance on each scoresheet.
  • WRITTEN EXAM SUMMARIES – Three questions on the BJCP written proficiency exam focus on beer styles, while two are more technical in nature. There are also TF questions relating to the levels of the BJCP, the judging process, and ethics. The correct answers for the TF questions and those submitted by the examinees have already been entered into the EGF by the Exam Director, so these do not need to be graded. The EGF also calculates the average scores for the style and technical questions on the exam, and generates tables that the grader needs to copy into the RTP. 

Consensus Score Process

The exam graders reconcile their scores and agree on a final consensus result. Both graders should be comfortable with this final score not only with respect to the scoring bands (deciles), but also to the position within the level, i.e., low, middle, or high end of the range. If this is not the case, one or both of the graders should adjust their score. If deviations are more than seven points, then the graders should discuss the exam in detail and adjust their scores until they reach a consensus. If there is still a problem, request further scoring by the Associate Director to break the deadlock or to determine what final scores should be assigned to borderline exams. When a consensus score has been reached for each exam, the lead grader should email the completed EGF (including scores from the second grader), the consensus scores, and completed RTPs to the Associate Exam Director and Exam Director.

Additional Grading Tips

  1. Take advantage of the statistics when scores are entered into the EGF. For example, on a Beer Judging exam, the two graders and six beers generate twelve data points for each of the four scoresheet characteristics (Perception, Descriptive Ability, Feedback and Completeness). Small differences between the graders will be averaged out. For example, if the graders arrive at scores of 15 (75%) and 17 (85%) for the Perception score on one of the exam beers, this corresponds to 2 x 20% x (1/6) = 0.07 points of the total score for that examinee. This small difference should not warrant much debate or discussion.
  2. After scores from both graders are entered into the EGF, compare the scores in the “Summary” tab of the EGF. For exams where the two scores are within five points and within the same scoring band, the consensus score should be the average score, even if there are variations in the scores for the individual beers or essay questions. Let the statistics smooth out these differences. The scoring part of the assignment is finished for these exams, and the lead grader can then work on the corresponding RTPs. For exams where the total scores diverge, look at the scores for the individual beers or essay questions and see if you can pinpoint the source of the difference. It may be a matter of discussing the rubric and making sure you are using the same reference. The next layer of investigation would be to look at the components of each exam beer or essay question, but this is often unnecessary after the graders identify the exams that need further review.
  3. If there is ever an impasse in assigning a consensus score after each grader has taken another pass through the exams on which there are scoring divergences, remember that the Associate Director is available to provide a third opinion. There are sometimes systematic biases between the graders, and the AD will help identify if one or both of the graders is being too harsh or lenient. Note that the AD will also automatically review exams that end up close to the threshold between scoring bands.
  4. If there are circumstances in your professional or personal life that will result in a significant delay in completion of the grading assignment, please communicate that information to the AD and ED as soon as possible. It is much better to reassign the exams early in the process rather than to let them languish for six months or longer.  Communication is the key!
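
The comparison rules described in the tips above and in the Consensus Score Process section can be sketched as a small triage routine. The function names band and triage are hypothetical, and treating "within five points" as inclusive is an assumption. The routine only sorts exams into the three situations the guide describes and does not replace the graders' judgment or the Associate Director's review.

```python
def band(score: float) -> int:
    """Scoring band for a total score: <60, 60s, 70s, 80s, or 90-100."""
    if score < 60:
        return 5  # everything below 60 is treated as one band
    return min(int(score // 10), 9)  # a score of 100 stays in the 90s band


def triage(score_a: float, score_b: float) -> str:
    """Classify a pair of grader totals for one exam."""
    diff = abs(score_a - score_b)
    if diff <= 5 and band(score_a) == band(score_b):
        # Close and in the same band: the consensus is the average.
        return "average"
    if diff > 7:
        # Large deviation: discuss the exam in detail and re-score.
        return "discuss in detail"
    # Otherwise look at individual beers or essay questions.
    return "review component scores"


if __name__ == "__main__":
    for a, b in [(72, 76), (68, 71), (70, 79)]:
        action = triage(a, b)
        consensus = (a + b) / 2 if action == "average" else None
        print(a, b, action, consensus)
```

Note that a pair such as 68 and 71 differs by only three points but straddles a band boundary, so it is flagged for review rather than averaged.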

Timetable and Expenses

Our target is to turn tasting exams around in eight weeks and written proficiency exams in twelve weeks. This requires that graders complete the scoring in no more than four and six weeks, respectively. The BJCP is a nonprofit organization, so the graders and the directors are not expected to profit from the grading of exams. Reasonable expenses may be tabulated and submitted to the BJCP Treasurer with receipts for reimbursement; however, this is rare.