Test Development Process

Standard 1.0 ...[A]ppropriate validity evidence in support of each intended [test score] interpretation should be provided. Standards for Educational and Psychological Testing (AERA, APA & NCME, 2014)

Validity refers to the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests. Validity is, therefore, the most fundamental consideration in developing tests and evaluating tests. The process of validation involves accumulating relevant evidence to provide a sound scientific basis for the proposed score interpretations. Standards for Educational and Psychological Testing (AERA, APA & NCME, 2014, p. 11)

A passing score on the MTTC test is a requirement for obtaining a teaching certificate or endorsement in Michigan. The validity of the MTTC test is based on the accumulation of evidence that supports the use of the MTTC test for making pass/fail determinations within this context. The process of accumulating validity evidence is interwoven throughout the development of the MTTC tests, including the establishment and participation of advisory committees, the definition of test content, the development of test items, and the establishment of passing standards. The test development process includes steps designed to help ensure that

  • the test content is aligned with Michigan laws, Michigan standards for teacher certification, and other state documents/policies governing Michigan schools,
  • the test items assess the defined content accurately, and are job-related and free from bias, and
  • the passing scores reflect the level appropriate for the use of the MTTC test in making pass/fail determinations as a requirement for receiving a teaching certificate or endorsement in Michigan.

The test development procedures, including the accumulation of validity evidence, are described in the following sections of this manual:

Establishing Advisory Committees

Standard 1.9 When a validation rests in part on the opinions or decisions of expert judges, observers, or raters, procedures for selecting such experts and for eliciting judgments or ratings should be fully described. The qualifications and experience of the judges should be presented. Standards for Educational and Psychological Testing (AERA, APA & NCME, 2014)

Involving Michigan educators in test development activities has been an important component of establishing a validity basis for the MTTC program. Michigan teachers and teacher educators have served on MTTC advisory committees throughout the development of the tests. The involvement of Michigan school teachers and faculty preparing prospective educators contributes to validity by grounding the program in Michigan practice and requirements.

Development of new and updated tests for the MTTC program is a collaborative effort involving the Michigan Department of Education (MDE), the Evaluation Systems group of Pearson (Evaluation Systems), and committees of Michigan teachers and teacher educators, including educators who serve on Content Advisory Committees (CACs) and the Bias Review Committee (BRC).

CACs are charged with reviewing and validating the content of the tests; one CAC is constituted for each test field. Bias prevention is the focus of the BRC, a group of Michigan educators who participate in reviews of test materials throughout the development process. Members of the CACs are involved in making judgments that are provided to the MDE for use in setting passing standards for the tests.

Content Advisory Committees

A Content Advisory Committee (CAC) composed of Michigan educators associated with the test field (up to 18–22 members) is established for each test field.

The CACs include Michigan school teachers and college and university faculty engaged in the preparation of prospective educators. Nominations for membership on the committees are elicited from school administrators, deans at higher education institutions, school teachers, teacher organizations, academic and professional associations, and other sources specified by the MDE. Evaluation Systems documents the nominations of eligible educators for the MDE, and the MDE reviews educator applications to select the educators to invite to serve on the CACs based on their qualifications (e.g., content training, years of experience, accomplishments).

Committee members are selected to include

  • school teachers (typically a majority of the members), and
  • higher education faculty preparing prospective educators (arts and sciences, fine arts, and/or education faculty).

In addition, committee members are selected with consideration given to the following criteria:

  • Representation from different levels of teaching (i.e., early childhood, elementary, middle, and secondary levels)
  • Representation from professional associations and other organizations
  • Representation from diverse racial, ethnic, and cultural groups
  • Representation from all genders
  • Geographic representation
  • Representation from diverse school settings (e.g., urban areas, rural areas, large schools, small schools, charter schools)

The CACs meet during the test development process for the following activities:

  • Teacher task list review
  • Test framework/objectives and test specifications review
  • Test item review and validation
  • Marker response selection, for those fields with constructed-response items
  • Standard setting
  • Linking activity

Bias Review Committee

A Bias Review Committee (BRC) comprising up to 15 Michigan educators has been established to review new and updated test materials to help prevent potential bias. BRC members are a diverse group of educators who represent individuals with disabilities and the racial, gender, ethnic, and regional diversity of Michigan.

The establishment of the BRC mirrored the process for establishing the CACs. Educators were nominated and encouraged to apply for membership by school administrators, deans at higher education institutions, school teachers, teacher organizations, academic and professional associations, and other sources specified by the MDE. The MDE reviews educator applications and selects educators to serve on the BRC based on their qualifications.

In general, the committee members are selected with consideration given to the following criteria:

  • Representation from different levels of teaching (e.g., early childhood, elementary, middle, and secondary levels)
  • Representation from diverse racial, ethnic, and cultural groups
  • Representation from all genders
  • Geographic representation
  • Representation from diverse school settings (e.g., urban areas, rural areas, large schools, small schools, charter schools)
  • Representation from professional associations and other related educational organizations

The BRC meets during the test development process for the following activities:

  • Review of test objectives and assessment specifications
  • Review of test items
  • Marker response selection*
  • Standard setting*

*For these activities, members of the BRC are invited to participate for test fields in which they are certified and practicing or preparing candidates for certification.

The BRC works on a parallel track with the CACs. Typically, the BRC reviews materials shortly before the materials are reviewed by the CACs. BRC members are provided with a copy of Fairness and Diversity in Tests (Evaluation Systems, 2009) before beginning their work. The BRC recommends revisions, as needed, to the test materials. Their comments and suggestions are communicated to the CAC members, who make revisions to address the issues raised by the BRC. If the revisions by the CAC differ from those suggested by the BRC, a member of the BRC and a member of the CAC are asked to mutually agree on alternative revisions.

Standard Setting Panels

A Standard Setting Panel of up to 22 Michigan educators is established for each test field to provide judgments to be used in setting the passing scores for the tests. Because the number of educators certified and available to participate varies across the different test fields, the number of panel members varies accordingly. These panels typically include some members from the CAC for the field and, in some cases, BRC members qualified in the field, as well as additional educators meeting the same eligibility guidelines as the CAC members.

The selection process for the Standard Setting Panels mirrors the selection process for the CACs.

Panel members are approved by the MDE to include

  • public school educators (typically a majority of the members), and
  • higher education faculty preparing prospective educators (arts and sciences, fine arts, and/or education faculty).

In addition, panel members are selected with consideration given to the following criteria:

  • Representation from different levels of teaching (i.e., early childhood, elementary, middle, and secondary levels)
  • Representation from professional associations and other organizations
  • Representation from diverse racial, ethnic, and cultural groups
  • Representation from all genders
  • Geographic representation
  • Representation from diverse school settings (e.g., urban areas, rural areas, large schools, small schools, charter schools)

Members of the panels make recommendations that are used by the MDE, in part, in establishing the passing score for each test.

Test Objective Development and Review

Standard 11.2 Evidence of validity based on test content requires a thorough and explicit definition of the content domain of interest. Standards for Educational and Psychological Testing (AERA, APA & NCME, 2014)

Test Objectives

As indicated previously, validity refers to the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests. The validity evidence for the MTTC tests focuses on the use of the MTTC tests for making pass/fail determinations for the purpose of teacher certification. In order to make those pass/fail determinations appropriately, it is important that the test content be explicitly defined and that it align with relevant Michigan teacher preparation standards, requirements, and practice regarding teacher certification. The MTTC Test Objectives serve the purpose of providing explicit descriptions of the content eligible to be included on the tests.

The purposes of the test objectives include

  • establishing a link between test content and Michigan legal, policy, and regulatory sources;
  • communicating to policymakers, educators, and other stakeholders how standards and expectations for teachers in Michigan are embodied in the MTTC tests;
  • presenting an organized summary of content knowledge expectations for candidates preparing to take the test as well as higher education faculty responsible for preparing prospective educators; and
  • providing a structure for score reporting and score interpretation.

The MTTC test objectives (available on the MTTC program website) include a table indicating the weighting of the content subareas of the test. A sample is provided below.

MATHEMATICS (SECONDARY)

Test objectives weighting by number of questions per subarea
Subarea | Range of Objectives | Approximate Percentage of Questions on Test
I. Mathematical Processes and Number Concepts | 001–004 | 22%
II. Patterns, Algebraic Relationships, and Functions | 005–009 | 28%
III. Measurement and Geometry | 010–013 | 22%
IV. Data Analysis, Statistics, Probability, and Discrete Mathematics | 014–018 | 28%

The test objectives, contained in the test frameworks, provide the subareas, objectives, and descriptive statements that define the content of the test. A sample is provided below.

Subarea I—MATHEMATICAL PROCESSES AND NUMBER CONCEPTS

Objective 002—Understand problem-solving strategies, connections among different mathematical ideas, and the use of mathematics in other fields.

Includes:

  • devising, carrying out, and evaluating a problem-solving plan
  • applying a range of strategies (e.g., drawing a diagram, working backwards, creating a simpler problem) to solve problems
  • analyzing problems that have multiple solutions
  • selecting an appropriate tool or technology to solve a given problem
  • recognizing connections among two or more mathematical concepts (e.g., Fibonacci numbers and the golden rectangle; symmetry and group theory)
  • exploring the relationship between geometry and algebra
  • applying mathematics across the curriculum and in everyday contexts

Preparation of Test Objectives

As an initial step in preparing the MTTC Test Objectives, Evaluation Systems, in partnership with the Michigan Department of Education (MDE), systematically reviews relevant documents that establish the basis for the content of the tests and incorporates the content of the documents into the draft test objectives. The test objectives reflect the relevant Michigan standards and requirements in each field, focusing mainly on the appropriate Michigan Standards for the Preparation of Teachers. Evaluation Systems may also review additional documents that were referenced by the State of Michigan in preparing the teacher preparation standards. These documents may include learning/teaching standards prepared by national organizations, books used in educator preparation programs, and similar resources, as appropriate.

Documentation of Correspondence between Test Objectives and Sources

Standard 11.3 When test content is a primary source of validity evidence in support of the interpretation for the use of a test for employment decisions or credentialing, a close link between test content and the job or professional/occupational requirements should be demonstrated. Standards for Educational and Psychological Testing (AERA, APA & NCME, 2014)

Evaluation Systems prepared, as a component of the documentation of the MTTC program for validation purposes, correlation charts linking the test objectives to the Michigan sources from which they were derived. The correlation charts focus on the links between the test objectives and the relevant Michigan standards.

Assessment Specifications

Standard 4.2 In addition to describing intended uses of the test, the test specifications should define the content of the test, the proposed test length, the item formats, the desired psychometric properties of the test items and the test, and the ordering of items and sections. Test specifications should also specify the amount of time allowed for testing; directions for the test takers; procedures to be used for test administration, including permissible variations; any materials to be used; and scoring and reporting procedures. Standards for Educational and Psychological Testing (AERA, APA & NCME, 2014)

Assessment specifications documents that describe major aspects of test design and test administration are prepared at the outset of MTTC test development. The assessment specifications contain two sections: an introductory section designed to be consistent across tests, and a test-specific section. The introductory section, prepared by Evaluation Systems and the MDE, contains information regarding

  • the background and purpose of the MTTC test;
  • the purpose of the test objectives, and the structure of the test content into subareas, objectives, and descriptive statements;
  • test item formats;
  • bias prevention;
  • test composition and length;
  • test administration and testing time;
  • test scoring; and
  • test reporting.

The purpose of the introductory section of the assessment specifications is to provide the BRC and CACs with contextual information about the tests and test operations to better enable them to review assessment materials and conduct other test development tasks. Additionally, the information helps preserve consistency across MTTC tests during the development process.

The field-specific Measurement Notes section of the assessment specifications is drafted by Evaluation Systems for review and revision by the BRC and relevant CAC. This section includes information such as field-specific terminology to be used in the test items, resources to be consulted, specifications regarding item stimuli, and constructed-response item guidelines. The purpose of the Measurement Notes is to provide a mechanism for communicating agreed-upon item development specifications to test developers.

Bias Review of Test Objectives and Assessment Specifications

Standard 3.2 Test developers are responsible for developing tests that measure the intended construct and for minimizing the potential for tests being affected by construct-irrelevant characteristics, such as linguistic, communicative, cognitive, cultural, physical, or other characteristics. Standards for Educational and Psychological Testing (AERA, APA & NCME, 2014)

The BRC serves a central role in helping to safeguard that the tests measure intended constructs and in minimizing the potential for irrelevant characteristics affecting examinees' scores. The BRC is convened at the beginning of the test development process to review the test objectives and assessment specifications to help determine if the materials contain characteristics irrelevant to the constructs being measured that could interfere with some test takers' ability to respond. The BRC uses bias review criteria established for the MTTC program regarding content, language, offensiveness, and stereotypes. Committee members are asked to review the proposed test objectives (including subareas, objectives, and descriptive statements) and the Measurement Notes section of the assessment specifications according to the following review criteria:

Objectives

Content: Does any element of the objectives or descriptive statements contain content that disadvantages a person because of her or his gender, sexual orientation, gender identity or expression, race, nationality, ethnicity, religion, age, disability, or cultural, economic, or geographic background?

Language: Does the language used to describe any element of the objectives or descriptive statements disadvantage a person because of her or his gender, sexual orientation, gender identity or expression, race, nationality, ethnicity, religion, age, disability, or cultural, economic, or geographic background?

Offense: Is any element of the objectives or descriptive statements presented in such a way as to offend a person because of her or his gender, sexual orientation, gender identity or expression, race, nationality, ethnicity, religion, age, disability, or cultural, economic, or geographic background?

Stereotypes: Does any element of the objectives or descriptive statements contain language or content that reflects a stereotypical view of a group based on gender, sexual orientation, gender identity or expression, race, nationality, ethnicity, religion, age, disability, or cultural, economic, or geographic background?

Diversity: Does the list of objectives and descriptive statements permit appropriate inclusion of content that reflects the diversity of the Michigan population?

Assessment Specifications

Content: Does any element of the test specifications contain content that disadvantages a person because of her or his gender, race, nationality, ethnicity, religion, age, disability, or cultural, economic, or geographic background?

Language: Does the language used to describe any element of the test specifications disadvantage a person because of her or his gender, race, nationality, ethnicity, religion, age, disability, or cultural, economic, or geographic background?

Offense: Is any element of the test specifications presented in such a way as to offend a person because of her or his gender, race, nationality, ethnicity, religion, age, disability, or cultural, economic, or geographic background?

Stereotypes: Does any element of the test specifications contain language or content that reflects a stereotypical view of a group based on gender, race, nationality, ethnicity, religion, age, disability, or cultural, economic, or geographic background?

Diversity: Do the test specifications permit appropriate inclusion of content that reflects the diversity of the Michigan population?

The BRC reviews the draft test objectives and assessment specifications with guidance from Evaluation Systems facilitators, and members are asked to come to consensus regarding any recommended revisions. Recommendations for revisions are presented to the CAC convened for review of the same materials. The CAC is instructed to address all bias-related issues raised by the BRC. If a revision by the CAC differs substantively from what was suggested by the BRC, follow-up is conducted with a member of the BRC to make sure the revision is mutually agreed-upon.

Content Reviews of Test Objectives and Assessment Specifications

The CACs are convened to review the proposed test objectives, including descriptive statements and subareas, as well as the assessment specifications.

For the test objectives, the CAC uses review criteria regarding the structure of the test objectives (including the weighting of the content subareas containing the test objectives) and the content of the objectives. Committee members apply the following review criteria established for the MTTC program related to program purpose, organization, inclusiveness, significance, accuracy, freedom from bias, and job-relatedness:

Structure of Test Objectives

Program Purpose: Is the framework (set of test objectives) consistent with the purpose of the MTTC tests (i.e., to determine whether prospective teachers have the knowledge and skills to perform effectively the job of a qualified educator in Michigan)?

Organization: Is the framework (set of test objectives) organized in a reasonable way? Are the subarea headings accurate and do they clearly describe the content?

Inclusiveness: Is the content of the framework complete? Does the framework (set of test objectives) reflect the knowledge and skills an educator should have in order to teach the content? Is there any content that should be added?

Objectives

Significance: Do the objectives describe knowledge and skills that are important for an educator to have?

Accuracy: Do the objectives accurately reflect the content as it is understood by educators in the field? Are the objectives stated clearly and accurately, using appropriate terminology?

Freedom from Bias: Are the objectives free from elements that might potentially disadvantage an individual because of her or his gender, sexual orientation, gender identity or expression, race, nationality, ethnicity, religion, age, disability, or cultural, economic, or geographic background?

Job-Relatedness: Do the objectives cover important knowledge and skills that an educator should have in order to perform effectively the job of a qualified Michigan educator?

CAC members are also asked to review the proposed Measurement Notes section of the assessment specifications according to the following review criteria:

Assessment Specifications

Program Purpose: Are the test specifications consistent with the purpose of the MTTC program (i.e., to determine whether prospective teachers have the knowledge and skills to perform effectively the job of a qualified educator in Michigan)?

Significance: Do the test specifications describe knowledge and skills that are important for educators to have?

Accuracy: Do the test specifications accurately reflect the content as it is understood by educators in the field? Are the test specifications stated clearly and accurately, using appropriate terminology?

Freedom from Bias: Are the test specifications free of elements that might potentially disadvantage an individual because of her or his gender, race, ethnicity, nationality, religion, age, disability, or cultural, economic, or geographic background?

Job-Relatedness: Do the test specifications cover important knowledge and skills that an educator should have in order to perform effectively the job of a qualified Michigan educator?

The CAC reviews and revises the draft test objectives and Measurement Notes section of the assessment specifications through a process of discussion and consensus, with the guidance of an Evaluation Systems facilitator. During the committee discussion, members incorporate revisions suggested by the BRC. Following the committee's consensus review and revision of the test objectives, committee members independently provide a validity rating to verify that the final objectives, as agreed upon in the consensus review, are significant, accurate, free from bias, and job-related. Committee members also have the opportunity to make additional comments regarding the test objectives.

Following the review meeting, Evaluation Systems revises the test objectives and assessment specifications according to the recommendations of the CAC. The MDE approves the draft test objectives for use in the content validation survey and the assessment specifications for use in test item development.

Content Validation Surveys

Standard 11.3 When test content is a primary source of validity evidence in support of the interpretation for the use of a test for employment decisions or credentialing, a close link between test content and the job or professional/occupational requirements should be demonstrated. Standards for Educational and Psychological Testing (AERA, APA & NCME, 2014)

The MTTC content validation surveys are an important component of the validity evidence in support of the content of the MTTC tests. The surveys validate the test objectives that form the basis of test content by ascertaining that job incumbents (i.e., Michigan school teachers) and educator experts (i.e., educator preparation faculty) consider the content of each test objective important for teaching. The surveys provide additional evidence of linkage of the test content to job requirements, beyond the correlation charts linking the test objectives to the relevant Michigan standards.

The purpose of the surveys is to obtain judgments from Michigan school teachers and educator preparation faculty about

  • the importance of each objective for a qualified educator in Michigan schools;
  • how well each set of descriptive statements represents important aspects of the corresponding objective; and
  • how well the set of objectives, as a whole, represents the content knowledge and skills needed for a qualified educator in Michigan schools.

Survey of Michigan School Teachers

To be eligible to respond to the survey of school teachers, an individual needs to be a certified, practicing educator holding a Michigan teaching endorsement and teaching assignment corresponding to the test field. A database of teachers assigned to each teaching field for each school district in Michigan is provided by the Michigan Department of Education (MDE) for use in drawing a random sample of educators. Typically, 200 school teachers are randomly selected for a field (or the entire population, for fields with fewer than 200 educators statewide).

In order to meet expectations established with the MDE for representation of minority educators among survey participants, African American teachers are sampled for each test field at roughly twice the rate at which they are represented in the population of teachers for the test field. This guideline applies insofar as sufficient numbers of teachers are available in individual fields. The population and proposed sample for each field are reviewed by the MDE.
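
To illustrate the sampling approach described above, the sketch below draws a hypothetical sample for one field, oversampling African American teachers at roughly twice their population rate and falling back to the full population for small fields. The data layout, helper function, and capping logic are illustrative assumptions; the operational sampling is carried out by Evaluation Systems and reviewed by the MDE.

```python
import random

def draw_survey_sample(teachers, sample_size=200, oversample_factor=2, seed=0):
    """Draw a random sample of teachers for one test field, oversampling
    African American teachers at roughly twice their population rate.
    A simplified, hypothetical illustration of the approach described above.

    `teachers` is a list of dicts with a "race_ethnicity" key.
    """
    rng = random.Random(seed)
    if len(teachers) <= sample_size:
        return list(teachers)  # use the entire population for small fields

    aa = [t for t in teachers if t["race_ethnicity"] == "African American"]
    others = [t for t in teachers if t["race_ethnicity"] != "African American"]

    # Target share for African American teachers: about twice their share of
    # the field's population, capped by the number actually available.
    aa_rate = len(aa) / len(teachers)
    n_aa = min(len(aa), round(sample_size * min(1.0, oversample_factor * aa_rate)))
    n_other = sample_size - n_aa

    return rng.sample(aa, n_aa) + rng.sample(others, n_other)
```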

Advance notification materials are sent to principals of schools with sampled educators. A subsequent mailing to principals includes instruction letters and Web survey access codes for sampled educators. Principals are asked to distribute the materials to the sampled educators.

To determine eligibility to complete the survey, recipients respond to the following question at the beginning of the survey (with slight variations for certain fields): Are you now teaching or have you taught in Michigan as a certified teacher in the field indicated above during this or the previous school year?

Respondents provide background information about their highest level of education attained, gender, race/ethnicity, years of professional teaching experience, primary teaching assignment level, and primary teaching environment.

The school teachers are asked to respond to the following questions (with slight variations for certain fields):

Objective rating question: In your job as a Michigan educator in this field, of how much importance is the objective below to an understanding of the content of this certificate or endorsement area?

1 = no importance
2 = little importance
3 = moderate importance
4 = great importance
5 = very great importance

Descriptive statement rating question: How well does the set of descriptive statements represent important aspects of the knowledge and skills addressed by the objective?

1 = poorly
2 = somewhat
3 = adequately
4 = well
5 = very well

Overall rating question: How well does the set of objectives as a whole represent important aspects of the knowledge and skills required for performing the job of a qualified Michigan educator in this field?

1 = poorly
2 = somewhat
3 = adequately
4 = well
5 = very well

For each survey question, participants are asked to provide a comment for any rating less than "3," including noting any additional important areas of content that should be included.

Evaluation Systems monitors the survey access codes and tracks responses by school and educator. A number of follow-up activities are conducted with non-responding schools and educators, including telephone calls, emailing or re-mailing survey materials to schools, and extending the deadline for returns.

Survey of Faculty at Michigan Institutions of Higher Education

A separate content validation survey is conducted with faculty at Michigan colleges and universities offering approved teacher education programs in the specified fields. To be eligible to participate in the survey, a faculty member must be teaching one or more education courses or academic specialization courses in the content fields being surveyed. Responses from up to 100 faculty are targeted for each field, where that many eligible faculty exist. Typically, institutions are categorized as having high, medium, or low enrollment for a given field, and the number of surveys distributed to each preparation program is based on this categorization.

Advance notification letters are sent to educator preparation program contacts at higher education institutions included in the sample. A subsequent mailing with instruction letters and Web survey access codes is sent to the designated contact person at each institution, along with instructions for identifying faculty members eligible to complete the survey.

To determine eligibility to complete the survey, recipients respond to the following question at the beginning of the survey (with slight variation for certain fields):

During this or the previous school year, are you now teaching or have you taught in Michigan undergraduate or graduate courses in the field indicated above to undergraduate or graduate education candidates?

Respondents provide background information about their highest level of education attained, gender, race/ethnicity, years of teaching at a college or university, grade level(s) for which their candidates are preparing to teach, and college/department affiliation.

Faculty members are asked to respond to the following questions (with slight variations for certain fields):

Objective rating question: To a person preparing for a job as a Michigan educator in the field indicated above, of how much importance is the objective below to an understanding of the content of this certificate or endorsement area?

1 = no importance
2 = little importance
3 = moderate importance
4 = great importance
5 = very great importance

Descriptive statement rating question: How well does the set of descriptive statements represent important aspects of the knowledge and skills addressed by the objective?

1 = poorly
2 = somewhat
3 = adequately
4 = well
5 = very well

Overall rating question: How well does the set of objectives as a whole represent important aspects of the knowledge and skills required for performing the job of a qualified Michigan educator in this field?

1 = poorly
2 = somewhat
3 = adequately
4 = well
5 = very well

For each survey question, participants are asked to provide a comment for any rating less than "3," including noting any additional areas of content that should be included.

Evaluation Systems monitors the survey access codes and tracks responses by institution and faculty member. A number of follow-up activities are conducted with non-responding institutions and faculty members, including reminder emails, telephone calls, re-providing survey access codes, and extending the deadline for returns.

Analysis of the Content Validation Surveys

Evaluation Systems analyzes the content validation data for each field separately for school teachers and teacher educators. The following reports are produced and provided to the MDE.

Content Validation Survey Population/Sample/Respondents Demographics: Indicates the composition of the educator group for the population, sample, and survey respondents (for school teacher survey only).

Survey Return Rate by Field and Return Status: Indicates the number and percent of surveys distributed and returned.

Demographic Summary Report: Indicates participant responses to the eligibility and background information questions.

Objective Rating Report: Indicates the average importance rating given to each objective and the average across all objectives. For fields in which oversampling of minority groups was conducted for the school teacher survey, weighted data are provided to appropriately take the oversampling into account.

Descriptive Statement Rating Report: Indicates the average rating given to each set of descriptive statements and the average across all sets of descriptive statements. For fields in which oversampling of minority groups was conducted for the school teacher survey, weighted data are provided to appropriately take the oversampling into account.

Composite Rating Report: Indicates the average rating given to the set of objectives as a whole. For fields in which oversampling of minority groups was conducted for the school teacher survey, weighted data are provided to appropriately take the oversampling into account.

Respondent comments regarding the objectives, descriptive statements, and set of objectives as a whole are sorted and categorized to facilitate review (e.g., sorted by relevant objective).

The analyses of survey return rates, demographic summaries, survey ratings, and participant comments are provided to the MDE for review. The MDE determines if any changes to test objectives are warranted based on the survey results (e.g., additions or revisions to content or terminology in a descriptive statement). Objectives indicated as important (objectives with mean importance ratings of 3.00 or higher for each respondent group) by both school teachers and teacher educators are considered eligible for inclusion on the test. Any objectives with a mean importance rating of less than 3.00 from either respondent group are identified for further review and discussion (with the MDE and/or MTTC content advisory committee members). For more information, see the summary of content validation survey importance ratings for all fields that have undergone content validation.
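
As a hedged illustration of the rating analysis described above, the sketch below computes weighted mean importance ratings per objective (with weights offsetting any oversampling in the school teacher survey) and applies the 3.00 eligibility rule to the teacher and faculty results. The data structures and the way weights are supplied are illustrative assumptions, not the program's actual analysis code.

```python
from collections import defaultdict

def weighted_mean_ratings(responses, weights):
    """Compute a weighted mean importance rating per objective.

    `responses`: list of (respondent_group, objective_id, rating) tuples.
    `weights`: dict mapping a respondent group (e.g., an oversampled group)
    to a weight that restores that group's share in the population.
    A simplified, hypothetical sketch of the weighting described above.
    """
    sums = defaultdict(float)
    wsum = defaultdict(float)
    for group, objective, rating in responses:
        w = weights.get(group, 1.0)
        sums[objective] += w * rating
        wsum[objective] += w
    return {obj: sums[obj] / wsum[obj] for obj in sums}

def eligible_objectives(teacher_means, faculty_means, threshold=3.00):
    """An objective is eligible when both respondent groups rate it at or
    above 3.00; anything below the threshold in either group is flagged
    for further review and discussion."""
    eligible, flagged = [], []
    for obj in teacher_means:
        if teacher_means[obj] >= threshold and faculty_means.get(obj, 0) >= threshold:
            eligible.append(obj)
        else:
            flagged.append(obj)
    return eligible, flagged
```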

Job Analysis Study: Document Linkage between the Test Objectives and the Job of a Teacher in Michigan

As an additional way of gathering validity evidence in support of the use of MTTC tests as a requirement for certification as a Michigan educator, a Job Analysis Study is conducted for all newly developed and redeveloped test fields.

The job analysis activities entail the one-time creation of a job task list via facilitated committee discussion, a one-time job analysis survey based on the job task list, and linkage studies conducted for each test field under development/redevelopment. Linkage studies may occur in conjunction with either the objective review meeting or the standard setting meeting for the associated test field.

The job analysis study entails the following four steps:

  1. Creation of the Michigan-specific Teacher Job Task List
  2. Job Analysis Committee meetings
  3. Job Analysis Survey
  4. Linkage Studies

Creation of the Michigan-specific Teacher Job Task List. In the summer of 2018, Evaluation Systems prepared a draft Teacher Job Task List that enumerates and describes the major tasks performed by Michigan educators in the classroom. The Teacher Job Task List includes work activities, abilities, skills, knowledge, tasks, tools, and technology that teachers must master to be effective and proficient in their everyday work. For the purposes of this study, a task was defined as a work activity that has an identifiable beginning and end and results in a meaningful outcome. In the context of education, student learning is arguably the most important outcome to result from an educator's work activities. As a result, the focus of the draft Teacher Job Task List was to include those tasks performed by educators that lead to the desired outcome of student learning.

The following documents were used to inform the creation of a draft Michigan-specific Teacher Job Task List.

  • High-Leverage Practices online
  • InTASC Model Core Teaching Standards
  • Universal Design for Learning (UDL) Guidelines
  • A Data-Driven Focus on Student Achievement (Marzano)
  • The 2011 Framework for Teaching Evaluation Instrument (Danielson)
  • edTPA Assessment Handbooks
  • U.S. Department of Labor information about what a teacher needs to do (from O*NET Online)

Job Analysis Committee meetings. Once the draft Michigan Teacher Job Task List was developed, Job Analysis Committee meetings were conducted to create a final task list to serve as the basis for a Job Analysis Survey, pending MDE approval.

Overall structure. Four job analysis committees (representing grade bands PK–3, 3–6, 5–9, and 7–12, to reflect the newly emerging Michigan teacher certification structure) were assembled for the review of the draft teacher task list. Each committee included approximately eight teachers with certificates/endorsements across content areas, as appropriate.

Participant Recruitment. Evaluation Systems collaborated with the MDE to complete the recruitment process. Potential participants were recruited based on the parameters of the new state teacher certification structure. Teachers who qualified for potential committee participation were submitted for MDE review.

Process. Each job analysis committee meeting began with an orientation explaining the purpose of the Job Analysis Study and the role of the committee in reviewing the draft Michigan Teacher Job Task List. The committees reviewed the draft Michigan Teacher Job Task List, adding, deleting, or revising as they saw fit to represent their work as teachers. Any recommended revisions to the task list were documented by each committee, and participants noted their approval for the record.

After each of the four grade band committees met and edited the task list, two representatives from each committee were convened into a new committee (a "consensus" committee). The consensus committee reviewed all edits from each of the individual grade band committees and came to consensus on the content of the final Michigan Teacher Job Task List. This final list is intended to represent the work of all teachers across the state of Michigan, regardless of grade level or teaching assignment. The MDE approved the final Michigan Teacher Job Task List for use in the Job Analysis Survey.

Job Analysis Survey. Evaluation Systems developed a job analysis survey from the MDE-approved Michigan Teacher Job Task List. Respondents were asked to provide ratings, for each task in the Teacher Job Task List, indicating the relative time spent on the task and the importance of the task.

The job analysis survey respondent sampling plan sought survey respondents constituting a relatively large and representative sample of teachers, as determined by the MDE and Evaluation Systems (based on certificate/endorsement area, demographic characteristics, job specialty areas, etc.). A stratified random sampling design, as is used for the content validation surveys, was employed. The target survey sample included approximately 400 individuals from each educator job assignment group (or the entire population if a group had fewer than 400 educators). The educator job assignment groups were defined to mirror the grade band committees from the job analysis committee meetings, i.e., Elementary (PK–3), Elementary (3–6), Middle Grades (5–9), and Secondary (7–12).

Once the survey was closed, Evaluation Systems summarized data for the two task ratings—relative time spent and importance—both for the total sample and separately for any other appropriate variables.

Additionally, Evaluation Systems conducted analyses based on the joint distribution of time spent and importance ratings for tasks (and work behaviors, if used and suitable) to provide an overall measure of task (and work behavior) criticality to the job. An index of the extent to which survey respondents agreed in their evaluations of importance and relative time spent was also calculated. Evaluation Systems described and documented the data analyses conducted in a final report to the MDE. These data provide the basis for all subsequent linkage studies.
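
The report prepared for the MDE is not reproduced here, so the sketch below shows one plausible way a task criticality index and a simple agreement measure could be computed from the survey ratings (mean importance multiplied by mean relative time spent, with agreement summarized by the standard deviation of the importance ratings). Both indices are illustrative assumptions rather than the documented analysis.

```python
import statistics

def task_summary(importance_ratings, time_ratings):
    """Summarize one task from job analysis survey ratings.

    Returns the mean importance, mean relative time spent, a simple
    criticality index (here, the product of the two means), and an
    agreement value (standard deviation of the importance ratings;
    smaller means closer agreement). The specific indices are
    illustrative assumptions, not the program's documented formulas.
    """
    mean_importance = statistics.mean(importance_ratings)
    mean_time = statistics.mean(time_ratings)
    criticality = mean_importance * mean_time
    importance_sd = statistics.stdev(importance_ratings) if len(importance_ratings) > 1 else 0.0
    return {
        "mean_importance": mean_importance,
        "mean_time_spent": mean_time,
        "criticality": criticality,
        "importance_sd": importance_sd,
    }
```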

Linkage Studies. For each redeveloped or newly developed MTTC test, Evaluation Systems conducts a linkage study to determine and document the relationship between important knowledge, skills, and abilities (KSAs), as included in the test objectives, and important job tasks (as defined by the job analysis survey results).

An important element of validity in employment settings entails demonstrating that KSAs measured in testing are indeed important to the actual performance of jobs. While the job analysis survey is used to identify critical job tasks for teachers, the linkage study establishes, for each redeveloped or newly developed MTTC test, the importance of the KSAs (as outlined in the test framework) that make up the test. That is, the linkage study is undertaken to establish the direct relationship between the objectives measured by MTTC tests and important behavioral aspects of the job of teaching.

Process. For each redeveloped or newly developed MTTC test, members of the CAC are convened to provide linkage judgments. The linkage study activity takes place in conjunction with either the content review of each newly developed set of test objectives, or the standard setting meeting for the test. During the linkage study, Evaluation Systems provides committee members with a description of the job analysis survey design and a summary of the results in order for them to achieve a sound understanding of the research efforts and the meaning of the data collected and summarized. Committee members independently judge the relevance and importance of the objectives and descriptive statements comprising the tests to the critical tasks identified in the job analysis survey.

Materials. Evaluation Systems has developed rating forms, including rating scales and instructions, for committee member use in providing their judgments about the relevance and importance of each objective to each of the tasks in the teacher task list. A sample linkage study response sheet for committee members is shown below.

[Figure: Sample linkage study response sheet for rating the importance of three objectives]

Outcomes. Following each linkage study activity, Evaluation Systems analyzes the data captured at the committee meeting following procedures similar to those used for the job analysis survey, and presents the results to the MDE in the form of a report.

Test Item Development and Review, Field Testing, and Marker Response Establishment

Standard 4.7 The procedures used to develop, review, and try out items and to select items from the item pool should be documented. Standards for Educational and Psychological Testing (AERA, APA & NCME, 2014)

For test fields being newly developed, the test items are newly written and undergo a set of rigorous reviews, as described in this section. For test fields being updated, either test items are newly developed, or previously written items from the existing test item bank that exhibit strong performance statistics are considered for continued use in the updated test item bank. All newly written items undergo a full review process, while pre-existing items undergo a validation process to ensure that they remain current and appropriate; that is, that they still meet the item review criteria (as described below under "Content Review of Test Items") in the context of the redeveloped test objectives.

Test Item Preparation

Test item preparation combines the expertise of content specialists (i.e., experts in the field-specific content areas) and experienced item development specialists. Evaluation Systems supervises the item drafting process, which involves program staff, content experts, psychometricians, and item development specialists. Item development teams are provided with program policy materials (e.g., the appropriate Michigan Standards for the Preparation of Teachers), committee-approved test objectives and assessment specifications, Evaluation Systems item preparation and bias prevention materials, and additional materials as appropriate for the field (e.g., textbooks, online resources).

Before the development of a full item bank for a specific test field, Evaluation Systems provides examples, or prototypes, of the kinds of test items that will be developed for the field, in accordance with the MDE-approved Assessment Specifications. If any fields require the development of constructed-response items, draft rubrics are also developed in concert with constructed-response item prototypes to aid reviewers in providing feedback on those item prototypes. The MDE reviews the item prototypes and provides feedback before item development for a test field begins.

For fields being updated, items that match the new test objectives and have appropriate psychometric characteristics are eligible to be retained in the item bank for validation by MTTC content advisory committees. Additionally, some items from the previous bank are revised and reviewed, along with the newly written items, by the bias review and content advisory committees.

Preliminary versions of test items are reviewed by specialists with content expertise in the appropriate field (e.g., teachers, college faculty, other specialists) as a preliminary check of the items' accuracy, clarity, and freedom from bias.

Bias Review of Test Items

Standard 3.2 Test developers are responsible for developing tests that measure the intended construct and for minimizing the potential for tests being affected by construct-irrelevant characteristics, such as linguistic, communicative, cognitive, cultural, physical, or other characteristics. Standards for Educational and Psychological Testing (AERA, APA & NCME, 2014)

The BRC is convened to review draft test items to help safeguard that the test items measure the intended constructs and to minimize characteristics irrelevant to the constructs being measured that could interfere with some test takers' ability to respond. The BRC reviews items according to the following established bias review criteria for the MTTC regarding content, language, offensiveness, and stereotypes:

Content: Does the test item contain content that disadvantages a person based on gender, sexual orientation, gender identity or expression, race, nationality, ethnicity, religion, age, disability, or cultural, economic, or geographic background?

Language: Does the test item contain language that disadvantages a person based on gender, sexual orientation, gender identity or expression, race, nationality, ethnicity, religion, age, disability, or cultural, economic, or geographic background?

Offensiveness: Is the test item presented in such a way as to offend a person based on gender, sexual orientation, gender identity or expression, race, nationality, ethnicity, religion, age, disability, or cultural, economic, or geographic background?

Stereotypes: Does the test item contain language or content that reflects a stereotypical view of a group based on gender, sexual orientation, gender identity or expression, race, nationality, ethnicity, religion, age, disability, or cultural, economic, or geographic background?

Diversity: Taken as a whole, do the items include content that reflects the diversity of the Michigan population?

The BRC reviews the draft test items with the guidance of Evaluation Systems facilitators and is asked to come to consensus regarding any recommended revisions. The BRC also has the opportunity to submit content-related questions for consideration by the CAC. Recommendations for revisions and content-related questions are presented to the CAC convened for review of the same materials. The CAC is instructed to address all bias-related issues raised by the BRC. If a revision by the CAC differs substantively from what was suggested by the BRC, follow-up is conducted with a member of the BRC to make sure the revision is mutually agreed-upon or the item is deleted.

Content Review of Test Items

Standard 4.8 The test review process should include empirical analyses and/or the use of expert judges to review items and scoring criteria. When expert judges are used, their qualifications, relevant experiences, and demographic characteristics should be documented, along with instructions and training in the item review process that the judges receive. Standards for Educational and Psychological Testing (AERA, APA & NCME, 2014)

The CACs, composed of Michigan school teachers and faculty preparing prospective educators associated with the test field, serve as expert judges to review the test items. See Establishing Advisory Committees for further information about the qualifications of the committee members.

CAC members review draft test items according to review criteria established for the MTTC program regarding objective match, accuracy, freedom from bias, and job-relatedness.

For their review of draft test items, committee members apply the following criteria:

Objective Match:

  • Does the item measure an important aspect of the test objectives?
  • Is the level of difficulty appropriate for the testing program?
  • Are the items, as a whole, consistent with the purpose of the MTTC program?

Accuracy:

  • Is the content accurate?
  • Is the terminology in the item correct and appropriate for Michigan?
  • Is the item grammatically correct and clear in meaning?
  • Is the correct response accurately identified?
  • Are the distractors plausible yet clearly incorrect?
  • Are the stem and response alternatives clear in meaning?
  • Is the wording of the item stem free of clues that point toward the correct answer?
  • Is the graphic (if any) accurate and relevant to the item?

Freedom from Bias:

  • Is the item free of language, content, or stereotypes that might potentially disadvantage or offend an individual based on gender, sexual orientation, gender identity or expression, race, nationality, ethnicity, religion, age, disability, or cultural, economic, or geographic background?
  • Are the items, as a whole, fair to all individuals regardless of gender, sexual orientation, gender identity or expression, race, nationality, ethnicity, religion, age, disability, or cultural, economic, or geographic background?
  • As a whole, do the items include content that reflects the diversity of the Michigan population?

Job-Relatedness:

  • Is the content job-related?
  • Does the item measure content or skills that an educator needs on the job in Michigan schools?
  • Does the item measure content or skills that an educator should be expected to know in order to perform effectively the job of a qualified Michigan educator (i.e., not learned on the job)?

The CAC reviews and revises the draft test items through a process of discussion and consensus, with the guidance of an Evaluation Systems facilitator. During the discussion, the committee incorporates revisions suggested by the BRC. Following the committee's review of each item and documentation of any changes made to the item, committee members independently provide a validity rating to verify that the final item, as agreed upon in the consensus review, was matched to the objective, accurate, free from bias, and job-related. Committee members also have the opportunity to make additional comments related to the review criteria.

Following the item review meetings, Evaluation Systems reviews the item revisions and validity judgments and revises the test items according to the recommendations of the CACs. Evaluation Systems documents the BRC recommendations and resolutions of the recommendations; additionally, any post-conference editorial revisions (beyond typographical revisions) are documented (e.g., rewording a committee revision for clarity or consistency with other response alternatives). The documentation of revisions is submitted to the MDE for final approval of the test items. The revised test items are then prepared for field testing throughout Michigan.

Field Testing

Standard 4.8 The test review process should include empirical analyses and/or the use of expert judges to review items and scoring criteria.

Standard 4.9 When item or test form tryouts are conducted, the procedures used to select the sample(s) of test takers as well as the resulting characteristics of the sample(s) should be documented. The sample(s) should be as representative as possible of the population(s) for which the test is intended. Standards for Educational and Psychological Testing (AERA, APA & NCME, 2014)

In addition to the review of test items by expert judges (members of the BRC and CACs), when permitted by sufficient candidate populations, empirical data are collected about statistical and qualitative characteristics of the new items through field testing. This additional information is used to refine the item banks before the items are used on a scorable basis on operational test forms.

Due to the nature of particular test fields or certification/endorsement areas, there are a variety of circumstances that may call for varied approaches to field testing. The methods of field testing may include the placement of items as non-scorable on operational test forms, stand-alone field testing, and/or focus group field testing. The method(s) of field testing to be used is generally determined by the number of candidates in the field.

Where the test design and number of candidates allow, new and revised multiple-choice test items for tests being updated are field tested on operational test forms, in nonscorable slots, during regularly scheduled test administrations. Operational test forms may be constructed to include scorable multiple-choice test items and nonscorable multiple-choice test items, with the scorable test items being drawn from the current test item bank and the nonscorable test items being newly developed test items intended for use with the redeveloped version of the test. Also, once the redeveloped test has been released, any items in the item bank that were not field tested will be included on test forms as nonscorable items only, for the purpose of collecting psychometric data on them. Candidates are unaware which items are scorable and which are nonscorable. Thus, the sample of field test takers mirrors the operational test-taking population in composition and motivation.

For some fields, stand-alone field testing is conducted with volunteer participants before the items are introduced on operational test forms. Field tests are administered under conditions that mirror an authentic testing situation, to candidates with characteristics generally mirroring those of candidates who plan to eventually take the operational MTTC test.

For test fields with very small numbers of candidates, such that subsequent statistical analyses may not be meaningful, focus group field testing is conducted. During a focus group field test, volunteer candidates participate in an item try-out during which they independently review a test form, answer each item, and provide judgments regarding the difficulty level and clarity of each item. A follow-up structured group interview is conducted to provide supplemental qualitative information related to the clarity, appropriateness, and difficulty of the test items. Focus group field testing is generally scheduled to occur in the days prior to the item review of the test items. In this way, the results of the focus group may be communicated to the associated content advisory committee for consideration in their review of the items.

For both stand-alone and focus group field testing, eligible participants generally include juniors and seniors enrolled in Michigan teacher preparation programs who are planning to seek a Michigan certificate or endorsement in the field being field tested. Volunteer participants are given an incentive for participating, such as a gift card or a voucher to offset future testing fees.

Field test forms are designed to allow participants to complete the test in a reasonable amount of time, typically one-and-a-half to two-and-a-half hours for a form or set of forms, in order to minimize any effects on the data from participant fatigue. Multiple field test forms are generally prepared for each field to allow for the collection of an adequate level of responses to the field test. If a field test includes constructed-response items, forms with more than one such item are counterbalanced (i.e., the order of the items is reversed on every other form), and field test forms are randomly distributed to participants.
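
As a minimal illustration of the counterbalancing and random distribution just described, the sketch below builds two forms with reversed constructed-response item order and assigns them to participants at random. The item identifiers and participant labels are hypothetical.

    import random

    # Hypothetical constructed-response items appearing on one field test form.
    cr_items = ["CR-1", "CR-2"]

    # Counterbalanced forms: the item order is reversed on every other form.
    forms = {"Form A": cr_items, "Form B": list(reversed(cr_items))}

    # Hypothetical volunteer participants; each is randomly assigned one form.
    participants = ["P01", "P02", "P03", "P04", "P05", "P06"]
    assignments = {p: random.choice(list(forms)) for p in participants}

    for participant, form in assignments.items():
        print(participant, form, forms[form])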

Field test responses to multiple-choice items are scored, and the following item statistics are generated:

  • Individual item p-values (percent correct)
  • Item-to-test point-biserial correlation
  • Distribution of participant responses (percent of participants selecting each response option)
  • Mean score by response choice (average score on the multiple-choice set achieved by all participants selecting each response option)
  • Mantel-Haenszel differential item functioning (DIF) analysis for test fields in which the numbers of participants in the focal and comparison groups (gender and ethnicity) are of sufficient size
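
As an illustration of how the first four statistics might be computed, the sketch below derives the p-value, item-to-test point-biserial correlation, response-option distribution, and mean score by response choice from a small set of hypothetical responses. The data and variable names are invented, and the Mantel-Haenszel DIF analysis is not shown.

    from collections import Counter
    from statistics import mean, pstdev

    def pearson(x, y):
        # Pearson correlation; with a dichotomous (0/1) item score this is the
        # item-to-test point-biserial correlation.
        mx, my = mean(x), mean(y)
        sx, sy = pstdev(x), pstdev(y)
        if sx == 0 or sy == 0:
            return float("nan")
        return mean((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)

    # Hypothetical responses: one row per participant, one column per item,
    # plus the keyed correct answers.
    responses = [
        ["A", "C", "B", "D"],
        ["A", "B", "B", "D"],
        ["B", "C", "B", "A"],
        ["A", "C", "D", "D"],
        ["A", "C", "B", "B"],
    ]
    key = ["A", "C", "B", "D"]

    # Each participant's total score on the multiple-choice set.
    totals = [sum(pick == k for pick, k in zip(row, key)) for row in responses]

    for item, keyed in enumerate(key):
        picks = [row[item] for row in responses]
        scores = [int(pick == keyed) for pick in picks]
        p_value = mean(scores)                        # percent correct
        point_biserial = pearson(scores, totals)      # item-to-test correlation
        distribution = {o: n / len(picks) for o, n in Counter(picks).items()}
        mean_by_choice = {o: mean(t for p, t in zip(picks, totals) if p == o)
                          for o in set(picks)}
        print(item + 1, round(p_value, 2), round(point_biserial, 2),
              distribution, mean_by_choice)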

The statistical analyses identify multiple-choice items with one or more of the following characteristics:

  • The percent of the candidates who answered the item correctly is less than 30 (i.e., fewer than 30 percent of candidates selected the response keyed as the correct response)
  • The percent of the candidates who answered the item correctly is greater than 90 (i.e., more than 90 percent of candidates selected the response keyed as the correct response)
  • Nonmodal correct response (i.e., the response chosen by the greatest number of candidates is not the response keyed as the correct response)
  • The item-to-test point-biserial correlation is less than .10 and the p-value is less than .90 (provided the number of respondents is greater than 25)
  • Item-to-test point-biserial correlation is negative
  • The Mantel-Haenszel analysis indicates that differential item functioning (DIF) was present
  • Participant feedback indicates a concern with the item
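
The list of flags above can be read as a simple set of rules. The sketch below applies them to a hypothetical item-statistics record; the dictionary keys, the DIF indicator, and the participant-feedback indicator are assumptions standing in for outputs of the analyses described earlier.

    def review_flags(stats):
        # Return the review flags triggered by one multiple-choice item, following
        # the criteria listed above. `stats` is a hypothetical record of field
        # test results for the item.
        flags = []
        if stats["p_value"] < 0.30:
            flags.append("fewer than 30 percent answered correctly")
        if stats["p_value"] > 0.90:
            flags.append("more than 90 percent answered correctly")
        if stats["modal_response"] != stats["keyed_response"]:
            flags.append("nonmodal correct response")
        if (stats["point_biserial"] < 0.10 and stats["p_value"] < 0.90
                and stats["n_respondents"] > 25):
            flags.append("point-biserial below .10")
        if stats["point_biserial"] < 0:
            flags.append("negative point-biserial")
        if stats["dif_flagged"]:
            flags.append("Mantel-Haenszel DIF present")
        if stats["participant_concern"]:
            flags.append("participant feedback concern")
        return flags

    # This hypothetical item triggers the low p-value and low point-biserial flags.
    example = {"p_value": 0.27, "point_biserial": 0.05, "n_respondents": 140,
               "keyed_response": "B", "modal_response": "B",
               "dif_flagged": False, "participant_concern": False}
    print(review_flags(example))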

Item data for identified items are reviewed, and when warranted, further reviews are conducted, including

  • confirmation that the wording of the item was the same as the wording approved by the CAC,
  • a check of content and correct answer with documentary sources, and/or
  • review by a content expert.

If a field test includes constructed-response items, they are scored by educators meeting the eligibility criteria for MTTC operational scorers. Scoring procedures approximate those of operational administrations. For constructed-response items with 25 or more responses, statistical descriptions and analyses of item performance are produced, including the following:

  • Mean score on the item
  • Standard error of the mean score
  • Standard deviation of the item scores
  • Percent distribution of scores
  • Analysis of variance (ANOVA) to detect item main-effects differences
  • Analysis of variance (ANOVA) for item-by-participant group interactions (provided that the number of responses for each group is greater than or equal to 25)
  • A test to identify items with mean scores that are statistically significantly different from the others
  • Rate of agreement among scorers
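
The sketch below illustrates several of the constructed-response analyses (mean, standard error, standard deviation, percent distribution of scores, scorer agreement, and a one-way ANOVA across items) for a small set of hypothetical two-scorer ratings on a four-point scale. The data, the exact-agreement definition, and the use of SciPy's f_oneway for the item main-effect test are assumptions made for illustration.

    from math import sqrt
    from statistics import mean, stdev
    from scipy.stats import f_oneway  # used only for the one-way ANOVA

    # Hypothetical ratings: two scorers each score every response on a 1-4 scale,
    # for two constructed-response items.
    ratings = {
        "CR-1": [(3, 3), (2, 3), (4, 4), (1, 2), (3, 3), (2, 2)],
        "CR-2": [(2, 2), (3, 2), (3, 3), (2, 3), (4, 4), (1, 1)],
    }

    combined = {item: [a + b for a, b in pairs] for item, pairs in ratings.items()}

    for item, scores in combined.items():
        m = mean(scores)                              # mean score on the item
        sd = stdev(scores)                            # standard deviation of scores
        se = sd / sqrt(len(scores))                   # standard error of the mean
        dist = {s: scores.count(s) / len(scores) for s in sorted(set(scores))}
        exact_agreement = mean(int(a == b) for a, b in ratings[item])
        print(item, round(m, 2), round(sd, 2), round(se, 2),
              dist, round(exact_agreement, 2))

    # One-way ANOVA comparing combined scores across items (item main effect).
    f_stat, p_val = f_oneway(*combined.values())
    print("ANOVA: F =", round(f_stat, 2), ", p =", round(p_val, 3))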

In addition, the following qualitative analyses are conducted and reported:

  • Items that elicited a high number of blank, short, incomplete, or low-scoring responses
  • Items that scorers identified as difficult to score
  • Items with a high number of scorer discrepancies
  • Items that participants identified in participant questionnaires as problematic

For fields with insufficient candidate populations to allow statistical analyses of the constructed-response items, an attempt is made to obtain five or more responses to the items for the purpose of conducting a qualitative review. The review of responses is conducted by educators meeting the eligibility criteria for MTTC operational scorers and focuses on determining whether the items appear clear and answerable to field test participants.

Multiple-choice and constructed-response items with the appropriate statistical characteristics, based on the field test analyses, are included in the final item bank and available for inclusion on operational test forms. Items identified for further review may be deleted or retained, based on the results of the review.

Establishment of Marker Responses for Constructed-Response Items

Standard 4.8 The test review process should include empirical analyses and/or the use of expert judges to review items and scoring criteria. Standards for Educational and Psychological Testing (AERA, APA & NCME, 2014)

The establishment of marker responses is an integral part of preparing for operational scoring. As part of establishing the scoring materials for the constructed-response items, the marker establishment process helps set the criteria and standards for scoring examinee responses. This is accomplished through the identification of a set of responses (i.e., markers) exemplifying each of the score points on the scoring scale. Marker responses are used in scoring written, oral, and videotaped responses to constructed-response items.

The use of the marker responses in the training of scoring personnel, together with the standardized scoring scale, helps to promote continuity and consistency in scoring over time, and across test forms, test administrations, and scorers. The marker responses help to ensure that scores retain a consistent meaning over time, and that candidates' responses are judged similarly regardless of when they take a test or which test form they take.

A subset of the Michigan educators who have served on the CAC (generally about 6–8 members) meets to review responses to the constructed-response items, typically responses produced by field test participants. Before beginning their task, committee members are familiarized with the test objectives on which the constructed-response items are based, the constructed-response items they previously approved, the test directions, scoring procedures, and the scoring scales for the items. The committee members establish the marker responses through a process of discussion and consensus, with the guidance of an Evaluation Systems facilitator. Committee members select or modify responses from the field test, or create responses if needed.

Establishing Passing Standards

Standard 5.21 When proposed score interpretations involve one or more cut scores, the rationale and procedures used for establishing cut scores should be documented clearly.

Standard 5.22 When cut scores defining pass-fail or proficiency levels are based on direct judgments about the adequacy of item or test performances, the judgmental process should be designed so that the participants providing the judgments can bring their knowledge and experience to bear in a reasonable way. Standards for Educational and Psychological Testing (AERA, APA & NCME, 2014)

The information that follows describes the current standard setting procedures and questions, which have been in place since October 2012. See the Archived Descriptions at the end of this section for descriptions of procedures followed and standard setting questions implemented prior to October 2012.

Standard Setting Conference

Following the first operational administration of a new or updated test, committees of Michigan educators meet again to provide judgments that assist in setting the passing standards (also known as passing scores) for each test field.

The goal of standard setting is to identify standards (passing scores) for each test field that provide a fair and reasonable definition of the level of knowledge separating those certificate or endorsement candidates who have the content knowledge necessary to perform effectively the job of a qualified Michigan educator from those who do not. The standard setting process relies on professional judgments informed by input from Michigan educators who have previously participated as content experts on the CAC for the test field. Their judgments are provided to the MDE, which ultimately sets the passing standard for each test.

The procedures used to establish the passing scores are based on a process commonly used for licensing and credentialing tests. The section that follows describes the standard setting procedures currently in use, which were implemented beginning October 2004. (For the procedures used before this, see the Standard Setting Procedures from Program Inception through October 2003 in the Archived Descriptions below.)

A Standard Setting Panel of Michigan educators is convened for each test field to provide judgments to be used in setting the passing score for each test. These panels of up to 22 educators typically include some members from the CAC for the field and, in some cases, BRC members qualified in the field, as well as additional educators meeting the eligibility guidelines. See Establishing Advisory Committees for further information about the Standard Setting Panels.

An iterative procedure is used in which standard-setting ratings are gathered in three rounds, using procedures commonly referred to as a modified Angoff procedure and the extended Angoff procedure. In the first round, panel members provide item-based judgments of the performance of an "effective Michigan educator" on the items from the first operational test form. In the second round, panel members review the results from the initial round of ratings and candidate performance on the items. Panel members are then given an opportunity to make revisions to their individual round-one item ratings. A final round is conducted in which panel members review results from the second round of item-based ratings and individually provide a test-based passing score judgment.

Orientation and training. Panel members are given an orientation that explains the goal and steps of the passing score recommendation process, the materials to be used, and the judgments about test items and the total test that they will be asked to make. Panelists also complete a training exercise, including rating items that span a range of difficulty, to prepare them for the actual rating activity.

Simulated test-taking activity. To familiarize the panel members with the knowledge and skills associated with the test items, each panelist is given a copy of the appropriate field's test framework and participates in a simulated test-taking experience. Panelists are provided with a copy of the first operational test form and asked to read and answer the questions on the test without a key to the correct answers. After panelists complete this activity, they are provided with the answer key (i.e., the correct responses to the questions on the test) and asked to score their answers themselves.

Round one—item-based judgments: multiple-choice items. The Evaluation Systems facilitator provides training in the next step of the process, in which panel members make item-by-item judgments using a modified Angoff procedure. Panel members are asked to make a judgment regarding the performance of "effective Michigan educators" on each test item.

The concept of the "threshold of knowledge required to be an effective Michigan educator" is introduced, with an emphasis on the importance of the panelists defining for themselves the threshold of knowledge needed to perform effectively the job of an educator qualified to receive a certificate in Michigan. Panelists discuss the characteristics of the qualified (effective) candidate, thinking about educators they may have known, from candidate to veteran, whom they consider to be effective teachers.

Panelists are provided with the following description of the hypothetical group of individuals that they are asked to envision in making their passing score judgments:

Hypothetical Acceptable Level of Content Knowledge of an Effective Michigan Educator

A certain amount of content knowledge is required to teach effectively in Michigan's K–12 schools. Individuals seeking a Michigan teaching certificate/endorsement may exceed that level of content knowledge, but the individuals you use as a hypothetical reference group for your judgments should be at the level of content knowledge required to be an effective teacher in that content area.

Individuals seeking teacher certification have varying amounts of knowledge to perform the job of an effective educator in Michigan. These individuals will represent a range of knowledge. Candidates may include those who are more than sufficiently knowledgeable about all aspects of a content field to be an effective educator in that field, and other individuals who have little or no content knowledge in an area.

Somewhere along the range between these two extremes is the threshold of knowledge required to be an effective educator qualified to receive a Michigan teaching certificate. Individuals receiving teaching certificates may exceed the threshold, but none should demonstrate less knowledge than the required threshold. The individuals you use as a hypothetical reference group for your ratings should, at a minimum, meet the threshold of knowledge required to be an effective educator, and those individuals must possess the characteristics listed below.

Please recognize that the point you are defining as the threshold of knowledge required to be an effective educator is not necessarily at the middle of a continuum of content knowledge. An effective educator in Michigan is expected to:

  1. know and effectively teach the content defined by the test objectives;
  2. effectively teach all students at a level in keeping with the high standards set for Michigan K–12 students to graduate career- and college-ready;
  3. effectively teach all possible courses governed by the standards for the content area(s) for this certificate or endorsement; and
  4. effectively teach academically advanced students as well as those who are less academically proficient within the grade levels specified by the certificate or endorsement.

You must define for yourself the threshold of knowledge needed to perform effectively the job of an educator receiving a certificate in Michigan. Each committee will discuss the characteristics of the qualified (effective) candidate. You may find it helpful to think about educators you have known, from candidate to veteran, whom you considered to be effective teachers.

Each panel member indicates on a rating form the percent of this hypothetical reference group who would provide a correct response for each item. Panelists provide an independent rating for each item by answering, in their professional judgment, the following question:

"Imagine a hypothetical group of individuals who have the level of content knowledge required to perform effectively the job of a qualified Michigan educator in this certificate/endorsement area. What percent of this group would answer this item correctly?"

0%–10% = 1
11%–20% = 2
21%–30% = 3
31%–40% = 4
41%–50% = 5
51%–60% = 6
61%–70% = 7
71%–80% = 8
81%–90% = 9
91%–100% = 10
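
The percent bands above map to ratings 1 through 10. A minimal helper expressing that mapping, assuming whole-number percent judgments, might look like the following.

    def percent_to_rating(percent: int) -> int:
        # Map a percent judgment (0-100) to the 1-10 rating scale shown above:
        # 0%-10% -> 1, 11%-20% -> 2, ..., 91%-100% -> 10.
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        return 1 if percent <= 10 else (percent + 9) // 10

    assert percent_to_rating(10) == 1
    assert percent_to_rating(11) == 2
    assert percent_to_rating(75) == 8
    assert percent_to_rating(100) == 10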

Round one—item-based judgments: constructed-response items. Panelists make similar judgments regarding constructed-response item(s), using a procedure known as the extended Angoff procedure. The scoring of constructed-response items is explained to panelists. The training includes a review and discussion of the performance characteristics and four-point scoring scale used by scorers, examples of marker responses used to train scorers, how item scores are combined, and the total number of points available for the constructed-response section of the test.

Panel members provide an independent rating by answering, in their professional judgment, the following question:

"Imagine a hypothetical individual who has the level of content knowledge required to perform effectively the job of a qualified Michigan educator in this certificate/endorsement area. What score represents the level of response that would be achieved by this individual?"

Panel members provide their judgments based on combined item scores (e.g., 8 points for a 4-point item scored by two scorers).

Analysis of round one results. After the panelists complete their multiple-choice and (if applicable) constructed-response item ratings, their rating forms are analyzed. Item Rating Summary Reports are produced for each panelist, containing, for each multiple-choice item and the constructed-response section: a) the panelist's rating of the item or section, b) the median rating of all panelists who rated the item or section, and c) the frequency distribution of the item or section ratings for all panelists. Panelists are given an explanation of how to read and interpret the report, including how the ratings will be translated into recommended performance levels for the test.
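
A brief sketch of how the three reported quantities in an Item Rating Summary Report might be assembled for a single item is shown below; the panelist labels and ratings are hypothetical.

    from collections import Counter
    from statistics import median

    # Hypothetical round-one ratings (1-10 scale) for one multiple-choice item.
    item_ratings = {"Panelist 1": 7, "Panelist 2": 8, "Panelist 3": 7,
                    "Panelist 4": 6, "Panelist 5": 9}

    panel_median = median(item_ratings.values())      # (b) median of all panelists
    frequency = dict(Counter(item_ratings.values()))  # (c) frequency distribution

    for panelist, rating in item_ratings.items():     # (a) each panelist's own rating
        print(panelist, rating, panel_median, frequency)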

Round two—revisions to round one item-based judgments. In the second round of judgments, panel members have the opportunity to revise any of their item ratings from round one. In addition to the Item Rating Summary Reports, if available, item difficulty information for multiple-choice items is provided to panel members in the form of candidate performance statistics from the first operational administration period of the new tests. Panelists review the results from the initial round of ratings and candidate performance on the items and have the opportunity to provide a second rating to replace the first rating for any multiple-choice item and the constructed-response item(s). Changes to ratings are made independently, without discussion with other panelists.

Analysis of round two results. After the panelists complete their second round of multiple-choice and (if applicable) constructed-response item ratings, their rating forms are analyzed. Each panelist's individual item ratings are combined into a score that a hypothetical individual would be expected to achieve on the entire test (if the test includes multiple-choice items only), or on each section of the test (multiple-choice and constructed-response). This score represents the recommended passing score, which is calculated for each panelist individually and for the group of panelists as a whole. The recommended passing score for each panelist is calculated by summing the panelist's individual item ratings.
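
The combining step can be sketched as follows. The text above states that each panelist's recommended passing score is obtained by summing that panelist's final item ratings; in this illustration each 1-10 rating is first converted to the midpoint of its percent band so the sum can be read as an expected number of items answered correctly, and the panel-level value is taken as the median across panelists. Both the midpoint conversion and the use of the median are assumptions made for illustration, not the documented calculation.

    from statistics import median

    def rating_to_proportion(rating: int) -> float:
        # Assumed conversion: the midpoint of the percent band for each rating
        # (rating 1 covers 0%-10%, ratings 2-10 cover successive 10-point bands).
        if rating == 1:
            return 0.05
        low = (rating - 1) * 10 + 1
        return (2 * low + 9) / 2 / 100

    # Hypothetical final item ratings (round-one ratings, or round-two revisions
    # where made) for two panelists on a short multiple-choice test.
    final_ratings = {
        "Panelist 1": [7, 8, 6, 9, 7],
        "Panelist 2": [6, 7, 7, 8, 6],
    }

    # Each panelist's recommended passing score for the multiple-choice section.
    panelist_scores = {p: sum(rating_to_proportion(r) for r in ratings)
                       for p, ratings in final_ratings.items()}

    # Panel-level recommended passing score (median across panelists, by assumption).
    panel_score = median(panelist_scores.values())
    print(panelist_scores, round(panel_score, 2))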

Item-Based Passing Score Summary Reports are then distributed to panel members for each test field. For multiple-choice items, this report contains the number of scorable items on the test, the number of panelists, the recommended passing score for the multiple-choice section, and the distribution of individual panelists' recommended passing scores (described above), sorted in descending order. For test fields with constructed-response items, a second report is provided. This report contains the recommended passing score for the constructed-response section and the distribution of individual panelists' recommended passing scores for that section.

Round three—test-based passing score recommendations. In addition to the Item-Based Passing Score Summary Report, panel members are provided with the test field's Pass Rate Report, describing the performance of examinees during the first administration period of the test (if available). Panel members are given the opportunity to consider the information in these reports and then provide a test-based passing score (e.g., a passing score of 55 multiple-choice items answered correctly out of a possible 80 scorable multiple-choice items on the test) in response to the following question:

"Imagine a hypothetical individual who has the level of content knowledge required to perform effectively the job of a qualified Michigan educator in this certificate/endorsement area. What is the number of multiple-choice items on the test that would be answered correctly by this individual?"

For test fields containing only multiple-choice items, this passing score recommendation is considered a passing score recommendation for the entire test. For test fields also containing constructed-response items, this passing score recommendation is considered a passing score recommendation for the multiple-choice section only.

For test fields with constructed-response items, panel members are instructed to make an additional recommendation pertaining to the constructed-response section of the test. Panel members are asked to consider the data in the reports provided and then provide a passing score in response to the following question:

"Imagine a hypothetical individual who has the level of content knowledge required to perform effectively the job of a qualified Michigan educator in this certificate/endorsement area. What score represents the level of response that would be achieved by this individual?"

Panel members are instructed that their response to this question should be the combined total number of points, out of the total points possible for all of the constructed-response items on the test, that represents the level of response this individual would achieve on the constructed-response items.

Archived Descriptions: Standard Setting Procedures

Note that the steps of the process from October 2004 through November 2010 were the same as those of the current process. However, the question posed to committee members when providing their standard setting ratings, and the assumptions surrounding that question, were different; they were updated to the current version after November 2010 and implemented beginning in October 2012.

Establishing Passing Standards

Following the Standard Setting Conference, Evaluation Systems calculates recommended performance levels for the multiple-choice and constructed-response sections of each test based on the ratings provided by the Standard Setting Panel members. These calculations are based on the panelists' final rating on each item (i.e., either the unchanged first-round rating or the second-round rating if it was different from the first-round rating). See Calculation of Recommended Performance Levels for further information regarding the calculation of qualifying score judgments.
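
Selecting each panelist's final rating for an item, as described above, reduces to preferring the round-two rating where one was provided and otherwise keeping the round-one rating; a minimal sketch with hypothetical ratings follows.

    # Hypothetical ratings for one panelist: None in round two means the
    # round-one rating was left unchanged.
    round_one = [7, 8, 6, 9, 7]
    round_two = [None, 7, None, None, 8]

    # Final rating per item: the second-round rating if revised, otherwise round one.
    final = [r2 if r2 is not None else r1 for r1, r2 in zip(round_one, round_two)]
    print(final)  # [7, 7, 6, 9, 8]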

Evaluation Systems provides the Michigan Department of Education (MDE) with a Standard Setting Conference Report describing the participants, process, and results of the standard setting activities, considerations related to measurement error, and use of the passing scores in scoring and reporting the MTTC test. The MDE sets the passing score for each test based upon the panel-based recommendations and other input. The passing score is applied to the first and subsequent operational administration periods for each test.

