Test Development Process

Standard 1.0 ...[A]ppropriate validity evidence in support of each intended [test score] interpretation should be provided. Standards for Educational and Psychological Testing (AERA, APA & NCME, 2014)

Validity refers to the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests. Validity is, therefore, the most fundamental consideration in developing tests and evaluating tests. The process of validation involves accumulating relevant evidence to provide a sound scientific basis for the proposed score interpretations. Standards for Educational and Psychological Testing (AERA, APA & NCME, 2014, p. 11)

A passing score on the MTTC test is a requirement for obtaining a teaching certificate or endorsement in Michigan. The validity of the MTTC test is based on the accumulation of evidence that supports the use of the MTTC test for making pass/fail determinations within this context. The process of accumulating validity evidence is interwoven throughout the development of the MTTC tests, including the establishment and participation of advisory committees, the definition of test content, the development of test items, and the establishment of passing standards. The test development process includes steps designed to help ensure that

  • the test content is aligned with Michigan laws, Michigan standards for teacher certification, and other state documents/policies governing Michigan schools,
  • the test items assess the defined content accurately, and are job-related and free from bias, and
  • the passing scores reflect the level appropriate for the use of the MTTC test in making pass/fail determinations as a requirement for receiving a teaching certificate or endorsement in Michigan.

The test development procedures, including the accumulation of validity evidence, are described in the following sections of this manual:

Establishing Advisory Committees

Standard 1.9 When a validation rests in part on the opinions or decisions of expert judges, observers, or raters, procedures for selecting such experts and for eliciting judgments or ratings should be fully described. The qualifications and experience of the judges should be presented. Standards for Educational and Psychological Testing (AERA, APA & NCME, 2014)

Involving Michigan educators in test development activities has been an important component of establishing a validity basis for the MTTC program. Michigan teachers and teacher educators have served on MTTC advisory committees throughout the development of the tests. The involvement of Michigan school teachers and faculty preparing prospective educators contributes to validity by grounding the program in Michigan practice and requirements.

Development of new and updated tests for the MTTC program is a collaborative effort involving the Michigan Department of Education (MDE), the Evaluation Systems group of Pearson (Evaluation Systems), and committees of Michigan teachers and teacher educators, including educators who serve on Content Advisory Committees (CACs) and the Bias Review Committee (BRC).

CACs are charged with reviewing and validating the content of the tests; one CAC is constituted for each test field. Bias prevention is the focus of the BRC, a group of Michigan educators who participate in reviews of test materials throughout the development process. Members of the CACs are involved in making judgments that are provided to the MDE for use in setting passing standards for the tests.

Content Advisory Committees

A Content Advisory Committee (CAC) composed of Michigan educators associated with the test field (up to 18–22 members) is established for each test field.

The CACs include Michigan school teachers and college and university faculty engaged in the preparation of prospective educators. Nominations for membership on the committees are elicited from school administrators, deans at higher education institutions, school teachers, teacher organizations, academic and professional associations, and other sources specified by the MDE. Evaluation Systems documents the nominations of eligible educators for the MDE, and the MDE reviews educator applications to select the educators to invite to serve on the CACs based on their qualifications (e.g., content training, years of experience, accomplishments).

Committee members are selected to include

  • school teachers (typically a majority of the members), and
  • higher education faculty preparing prospective educators (arts and sciences, fine arts, and/or education faculty).

In addition, committee members are selected with consideration given to the following criteria:

  • Representation from different levels of teaching (i.e., early childhood, elementary, middle, and secondary levels)
  • Representation from professional associations and other organizations
  • Representation from diverse racial, ethnic, and cultural groups
  • Representation from all genders
  • Geographic representation
  • Representation from diverse school settings (e.g., urban areas, rural areas, large schools, small schools, charter schools)

The CACs meet during the test development process for the following activities:

  • Teacher task list review
  • Test framework/objectives and test specifications review
  • Test item review and validation
  • Marker response selection, for those fields with constructed-response items
  • Standard setting
  • Linking activity

Bias Review Committee

A Bias Review Committee (BRC) comprising up to 15 Michigan educators has been established to review new and updated test materials to help prevent potential bias. BRC members are a diverse group of educators who represent individuals with disabilities and the racial, gender, ethnic, and regional diversity of Michigan.

The establishment of the BRC mirrored the process for establishing the CACs. Educators were nominated and encouraged to apply for membership by school administrators, deans at higher education institutions, school teachers, teacher organizations, academic and professional associations, and other sources specified by the MDE. The MDE reviews educator applications and selects educators to serve on the BRC based on their qualifications.

In general, the committee members are selected with consideration given to the following criteria:

  • Representation from different levels of teaching (e.g., early childhood, elementary, middle, and secondary levels)
  • Representation from diverse racial, ethnic, and cultural groups
  • Representation from all genders
  • Geographic representation
  • Representation from diverse school settings (e.g., urban areas, rural areas, large schools, small schools, charter schools)
  • Representation from professional associations and other related educational organizations

The BRC meets during the test development process for the following activities:

  • Review of test objectives and assessment specifications
  • Review of test items
  • Marker response selection*
  • Standard setting*

*For these activities, members of the BRC are invited to participate for test fields in which they are certified and practicing or preparing candidates for certification.

The BRC works on a parallel track with the CACs. Typically, the BRC reviews materials shortly before the materials are reviewed by the CACs. BRC members are provided with a copy of Fairness and Diversity in Tests (Evaluation Systems, 2009) before beginning their work. The BRC recommends revisions, as needed, to the test materials. Their comments and suggestions are communicated to the CAC members, who make revisions to address the issues raised by the BRC. If the revisions by the CAC differ from those suggested by the BRC, a member of the BRC and a member of the CAC are asked to mutually agree on alternative revisions.

Standard Setting Panels

A Standard Setting Panel of up to 22 Michigan educators is established for each test field to provide judgments to be used in setting the passing scores for the tests. Because the number of educators certified and available to participate varies across the different test fields, the number of panel members varies accordingly. These panels typically include some members from the CAC for the field and, in some cases, BRC members qualified in the field, as well as additional educators meeting the same eligibility guidelines as the CAC members.

The selection process for the Standard Setting Panels mirrors the selection process for the CACs.

Panel members are approved by the MDE to include

  • public school educators (typically a majority of the members), and
  • higher education faculty preparing prospective educators (arts and sciences, fine arts, and/or education faculty).

In addition, panel members are selected with consideration given to the following criteria:

  • Representation from different levels of teaching (i.e., early childhood, elementary, middle, and secondary levels)
  • Representation from professional associations and other organizations
  • Representation from diverse racial, ethnic, and cultural groups
  • Representation from all genders
  • Geographic representation
  • Representation from diverse school settings (e.g., urban areas, rural areas, large schools, small schools, charter schools)

Members of the panels make recommendations that are used by the MDE, in part, in establishing the passing score for each test.

Test Objective Development and Review

Standard 11.2 Evidence of validity based on test content requires a thorough and explicit definition of the content domain of interest. Standards for Educational and Psychological Testing (AERA, APA & NCME, 2014)

Test Objectives

As indicated previously, validity refers to the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests. The validity evidence for the MTTC tests focuses on the use of the MTTC tests for making pass/fail determinations for the purpose of teacher certification. In order to make those pass/fail determinations appropriately, it is important that the test content be explicitly defined and that it align with relevant Michigan teacher preparation standards, requirements, and practice regarding teacher certification. The MTTC Test Objectives serve the purpose of providing explicit descriptions of the content eligible to be included on the tests.

The purposes of the test objectives include

  • establishing a link between test content and Michigan legal, policy, and regulatory sources;
  • communicating to policymakers, educators, and other stakeholders how standards and expectations for teachers in Michigan are embodied in the MTTC tests;
  • presenting an organized summary of content knowledge expectations for candidates preparing to take the test as well as higher education faculty responsible for preparing prospective educators; and
  • providing a structure for score reporting and score interpretation.

The MTTC test objectives (available on the MTTC program website) include a table indicating the weighting of the content subareas of the test. A sample is provided below.

MATHEMATICS (SECONDARY)

Test objectives weighting by number of questions per subarea
Subarea | Range of Objectives | Approximate Percentage of Questions on Test
I. Mathematical Processes and Number Concepts | 001–004 | 22%
II. Patterns, Algebraic Relationships, and Functions | 005–009 | 28%
III. Measurement and Geometry | 010–013 | 22%
IV. Data Analysis, Statistics, Probability, and Discrete Mathematics | 014–018 | 28%

The test objectives, contained in the test frameworks, provide the subareas, objectives, and descriptive statements that define the content of the test. A sample is provided below.

Subarea I—MATHEMATICAL PROCESSES AND NUMBER CONCEPTS

Objective 002—Understand problem-solving strategies, connections among different mathematical ideas, and the use of mathematics in other fields.

Includes:

  • devising, carrying out, and evaluating a problem-solving plan
  • applying a range of strategies (e.g., drawing a diagram, working backwards, creating a simpler problem) to solve problems
  • analyzing problems that have multiple solutions
  • selecting an appropriate tool or technology to solve a given problem
  • recognizing connections among two or more mathematical concepts (e.g., Fibonacci numbers and the golden rectangle; symmetry and group theory)
  • exploring the relationship between geometry and algebra
  • applying mathematics across the curriculum and in everyday contexts

Preparation of Test Objectives

As an initial step in preparing the MTTC Test Objectives, Evaluation Systems, in partnership with the Michigan Department of Education (MDE), systematically reviews relevant documents that establish the basis for the content of the tests and incorporates the content of the documents into the draft test objectives. The test objectives reflect the relevant Michigan standards and requirements in each field, focusing mainly on the appropriate Michigan Standards for the Preparation of Teachers. Evaluation Systems may also review additional documents that were referenced by the State of Michigan in preparing the teacher preparation standards. These documents may include learning/teaching standards prepared by national organizations, books used in educator preparation programs, and similar resources, as appropriate.

Documentation of Correspondence between Test Objectives and Sources

Standard 11.3 When test content is a primary source of validity evidence in support of the interpretation for the use of a test for employment decisions or credentialing, a close link between test content and the job or professional/occupational requirements should be demonstrated. Standards for Educational and Psychological Testing (AERA, APA & NCME, 2014)

Evaluation Systems prepared, as a component of the documentation of the MTTC program for validation purposes, correlation charts linking the test objectives to the Michigan sources from which they were derived. The correlation charts focus on the links between the test objectives and the relevant Michigan standards.

Assessment Specifications

Standard 4.2 In addition to describing intended uses of the test, the test specifications should define the content of the test, the proposed test length, the item formats, the desired psychometric properties of the test items and the test, and the ordering of items and sections. Test specifications should also specify the amount of time allowed for testing; directions for the test takers; procedures to be used for test administration, including permissible variations; any materials to be used; and scoring and reporting procedures. Standards for Educational and Psychological Testing (AERA, APA & NCME, 2014)

Assessment specifications documents that describe major aspects of test design and test administration are prepared at the outset of MTTC test development. The assessment specifications contain two sections: an introductory section designed to be consistent across tests, and a test-specific section. The introductory section, prepared by Evaluation Systems and the MDE, contains information regarding

  • the background and purpose of the MTTC test;
  • the purpose of the test objectives, and the structure of the test content into subareas, objectives, and descriptive statements;
  • test item formats;
  • bias prevention;
  • test composition and length;
  • test administration and testing time;
  • test scoring; and
  • test reporting.

The purpose of the introductory section of the assessment specifications is to provide the BRC and CACs with contextual information about the tests and test operations to better enable them to review assessment materials and conduct other test development tasks. Additionally, the information helps preserve consistency across MTTC tests during the development process.

The field-specific Measurement Notes section of the assessment specifications is drafted by Evaluation Systems for review and revision by the BRC and relevant CAC. This section includes information such as field-specific terminology to be used in the test items, resources to be consulted, specifications regarding item stimuli, and constructed-response item guidelines. The purpose of the Measurement Notes is to provide a mechanism for communicating agreed-upon item development specifications to test developers.

Bias Review of Test Objectives and Assessment Specifications

Standard 3.2 Test developers are responsible for developing tests that measure the intended construct and for minimizing the potential for tests being affected by construct-irrelevant characteristics, such as linguistic, communicative, cognitive, cultural, physical, or other characteristics. Standards for Educational and Psychological Testing (AERA, APA & NCME, 2014)

The BRC serves a central role in helping to safeguard that the tests measure intended constructs and in minimizing the potential for irrelevant characteristics affecting examinees' scores. The BRC is convened at the beginning of the test development process to review the test objectives and assessment specifications to help determine if the materials contain characteristics irrelevant to the constructs being measured that could interfere with some test takers' ability to respond. The BRC uses bias review criteria established for the MTTC program regarding content, language, offensiveness, and stereotypes. Committee members are asked to review the proposed test objectives (including subareas, objectives, and descriptive statements) and the Measurement Notes section of the assessment specifications according to the following review criteria:

Objectives

Content: Does any element of the objectives or descriptive statements contain content that disadvantages a person because of her or his gender, sexual orientation, gender identity or expression, race, nationality, ethnicity, religion, age, disability, or cultural, economic, or geographic background?

Language: Does the language used to describe any element of the objectives or descriptive statements disadvantage a person because of her or his gender, sexual orientation, gender identity or expression, race, nationality, ethnicity, religion, age, disability, or cultural, economic, or geographic background?

Offense: Is any element of the objectives or descriptive statements presented in such a way as to offend a person because of her or his gender, sexual orientation, gender identity or expression, race, nationality, ethnicity, religion, age, disability, or cultural, economic, or geographic background?

Stereotypes: Does any element of the objectives or descriptive statements contain language or content that reflects a stereotypical view of a group based on gender, sexual orientation, gender identity or expression, race, nationality, ethnicity, religion, age, disability, or cultural, economic, or geographic background?

Diversity: Does the list of objectives and descriptive statements permit appropriate inclusion of content that reflects the diversity of the Michigan population?

Assessment Specifications

Content: Does any element of the test specifications contain content that disadvantages a person because of her or his gender, race, nationality, ethnicity, religion, age, disability, or cultural, economic, or geographic background?

Language: Does the language used to describe any element of the test specifications disadvantage a person because of her or his gender, race, nationality, ethnicity, religion, age, disability, or cultural, economic, or geographic background?

Offense: Is any element of the test specifications presented in such a way as to offend a person because of her or his gender, race, nationality, ethnicity, religion, age, disability, or cultural, economic, or geographic background?

Stereotypes: Does any element of the test specifications contain language or content that reflects a stereotypical view of a group based on gender, race, nationality, ethnicity, religion, age, disability, or cultural, economic, or geographic background?

Diversity: Do the test specifications permit appropriate inclusion of content that reflects the diversity of the Michigan population?

The BRC reviews the draft test objectives and assessment specifications with guidance from Evaluation Systems facilitators, and members are asked to come to consensus regarding any recommended revisions. Recommendations for revisions are presented to the CAC convened for review of the same materials. The CAC is instructed to address all bias-related issues raised by the BRC. If a revision by the CAC differs substantively from what was suggested by the BRC, follow-up is conducted with a member of the BRC to make sure the revision is mutually agreed-upon.

Content Reviews of Test Objectives and Assessment Specifications

The CACs are convened to review the proposed test objectives, including descriptive statements and subareas, as well as the assessment specifications.

For the test objectives, the CAC uses review criteria regarding the structure of the test objectives (including the weighting of the content subareas containing the test objectives) and the content of the objectives. Committee members apply the following review criteria established for the MTTC program related to program purpose, organization, inclusiveness, significance, accuracy, freedom from bias, and job-relatedness:

Structure of Test Objectives

Program Purpose: Is the framework (set of test objectives) consistent with the purpose of the MTTC tests (i.e., to determine whether prospective teachers have the knowledge and skills to perform effectively the job of a qualified educator in Michigan)?

Organization: Is the framework (set of test objectives) organized in a reasonable way? Are the subarea headings accurate and do they clearly describe the content?

Inclusiveness: Is the content of the framework complete? Does the framework (set of test objectives) reflect the knowledge and skills an educator should have in order to teach the content? Is there any content that should be added?

Objectives

Significance: Do the objectives describe knowledge and skills that are important for an educator to have?

Accuracy: Do the objectives accurately reflect the content as it is understood by educators in the field? Are the objectives stated clearly and accurately, using appropriate terminology?

Freedom from Bias: Are the objectives free from elements that might potentially disadvantage an individual because of her or his gender, sexual orientation, gender identity or expression, race, nationality, ethnicity, religion, age, disability, or cultural, economic, or geographic background?

Job-Relatedness: Do the objectives cover important knowledge and skills that an educator should have in order to perform effectively the job of a qualified Michigan educator?

CAC members are also asked to review the proposed Measurement Notes section of the assessment specifications according to the following review criteria:

Assessment Specifications

Program Purpose: Are the test specifications consistent with the purpose of the MTTC program (i.e., to determine whether prospective teachers have the knowledge and skills to perform effectively the job of a qualified educator in Michigan)?

Significance: Do the test specifications describe knowledge and skills that are important for educators to have?

Accuracy: Do the test specifications accurately reflect the content as it is understood by educators in the field? Are the test specifications stated clearly and accurately, using appropriate terminology?

Freedom from Bias: Are the test specifications free of elements that might potentially disadvantage an individual because of her or his gender, race, ethnicity, nationality, religion, age, disability, or cultural, economic, or geographic background?

Job-Relatedness: Do the test specifications cover important knowledge and skills that an educator should have in order to perform effectively the job of a qualified Michigan educator?

The CAC reviews and revises the draft test objectives and Measurement Notes section of the assessment specifications through a process of discussion and consensus, with the guidance of an Evaluation Systems facilitator. During the committee discussion, members incorporate revisions suggested by the BRC. Following the committee's consensus review and revision of the test objectives, committee members independently provide a validity rating to verify that the final objectives, as agreed upon in the consensus review, are significant, accurate, free from bias, and job-related. Committee members also have the opportunity to make additional comments regarding the test objectives.

Following the review meeting, Evaluation Systems revises the test objectives and assessment specifications according to the recommendations of the CAC. The MDE approves the draft test objectives for use in the content validation survey and the assessment specifications for use in test item development.

Content Validation Surveys

Standard 11.3 When test content is a primary source of validity evidence in support of the interpretation for the use of a test for employment decisions or credentialing, a close link between test content and the job or professional/occupational requirements should be demonstrated. Standards for Educational and Psychological Testing (AERA, APA & NCME, 2014)

The MTTC content validation surveys are an important component of the validity evidence in support of the content of the MTTC tests. The surveys validate the test objectives that form the basis of test content by ascertaining that job incumbents (i.e., Michigan school teachers) and educator experts (i.e., educator preparation faculty) consider the content of each test objective important for teaching. The surveys provide additional evidence of linkage of the test content to job requirements, beyond the correlation charts linking the test objectives to the relevant Michigan standards.

The purpose of the surveys is to obtain judgments from Michigan school teachers and educator preparation faculty about

  • the importance of each objective for a qualified educator in Michigan schools;
  • how well each set of descriptive statements represents important aspects of the corresponding objective; and
  • how well the set of objectives, as a whole, represents the content knowledge and skills needed for a qualified educator in Michigan schools.

Survey of Michigan School Teachers

To be eligible to respond to the survey of school teachers, an individual needs to be a certified, practicing educator holding a Michigan teaching endorsement and teaching assignment corresponding to the test field. A database of teachers assigned to each teaching field for each school district in Michigan is provided by the Michigan Department of Education (MDE) for use in drawing a random sample of educators. Typically, 200 school teachers are randomly selected for a field (or the entire population, for fields with fewer than 200 educators statewide).

In order to meet expectations established with the MDE for representation of minority educators among survey participants, African American teachers are sampled for each test field at roughly twice the rate at which they are represented in the population of teachers for the test field. This guideline applies insofar as sufficient numbers of teachers are available in individual fields. The population and proposed sample for each field are reviewed by the MDE.
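
To illustrate the sampling approach described above, the sketch below draws a hypothetical sample for one field, oversampling African American teachers at roughly twice their population rate and falling back to the full population for small fields. The data layout, helper function, and capping logic are illustrative assumptions; the operational sampling is carried out by Evaluation Systems and reviewed by the MDE.

```python
import random

def draw_survey_sample(teachers, sample_size=200, oversample_factor=2, seed=0):
    """Draw a random sample of teachers for one test field, oversampling
    African American teachers at roughly twice their population rate.
    A simplified, hypothetical illustration of the approach described above.

    `teachers` is a list of dicts with a "race_ethnicity" key.
    """
    rng = random.Random(seed)
    if len(teachers) <= sample_size:
        return list(teachers)  # use the entire population for small fields

    aa = [t for t in teachers if t["race_ethnicity"] == "African American"]
    others = [t for t in teachers if t["race_ethnicity"] != "African American"]

    # Target share for African American teachers: about twice their share of
    # the field's population, capped by the number actually available.
    aa_rate = len(aa) / len(teachers)
    n_aa = min(len(aa), round(sample_size * min(1.0, oversample_factor * aa_rate)))
    n_other = sample_size - n_aa

    return rng.sample(aa, n_aa) + rng.sample(others, n_other)
```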

Advance notification materials are sent to principals of schools with sampled educators. A subsequent mailing to principals includes instruction letters and Web survey access codes for sampled educators. Principals are asked to distribute the materials to the sampled educators.

To determine eligibility to complete the survey, recipients respond to the following question at the beginning of the survey (with slight variations for certain fields): Are you now teaching or have you taught in Michigan as a certified teacher in the field indicated above during this or the previous school year?

Respondents provide background information about their highest level of education attained, gender, race/ethnicity, years of professional teaching experience, primary teaching assignment level, and primary teaching environment.

The school teachers are asked to respond to the following questions (with slight variations for certain fields):

Objective rating question: In your job as a Michigan educator in this field, of how much importance is the objective below to an understanding of the content of this certificate or endorsement area?

1 = no importance
2 = little importance
3 = moderate importance
4 = great importance
5 = very great importance

Descriptive statement rating question: How well does the set of descriptive statements represent important aspects of the knowledge and skills addressed by the objective?

1 = poorly
2 = somewhat
3 = adequately
4 = well
5 = very well

Overall rating question: How well does the set of objectives as a whole represent important aspects of the knowledge and skills required for performing the job of a qualified Michigan educator in this field?

1 = poorly
2 = somewhat
3 = adequately
4 = well
5 = very well

For each survey question, participants are asked to provide a comment for any rating less than "3," including noting any additional important areas of content that should be included.

Evaluation Systems monitors the survey access codes and tracks responses by school and educator. A number of follow-up activities are conducted with non-responding schools and educators, including telephone calls, emailing or re-mailing survey materials to schools, and extending the deadline for returns.

Survey of Faculty at Michigan Institutions of Higher Education

A separate content validation survey is conducted with faculty at Michigan colleges and universities offering approved teacher education programs in the specified fields. To be eligible to participate in the survey, a faculty member must be teaching one or more education courses or academic specialization courses in the content fields being surveyed. Responses from up to 100 faculty are targeted for each field, where that many eligible faculty exist. Typically, institutions are categorized as having high, medium, or low enrollment for a given field, and the number of surveys distributed to each preparation program is based on this categorization.

Advance notification letters are sent to educator preparation program contacts at higher education institutions included in the sample. A subsequent mailing with instruction letters and Web survey access codes is sent to the designated contact person at each institution, along with instructions for identifying faculty members eligible to complete the survey.

To determine eligibility to complete the survey, recipients respond to the following question at the beginning of the survey (with slight variation for certain fields):

During this or the previous school year, are you now teaching or have you taught in Michigan undergraduate or graduate courses in the field indicated above to undergraduate or graduate education candidates?

Respondents provide background information about their highest level of education attained, gender, race/ethnicity, years of teaching at a college or university, grade level(s) for which their candidates are preparing to teach, and college/department affiliation.

Faculty members are asked to respond to the following questions (with slight variations for certain fields):

Objective rating question: To a person preparing for a job as a Michigan educator in the field indicated above, of how much importance is the objective below to an understanding of the content of this certificate or endorsement area?

1 = no importance
2 = little importance
3 = moderate importance
4 = great importance
5 = very great importance

Descriptive statement rating question: How well does the set of descriptive statements represent important aspects of the knowledge and skills addressed by the objective?

1 = poorly
2 = somewhat
3 = adequately
4 = well
5 = very well

Overall rating question: How well does the set of objectives as a whole represent important aspects of the knowledge and skills required for performing the job of a qualified Michigan educator in this field?

1 = poorly
2 = somewhat
3 = adequately
4 = well
5 = very well

For each survey question, participants are asked to provide a comment for any rating less than "3," including noting any additional areas of content that should be included.

Evaluation Systems monitors the survey access codes and tracks responses by institution and faculty member. A number of follow-up activities are conducted with non-responding institutions and faculty members, including reminder emails, telephone calls, re-providing survey access codes, and extending the deadline for returns.

Analysis of the Content Validation Surveys

Evaluation Systems analyzes the content validation data for each field separately for school teachers and teacher educators. The following reports are produced and provided to the MDE.

Content Validation Survey Population/Sample/Respondents Demographics: Indicates the composition of the educator group for the population, sample, and survey respondents (for school teacher survey only).

Survey Return Rate by Field and Return Status: Indicates the number and percent of surveys distributed and returned.

Demographic Summary Report: Indicates participant responses to the eligibility and background information questions.

Objective Rating Report: Indicates the average importance rating given to each objective and the average across all objectives. For fields in which oversampling of minority groups was conducted for the school teacher survey, weighted data are provided to appropriately take the oversampling into account.

Descriptive Statement Rating Report: Indicates the average rating given to each set of descriptive statements and the average across all sets of descriptive statements. For fields in which oversampling of minority groups was conducted for the school teacher survey, weighted data are provided to appropriately take the oversampling into account.

Composite Rating Report: Indicates the average rating given to the set of objectives as a whole. For fields in which oversampling of minority groups was conducted for the school teacher survey, weighted data are provided to appropriately take the oversampling into account.

Respondent comments regarding the objectives, descriptive statements, and set of objectives as a whole are sorted and categorized to facilitate review (e.g., sorted by relevant objective).

The analyses of survey return rates, demographic summaries, survey ratings, and participant comments are provided to the MDE for review. The MDE determines if any changes to test objectives are warranted based on the survey results (e.g., additions or revisions to content or terminology in a descriptive statement). Objectives indicated as important (objectives with mean importance ratings of 3.00 or higher for each respondent group) by both school teachers and teacher educators are considered eligible for inclusion on the test. Any objectives with a mean importance rating of less than 3.00 from either respondent group are identified for further review and discussion (with the MDE and/or MTTC content advisory committee members). For more information, see the summary of content validation survey importance ratings for all fields that have undergone content validation.
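
As a hedged illustration of the rating analysis described above, the sketch below computes weighted mean importance ratings per objective (with weights offsetting any oversampling in the school teacher survey) and applies the 3.00 eligibility rule to the teacher and faculty results. The data structures and the way weights are supplied are illustrative assumptions, not the program's actual analysis code.

```python
from collections import defaultdict

def weighted_mean_ratings(responses, weights):
    """Compute a weighted mean importance rating per objective.

    `responses`: list of (respondent_group, objective_id, rating) tuples.
    `weights`: dict mapping a respondent group (e.g., an oversampled group)
    to a weight that restores that group's share in the population.
    A simplified, hypothetical sketch of the weighting described above.
    """
    sums = defaultdict(float)
    wsum = defaultdict(float)
    for group, objective, rating in responses:
        w = weights.get(group, 1.0)
        sums[objective] += w * rating
        wsum[objective] += w
    return {obj: sums[obj] / wsum[obj] for obj in sums}

def eligible_objectives(teacher_means, faculty_means, threshold=3.00):
    """An objective is eligible when both respondent groups rate it at or
    above 3.00; anything below the threshold in either group is flagged
    for further review and discussion."""
    eligible, flagged = [], []
    for obj in teacher_means:
        if teacher_means[obj] >= threshold and faculty_means.get(obj, 0) >= threshold:
            eligible.append(obj)
        else:
            flagged.append(obj)
    return eligible, flagged
```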

Job Analysis Study: Document Linkage between the Test Objectives and the Job of a Teacher in Michigan

As an additional way of gathering validity evidence in support of the use of MTTC tests as a requirement for certification as a Michigan educator, a Job Analysis Study is conducted for all newly developed and redeveloped test fields.

The job analysis activities entail the one-time creation of a job task list via facilitated committee discussion, a one-time job analysis survey based on the job task list, and linkage studies conducted for each test field under development/redevelopment. Linkage studies may occur in conjunction with either the objective review meeting or the standard setting meeting for the associated test field.

The job analysis study entails the following four steps:

  1. Creation of the Michigan-specific Teacher Job Task List
  2. Job Analysis Committee meetings
  3. Job Analysis Survey
  4. Linkage Studies

Creation of the Michigan-specific Teacher Job Task List. In the summer of 2018, Evaluation Systems prepared a draft Teacher Job Task List that enumerates and describes the major tasks performed by Michigan educators in the classroom. The Teacher Job Task List includes work activities, abilities, skills, knowledge, tasks, tools, and technology that teachers must master to be effective and proficient in their everyday work. For the purposes of this study, a task was defined as a work activity that has an identifiable beginning and end and results in a meaningful outcome. In the context of education, student learning is arguably the most important outcome to result from an educator's work activities. As a result, the focus of the draft Teacher Job Task List was to include those tasks performed by educators that lead to the desired outcome of student learning.

The following documents were used to inform the creation of a draft Michigan-specific Teacher Job Task List.

  • High-Leverage Practices online
  • InTASC Model Core Teaching Standards
  • Universal Design for Learning (UDL) Guidelines
  • A Data-Driven Focus on Student Achievement (Marzano)
  • The 2011 Framework for Teaching Evaluation Instrument (Danielson)
  • edTPA Assessment Handbooks
  • U.S. Department of Labor information about what a teacher needs to do (from O*NET Online)

Job Analysis Committee meetings. Once the draft Michigan Teacher Job Task List was developed, Job Analysis Committee meetings were conducted to create a final task list to serve as the basis for a Job Analysis Survey, pending MDE approval.

Overall structure. Four job analysis committees (representing grade bands PK–3, 3–6, 5–9, and 7–12, to reflect the newly emerging Michigan teacher certification structure) were assembled for the review of the draft teacher task list. Each committee included approximately eight teachers with certificates/endorsements across content areas, as appropriate.

Participant Recruitment. Evaluation Systems collaborated with the MDE to complete the recruitment process. Potential participants were recruited based on the parameters of the new state teacher certification structure. Teachers who qualified for potential committee participation were submitted for MDE review.

Process. Each job analysis committee meeting began with an orientation explaining the purpose of the Job Analysis Study and the role of the committee in reviewing the draft Michigan Teacher Job Task List. The committees reviewed the draft Michigan Teacher Job Task List, adding, deleting, or revising as they saw fit to represent their work as teachers. Any recommended revisions to the task list were documented by each committee, and participants noted their approval for the record.

After each of the four grade band committees met and edited the task list, two representatives from each committee were convened into a new committee (a "consensus" committee). The consensus committee reviewed all edits from each of the individual grade band committees and came to consensus on the content of the final Michigan Teacher Job Task List. This final list is intended to represent the work of all teachers across the state of Michigan, regardless of grade level or teaching assignment. The MDE approved the final Michigan Teacher Job Task List for use in the Job Analysis Survey.

Job Analysis Survey. Evaluation Systems developed a job analysis survey from the MDE-approved Michigan Teacher Job Task List. Respondents were asked to provide ratings, for each task in the Teacher Job Task List, indicating the relative time spent on the task and the importance of the task.

The job analysis survey respondent sampling plan sought survey respondents constituting a relatively large and representative sample of teachers, as determined by the MDE and Evaluation Systems (based on certificate/endorsement area, demographic characteristics, job specialty areas, etc.). A stratified random sampling design, as is used for the content validation surveys, was employed. The target survey sample included approximately 400 individuals from each educator job assignment group (or the entire population if a group had fewer than 400 educators). The educator job assignment groups were defined to mirror the grade band committees from the job analysis committee meetings, i.e., Elementary (PK–3), Elementary (3–6), Middle Grades (5–9), and Secondary (7–12).

Once the survey was closed, Evaluation Systems summarized data for the two task ratings—relative time spent and importance—both for the total sample and separately for any other appropriate variables.

Additionally, Evaluation Systems conducted analyses based on the joint distribution of time spent and importance ratings for tasks (and work behaviors, if used and suitable) to provide an overall measure of task (and work behavior) criticality to the job. An index of the extent to which survey respondents agreed in their evaluations of importance and relative time spent was also calculated. Evaluation Systems described and documented the data analyses conducted in a final report to the MDE. These data provide the basis for all subsequent linkage studies.
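
The report prepared for the MDE is not reproduced here, so the sketch below shows one plausible way a task criticality index and a simple agreement measure could be computed from the survey ratings (mean importance multiplied by mean relative time spent, with agreement summarized by the standard deviation of the importance ratings). Both indices are illustrative assumptions rather than the documented analysis.

```python
import statistics

def task_summary(importance_ratings, time_ratings):
    """Summarize one task from job analysis survey ratings.

    Returns the mean importance, mean relative time spent, a simple
    criticality index (here, the product of the two means), and an
    agreement value (standard deviation of the importance ratings;
    smaller means closer agreement). The specific indices are
    illustrative assumptions, not the program's documented formulas.
    """
    mean_importance = statistics.mean(importance_ratings)
    mean_time = statistics.mean(time_ratings)
    criticality = mean_importance * mean_time
    importance_sd = statistics.stdev(importance_ratings) if len(importance_ratings) > 1 else 0.0
    return {
        "mean_importance": mean_importance,
        "mean_time_spent": mean_time,
        "criticality": criticality,
        "importance_sd": importance_sd,
    }
```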

Linkage Studies. For each redeveloped or newly developed MTTC test, Evaluation Systems conducts a linkage study to determine and document the relationship between important knowledge, skills, and abilities (KSAs), as included in the test objectives, and important job tasks (as defined by the job analysis survey results).

An important element of validity in employment settings entails demonstrating that KSAs measured in testing are indeed important to the actual performance of jobs. While the job analysis survey is used to identify critical job tasks for teachers, the linkage study establishes, for each redeveloped or newly developed MTTC test, the importance of the KSAs (as outlined in the test framework) that make up the test. That is, the linkage study is undertaken to establish the direct relationship between the objectives measured by MTTC tests and important behavioral aspects of the job of teaching.

Process. For each redeveloped or newly developed MTTC test, members of the CAC are convened to provide linkage judgments. The linkage study activity takes place in conjunction with either the content review of each newly developed set of test objectives, or the standard setting meeting for the test. During the linkage study, Evaluation Systems provides committee members with a description of the job analysis survey design and a summary of the results in order for them to achieve a sound understanding of the research efforts and the meaning of the data collected and summarized. Committee members independently judge the relevance and importance of the objectives and descriptive statements comprising the tests to the critical tasks identified in the job analysis survey.

Materials. Evaluation Systems has developed rating forms, including rating scales and instructions, for committee member use in providing their judgments about the relevance and importance of each objective to each of the tasks in the teacher task list. A sample linkage study response sheet for committee members is shown below.

[Figure: Sample linkage study response sheet for rating the importance of three objectives]

Outcomes. Following each linkage study activity, Evaluation Systems analyzes the data captured at the committee meeting following procedures similar to those used for the job analysis survey, and presents the results to the MDE in the form of a report.

Test Item Development and Review, Field Testing, and Marker Response Establishment

Standard 4.7 The procedures used to develop, review, and try out items and to select items from the item pool should be documented. Standards for Educational and Psychological Testing (AERA, APA & NCME, 2014)

For test fields being newly developed, the test items are newly written and undergo a set of rigorous reviews, as described in this section. For test fields being updated, either test items are newly developed, or previously written items from the existing test item bank that exhibit strong performance statistics are considered for continued use in the updated test item bank. All newly written items undergo a full review process, while pre-existing items undergo a validation process to ensure that they remain current and appropriate; that is, that they still meet the item review criteria (as described below under "Content Review of Test Items") in the context of the redeveloped test objectives.

Test Item Preparation

Test item preparation combines the expertise of content specialists (i.e., experts in the field-specific content areas) and experienced item development specialists. Evaluation Systems supervises the item drafting process, which involves program staff, content experts, psychometricians, and item development specialists. Item development teams are provided with program policy materials (e.g., the appropriate Michigan Standards for the Preparation of Teachers), committee-approved test objectives and assessment specifications, Evaluation Systems item preparation and bias prevention materials, and additional materials as appropriate for the field (e.g., textbooks, online resources).

Before the development of a full item bank for a specific test field, Evaluation Systems provides examples, or prototypes, of the kinds of test items that will be developed for the field, in accordance with the MDE-approved Assessment Specifications. If any fields require the development of constructed-response items, draft rubrics are also developed in concert with constructed-response item prototypes to aid reviewers in providing feedback on those item prototypes. The MDE reviews the item prototypes and provides feedback before item development for a test field begins.

For fields being updated, items that match the new test objectives and have appropriate psychometric characteristics are eligible to be retained in the item bank for validation by MTTC content advisory committees. Additionally, some items from the previous bank are revised and reviewed, along with the newly written items, by the bias review and content advisory committees.

Preliminary versions of test items are reviewed by specialists with content expertise in the appropriate field (e.g., teachers, college faculty, other specialists) as a preliminary check of the items' accuracy, clarity, and freedom from bias.

Bias Review of Test Items

Standard 3.2 Test developers are responsible for developing tests that measure the intended construct and for minimizing the potential for tests being affected by construct-irrelevant characteristics, such as linguistic, communicative, cognitive, cultural, physical, or other characteristics. Standards for Educational and Psychological Testing (AERA, APA & NCME, 2014)

The BRC is convened to review draft test items to help safeguard that the test items measure the intended constructs and to minimize characteristics irrelevant to the constructs being measured that could interfere with some test takers' ability to respond. The BRC reviews items according to the following established bias review criteria for the MTTC regarding content, language, offensiveness, and stereotypes:

Content: Does the test item contain content that disadvantages a person based on gender, sexual orientation, gender identity or expression, race, nationality, ethnicity, religion, age, disability, or cultural, economic, or geographic background?

Language: Does the test item contain language that disadvantages a person based on gender, sexual orientation, gender identity or expression, race, nationality, ethnicity, religion, age, disability, or cultural, economic, or geographic background?

Offensiveness: Is the test item presented in such a way as to offend a person based on gender, sexual orientation, gender identity or expression, race, nationality, ethnicity, religion, age, disability, or cultural, economic, or geographic background?

Stereotypes: Does the test item contain language or content that reflects a stereotypical view of a group based on gender, sexual orientation, gender identity or expression, race, nationality, ethnicity, religion, age, disability, or cultural, economic, or geographic background?

Diversity: Taken as a whole, do the items include content that reflects the diversity of the Michigan population?

The BRC reviews the draft test items with the guidance of Evaluation Systems facilitators and is asked to come to consensus regarding any recommended revisions. The BRC also has the opportunity to submit content-related questions for consideration by the CAC. Recommendations for revisions and content-related questions are presented to the CAC convened for review of the same materials. The CAC is instructed to address all bias-related issues raised by the BRC. If a revision by the CAC differs substantively from what was suggested by the BRC, follow-up is conducted with a member of the BRC to make sure the revision is mutually agreed-upon or the item is deleted.

Content Review of Test Items

Standard 4.8 The test review process should include empirical analyses and/or the use of expert judges to review items and scoring criteria. When expert judges are used, their qualifications, relevant experiences, and demographic characteristics should be documented, along with instructions and training in the item review process that the judges receive. Standards for Educational and Psychological Testing (AERA, APA & NCME, 2014)

The CACs, composed of Michigan school teachers and faculty preparing prospective educators associated with the test field, serve as expert judges to review the test items. See Establishing Advisory Committees for further information about the qualifications of the committee members.

CAC members review draft test items according to review criteria established for the MTTC program regarding objective match, accuracy, freedom from bias, and job-relatedness.

For their review of draft test items, committee members apply the following criteria:

Objective Match:

  • Does the item measure an important aspect of the test objectives?
  • Is the level of difficulty appropriate for the testing program?
  • Are the items, as a whole, consistent with the purpose of the MTTC program?

Accuracy:

  • Is the content accurate?
  • Is the terminology in the item correct and appropriate for Michigan?
  • Is the item grammatically correct and clear in meaning?
  • Is the correct response accurately identified?
  • Are the distractors plausible yet clearly incorrect?
  • Are the stem and response alternatives clear in meaning?
  • Is the wording of the item stem free of clues that point toward the correct answer?
  • Is the graphic (if any) accurate and relevant to the item?

Freedom from Bias:

  • Is the item free of language, content, or stereotypes that might potentially disadvantage or offend an individual based on gender, sexual orientation, gender identity or expression, race, nationality, ethnicity, religion, age, disability, or cultural, economic, or geographic background?
  • Are the items, as a whole, fair to all individuals regardless of gender, sexual orientation, gender identity or expression, race, nationality, ethnicity, religion, age, disability, or cultural, economic, or geographic background?
  • As a whole, do the items include content that reflects the diversity of the Michigan population?

Job-Relatedness:

  • Is the content job-related?
  • Does the item measure content or skills that an educator needs on the job in Michigan schools?
  • Does the item measure content or skills that an educator should be expected to know in order to perform effectively the job of a qualified Michigan educator (i.e., not learned on the job)?

The CAC reviews and revises the draft test items through a process of discussion and consensus, with the guidance of an Evaluation Systems facilitator. During the discussion, the committee incorporates revisions suggested by the BRC. Following the committee's review of each item and documentation of any changes made to the item, committee members independently provide a validity rating to verify that the final item, as agreed upon in the consensus review, was matched to the objective, accurate, free from bias, and job-related. Committee members also have the opportunity to make additional comments related to the review criteria.

Following the item review meetings, Evaluation Systems reviews the item revisions and validity judgments and revises the test items according to the recommendations of the CACs. Evaluation Systems documents the BRC recommendations and resolutions of the recommendations; additionally, any post-conference editorial revisions (beyond typographical revisions) are documented (e.g., rewording a committee revision for clarity or consistency with other response alternatives). The documentation of revisions is submitted to the MDE for final approval of the test items. The revised test items are then prepared for field testing throughout Michigan.

Field Testing

Standard 4.8 The test review process should include empirical analyses and/or the use of expert judges to review items and scoring criteria.

Standard 4.9 When item or test form tryouts are conducted, the procedures used to select the sample(s) of test takers as well as the resulting characteristics of the sample(s) should be documented. The sample(s) should be as representative as possible of the population(s) for which the test is intended. Standards for Educational and Psychological Testing (AERA, APA & NCME, 2014)

In addition to the review of test items by expert judges (members of the BRC and CACs), when permitted by sufficient candidate populations, empirical data are collected about statistical and qualitative characteristics of the new items through field testing. This additional information is used to refine the item banks before the items are used on a scorable basis on operational test forms.

Due to the nature of particular test fields or certification/endorsement areas, there are a variety of circumstances that may call for varied approaches to field testing. The methods of field testing may include the placement of items as non-scorable on operational test forms, stand-alone field testing, and/or focus group field testing. The method(s) of field testing to be used is generally determined by the number of candidates in the field.

Where the test design and number of candidates allow, new and revised multiple-choice test items for tests being updated are field tested on operational test forms, in nonscorable slots, during regularly scheduled test administrations. Operational test forms may be constructed to include scorable multiple-choice test items and nonscorable multiple-choice test items, with the scorable test items being drawn from the current test item bank and the nonscorable test items being newly developed test items intended for use with the redeveloped version of the test. Also, once the redeveloped test has been released, any items in the item bank that were not field tested will be included on test forms as nonscorable items only, for the purpose of collecting psychometric data on them. Candidates are unaware which items are scorable and which are nonscorable. Thus, the sample of field test takers mirrors the operational test-taking population in composition and motivation.

For some fields, stand-alone field testing is conducted with volunteer participants before the items are introduced on operational test forms. Field tests are administered under conditions that mirror an authentic testing situation, to candidates with characteristics generally mirroring those of candidates who plan to eventually take the operational MTTC test.

For test fields with very small numbers of candidates, such that subsequent statistical analyses may not be meaningful, focus group field testing is conducted. During a focus group field test, volunteer candidates participate in an item try-out during which they independently review a test form, answer each item, and provide judgments regarding the difficulty level and clarity of each item. A follow-up structured group interview is conducted to provide supplemental qualitative information related to the clarity, appropriateness, and difficulty of the test items. Focus group field testing is generally scheduled to occur in the days prior to the item review of the test items. In this way, the results of the focus group may be communicated to the associated content advisory committee for consideration in their review of the items.

For both stand-alone and focus group field testing, eligible participants generally include juniors and seniors enrolled in Michigan teacher preparation programs who are planning to seek a Michigan certificate or endorsement in the field being field tested. Volunteer participants are given an incentive for participating, such as a gift card or a voucher to offset future testing fees.

Field test forms are designed to allow participants to complete the test in a reasonable amount of time, typically one-and-a-half to two-and-a-half hours for a form or set of forms, in order to minimize any effects on the data from participant fatigue. Multiple field test forms are generally prepared for each field to allow for the collection of an adequate level of responses to the field test. If a field test includes constructed-response items, forms with more than one such item are counterbalanced (i.e., the order of the items is reversed on every other form), and field test forms are randomly distributed to participants.
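
As a minimal illustration of the counterbalancing and random distribution just described, the sketch below builds two forms with reversed constructed-response item order and assigns them to participants at random. The item identifiers and participant labels are hypothetical.

    import random

    # Hypothetical constructed-response items appearing on one field test form.
    cr_items = ["CR-1", "CR-2"]

    # Counterbalanced forms: the item order is reversed on every other form.
    forms = {"Form A": cr_items, "Form B": list(reversed(cr_items))}

    # Hypothetical volunteer participants; each is randomly assigned one form.
    participants = ["P01", "P02", "P03", "P04", "P05", "P06"]
    assignments = {p: random.choice(list(forms)) for p in participants}

    for participant, form in assignments.items():
        print(participant, form, forms[form])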

Field test responses to multiple-choice items are scored, and the following item statistics are generated:

  • Individual item p-values (percent correct)
  • Item-to-test point-biserial correlation
  • Distribution of participant responses (percent of participants selecting each response option)
  • Mean score by response choice (average score on the multiple-choice set achieved by all participants selecting each response option)
  • Mantel-Haenszel differential item functioning (DIF) analysis for test fields in which the numbers of participants in the focal and comparison groups (gender and ethnicity) are of sufficient size
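
As an illustration of how the first four statistics might be computed, the sketch below derives the p-value, item-to-test point-biserial correlation, response-option distribution, and mean score by response choice from a small set of hypothetical responses. The data and variable names are invented, and the Mantel-Haenszel DIF analysis is not shown.

    from collections import Counter
    from statistics import mean, pstdev

    def pearson(x, y):
        # Pearson correlation; with a dichotomous (0/1) item score this is the
        # item-to-test point-biserial correlation.
        mx, my = mean(x), mean(y)
        sx, sy = pstdev(x), pstdev(y)
        if sx == 0 or sy == 0:
            return float("nan")
        return mean((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)

    # Hypothetical responses: one row per participant, one column per item,
    # plus the keyed correct answers.
    responses = [
        ["A", "C", "B", "D"],
        ["A", "B", "B", "D"],
        ["B", "C", "B", "A"],
        ["A", "C", "D", "D"],
        ["A", "C", "B", "B"],
    ]
    key = ["A", "C", "B", "D"]

    # Each participant's total score on the multiple-choice set.
    totals = [sum(pick == k for pick, k in zip(row, key)) for row in responses]

    for item, keyed in enumerate(key):
        picks = [row[item] for row in responses]
        scores = [int(pick == keyed) for pick in picks]
        p_value = mean(scores)                        # percent correct
        point_biserial = pearson(scores, totals)      # item-to-test correlation
        distribution = {o: n / len(picks) for o, n in Counter(picks).items()}
        mean_by_choice = {o: mean(t for p, t in zip(picks, totals) if p == o)
                          for o in set(picks)}
        print(item + 1, round(p_value, 2), round(point_biserial, 2),
              distribution, mean_by_choice)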

The statistical analyses identify multiple-choice items with one or more of the following characteristics:

  • The percent of the candidates who answered the item correctly is less than 30 (i.e., fewer than 30 percent of candidates selected the response keyed as the correct response)
  • The percent of the candidates who answered the item correctly is greater than 90 (i.e., more than 90 percent of candidates selected the response keyed as the correct response)
  • Nonmodal correct response (i.e., the response chosen by the greatest number of candidates is not the response keyed as the correct response)
  • The item-to-test point-biserial correlation is less than .10 and the p-value is less than .90 (provided the number of respondents is greater than 25)
  • Item-to-test point-biserial correlation is negative
  • The Mantel-Haenszel analysis indicates that differential item functioning (DIF) was present
  • Participant feedback indicates a concern with the item
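
The list of flags above can be read as a simple set of rules. The sketch below applies them to a hypothetical item-statistics record; the dictionary keys, the DIF indicator, and the participant-feedback indicator are assumptions standing in for outputs of the analyses described earlier.

    def review_flags(stats):
        # Return the review flags triggered by one multiple-choice item, following
        # the criteria listed above. `stats` is a hypothetical record of field
        # test results for the item.
        flags = []
        if stats["p_value"] < 0.30:
            flags.append("fewer than 30 percent answered correctly")
        if stats["p_value"] > 0.90:
            flags.append("more than 90 percent answered correctly")
        if stats["modal_response"] != stats["keyed_response"]:
            flags.append("nonmodal correct response")
        if (stats["point_biserial"] < 0.10 and stats["p_value"] < 0.90
                and stats["n_respondents"] > 25):
            flags.append("point-biserial below .10")
        if stats["point_biserial"] < 0:
            flags.append("negative point-biserial")
        if stats["dif_flagged"]:
            flags.append("Mantel-Haenszel DIF present")
        if stats["participant_concern"]:
            flags.append("participant feedback concern")
        return flags

    # This hypothetical item triggers the low p-value and low point-biserial flags.
    example = {"p_value": 0.27, "point_biserial": 0.05, "n_respondents": 140,
               "keyed_response": "B", "modal_response": "B",
               "dif_flagged": False, "participant_concern": False}
    print(review_flags(example))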

Item data for identified items are reviewed, and when warranted, further reviews are conducted, including

  • confirmation that the wording of the item was the same as the wording approved by the CAC,
  • a check of content and correct answer with documentary sources, and/or
  • review by a content expert.

If a field test includes constructed-response items, they are scored by educators meeting the eligibility criteria for MTTC operational scorers. Scoring procedures approximate those of operational administrations. For constructed-response items with 25 or more responses, statistical descriptions and analyses of item performance are produced, including the following:

  • Mean score on the item
  • Standard error of the mean score
  • Standard deviation of the item scores
  • Percent distribution of scores
  • Analysis of variance (ANOVA) to detect item main-effects differences
  • Analysis of variance (ANOVA) for item-by-participant group interactions (provided that the number of responses for each group is greater than or equal to 25)
  • A test to identify items with mean scores that are statistically significantly different from the others
  • Rate of agreement among scorers
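
The sketch below illustrates several of the constructed-response analyses (mean, standard error, standard deviation, percent distribution of scores, scorer agreement, and a one-way ANOVA across items) for a small set of hypothetical two-scorer ratings on a four-point scale. The data, the exact-agreement definition, and the use of SciPy's f_oneway for the item main-effect test are assumptions made for illustration.

    from math import sqrt
    from statistics import mean, stdev
    from scipy.stats import f_oneway  # used only for the one-way ANOVA

    # Hypothetical ratings: two scorers each score every response on a 1-4 scale,
    # for two constructed-response items.
    ratings = {
        "CR-1": [(3, 3), (2, 3), (4, 4), (1, 2), (3, 3), (2, 2)],
        "CR-2": [(2, 2), (3, 2), (3, 3), (2, 3), (4, 4), (1, 1)],
    }

    combined = {item: [a + b for a, b in pairs] for item, pairs in ratings.items()}

    for item, scores in combined.items():
        m = mean(scores)                              # mean score on the item
        sd = stdev(scores)                            # standard deviation of scores
        se = sd / sqrt(len(scores))                   # standard error of the mean
        dist = {s: scores.count(s) / len(scores) for s in sorted(set(scores))}
        exact_agreement = mean(int(a == b) for a, b in ratings[item])
        print(item, round(m, 2), round(sd, 2), round(se, 2),
              dist, round(exact_agreement, 2))

    # One-way ANOVA comparing combined scores across items (item main effect).
    f_stat, p_val = f_oneway(*combined.values())
    print("ANOVA: F =", round(f_stat, 2), ", p =", round(p_val, 3))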

In addition, the following qualitative analyses are conducted and reported:

  • Items that elicited a high number of blank, short, incomplete, or low-scoring responses
  • Items that scorers identified as difficult to score
  • Items with a high number of scorer discrepancies
  • Items that participants identified in participant questionnaires as problematic

For fields with insufficient candidate populations to allow statistical analyses of the constructed-response items, an attempt is made to obtain five or more responses to the items for the purpose of conducting a qualitative review. The review of responses is conducted by educators meeting the eligibility criteria for MTTC operational scorers and focuses on determining whether the items appear clear and answerable to field test participants.

Multiple-choice and constructed-response items with the appropriate statistical characteristics, based on the field test analyses, are included in the final item bank and available for inclusion on operational test forms. Items identified for further review may be deleted or retained, based on the results of the review.

Establishment of Marker Responses for Constructed-Response Items

Standard 4.8 The test review process should include empirical analyses and/or the use of expert judges to review items and scoring criteria. Standards for Educational and Psychological Testing (AERA, APA & NCME, 2014)

The establishment of marker responses is an integral part of preparing for operational scoring. As part of establishing the scoring materials for the constructed-response items, the marker establishment process helps set the criteria and standards for scoring examinee responses. This is accomplished through the identification of a set of responses (i.e., markers) exemplifying each of the score points on the scoring scale. Marker responses are used in scoring written, oral, and videotaped responses to constructed-response items.

The use of the marker responses in the training of scoring personnel, together with the standardized scoring scale, helps to promote continuity and consistency in scoring over time, and across test forms, test administrations, and scorers. The marker responses help to ensure that scores retain a consistent meaning over time, and that candidates' responses are judged similarly regardless of when they take a test or which test form they take.

A subset of the Michigan educators who have served on the CAC (generally about 6–8 members) meets to review responses to the constructed-response items, typically responses produced by field test participants. Before beginning their task, committee members are familiarized with the test objectives on which the constructed-response items are based, the constructed-response items they previously approved, the test directions, scoring procedures, and the scoring scales for the items. The committee members establish the marker responses through a process of discussion and consensus, with the guidance of an Evaluation Systems facilitator. Committee members select or modify responses from the field test, or create responses if needed.

Establishing Passing Standards

Standard 5.21 When proposed score interpretations involve one or more cut scores, the rationale and procedures used for establishing cut scores should be documented clearly.

Standard 5.22 When cut scores defining pass-fail or proficiency levels are based on direct judgments about the adequacy of item or test performances, the judgmental process should be designed so that the participants providing the judgments can bring their knowledge and experience to bear in a reasonable way. Standards for Educational and Psychological Testing (AERA, APA & NCME, 2014)

The information that follows describes the current standard setting procedures and questions, which have been in place since October 2012. See the Archived Descriptions at the end of this section for descriptions of procedures followed and standard setting questions implemented prior to October 2012.

Standard Setting Conference

Following the first operational administration of a new or updated test, committees of Michigan educators meet again to provide judgments that assist in setting the passing standards (also known as passing scores) for each test field.

The goal of standard setting is to identify standards (passing scores) for each test field that provide a fair and reasonable definition of the level of knowledge separating those certificate or endorsement candidates who have the content knowledge necessary to perform effectively the job of a qualified Michigan educator from those who do not. The standard setting process relies on professional judgments informed by input from Michigan educators who have previously participated as content experts on the CAC for the test field. Their judgments are provided to the MDE, which ultimately sets the passing standard for each test.

The procedures used to establish the passing scores are based on a process commonly used for licensing and credentialing tests. The section that follows describes the standard setting procedures currently in use, which were implemented beginning October 2004. (For the procedures used before this, see the Standard Setting Procedures from Program Inception through October 2003 in the Archived Descriptions below.)

A Standard Setting Panel of Michigan educators is convened for each test field to provide judgments to be used in setting the passing score for each test. These panels of up to 22 educators typically include some members from the CAC for the field and, in some cases, BRC members qualified in the field, as well as additional educators meeting the eligibility guidelines. See Establishing Advisory Committees for further information about the Standard Setting Panels.

An iterative procedure is used in which standard-setting ratings are gathered in three rounds, using procedures commonly referred to as a modified Angoff procedure and the extended Angoff procedure. In the first round, panel members provide item-based judgments of the performance of an "effective Michigan educator" on the items from the first operational test form. In the second round, panel members review the results from the initial round of ratings and candidate performance on the items. Panel members are then given an opportunity to make revisions to their individual round-one item ratings. A final round is conducted in which panel members review results from the second round of item-based ratings and individually provide a test-based passing score judgment.

Orientation and training. Panel members are given an orientation that explains the goal and steps of the passing score recommendation process, the materials to be used, and the judgments about test items and the total test that they will be asked to make. Panelists also complete a training exercise, including rating items that span a range of difficulty, to prepare them for the actual rating activity.

Simulated test-taking activity. To familiarize the panel members with the knowledge and skills associated with the test items, each panelist is given a copy of the appropriate field's test framework and participates in a simulated test-taking experience. Panelists are provided with a copy of the first operational test form and asked to read and answer the questions on the test without a key to the correct answers. After panelists complete this activity, they are provided with the answer key (i.e., the correct responses to the questions on the test) and asked to score their answers themselves.

Round one—item-based judgments: multiple-choice items. The Evaluation Systems facilitator provides training in the next step of the process, in which panel members make item-by-item judgments using a modified Angoff procedure. Panel members are asked to make a judgment regarding the performance of "effective Michigan educators" on each test item.

The concept of the "threshold of knowledge required to be an effective Michigan educator" is introduced, with an emphasis on the importance of the panelists defining for themselves the threshold of knowledge needed to perform effectively the job of an educator qualified to receive a certificate in Michigan. Panelists discuss the characteristics of the qualified (effective) candidate, thinking about educators they may have known, from candidate to veteran, whom they consider to be effective teachers.

Panelists are provided with the following description of the hypothetical group of individuals that they are asked to envision in making their passing score judgments:

Hypothetical Acceptable Level of Content Knowledge of an Effective Michigan Educator

A certain amount of content knowledge is required to teach effectively in Michigan's K–12 schools. Individuals seeking a Michigan teaching certificate/endorsement may exceed that level of content knowledge, but the individuals you use as a hypothetical reference group for your judgments should be at the level of content knowledge required to be an effective teacher in that content area.

Individuals seeking teacher certification have varying amounts of knowledge to perform the job of an effective educator in Michigan. These individuals will represent a range of knowledge. Candidates may include those who are more than sufficiently knowledgeable about all aspects of a content field to be an effective educator in that field, and other individuals who have little or no content knowledge in an area.

Somewhere along the range between these two extremes is the threshold of knowledge required to be an effective educator qualified to receive a Michigan teaching certificate. Individuals receiving teaching certificates may exceed the threshold, but none should demonstrate less knowledge than the required threshold. The individuals you use as a hypothetical reference group for your ratings should, at a minimum, meet the threshold of knowledge required to be an effective educator, and those individuals must possess the characteristics listed below.

Please recognize that the point you are defining as the threshold of knowledge required to be an effective educator is not necessarily at the middle of a continuum of content knowledge. An effective educator in Michigan is expected to:

  1. know and effectively teach the content defined by the test objectives;
  2. effectively teach all students at a level in keeping with the high standards set for Michigan K–12 students to graduate career- and college-ready;
  3. effectively teach all possible courses governed by the standards for the content area(s) for this certificate or endorsement; and
  4. effectively teach academically advanced students as well as those who are less academically proficient within the grade levels specified by the certificate or endorsement.

You must define for yourself the threshold of knowledge needed to perform effectively the job of an educator receiving a certificate in Michigan. Each committee will discuss the characteristics of the qualified (effective) candidate. You may find it helpful to think about educators you have known, from candidate to veteran, whom you considered to be effective teachers.

Each panel member indicates on a rating form the percent of this hypothetical reference group who would provide a correct response for each item. Panelists provide an independent rating for each item by answering, in their professional judgment, the following question:

"Imagine a hypothetical group of individuals who have the level of content knowledge required to perform effectively the job of a qualified Michigan educator in this certificate/endorsement area. What percent of this group would answer this item correctly?"

0%–10% = 1
11%–20% = 2
21%–30% = 3
31%–40% = 4
41%–50% = 5
51%–60% = 6
61%–70% = 7
71%–80% = 8
81%–90% = 9
91%–100% = 10
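
The percent bands above map to ratings 1 through 10. A minimal helper expressing that mapping, assuming whole-number percent judgments, might look like the following.

    def percent_to_rating(percent: int) -> int:
        # Map a percent judgment (0-100) to the 1-10 rating scale shown above:
        # 0%-10% -> 1, 11%-20% -> 2, ..., 91%-100% -> 10.
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        return 1 if percent <= 10 else (percent + 9) // 10

    assert percent_to_rating(10) == 1
    assert percent_to_rating(11) == 2
    assert percent_to_rating(75) == 8
    assert percent_to_rating(100) == 10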

Round one—item-based judgments: constructed-response items. Panelists make similar judgments regarding constructed-response item(s), using a procedure known as the extended Angoff procedure. The scoring of constructed-response items is explained to panelists. The training includes a review and discussion of the performance characteristics and four-point scoring scale used by scorers, examples of marker responses used to train scorers, how item scores are combined, and the total number of points available for the constructed-response section of the test.

Panel members provide an independent rating by answering, in their professional judgment, the following question:

"Imagine a hypothetical individual who has the level of content knowledge required to perform effectively the job of a qualified Michigan educator in this certificate/endorsement area. What score represents the level of response that would be achieved by this individual?"

Panel members provide their judgments based on combined item scores (e.g., 8 points for a 4-point item scored by two scorers).

Analysis of round one results. After the panelists complete their multiple-choice and (if applicable) constructed-response item ratings, their rating forms are analyzed. Item Rating Summary Reports are produced for each panelist, containing, for each multiple-choice item and the constructed-response section: a) the panelist's rating of the item or section, b) the median rating of all panelists who rated the item or section, and c) the frequency distribution of the item or section ratings for all panelists. Panelists are given an explanation of how to read and interpret the report, including how the ratings will be translated into recommended performance levels for the test.
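
A brief sketch of how the three reported quantities in an Item Rating Summary Report might be assembled for a single item is shown below; the panelist labels and ratings are hypothetical.

    from collections import Counter
    from statistics import median

    # Hypothetical round-one ratings (1-10 scale) for one multiple-choice item.
    item_ratings = {"Panelist 1": 7, "Panelist 2": 8, "Panelist 3": 7,
                    "Panelist 4": 6, "Panelist 5": 9}

    panel_median = median(item_ratings.values())      # (b) median of all panelists
    frequency = dict(Counter(item_ratings.values()))  # (c) frequency distribution

    for panelist, rating in item_ratings.items():     # (a) each panelist's own rating
        print(panelist, rating, panel_median, frequency)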

Round two—revisions to round one item-based judgments. In the second round of judgments, panel members have the opportunity to revise any of their item ratings from round one. In addition to the Item Rating Summary Reports, if available, item difficulty information for multiple-choice items is provided to panel members in the form of candidate performance statistics from the first operational administration period of the new tests. Panelists review the results from the initial round of ratings and candidate performance on the items and have the opportunity to provide a second rating to replace the first rating for any multiple-choice item and the constructed-response item(s). Changes to ratings are made independently, without discussion with other panelists.

Analysis of round two results. After the panelists complete their second round of multiple-choice and (if applicable) constructed-response item ratings, their rating forms are analyzed. Each panelist's individual item ratings are combined into a score that a hypothetical individual would be expected to achieve on the entire test (if the test includes multiple-choice items only), or on each section of the test (multiple-choice and constructed-response). This score represents the recommended passing score, which is calculated for each panelist individually and for the group of panelists as a whole. The recommended passing score for each panelist is calculated by summing the panelist's individual item ratings.
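
The combining step can be sketched as follows. The text above states that each panelist's recommended passing score is obtained by summing that panelist's final item ratings; in this illustration each 1-10 rating is first converted to the midpoint of its percent band so the sum can be read as an expected number of items answered correctly, and the panel-level value is taken as the median across panelists. Both the midpoint conversion and the use of the median are assumptions made for illustration, not the documented calculation.

    from statistics import median

    def rating_to_proportion(rating: int) -> float:
        # Assumed conversion: the midpoint of the percent band for each rating
        # (rating 1 covers 0%-10%, ratings 2-10 cover successive 10-point bands).
        if rating == 1:
            return 0.05
        low = (rating - 1) * 10 + 1
        return (2 * low + 9) / 2 / 100

    # Hypothetical final item ratings (round-one ratings, or round-two revisions
    # where made) for two panelists on a short multiple-choice test.
    final_ratings = {
        "Panelist 1": [7, 8, 6, 9, 7],
        "Panelist 2": [6, 7, 7, 8, 6],
    }

    # Each panelist's recommended passing score for the multiple-choice section.
    panelist_scores = {p: sum(rating_to_proportion(r) for r in ratings)
                       for p, ratings in final_ratings.items()}

    # Panel-level recommended passing score (median across panelists, by assumption).
    panel_score = median(panelist_scores.values())
    print(panelist_scores, round(panel_score, 2))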

Item-Based Passing Score Summary Reports are then distributed to panel members for each test field. For multiple-choice items, this report contains the number of scorable items on the test, the number of panelists, the recommended passing score for the multiple-choice section, and the distribution of individual panelists' recommended passing scores (described above), sorted in descending order. For test fields with constructed-response items, a second report is provided. This report contains the recommended passing score for the constructed-response section and the distribution of individual panelists' recommended passing scores for that section.

Round three—test-based passing score recommendations. In addition to the Item-Based Passing Score Summary Report, panel members are provided with the test field's Pass Rate Report, describing the performance of examinees during the first administration period of the test (if available). Panel members are given the opportunity to consider the information in these reports and then provide a test-based passing score (e.g., a passing score of 55 multiple-choice items answered correctly out of a possible 80 scorable multiple-choice items on the test) in response to the following question:

"Imagine a hypothetical individual who has the level of content knowledge required to perform effectively the job of a qualified Michigan educator in this certificate/endorsement area. What is the number of multiple-choice items on the test that would be answered correctly by this individual?"

For test fields containing only multiple-choice items, this passing score recommendation is considered a passing score recommendation for the entire test. For test fields also containing constructed-response items, this passing score recommendation is considered a passing score recommendation for the multiple-choice section only.

For test fields with constructed-response items, panel members are instructed to make an additional recommendation pertaining to the constructed-response section of the test. Panel members are asked to consider the data in the reports provided and then provide a passing score in response to the following question:

"Imagine a hypothetical individual who has the level of content knowledge required to perform effectively the job of a qualified Michigan educator in this certificate/endorsement area. What score represents the level of response that would be achieved by this individual?"

Panel members are instructed that their response to this question should be the combined total number of points, out of the total points possible for all of the constructed-response items on the test, that represents the level of response this individual would achieve on the constructed-response items.

Archived Descriptions: Standard Setting Procedures

Note that the steps of the process from October 2004 through November 2010 were the same as those of the current process. However, the question posed to committee members when providing their standard setting ratings, and the assumptions surrounding that question, were different; they were updated to the current version after November 2010 and implemented beginning in October 2012.

Establishing Passing Standards

Following the Standard Setting Conference, Evaluation Systems calculates recommended performance levels for the multiple-choice and constructed-response sections of each test based on the ratings provided by the Standard Setting Panel members. These calculations are based on the panelists' final rating on each item (i.e., either the unchanged first-round rating or the second-round rating if it was different from the first-round rating). See Calculation of Recommended Performance Levels for further information regarding the calculation of qualifying score judgments.
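
Selecting each panelist's final rating for an item, as described above, reduces to preferring the round-two rating where one was provided and otherwise keeping the round-one rating; a minimal sketch with hypothetical ratings follows.

    # Hypothetical ratings for one panelist: None in round two means the
    # round-one rating was left unchanged.
    round_one = [7, 8, 6, 9, 7]
    round_two = [None, 7, None, None, 8]

    # Final rating per item: the second-round rating if revised, otherwise round one.
    final = [r2 if r2 is not None else r1 for r1, r2 in zip(round_one, round_two)]
    print(final)  # [7, 7, 6, 9, 8]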

Evaluation Systems provides the Michigan Department of Education (MDE) with a Standard Setting Conference Report describing the participants, process, and results of the standard setting activities, considerations related to measurement error, and use of the passing scores in scoring and reporting the MTTC test. The MDE sets the passing score for each test based upon the panel-based recommendations and other input. The passing score is applied to the first and subsequent operational administration periods for each test.

