American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.

Angoff, W. H. (1971). Scales, norms, and equivalent scores. In R. L. Thorndike (Ed.), Educational measurement (2nd ed.). Washington, DC: American Council on Education.

Angoff, W. H. (1984). Scales, norms, and equivalent scores. Princeton, NJ: Educational Testing Service.

Brennan, R. L. (2001). Generalizability theory. New York: Springer-Verlag.

Breyer, F., & Lewis, C. (1994). Pass-fail reliability for tests with cut scores: A simplified method. Princeton, NJ: Educational Testing Service.

Keats, J. A. (1957). Estimation of error variances of test scores. Psychometrika, 22.

Kolen, M. J., & Brennan, R. L. (2004). Test equating, scaling, and linking: Methods and practices (2nd ed.). New York, NY: Springer.

Kuder, G. F., & Richardson, M. W. (1937). The theory of the estimation of test reliability. Psychometrika, 2, 151–160.

Longford, N. T., Holland, P. W., & Thayer, D. T. (1993). Stability of the M-H D-DIF statistics across populations. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 171–196). Hillsdale, NJ: Erlbaum.

Lord, F. M. (1959). Tests of the same length do have the same standard error of measurement. Educational and Psychological Measurement, 19, 233–239.

Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27(4), 361–370.
