Implementing Observation Protocols

Lessons for K-12 Education from the Field of Early Childhood

    Download the report (pdf)

    Download the introduction and summary (pdf)

    Read the report in your web browser (Scribd)

    While it might seem counterintuitive, at least some of the answers to turning around our nation’s struggling K-12 public schools can be found at the nearest preschool.

    At a time of considerable urgency and demand for improvements in our nation’s schools, particularly when it comes to evaluating the effectiveness of teachers, there is no need to reinvent the wheel. Instead of looking to the development and implementation of new educational models and methodologies, K-12 educators would do well to learn from the lessons and experience accrued by their counterparts in the early childhood sector, specifically when it comes to teacher performance evaluation.

    There is no shortage of debate on the challenges and promises of teacher performance evaluation as the reauthorization of the Elementary and Secondary Education Act, known in its 2001 version as No Child Left Behind, proceeds and as states seek to implement reforms. Unfortunately, there is precious little precedent for the use of performance evaluation of teachers in the K-12 education setting, at least good performance evaluation. The well-documented shortcomings of existing evaluation methods, from principal “drive-by” observations to hiring interviews to tenure reviews and more, all lead to the same conclusion: nearly every teacher “passes” whatever “test” they face. The problem is that the “tests” themselves do not discriminate good performers from poor performers and are virtually unconnected to student achievement, professional development, or incentives to improve.

    Relying on the status quo for teacher performance evaluation wastes time and energy: performance metrics are nonexistent or invalid, and there is little to no linkage among the key components of most evaluation and performance-improvement systems. As practiced now, teacher evaluation is a nonsystem with a lot of moving parts of dubious value and very little connection among them.

    Some measure of teachers’ classroom practices, usually in the form of observation, is at the core of nearly every proposal and early-stage rollout of the next generation of teacher performance evaluation efforts in districts and states. Typically coupled with estimates of teachers’ contributions to student gains on achievement tests as well as with other indicators of performance, observation of teachers’ classroom practices is a cornerstone of this new wave of assessment. An evaluation system can provide teachers with the actionable feedback they need to improve only if it rests on solid information. Clearly, high-quality classroom behavior and practices are at the core of any definition of “effective teaching” and are what most teachers would identify as the manner in which they contribute to student learning.

    It is sensible to think that observational assessment of teachers’ classroom behavior would be a central component of any evaluation system since teachers’ behaviors and interactions are students’ most direct experience of teaching. Yet like most initiatives in education reform, observation is subject to implementation and policy challenges that could very well hinder its ultimate benefits. The short list of challenges includes: technical issues in defining and measuring teaching behavior; gathering information about a teacher through consistent and reliable observation; ensuring that the behaviors observed really matter for student learning (that is, the validity of the observation); determining how observations connect to high-stakes consequences such as tenure and professional development; and a host of support and infrastructure requirements needed to roll out sound observation efforts on a large scale. There are too few models of how to do observation well in the K-12 sector. There is, however, one sector with more than two decades of widespread application of classroom observation from which to draw lessons: early childhood education, which is the focus of this paper.

    This report draws from decades of experience using observation in early childhood education, which has implications for administrative decisions, evaluation practices, and policymaking in K-12. Early childhood education has long embraced the value of observing classrooms and teacher-child interactions. In early childhood education the features of the settings in which children are served are the hallmarks of quality. These features can include health and safety considerations, the materials and physical layout of the space, and the interactions that take place between adults and children, such as conversations, emotional tone, or physical proximity. Standardized observations of these early childhood education features in turn yield metrics that are used in state and federal policy, program improvement investments, and the credentialing of professionals, all of which are uses that K-12 education is now considering.

    This paper examines lessons learned from observation in early childhood education that may be helpful as states and districts begin implementing more rigorous observation protocols for K-12 teachers. Although these lessons apply to all grades, they may be particularly relevant for K-3 as assessment of student performance using standardized achievement tests is most challenging in those grades. These lessons focus on the importance of standardization, trained observers, methods for ensuring the validity and reliability of the instruments, and the use of observational measures as a lever to produce effective teaching. These lessons form the basis for the following recommendations:

    • Any measure must provide information in the form of metrics that clearly differentiate those being assessed. Observation is no exception: it is a form of measurement and assessment consisting of codes and benchmarks that must be applied rigorously, just as they are in assessments of student performance.
    • Observations used in systems of decision making and performance improvement must adhere to standardized procedures. There are three components of standardization that are key elements for evaluating any observation instrument and its implementation—training protocol, parameters around observation, and scoring directions.
    • The technical properties of observational protocols and scoring systems are fundamental for their use. Reliability is one of these properties and pertains to the level of error or bias in the scores obtained. It is critical that users select tools that have documented reliability for use across observers, teachers, time, and situations. Effective training programs for observers help to ensure raters are consistent with one another as they make ratings. Similarly, including periodic “drift” testing at predetermined intervals will help to improve the degree to which raters remain consistent with scoring protocols and with each other.
    • Any observation of teacher performance must show empirical relations with student learning and development if the use of observation is expected to drive improvement in student outcomes. The importance of selecting an observation system that includes validity information cannot be overstated.
    • Pragmatically, observation takes time and different systems of observation require different time commitments. The amount of observer time available can be an important practical consideration when selecting an observational system. In general, the more ratings a school or district is able to obtain and aggregate, the more stable the resulting estimate of typical teacher practices will be.
    • Observations can identify teacher classroom behaviors that matter for students, can describe typical teacher practices, can show how a given classroom or teacher compares with a national or district average, can forecast the likely contribution of a teacher to children’s learning, or can document improvement in teachers’ practices in response to professional development. Users, however, must be careful not to overstep the appropriate use of observational instruments in their enthusiasm to apply them in any and all circumstances.
    • Observations can be used in both accountability and program-improvement applications. Importantly, policy and program investments over time can change the typical distribution of scores as teachers, classrooms, and programs improve, and as a consequence it can be necessary to periodically “raise the bar” on performance standards or cutoff scores.
    • Feedback to teachers is most effective when it is individualized and highly specific, focuses on increasing teachers’ own observation skills, promotes self-evaluation, and helps teachers see and understand the impact of their behaviors more clearly.
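
To make the reliability and aggregation recommendations above more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a procedure taken from the report: the function names, the sample ratings, and the "within one scale point" agreement rule are all hypothetical choices. The first function checks how often an observer's ratings fall within one point of a reference ("master") rating of the same lessons, as might be done in a periodic drift test. The second computes the standard error of a teacher's mean rating, which tends to shrink as more observations are aggregated.

```python
from math import sqrt
from statistics import stdev


def within_one_agreement(observer_scores, master_scores):
    """Share of ratings where the observer is within one scale point
    of the reference rating (an assumed, illustrative agreement rule)."""
    if not observer_scores or len(observer_scores) != len(master_scores):
        raise ValueError("score lists must be non-empty and of equal length")
    hits = sum(
        1 for obs, ref in zip(observer_scores, master_scores)
        if abs(obs - ref) <= 1
    )
    return hits / len(observer_scores)


def standard_error_of_mean(ratings):
    """Standard error of the mean rating: sample standard deviation
    divided by the square root of the number of observations."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    return stdev(ratings) / sqrt(len(ratings))


# Hypothetical ratings of the same five lessons on a numeric scale.
observer = [5, 4, 6, 3, 5]
master = [5, 5, 4, 3, 6]
agreement = within_one_agreement(observer, master)  # 4 of 5 within one point

# Hypothetical ratings for one teacher: few versus many observations.
few_visits = [4, 5, 4, 5]
many_visits = few_visits * 4
se_few = standard_error_of_mean(few_visits)
se_many = standard_error_of_mean(many_visits)  # smaller than se_few
```

A district would still need to choose its own agreement rule, its own cutoff for acceptable agreement, and the number of observations per teacher. Those choices depend on the instrument selected and are not specified here.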

    Note: Throughout this paper we use fictional situations to illustrate specific points in support of our overall argument that early childhood education observational evaluation methods have value for K-12 education.

    Robert Pianta is the dean of the Curry School of Education, the Novartis U.S. Foundation professor of education, and a professor of psychology at the University of Virginia, where he also directs the University of Virginia Center for Advanced Study of Teaching and Learning.
