1. VALIDITY
Definition of Validity
Validity is an overall evaluative judgment of the degree to which
empirical evidence and theoretical rationales support the adequacy and
appropriateness of interpretations and actions based on test scores or other modes of assessment (Messick, 1989).
Validity is the degree to which a test measures what it is supposed to
measure, or can be used successfully for the purposes for which it is intended
(Richards, Jack C. and Schmidt, 2002).
Types of Validity
· Content Validity: a type of validity that is based on the extent to which a test adequately and sufficiently measures the particular skills or behavior it sets out to measure.
· Construct Validity: a type of validity that is based on the extent to which the items in a test reflect the essential aspects of the theory on which the test is based (i.e., the construct). For example, the greater the relationship that can be demonstrated between a test of communicative competence in a language and the theory of communicative competence, the greater the construct validity of the test.
Ø Convergent validity: a type of validity that is based on the extent to which two or more tests that are claimed to measure the same underlying construct are in fact doing so.
Ø Discriminant validity: a type of construct validity that is based on the extent to which two or more tests that are claimed to measure different underlying constructs are in fact doing so.
· Criterion Validity: a type of validity that is based on the extent to which a new test correlates with an established external criterion measure.
Ø Concurrent validity: a type of validity that is based on the extent to which a test correlates with some other test that is aimed at measuring the same skill, or with some other comparable measure of the skill being tested.
Ø Predictive validity: a type of validity based on the degree to which a test accurately predicts future performance. A language aptitude test, for example, should have predictive validity, because the results of the test should predict the ability to learn a second or foreign language.
· Consequential Validity: a type of validity that is based on the extent to which the use and interpretation of a test that may have an impact on society will result in fair and positive social consequences for all stakeholders, including test takers.
· Face Validity: the degree to which a test appears to measure the knowledge or abilities it claims to measure, based on the subjective judgment of an observer. For example, if a test of reading comprehension contains many dialect words that might be unknown to the test takers, the test may be said to lack face validity.
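Several of the types above (convergent, discriminant, concurrent and predictive validity) rest on how strongly two sets of scores are related. One common way to quantify that relationship is the Pearson correlation coefficient. The sketch below is only an illustration: the function name `pearson_r` and the score lists are invented for this example and do not come from the sources cited above.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when scores do not vary")
    return cov / sqrt(var_x * var_y)

# Invented scores for six test takers (illustration only).
new_test = [52, 61, 70, 74, 83, 90]          # the test being validated
established_test = [50, 65, 68, 78, 80, 93]  # the external criterion measure

r = pearson_r(new_test, established_test)
print(f"correlation with the criterion: r = {r:.2f}")
```

A coefficient close to 1 would count as evidence of concurrent or convergent validity, while two tests of different constructs would be expected to correlate only weakly. For predictive validity the second list would hold a later measure of performance instead of a test taken at the same time.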
2. RELIABILITY
Definition of Reliability
A measure of the degree to which a test gives consistent results. A test is said to be reliable if it gives the same results when it is given on different occasions or when it is used by different people (Richards, Jack C. and Schmidt, 2002).
Reliability refers to the
consistency of a measure. A test is considered reliable if we get the same result repeatedly. Unfortunately, it is
impossible to calculate reliability exactly, but there are several different
ways to estimate reliability.
Types of Reliability
· Equivalent Form Reliability (Parallel Form Reliability)
Parallel-forms reliability is gauged by comparing two different tests that were created using the same content. This is accomplished by creating a large pool of test items that measure the same quality and then randomly dividing the items into two separate tests. The two tests should then be administered to the same subjects at the same time.
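The procedure just described (build an item pool, split it at random into two forms, give both forms to the same subjects, compare the results) can be sketched in a few lines. Everything here is an assumption made for illustration: the helper names `split_item_pool`, `form_scores` and `pearson_r`, the fixed random seed, and the invented response table. Comparing the two forms with a Pearson correlation is one common choice, not the only one.

```python
import random
from math import sqrt

def split_item_pool(item_ids, seed=0):
    """Shuffle the item pool and deal it into two forms of equal length."""
    pool = list(item_ids)
    random.Random(seed).shuffle(pool)  # seeded so the split is repeatable
    half = len(pool) // 2              # with an odd pool, one item is left out
    return pool[:half], pool[half:2 * half]

def form_scores(responses, form_items):
    """Total score of every test taker on the items of one form."""
    return [sum(row[i] for i in form_items) for row in responses]

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when scores do not vary")
    return cov / sqrt(var_x * var_y)

# Invented data: 5 test takers x 8 items, 1 = correct, 0 = incorrect.
responses = [
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 0],
]

form_a, form_b = split_item_pool(range(8), seed=1)
r = pearson_r(form_scores(responses, form_a), form_scores(responses, form_b))
print("form A items:", sorted(form_a))
print("form B items:", sorted(form_b))
print(f"parallel-forms estimate: r = {r:.2f}")
```

With a real test the pool would be far larger than eight items; the tiny table is only there to keep the example readable.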
· Inter-Rater Reliability
This type of reliability is assessed by having two or more independent judges score the test. The scores are then compared to determine the consistency of the raters' estimates. It reflects the degree to which different examiners or judges making different subjective ratings of ability (e.g., of L2 writing proficiency) agree in their evaluations of that ability.
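One way to compare the scores of two raters is Cohen's kappa, which corrects the raw proportion of agreement for the agreement expected by chance. The source does not name a specific statistic, so treat this as one possible choice; the function name `cohens_kappa` and the ratings are invented for the example.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters on the same scripts."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must score the same non-empty set")
    n = len(ratings_a)
    # Proportion of scripts on which the two raters gave the same band.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's own band frequencies.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(counts_a[band] * counts_b[band] for band in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use a single band")
    return (observed - expected) / (1 - expected)

# Invented band scores (1-5) given by two raters to eight L2 essays.
rater_1 = [3, 4, 2, 5, 3, 4, 1, 3]
rater_2 = [3, 4, 3, 5, 3, 4, 2, 3]

kappa = cohens_kappa(rater_1, rater_2)
print(f"Cohen's kappa = {kappa:.2f}")
```

Kappa treats the bands as unordered categories, so a disagreement of one band counts the same as a disagreement of four. For ordered rating scales a weighted variant or a correlation between the raters' scores may be more informative.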
· Test-Retest Reliability
To gauge test-retest reliability, the test is administered twice at two different points in time. This kind of reliability is used to assess the consistency of a test across time. This type of reliability assumes that there will be no change in the quality or construct being measured. It is an estimate of the reliability of a test determined by the extent to which the test gives the same results if it is administered at two different times.
· Internal Consistency Reliability
This form of reliability is used to judge the consistency of results across items on the same test. Essentially, you are comparing test items that measure the same construct to determine the test's internal consistency. When you see a question that seems very similar to another test question, it may indicate that the two questions are being used to gauge reliability. Because the two questions are similar and designed to measure the same thing, the test taker should answer both questions the same way, which would indicate that the test has internal consistency (Richards, Jack C. and Schmidt, 2002).
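A widely used estimate of internal consistency is Cronbach's alpha, which compares the variance of the individual items with the variance of the total scores. The source does not prescribe a formula, so the sketch below is an assumption about how the idea might be put into practice; the function name `cronbach_alpha` and the response table are invented.

```python
from statistics import pvariance

def cronbach_alpha(score_rows):
    """Cronbach's alpha; score_rows holds one list of item scores per test taker."""
    k = len(score_rows[0])  # number of items
    if k < 2 or any(len(row) != k for row in score_rows):
        raise ValueError("every test taker needs scores on the same 2+ items")
    # Variance of each item across test takers.
    item_variances = [pvariance([row[i] for row in score_rows]) for i in range(k)]
    # Variance of the total scores across test takers.
    total_variance = pvariance([sum(row) for row in score_rows])
    if total_variance == 0:
        raise ValueError("alpha is undefined when all total scores are equal")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Invented data: 5 test takers x 4 items, 1 = correct, 0 = incorrect.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Values closer to 1 suggest that the items behave consistently with one another. Population variance is used for both the items and the totals; using sample variance for both would give the same alpha, because the correction factor cancels in the ratio.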
Ways to Reach Reliability
· Do not allow candidates too much freedom
· Write unambiguous items
· Provide clear and explicit instructions
· Ensure that tests are well laid out and perfectly legible
· Candidates should be familiar with format and testing techniques
· Provide uniform and non-distracting conditions of administration
· Use items that permit scoring which is as objective as possible
· Provide a detailed scoring key
· Train scorers
· Identify candidates by number, not name
· Employ multiple, independent scoring
(Hughes, 1996: 36-42)
Factors Affecting Reliability
· Student-related reliability
a) Temporary illness
b) Fatigue
c) A bad day
d) Strategy
· Rater reliability
a) Human error
b) Subjectivity
c) Lack of attention to scoring criteria
d) Inexperience
· Test administration reliability
a) Street noise
b) Photocopying variations
c) Poor light
d) Temperature variations
e) Chair and table condition
· Test reliability
a) Timed test
b) Ambiguity of test items
3. AUTHENTICITY
More about Authenticity
Authentic Materials:
· Nunan (1989): “any material which has not been specifically produced for the purpose of language teaching.” (as cited in Macdonald, Badger & White, 2000)
· Bacon & Finnemann (1990): “authentic materials are texts produced by native speakers for a non-pedagogical purpose.”
Comparison of Authentic vs. Non-Authentic Materials
Authentic Materials
· Language data produced for real-life communication purposes.
· They may contain false starts and incomplete sentences.
· They are useful for improving the communicative aspects of the language.
Non-Authentic Materials
· They are specially designed for learning purposes.
· The language used in them is artificial. They contain well-formed sentences all the time.
· They are useful for teaching grammar.