7/7/97 Methods of Empirical Research
Topics: designing educational research; research and statistical inference; some measurement.
Syllabus: one objective exam, with occasional assignments.
==============================
One term that pervades education is theory.
theory = (class brainstorm) an explanation, a statement, measurable, makes predictions. Working sense: a systematic, unifying explanation that ties things together and yields testable predictions.
hypothesis: draws on everything that went before via research. Does all that has gone before add up to a testable conjecture?
construct: an important term in social science (models, parameters). A postulated underlying trait or process that cannot be observed directly but is inferred from observed behavior, e.g., anxiety or intelligence (inferred from tests).
variable: independent (cause) and dependent (effect) variables.

      Theory
      hypothesis
      construct
  ----------------------------
      observable behaviors

How do we get from top to bottom? Operational definitions.
It starts with a scientific schema and keeps redefining it: test and redefine. The process is self-correcting, so new hypotheses are created.
++++++++++++++++++++++++++++++++++++++++++++
Problem: Should one engage in premarital intercourse? Not empirical (a question of values, not of observable outcomes).
++++++++++++++++++++++++++++++++++++++++++++
What can be researched: outcomes as compared to other variables.
Manipulated variables are a subset of independent variables. The direction always runs one way: we predict from the independent variable to the dependent variable.
Example: amount of reading vs. the intelligence of students. Which is independent, which is dependent?
  Intelligence -> reading: intelligent people read more.
  Reading -> intelligence: reading makes people more intelligent.
It is in the theory: the theory determines how the variables are used and defined. The independent variable in one study becomes the dependent variable in another.

Notation:
  X̄ = mean
  Σ = sigma, "the sum of"
  X = a score (a number)
  N = number of scores
  x = X - X̄ = deviation from the mean
Square the deviations, average them, and take the square root to get the standard deviation: $\text{s.d.} = \sqrt{\Sigma x^2 / N}$

        Group 1                Group 2
   X     x    x²          X     x    x²
   5     2    4           3     0    0
   4     1    1           3     0    0
   3     0    0           3     0    0
   2    -1    1           3     0    0
   1    -2    4           3     0    0
  ΣX = 15   Σx² = 10     ΣX = 15   Σx² = 0
  X̄ = 15/5 = 3          X̄ = 15/5 = 3

In the first group the s.d. = √(10/5) = √2 ≈ 1.4; in the second it is 0.
The mean indicates central tendency, while the s.d. indicates variability: groups can share a mean yet differ in s.d., and vice versa.
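The mean and s.d. steps above can be sketched in a few lines. This is a minimal illustration, not course material: `describe` is a hypothetical helper name, and it assumes the s.d. divides by N (the population form), which is what produces the 1.4 in the notes.

```python
import math

def describe(scores):
    """Return (mean, deviations, sum of squared deviations, s.d.) for a list of scores."""
    n = len(scores)
    mean = sum(scores) / n                    # X-bar = (sum of X) / N
    deviations = [x - mean for x in scores]   # x = X - X-bar
    sum_sq = sum(d * d for d in deviations)   # sum of x squared
    sd = math.sqrt(sum_sq / n)                # s.d. = sqrt(sum of x squared / N), assumed divisor N
    return mean, deviations, sum_sq, sd

group1 = [5, 4, 3, 2, 1]
group2 = [3, 3, 3, 3, 3]

for name, scores in (("Group 1", group1), ("Group 2", group2)):
    mean, devs, sum_sq, sd = describe(scores)
    print(f"{name}: mean = {mean}, deviations = {devs}, sum of x^2 = {sum_sq}, s.d. = {sd:.2f}")
```

Both groups print a mean of 3.0; the s.d. comes out 1.41 for Group 1 and 0.00 for Group 2. Swapping N for n - 1 in the last step gives the sample form (about 1.58 for Group 1), which is the divisor the 7/23/97 notes use.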
Four groups compared by mean (X̄) and s.d.:
  Group I:   X̄ = 35, s.d. = 5
  Group II:  X̄ = 35, s.d. = 10 (same mean as I, more diverse)
  Group III: X̄ = 40, s.d. = 10
  Group IV:  X̄ = 40, s.d. = 0 (no variability at all: "they cheated")
"Average" may also indicate:
  median = the midmost score in the group
  mode = the most frequently occurring score
See notes on paper 1/1, 7/7/97.
++++++++++++++++++++++++++++++++++++++++++++++++
7/10/97
Bell curve: the horizontal axis is scores, the vertical axis is frequency. The curve has a kind of logic: the biggest pile is in the middle, with fewer scores in either direction. For IQ, the mean (100) is also the most frequent score. Use it for a normal (average) distribution of a group.

[Figure: normal curve, percent of cases between s.d. marks]
  below -2 s.d.: 2% | -2 to -1: 14% | -1 to X̄: 34% | X̄ to +1: 34% | +1 to +2: 14% | above +2: 2%
For what happens as we go out 1 deviation at a time, see paper notes.

IQ: mean = 100, s.d. = 15
  70 (-2 s.d.) = bottom 2%
  85 (-1 s.d.) = bottom 16%
  100 = mean
  115 (+1 s.d.) = top 16% (84% score lower)
  130 (+2 s.d.) = top 2%
SAT: mean = 500, s.d. = 100 (see 7/10 notes)

Other classifications of variables:
Stimulus and response variables.
  Stimulus: something that causes a reaction, i.e., anything in the environment that may affect behavior: home, time spent on instruction, methods, teacher.
  Response: how they react to it.
Intervening variable: something in between stimulus and response, e.g., deafness, background, maturity, language, hunger, sleep.
Active and assigned variables ("NOT fire, which is disaster").
  Active: a variable we manipulate; the experimenter assigns it, and it is independent.
  Assigned: we assign a value or a symbol, e.g., IQ, M/F; it can be either independent or dependent.
Variables can also be classified by their mathematical properties (the kinds of values they take).
Some problems can't be handled empirically, e.g., ethics.

What is empirical?
Problem: How have computers affected the amount of writing students produce, compared with students who do not use the computer?
Hypothesis: Students who use the computer write more than those who don't.
Guidelines:
  No incomplete constructions. Problem form: "How does ... affect ...?"
  Keep the specific measurement tool out of the problem.
  The problem reflects the theory, so if the theory is not strong, the hypothesis will be weak.
  Be specific with words and define them. Use research words like "relate to," not "correlate."
  Keep jargon out. KISS.
Summary: Problem: write it as a question, but if the sponsor wants a statement, defer to the sponsor.
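The 7/10/97 normal-curve figures (the 2/14/34 percent bands and the IQ and SAT conversions) can be checked with Python's `statistics.NormalDist` (Python 3.8+). This is a sketch under one stated assumption: the scores are treated as exactly normal with the means and s.d.s from the notes. `z_score` and `percent_below` are hypothetical helper names.

```python
from statistics import NormalDist

def z_score(score, mean, sd):
    """How many s.d.s a score lies above (+) or below (-) the mean."""
    return (score - mean) / sd

def percent_below(score, mean, sd):
    """Percent of an (assumed) normal population scoring below `score`."""
    return 100 * NormalDist(mean, sd).cdf(score)

INF = float("inf")
standard = NormalDist()  # mean 0, s.d. 1
bands = [("below -2", -INF, -2), ("-2 to -1", -2, -1), ("-1 to mean", -1, 0),
         ("mean to +1", 0, 1), ("+1 to +2", 1, 2), ("above +2", 2, INF)]
for label, lo, hi in bands:
    share = 100 * (standard.cdf(hi) - standard.cdf(lo))
    print(f"{label:>11}: {share:4.1f}%")

for iq in (70, 85, 100, 115, 130):
    print(f"IQ {iq}: z = {z_score(iq, 100, 15):+.1f}, "
          f"{percent_below(iq, 100, 15):.1f}% score lower")

print(f"SAT 600: {percent_below(600, 500, 100):.1f}% score lower")
```

The exact bands are about 2.3%, 13.6%, and 34.1%, which round to the 2/14/34 in the notes. IQ 115 sits at about the 84th percentile, i.e., the top 16%.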
Variables should be testable. State the problem early and include the variables.
Your hypothesis is your conjecture about how the variables are related; it states the nature of the relationship.
Don't switch words. Don't use broad terms; find a balance between specific and broad.
Stay away from euphemism. Avoid mythological groups; say "scores on test X."

SAMPLING:
Universe or population: a defined group; all conceivable elements, e.g., the IQs of everyone in the US.
Sample: part of the group, any portion of that population. The term itself implies no method of selection.
Random sample: every member of the population has an equal chance of being selected. This is random sampling.

7/14/97
Random sampling: don't pull numbers out of a hat; there is a procedure for doing proper random sampling. Number every member of the whole population, then use a table of random numbers. In such a table no pattern is discernible. How do you get these? (Example given: carrying the square root of 2 out to many places....) See pp. 640-3 for random numbers.
Class exercise: using the table, we closed our eyes and placed a finger on the page. We did this 3 times; each time we took 10 single digits down the column and found their average. When we charted the class's averages, we got a bell curve. Why?
Central limit theorem: even when the population departs from normality, the sampling distribution of the mean approaches a normal distribution as the sample size grows.
What this means: the pool of "random numbers" we started with was a rectangle (an equal number of each choice*), yet the sample means arrive at a bell.
*It is equal because the table has equal numbers of every digit: 9s, 8s, 7s....
Even for a population of IQs at TC, where the pool may be skewed to the high side and oddly shaped, the distribution of sample means will be a bell.

Standard error of the mean:
$SE_{\bar{X}} = \dfrac{\text{s.d.}}{\sqrt{n-1}}$
Example: IQs of students at TC: X̄ = 120, s.d. = 15, n = 10.
$SE_{\bar{X}} = \dfrac{15}{\sqrt{10-1}} = \dfrac{15}{\sqrt{9}} = \dfrac{15}{3} = 5$

[Figure: sampling distribution of the mean, centered at 120, with ±1 SE marks at 115 and 125]

The intervals are called confidence intervals; here one step is one SE = 5. This tells me I can be about 68% confident that the population mean lies between 115 and 125 (not that 68% of the population's scores fall there).
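The 7/14/97 class exercise and the standard-error example can be simulated. This sketch assumes Python's `random` module stands in for the printed random-number table and that 2000 simulated draws stand in for the class; `sample_means` and `standard_error` are hypothetical helper names. The digit pool is rectangular (uniform), yet the averages pile up in the middle, as the central limit theorem predicts.

```python
import math
import random
from collections import Counter

def sample_means(num_samples, sample_size, rng):
    """Average `sample_size` random digits (0-9), repeated `num_samples` times."""
    return [sum(rng.randint(0, 9) for _ in range(sample_size)) / sample_size
            for _ in range(num_samples)]

def standard_error(sd, n):
    """Standard error of the mean as written in the notes: s.d. / sqrt(n - 1)."""
    return sd / math.sqrt(n - 1)

rng = random.Random(1997)            # fixed seed so reruns match
means = sample_means(2000, 10, rng)  # 2000 simulated draws of 10 digits each

# Text histogram: bin each mean by its whole-number part (0-9), one '#' per 20 means.
bins = Counter(int(m) for m in means)
for value in range(10):
    print(f"{value}.x: {'#' * (bins[value] // 20)}")

# Worked example from the notes: TC IQs, mean 120, s.d. 15, n = 10.
se = standard_error(15, 10)
print(f"SE = {se}")                                              # SE = 5.0
print(f"+/-1 SE (about 68%): {120 - se} to {120 + se}")          # 115.0 to 125.0
print(f"+/-2 SE (about 95%): {120 - 2 * se} to {120 + 2 * se}")  # 110.0 to 130.0
```

The simulated means should center near 4.5 with an s.d. near 2.87/√10 ≈ 0.91, far narrower than the spread of the single digits (about 2.87).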
Going out 2 SE: about 95% confidence that the population mean lies between 110 and 130.
The interval is wide, so the estimate is imprecise: a small sample carries too much error. As the sample size goes up, the standard error goes down.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
7/16/97
Simple random sample: take the population, number everyone in it, and let the table pick.
Stratified sample (e.g., when men and women differ): subdivide the population into groups (strata, categories); the number drawn from each group is proportionate to its share of the population.
  Example: in NYC, homogeneous vs. heterogeneous grouping of educable mentally retarded students? Stratifying variable = IQ.
Cluster sampling: go to a school and pick n kids, i.e., sample from within known groups (clusters). This is economical.
These are the building blocks. Pollsters know this best.
Misconceptions (they ain't true):
  Random is the same as haphazard. No: random is difficult to achieve.
  A random sample is necessarily a representative sample. No, especially not in small samples.
  In practice you will never find a truly random sample.

Correlation:
      I   II
  a   1   2
  b   2   3
  c   3   4
  d   4   5
  e   5   6
Pattern: I + 1 = II.
Basic notion of co-relation: the relationship between 2 sets of numbers.
Pearson product-moment coefficient, r: an index running from 1.00 (perfect relationship) to .00 (absence of relationship), with a sign for direction, so the full range is -1.00 to +1.00.
Strength is determined by the size of the number, not by its sign: of .50 and -.70, the -.70 is stronger.
Correlation does not ensure causality, but causality does imply a relationship (correlation).

Measurement:
Reliability: consistency; how far scores are attributable to chance.
Validity: measuring what we were supposed to measure.

Test-retest reliability:
      I   II
  a   5   6
  b   4   5
  c   3   4
  d   2   3
  e   1   2
The correlation is written r11 or rtt. It is logical: you give the test, then you give it again. One instrument, same items.
Problems: test familiarity, memory, fatigue, the gap between testings.
Alternate-form (equivalent) reliability: a second test of the same type (27 84).
Parallel-form (alternate) reliability is appropriate for a speed test.
Disadvantage: carryover effects, which inflate the correlation.

7/22/97
Reliability tells us to what extent scores on a test are nothing but chance. It is expressed through a statistic called variance. Real difference between . . .
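Pearson r for the two small tables above can be computed from deviation scores. `pearson_r` is a hand-rolled, hypothetical helper written out so the Σxy / √(Σx²·Σy²) logic stays visible; it is a sketch, not a library API.

```python
import math

def pearson_r(xs, ys):
    """r = sum(xy) / sqrt(sum(x^2) * sum(y^2)), using deviations from each list's mean."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    return sxy / math.sqrt(sxx * syy)

# Correlation table: II = I + 1 for every person a-e.
first = [1, 2, 3, 4, 5]
second = [2, 3, 4, 5, 6]
print(pearson_r(first, second))                  # 1.0

# Test-retest table: the same pattern, listed from high to low.
test = [5, 4, 3, 2, 1]
retest = [6, 5, 4, 3, 2]
print(pearson_r(test, retest))                   # 1.0

# Strength is the size of r, not its sign: reversing one list gives -1.0.
print(pearson_r(first, list(reversed(second))))  # -1.0
```

Every retest score is 1 point higher (the kind of shift a practice or carryover effect would produce), yet r is a perfect 1.0: r tracks the consistency of rank and spacing, not equality of scores.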
Reliability is:
  consistency
  the proportion of variance that is true variance
To measure someone's IQ we want the true differences.
Correlation: if I get one and see a pattern, I am measuring something: maybe not what I want, but something real.
Regression toward the mean: those in an extreme group tend to move away from the extreme (toward the mean) when measured again.

Raw score (observed) = raw score (true) + raw score (error):
$X_o = X_t + X_e$
Variance (V), over the whole tested group:
$V_o = V_t + V_e$
Divide every term by $V_o$:
$\dfrac{V_o}{V_o} = \dfrac{V_t}{V_o} + \dfrac{V_e}{V_o}$, so $1 = \dfrac{V_t}{V_o} + \dfrac{V_e}{V_o}$
$\text{Reliability} = \dfrac{V_t}{V_o}$
The variability in those scores is something real, but not necessarily what we want.

How do we improve reliability?
  Number of questions: adding more questions generally adds reliability.
  Clear up ambiguity of language in the test: instructions, questions, answers.
Simply saying "the test is reliable" is dumb!! Consider the groups. Make the numbers talk for you, even if you don't understand it.
Estimating the error for a person taking one test: the standard error of measurement.

Validity: is the test measuring what it is supposed to measure? See ch. 27.
Suppose I give a test and then want to test its validity:
  Ask other teachers: this is called logical validity. Is this sample of items representative of what is to be tested? Used for proficiency tests.
  Criterion-related validity: predictive and concurrent.
    Predictive: selection and aptitude tests, the prediction type. When do content considerations come into the validity of the GRE? Will the test do the job? Predictability.
    Concurrent: criterion measured the same day, or as close together as possible.
  Construct validity (p. 420): the construct cannot be observed; it is an underlying idea: honesty, anxiety, intelligence. A test of intelligence tests a construct. Does reading correlate with it? Yes. Construct validity is a big deal in personality-type tests: are they valid for what they are being used for? We nibble away at it.
A test is valid for a specific purpose, for one population.
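The observed = true + error model and the reliability ratio V_t / V_o can be illustrated by simulation. Everything numeric here is an assumption, not from the notes: true scores normal with mean 100 and s.d. 15, errors normal with s.d. 5 and independent of true scores, 5000 simulated people, a fixed seed. `pearson_r` is a hypothetical hand-rolled helper. The standard-error-of-measurement formula used (observed s.d. × √(1 - reliability)) is the usual textbook one, which the notes name without stating.

```python
import math
import random
from statistics import pvariance

def pearson_r(xs, ys):
    """Pearson r from deviation scores (hand-rolled helper, not a library call)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(722)  # fixed seed so reruns match
N = 5000                  # simulated people (assumption)
true = [rng.gauss(100, 15) for _ in range(N)]  # true scores (assumed s.d. 15)
form_a = [t + rng.gauss(0, 5) for t in true]   # observed = true + error (assumed s.d. 5)
form_b = [t + rng.gauss(0, 5) for t in true]   # parallel form: same true score, new error

v_true = pvariance(true)
v_obs = pvariance(form_a)
reliability_ratio = v_true / v_obs      # V_t / V_o
r_parallel = pearson_r(form_a, form_b)  # reliability estimated as a tester would

# Under these assumptions both should land near 225 / (225 + 25) = 0.9.
print(f"V_t / V_o = {reliability_ratio:.3f}")
print(f"r(A, B)   = {r_parallel:.3f}")

# Standard error of measurement: observed s.d. * sqrt(1 - reliability); near 5 here.
sem = math.sqrt(v_obs) * math.sqrt(1 - r_parallel)
print(f"SEM       = {sem:.2f}")

# Regression toward the mean: people picked for extreme scores on form A
# score closer to the mean, on average, on form B.
extreme = [i for i in range(N) if form_a[i] > 125]
mean_a = sum(form_a[i] for i in extreme) / len(extreme)
mean_b = sum(form_b[i] for i in extreme) / len(extreme)
print(f"{len(extreme)} extreme scorers: form A mean {mean_a:.1f}, form B mean {mean_b:.1f}")
```

The design choice is to generate two parallel forms from the same true scores: the ratio V_t / V_o is only visible inside a simulation, while the correlation between forms is what a tester can actually compute, and the two agree.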
Raise reliability, raise validity: an unreliable test cannot be highly valid.
Cultural influences can affect validity.
Face validity: what the test seems to be measuring; this is not validity.
stat inference
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
7/23/97
If your measures are no good, your report is no good. Just because someone publishes doesn't mean it's good; don't assume it is. Look for yourself.

Statistical inference:
          Left (L)              Right (R)
     X    x    x²          X    x    x²
     4    0    0           3    0    0
     5    1    1           1   -2    4
     3   -1    1           5    2    4
     2   -2    4           2   -1    1
     6    2    4           4    1    1
  ΣX = 20   Σx² = 10     ΣX = 15   Σx² = 10
  X̄ = 20/5 = 4          X̄ = 15/5 = 3
Using n - 1 in the denominator, for each group:
$V = \dfrac{\Sigma x^2}{n-1} = \dfrac{10}{5-1} = \dfrac{10}{4} = 2.5$
Variance = s.d. squared ($V = \text{s.d.}^2$), so V = 2.5 and s.d. = √2.5 ≈ 1.58.
To describe the variability, look at the s.d.
Within each group we don't know how much of the variability is due to chance and how much to individual differences, so we attribute all of it to chance; this is called error variance. Within each group we have a measure of something other than the treatment. We measured left and right and then averaged the two: (2.5 + 2.5)/2 = 2.5.

Means of L and R:
     X̄     x     x²
     4    .5    .25
     3   -.5    .25
  Grand mean = 7/2 = 3.5; Σx² = .5
  .5 × 5 = 2.5
The 5 above is the number of units (scores) in each group; if a different number were tested, I'd use the number tested. It is an adjusting factor.

Analysis of variance: we computed 2 means. Are the groups different? Chance alone will give some difference. Go through the procedure to compare two estimates of variability: one from the means, and one from chance (within groups). If the groups really differ, it will show up in the means. Is the variability among the means greater than the estimate of chance? If so, we can be confident in saying that the difference is not negligible. (Here both estimates are 2.5, so the difference between means of 4 and 3 is no bigger than chance would give.)
It is more traditional to report the mean and s.d., but reporting the mean and variance is fine; in comparing 2 groups, use variance.
Don't take results at face value. In real life you need to know how to read the reports and to hear folks speak the language.
Computational formulas. For the next example a constant (2) was added to every score in the left group ("fudging!!"). Adding a constant to every score in a group raises that group's mean but leaves its variance unchanged.
Raw scores (X) and their squares (X²):
           L              R
      X     X²       X     X²
      6     36       3      9
      7     49       1      1
      5     25       5     25
      4     16       2      4
      8     64       4     16
ΣX    30             15
X̄     6               3
ΣX²   190            55

$\Sigma X_t = 30 + 15 = 45$ (sum of X, total)
$\Sigma X_t^2 = 190 + 55 = 245$ (sum of X squared, total)
$C = \dfrac{(\Sigma X_t)^2}{N} = \dfrac{45^2}{10} = \dfrac{2025}{10} = 202.5$
C = the sum of X total, squared, over N, the total number of scores. C is a correction term.
Total sum of squares:
$SS_t = \Sigma X_t^2 - C = 245 - 202.5 = 42.5$
Sum of squares between:
$SS_b = \dfrac{(\Sigma X_l)^2}{n_l} + \dfrac{(\Sigma X_r)^2}{n_r} - C$
i.e., the left total squared over the number in left, PLUS the right total squared over the number in right, MINUS C:
$SS_b = \dfrac{30^2}{5} + \dfrac{15^2}{5} - 202.5 = 180 + 45 - 202.5 = 22.5$
Degrees of freedom: total = N - 1 = 9; between = number of groups - 1 = 1; within = 9 - 1 = 8.

  source     d.f.    SS      V
  between     1     22.5    22.5
  within      8     20       2.5
  total       9     42.5
                              F = 9

Within is computed by subtracting between from total, for both d.f. and SS.
To compute V (variance), divide each SS by its d.f.: V_b = 22.5/1 = 22.5; V_w = 20/8 = 2.5.
F (for Fisher):
$F = \dfrac{V_b}{V_w} = \dfrac{22.5}{2.5} = 9$
See pp. 211-12.
Exercise: choose two sets of 10 random numbers and compute this.
If the probability that an event occurs by chance alone is only 5/100, and such an event does occur, we are going to assume that it is not a chance occurrence.

TEST: objective
  science and the scientific approach
  hypotheses and problems
  sampling
  sum of means
  constructs
  measurement: reliability and validity
  stat: ideas, not computation
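The 7/23/97 computational formulas (C, SS_t, SS_b, then V and F) can be sketched as one function and run on both the original left/right scores and the "fudged" left + 2 version. `one_way_anova` is a hypothetical helper name. The critical value 5.32 is the standard F-table entry for d.f. (1, 8) at the .05 level, hard-coded here rather than computed.

```python
def one_way_anova(*groups):
    """Return C, SS, d.f., V, and F for two or more groups, via the computational formulas."""
    all_scores = [x for g in groups for x in g]
    n_total = len(all_scores)
    c = sum(all_scores) ** 2 / n_total                    # correction term C
    ss_t = sum(x * x for x in all_scores) - c             # total SS
    ss_b = sum(sum(g) ** 2 / len(g) for g in groups) - c  # between SS
    ss_w = ss_t - ss_b                                    # within = total - between
    df_b = len(groups) - 1
    df_w = n_total - len(groups)
    v_b = ss_b / df_b                                     # V = SS / d.f.
    v_w = ss_w / df_w
    return {"C": c, "SS_t": ss_t, "SS_b": ss_b, "SS_w": ss_w,
            "df_b": df_b, "df_w": df_w, "V_b": v_b, "V_w": v_w, "F": v_b / v_w}

left = [4, 5, 3, 2, 6]
right = [3, 1, 5, 2, 4]
fudged_left = [x + 2 for x in left]  # 6, 7, 5, 4, 8: the "fudged" left group

F_CRIT_1_8 = 5.32  # F table, d.f. (1, 8), .05 level

for label, lft in (("original", left), ("left + 2", fudged_left)):
    res = one_way_anova(lft, right)
    verdict = "not chance (p < .05)" if res["F"] > F_CRIT_1_8 else "could be chance"
    print(f"{label}: SS_b = {res['SS_b']}, SS_w = {res['SS_w']}, "
          f"V_b = {res['V_b']}, V_w = {res['V_w']}, F = {res['F']} -> {verdict}")
```

The original scores give V_b = V_w = 2.5 and F = 1.0; adding 2 to every left score leaves V_w at 2.5 but raises V_b to 22.5, so F = 9.0, above 5.32. For the exercise in the notes, pass two lists of 10 random digits instead.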