The topics in this story are: impact evaluation, job-training, employability, skills, regression discontinuity, before-after analysis, experimental evaluation, excess demand.
Let’s imagine a program that trains young people living in vulnerable conditions, providing them with six months of training in both technical and socioemotional skills to give them better chances of finding employment.
The Program Coordinator (PC) meets the Monitoring and Evaluation Specialist (S). The purpose of the meeting: to decide how the program will be evaluated. After an informal chat about bad traffic and the weather, they start discussing the issue at hand.
S: Alright. Let’s get to business. What are you looking for?
PC: Simple. We just want to measure the impact of our training program.
S: (With a mocking tone) What exactly are you referring to when you say “measure the impact”? You want to measure the number of people who are trained, or…?
PC: (interrupts laughing out loud) You know that’s not what I’m talking about. I’m talking about measuring the real impact. I want to know if the kids who went through the program are employed.
S: Alright. You don’t need an impact evaluation for that. You just have to give a questionnaire to those who graduate and see whether they are employed at the moment.
PC: I don’t get it. Why are you saying that’s not measuring the impact of the program? For kids who are coming from a context of vulnerability getting a job is a huge improvement in their lives!
S: Naturally. I wouldn’t argue against that. But that’s not measuring the program’s impact…
PC: (Interrupts again) Sorry, but for the kids getting a job means a huge change in their life prospects, the opportunity to get an income, health insurance, the chance to learn and grow into a career… it’s a long-term growth opportunity.
S: I agree with that one hundred percent.
PC: Then what?
S: Evaluating a program’s impact doesn’t necessarily mean evaluating long-term changes in quality of life. Evaluating impact comes down to measuring how much of that change is specifically the result of the program.
PC: Alright. I get it. We have to look at the change and not just at the current situation. Some of the kids might have started in the program already having a job, so finding out that they are currently working doesn’t say much in itself.
S: Well, yes, that is in part what I’m talking about…
PC: (Interrupts again) But don’t worry, we can solve that very easily. All the kids can fill out a form when they sign up for the program with questions about their employment situation. Then we can check anything we like: whether they are employed, whether the job is formal, how much they earn, and so on.
S: Well, yes. You see, now we are getting closer to what we want. One methodology that can be used in impact evaluation is the “before-after” method.
PC: (Asks jokingly and laughs) “Before-after method”? Let me guess… this method compares the employment situation before the program with the situation after the program. See? I told you it was easy.
S: Yes, the “before-after” method is a very simple evaluation technique that can be used in practically any program, as long as the outcome variable doesn’t change naturally over time.
PC: Mmm… I’m not sure I follow you.
S: Imagine a nutrition program for kids where the outcome variables are their height and weight. If we measure these variables when the kids begin the program and again a year or two later, the measurements will very likely go up naturally. That doesn’t necessarily mean it’s because of the program; the kids would have grown at least a little anyway, with or without the intervention.
PC: I get it. But… is time a problem in our case too?
S: I think so. People in vulnerable contexts usually live with very volatile employment situations. I imagine most kids who sign up for the program do it because they don’t have a job, right?
PC: Yes, some have very informal jobs, but most are not employed at all.
S: In that case…
PC: (interrupts once more) In that case many of them will find a job simply because they were unemployed before and they looked for one, not necessarily because of the program’s help. So, then what do we do?
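(An aside for the reader: the before-after estimate itself is trivial to compute. Here is a minimal Python sketch with invented enrollment records; as PC notes, the change it reports mixes the program’s effect with whatever would have happened anyway.)

```python
# Hypothetical before/after employment records for program graduates
# (invented for illustration only).
graduates = [
    {"id": 1, "employed_before": False, "employed_after": True},
    {"id": 2, "employed_before": True,  "employed_after": True},
    {"id": 3, "employed_before": False, "employed_after": False},
    {"id": 4, "employed_before": False, "employed_after": True},
]

# Employment rate for a given column of the records.
rate = lambda key: sum(g[key] for g in graduates) / len(graduates)
before, after = rate("employed_before"), rate("employed_after")

# The naive "before-after" estimate: a simple difference in employment rates.
print(f"before: {before:.0%}, after: {after:.0%}, change: {after - before:+.0%}")
# → before: 25%, after: 75%, change: +50%
```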
S: How many people sign up for the program?
PC: I’d say around 400 in every cohort.
S: And how many do you accept?
PC: We have placements only for 100.
S: How are these 100 selected?
PC: Why so much interest in our selection process?
S: Well… because the selection process is fundamental to choose the most appropriate method to evaluate impact. Tell me, how are the 100 selected?
PC: At first, we did it by order of application, but some years ago we noticed kids were dropping out, so we started applying selection criteria to choose those we think will benefit the most from it.
S: What are those criteria? And how do you apply them?
PC: Basically, there are two requirements: applicants have to be younger than 25 and have completed high school. All those who meet these criteria, around 350 kids, are called for an interview. Of those, at least 50 don’t show up. Those who do fill out an evaluation form with questions that gauge their socioemotional skills: empathy, communication, responsibility, motivation, and so on. The evaluation yields a score from 1 to 50, and we simply choose the 100 candidates with the highest scores.
S: (Hunching over and crossing fingers in the style of Mr. Burns) Excellent!
PC: Why so much enthusiasm?
S: Because this process lets us choose from different methodologies.
PC: Really? Which ones?
S: Let’s go from the most ideal to the least.
S: Since you have excess demand for the program, we can run an experimental evaluation without reducing the number of beneficiaries. For instance, we could keep the current selection process, but instead of selecting the 100 best candidates, we pre-select the 200 best. From those, we choose 100 at random. This way, we would have two groups with roughly the same average score; the only difference is that one group goes through the program and the other doesn’t. After a year, we could do phone interviews and compare the employment situation of people in both groups.
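(Aside: the random selection S proposes can be sketched in a few lines of Python. The scores below are invented for illustration.)

```python
import random

random.seed(42)
# Hypothetical interview scores (1-50) for the 400 applicants -- invented data.
applicants = [{"id": i, "score": random.randint(1, 50)} for i in range(400)]

# Pre-select the 200 highest-scoring candidates, as S suggests...
top_200 = sorted(applicants, key=lambda a: a["score"], reverse=True)[:200]

# ...then randomly assign 100 to the program (treatment) and 100 to a control group.
random.shuffle(top_200)
treatment, control = top_200[:100], top_200[100:]

avg_score = lambda group: sum(a["score"] for a in group) / len(group)
# Because assignment is random, the two groups' average scores should be similar.
print(len(treatment), len(control))
print(avg_score(treatment), avg_score(control))
```

A year later, comparing employment rates between the two groups would estimate the program’s effect, since random assignment makes the groups comparable on average.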
PC: Like in one of those laboratory experiments… Mmm, I’m not really convinced by this, though. I know we wouldn’t be excluding people from the program, only slightly changing who gets in and who doesn’t, but I’d prefer not to modify our selection process. Why don’t we compare the 100 best applicants with the other 200?
S: Well, we would be running into selection bias. After all, you choose the 100 best candidates, not just any 100 candidates, right?
PC: You’re right. The 100 candidates we select are probably those with the highest chances of finding employment in the first place… that’s also why they score so high in our interviews. That’s selection bias.
S: Exactly! But still, without modifying the selection process, we could apply another methodology called regression discontinuity.
PC: What is that? It sounds complicated.
S: Not so much. We could still compare the best 100 with the other 200, but first we order everyone from highest to lowest score. Then we compare those in positions 51 to 100, who got into the program, with those in positions 101 to 150, who didn’t.
PC: I know what you are getting at. We would compare those who didn’t make it by very little.
S: Exactly! The idea is that applicants who land near the cutoff score for acceptance resemble each other closely in every other way.
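(Aside: a minimal sketch, with invented scores, of the regression-discontinuity comparison S describes: rank everyone, then compare the 50 applicants just above the cutoff with the 50 just below it.)

```python
import random

random.seed(7)
# Hypothetical interview scores for the ~300 applicants who showed up (invented data).
scores = sorted((round(random.uniform(1, 50), 1) for _ in range(300)), reverse=True)
ranked = [{"rank": i + 1, "score": s} for i, s in enumerate(scores)]

# Regression discontinuity: compare the two groups straddling the cutoff at rank 100.
just_in = [a for a in ranked if 51 <= a["rank"] <= 100]    # accepted, close to the cutoff
just_out = [a for a in ranked if 101 <= a["rank"] <= 150]  # rejected, close to the cutoff

avg = lambda g: sum(a["score"] for a in g) / len(g)
# The score gap across the cutoff is small, which is what makes the groups comparable.
print(round(avg(just_in) - avg(just_out), 2))
```

In the real evaluation, the comparison would of course be on employment outcomes a year later, not on the scores themselves.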
PC: I like that… we wouldn’t have to change our selection process. Oh, but you specialists always have something to say… Go on… what’s the problem with this one?
S: (Laughing) We have a reputation, huh? There are two issues here. First, this method works only if the score is always respected as the selection criterion. There can’t be any further considerations or manipulation once the evaluation is complete. Only the 100 best candidates are accepted into the program, and that’s it.
PC: I can assure you that this is always respected. For us it’s very important to be transparent about it.
S: The second issue… is that our sample would now be much smaller. We wouldn’t have 100 people in the treatment group, but only 50 or even fewer. That means the statistical power of our evaluation starts to drop.
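(Aside: a rough simulation, with hypothetical employment rates, of why halving the group size hurts statistical power. The `detection_rate` helper is invented for illustration: it counts how often a simple two-proportion z-test detects a given true difference.)

```python
import math
import random

def detection_rate(p_treat, p_control, n, trials=2000, z_crit=1.96):
    """Share of simulated evaluations in which a two-proportion z-test
    detects the employment difference. A rough stand-in for statistical power."""
    hits = 0
    for _ in range(trials):
        # Simulate employment outcomes in each group of size n.
        t = sum(random.random() < p_treat for _ in range(n))
        c = sum(random.random() < p_control for _ in range(n))
        pooled = (t + c) / (2 * n)
        se = math.sqrt(2 * pooled * (1 - pooled) / n)
        if se > 0 and abs(t / n - c / n) / se > z_crit:
            hits += 1
    return hits / trials

random.seed(1)
# Hypothetical rates: 60% employment with the program vs 40% without.
power_100 = detection_rate(0.6, 0.4, n=100)  # experimental design: 100 per group
power_50 = detection_rate(0.6, 0.4, n=50)    # discontinuity design: ~50 per group
print(power_100, power_50)
```

With the smaller groups, the same true difference is detected noticeably less often, which is what S means by the statistical power going down.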
PC: “Statistical power”? Okay… that’s as far as I can follow for now. Let’s grab a cup of coffee and continue this conversation later. In any case, if we only do it once, choosing randomly among the 200 best candidates doesn’t sound that hard.
S: Great! You could even invite those who are not accepted to participate in the program a year after our study.
PC: That’s great! I told you it would be easy!
Both laugh and go get a cup of coffee. They know this is just starting.