Introduction to the Dataset
This dataset is derived from a clustered randomized controlled trial (RCT) conducted in urban neighborhoods across the Delhi National Capital Region (NCR) in India. The experiment evaluated whether enabling women to coordinate travel with peers for job interviews would reduce mobility-related constraints and increase employment uptake. The study design, survey instruments, and sampling methodology were developed in collaboration with local partners and refined over multiple pilot rounds.
Experimental Design
The unit of randomization was the neighborhood, defined based on geographic and infrastructural characteristics. The sampling frame was limited to areas located within a 12-kilometer radius of a sample garment factory. A detailed mapping exercise was carried out to identify and define neighborhoods using prominent physical features such as main roads, highways, parks, and open areas. In instances where natural boundaries were insufficient, paved roads or non-residential buildings were used to delineate cluster boundaries. To minimize information spillovers, buffer zones were created between neighborhoods, and clusters in close proximity were excluded if spillover risk could not be mitigated.
A total of 106 neighborhoods were finalized for inclusion. For each neighborhood, we computed the distance from its geographic centroid to the factory and binned neighborhoods into nine distance-based strata. These bins showed a high correlation with travel costs to the factory, measured using prevailing rates for shared and private auto-rickshaws. Within each city-distance stratum, neighborhoods were randomly assigned to one of three groups: (i) Matching and Coordinated Travel, (ii) Matching without Coordination, and (iii) Control.
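The stratified assignment described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual randomization code: the function name `assign_arms`, the stratum labels, the seed, and the rotate-through-arms rule are all assumptions made for the example. The source only states that assignment to the three groups was random within each stratum.

```python
import random
from collections import defaultdict

ARMS = (
    "Matching and Coordinated Travel",
    "Matching without Coordination",
    "Control",
)

def assign_arms(strata_by_neighborhood, seed=0):
    """Randomly assign neighborhoods to arms within each stratum.

    strata_by_neighborhood maps a neighborhood ID to its stratum label.
    Returns a dict mapping each neighborhood ID to one of ARMS.
    """
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    by_stratum = defaultdict(list)
    for nbhd, stratum in strata_by_neighborhood.items():
        by_stratum[stratum].append(nbhd)

    assignment = {}
    for stratum in sorted(by_stratum):
        members = sorted(by_stratum[stratum])  # stable order before shuffling
        rng.shuffle(members)
        # Random starting arm, so leftover neighborhoods in a stratum whose
        # size is not a multiple of 3 do not always land in the first arm.
        start = rng.randrange(len(ARMS))
        for i, nbhd in enumerate(members):
            assignment[nbhd] = ARMS[(start + i) % len(ARMS)]
    return assignment

# Hypothetical example: 12 neighborhoods split evenly across two strata.
example = {f"N{i:02d}": ("S1" if i < 6 else "S2") for i in range(12)}
arms = assign_arms(example, seed=42)
```

Shuffling within a stratum and then cycling through the arms keeps the arms balanced within each stratum, which is the usual motivation for stratifying on a variable such as distance to the factory.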
Sample Recruitment and Sampling Procedures
Following randomization, household screening began in February 2024. Within each cluster, enumerators selected a random entry point to begin door-to-door screening. They adhered to a “right-hand rule” to navigate lanes and avoid bias toward households located near the main access roads. The screening questionnaire assessed eligibility based on the following six criteria:
- Age between 18 and 40 years (initially capped at 35, later relaxed to 40)
- Possession of a valid government-issued ID (Aadhaar)
- Ability to operate a home or factory sewing machine
- Not currently engaged in paid work outside the home
- Not employed by the partner factory (Shahi Exports) within the last 3 months
- Expressed interest in working at the factory, regardless of household approval
These criteria were designed to align with the factory’s hiring requirements while focusing on women who may face barriers to labor force entry. A total of 693 women meeting these criteria were enrolled across 106 neighborhoods, an average of about 6.5 women per neighborhood.
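The six screening criteria can be expressed as a single eligibility check. The sketch below is illustrative only: the `ScreeningResponse` fields and the `is_eligible` function are hypothetical names rather than variables from the dataset, and the age bounds are assumed to be inclusive, using the relaxed upper limit of 40.

```python
from dataclasses import dataclass, replace

MIN_AGE, MAX_AGE = 18, 40  # assumed inclusive; upper bound after relaxation

@dataclass(frozen=True)
class ScreeningResponse:
    # Hypothetical field names; the real variable names may differ.
    age: int
    has_aadhaar: bool
    can_operate_sewing_machine: bool
    works_outside_home: bool
    employed_at_factory_last_3_months: bool
    interested_in_factory_work: bool

def is_eligible(r):
    """Return True only if all six screening criteria are met."""
    return (
        MIN_AGE <= r.age <= MAX_AGE
        and r.has_aadhaar
        and r.can_operate_sewing_machine
        and not r.works_outside_home
        and not r.employed_at_factory_last_3_months
        and r.interested_in_factory_work
    )

# Hypothetical respondent who satisfies every criterion.
candidate = ScreeningResponse(
    age=36,
    has_aadhaar=True,
    can_operate_sewing_machine=True,
    works_outside_home=False,
    employed_at_factory_last_3_months=False,
    interested_in_factory_work=True,
)
# Same respondent, but above the age limit.
too_old = replace(candidate, age=41)
```

Here `is_eligible(candidate)` is true and `is_eligible(too_old)` is false. Under the original cap of 35, the same 36-year-old would have been screened out.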
The original enrollment target was 750 women (10 in each of 75 neighborhoods), but it was revised after it became evident that some clusters had fewer eligible women than the per-neighborhood target. To preserve statistical power, we increased the number of clusters from 75 to 106 and recalculated sample requirements accordingly.
Survey Instruments and Data Collection
Data collection occurred over three major phases—two pilots and one main study—and included four core instruments: baseline survey, neighborhood meeting attendance survey, factory-site survey, and endline survey.
Baseline Survey:
Conducted between March 27 and June 19, 2024, the baseline survey was administered in person at the respondent’s home immediately following enrollment. Trained female enumerators administered structured questionnaires using tablets. Modules included demographics, employment history, safety perceptions during commuting, mobility and trip histories, aspirations, gender attitudes, and social connections with other women in the neighborhood. We also recorded detailed information about household composition and resource control.
Meeting Attendance Survey:
This survey was administered to women assigned to the two treatment groups, particularly those invited to participate in travel coordination meetings. Early implementation revealed low attendance at these meetings. In response, field protocols were adjusted: enumerators made home visits, escorted participants to meetings, and held meetings in smaller groups at locations closer to participants’ residences. The meeting survey collected information on attendance, group composition, preferences for travel companions, and perceived utility of the intervention.
Factory-Site Survey:
Enumerators were stationed at partner factory gates during scheduled interview days. For all study participants who appeared, we administered a short survey that recorded their mode of travel, companions, time of departure, cost of travel, and reasons for attending. These data form the basis for the primary outcome: interview attendance.
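One plausible way to build the attendance outcome from the factory-site survey is to flag each enrolled woman according to whether she appears in the factory-site records. The sketch below assumes a participant identifier shared across files. The function names, IDs, and arm labels are hypothetical, and the dataset's actual outcome construction may differ.

```python
def attendance_indicator(enrolled_ids, factory_survey_ids):
    """Return {participant_id: 1 if surveyed at the factory gate, else 0}."""
    seen = set(factory_survey_ids)  # a set also collapses repeat visits
    return {pid: int(pid in seen) for pid in enrolled_ids}

def attendance_rate_by_arm(attended, arm_by_id):
    """Return the share of enrolled participants who attended, per arm."""
    totals, shows = {}, {}
    for pid, flag in attended.items():
        arm = arm_by_id[pid]
        totals[arm] = totals.get(arm, 0) + 1
        shows[arm] = shows.get(arm, 0) + flag
    return {arm: shows[arm] / totals[arm] for arm in totals}

# Hypothetical data: four enrolled women, two seen at the gate (one twice).
enrolled = ["W1", "W2", "W3", "W4"]
factory_records = ["W2", "W4", "W4"]
arm_by_id = {"W1": "Control", "W2": "Matching", "W3": "Control", "W4": "Matching"}

attended = attendance_indicator(enrolled, factory_records)
rates = attendance_rate_by_arm(attended, arm_by_id)
```

Women who never appear in the factory-site file are coded 0 rather than missing, which matches a design in which only participants who showed up were surveyed at the gate.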
Endline Survey:
Conducted between May 31 and August 11, 2024, the endline survey covered all enrolled women approximately 5–6 weeks after the interview invitation window closed. Many respondents had relocated or were temporarily unavailable, requiring multiple follow-up visits and a shift to phone surveys in several cases. Phone surveys employed a shortened version of the endline instrument, focusing on key outcomes such as employment status, travel behavior, and mobility confidence. Attrition between baseline and endline was 19.8%, resulting in an endline sample of 560 women.
Pilots
Pilot 1:
Launched on November 26, 2023, Pilot 1 tested the feasibility of the screening and enrollment protocol and covered 31 respondents. The team assessed the clarity of survey questions and the operational logistics of mapping and randomization.
Pilot 2:
Launched on February 6, 2024, Pilot 2 involved 30 women and incorporated revisions from Pilot 1. Updates included streamlining the survey instrument, clarifying consent procedures, and refining respondent tracking protocols.
The endline surveys for these 61 pilot participants were conducted concurrently with the main study endline in June–August 2024, using the same survey instruments and protocols to ensure comparability.
Dataset Composition and Documentation
The final dataset includes three sets of anonymized files corresponding to the Pilot 1, Pilot 2, and Main Study phases. Each phase includes:
- Individual-level baseline survey data
- Meeting attendance survey (for treatment groups)
- Factory-site tracking survey
- Endline survey data
All data are provided in Stata (.dta) format with clearly labeled variables. Additional documentation includes:
- Survey instruments (baseline, endline, factory site, and meeting)
- Variable-question mapping