Common Problems in Consumer Research
This blog offers a concise, accessible critique of common pitfalls in consumer research, covering theory, measurement, and inference.
It is written by marketing scholar Dr. Caleb Warren (University of Arizona).
HOW TO FIX COMMON PROBLEMS
Gorilla Experiment Builder
Developed by researchers affiliated with the Universities of Cambridge and London, Gorilla is an online platform for building and
deploying behavioral experiments. It features a graphical, no-code interface, precise stimulus timing, and response-latency
measurement. Experiments can be built for free and deployed on a pay-per-participant basis.
Reference: Anwyl-Irvine et al. (2019), Behavior Research Methods.
GORILLA.SC
G*Power
G*Power is a stand-alone application for power analysis across t-tests, ANOVAs, regressions, and more. It is particularly useful when
planning experiments with continuous or categorical outcomes.
GPOWER.HHU.DE
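Before opening G*Power, it can help to see the arithmetic such tools automate. The sketch below (in JavaScript, since other tools on this page, such as jsPsych, are JavaScript-based) uses a standard normal approximation for a two-sample t-test; treat it as a back-of-the-envelope check under those stated assumptions, not a replacement for G*Power's exact calculations.

    // Rough normal-approximation for per-group n in a two-sample t-test.
    // Assumes a two-sided alpha of .05 (z = 1.96) and 80% power (z = 0.84);
    // G*Power's exact t-distribution answer will be slightly larger.
    function nPerGroup(cohensD) {
      const zAlpha = 1.96; // two-sided alpha = .05
      const zBeta = 0.84;  // power = .80
      return Math.ceil(2 * ((zAlpha + zBeta) / cohensD) ** 2);
    }

    console.log(nPerGroup(0.5)); // 63 per group (G*Power's exact answer is 64)
    console.log(nPerGroup(0.2)); // 392 per group for a small effect

For within-subjects or regression designs the inputs differ, which is exactly where G*Power's menus earn their keep.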
jsPsych
jsPsych is a JavaScript framework for creating web-based behavioral experiments that run directly in a browser. Experiments are built
by combining modular plugins into timelines, allowing researchers to present stimuli, capture precise response data, and create
highly customizable experimental designs.
JSPSYCH.ORG
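To make the plugin-and-timeline idea concrete, here is a minimal sketch of a single jsPsych trial. It assumes jsPsych 7.x and its html-keyboard-response plugin are loaded via script tags; the names follow the documented 7.x API, but check jspsych.org for your version.

    // Minimal jsPsych 7.x experiment: one stimulus, two response keys.
    const jsPsych = initJsPsych({
      on_finish: () => jsPsych.data.displayData() // show the collected data at the end
    });

    const trial = {
      type: jsPsychHtmlKeyboardResponse, // from @jspsych/plugin-html-keyboard-response
      stimulus: '<p>Press F if the price seems fair, J if it seems unfair.</p>',
      choices: ['f', 'j'] // response key and reaction time are recorded automatically
    };

    jsPsych.run([trial]); // the timeline is simply an array of trials

Larger designs are built the same way: more trial objects (or nested timelines with timeline variables) added to that array.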
Manipulations with Many Labs 2
Supported by the Center for Open Science and the Association for Psychological Science, the Many Labs 2 project replicated nearly 30
classic and contemporary effects across more than 15,000 participants. The publicly available materials provide well-validated
examples of experimental manipulations (e.g., correspondence bias, moral judgment, power, affect, risk, goal pursuit) that are useful
for study design and benchmarking.
PSYARXIV.COM
Pre-registration & Registered Reports
Pre-registration involves publicly documenting a study’s hypotheses, design, and analysis plan before data collection begins, which
increases transparency and research credibility. Here’s a marketing example (‘22). One increasingly adopted evolution of this practice
is the Registered Reports publication model, in which journals peer-review and provisionally accept study protocols before data are
collected, shifting the emphasis from results to research rigor and reducing publication bias.
ASPREDICTED.ORG
CENTER FOR OPEN SCIENCE | EXAMPLE
Pavlovia
Pavlovia (“where behavior is studied”) is an online platform that hosts experiments built with tools such as jsPsych and PsychoPy. It
supports participant management and data collection.
PAVLOVIA.ORG
Prolific
A high-quality, albeit costly, online participant pool favored by behavioral and experimental researchers for reliable data, pre-
screening options, and participant diversity.
PROLIFIC.COM
PsyToolkit
PsyToolkit is a free platform for programming and running cognitive and behavioral experiments and surveys, including personality
measures. It is widely used in academic research, student projects, and teaching across cognitive and personality psychology.
PSYTOOLKIT.ORG
Sample Size Calculator
Created by Dr. Rebecca Hofstein Grady (University of California, Irvine), this calculator estimates MTurk study costs from total sample
size, platform fees (Amazon and CloudResearch), and participant wages. Users enter study parameters and the spreadsheet
auto-populates all cost metrics, including total project cost.
DOCS.GOOGLE.COM
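For readers who prefer to script the same arithmetic rather than use the spreadsheet, the sketch below reproduces the basic budget calculation (wages plus platform fees). The fee rates here are illustrative assumptions, not the spreadsheet's values; check current MTurk and CloudResearch pricing before budgeting a real study.

    // Back-of-the-envelope crowdsourcing budget; fee rates are assumptions for illustration.
    function studyCost(n, wagePerParticipant, mturkFeeRate = 0.20, cloudResearchFeeRate = 0.10) {
      const wages = n * wagePerParticipant;
      const fees = wages * (mturkFeeRate + cloudResearchFeeRate);
      return { wages, fees, total: wages + fees };
    }

    console.log(studyCost(200, 1.50));
    // { wages: 300, fees: 90, total: 390 }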
Crowdsourcing
Crowdsourced data collection (e.g., CloudResearch, Prolific, CrowdFlower, Clickworker, Amazon Mechanical Turk) is a complex and
evolving research method. As an early adopter of crowdsourced sampling, I have presented on this topic in multiple academic venues,
and I am happy to share my latest summary; please feel free to email me.
Recent developments, including AI agents, domestic and international click farms, and increasing self-selection bias, have raised
concerns about the validity of crowdsourced samples (Goodman & Paolacci, 2017). Although prior research has assessed the quality
of various online pools (e.g., Chandler et al., 2019; 2020), important gaps remain in our understanding of (A) participants’ motivations
to misrepresent themselves and (B) effective mechanisms for preventing misrepresentation. These issues are particularly salient in
studies using screening criteria.
For example, in a study examining nutrition label formats among consumers with diet-related conditions, participants can easily
misrepresent eligibility through self-report (e.g., claiming to have diabetes or hypertension). At the same time, online marketplaces
can also yield high participant investment, a less frequently discussed but important upside of crowdsourcing.
In one of my classroom-based projects, MTurk participants reviewed 15+ student-created advertisements and provided both
structured ratings and open-ended feedback. Despite minimal compensation, average survey duration exceeded one hour, and
participants provided thoughtful qualitative comments for nearly every ad. The average response exceeded 700 words (approximately
2.5 pages), highlighting that high-quality engagement can emerge under the right conditions.
Worried About Data Quality from Online Sampling Pools?
Concerns about online data quality intensified in 2018 across social science forums (e.g., Psychological Methods Discussion Group,
blogs, TurkPrime, Twitter). Common challenges include: (A) Click farms, (B) Non-human respondents (survey bots), (C) Foreign
participants using VPNs, and (D) Low motivation or “speeders”.
While I continue to support the use of crowdsourced samples, researchers must clearly communicate rigorous data-cleaning
procedures to reviewers, associate editors, and editors. Quality safeguards remain essential (e.g., requiring a 99% approval rating).
Fortunately, several tools and practices can improve data integrity.
Recommended Practices
(1) Attention Checks
Useful when applied thoughtfully and sparingly; overuse can damage participant experience and motivation. I recommend
nondiscriminatory checks (e.g., avoiding color-based Stroop tasks) and have developed a Numeric Stroop Test that elicits similar
cognitive load without relying on color perception (Moss, 2021; Curran & Hauser, 2019).
(2) Honeypot Questions
Honeypot questions are used to identify low-quality or non-genuine responses and can take multiple forms. One approach involves
invisible honeypot items, such as text fields or inputs that are hidden from human participants but detectable by automated bots or
copy-paste scripts. These items may be effective for identifying non-human responses and typically require JavaScript implementation
(e.g., in Qualtrics; Goodrich et al., 2023). I use this technique regularly in my own research, and I am happy to share sample scripts.
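As a hedged illustration of the general idea (a sketch, not the exact scripts I share), the Qualtrics question-level JavaScript below hides a text-entry question from human participants and flags any submission that fills it in. The embedded-data field name honeypot_flag is an arbitrary choice and must also be declared in the survey flow to appear in the dataset.

    // Qualtrics question JavaScript for an invisible honeypot text-entry question.
    Qualtrics.SurveyEngine.addOnload(function () {
      // Hide the whole question from human participants; simple bots that fill
      // every input in the DOM will still populate it.
      this.getQuestionContainer().style.display = 'none';
    });

    Qualtrics.SurveyEngine.addOnPageSubmit(function () {
      const field = this.getQuestionContainer().querySelector('input, textarea');
      if (field && field.value.trim() !== '') {
        // Flag rather than terminate, so suspicious cases can be reviewed during data cleaning.
        Qualtrics.SurveyEngine.setEmbeddedData('honeypot_flag', '1');
      }
    });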
A second approach, more common in industry and crowdsourcing research, involves “gold-standard” honeypot tasks: visible items
with known correct responses, embedded sparingly throughout a task to estimate worker reliability over time and to catch cases
where otherwise qualified participants respond inattentively or with low effort. I am happy to share sample scripts and
implementation examples for both approaches (e.g., Kang and Tay, 2020).
(3) Image-Based Responses
Participants submit images in response to prompts, reducing response automation and survey monotony. Images can later be coded
using computer-vision tools. Bosch et al. (2019) provide a useful application of this approach.
(4) Suspicious ISP and Geolocation Identification
Platforms such as CloudResearch allow researchers to block suspicious geocodes, duplicate IPs, and known low-quality workers.
Researchers can also upload IP and GPS data for analysis and flagging.
(5) Seriousness Checks
Simple self-report items asking participants about their engagement can be effective for excluding non-serious respondents (Aust et
al., 2012).
(6) Commitment Requests
A large-scale experimental study (N ≈ 4,000) found that a brief commitment prompt reduced data-quality issues more effectively than
traditional attention checks. For example: “Do you commit to providing thoughtful answers to each question in this survey?” as
suggested by Geisen (2022).
(7) SurveyTainment
Researchers can embed small, non-intrusive entertainment elements (e.g., short games, visuals, or playful interruptions) into surveys
to refresh respondents’ attention, improve mood, and reduce careless or disengaged responding. Rather than distracting from
measurement, these elements are designed to support cognitive engagement and ultimately improve survey data quality (Kostyk et
al., 2019).
Data Sources
American Psychological Association Data Repositories
The APA curates links to a wide range of behavioral and social science datasets, including data related to health, mental health,
retirement, and well-being.
APA.ORG/DATA
Consumer Complaint Data (CFPB, FCC)
The Consumer Financial Protection Bureau (CFPB) and Federal Communications Commission (FCC) publish large-scale consumer
complaint data related to financial products and telecommunications services (e.g., billing, phone, internet, television).
CONSUMERCOMPLAINTS.FCC.GOV
Correlates of State Policy (MSU / IPPSR)
The Correlates of State Policy Project provides longitudinal political, social, and economic data across all 50 U.S. states (1900–present),
with more than 900 policy-related variables available in multiple formats.
IPPSR.MSU.EDU
CFPB Financial Well-Being
The CFPB Financial Well-Being Public Use File includes a validated 10-item financial well-being scale and extensive supporting
variables. Complementary data from the University of Michigan track consumer expectations related to personal finances and the
broader economy.
CONSUMERFINANCE.GOV
Consumer Finances (Federal Reserve)
The Survey of Consumer Finances (SCF) is a triennial, nationally representative survey of U.S. households covering income, assets,
debt, pensions, and demographic characteristics, widely used in academic and policy research.
FEDERALRESERVE.GOV
Data.gov
The U.S. government’s open data portal provides access to more than 200,000 datasets spanning education, health, finance, energy,
public safety, and other domains.
DATA.GOV
General Social Survey (GSS)
The GSS offers decades of nationally representative data on U.S. social attitudes, behaviors, and demographics, with tools for custom
dataset extraction and analysis.
GSS.NORC.ORG
Google Dataset Search
A dataset discovery engine that enables keyword-based searching across thousands of repositories, supporting data citation and
research transparency.
TOOLBOX.GOOGLE.COM/DATASETSEARCH
Inter-university Consortium for Political & Social Research (ICPSR)
ICPSR is a leading repository of social science data, offering curated datasets, extensive documentation, and training resources widely
used in peer-reviewed research.
ICPSR.UMICH.EDU
Nutrition & Health Data (FDA, CDC, USDA)
OpenFDA provides downloadable data related to food labeling, recalls, pharmaceuticals, and medical devices. Additional nutrition and
health datasets are available through NHANES (CDC) and food and nutrition assistance data from the USDA.
CDC.GOV/NCHS/NHANES
ERS.USDA.GOV
OPEN.FDA.GOV
Organisation for Economic Co-operation and Development (OECD)
OECD Data provides internationally comparable indicators related to consumer confidence, well-being, inequality, labor markets, and
economic performance across member countries.
DATA.OECD.ORG
Pew Research Center
Pew offers high-quality, publicly available datasets on U.S. and global attitudes related to politics, media, technology, religion, science,
and social trends.
PEWRESEARCH.ORG
Data Collection & Sources