Data quality
Detect suspicious responses, validate survey logic before publishing, and generate test data for QA. These tools help you maintain clean data and catch instrument issues early.
Speeders
The detect_speeders tool finds responses completed faster than a threshold you set. Unrealistically fast completions typically indicate bots or respondents who clicked through without reading.
Via the agent
"Find responses completed in under 60 seconds"
"Check for speeders with a 2-minute threshold"
The agent calls detect_speeders with the threshold in seconds. It scans completed, non-excluded, non-preview responses.
Response shape
{
"threshold_seconds": 60,
"total_complete_responses": 210,
"speeders_found": 8,
"speeders": [
{
"response_id": "resp_abc123",
"time_seconds": 12.3
},
{
"response_id": "resp_def456",
"time_seconds": 28.7
},
{
"response_id": "resp_ghi789",
"time_seconds": 45.1
}
]
}
Parameters
| Parameter | Default | Description |
|---|---|---|
| threshold_seconds | 60 | Flag responses faster than this (minimum 5 seconds) |
Results are capped at 50 speeders in the response and ordered by completion time (fastest first).
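The selection rules above can be sketched in a few lines of Python. This is an illustration only, not the tool's implementation: the input record fields (id, status, excluded, preview, time_seconds) are assumed names, and the sketch assumes speeders_found counts matches before the 50-item cap, which the text does not state.

```python
from typing import Any

MAX_SPEEDERS = 50          # cap on returned speeders, per the text above
MIN_THRESHOLD_SECONDS = 5  # documented minimum for threshold_seconds


def find_speeders(responses: list[dict[str, Any]],
                  threshold_seconds: float = 60) -> dict[str, Any]:
    """Flag eligible responses that finished faster than the threshold.

    The field names on each input record are hypothetical.
    """
    if threshold_seconds < MIN_THRESHOLD_SECONDS:
        raise ValueError("threshold_seconds must be at least 5")

    # Only completed, non-excluded, non-preview responses are scanned.
    eligible = [
        r for r in responses
        if r["status"] == "complete" and not r["excluded"] and not r["preview"]
    ]
    # Fastest completions first.
    fast = sorted(
        (r for r in eligible if r["time_seconds"] < threshold_seconds),
        key=lambda r: r["time_seconds"],
    )
    return {
        "threshold_seconds": threshold_seconds,
        "total_complete_responses": len(eligible),
        "speeders_found": len(fast),  # assumption: counted before the cap
        "speeders": [
            {"response_id": r["id"], "time_seconds": r["time_seconds"]}
            for r in fast[:MAX_SPEEDERS]
        ],
    }
```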
Choosing a threshold
The right threshold depends on survey length. A 5-question screener might legitimately take 30 seconds; a 40-question study should take several minutes. Use the median completion time from the summary dashboard as a baseline and set the threshold at roughly one-third of the median.
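As a worked example of the one-third rule, the following minimal sketch turns a median completion time into a threshold. The function name is hypothetical; the 5-second floor comes from the documented minimum for threshold_seconds.

```python
MIN_THRESHOLD_SECONDS = 5  # documented minimum for threshold_seconds


def threshold_from_median(median_seconds: float) -> int:
    """Return roughly one-third of the median, never below the minimum."""
    return max(MIN_THRESHOLD_SECONDS, round(median_seconds / 3))


# A survey with a 5-minute median suggests a 100-second threshold.
print(threshold_from_median(300))
```

For very short surveys the floor takes over: a 9-second median would give 3 seconds, which is raised to the 5-second minimum.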
Straightliners
The detect_straightliners tool finds respondents who gave the same answer to every row in grid-type questions. Straightlining is one of the most common indicators of inattentive responding.
Via the agent
"Check for straightlining in my survey"
"Are any respondents giving the same answer on every grid row?"
The agent calls detect_straightliners. It requires at least 2 grid questions in the survey to run (configurable via min_grid_questions).
Response shape
{
"grid_questions_checked": ["Q5", "Q8", "Q12"],
"total_responses_scanned": 210,
"straightliners_found": 5,
"straightliners": [
{
"response_id": "resp_abc123",
"grids_straightlined": 3,
"total_grids": 3
},
{
"response_id": "resp_def456",
"grids_straightlined": 2,
"total_grids": 3
}
]
}
Parameters
| Parameter | Default | Description |
|---|---|---|
| min_grid_questions | 2 | Minimum grid questions needed for detection (1-N) |
The tool scans up to 1,000 completed, non-excluded responses. Results are capped at 50 straightliners.
A respondent is flagged only if they straightline across at least min_grid_questions grids, because a single grid with identical answers could be a legitimate response pattern.
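The flagging rule can be illustrated with a short Python sketch. It is not the tool's implementation: the input layout (response ID mapped to grid label mapped to row answers) is assumed for the example, and treating a one-row grid as not straightlined is also an assumption.

```python
def is_straightlined(rows: dict[str, str]) -> bool:
    """True if every row of one grid received the same answer.

    Assumption: a grid with a single row cannot be straightlined.
    """
    values = list(rows.values())
    return len(values) > 1 and len(set(values)) == 1


def find_straightliners(responses: dict[str, dict[str, dict[str, str]]],
                        min_grid_questions: int = 2) -> list[dict]:
    """Flag respondents who straightline at least min_grid_questions grids."""
    flagged = []
    for response_id, grids in responses.items():
        count = sum(is_straightlined(rows) for rows in grids.values())
        if count >= min_grid_questions:
            flagged.append({
                "response_id": response_id,
                "grids_straightlined": count,
                "total_grids": len(grids),
            })
    return flagged
```

With the default of 2, a respondent who gives identical answers in only one of three grids is not flagged, while one who does so in two or three grids is.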
Response exclusion
After identifying bad data through speeders, straightliners, or manual review, exclude responses from analytics.
Single response
curl -X PATCH https://surveys.flashpoint.ai/api/v1/surveys/{survey_id}/responses/{response_id} \
-H "X-Service-Token: $TOKEN" \
-H "X-Team-ID: $TEAM_ID" \
-H "X-User-ID: $USER_ID" \
-H "Content-Type: application/json" \
-d '{
"excluded": true,
"reason": "Completed in 12 seconds -- suspected bot"
}'
Response:
{
"response_id": "resp_abc123",
"excluded": true,
"reason": "Completed in 12 seconds -- suspected bot"
}
Bulk exclusion
curl -X POST https://surveys.flashpoint.ai/api/v1/surveys/{survey_id}/responses/bulk \
-H "X-Service-Token: $TOKEN" \
-H "X-Team-ID: $TEAM_ID" \
-H "X-User-ID: $USER_ID" \
-H "Content-Type: application/json" \
-d '{
"action": "exclude",
"response_ids": ["resp_abc123", "resp_def456", "resp_ghi789"],
"reason": "Straightlining detected across all grid questions"
}'
Response:
{
"action": "exclude",
"affected": 3,
"response_ids": ["resp_abc123", "resp_def456", "resp_ghi789"]
}
Re-including responses
Set excluded: false on the single-response endpoint or use "action": "include" on the bulk endpoint. Excluded responses remain in the database and can always be restored.
Via the agent
"Exclude those 8 speeders"
The agent calls exclude_response for each flagged response. Because exclusion changes downstream analytics numbers, the agent presents an approval card before executing.
Typical quality workflow
- Run detect_speeders and detect_straightliners to identify suspects.
- Review flagged responses with get_response if needed.
- Exclude bad data with exclude_response (single) or the bulk endpoint.
- Re-run analytics; excluded responses are automatically filtered out.
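When reviewing suspects, it can help to merge both detection results into one list with the reason each response was flagged. The following is a minimal sketch that relies only on the response shapes shown earlier for detect_speeders and detect_straightliners; the function name and reason wording are illustrative.

```python
def collect_suspects(speeders_result: dict,
                     straightliners_result: dict) -> dict[str, list[str]]:
    """Map each flagged response ID to the reasons it was flagged."""
    reasons: dict[str, list[str]] = {}
    for s in speeders_result["speeders"]:
        reasons.setdefault(s["response_id"], []).append(
            f"completed in {s['time_seconds']}s"
        )
    for s in straightliners_result["straightliners"]:
        reasons.setdefault(s["response_id"], []).append(
            f"straightlined {s['grids_straightlined']}/{s['total_grids']} grids"
        )
    return reasons
```

A response that appears in both results ends up with two reasons, which makes it a stronger candidate for exclusion than one flagged by a single check.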
Test data
The generate_test_data tool creates random junk responses for QA purposes. This is useful for testing the response pipeline, verifying analytics charts render correctly, and validating export functionality.
Via the agent
"Generate 20 test responses for my survey"
"Fill this survey with fake data for a demo"
The agent calls generate_test_data with count=20. The tool has a strict guard: it requires an explicit user request for fake data and will refuse if the intent is ambiguous.
Response shape
{
"survey_id": "srv_...",
"generated": 20,
"note": "Test responses visible in analytics. Use exclude_response to clean up later."
}
Parameters
| Parameter | Default | Description |
|---|---|---|
| count | 20 | Number of fake responses to generate (1-200) |
What gets generated
The tool reads the survey document and generates type-appropriate random answers:
| Question type | Generated answer |
|---|---|
| select (single) | Random option |
| select (multi) | 1-3 random options |
| nps | Random 0-10 |
| number | Random within configured data range |
| text | Random sample phrase |
| grid | Random horizontal option per vertical row |
| ranking | Shuffled option order |
| van-westendorp | Four price points in valid order |
| maxdiff | Random best/worst per task |
| conjoint | Random profile selection per task |
Each response gets a random completion time between 30 seconds and 10 minutes.
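To make the idea of type-appropriate random answers concrete, here is a rough Python sketch covering a few of the question types in the table. It is not the tool's code: the question dictionary layout (type, multi, options, rows, columns) is assumed for illustration, and the remaining types are left out.

```python
import random


def fake_answer(question: dict, rng: random.Random):
    """Return a random answer matching the question type (hypothetical layout)."""
    qtype = question["type"]
    if qtype == "select":
        options = question["options"]
        if question.get("multi"):
            # 1-3 random options, limited by how many options exist.
            k = rng.randint(1, min(3, len(options)))
            return rng.sample(options, k)
        return rng.choice(options)
    if qtype == "nps":
        return rng.randint(0, 10)  # inclusive on both ends
    if qtype == "ranking":
        # A shuffled copy of the full option list.
        return rng.sample(question["options"], len(question["options"]))
    if qtype == "grid":
        # One random column (horizontal option) per row.
        return {row: rng.choice(question["columns"]) for row in question["rows"]}
    raise ValueError(f"unsupported question type in this sketch: {qtype}")


def fake_completion_seconds(rng: random.Random) -> float:
    """Random completion time between 30 seconds and 10 minutes."""
    return rng.uniform(30, 600)
```

Passing a seeded random.Random instance keeps a generated batch reproducible, which is convenient when a QA run needs to be repeated.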
Cleanup
Test responses are visible in analytics immediately. To clean them up:
- Use exclude_response on individual test responses.
- Use the bulk exclusion endpoint to exclude all at once.
- Or delete the survey and start fresh.
Important: test data vs. synthetic panels
generate_test_data creates random noise for QA. It is not the same as synthetic panel responses, which use AI personas with realistic demographics and personality-driven answers. If you want a synthetic panel to take the survey, use the synthetic panel flow instead (see Synthetic panels).
Pre-publish validation
The spellcheck endpoint validates a survey's DSL logic before publishing. It catches broken references, invalid expressions, and unreachable questions. (The surveys agent calls this internally under the tool name surveycheck.)
REST endpoint
curl https://surveys.flashpoint.ai/api/v1/surveys/{survey_id}/spellcheck \
-H "X-Service-Token: $TOKEN" \
-H "X-Team-ID: $TEAM_ID" \
-H "X-User-ID: $USER_ID"
Response shape
{
"unknown_identifiers": [
{
"question": "Q5",
"location": "skip_condition",
"dsl": "Q99 == `1`",
"unknown": "Q99",
"correction": "Q9"
}
],
"invalid_dsls": [
{
"question": "Q8",
"location": "display_condition",
"dsl": "Q3 ==== `1`"
}
],
"unreachable_questions": [
{
"label": "Q12",
"reason": "All paths skip past this question"
}
],
"post_logic_identifiers": [
{
"question": "Q15",
"unknown": "Q99"
}
]
}
Check categories
| Category | What it catches |
|---|---|
| unknown_identifiers | Labels in skip/display logic that do not match any question (with suggested corrections) |
| invalid_dsls | Skip or display conditions that fail to parse |
| unreachable_questions | Questions that can never be shown due to logic conflicts |
| post_logic_identifiers | Variable definition references to non-existent labels |
Via the agent
"Check my survey for logic errors before publishing"
The agent calls surveycheck after significant edits and before recommending publish. An empty response (all arrays empty) means the survey is clean.
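A script that gates publishing on this check could test for the clean case as follows. This is a small sketch based only on the four array names in the response shape above; the helper names are hypothetical.

```python
CATEGORIES = (
    "unknown_identifiers",
    "invalid_dsls",
    "unreachable_questions",
    "post_logic_identifiers",
)


def is_clean(report: dict) -> bool:
    """True when every check category is empty or absent."""
    return all(not report.get(category) for category in CATEGORIES)


def issue_counts(report: dict) -> dict[str, int]:
    """Count issues per category, omitting categories with none."""
    return {
        category: len(report[category])
        for category in CATEGORIES
        if report.get(category)
    }
```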
DSL validation
To validate an individual expression before applying it, use the agent's validate_dsl tool:
# Validate via the agent's validate_dsl tool
# Returns: {"valid": true, "identifiers": ["Q1", "Q3"]}
# Or: {"valid": false, "error": "Unexpected token at position 5", "identifiers": []}
The agent uses this proactively when building complex skip logic to catch syntax errors before they reach the survey document.
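One way to act on a validate_dsl result is sketched below. It uses only the two result shapes shown above; comparing the returned identifiers against a set of known question labels is an assumption about how a caller might use them, and the function name is hypothetical.

```python
def check_expression(result: dict, known_labels: set[str]) -> list[str]:
    """Return a list of problems for one validated expression (empty if none)."""
    if not result["valid"]:
        return [f"syntax error: {result['error']}"]
    # Assumption: identifiers are question labels that should exist in the survey.
    missing = sorted(set(result["identifiers"]) - known_labels)
    return [f"unknown label: {label}" for label in missing]
```

An empty list means the expression parsed and every identifier it references is a known label.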
Next steps
- Analyze clean data: Analyze.
- AI-powered insights: AI insights.
- Distribute to more respondents: Distribute.