Creating datasets for Trickest Solutions is available exclusively to Enterprise users. If you are interested in learning more about the Enterprise Edition, please contact us.

Overview

Datasets are the structured storage layer for your Solution results. They define the schema that determines how data is organized, queried, and displayed in Insights. A well-designed dataset schema is critical for effective data analysis and change tracking.
Building a Complete Solution? If you’re creating a custom solution from scratch, see the Custom Solutions guide for an end-to-end tutorial that covers solution creation, datasets, and workflow building.

Understanding Dataset Fields

Each dataset field (key) consists of several components:
Component   | Purpose                                                | Options
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Default     | Marks primary/key fields that identify unique records | Toggle on/off
Icon        | Visual identifier for the field type                  | Flag, text, database, chart, etc.
Key Name    | Field identifier used in queries and exports          | Lowercase with underscores (e.g., hostname, vulnerability_id)
Type        | Data type for validation and formatting               | text, int, data, bool, uuid, datetime
Description | Human-readable explanation of the field               | Helps users understand field purpose

Field Types

text      - String values (URLs, hostnames, descriptions)
int       - Integer numbers (ports, counts, severity scores)
data      - Binary or large text data (raw responses, files)
bool      - True/false values (is_active, has_vulnerability)
uuid      - Unique identifiers (record IDs, correlation IDs)
datetime  - Timestamps (discovered_at, last_seen, scanned_at)
Default fields serve as the primary key for your dataset. Mark fields that uniquely identify records (e.g., endpoint_url + http_method for APIs, or hostname for assets). At least one field must be marked as default.

Creating a Dataset

1

Navigate to Insights

Open your Solution and go to the Insights tab.
You can create a dataset immediately after creating the solution.
2

Click 'Create Dataset'

If this is your first dataset, you’ll see an empty state with a Create Dataset button.
3

Name the Dataset

Choose a descriptive name that reflects the data being stored (e.g., “API Endpoints”, “Discovered Assets”, “Vulnerabilities”).
4

Define Fields

Click Add key to add fields one by one. Configure each field’s icon, name, type, and description.
5

Mark Default Fields

Toggle Default on for fields that form the primary key. At least one field must be marked as default.
6

Validate and Create

Ensure your schema is valid (no duplicate key names, at least one default field). Click Create Dataset.
Schema Validation: The system validates your schema before creation. Common errors include:
  • Missing default fields
  • Duplicate key names
  • Invalid characters in key names (use lowercase, numbers, underscores only)
  • Missing required fields (icon, type, description)

Example: API Endpoints Dataset

Here’s a complete example of a dataset schema for tracking API endpoints:
Default | Icon | Key Name          | Type     | Description
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
   ✓    |  🔗  | url               | text     | Full endpoint URL
   ✓    |  🔨  | method            | text     | HTTP method (GET, POST, etc.)
   ✓    |  📖  | body_parameters   | data     | Request body parameters
        |  📝  | api_title         | text     | API title from documentation
        |  📍  | source            | text     | Discovery source URL
        |  📄  | content_type      | text     | Request payload format
        |  🌐  | hostname          | text     | URL hostname
        |  🏠  | domain_name       | text     | Registered domain
Key Design Choices:
  • Primary Key: url + method + body_parameters uniquely identify each endpoint variant
  • Metadata Fields: api_title, source, content_type provide context
  • Hierarchical Data: hostname and domain_name enable domain-level filtering
  • Data Type: body_parameters uses data type to store complex structures

Creating Datasets via API

You can create datasets programmatically using the Trickest API.

Getting Your Vault UUID

First, retrieve your vault UUID by calling the user info endpoint:
curl -X GET https://api.trickest.io/hive/v1/users/me/ \
  -H "Authorization: Token YOUR_API_TOKEN"
The response includes vault_info.id, which is your {vault_uuid}:
{
  "profile": {
    "vault_info": {
      "id": "b12a3fda-1161-4bd2-9549-d6bda39d59b0",
      "name": "your-vault-name",
      ...
    }
  }
}
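
If you have jq installed, you can pull the vault UUID out of that response directly. This is a convenience sketch, not a required step:
# Print only the vault UUID from the /users/me/ response (requires jq)
curl -s -X GET https://api.trickest.io/hive/v1/users/me/ \
  -H "Authorization: Token YOUR_API_TOKEN" \
  | jq -r '.profile.vault_info.id'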

API Endpoint

POST https://api.trickest.io/solutions/v1/{vault_uuid}/dataset
Replace {vault_uuid} with your vault UUID from the /users/me/ endpoint and {solution_id} with your Solution ID (found in the URL when viewing your solution).

Request Example

curl -X POST https://api.trickest.io/solutions/v1/{vault_uuid}/dataset \
  -H "Content-Type: application/json" \
  -H "Authorization: Token YOUR_API_TOKEN" \
  -d '{
    "solution": "{solution_id}",
    "name": "API Endpoints",
    "schema": {
      "fields": [
        {
          "name": "url",
          "description": "Full endpoint URL",
          "is_key": true,
          "icon": "square-arrow-out-up-right",
          "type": "text"
        },
        {
          "name": "api_title",
          "description": "The title of the API according to the documentation if available",
          "is_key": false,
          "icon": "letter-text",
          "type": "text"
        },
        {
          "name": "source",
          "description": "The source where the endpoint was discovered",
          "is_key": false,
          "icon": "anchor",
          "type": "text"
        },
        {
          "name": "method",
          "description": "The HTTP method used to call the endpoint",
          "is_key": true,
          "icon": "axe",
          "type": "text"
        },
        {
          "name": "body_parameters",
          "description": "Request body parameters associated with the endpoint",
          "is_key": true,
          "icon": "book-open",
          "type": "data"
        },
        {
          "name": "content_type",
          "description": "The format of the request payload",
          "is_key": false,
          "icon": "code-xml",
          "type": "text"
        },
        {
          "name": "hostname",
          "description": "The hostname where the URL is located",
          "is_key": false,
          "icon": "arrow-big-right-dash",
          "type": "text"
        },
        {
          "name": "domain_name",
          "description": "The registered domain associated with the hostname",
          "is_key": false,
          "icon": "arrow-big-right",
          "type": "text"
        }
      ]
    }
  }'

Field Properties

Property    | Required | Description
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
name        | Yes      | Field identifier (lowercase with underscores)
description | Yes      | Human-readable explanation
is_key      | Yes      | true for primary key fields, false otherwise
icon        | Yes      | Icon identifier for visual representation
type        | Yes      | Data type: text, int, data, bool, uuid, or datetime
At least one field must have is_key: true to serve as the primary key. Primary key fields uniquely identify records and enable change tracking.
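Putting these properties together, a single field entry in the schema looks like the following (a minimal sketch; the icon name is illustrative):
{
  "name": "discovered_at",
  "description": "Timestamp of when the record was first collected",
  "is_key": false,
  "icon": "calendar",
  "type": "datetime"
}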

Dataset Design Best Practices

Use Semantic Naming

Name fields clearly: hostname not h, vulnerability_severity not sev. This improves query readability and team understanding.

Include Timestamps

Always add discovered_at, last_seen, or scanned_at fields to track when data was collected. Essential for change detection.

Add Status Fields

Include a _status field (text) to track the record lifecycle: new, active, removed, resurfaced. Enables powerful filtering in Insights.

Plan for Correlation

Add correlation fields (IDs, hostnames, IPs) that allow joining with other datasets or external systems.
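The schema fragment below combines these practices in one place. It is a hedged sketch: the field and icon names are examples rather than requirements.
"fields": [
  {"name": "hostname",      "description": "Asset hostname, also used to correlate with other datasets", "is_key": true,  "icon": "globe",    "type": "text"},
  {"name": "discovered_at", "description": "When the record was first collected",                        "is_key": false, "icon": "calendar", "type": "datetime"},
  {"name": "last_seen",     "description": "Most recent time the record was observed",                   "is_key": false, "icon": "clock",    "type": "datetime"},
  {"name": "_status",       "description": "Record lifecycle: new, active, removed, or resurfaced",      "is_key": false, "icon": "flag",     "type": "text"}
]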

Common Schema Patterns

Web Assets:
url (text, key) + hostname (text) + ip_address (text) + status_code (int)
Vulnerabilities:
cve_id (text, key) + affected_asset (text, key) + severity (text) + cvss_score (int)
Subdomains:
subdomain (text, key) + ip_addresses (data) + discovered_at (datetime) + _status (text)
Network Services:
ip_address (text, key) + port (int, key) + service_name (text) + banner (data)
API Endpoints:
url (text, key) + method (text, key) + body_parameters (data, key) + content_type (text)
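As a concrete illustration, the Network Services pattern could be created through the API roughly as follows. This is a sketch; the icon names and descriptions are placeholders to adjust for your own data:
curl -X POST https://api.trickest.io/solutions/v1/{vault_uuid}/dataset \
  -H "Content-Type: application/json" \
  -H "Authorization: Token YOUR_API_TOKEN" \
  -d '{
    "solution": "{solution_id}",
    "name": "Network Services",
    "schema": {
      "fields": [
        {"name": "ip_address",   "description": "IP address hosting the service", "is_key": true,  "icon": "globe",    "type": "text"},
        {"name": "port",         "description": "Port the service listens on",    "is_key": true,  "icon": "hash",     "type": "int"},
        {"name": "service_name", "description": "Detected service name",          "is_key": false, "icon": "server",   "type": "text"},
        {"name": "banner",       "description": "Raw service banner",             "is_key": false, "icon": "database", "type": "data"}
      ]
    }
  }'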

Data Quality Best Practices

  • Validation: Add validation nodes in your workflow to check data format before dataset insertion
  • Deduplication: Remove duplicates at every merge point to keep datasets clean
  • Normalization: Standardize formats (lowercase domains, trim whitespace, consistent date formats); see the sketch after this list
  • Error Handling: Log errors to separate outputs for debugging without breaking the workflow
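A minimal shell sketch of the deduplication and normalization steps, assuming a newline-delimited hostname list as input:
# Lowercase, trim surrounding whitespace, and drop duplicate lines
# before the results are inserted into the dataset.
tr '[:upper:]' '[:lower:]' < raw_hostnames.txt \
  | sed 's/^[[:space:]]*//;s/[[:space:]]*$//' \
  | sort -u > clean_hostnames.txt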

Connecting Workflow Outputs to Datasets

After creating your dataset, you need to connect your workflow outputs to populate it with data.
1

Select Output Node

In the workflow Builder, identify which node produces the final results you want in your dataset.
2

Configure Dataset Connection

In the node settings, select your target dataset from the dropdown.
3

Map Fields

Ensure output field names match dataset key names exactly. If they don't match, use transformation scripts or the Transform Data module (see the sketch after these steps).
4

Test

Run the workflow and verify data appears correctly in the dataset with proper field mapping.
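For the field-mapping step, if a node emits JSON whose keys don't match the dataset schema, a small transformation can rename them before insertion. A minimal jq sketch, assuming the node emits one JSON object per line and uses hypothetical fields host and first_seen that should map to the dataset keys hostname and discovered_at:
# Rename mismatched output fields to the dataset key names (hypothetical field names)
jq -c '{hostname: .host, discovered_at: .first_seen}' node_output.jsonl > dataset_ready.jsonl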

Troubleshooting

Issue: Dataset creation fails with schema validation errors.
Common Errors:
  • Invalid field name: Use lowercase, numbers, underscores only
  • Duplicate key: Each field name must be unique
  • No default field: At least one field must be marked as default (or is_key: true in API)
  • Missing required fields: All fields need icon, type, and description
Solutions:
  • Rename fields to follow naming conventions: api_endpoint not API-Endpoint
  • Remove or rename duplicate keys
  • Toggle Default on for primary key fields
  • Fill in all field properties before creating
Issue: No data appears in the dataset after a workflow run.
Possible Causes:
  • Workflow run incomplete
  • Output nodes not connected to dataset
  • Field mapping mismatch between node output and dataset schema
  • Data filtered out by queries
Solutions:
  • Verify run completed successfully in Run tab
  • Check node-to-dataset connections in Builder
  • Review field names match exactly (case-sensitive)
  • Remove filters to see all data
Issue: Data doesn't display correctly in Insights.
Cause: Output data type doesn't match dataset field type.
Solutions:
  • Use transformation scripts to convert types (string to int, date parsing)
  • Update dataset schema to match actual data types
  • Add validation nodes to catch type errors before dataset insertion

Next Steps
