Off-platform lift · DataCatalog cross-listing · v1.0.0
Hugging Face Datasets submission
The 159-row Indie SaaS Teardowns dataset ships at /dataset with full Dataset JSON-LD, citation, BibTeX, and per-table CSVs. This page is the canonical handoff surface for mirroring it to Hugging Face Datasets – one of the recognised catalogs Google Dataset Search ranks favourably and a second acquisition surface for the same corpus.
Five-step submission
- Create the HF dataset repo. huggingface.co/new-dataset. Suggested owner+name: unlocksaas/indie-saas-teardowns. Visibility: Public. License: cc-by-4.0.
- Download the canonical dataset card.
  curl -OJ https://unlocksaas.com/dataset/huggingface/raw
  The response sets Content-Disposition to filename="README.md". The -J flag tells curl to use that header for the local filename (with -O alone the file would be saved as raw), so the file lands ready to upload as-is.
- Upload the CSVs to the repo root. Five per-table CSVs, downloadable from /dataset:
  - Funnel teardowns (33 rows, 27 columns) → funnel-teardowns.csv
  - Pricing teardowns (31 rows, 29 columns) → pricing-teardowns.csv
  - Head-to-head comparisons (61 rows, 22 columns) → comparisons.csv
  - Named-competitor alternatives (21 rows, 17 columns) → alternatives.csv
  - Category buckets (13 rows, 8 columns) → categories.csv
  HF Datasets Server auto-converts each CSV to Parquet on push – no manual conversion step.
- Set the activation env var on Vercel.
  vercel env add NEXT_PUBLIC_UNLOCKSAAS_HUGGINGFACE_DATASET_URL production
  # Paste the HF repo URL when prompted, for example:
  # https://huggingface.co/datasets/unlocksaas/indie-saas-teardowns
  Repeat for the preview environment if you want the cross-listing visible on preview deploys too.
- Redeploy and verify. The Dataset JSON-LD on /dataset now declares the HF DataCatalog cross-listing automatically. Verify with the Google Rich Results Test against /dataset – the schema graph should show one includedInDataCatalog entry with name: "Hugging Face Datasets". Google Dataset Search re-ingests on the next crawl (typically 1–7 days).
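Before pushing the CSVs in the upload step, it can help to confirm that the downloaded files match the row and column counts listed above. The following is a minimal sketch, not part of the site's tooling: the helper names (csv_shape, check_shapes) are hypothetical, and it assumes the published row counts exclude the header row.

```python
import csv
import io

# Row and column counts as published in the upload step.
# Assumption: "rows" means data rows, not counting the header line.
EXPECTED_SHAPES = {
    "funnel-teardowns.csv": (33, 27),
    "pricing-teardowns.csv": (31, 29),
    "comparisons.csv": (61, 22),
    "alternatives.csv": (21, 17),
    "categories.csv": (13, 8),
}


def csv_shape(text: str) -> tuple[int, int]:
    """Return (data_rows, columns), treating the first row as the header."""
    # csv.reader copes with quoted fields that contain commas or newlines.
    rows = list(csv.reader(io.StringIO(text, newline="")))
    if not rows:
        return (0, 0)
    return (len(rows) - 1, len(rows[0]))


def check_shapes(files: dict[str, str]) -> list[str]:
    """Compare CSV contents (filename -> text) against EXPECTED_SHAPES."""
    problems = []
    for name, expected in EXPECTED_SHAPES.items():
        if name not in files:
            problems.append(f"{name}: missing")
            continue
        actual = csv_shape(files[name])
        if actual != expected:
            problems.append(f"{name}: expected {expected}, got {actual}")
    return problems


if __name__ == "__main__":
    from pathlib import Path

    # Reads every CSV in the current directory, for example the download folder.
    files = {p.name: p.read_text(encoding="utf-8") for p in Path(".").glob("*.csv")}
    for line in check_shapes(files) or ["all five CSVs match the published shapes"]:
        print(line)
```

The five expected row counts sum to 159, the figure quoted for the whole dataset, so a mismatch here usually points to a stale or truncated download.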
YAML frontmatter preview
This is the exact YAML block the HF Hub parses to populate its search facets, the auto-Parquet conversion, and the dataset-card metadata panel. It is identical to what /dataset/huggingface/raw serves between its --- delimiters.
---
pretty_name: Indie SaaS Teardowns Dataset
language:
  - en
license: "cc-by-4.0"
license_link: "https://creativecommons.org/licenses/by/4.0/"
size_categories:
  - n<1K
task_categories:
  - "text-classification"
  - "text-retrieval"
tags:
  - saas
  - "indie-hackers"
  - marketing
  - "funnel-analysis"
  - "pricing-analysis"
  - comparison
  - "russell-brunson"
  - "value-ladder"
  - editorial
  - "honest-claims"
source_datasets:
  - original
doi: "10.5281/zenodo.20315741"
configs:
  - config_name: funnel_teardowns
    data_files:
      - split: train
        path: "funnel-teardowns.csv"
  - config_name: pricing_teardowns
    data_files:
      - split: train
        path: "pricing-teardowns.csv"
  - config_name: comparisons
    data_files:
      - split: train
        path: comparisons.csv
  - config_name: alternatives
    data_files:
      - split: train
        path: alternatives.csv
  - config_name: categories
    data_files:
      - split: train
        path: categories.csv
---
Google Dataset Search verification
Google Dataset Search ingests the schema.org Dataset JSON-LD rendered on /dataset – there is no separate submission portal. Three things make a Dataset eligible for inclusion:
- The page is indexable and the canonical URL is stable. Confirmed – /dataset ships robots: index, follow and is listed in /sitemap.xml.
- The Dataset JSON-LD carries name, description, license, creator, distribution, dateModified, keywords, variableMeasured, and identifier. Confirmed – every field is sourced from the canonical dataset module and includes measurementTechnique describing the editorial method.
- Cross-catalog corroboration via includedInDataCatalog. Activates automatically once the HF env var lands. Until then the Dataset schema lists the canonical landing as its only sameAs – still eligible, but without the catalog-cross-listing rank lift.
Use the Google Dataset Search UI to query the canonical name once Google has crawled the updated schema. Typical end-to-end propagation after the HF env var is set: 24 hours for the schema, 1–7 days for Dataset Search re-ingestion.
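The eligibility checks above can also be run locally against the HTML of /dataset, without waiting for a crawl. The sketch below is an illustration under assumptions rather than the site's own validator: it uses only the Python standard library, the function names (find_dataset, audit) are hypothetical, and it expects the page HTML as a string that you have fetched yourself.

```python
import json
from html.parser import HTMLParser

# Fields the verification list says the Dataset JSON-LD carries.
REQUIRED_FIELDS = [
    "name", "description", "license", "creator", "distribution",
    "dateModified", "keywords", "variableMeasured", "identifier",
]


class _JsonLdExtractor(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buffer = None  # a list while inside a JSON-LD script, else None

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.blocks.append("".join(self._buffer))
            self._buffer = None


def _walk(node):
    """Yield every dict nested anywhere inside parsed JSON (covers @graph)."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from _walk(value)
    elif isinstance(node, list):
        for item in node:
            yield from _walk(item)


def find_dataset(html: str):
    """Return the first JSON-LD node typed Dataset, or None."""
    parser = _JsonLdExtractor()
    parser.feed(html)
    parser.close()
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the audit
        for node in _walk(data):
            node_type = node.get("@type")
            if node_type == "Dataset" or (
                isinstance(node_type, list) and "Dataset" in node_type
            ):
                return node
    return None


def audit(html: str) -> dict:
    """Report missing required fields and any declared catalog names."""
    dataset = find_dataset(html)
    if dataset is None:
        return {"found": False, "missing": list(REQUIRED_FIELDS), "catalogs": []}
    missing = [field for field in REQUIRED_FIELDS if field not in dataset]
    catalogs = dataset.get("includedInDataCatalog", [])
    if not isinstance(catalogs, list):
        catalogs = [catalogs]
    names = [c.get("name") for c in catalogs if isinstance(c, dict)]
    return {"found": True, "missing": missing, "catalogs": names}
```

Once the env var is set and the site redeployed, audit(html)["catalogs"] should contain "Hugging Face Datasets" and "missing" should be empty. This complements the Rich Results Test and does not replace it.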
Brunson Hard-Rule
The HF dataset card body and YAML frontmatter both derive from the same module that drives /dataset and its JSON-LD. The HF mirror cannot drift from the canonical site by construction – every row count, license string, citation, and column contract is read once at module load.
The HF cross-listing itself is operator-gated. The Dataset JSON-LD declares the catalog only when the env var resolves to a valid https:// URL. A missing or malformed value is silently skipped – the schema validator never sees a fabricated catalog claim.
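The gating rule described above can be sketched as follows. This is an illustration in Python, not the site's actual code: the function names (catalog_entry, dataset_json_ld) and the exact shape of the DataCatalog node are assumptions, and only the env var name, the https:// requirement, and the catalog name come from this page.

```python
import os
from urllib.parse import urlparse

ENV_VAR = "NEXT_PUBLIC_UNLOCKSAAS_HUGGINGFACE_DATASET_URL"


def catalog_entry(raw):
    """Return a DataCatalog node for a valid https:// URL, else None."""
    if not raw:
        return None  # missing or empty value: skip silently
    value = raw.strip()
    try:
        parsed = urlparse(value)
    except ValueError:
        return None  # malformed value, for example a broken IPv6 host
    if parsed.scheme != "https" or not parsed.netloc:
        return None  # wrong scheme or no host: skip silently
    # Assumption: the node carries the URL under "url"; the real shape may differ.
    return {"@type": "DataCatalog", "name": "Hugging Face Datasets", "url": value}


def dataset_json_ld(base, env=os.environ):
    """Copy the base Dataset node and add the catalog only when the gate passes."""
    node = dict(base)
    entry = catalog_entry(env.get(ENV_VAR))
    if entry is not None:
        node["includedInDataCatalog"] = entry
    return node
```

Because an invalid value returns None rather than raising, the schema simply omits includedInDataCatalog, which matches the silently-skipped behaviour described above.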
Author: Maryan, Founder, Unlock SaaS
Last verified: 2026-05-18
Next editorial review: 2026-08-16
Raw README.md: /dataset/huggingface/raw