Dataerai Provides Input to OSTP on “Accelerating the American Scientific Enterprise”
Dataerai proposes transforming taxpayer-funded research into standardized, AI-ready open data resources to drive scientific productivity. To achieve this, Dataerai recommends the following: making data sharing a deliverable for federal grantees, providing financial support for the costs of data curation and publication, and enforcing accountability through both penalties for non-compliance and merit-based rewards.
Background
Academic-industry data partnerships have historically driven American innovation. For example, freely available NOAA weather data powers the logistics and insurance industries, and the global sharing of the SARS-CoV-2 genome enabled COVID-19 vaccine development in record time. However, much taxpayer-funded academic research data remains inaccessible or unusable.
Current law and policy (from the 1999 law through the 2025 OSTP directive) position publicly funded research data as a catalyst for American innovation, but execution lags in practice: well-intentioned researchers are effectively disincentivized by administrative hurdles and by a lack of financial support for the liability and infrastructure that sharing requires.
Data behind locked doors (firewalls) can’t readily be accessed and used. As our competitors aggressively invest in AI and data, ensuring **AI-ready open data** from U.S. research is now a matter of national competitiveness and security, economic growth, and public benefit.
Analysis
Today, federal agencies require grant recipients to plan for data sharing, but compliance is not systematically checked. Many research projects finish with their data effectively locked away on a lab computer. This status quo means lost opportunities.
Recommendations
We propose a targeted approach to make federally funded research data publicly available and usable, with robust compliance. The strategy has four pillars:
1. Make Data Sharing a Deliverable: Require every federal grantee to deposit their project data in a suitable public repository and cite the dataset (with a DOI link) in their grant reports. Deliverables are not met until data is posted in a machine-readable, accessible form.
2. Support the Costs of Sharing: Allow grant funds or supplements to cover the work of curating and publishing data. Researchers get tools and modest resources (via platforms like Dataerai) to document, format, and securely share datasets without undue burden.
3. Hold Funded Projects Accountable: If researchers don’t share data as promised, agencies can and should withhold further funds or future awards. Compliance will be monitored, and non-compliance should have real consequences, just as it would for misused funds or safety violations.
4. Reward Good Data Stewards: Factor investigators’ data-sharing track record into grant evaluations. Those who consistently share high-quality data get a leg up; those who don’t may find it harder to win new grants. This creates a culture of compliance through positive incentives.
Conclusion
Dataerai’s cloud platform will be a key enabler, providing a one-stop solution for researchers to upload data, link to existing repositories, automatically check usability standards (e.g. metadata completeness and provenance), and publish data assets. At the same time, agencies can use the platform to track who has shared their data and who hasn’t, across all their projects. The platform makes enforcement efficient and facilitates the production of data records that are “AI-ready” (indexed and standardized) from the start.
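To make the idea of an automated usability check concrete, the following is a minimal sketch in Python of what a metadata-completeness and provenance check could look like. It is an illustration under stated assumptions, not a description of Dataerai's platform: the field names in `REQUIRED_FIELDS` and `PROVENANCE_FIELDS`, the `ReadinessReport` class, and the `check_record` function are all hypothetical, and a real checklist would come from the funding agency or the target repository.

```python
from dataclasses import dataclass, field

# Hypothetical field lists, chosen only for illustration. A real checklist
# would be defined by the agency or repository, not by this sketch.
REQUIRED_FIELDS = ("title", "creators", "license", "doi", "description")
PROVENANCE_FIELDS = ("grant_id", "collection_method")


@dataclass
class ReadinessReport:
    """Lists which metadata and provenance fields are missing or blank."""
    missing_metadata: list = field(default_factory=list)
    missing_provenance: list = field(default_factory=list)

    @property
    def complete(self) -> bool:
        # A record passes only when nothing is missing in either group.
        return not self.missing_metadata and not self.missing_provenance


def _is_blank(value) -> bool:
    """Treat None, whitespace-only strings, and empty containers as blank."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, tuple, dict)):
        return len(value) == 0
    return False


def check_record(record: dict) -> ReadinessReport:
    """Check one dataset record (a plain dict) against the assumed field lists."""
    provenance = record.get("provenance") or {}
    return ReadinessReport(
        missing_metadata=[f for f in REQUIRED_FIELDS if _is_blank(record.get(f))],
        missing_provenance=[f for f in PROVENANCE_FIELDS if _is_blank(provenance.get(f))],
    )


if __name__ == "__main__":
    # Made-up example record: the DOI is blank and one provenance field is absent.
    example = {
        "title": "Example dataset",
        "creators": ["A. Researcher"],
        "license": "CC0-1.0",
        "doi": "",
        "description": "Illustrative record only.",
        "provenance": {"grant_id": "EXAMPLE-0001"},
    }
    report = check_record(example)
    print(report.missing_metadata, report.missing_provenance, report.complete)
```

A check of this kind could run at upload time, so that researchers see what is missing before a deliverable is reported, and the same report could feed an agency-side view of which projects have complete records.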
Initiatives like the Genesis Mission have a foundational need for rich scientific data at an unprecedented scale. New tools and methodologies will be essential for building the datasets required to train the models that will deliver new scientific breakthroughs.