Datasets


Contexts (Environment & Climate)

Beijing Air Quality

This dataset contains air quality measurements in Beijing from 12 monitoring sites, with dates from 2013 to 2017. It contains pollutant measurements such as CO and O₃. The data was originally retrieved from the UCI Machine Learning Repository.

https://www.kaggle.com/datasets/sid321axn/beijing-multisite-airquality-data-set

ECOTOX Knowledgebase

The ECOTOX Knowledgebase provides single chemical environmental toxicity data for aquatic and terrestrial species. It includes data for over 12,000 chemicals and 13,000 species across hundreds of thousands of tests.

https://www.epa.gov/comptox-tools/ecotoxicology-ecotox-knowledgebase-resource-hub

Environmental Protection Agency (EPA)

The Environmental Protection Agency (EPA) provides context data related to environmental protection and public health. Key datasets cover areas such as air quality, water quality, and pollution monitoring, with geospatial information available. Freely available.

https://www.epa.gov/

European Environment Agency (EEA) Waterbase

The European Environment Agency’s Waterbase is an open database on the status and quality of Europe’s water. It contains data for rivers, lakes, groundwater, and coastal waters on metrics such as nutrients (nitrogen, phosphorus) across European monitoring stations. Freely available.

https://surl.gd/usolve

Food and Agriculture Organization of the United Nations (FAO)

FAO offers comprehensive context data on food and agriculture spanning over 245 countries and territories, from 1961 to the most recent year available; it includes date, longitude, and latitude. Some datasets may require free registration for access to specific tools or services.

https://www.fao.org/faostat/en/#data

Google Maps Platform

Google Maps Platform provides a context dataset for planet Earth (air quality, for example) with longitude and latitude and is regularly updated. Payment may be required.

https://mapsplatform.google.com/?utm_experiment=13102196

Information Platform for Chemical Monitoring (IPCheM)

IPCheM is the European Commission’s central access point for chemical monitoring data collected across Europe in air, water, soil, biota, and indoor environments. It supports chemical risk assessment and policy-making.

https://ipchem.jrc.ec.europa.eu

Mussel Watch

Mussel Watch is a biomonitoring program that tracks nearly 600 chemical contaminants, including heavy metals, chlorinated pesticides (like DDT), PAHs, and emerging contaminants. Data collected from 1986 to the present. Freely available.

coastalscience.noaa.gov

National Aquatic Resource Survey - Rivers and Streams

This dataset from the U.S. EPA provides condition assessments of over 1,000 rivers and streams in the lower 48 U.S. states using biological, chemical, and physical indicators. Part of the National Aquatic Resource Surveys.

https://catalog.data.gov/dataset/national-aquatic-resource-survey-rivers-and-streams-data

Open Meteo API

The Open-Meteo platform provides a comprehensive Historical Weather API that offers access to a vast amount of meteorological data. Key variables include temperature, relative humidity, cloud cover, and wind speed.

https://tinyurl.com/322bve8j
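
As a rough illustration of how a Historical Weather API request for the variables listed above might be assembled, the sketch below only builds the query URL and does not send it. The endpoint, the parameter names, and the variable names (temperature_2m, relative_humidity_2m, cloud_cover, wind_speed_10m) are assumptions based on the public Open-Meteo documentation and should be checked there; the coordinates and dates are arbitrary examples, and build_archive_url is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed endpoint for the Open-Meteo Historical Weather API; verify it
# against the official documentation before relying on it.
ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"


def build_archive_url(latitude, longitude, start_date, end_date, hourly):
    """Return a request URL for hourly historical weather at one location."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "start_date": start_date,  # ISO date string, e.g. "2022-05-12"
        "end_date": end_date,
        "hourly": ",".join(hourly),  # comma-separated variable names
    }
    return f"{ARCHIVE_URL}?{urlencode(params)}"


# Example coordinates and dates are arbitrary; variable names are assumed.
url = build_archive_url(
    39.9,
    116.4,
    "2022-05-12",
    "2022-05-13",
    ["temperature_2m", "relative_humidity_2m", "cloud_cover", "wind_speed_10m"],
)
print(url)
```

The resulting URL can be passed to any HTTP client; the response format is not shown here because it is not described in this listing.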

SatBird: Bird Species Distribution Modeling

SatBird is a dataset for modeling bird species distributions using satellite imagery and citizen science data (from eBird). It includes more than 2 million observations and multispectral data for 214 bird species across the U.S.

https://neurips.cc/virtual/2023/poster/73571

Toxics Release Inventory (TRI)

The Toxics Release Inventory (TRI) tracks the management of certain toxic chemicals that may pose a threat to human health and the environment. It includes annual data reported by U.S. facilities since 1987.

https://catalog.data.gov/dataset/toxics-release-inventory-tri

U.S. Water Quality Portal (WQP)

A large repository integrating water quality monitoring data. The WQP provides access to millions of records on parameters like pH, dissolved oxygen, temperature, salinity, nutrients, and contaminant levels across the United States from 1950 to the present and is continually updated. Freely available.

https://www.usgs.gov/media/images/water-quality-portal-new

USA Air Pollution

This context dataset contains around 650 thousand records of air quality in the United States from 2000 to 2023, with dates and country-level locations, based on pollutants such as carbon monoxide (CO), ozone (O₃), and others.

https://www.kaggle.com/datasets/guslovesmath/us-pollution-data-200-to-2022

United States Large-Scale Solar Photovoltaic Database (USPVDB)

USPVDB provides context datasets on the performance of large-scale solar photovoltaic systems across the U.S. with dates, longitude, and latitude. Requires a special request per dataset for access.

https://www.nrel.gov/pv/us-pv-database.html

planet.com

Planet.com provides daily satellite Earth data analytics from 200 satellites. It provides longitude, latitude, dates, and measurements for different planetary variables (soil water, land surface temperature, forest carbon) and is updated daily. Payment required.

https://www.planet.com/

Contexts (Misc Data & APIs)

Here Maps

Here Maps provides world data with longitudes, latitudes, and dates for real-time traffic, including flow, incidents, and congestion details. It is updated regularly. Requires an API key; it is free to get started, with usage-based pricing as needs grow.

https://www.here.com/

Awesome Spatial Datasets

This curated collection provides links to high-quality spatial datasets from urban planning, transportation, demography, and environmental monitoring. It’s a valuable resource for geospatial data scientists seeking open access datasets across domains.

https://www.spatialedge.co/p/awesome-datasets

CDC Foodborne Outbreak Data (BEAM)

The CDC’s BEAM Dashboard provides curated data on foodborne disease outbreaks across the United States. It includes temporal and spatial details of outbreaks, pathogen types, and affected demographics, supporting public health and epidemiological research.

https://www.cdc.gov/ncezid/dfwed/beam-dashboard.html

Gallup World Poll Public Datasets

Gallup World Poll offers cross-national survey data on public opinion, well-being, economics, and health. The public datasets support comparative research across more than 160 countries, making them valuable for global development and policy analysis.

https://www.gallup.com/analytics/318923/world-poll-public-datasets.aspx

Global Suicide Rates

This context dataset contains year- and country-level location data from 2000 to 2015 for the rates of suicide around the globe. The dataset was originally retrieved from the World Health Organization (WHO).

https://www.kaggle.com/datasets/mexwell/global-suicide-rates

Google Earth Engine API

Google Earth Engine provides a context dataset for planet Earth (climate and weather) with longitude and latitude that goes back as far as 1979 and is updated daily. Freely available; free registration may be required.

https://developers.google.com/earth-engine/reference/rest

KidSat: Satellite Imagery for Childhood Poverty

KidSat is a benchmark dataset for mapping childhood poverty using satellite imagery. It links high-resolution visual data with poverty labels across Africa and Latin America, enabling fair benchmarking for ML models.

https://arxiv.org/pdf/2407.05986

MOSAIKS: Machine Learning for Satellite Imagery

MOSAIKS provides preprocessed satellite imagery and feature embeddings designed for machine learning applications. It allows scalable learning with satellite data, especially for environmental, socio-economic, and infrastructure-related tasks across geographies.

https://www.mosaiks.org/

Meta on Humanitarian Data Exchange (HDX)

Meta’s HDX profile offers curated datasets that support humanitarian and development efforts globally. These datasets cover topics such as population movements, digital connectivity, and global crises, enabling data-driven decision-making in policy, health, and disaster response.

https://data.humdata.org/organization/meta

NASA Open Data Portal

The NASA Open Data Portal is a comprehensive resource that provides access to various event datasets and potential context datasets, such as oceanography datasets, with dates, longitudes, and latitudes. Freely available.

https://data.nasa.gov/browse?sortBy=newest&pageSize=20&page=1

Our World in Data

Our World in Data is an open-access data and research platform that explores the world’s largest problems through empirical evidence. It provides interactive charts, extensive datasets, and in-depth articles on topics such as global health, poverty, education, climate change, energy, and economic development.

https://ourworldindata.org/

USA COVID-19

Contains the COVID-19 events in the USA with date, cases, and deaths on a country-level location from 2020 to 2023.

https://github.com/nytimes/covid-19-data

Contexts (Population Data & Mobility)

Disaster Ninja

Disaster Ninja is a geospatial tool for disaster management that visualizes recent natural disasters, mapping gaps, and contributor activity. It integrates datasets like population density and OpenStreetMap to help humanitarian organizations prioritize and coordinate mapping efforts.

https://disaster.ninja/

Environmental Systems Research Institute (ESRI) Demographics

Esri offers a vast collection of datasets covering population count, income per capita, and more. It contains dates, longitude, and latitude, with dates going back 5 years and more. The update frequency varies from one dataset to another. It requires a subscription or purchase.

https://www.esri.com/en-us/arcgis/products/data/data-portfolio/demographics

Germany Population Data (Zensus 2022)

Population dataset from the German Federal Statistical Office (Destatis), based on the Zensus 2022. Offers detailed demographic distributions by region, gender, and age groups across Germany.

https://atlas.zensus2022.de/

Global Human Settlement Layer (GHSL)

GHSL provides open and free context data for assessing human presence on Earth, such as built-up surface data, population data, and temporal data with date, longitude, and latitude. Freely available.

https://human-settlement.emergency.copernicus.eu/

Spectus

The Spectus API provides context data for human mobility analytics with longitude and latitude and density measurements. It dates from 2019 and provides real-time data. Subscription payments are required.

https://docs.spectus.ai/

United Nations High Commissioner for Refugees (UNHCR)

UNHCR provides data for resettlement statistics, population statistics, and refugee statistics through their APIs.

https://www.unhcr.org/

Events

911 Emergency Calls

This events dataset contains the date, longitude, and latitude for 911 emergency calls from 2015 to 2016 for around 99 thousand records.

https://www.kaggle.com/datasets/sachinpatil1280/911-emergency-calls?select=911.csv

Armed Conflict Location and Event Dataset (ACLED)

ACLED provides multiple event datasets for armed conflict with dates, longitude, and latitude. It contains events from around the world, such as conflicts in Central Africa, Brazil, and Pakistan. Free registration required.

https://acleddata.com/data-export-tool/

Atlanta Police Department Crimes

The Atlanta Police Department (APD) open data portal provides crime events from 1997 to 2025 with timestamps and address-level locations. It contains historical crime data for the city of Atlanta. Freely available.

https://opendata.atlantapd.org/

Beijing Traffic

This dataset contains traffic speeds at 5-minute granularity for 3126 roadway segments in Beijing between 2022/05/12 and 2022/07/25.

https://github.com/deepkashiwa20/Urban_Concept_Drift/tree/main
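
To give a sense of scale, the sketch below works out how many 5-minute time steps the stated date range would contain and how many speed readings that implies across the 3126 segments. It assumes both end dates are inclusive and that there are no gaps in the series; neither is stated in the description above, so the result is an upper-bound estimate rather than the actual size of the files.

```python
from datetime import date

START = date(2022, 5, 12)
END = date(2022, 7, 25)
STEP_MINUTES = 5   # granularity stated in the description
SEGMENTS = 3126    # roadway segments stated in the description

# Assumption: both the first and the last day are fully covered.
days = (END - START).days + 1
steps_per_day = 24 * 60 // STEP_MINUTES
time_steps = days * steps_per_day
readings = time_steps * SEGMENTS

print(days, steps_per_day, time_steps, readings)
# prints: 75 288 21600 67521600
```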

Chicago Crimes

This events dataset contains the Chicago crimes from 2001 to the present, except for the past 7 days, with date, longitude, and latitude. Freely available.

https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-Present/ijzp-q8t2/about_data

Citi Bike Trips

This events dataset includes records of Citi Bike trips in New York City from 2013 to 2025, providing information on trip start and end times, as well as the corresponding longitude and latitude coordinates. Freely available.

https://citibikenyc.com/system-data

Correlates of War (COW)

The COW project provides event datasets for conflicts, wars, and more. It contains dates, longitude, and latitude. COW data spans several centuries. Freely available.

https://correlatesofwar.org/data-sets/

Crimes in Vancouver

Includes around 530 thousand individual crime events from 2003 to 2017 with time (minute, hour, day, month, year), latitude, and longitude. The data was originally retrieved from the City of Vancouver open data portal.

https://www.kaggle.com/datasets/wosaku/crime-in-vancouver

Global Animal Disease Information System

Supports access to global disease information in time, longitude, and latitude coordinates for high-impact animal diseases. Requires free registration.

https://empres-i.apps.fao.org/general

Global Database of Events, Language, and Tone (GDELT)

GDELT provides world data with longitudes, latitudes, and dates on events such as protests and violent attacks, with context data such as emotions expressed in the news. It has historical data dating back to 1979, with efforts underway to extend it further back to 1800. It is updated every 15 minutes. Freely available.

https://www.gdeltproject.org/

Global Terrorism Database (GTD)

GTD provides event datasets for terrorism with dates and city-level locations from 1970 to 2020, with over 200 thousand records. Requires free registration.

http://apps.start.umd.edu/gtd/

Google Health COVID-19 Open Data Repository

The Google Health COVID-19 Open Data Repository is a comprehensive collection of up-to-date COVID-19-related information.

https://health.google.com/covid-19/open-data/raw-data

Gun Violence Archive (GVA)

GVA provides event datasets for gun violence for the USA with dates, city-level locations, and addresses from 2013 to the present, and it is updated on a regular basis. Freely available.

https://www.gunviolencearchive.org/

Humanitarian Data Exchange (HDX)

HDX provides a wide range of humanitarian datasets with spatial and temporal information, including crisis data of varying historical depth. Most datasets are freely available, though some require access requests.

https://data.humdata.org/

Indonesia Volcanoes

This events dataset contains the date, longitude, and latitude from 1300 to 2021 for around 200 records of volcanic events in Indonesia, along with relevant information on impacts, such as the number of houses destroyed.

https://www.kaggle.com/datasets/corneliuskristianto/volcano-events-in-indonesia-13002021

Mass Mobilization in Autocracies Database (MMAD)

MMAD provides daily event data on protests with city-level locations from 2003 to 2012. Freely available.

https://mmadatabase.org/

Motor Vehicle Collisions - Crashes

Individual motor vehicle collision events in New York City with time, latitude, longitude, and marks (additional attributes attached to each event). Freely available.

https://data.cityofnewyork.us/Public-Safety/Motor-Vehicle-Collisions-Crashes/h9gi-nx95/about_data

NYC Complaints

This events dataset includes all crimes reported to the New York City Police Department (NYPD) by date, longitude, and latitude from 2016 to 2025. Freely available.

https://data.cityofnewyork.us/Public-Safety/NYPD-Complaint-Data-Current-Year-To-Date-/5uac-w243/about_data

New York City Shootings

This events dataset contains the date, longitude, and latitude for New York City shootings from 2006 to 2019 for around 22 thousand records, including relevant information about the incident, such as the shooter (age, gender, race). The data was originally retrieved from NYC OpenData.

https://www.kaggle.com/datasets/thaddeussegura/new-york-city-shooting-dataset

Northern California Earthquake Data Center (NCEDC)

NCEDC provides dates, longitude, latitude, and other relevant information about earthquake events in central and northern California. Some datasets may require special requests for access based on specific terms.

https://ncedc.org/

Social Conflict Analysis Database (SCAD)

SCAD provides event datasets for protests, riots, and other social conflicts with date, longitude, and latitude from 1990 to 2017 covering all of Africa, Mexico, Central America, and the Caribbean. Freely available.

https://www.strausscenter.org/ccaps-research-areas/social-conflict/database/

The Uppsala Conflict Data Program (UCDP)

UCDP provides multiple event datasets for armed conflict with dates, longitude, and latitude. It covers organized violence with a history of almost 40 years. Freely available.

https://ucdp.uu.se/encyclopedia

UK Biobank

The UK Biobank is a comprehensive biomedical research resource containing health information from 500,000 participants. Access requires a special request per dataset and is typically granted to approved researchers after an application process.

https://www.ukbiobank.ac.uk/

USA Gun Violence

This events dataset contains the date and country-level locations from 2013 to 2022 for around 472 thousand records for gun violence incidents and mass shootings in the USA. This data was originally retrieved from the Gun Violence Archive organization.

https://www.kaggle.com/datasets/emmanuelfwerr/gun-violence-incidents-in-the-usa

USA Population

This dataset contains county-level population totals for the USA from 2016 to 2021. The data was originally retrieved from the U.S. Environmental Protection Agency (EPA).

https://www.kaggle.com/datasets/mohamedmagdy11/usa-county-population-total-20162021

USA Traffic Congestion

This dataset contains events of traffic congestion with the context of weather severity. It contains the date, longitude, and latitude from 2016 to 2021 for around 3 million records of traffic congestion in the USA, along with relevant information such as the severity of the congestion.

https://www.kaggle.com/datasets/omosaad/events-dataset

United States Geological Survey (USGS)

The USGS provides information on natural hazard events that occur daily, such as earthquakes and volcanoes. Freely available.

https://www.usgs.gov/

World Airplane Crashes

This events dataset contains the date and country-level locations of the incident for around 5 thousand records from 1908 to 2020, along with relevant information such as casualties and the operator (whether private or something else).

https://www.kaggle.com/datasets/aiaiaidavid/airplane-crash-fatalities-since-1908-dv-03032020?select=Airplane_Crashes_and_Fatalities_Since_1908_DV_03032020.csv

World Earthquakes

This events dataset contains the date, longitude, and latitude from 1900 to 2023 for around 37 thousand records of earthquakes recorded worldwide, along with relevant information such as the depth of each earthquake. This dataset contains nulls in some features but not in longitude, latitude, or date.

https://www.kaggle.com/datasets/jahaidulislam/significant-earthquake-dataset-1900-2023

World Fireballs

This events dataset contains the date, longitude, and latitude from 1988 to 2017 for around 800 records for exceptionally bright fireballs that are spectacular enough to be seen over a very wide area in the sky. Only the brightest fireballs are recorded.

https://www.kaggle.com/datasets/nasa/fireballs

World Natural Disasters

This events dataset contains dates, longitudes, and latitudes from 1900 to 2021 for around 16 thousand records of various types of natural disasters in the world, such as droughts, volcanoes, and earthquakes, along with relevant information such as the number of people injured, total deaths, and damage costs.

https://www.kaggle.com/datasets/brsdincer/all-natural-disasters-19002021-eosdis

World Tsunamis

This events dataset contains the date, longitude, and latitude from 1900 to 2023 for around 2 thousand records of tsunamis around the world. It also includes relevant information such as the magnitude of each event and the impact of the tsunami, such as the number of deaths and injuries.

https://www.kaggle.com/datasets/harshalhonde/tsunami-events-dataset-1900-present

World Wildfires

This events and context dataset contains date, longitude, and latitude from 2000 to 2020, with measurements of brightness and whether a fire occurred. The data was originally retrieved from NASA's FIRMS (Fire Information for Resource Management System) on Earthdata.

https://www.kaggle.com/datasets/ransakaravihara/h2oai-wildfire-bushfire-challenge-dataset