Datasets
Contexts (Environment & Climate)

Beijing Air Quality
This dataset contains air quality data for Beijing from 12 monitoring sites, with dates from 2013 to 2017. It contains air quality measurements such as CO and O₃. The data was originally retrieved from the UCI Machine Learning Repository.
https://www.kaggle.com/datasets/sid321axn/beijing-multisite-airquality-data-set

ECOTOX Knowledgebase
The ECOTOX Knowledgebase provides single chemical environmental toxicity data for aquatic and terrestrial species. It includes data for over 12,000 chemicals and 13,000 species across hundreds of thousands of tests.
https://www.epa.gov/comptox-tools/ecotoxicology-ecotox-knowledgebase-resource-hub

Environmental Protection Agency (EPA)
The Environmental Protection Agency (EPA) provides context data related to environmental protection and public health. Key datasets cover air quality, water quality (with geospatial information), and pollution monitoring. Freely available.

European Environment Agency (EEA) Waterbase
The European Environment Agency’s Waterbase is an open database on the status and quality of Europe’s water. It contains data for rivers, lakes, groundwater, and coastal waters on metrics such as nutrients (nitrogen, phosphorus) across European monitoring stations. Freely available.

Food and Agriculture Organization of the United Nations (FAO)
FAO offers comprehensive context data on food and agriculture spanning over 245 countries and territories, from 1961 to the most recent year available; it includes date, longitude, and latitude. Some datasets may require free registration for access to specific tools or services.

Google Maps Platform
Google Maps Platform provides a context dataset for planet Earth (air quality, for example) with longitude and latitude, and it is regularly updated. Payment may be required.

Information Platform for Chemical Monitoring (IPCheM)
IPCheM is the European Commission’s central access point for chemical monitoring data collected across Europe in air, water, soil, biota, and indoor environments. Supports chemical risk assessment and policy-making.

Mussel Watch
Mussel Watch is a biomonitoring program that tracks nearly 600 chemical contaminants, including heavy metals, chlorinated pesticides (like DDT), PAHs, and emerging contaminants. Data collected from 1986 to the present. Freely available.

National Aquatic Resource Survey - Rivers and Streams
This dataset from the U.S. EPA provides condition assessments of over 1,000 rivers and streams in the lower 48 U.S. states using biological, chemical, and physical indicators. Part of the National Aquatic Resource Surveys.
https://catalog.data.gov/dataset/national-aquatic-resource-survey-rivers-and-streams-data

Open Meteo API
The Open-Meteo platform provides a comprehensive Historical Weather API that offers access to a vast amount of meteorological data. Key variables include temperature, relative humidity, cloud cover, and wind speed.
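As a rough illustration of how the Historical Weather API might be queried for the variables listed above, the sketch below only builds a request URL and does not call the network. The endpoint and the hourly variable names are assumptions based on the public Open-Meteo documentation and should be checked against the current docs; `build_archive_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Open-Meteo Historical Weather API (verify in the docs).
ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"

# Assumed names for the hourly variables mentioned in the description.
HOURLY_VARIABLES = [
    "temperature_2m",
    "relative_humidity_2m",
    "cloud_cover",
    "wind_speed_10m",
]


def build_archive_url(latitude, longitude, start_date, end_date):
    """Build a query URL for one location and an ISO date range (YYYY-MM-DD)."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "start_date": start_date,
        "end_date": end_date,
        "hourly": ",".join(HOURLY_VARIABLES),
    }
    # Keep commas unescaped so the variable list stays readable in the URL.
    return f"{ARCHIVE_URL}?{urlencode(params, safe=',')}"


# Example coordinates are illustrative only.
print(build_archive_url(39.9, 116.4, "2017-01-01", "2017-01-07"))
```

The resulting URL can be passed to any HTTP client; the response is expected to be JSON with one array per requested variable.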

SatBird: Bird Species Distribution Modeling
SatBird is a dataset for modeling bird species distributions using satellite imagery and citizen science data (from eBird). It includes more than 2 million observations and multispectral data for 214 bird species across the U.S.

Toxics Release Inventory (TRI)
The Toxics Release Inventory (TRI) tracks the management of certain toxic chemicals that may pose a threat to human health and the environment. It includes annual data reported by U.S. facilities since 1987.
https://catalog.data.gov/dataset/toxics-release-inventory-tri

U.S. Water Quality Portal (WQP)
A large repository integrating water quality monitoring data. The WQP provides access to millions of records on parameters like pH, dissolved oxygen, temperature, salinity, nutrients, and contaminant levels across the United States from 1950 to the present and is continually updated. Freely available.

USA Air Pollution
This context dataset contains around 650 thousand records of air quality in the United States from 2000 to 2023, with date and country-level location, based on pollutants such as carbon monoxide (CO), ozone (O₃), and others.
https://www.kaggle.com/datasets/guslovesmath/us-pollution-data-200-to-2022

United States Large-Scale Solar Photovoltaic Database (USPVDB)
USPVDB provides context datasets on the performance of large-scale solar photovoltaic systems across the U.S. with dates, longitude, and latitude. Requires a special request per dataset for access.

planet.com
Planet.com provides daily satellite Earth data analytics from 200 satellites. It provides longitude, latitude, dates, and measurements for different planetary variables (soil water, land surface temperature, forest carbon), and it is updated daily. Payment required.

Contexts (Misc Data & APIs)

Here Maps
Here Maps provides world data with longitudes, latitudes, and dates for real-time traffic, including flow, incidents, and congestion details. It is updated regularly. Requires an API key; a free tier is available, with usage-based pricing as needs grow.

Awesome Spatial Datasets
This curated collection provides links to high-quality spatial datasets from urban planning, transportation, demography, and environmental monitoring. It’s a valuable resource for geospatial data scientists seeking open access datasets across domains.

CDC Foodborne Outbreak Data (BEAM)
The CDC’s BEAM Dashboard provides curated data on foodborne disease outbreaks across the United States. It includes temporal and spatial details of outbreaks, pathogen types, and affected demographics, supporting public health and epidemiological research.

Gallup World Poll Public Datasets
Gallup World Poll offers cross-national survey data on public opinion, well-being, economics, and health. The public datasets support comparative research across more than 160 countries, making them valuable for global development and policy analysis.
https://www.gallup.com/analytics/318923/world-poll-public-datasets.aspx

Global Suicide Rates
This context dataset contains year- and country-level location data from 2000 to 2015 for the rates of suicide around the globe. The dataset was originally retrieved from the World Health Organization (WHO).
https://www.kaggle.com/datasets/mexwell/global-suicide-rates

Google Earth Engine API
Google Earth Engine provides a context dataset for planet Earth (climate and weather) with longitude and latitude that goes back as far as 1979 and is updated daily. Freely available; free registration may be required.

KidSat: Satellite Imagery for Childhood Poverty
KidSat is a benchmark dataset for mapping childhood poverty using satellite imagery. It links high-resolution visual data with poverty labels across Africa and Latin America, enabling fair benchmarking for ML models.

MOSAICs: Machine Learning for Satellite Imagery
MOSAICs provides preprocessed satellite imagery and feature embeddings designed for machine learning applications. It allows scalable learning with satellite data, especially for environmental, socio-economic, and infrastructure-related tasks across geographies.

Meta on Humanitarian Data Exchange (HDX)
Meta’s HDX profile offers curated datasets that support humanitarian and development efforts globally. These datasets cover topics such as population movements, digital connectivity, and global crises, enabling data-driven decision-making in policy, health, and disaster response.

NASA Open Data Portal
The NASA Open Data Portal is a comprehensive resource that provides access to various event datasets and potential context datasets, such as oceanography datasets, with dates, longitudes, and latitudes. Freely available.
https://data.nasa.gov/browse?sortBy=newest&pageSize=20&page=1

Our World in Data
Our World in Data is an open-access data and research platform that explores the world’s largest problems through empirical evidence. It provides interactive charts, extensive datasets, and in-depth articles on topics such as global health, poverty, education, climate change, energy, and economic development.

USA COVID-19
Contains COVID-19 events in the USA from 2020 to 2023, with date, cases, and deaths at a country-level location.

Contexts (Population Data & Mobility)

Disaster Ninja
Disaster Ninja is a geospatial tool for disaster management that visualizes recent natural disasters, mapping gaps, and contributor activity. It integrates datasets like population density and OpenStreetMap to help humanitarian organizations prioritize and coordinate mapping efforts.

Environmental Systems Research Institute (ESRI) Demographics
Esri offers a vast collection of datasets covering population count, income per capita, and more. It contains dates, longitude, and latitude, with dates going back 5 years or more. The update frequency varies from one dataset to another. It requires a subscription or purchase.
https://www.esri.com/en-us/arcgis/products/data/data-portfolio/demographics

Germany Population Data (Zensus 2022)
Population dataset from the German Federal Statistical Office (Destatis), based on the Zensus 2022. Offers detailed demographic distributions by region, gender, and age groups across Germany.

Global Human Settlement Layer (GHSL)
GHSL provides context data for assessing human presence on Earth, such as built-up surface data, population data, and temporal data with date, longitude, and latitude. Freely available.

Spectus
The Spectus API provides context data for human mobility analytics with longitude and latitude and density measurements. It dates from 2019 and provides real-time data. Subscription payments are required.

United Nations High Commissioner for Refugees (UNHCR)
UNHCR provides data for resettlement statistics, population statistics, and refugee statistics through their APIs.

Events

911 Emergency Calls
This events dataset contains the date, longitude, and latitude for around 99 thousand 911 emergency calls from 2015 to 2016.
https://www.kaggle.com/datasets/sachinpatil1280/911-emergency-calls?select=911.csv

Armed Conflict Location and Event Dataset (ACLED)
ACLED provides multiple event datasets on armed conflict, with dates, longitude, and latitude. It covers events from around the world, such as conflicts in Central Africa, Brazil, and Pakistan. Free registration required.

Atlanta Police Department Crimes
The Atlanta Police Department (APD) provides an open data portal for crime events from 1997 to 2025 with timestamps and address-level locations. It contains historical crime data for the city of Atlanta. Freely available.

Beijing Traffic
This dataset contains traffic speeds at 5-minute granularity for 3126 roadway segments in Beijing between 2022/05/12 and 2022/07/25.
https://github.com/deepkashiwa20/Urban_Concept_Drift/tree/main
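To get a feel for the size implied by the 5-minute granularity, the arithmetic can be sketched as follows. This assumes both end dates are included and that there are no gaps in the series, neither of which is stated in the description; `count_intervals` is a hypothetical helper.

```python
from datetime import date


def count_intervals(start: date, end: date, minutes: int = 5) -> int:
    """Number of fixed-length intervals between two dates, both days included."""
    days = (end - start).days + 1  # assumption: the end date is inclusive
    return days * (24 * 60 // minutes)


steps = count_intervals(date(2022, 5, 12), date(2022, 7, 25))
segments = 3126  # roadway segments, from the dataset description

print(steps)             # 21600 time steps (75 days x 288 per day)
print(steps * segments)  # 67521600 speed readings if the grid is complete
```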

Chicago Crimes
This events dataset contains the Chicago crimes from 2001 to the present, except for the past 7 days, with date, longitude, and latitude. Freely available.
https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-Present/ijzp-q8t2/about_data
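The portal is hosted on Socrata, so the records can likely be pulled through a SoQL query against the dataset identifier in the URL above (`ijzp-q8t2`). The sketch below only builds such a query URL; the resource path and the field names `date` and `latitude` are assumptions to verify on the portal, and `build_crimes_url` is a hypothetical helper. It also reflects the seven-day lag mentioned in the description by capping the date range.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Assumed Socrata resource endpoint derived from the dataset identifier.
RESOURCE_URL = "https://data.cityofchicago.org/resource/ijzp-q8t2.json"


def build_crimes_url(start: date, today: date, limit: int = 1000) -> str:
    """Build a SoQL query for geolocated events between start and today - 7 days."""
    latest = today - timedelta(days=7)  # the most recent 7 days are not published
    where = (
        f"date >= '{start.isoformat()}T00:00:00' "
        f"AND date < '{latest.isoformat()}T00:00:00' "
        "AND latitude IS NOT NULL"
    )
    params = {"$where": where, "$order": "date", "$limit": limit}
    return f"{RESOURCE_URL}?{urlencode(params)}"


print(build_crimes_url(date(2024, 1, 1), date(2024, 3, 15)))
```

Paging through larger ranges would need the `$offset` parameter as well, which is omitted here for brevity.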

City Bike Trips
This events dataset includes records of city bike trips from 2013 to 2025, providing information on trip start and end times, as well as the corresponding longitude and latitude coordinates. Freely available.

Correlates of War (COW)
The COW project provides event datasets for conflicts, wars, and more. It contains dates, longitude, and latitude. COW data spans several centuries. Freely available.

Crimes in Vancouver
Includes individual crime events with time (minute, hour, day, month, year), latitude, and longitude from 2003 to 2017, with 530 thousand records. The data was originally retrieved from the City of Vancouver open data portal.

Global Animal Disease Information System
Supports access to global disease information in time, longitude, and latitude coordinates for high-impact animal diseases. Requires free registration.

Global Database of Events, Language, and Tone (GDELT)
GDELT provides world data with longitudes, latitudes, and dates on events such as protests and violent attacks, with context data such as people's emotions drawn from the news. Its historical data dates back to 1971, with efforts under way to extend it back to 1800. It is updated every 15 minutes. Freely available.

Global Terrorism Database (GTD)
GTD provides event datasets on terrorism with dates and city-level locations from 1970 to 2020, with over 200 thousand records. Requires free registration.

Google Health COVID-19 Open Data Repository
The Google Health COVID-19 Open Data Repository is a comprehensive collection of up-to-date COVID-19-related information.

Gun Violence Archive (GVA)
GVA provides event datasets for gun violence for the USA with dates, city-level locations, and addresses from 2013 to the present, and it is updated on a regular basis. Freely available.

Humanitarian Data Exchange (HDX)
HDX provides a wide range of humanitarian datasets with spatial and temporal information, including crisis data of varying historical depth. Most datasets are freely available, though some require access requests.

Indonesia Volcanoes
This events dataset contains the date, longitude, and latitude from 1300 to 2021 for around 200 records of volcanic events in Indonesia, along with relevant information on impacts such as the number of houses destroyed.
https://www.kaggle.com/datasets/corneliuskristianto/volcano-events-in-indonesia-13002021

Mass Mobilization in Autocracies Database (MMAD)
MMAD provides event data on protests at city-level locations and daily resolution from 2003 to 2012. Freely available.

Motor Vehicle Collisions - Crashes
Individual motor vehicle collision events with time, latitude, longitude, and marks (additional attributes recorded for each event). Freely available.
https://data.cityofnewyork.us/Public-Safety/Motor-Vehicle-Collisions-Crashes/h9gi-nx95/about_data

NYC Complaints
This events dataset includes all crimes reported to the New York City Police Department (NYPD) by date, longitude, and latitude from 2016 to 2025. Freely available.

New York City Shootings
This events dataset contains the date, longitude, and latitude for New York City shootings from 2006 to 2019 for around 22 thousand records, including relevant information about the incident, such as the shooter (age, gender, race). The data was originally retrieved from NYC OpenData.
https://www.kaggle.com/datasets/thaddeussegura/new-york-city-shooting-dataset

Northern California Earthquake Data Center (NCEDC)
NCEDC provides dates, longitude, latitude, and other relevant information about the earthquake events in central and northern California. Some datasets may require special requests for access based on specific terms.

Social Conflict Analysis Database (SCAD)
SCAD provides event datasets for protests, riots, and other social conflicts with date, longitude, and latitude from 1990 to 2017 covering all of Africa, Mexico, Central America, and the Caribbean. Freely available.
https://www.strausscenter.org/ccaps-research-areas/social-conflict/database/

The Uppsala Conflict Data Program (UCDP)
UCDP provides multiple event datasets on armed conflict, with dates, longitude, and latitude. It covers organized violence with a history of almost 40 years. Freely available.

UK Biobank
The UK Biobank is a comprehensive biomedical research resource containing health information from 500,000 participants. Access requires a special request per dataset and is typically granted to approved researchers after an application process.

USA Gun Violence
This events dataset contains the date and country-level locations from 2013 to 2022 for around 472 thousand records for gun violence incidents and mass shootings in the USA. This data was originally retrieved from the Gun Violence Archive organization.
https://www.kaggle.com/datasets/emmanuelfwerr/gun-violence-incidents-in-the-usa

USA Population
The data was originally retrieved from the U.S. Environmental Protection Agency (EPA).
https://www.kaggle.com/datasets/mohamedmagdy11/usa-county-population-total-20162021

USA Traffic Congestion
This dataset contains events of traffic congestion with the context of weather severity; it contains the date, longitude, and latitude from 2016 to 2021 for around 3 million records of traffic congestion in the USA, among relevant information such as the severity of the congestion.

United States Geological Survey (USGS)
The USGS provides information on natural hazard events that occur daily, such as earthquakes and volcanoes. Freely available.

World Airplane Crashes
This events dataset contains the date and country-level location of each incident for around 5 thousand records from 1908 to 2020, along with relevant information such as casualties and the operator (private or otherwise).

World Earthquakes
This events dataset contains the date, longitude, and latitude from 1900 to 2023 for around 37 thousand recorded earthquakes worldwide, along with relevant information such as the depth of each earthquake. This dataset contains nulls in some features but not in longitude, latitude, or date.
https://www.kaggle.com/datasets/jahaidulislam/significant-earthquake-dataset-1900-2023
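The claim that date, longitude, and latitude are never null can be checked after download with a few lines of standard-library Python. This is a minimal sketch: the column names are hypothetical and must be adjusted to the actual CSV header, and the sample rows are made up purely to demonstrate the check.

```python
import csv
import io

# Hypothetical column names; replace with the names in the real CSV header.
REQUIRED = ("date", "latitude", "longitude")


def rows_missing_required(csv_file, required=REQUIRED):
    """Return line numbers of rows where any required column is empty."""
    reader = csv.DictReader(csv_file)
    bad = []
    # The header is line 1; this numbering assumes no multi-line quoted fields.
    for line_no, row in enumerate(reader, start=2):
        if any(not (row.get(col) or "").strip() for col in required):
            bad.append(line_no)
    return bad


# Made-up sample: the optional depth column is empty in one row, which is allowed.
sample = io.StringIO(
    "date,latitude,longitude,depth\n"
    "2001-01-01,10.0,20.0,15.0\n"
    "2002-02-02,-5.5,120.25,\n"
)
print(rows_missing_required(sample))  # [] because all required fields are present
```

For the real file, open it with `open(path, newline="")` and pass the file object in place of the in-memory sample.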

World Fireballs
This events dataset contains the date, longitude, and latitude from 1988 to 2017 for around 800 records for exceptionally bright fireballs that are spectacular enough to be seen over a very wide area in the sky. Only the brightest fireballs are recorded.

World Natural Disasters
This events dataset contains dates, longitudes, and latitudes from 1900 to 2021 for around 16 thousand records of various types of natural disasters in the world, such as droughts, volcanoes, and earthquakes, along with relevant information such as the number of people injured, total deaths, and damage costs.
https://www.kaggle.com/datasets/brsdincer/all-natural-disasters-19002021-eosdis

World Tsunamis
This events dataset contains the date, longitude, and latitude from 1900 to 2023 for around 2 thousand records of tsunamis around the world. It also includes relevant information such as the magnitude of each event and the impact of the tsunami, such as the number of deaths and injuries.
https://www.kaggle.com/datasets/harshalhonde/tsunami-events-dataset-1900-present

World Wildfires
This events and context dataset contains date, longitude, and latitude from 2000 to 2020, with brightness measurements and whether a fire happened or not. Data originally retrieved from NASA's FIRMS Earthdata.
https://www.kaggle.com/datasets/ransakaravihara/h2oai-wildfire-bushfire-challenge-dataset