TY - GEN
T1 - Aligning Geo-Tagged CLIP Representations and Satellite Imagery for Few-Shot Land Use Classification
AU - Jain, Pallavi
AU - Marcos, Diego
AU - Ienco, Dino
AU - Interdonato, Roberto
AU - Dhakal, Aayush
AU - Jacobs, Nathan
AU - Berchoux, Tristan
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - A major difference between ground-level and satellite imagery of landscapes lies in their semantic granularity: ground-level images tend to offer details on objects and human activities, while satellite images provide broader geographic context but, typically, with coarser semantics. This study aims to leverage this complementary information by integrating fine-grained insights from a ground-level view into the analysis of satellite image data. To achieve this integration, we propose to align a satellite image representation with co-located geo-tagged ground-level image CLIP representations. This method focuses on enriching satellite image visual features by leveraging the inherent visual characteristics found in ground-level images as a reference in a contrastive manner, without relying on additional textual information to guide the learning process. We evaluate the quality of the learned representations on the EuroSAT benchmark in various few-shot settings.
AB - A major difference between ground-level and satellite imagery of landscapes lies in their semantic granularity: ground-level images tend to offer details on objects and human activities, while satellite images provide broader geographic context but, typically, with coarser semantics. This study aims to leverage this complementary information by integrating fine-grained insights from a ground-level view into the analysis of satellite image data. To achieve this integration, we propose to align a satellite image representation with co-located geo-tagged ground-level image CLIP representations. This method focuses on enriching satellite image visual features by leveraging the inherent visual characteristics found in ground-level images as a reference in a contrastive manner, without relying on additional textual information to guide the learning process. We evaluate the quality of the learned representations on the EuroSAT benchmark in various few-shot settings.
KW - computer vision
KW - contrastive learning
KW - land use
KW - satellite images
UR - https://www.scopus.com/pages/publications/85204914326
U2 - 10.1109/IGARSS53475.2024.10641235
DO - 10.1109/IGARSS53475.2024.10641235
M3 - Conference contribution
AN - SCOPUS:85204914326
T3 - International Geoscience and Remote Sensing Symposium (IGARSS)
SP - 319
EP - 323
BT - IGARSS 2024 - 2024 IEEE International Geoscience and Remote Sensing Symposium, Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 IEEE International Geoscience and Remote Sensing Symposium, IGARSS 2024
Y2 - 7 July 2024 through 12 July 2024
ER -