Astronomy & Astrophysics

A&A, 682, A4 (2024)
https://doi.org/10.1051/0004-6361/202347649
© The Authors 2024

Shedding light on low-surface-brightness galaxies in dark energy
surveys with transformer models⋆

H. Thuruthipilly1, Junais1, A. Pollo1,2, U. Sureshkumar2,3, M. Grespan1, P. Sawant1,
K. Małek1, and A. Zadrozny1

1 National Centre for Nuclear Research, Warsaw, Poland
e-mail: hareesh.thuruthipilly@ncbj.gov.pl; junais@ncbj.gov.pl; agnieszka.pollo@ncbj.gov.pl

2 Jagiellonian University, Kraków, Poland
3 Wits Centre for Astrophysics, School of Physics, University of the Witwatersrand, Johannesburg, South Africa

Received 3 August 2023 / Accepted 17 October 2023

ABSTRACT

Context. Low-surface-brightness galaxies (LSBGs), which are defined as galaxies that are fainter than the night sky, play a crucial
role in our understanding of galaxy evolution and in cosmological models. Upcoming large-scale surveys, such as Rubin Observatory
Legacy Survey of Space and Time and Euclid, are expected to observe billions of astronomical objects. In this context, using semi-
automatic methods to identify LSBGs would be a highly challenging and time-consuming process, and automated or machine learning-
based methods are needed to overcome this challenge.
Aims. We study the use of transformer models in separating LSBGs from artefacts in the data from the Dark Energy Survey (DES)
Data Release 1. Using the transformer models, we then search for new LSBGs from the DES that the previous searches may have
missed. Properties of the newly found LSBGs are investigated, along with an analysis of the properties of the total LSBG sample in
DES.
Methods. We created eight different transformer models and used an ensemble of these eight models to identify LSBGs. This was
followed by a single-component Sérsic model fit and a final visual inspection to filter out false positives.
Results. Transformer models achieved an accuracy of ∼94% in separating the LSBGs from artefacts. In addition, we identified 4083
new LSBGs in DES, increasing the number of LSBGs already known in DES by ∼17%. This also increased the number density of
LSBGs in DES to 5.5 deg−2. The new LSBG sample consists mainly of blue and compact galaxies. We performed a clustering analysis
of the LSBGs in DES using an angular two-point auto-correlation function and found that LSBGs cluster more strongly than their
high-surface-brightness counterparts. This effect is driven by the red LSBGs. We associated 1310 LSBGs with galaxy clusters and
identified 317 ultradiffuse galaxies among them. We found that these cluster LSBGs become bluer and larger towards the edges
of the clusters compared with those in the centre.
Conclusions. Transformer models have the potential to match convolutional neural networks as state-of-the-art algorithms for
analysing astronomical data. The significant number of LSBGs identified from the same dataset using a different algorithm highlights
the substantial impact of our methodology on our capacity to discover LSBGs. The reported number density of LSBGs is only a lower
limit and can be expected to increase with the advent of surveys with better image quality and more advanced methodologies.

Key words. methods: data analysis – techniques: image processing – Galaxy formation – galaxies: clusters: general –
galaxies: evolution

1. Introduction

Low-surface-brightness galaxies (LSBGs) are most often
defined as galaxies with a fainter central surface brightness than
the night sky or galaxies with a B-band central surface brightness
µ0(B) below a certain threshold. In the literature, this threshold
value varies from µ0(B) ≥ 23.0 mag arcsec−2 (Bothun et al.
1997) to µ0(B) ≥ 22.0 mag arcsec−2 (Burkholder et al. 2001).

LSBGs are estimated to contribute only a few percent
(<10%) of the local luminosity and of the stellar mass density
of the observable Universe (Bernstein et al. 1995; Driver 1999;
Hayward et al. 2005; Martin et al. 2019). However, LSBGs are
considered to account for a significant fraction (30–60%) of
the total number density of galaxies (McGaugh 1996; Bothun
et al. 1997; O’Neil & Bothun 2000; Haberzettl et al. 2007;
Martin et al. 2019), and as much as 15% of the dynamical mass
content of the Universe (Driver 1999; Minchin et al. 2004).
These numbers imply that LSBGs can contribute significantly
to our understanding of the physics of galaxy evolution and of
cosmological models. However, as their name indicates, LSBGs are
very faint systems, and owing to the observational challenges in
detecting them, they remain a largely unexplored realm.

⋆ The LSBG catalogue is available at the CDS via anonymous ftp to
cdsarc.u-strasbg.fr (130.79.128.5) or via
http://cdsarc.u-strasbg.fr/viz-bin/cat/J/A+A/682/A4

In recent years, despite observational challenges, advances in
digital imaging have improved our ability to detect LSBGs. The
first and largest LSBG to be identified and verified is Malin 1,
serendipitously discovered by Bothun et al. (1987) during a
survey of galaxies of low surface brightness in the Virgo clus-
ter. Notably, Malin 1 is the largest spiral galaxy known today
(e.g., Impey et al. 1988; Junais et al. 2020; Galaz et al. 2022).
Current searches for LSBGs have shown that they exhibit a
wide range of physical sizes (Greene et al. 2022) and can be
found in various types of environments: for example, from
satellites of nearby galaxies (Danieli et al. 2017; Cohen et al.
2018), ultrafaint satellites of the Milky Way (McConnachie 2012;
Simon 2019), and galaxies found in the field (Leisman et al. 2017;
Prole et al. 2021), to members of massive galaxy clusters like
Virgo (Mihos et al. 2015, 2017; Junais et al. 2022) and Coma
(van Dokkum et al. 2015; Koda et al. 2015).

Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0),
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
This article is published in open access under the Subscribe to Open model. Subscribe to A&A to support open access publication.

LSBGs have also been separated into several subclasses
based on their physical size, surface brightness, and gas content.
Ultradiffuse galaxies (UDGs) represent a subclass of LSBGs
characterised by their considerable size, which is comparable
to that of Milky Way-like galaxies, yet they exhibit very faint
luminosities akin to dwarf galaxies. Although the term ‘UDG’
was coined by van Dokkum et al. (2015), such galaxies were
identified in several earlier studies in the literature (Sandage
& Binggeli 1984; McGaugh & Bothun 1994; Dalcanton et al.
1997; Conselice et al. 2003a). Similarly, giant LSBGs (GLSBGs)
form another subclass of LSBGs that are extremely gas-rich
(M_HI > 10^10 M⊙), faint, and extended (Sprayberry et al.
1995; Saburova et al. 2023). The formation and evolution of
extreme classes such as UDGs and GLSBGs are still debated
(Amorisco & Loeb 2016; Di Cintio et al. 2017; Saburova et al.
2021; Benavides et al. 2023; Laudato & Salzano 2023).

In order to understand the formation mechanism(s) giving
rise to the various types of LSBGs, it is crucial to study them
extensively across different environments (galaxy clusters vs.
field) over a large area of the sky. Recently, Greco et al. (2018)
detected 781 LSBGs in the Hyper Suprime-Cam Subaru Strategic
Program (HSC SSP) in a blind search covering around 200 deg²
of the Wide layer of the HSC SSP. Similarly, in a recent study,
Tanoglidis et al. (2021b) used a support vector machine (SVM)
and visual inspection to analyse the first three years of data from
the Dark Energy Survey (DES). These authors identified more
than 20 000 LSBGs, thus building the largest LSBG catalogue
available.

A common feature observed in both of these untargeted
searches for LSBGs was the significant presence of low-surface-
brightness artefacts. As pointed out in Tanoglidis et al. (2021b),
these artefacts predominantly consist of diffuse light from
nearby bright objects, galactic cirrus, star-forming tails of spi-
ral arms, and tidal streams. These artefacts typically pass the
simple selection cuts based on photometric measurements and
often make up the majority of the LSBG candidate sample.
These contaminants need to be removed, which is often
accomplished using semi-automated methods, which have a low
success rate, and visual inspection, which is more precise but
time-consuming.

For example, in HSC SSP, Greco et al. (2018) applied
selection cuts on the photometric measurements from
SourceExtractor (Bertin & Arnouts 1996), which led to
the selection of 20 838 LSBG candidates. Using a galaxy mod-
elling pipeline based on imfit (Erwin 2015), the sample size
was subsequently reduced to 1521. However, after visual inspec-
tion, only 781 candidates were considered confident LSBGs,
which is around 4% of the preliminary candidate sample and
50% of the sample selected by the pipeline. Similarly, in DES,
Tanoglidis et al. (2021b) shortlisted 419 895 LSBG candidates
using the selection cuts on SourceExtractor photometric
measurements. After applying a feature-based machine learning
(ML) classification (SVM) on the photometric measurements,
the candidate sample was further reduced to 44 979 objects.
However, a significant number of false positives still remained,
and only 23 790 were later classified as confident LSBGs.
Therefore, these numbers indicate that genuine LSBGs make up
roughly 5% of the candidates from the initial photometric selection
and roughly 50% of those passing the subsequent selection.
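The two purity figures quoted above can be checked directly from the candidate counts. The following is a minimal Python sketch; the dictionary layout and the helper name `purity` are illustrative choices, not part of either published pipeline.

```python
# Candidate counts quoted in the text for the two blind searches:
# (after photometric cuts, after model fitting / SVM, confirmed LSBGs).
searches = {
    "HSC SSP (Greco et al. 2018)": (20_838, 1_521, 781),
    "DES (Tanoglidis et al. 2021b)": (419_895, 44_979, 23_790),
}


def purity(n_confirmed: int, n_candidates: int) -> float:
    """Fraction of a candidate sample that ends up as confirmed LSBGs."""
    return n_confirmed / n_candidates


fractions = {}
for name, (n_cuts, n_refined, n_final) in searches.items():
    initial = purity(n_final, n_cuts)         # after SourceExtractor cuts only
    subsequent = purity(n_final, n_refined)   # after imfit modelling or the SVM
    fractions[name] = (initial, subsequent)
    print(f"{name}: {initial:.1%} of initial candidates, {subsequent:.1%} after refinement")
```

Both searches give a few percent (about 4% and 6%) for the initial selection and just over half for the refined selection, which is what the rounded "5%" and "50%" summarise.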

Upcoming large-scale surveys, such as the Legacy Survey
of Space and Time (LSST; Ivezić et al. 2019) and Euclid
(Euclid Collaboration 2022), are expected to observe billions of
astronomical objects. In this scenario, it would be impractical
to rely solely on photometric selection cuts or semi-automated
methods – such as galaxy model fitting – to identify LSBGs con-
fidently. Furthermore, the accuracy of the classification method-
ology in distinguishing between LSBGs and artefacts must be
improved in order to achieve meaningful results. This situa-
tion therefore demands more effective and efficient automation
methodologies in searches for LSBGs.

Recently, advancements in deep learning have been widely
applied in astronomy, opening up a plethora of opportunities.
Particularly for analysing astronomical images, convolutional
neural networks (CNNs) have emerged as a state-of-the-art tech-
nique. For example, CNNs have been used for galaxy classifica-
tion (Pérez-Carrasco et al. 2019), galaxy merger identification
(Pearson et al. 2022), supernova classification (Cabrera-Vives
et al. 2017), and finding strong gravitational lenses (Schaefer
et al. 2018; Davies et al. 2019; Rojas et al. 2022). One of the fas-
cinating features of CNNs is their ability to directly process the
image as input and learn the image features, making them one of
the most popular and robust architectures in use today. Generally,
the learning capacity of a neural network increases with the num-
ber of layers in the network. The first layers of the network learn
the low-level features, and the last layers learn more complex
features (Russakovsky et al. 2015; Simonyan & Zisserman 2015).

One of the main requirements for creating a trained CNN
is a sufficiently large training dataset that can generalise the
features of the data being analysed. Recently, Tanoglidis et al.
(2021a) used a catalogue of over 20 000 LSBGs from DES to
distinguish LSBGs from artefacts using a CNN for the first
time and achieved an accuracy of 92% and a true positive rate
of 94%.

While CNNs have been the dominant choice for analysing
image data in astronomy, the current state-of-the-art models for
computer vision are transformers. Transformers were initially
introduced in natural language processing (NLP) as an attention-
based model (Vaswani et al. 2017). The fundamental concept
behind the transformer architecture is the attention mechanism,
which has also found a broad range of applications in machine
learning (Zhang et al. 2018; Fu et al. 2019; Parmar et al. 2019;
Zhao et al. 2020; Tan et al. 2021). In NLP, self-attention
relates different positions of a single sequence in order to
compute a representation of that sequence. Later, the idea was
adapted to computer vision and has been used to produce
state-of-the-art models for various image processing tasks,
such as image classification (Wortsman et al. 2022) and image
segmentation (Chen et al. 2023).

Generally, two categories of transformers are present in the
literature. The first type integrates both CNNs and attention to
perform the analysis. An example of this type is the detection
transformer (DETR) originally proposed for end-to-end object
detection by Carion et al. (2020). The key idea behind using
CNNs and transformers together is to leverage the strengths
of both architectures. CNNs excel at local feature extraction,
capturing low-level details and spatial hierarchies, while atten-
tion layers excel at modelling global context and long-range
dependencies. The second class of transformers contains the
models that do not use a CNN and operate entirely based
on self-attention mechanisms. An example of this type is the
Vision Transformer (ViT) proposed for object classification by
Dosovitskiy et al. (2021). ViTs have demonstrated remarkable
performance in image classification tasks and have surpassed the
accuracy of CNN-based models on various benchmark datasets
(Dosovitskiy et al. 2021; Yu et al. 2022; Wortsman et al. 2022).


Although transformers were introduced very recently in
astronomy, they have already found a wide variety of appli-
cations. For example, the transformer models have been used
to detect and analyse strong gravitational lensing systems
(Thuruthipilly et al. 2022b,a; Huang et al. 2023; Jia et al.
2023), to represent light curves that can be used for classifi-
cation or regression (Allam & McEwen 2021), and to classify
multi-band light curves of different supernovae (SN) types
(Pimentel et al. 2023).

In this paper, we explore the capability of transformers to
separate LSBGs from artefacts in DES and compare their
performance with that of the CNNs presented in Tanoglidis
et al. (2021a). We also use the transformer models to look for new
LSBGs that previous searches may have missed. For comparison
purposes, throughout this work we follow the LSBG definition
from Tanoglidis et al. (2021b), which is based on the g-band
mean surface brightness (µ̄eff) and the half-light radius (r1/2). We
consider LSBGs as galaxies with µ̄eff > 24.2 mag arcsec−2 and
r1/2 > 2.5′′.

The paper is organised as follows. Section 2 discusses the
data we used to train our models and to look for new LSBGs.
Section 3 provides a brief overview of the methodology used in
our study, including the model architecture, information on how
the models were trained, and details of the visual inspection. The
results of our analysis are presented in Sect. 4. A detailed dis-
cussion of our results and the properties of the newly identified
LSBGs are presented in Sects. 5 and 6, respectively. An analysis
of the clustering of LSBGs is presented in Sect. 7, and a detailed
discussion of the UDGs identified as a subsample of LSBGs is
presented in Sect. 8. Section 9 concludes our analysis by
highlighting the significance of LSBGs, the impact of our
methodology on our capacity to discover LSBGs, and future
prospects for the upcoming LSST.

2. Data

2.1. Dark Energy Survey

The Dark Energy Survey (DES; Abbott et al. 2018, 2021) is a
six-year observing program (2013–2019) covering ∼5000 deg²
of the southern Galactic cap in the optical and near-infrared
regime using the Dark Energy Camera (DECam) on the 4 m
Blanco Telescope at the Cerro Tololo Inter-American
Observatory (CTIO). The DECam focal plane comprises 62 2k ×
4k charge-coupled devices (CCDs) dedicated to science imaging
and 12 2k × 2k CCDs for guiding, focus, and alignment.
The DECam field of view covers 3 deg² with a central pixel
scale of 0.263 arcsec pixel−1 (Flaugher et al. 2015). To address
the gaps between CCDs, DES uses a dithered exposure pattern
(Neilsen et al. 2019) and combines the resulting individual expo-
sures to form co-added images, which have dimensions of 0.73 ×
0.73 degrees (Morganson et al. 2018). The DES has observed the
sky in grizY photometric bands with approximately ten overlap-
ping dithered exposures in each filter (90 s in griz-bands and 45 s
in Y-band).

2.2. DES DR1 and the gold catalogue

In this work, we use the image data from the Dark Energy
Survey Data Release 1 (DES DR1; Abbott et al. 2018) and the
DES Y3 Gold catalogue (DES Y3_gold_2_2.1) obtained from
the first three years of the DES observations (Sevilla-Noarbe
et al. 2021). The DES DR1 comprises optical and near-infrared
imaging captured over 345 different nights between August 2013
and February 2016. The median 3σ surface brightness limits
of the g, r, and i-bands of DES DR1 are 28.26, 27.86, and
27.37 mag arcsec−2, respectively (Tanoglidis et al. 2021b). It is
worth mentioning that the DES source detection pipeline has
not been optimised for detecting large, low-surface-brightness
objects (Morganson et al. 2018). Therefore, the above-mentioned
surface brightness values can be considered as the limits for
detecting faint objects in each band. The gold catalogue shares
the same single-image processing, image co-addition, and object
detection as the DES DR1. The objects in the gold catalogue
were detected using SourceExtractor (Bertin & Arnouts
1996) and have undergone selection cuts on minimal image depth
and quality, additional calibration, and deblending. The median
coadd magnitude limit of the DES Y3 Gold catalogue at a signal-
to-noise ratio (S/N) = 10 is g = 24.3 mag, r = 24.0 mag, and
i = 23.3 mag (Sevilla-Noarbe et al. 2021). The DES Y3 Gold
catalogue contains around 319 million astronomical objects, which
we used to search for LSBGs in DES. For a detailed review and
discussion of the data from the DES, please refer to Abbott et al.
(2018) and Sevilla-Noarbe et al. (2021).

We reduced the number of objects processed in our
study using preselections, in a similar way to Greco et al.
(2018) and Tanoglidis et al. (2021b). We first removed objects
classified as point-like objects in the DES Y3 Gold cata-
logue based on the i-band SourceExtractor SPREAD_MODEL
parameter and EXTENDED_CLASS_COADD, as described in
Tanoglidis et al. (2021b). In addition, we constrained the
g-band half-light radius (FLUX_RADIUS_G) and surface bright-
ness (MUE_MEAN_MODEL_G) within the range of 2.5′′ < r1/2 <
20′′ and 24.2 < µ̄eff < 28.8 mag arcsec−2, respectively. Further-
more, we also limited our sample to objects with colours (using
the MAG_AUTO magnitudes) in the range:

−0.1 < g − i < 1.4, (1)
(g − r) > 0.7 × (g − i) − 0.4, (2)
(g − r) < 0.7 × (g − i) + 0.4. (3)

These colour cuts are based on Greco et al. (2018) and Tanoglidis
et al. (2021b). As mentioned by Greco et al. (2018), these colour
requirements remove spurious detections due to optical artefacts
detected in all bands and blends of high-redshift galaxies.
Finally, we also required the axis ratio (B_IMAGE/A_IMAGE)
of each object to be greater than 0.3 in order to remove artefacts
such as highly elongated diffraction spikes. Our complete
selection criteria were based on the selection criteria presented
in Appendix B of Tanoglidis et al. (2021b). After the preliminary
selections using the SourceExtractor parameters from the
DES Y3 Gold catalogue, our sample contains 419 784 objects.
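The size, surface-brightness, colour (Eqs. (1)–(3)), and axis-ratio cuts can be expressed as a single boolean mask. The sketch below is illustrative only: it assumes the half-light radius has already been converted to arcsec and takes plain NumPy arrays in place of the catalogue columns (FLUX_RADIUS_G, MUE_MEAN_MODEL_G, MAG_AUTO, A_IMAGE, B_IMAGE). The point-source rejection based on SPREAD_MODEL and EXTENDED_CLASS_COADD is omitted because its thresholds are not given here.

```python
import numpy as np


def lsbg_preselection(r_half, mu_eff, mag_g, mag_r, mag_i, a_image, b_image):
    """Boolean mask for the size, surface-brightness, colour and axis-ratio cuts.

    Assumptions (illustrative only): r_half is the g-band half-light radius in
    arcsec, mu_eff is in mag arcsec^-2, and the magnitudes are MAG_AUTO values.
    The point-source rejection step is not reproduced.
    """
    g_i = mag_g - mag_i
    g_r = mag_g - mag_r
    size_and_sb = (r_half > 2.5) & (r_half < 20.0) & (mu_eff > 24.2) & (mu_eff < 28.8)
    colour = (
        (g_i > -0.1) & (g_i < 1.4)           # Eq. (1)
        & (g_r > 0.7 * g_i - 0.4)            # Eq. (2)
        & (g_r < 0.7 * g_i + 0.4)            # Eq. (3)
    )
    axis_ratio = (b_image / a_image) > 0.3   # reject highly elongated objects
    return size_and_sb & colour & axis_ratio


# Five toy objects: a plausible LSBG, then one failure per criterion.
r_half = np.array([5.0, 1.5, 5.0, 5.0, 5.0])       # 2nd: too compact
mu_eff = np.array([25.0, 25.0, 23.5, 25.0, 25.0])  # 3rd: too bright
mag_g = np.array([20.5, 20.5, 20.5, 21.6, 20.5])
mag_r = np.array([20.0, 20.0, 20.0, 20.5, 20.0])
mag_i = np.array([19.8, 19.8, 19.8, 20.0, 19.8])   # 4th: g - i = 1.6, too red
a_image = np.array([2.0, 2.0, 2.0, 2.0, 5.0])      # 5th: b/a = 0.2, too elongated
b_image = np.ones(5)

mask = lsbg_preselection(r_half, mu_eff, mag_g, mag_r, mag_i, a_image, b_image)
print(mask)
```

Only the first toy object passes every cut; each of the others is built to fail exactly one criterion.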

2.3. Training data

All models in this study were trained, validated, and tested using
the labelled dataset of LSBGs and artefacts identified from DES
by Tanoglidis et al. (2021b). Below, we briefly summarise the
primary steps taken by Tanoglidis et al. (2021b) in construct-
ing the LSBG catalogue. (i) The SourceExtractor parameters
from the DES Y3 Gold catalogue presented by Sevilla-Noarbe
et al. (2021) were used to create the initial selection cuts, as
discussed in Sect. 2.2. (ii) The candidate sample was further
reduced using an SVM to classify artefacts and LSBGs. The
SVM was trained with a manually labelled set of approximately
8000 objects (640 LSBGs) and using the SourceExtractor
parameters as features for learning. (iii) From the candidate
sample generated through SVM, over 20 000 artefacts were
excluded upon visual inspection. During this inspection, most of
the rejected objects that had passed the SVM feature-based
selection were found to be astronomical artefacts (such as
galactic cirrus, star-forming extensions of spiral arms, and tidal
streams) rather than instrumental artefacts (such as scattered
light from nearby bright objects). (iv) Objects that passed the
visual inspection were subjected to Sérsic model fitting and
Galactic extinction correction. Following this, new selection cuts
were applied to the updated parameters, and the final LSBG
catalogue containing 23 790 LSBGs was created.

Fig. 1. Four examples of LSBGs (a) and artefacts (b) used in the training data. Each image of the LSBG and artefact corresponds to a 67.32′′ ×
67.32′′ region of the sky. Images were generated by combining the g, r, and z bands using the APLpy package (Robitaille & Bressert 2012).

For training our classification models, we selected LSBGs
from the LSBG catalogue as the positive class (label 1) and the
objects rejected in the third step (visual inspection) by Tanoglidis
et al. (2021b) as the negative class (label 0). The catalogues
for the positive and negative classes are publicly available, and
we used these catalogues to create our training dataset1. The
selection of the artefacts and LSBGs for training was random,
and after selection, we had 18 474 artefacts and 23 103 LSBGs.
However, when we further inspected these LSBGs and artefacts,
we found that 797 objects belonged to both classes. After a
thorough visual examination, we found that these are, in fact,
LSBGs that had been mistakenly categorised as artefacts in the
publicly accessible artefact catalogue. Nevertheless, we excluded
these 797 objects from our training set in order to avoid
contamination and ambiguity between classes during training.

1 https://github.com/dtanoglidis/DeepShadows/blob/main/Datasets

We generated multi-band cutouts for each object in the flexible
image transport system (FITS) format using the cutout service
provided in the DES public data archive. Each cutout corresponds
to a 67.32′′ × 67.32′′ (256 × 256 pixels) region of the sky and is
centred at the coordinates of the object (LSBG or artefact). We
resized the cutouts from their initial size to 64 × 64 pixels to
reduce computational costs. The g-, r-, and z-band cutouts were
stacked together to create the dataset for training the models.
Examples of LSBGs and artefacts used for training the model are
shown in Fig. 1. Our training catalogue contains 39 983 objects,
of which 22 306 are LSBGs and 17 677 are artefacts. Before
training, we randomly split the full sample into a training set, a
validation set, and a test set, consisting of 35 000, 2500, and 2483
objects, respectively.
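The dataset bookkeeping above can be reproduced in a few lines. This is a minimal sketch under stated assumptions: the 0.263″ pixel scale from Sect. 2.1 is assumed to apply to the cutouts (consistent with 256 × 0.263″ ≈ 67.3″); the resizing method is not specified in the text, so the hypothetical `block_mean_resize` simply averages 4 × 4 pixel blocks; and random arrays stand in for real FITS cutouts.

```python
import numpy as np

PIXEL_SCALE = 0.263  # arcsec per pixel (Sect. 2.1), assumed to apply to the cutouts
CUTOUT_PIX, TARGET_PIX = 256, 64

# Angular size of a cutout and effective pixel scale after resizing.
cutout_side_arcsec = CUTOUT_PIX * PIXEL_SCALE   # 67.328 arcsec
factor = CUTOUT_PIX // TARGET_PIX               # 4 x 4 original pixels per new pixel
resized_scale = factor * PIXEL_SCALE            # about 1.05 arcsec per pixel


def block_mean_resize(band: np.ndarray, factor: int) -> np.ndarray:
    """Hypothetical resizing: average non-overlapping factor x factor blocks."""
    h, w = band.shape
    return band.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def make_model_input(g, r, z):
    """Stack the resized g, r, z cutouts into one (64, 64, 3) array."""
    return np.stack([block_mean_resize(b, factor) for b in (g, r, z)], axis=-1)


# Class bookkeeping: the 797 objects present in both catalogues are dropped from both.
n_artefacts, n_lsbgs, n_overlap = 18_474, 23_103, 797
n_artefacts_clean = n_artefacts - n_overlap     # 17 677
n_lsbgs_clean = n_lsbgs - n_overlap             # 22 306
n_total = n_artefacts_clean + n_lsbgs_clean     # 39 983


def random_split(n, n_train=35_000, n_val=2_500, seed=0):
    """Shuffle indices once and cut them into training, validation and test subsets."""
    idx = np.random.default_rng(seed).permutation(n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


train_idx, val_idx, test_idx = random_split(n_total)
rng = np.random.default_rng(1)
example = make_model_input(*(rng.normal(size=(CUTOUT_PIX, CUTOUT_PIX)) for _ in range(3)))
print(round(cutout_side_arcsec, 3), example.shape, len(train_idx), len(val_idx), len(test_idx))
```

Removing the overlap from both classes accounts exactly for the 22 306 + 17 677 = 39 983 objects, and the remainder after the 35 000/2500 cut gives the 2483-object test set.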

3. Methodology

3.1. Transformers and attention

As mentioned in Sect. 1, the central idea behind every
transformer architecture is attention. In multi-head attention, the
input sequence is first projected into three sets of vectors:
queries (Q), keys (K), and values (V). The dot products between
the query and key vectors give the attention scores, which are
then used to weight the value vectors, producing a context vector
that is a weighted sum of the values. In our work, Q, K, and V are
all computed from the same input sequence; this method is termed
self-attention. This approach enables the transformer to model
long-range dependencies and capture complex patterns in the input
sequence. Mathematically, the attention function is defined as

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right) V, \qquad (4)$$

where Q, K, and V are the matrices of query, key, and value
vectors and d_k is the dimension of the key vectors. The softmax
function is, by definition, the normalised exponential function:
it takes an input vector of n real numbers and normalises it into
a probability distribution of n probabilities proportional to the
exponentials of the input numbers. The building blocks of our
transformer models are layers applying self-attention and are
termed transformer encoders. Please refer to Vaswani et al.
(2017) for a detailed discussion on transformer encoders.

Fig. 2. Scheme of the general architecture of the detection transformer
(LSBG DETR), taken from Thuruthipilly et al. (2022b). The features
extracted from the input image by the CNN backbone are combined with
positional encoding and passed on to the encoder layer, which assigns
attention scores to each feature. The weighted features are then passed
to the feed-forward neural network (FFN) to predict the probability.
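Equation (4) and the head-splitting described for the transformer encoders can be illustrated with plain NumPy. This is a minimal sketch, not the authors' Keras implementation: the token count and projection widths are hypothetical, the weights are random rather than learned, and the final output projection of a standard multi-head layer is left out. The head counts 8 and 12 match those quoted for the LSBG DETR models.

```python
import numpy as np


def softmax(x, axis=-1):
    """Normalised exponential; subtracting the maximum avoids overflow without changing the result."""
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)


def scaled_dot_product_attention(Q, K, V):
    """Eq. (4): softmax(Q K^T / sqrt(d_k)) V for Q (n, d_k), K (m, d_k), V (m, d_v)."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (n, m) attention scores
    weights = softmax(scores, axis=-1)   # each row is a probability distribution
    return weights @ V, weights          # context vectors: weighted sums of the values


def multi_head_self_attention(X, W_q, W_k, W_v, n_heads):
    """Self-attention: Q, K and V are all projections of the same input X.

    The projected features are split into n_heads equal chunks, each head attends
    independently, and the head outputs are concatenated (no output projection here).
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    heads = [
        scaled_dot_product_attention(q, k, v)[0]
        for q, k, v in zip(np.split(Q, n_heads, axis=-1),
                           np.split(K, n_heads, axis=-1),
                           np.split(V, n_heads, axis=-1))
    ]
    return np.concatenate(heads, axis=-1)


rng = np.random.default_rng(0)
n_tokens, d_model, d_proj = 16, 32, 24  # hypothetical sizes; d_proj divisible by 8 and 12
X = rng.normal(size=(n_tokens, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_proj)) / np.sqrt(d_model) for _ in range(3))
out_8 = multi_head_self_attention(X, W_q, W_k, W_v, n_heads=8)
out_12 = multi_head_self_attention(X, W_q, W_k, W_v, n_heads=12)
print(out_8.shape, out_12.shape)
```

With a single head the function reduces exactly to Eq. (4) applied to the projected inputs. More heads let each slice of the feature vector learn its own attention pattern at the same total width.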

3.2. LSBG detection transformer (LSBG DETR)

We implemented four transformer models that use a CNN backbone
and self-attention layers to classify the input images, which we
call LSBG detection transformer (LSBG DETR) models in general.
The LSBG DETR architecture is inspired by the transformer models
of Thuruthipilly et al. (2022b), which were used to explore diverse
structures and hyperparameters in order to optimise classification
performance. The name of each individual model is followed by a
number indicating its chronological order of creation. The LSBG
DETR models have an eight-layer CNN backbone to extract feature
maps from the input image. The feature maps produced by the
CNN backbone are then passed on to the transformer encoder
layer to create an attention map that helps the transformer
component focus on the most relevant features for classification.
The transformer encoder layer has subcomponents known as heads,
which apply self-attention in parallel, each to a smaller part of
the split input vector. The output generated by the transformer
encoder is then passed on to a feed-forward neural network
(FFN) layer to predict the probability that the input is an LSBG.
Because transformers are permutation invariant, we add positional
encoding to retain the positional information of the features. For
the LSBG DETR, we used fixed positional encoding defined by
the function

$$PE_{(pos,\,2i)} = \sin\left(\frac{pos}{12800^{2i/d_{\mathrm{model}}}}\right), \qquad (5)$$

$$PE_{(pos,\,2i+1)} = \cos\left(\frac{pos}{12800^{2i/d_{\mathrm{model}}}}\right), \qquad (6)$$

where pos is the position, i indexes the pairs of dimensions of the
positional encoding vector (2i and 2i + 1 being its even and odd
dimensions), and d_model is the dimension of the input feature
vector. We follow the positional encoding defined in Vaswani
et al. (2017), and for a detailed discussion on positional encod-
ing and its importance, we refer to Liutkus et al. (2021); Su et al.
(2021); Chen et al. (2021). The general structure of the LSBG
DETR is shown in Fig. 2. For a detailed discussion on the trans-
former models similar to LSBG DETR, we refer to Carion et al.
(2020) and Thuruthipilly et al. (2022b).
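Equations (5) and (6) translate directly into a vectorised table of encodings. In this minimal NumPy sketch, the base is a parameter set to 12 800 as written in Eqs. (5)–(6); Vaswani et al. (2017) used 10 000. The sizes are hypothetical, since the CNN feature-map dimensions are not given here. The encoding is added to the features, as in Vaswani et al. (2017); "combined" in Fig. 2 is assumed to mean the same.

```python
import numpy as np


def fixed_positional_encoding(n_positions, d_model, base=12800.0):
    """Sinusoidal encoding of Eqs. (5)-(6); returns an array of shape (n_positions, d_model)."""
    if d_model % 2:
        raise ValueError("d_model must be even so that sin/cos pairs fill every dimension")
    pos = np.arange(n_positions)[:, None]      # positions 0 .. n_positions - 1
    i = np.arange(d_model // 2)[None, :]       # index of each (2i, 2i + 1) pair
    angles = pos / base ** (2 * i / d_model)   # pos / base^(2i / d_model)
    pe = np.empty((n_positions, d_model))
    pe[:, 0::2] = np.sin(angles)               # Eq. (5): even dimensions
    pe[:, 1::2] = np.cos(angles)               # Eq. (6): odd dimensions
    return pe


# Hypothetical sizes: e.g. an 8 x 8 feature map flattened to 64 positions, 128 features each.
pe = fixed_positional_encoding(n_positions=64, d_model=128)
features = np.zeros((64, 128))                 # stand-in for CNN backbone features
encoded = features + pe                        # encoding added element-wise
print(pe.shape)
```

Low dimensions oscillate quickly with position while high dimensions vary slowly, so every position receives a distinct, deterministic code that needs no training.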

3.3. LSBG Vision

We created four transformer models similar to the ViT introduced
by Google Brain (Dosovitskiy et al. 2021), which we call LSBG
vision transformers (LSBG ViT) in general. As for the LSBG DETR
models, the name of each individual model is followed by a
number indicating its chronological order of creation. One of the
main features of the LSBG ViT models is that, unlike LSBG DETR,
they do not use any convolutional layers to process the image.
In the ViT architecture, the input image is divided into fixed-size
patches, which are flattened into a sequence of 1D vectors. As the
transformers are permutation invariant, the positional embed-
ding is added to the patch embedding before they are fed into
the transformer layers. The positional embedding is typically a
fixed-length vector that is added to the patch embedding, and
is learned during training along with the other model parame-
ters. The combined 1D sequence is then passed through a stack
of transformer layers. An additional learnable (class) embedding
is affixed to the input sequence, which encodes the class of the
input image. This class embedding for each input is calculated by
applying self-attention to positionally embedded image patches.
Output from the class embedding is passed on to a multi-layer
perceptron (MLP) head to predict the output class. A schematic
diagram of the vision transformer is shown in Fig. 3. For a
detailed discussion on ViT models, please refer to Dosovitskiy
et al. (2021).
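The patch-and-embed step of the ViT can be sketched in NumPy. The sketch rests on several assumptions. The input is the 64 × 64 × 3 (g, r, z) array from Sect. 2.3. The embedding width D is hypothetical. The projection, class token, and positional embeddings are random here rather than learned. Finally, 64 is not divisible by the patch size 6 listed in Table 1, and how the authors' code handles this is not stated here, so this sketch simply crops the leftover border pixels.

```python
import numpy as np


def image_to_patches(image, patch_size):
    """Split an (H, W, C) image into non-overlapping patch_size x patch_size patches
    and flatten each patch into a 1D vector, in row-major patch order."""
    H, W, C = image.shape
    n_h, n_w = H // patch_size, W // patch_size
    # Assumption: border pixels that do not fill a whole patch are cropped.
    img = image[: n_h * patch_size, : n_w * patch_size]
    patches = img.reshape(n_h, patch_size, n_w, patch_size, C).transpose(0, 2, 1, 3, 4)
    return patches.reshape(n_h * n_w, patch_size * patch_size * C)


def embed_with_class_token(patches, W_proj, cls_token, pos_embedding):
    """Project flattened patches linearly, prepend the class token, add position embeddings."""
    tokens = patches @ W_proj                     # (N, D) patch embeddings
    tokens = np.concatenate([cls_token, tokens])  # (N + 1, D); index 0 is the class token
    return tokens + pos_embedding                 # learnable in a real ViT; random here


rng = np.random.default_rng(0)
image = rng.normal(size=(64, 64, 3))              # stand-in for a resized g, r, z cutout
D = 32                                            # hypothetical embedding dimension
for s in (4, 6):                                  # patch sizes listed in Table 1
    patches = image_to_patches(image, s)
    tokens = embed_with_class_token(
        patches,
        W_proj=rng.normal(size=(patches.shape[1], D)),
        cls_token=rng.normal(size=(1, D)),
        pos_embedding=rng.normal(size=(patches.shape[0] + 1, D)),
    )
    print(s, patches.shape, tokens.shape)
```

With 4-pixel patches the image becomes 256 tokens of length 48 (plus the class token). Under the cropping assumption, 6-pixel patches give 100 tokens of length 108.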

3.4. Training

All of the LSBG DETR and LSBG ViT models were trained
with an initial learning rate of α = 10−4. We used the exponential
linear unit (ELU) function as the activation function for all the
layers in these models (Clevert et al. 2016). We initialised the
weights of our models with the Xavier uniform initialiser (Glorot &
Bengio 2010), and all layers were trained from scratch with the
ADAM optimiser using the default exponential decay rates (Kingma
& Ba 2015). We used the early stopping callback from Keras2 to
monitor the validation loss of each model and stop training once
the loss had converged. The models LSBG DETR 1 and 4 were each
given 8 heads and were trained for 150 and 93 epochs,
respectively. Similarly, LSBG DETR 2 and 3 were given 12 heads
and were trained for 134 and 105 epochs, respectively. The
hyperparameters for all the LSBG DETR models were customised
based on the results from Thuruthipilly et al. (2022b), who
extensively investigated the hyperparameter configurations of
DETR models. For the LSBG ViT models, we kept the
hyperparameters of the LSBG DETR models, such as the learning
rate and batch size, and varied only the size of the image patches,
the number of attention heads, and the number of transformer
encoder layers. The four best LSBG ViT models are presented in
Table 1. In the spirit of reproducible research, our code for LSBG
DETR and LSBG ViT is publicly available3.
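The three training ingredients named above (ELU activation, Xavier/Glorot uniform initialisation, and stopping on the validation loss) can be written out explicitly. This minimal NumPy sketch is not the authors' Keras code. The stopping rule is a patience-based criterion in the spirit of Keras's EarlyStopping callback, and its patience and min_delta values, like the toy loss curve, are hypothetical because the text does not state them. The learning rate of 10−4 and the Adam optimiser are not reproduced here.

```python
import numpy as np


def elu(x, alpha=1.0):
    """Exponential linear unit: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))


def glorot_uniform(fan_in, fan_out, rng):
    """Xavier/Glorot uniform initialiser: U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out))."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))


def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Epoch index (0-based) at which a patience-based rule would stop training.

    Training stops once the validation loss has not improved on its best value
    by more than min_delta for `patience` consecutive epochs. The patience and
    min_delta values are hypothetical; the text does not state them.
    """
    best, wait = np.inf, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, wait = loss, 0    # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses) - 1      # rule never triggered: all epochs were run


rng = np.random.default_rng(0)
W = glorot_uniform(64, 32, rng)
losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.69, 0.70, 0.70, 0.70, 0.68]
print(elu([-1.0, 0.0, 2.0]), np.abs(W).max() <= 0.25, early_stopping_epoch(losses, patience=3))
```

Patience controls how many non-improving epochs are tolerated before stopping. For the toy curve, a patience of 3 stops at epoch 8, whereas a patience of 5 lets training reach the later improvement at epoch 9.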

3.5. Ensemble models

We took two classes of transformers (LSBG DETR and LSBG
ViT) with four models in each class, and used an ensemble model
of these four models for each class to look for new LSBGs from
DES DR1. Ensemble models in deep learning refer to combin-
ing multiple models to create a single model that performs better
than the individual models. The idea behind ensemble models
is to reduce the generalisation error and increase the stability of

2 https://keras.io/api/callbacks
3 https://github.com/hareesht23/

A4, page 5 of 23



Thuruthipilly, H., et al.: A&A, 682, A4 (2024)

[Fig. 3 diagram: input image → image patches → linear projection of flattened patches → patch + positional embeddings (positions 1–9, plus position 0 for the extra learnable classification token *) → transformer encoder layer → MLP head → output]

Fig. 3. Scheme of the general architecture of the LSBG ViT. The input image is split into small patches, which are flattened into a sequence of 1D vectors and combined with positional encodings. The numbered circles represent the positional encodings, and the adjacent boxes represent the flattened 1D image patches. The combined 1D sequence is passed to the transformer encoder layers. After being updated by self-attention, the extra learnable class embedding encodes the class of the input image and is passed to an MLP head to predict the output.
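The patch-and-embed step in Fig. 3 can be illustrated with a short NumPy sketch, under the following assumptions:

- The 64 × 64 × 3 (g, r, z) input size is hypothetical.
- The embedding width `d_model = 64` is hypothetical.
- The random projection, class-token and positional-embedding weights are placeholders. In a trained ViT these are learned parameters.
- Pixels that do not fill a whole patch are cropped. This is one possible convention, not necessarily the one used in LSBG ViT.

```python
import numpy as np


def image_to_patches(image, patch_size):
    """Split an (H, W, C) image into non-overlapping patches, each flattened to 1D.

    Rows/columns that do not fill a complete patch are cropped (an assumption).
    Returns an array of shape (num_patches, patch_size * patch_size * C),
    with patches ordered row by row.
    """
    h, w, c = image.shape
    nh, nw = h // patch_size, w // patch_size
    cropped = image[: nh * patch_size, : nw * patch_size]
    patches = cropped.reshape(nh, patch_size, nw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)  # (nh, nw, p, p, C)
    return patches.reshape(nh * nw, patch_size * patch_size * c)


def embed_patches(patches, d_model=64, seed=0):
    """Linear projection + class token + positional embeddings (random placeholders)."""
    rng = np.random.default_rng(seed)
    num_patches, patch_dim = patches.shape
    projection = rng.normal(scale=0.02, size=(patch_dim, d_model))
    class_token = rng.normal(scale=0.02, size=(1, d_model))
    positions = rng.normal(scale=0.02, size=(num_patches + 1, d_model))
    tokens = np.concatenate([class_token, patches @ projection], axis=0)
    return tokens + positions  # shape (num_patches + 1, d_model)


if __name__ == "__main__":
    image = np.random.default_rng(1).random((64, 64, 3))  # hypothetical g, r, z cutout
    for s in (4, 6):  # patch sizes listed in Table 1
        p = image_to_patches(image, s)
        print(s, p.shape, embed_patches(p).shape)
```

Under the assumed 64 × 64 input, patch sizes of 4 and 6 give 256 and 100 patch tokens, respectively. The class token adds one more.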

Table 1. Name of the model, size of the image patches (s), number of heads (h), number of transformer encoder layers (T), and number of training epochs (e) for the four vision models, in chronological order of creation.

Model name       s   h    T   e
LSBG VISION 1    4   12   4   55
LSBG VISION 2    4   12   8   55
LSBG VISION 3    6   12   4   67
LSBG VISION 4    6   16   8   67

the system by taking into account multiple sources of information. Various kinds of ensemble learning exist in the literature, and they have been found helpful in a broad range of machine learning problems (Wang et al. 2022). For detailed reviews of ensemble methods, see Domingos & Hulten (1999) and Dietterich (2000). One of the simplest and most common ensemble methods is model averaging, in which multiple models are trained independently on the same training data and their outputs are averaged to make the final prediction. A main advantage of model averaging is that it is computationally efficient and requires no additional training. It also allows different types of model architecture to be combined, so that the strengths of one model can compensate for the weaknesses of another and improve overall performance. Here we use averaging to create the ensemble models for LSBG DETR and LSBG ViT.
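Model averaging as described above reduces to taking the mean of the member models' predicted probabilities and applying the 0.5 threshold. The sketch below is a minimal NumPy illustration with made-up probabilities, not values from this work. It also shows how an averaging ensemble can still accept a candidate that one of its members scored below 0.5.

```python
import numpy as np


def ensemble_average(member_probs):
    """Average LSBG probabilities over ensemble members.

    member_probs: array of shape (n_models, n_candidates); each row holds one
    model's predicted probability that each candidate is an LSBG.
    """
    return np.mean(np.asarray(member_probs, dtype=float), axis=0)


def classify(probs, threshold=0.5):
    """Label candidates scoring above the threshold as LSBGs (1), else artefacts (0)."""
    return (np.asarray(probs) > threshold).astype(int)


if __name__ == "__main__":
    # Hypothetical outputs of four models for three candidates.
    member_probs = [
        [0.45, 0.90, 0.20],
        [0.80, 0.95, 0.30],
        [0.70, 0.85, 0.55],
        [0.65, 0.90, 0.25],
    ]
    avg = ensemble_average(member_probs)
    print(avg)            # mean probability per candidate
    print(classify(avg))  # candidate 0 is kept although one model gave 0.45
```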

3.6. Sérsic fitting

The candidates identified independently by both the LSBG DETR and LSBG ViT ensemble models were subjected to a single-component Sérsic fitting using Galfit (Peng et al. 2002). This was done to re-estimate the µ̄eff and r1/2 values that were initially used for our sample selection. We chose single-component Sérsic fitting to be consistent with the LSBG search methodology of Tanoglidis et al. (2021b), who used a similar approach. We note, however, that a Sérsic fit does not always capture the full light of a galaxy.
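For reference, the single-component Sérsic model has the radial profile I(r) = I_e exp{−b_n[(r/r_e)^(1/n) − 1]}. Here r_e is the half-light radius, and b_n is chosen so that r_e encloses half of the total light. The sketch below evaluates this profile on an elliptical grid with axis ratio q.

It is an illustration, not Galfit's implementation:

- It uses the standard asymptotic approximation for b_n, which becomes less accurate at small n.
- It omits the position angle and PSF convolution that a real Galfit model includes.

```python
import numpy as np


def sersic_bn(n):
    """Asymptotic approximation to b_n (less accurate for small n)."""
    return 2.0 * n - 1.0 / 3.0 + 4.0 / (405.0 * n) + 46.0 / (25515.0 * n**2)


def sersic_profile(r, I_e, r_e, n):
    """Sérsic surface-brightness profile I(r) in linear intensity units."""
    r = np.asarray(r, dtype=float)
    return I_e * np.exp(-sersic_bn(n) * ((r / r_e) ** (1.0 / n) - 1.0))


def sersic_image(shape, x0, y0, I_e, r_e, n, q):
    """Elliptical Sérsic model image with its major axis along x (position angle omitted).

    The elliptical radius is r = sqrt(dx**2 + (dy / q)**2), so q = 1 is circular.
    """
    y, x = np.indices(shape, dtype=float)
    r = np.sqrt((x - x0) ** 2 + ((y - y0) / q) ** 2)
    return sersic_profile(r, I_e, r_e, n)


if __name__ == "__main__":
    print(round(sersic_bn(1.0), 4))  # n = 1 is the exponential profile
    model = sersic_image((51, 51), 25, 25, I_e=1.0, r_e=8.0, n=0.85, q=0.72)
    print(model.shape, model[25, 25] > model[25, 33])
```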

We used the magnitude (MAG_AUTO) and radius (FLUX_
RADIUS) values from the gold catalogue as an initial guess for the
Galfit procedure. The Sérsic index (n) and axis ratio (q) were both initialised to 1 and were allowed to vary only within the ranges 0.2 < n < 4.0 and 0.3 < q ≤ 1.0, respectively. A similar fitting procedure was used for both the g-band and i-band images of our sample. After the fitting, we excluded all sources with poor or failed fits, that is, those with a reduced χ2 > 3 or whose Galfit magnitude estimates diverged from their initial MAG_AUTO values by more than 1 mag. We also excluded the cases in which the estimated n and q values did not converge and lay at the edge of the ranges specified above. For the
remaining galaxies, we re-applied our g-band sample selection
criteria of µ̄eff > 24.2 mag arcsec−2 and r1/2 > 2.5′′, following
Tanoglidis et al. (2021b). The µ̄eff values were calculated using
the relation given by Eq. (7):
$$\bar{\mu}_{\mathrm{eff}} = m + 2.5\,\log_{10}\!\left(2\pi r_{1/2}^{2}\right), \tag{7}$$


where µ̄eff is the mean surface brightness within the effec-
tive radius, m is the total magnitude, and r1/2 is the half-light
radius in a specific band estimated from Galfit. For all our
measurements, we also applied a foreground Galactic extinc-
tion correction using the Schlegel et al. (1998) maps normalised
by Schlafly & Finkbeiner (2011) and a Fitzpatrick (1999) dust
extinction law.
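Equation (7) and the post-fit cuts described above can be written as a few lines of Python. The sketch below is a hypothetical helper, not the authors' pipeline:

- The dictionary keys are our own.
- The tolerance `edge_tol`, used to decide that n or q sits "at the edge" of its allowed range, is an assumption, since the test actually applied is not specified.
- The rule treats q = 1.0 as an edge value, which is also an assumption.
- Magnitudes are in mag and radii in arcsec.

```python
import math

# Fit bounds and g-band selection thresholds quoted in the text.
N_RANGE = (0.2, 4.0)
Q_RANGE = (0.3, 1.0)
MU_MIN = 24.2  # mag arcsec^-2
R_MIN = 2.5    # arcsec


def mean_surface_brightness(m, r_half):
    """Eq. (7): mean surface brightness within the half-light radius (mag arcsec^-2)."""
    return m + 2.5 * math.log10(2.0 * math.pi * r_half**2)


def passes_selection(fit, edge_tol=1e-3):
    """Apply the fit-quality cuts and the LSBG criteria to one g-band Galfit result.

    `fit` is a dict with hypothetical keys: mag, mag_auto, r_half, n, q, chi2_nu.
    """
    if fit["chi2_nu"] > 3.0:
        return False  # poor fit
    if abs(fit["mag"] - fit["mag_auto"]) > 1.0:
        return False  # Galfit magnitude diverges from MAG_AUTO by more than 1 mag
    for value, (lo, hi) in ((fit["n"], N_RANGE), (fit["q"], Q_RANGE)):
        if value - lo < edge_tol or hi - value < edge_tol:
            return False  # parameter stuck at the edge of its allowed range
    mu = mean_surface_brightness(fit["mag"], fit["r_half"])
    return mu > MU_MIN and fit["r_half"] > R_MIN
```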

3.7. Visual inspection

We considered for visual inspection only those candidates that (i) were identified independently by both the LSBG DETR and LSBG ViT ensemble models and (ii) passed the selection criteria for being an LSBG with the updated parameters from Galfit. Two authors visually inspected this refined sample independently. Candidates identified as LSBGs by both authors were treated as confident LSBGs. Candidates identified as LSBGs by only one author were reinspected jointly to reach a decision. As visual inspection is time-consuming, we used it only as the last step and tried to minimise the number of candidates shortlisted for it.

To aid visual inspection, we used two images for every candidate. The first was generated with the APLpy package (Robitaille & Bressert 2012) to enhance the low-surface-brightness features. The second was downloaded from the DESI Legacy Imaging Surveys Sky Viewer (Dey et al. 2019). We also used the g-band Sérsic models from Galfit to visually inspect the quality of the model fitting. Based on the Galfit model fit and the images, each candidate was then assigned to one of three classes: LSBG, non-LSBG (artefact), or misfitted LSBG. If the model of the galaxy was fitted correctly and the candidate showed LSBG features, it was classified as an LSBG. If the candidate showed LSBG features but the model did not fit correctly, it was classified as a misfitted LSBG. Finally, if the candidate did not show LSBG features, it was classified as an artefact or non-LSBG.

3.8. Metrics for comparing models

Here, we use the accuracy, true positive rate (TPR), false positive rate (FPR), and area under the receiver operating characteristic curve (AUROC) as the metrics to compare the performance of our transformer models. The classification accuracy of a model is defined as

$$\mathrm{Accuracy} = \frac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \mathrm{FP} + \mathrm{TN} + \mathrm{FN}}, \tag{8}$$

where TP is the number of true positives, TN is the number of true negatives, FP is the number of false positives, and FN is the number of false negatives. Our primary focus is to identify LSBGs with low contamination rather than to maximise the overall accuracy of the classifier, so the TPR and FPR are more informative metrics for evaluating its performance. The TPR is the ratio of the number of LSBGs identified by the model to the total number of LSBGs, which can be expressed as

$$\mathrm{TPR} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}. \tag{9}$$

In the literature, the TPR is also called sensitivity. It measures how well a classifier detects positive instances (in this case, LSBGs) among all the actual positive instances in a dataset. Similarly, the FPR can be regarded as a contamination rate, because it measures how often the classifier incorrectly classifies negative instances as positive. The FPR is defined as

$$\mathrm{FPR} = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}}. \tag{10}$$
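Equations (8)–(10) can be computed directly from true labels and predicted probabilities. The following NumPy sketch applies a probability threshold and returns the accuracy, TPR and FPR. Labels are 1 = LSBG and 0 = artefact. The toy inputs are invented for illustration, and treating "above the threshold" as a strict inequality is our assumption.

```python
import numpy as np


def confusion_counts(y_true, y_prob, threshold=0.5):
    """Return (TP, TN, FP, FN) for binary labels (1 = LSBG, 0 = artefact)."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_prob) > threshold  # strictly above the threshold counts as LSBG
    tp = int(np.sum(y_pred & y_true))
    tn = int(np.sum(~y_pred & ~y_true))
    fp = int(np.sum(y_pred & ~y_true))
    fn = int(np.sum(~y_pred & y_true))
    return tp, tn, fp, fn


def classification_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy (Eq. 8), TPR (Eq. 9), and FPR (Eq. 10) at a given threshold."""
    tp, tn, fp, fn = confusion_counts(y_true, y_prob, threshold)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    return accuracy, tpr, fpr


if __name__ == "__main__":
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_prob = [0.95, 0.80, 0.60, 0.40, 0.70, 0.30, 0.20, 0.10]
    print(classification_metrics(y_true, y_prob))        # threshold 0.5
    print(classification_metrics(y_true, y_prob, 0.9))   # stricter threshold
```

Raising the threshold from 0.5 to 0.9 in this toy example lowers both the TPR and the FPR. This is the trade-off the ROC curve summarises.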

All the quantities defined above are threshold dependent and vary as a function of the chosen probability threshold. Constructing the receiver operating characteristic (ROC) curve and computing the AUROC provides a threshold-independent metric for comparing the models. The ROC curve is constructed by plotting the TPR against the FPR as the threshold is varied. The AUROC measures how well a classifier distinguishes between the classes. It is a single number for a given model, unlike the accuracy, which varies with the threshold. An AUROC of 1.0 corresponds to a perfect classifier, for which some threshold yields TPR = 1.0 and FPR = 0.0. A random classifier has an AUROC ∼ 0.5, with the TPR almost equal to the FPR at all thresholds.
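The sketch below makes this construction concrete. It builds the ROC curve by sweeping the threshold over every distinct predicted probability and then integrates the curve with the trapezoidal rule. It is a self-contained NumPy illustration, not the code used to produce Fig. 5, and it requires both classes to be present. Libraries such as scikit-learn provide equivalent `roc_curve` and `roc_auc_score` functions.

```python
import numpy as np


def roc_curve(y_true, y_score):
    """Return (fpr, tpr, thresholds), one point per distinct score, starting at (0, 0)."""
    y_true = np.asarray(y_true, dtype=int)
    y_score = np.asarray(y_score, dtype=float)
    order = np.argsort(-y_score, kind="mergesort")  # highest scores first
    y_sorted, s_sorted = y_true[order], y_score[order]
    # Last index of each run of tied scores: the curve only moves at distinct thresholds.
    idx = np.r_[np.flatnonzero(np.diff(s_sorted)), y_sorted.size - 1]
    tps = np.cumsum(y_sorted)[idx]  # positives with score >= threshold
    fps = (idx + 1) - tps           # negatives with score >= threshold
    tpr = np.r_[0.0, tps] / tps[-1]
    fpr = np.r_[0.0, fps] / fps[-1]
    thresholds = np.r_[np.inf, s_sorted[idx]]
    return fpr, tpr, thresholds


def auroc(y_true, y_score):
    """Area under the ROC curve via the trapezoidal rule."""
    fpr, tpr, _ = roc_curve(y_true, y_score)
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


if __name__ == "__main__":
    print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # imperfect ranking
    print(auroc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))   # perfect separation
```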

4. Results

4.1. Model performance on the testing set

We created four models of each transformer type, LSBG DETR and LSBG ViT, with different hyperparameters to check that our results generalise across configurations of both transformers. Each model was implemented as a regression model that predicts the probability of an input being an LSBG, and we set 0.5 as the threshold probability for classifying an input as an LSBG. We then used an ensemble of the four models as the final model for LSBG DETR and for LSBG ViT. Table 2 lists the accuracy, TPR, FPR, and AUROC of all the models, including the ensemble models, on the test dataset described in Sect. 2.3.

As mentioned earlier, the TPR and FPR are more insightful metrics than the overall accuracy. They can be visualised using a confusion matrix, shown in Fig. 4 for the ensemble models at a threshold of 0.5. The LSBG DETR ensemble had a TPR of 0.96 and an FPR of 0.07. This indicates that it can be expected to identify ∼96% of all LSBGs in the DES data, while misclassifying ∼7% of the artefacts as LSBGs. Similarly, the LSBG ViT ensemble can identify ∼97% of all the LSBGs in DES, but with an FPR of 11%.

The receiver operating characteristic (ROC) curves of the LSBG DETR and LSBG ViT ensemble models are shown in Fig. 5. In terms of accuracy and AUROC, the LSBG DETR models performed slightly better than the LSBG ViT models. Figure 5 also shows that both ensemble models retain a TPR of ∼0.75 even at a threshold as high as 0.9. This indicates that both ensemble models can confidently identify ∼75% of all the LSBGs in DES and assign them a probability greater than 0.9.

4.2. Search for LSBGs in the full coverage of DES

The LSBG DETR and LSBG ViT models have different architectures and feature-extraction principles, so we regard their ensemble models as two separate, independent transformer classifiers. To search for new LSBGs in DES, we applied both transformer ensemble models to the 419 782 objects that satisfied the selection criteria defined in Sect. 2.2. Candidates scoring above the threshold probability of 0.5 were catalogued as potential LSBG candidates.


Fig. 4. Confusion matrix of LSBG DETR and LSBG ViT models plot-
ted for a threshold = 0.5. Class 0 represents the artefacts, and Class 1
represents the LSBGs.

[Fig. 5 plot: ROC curves evaluated on the testing set, TPR (0.5–1.0) against FPR (logarithmic axis, 10^−2 to 10^0). Legend: LSBG Vision ensemble AUROC = 0.9829; LSBG DETR ensemble AUROC = 0.9837; markers show each ensemble at threshold = 0.9.]

Fig. 5. Receiver operating characteristic curve of the ensemble mod-
els. The red and blue lines represent the variation of FPR and TPR as
a function of the threshold for LSBG DETR and LSBG Vision ensem-
bles, respectively. The red and blue points mark the TPR and FPR for a
threshold = 0.9.

Table 2. Architecture, accuracy, TPR, FPR, and AUROC of all the
models in chronological order of creation.

Model name             Accuracy (%)   TPR    FPR    AUROC

LSBG VISION 1          93.55          0.97   0.12   0.980
LSBG VISION 2          93.79          0.97   0.11   0.980
LSBG VISION 3          93.47          0.97   0.11   0.981
LSBG VISION 4          93.51          0.97   0.11   0.980
LSBG VISION Ensemble   93.75          0.97   0.11   0.983

LSBG DETR 1            94.36          0.97   0.09   0.982
LSBG DETR 2            94.28          0.96   0.08   0.980
LSBG DETR 3            94.36          0.96   0.08   0.982
LSBG DETR 4            94.24          0.95   0.07   0.982
LSBG DETR Ensemble     94.60          0.96   0.07   0.984

The LSBG DETR ensemble classified 27 977 objects as LSBGs,
among which 21 005 were already identified by Tanoglidis et al.
(2021b). Similarly, the LSBG ViT ensemble classified 30 508
objects as LSBGs, among which 21 396 LSBGs were also identified by Tanoglidis et al. (2021b). Hence, 6972 and 9112 new candidates were classified as potential LSBGs by the LSBG DETR and LSBG ViT ensembles, respectively. However, to reduce the number of false positives, only the 6560 candidates identified independently by both ensemble models were considered for further analysis. The selected sample could contain duplicate entries for the same object, so we ran an automated spatial cross-match to remove duplicates separated by <5′′. These duplicates can be traced back to SourceExtractor fragmenting larger galaxies into smaller parts. Removing the duplicates reduced the number of potential LSBG candidates from 6560 to 6445. As discussed in Sect. 3.6, these candidates were subjected to single-component Sérsic model fitting using Galfit.
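The duplicate removal can be sketched as a greedy self cross-match: walk through the candidate list and discard any later candidate that lies within 5′′ of one already kept. This NumPy version is a hypothetical illustration. The paper does not state which member of a duplicate pair was kept, so here the first entry in the list is retained. Its O(N²) loop is adequate for a few thousand candidates; tools such as astropy's `SkyCoord.search_around_sky` scale better.

```python
import numpy as np


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation (haversine formula); inputs in degrees, output in arcsec."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    hav = (np.sin((dec2 - dec1) / 2.0) ** 2
           + np.cos(dec1) * np.cos(dec2) * np.sin((ra2 - ra1) / 2.0) ** 2)
    return np.degrees(2.0 * np.arcsin(np.sqrt(np.clip(hav, 0.0, 1.0)))) * 3600.0


def remove_duplicates(ra, dec, radius_arcsec=5.0):
    """Return indices of candidates kept after dropping neighbours closer than radius_arcsec."""
    ra = np.asarray(ra, dtype=float)
    dec = np.asarray(dec, dtype=float)
    keep = np.ones(ra.size, dtype=bool)
    for i in range(ra.size):
        if not keep[i]:
            continue  # already flagged as a duplicate of an earlier candidate
        sep = angular_separation_arcsec(ra[i], dec[i], ra[i + 1:], dec[i + 1:])
        keep[i + 1:][sep < radius_arcsec] = False  # writes through the slice view
    return np.flatnonzero(keep)


if __name__ == "__main__":
    ra = [30.0, 30.0, 31.0]
    dec = [-10.0, -10.0 + 2.0 / 3600.0, -10.0]  # second entry is 2" from the first
    print(remove_duplicates(ra, dec))
```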

During the Galfit modelling, 999 candidates had failed
fits and were consequently removed from the sample, because
our objective is to produce a high-purity sample with accu-
rate Sérsic parameters. We visually inspected these unsuccessful
fits and found that in most cases the presence of a very bright
object near the candidate was the cause of the poor Sérsic fit.
Of the remaining 5446 candidates, 4879 passed the µ̄eff and
r1/2 selection criteria outlined in Sect. 2.2 with the updated
parameters. These 4879 candidates were inspected visually to
identify the genuine LSBGs. After independent visual inspec-
tions by the authors, 4190 candidates were classified as LSBGs
and 242 candidates were found to be non-LSBGs. During visual
inspection, 447 candidates were found to be possible LSBGs
with unreliable measurements from Galfit. These candidates
are excluded from our final sample, and here we only report the
candidates most confidently identified as LSBGs during visual inspection. After applying the Galactic extinction correction, our final sample was reduced to 4083 new LSBGs from DES
DR1. The schematic diagram showing the sequential selection
steps used to find the new LSBG sample is shown in Fig. 6.
A sample catalogue listing the properties of the newly identi-
fied LSBGs is shown in Table 3, and some examples of the new
LSBGs that we have found are plotted in Fig. 7.
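As a consistency check on the selection steps above (see also Fig. 6), the counts quoted in the text can be re-derived with simple arithmetic. The short script below uses only numbers stated in this section. The variable names are ours.

```python
# Candidate counts quoted in Sect. 4.2.
detr_total, detr_known = 27_977, 21_005
vit_total, vit_known = 30_508, 21_396
common_new = 6_560      # new candidates found by both ensembles
after_dedup = 6_445     # after removing duplicates within 5"
failed_fits = 999       # removed during Galfit modelling
passed_cuts = 4_879     # passed the mu_eff and r_1/2 criteria
lsbg, non_lsbg, misfit = 4_190, 242, 447  # visual-inspection classes

new_detr = detr_total - detr_known
new_vit = vit_total - vit_known
duplicates = common_new - after_dedup
fitted = after_dedup - failed_fits
failed_cuts = fitted - passed_cuts

print(new_detr, new_vit, duplicates, fitted, failed_cuts)
print(f"non-LSBG fraction at visual inspection: {non_lsbg / passed_cuts:.1%}")
```

The derived numbers (6972, 9112, 115 duplicates, 5446 successful fits, and 567 candidates failing the cuts) match those given in the text. The non-LSBG fraction of ∼5% is the figure quoted in the discussion.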

The distributions of the r1/2, µ̄eff , Sérsic index (n), and axis
ratio (q) of the new sample of LSBGs are plotted in Fig. 8. The
majority of the LSBGs in this new sample have r1/2 < 7′′ and µ̄eff
<26 mag arcsec−2. The Sérsic index of the new LSBG sample
predominantly lies between 0.5 and 1.5 and has a median value
of 0.85. This pattern is similar to the trend identified by Poulain
et al. (2021) for dwarf ellipticals, suggesting that a significant portion of the LSBG sample could consist of such sources. In the case of the axis ratio, the new LSBG sample has a
median axis ratio of 0.72 and has a distribution lying in the range
of 0.3–1. The median value of 0.72 suggests that most galaxies
in this sample have a slightly flattened or elongated shape. A
detailed discussion of the properties of the new LSBGs identi-
fied in this work and their comparison with LSBGs identified by
Tanoglidis et al. (2021b) is presented in Sect. 5.

5. Discussion

5.1. Transformers as LSBG detectors

In this study, we introduce transformers as classifier models for finding previously undiscovered LSBGs in DES. One of the deep-learning-based models reported in the literature for classifying LSBGs and artefacts is a CNN model named DeepShadows, created by Tanoglidis et al. (2021a). These authors
used the catalogue of LSBGs and artefacts identified from DES
reported in Tanoglidis et al. (2021b) to generate the training
data. The DeepShadows model achieved an accuracy of 92%
in classifying LSBGs from artefacts and had a TPR of 94%


Fig. 6. Schematic diagram showing the sequential selection steps used to find the new LSBG sample.

Table 3. Sample of new LSBGs identified in this work.

COADD_ID RA Dec ggf gcor µ̄g_eff_gf rg1/2 n q log10(Σstar) χ²νg igf icor µ̄i_eff_gf ri1/2 χ²νi
Units: RA and Dec in deg; µ̄g_eff_gf and µ̄i_eff_gf in mag arcsec−2; rg1/2 and ri1/2 in arcsec; Σstar in M⊙ kpc−2.

61456395 29.7062 −60.4882 19.06 18.97 25.41 9.41 2.17 0.62 5.86 1.06 18.72 18.67 25.37 10.86 1.03
61508029 29.925 4.60483 19.84 19.67 24.33 3.43 0.72 0.85 6.82 1.03 19.27 19.19 23.52 3.06 0.99
61580602 30.3125 −58.1927 18.89 18.83 24.56 5.99 1.5 0.82 6.87 1.02 18.07 18.04 23.72 5.93 1.02
61638403 29.9539 4.75811 20.02 19.87 24.28 3.84 0.74 0.54 6.52 1.04 19.61 19.53 23.62 3.45 1
61638933 29.4862 4.74468 19.2 19.06 24.24 4.59 0.93 0.79 6.82 0.98 18.7 18.63 23.38 3.89 0.99
61712539 30.2824 5.28992 19.67 19.53 24.26 4.57 1.01 0.52 6.58 1.01 19.47 19.4 23.2 3.07 1.03
61766250 30.1112 −9.24769 20.76 20.69 26.29 7.8 1.47 0.42 5.87 1.13 20.19 20.15 25.2 6.15 1.1
62011325 29.9449 −6.78979 21.6 21.51 25.91 3.17 0.5 0.84 6.27 1.02 20.8 20.76 25.21 3.31 1
62053525 29.8595 −16.8586 19.87 19.8 24.24 3.49 0.87 0.73 6.81 1.03 19.32 19.29 23.41 3.06 1.02
62071354 29.8342 −17.1604 20.53 20.46 24.47 3.01 0.84 0.66 6.71 0.99 19.91 19.87 23.65 2.75 0.99
62227622 29.5063 −13.3511 20.84 20.79 24.88 2.68 0.68 0.91 7.24 1.02 19.97 19.94 22.96 1.66 1.05
62371677 29.6675 −5.35506 19.7 19.62 24.25 3.54 0.79 0.84 6.97 1.05 19 18.96 23.32 3.2 1.03
62646182 30.4643 −24.0018 18.46 18.4 24.44 6.88 1.19 0.83 6.77 1 17.98 17.95 23.59 5.8 0.97
62830903 29.9594 −28.8656 20.11 20.07 24.35 2.94 0.71 0.91 7.56 1.03 19.01 18.99 22.44 2.03 1.04
62840965 30.1905 −6.18018 20.18 20.1 24.35 2.74 0.76 0.98 7.35 1.01 19.22 19.18 22.85 2.14 1
63037584 28.9741 −59.5261 19.46 19.39 24.4 6.33 0.9 0.38 6.31 1.01 18.88 18.85 23.99 6.81 1.01
63097874 29.3755 −61.1452 20.6 20.51 24.87 3.08 1.02 0.85 6.58 1.02 20.29 20.25 23.86 2.23 1.02
63113174 29.1535 −61.4326 18.81 18.72 24.78 6.73 0.8 0.86 6.47 1 18.52 18.47 24.15 5.76 0.99
63262376 30.2929 −24.9787 21.48 21.43 26.1 3.92 0.98 0.73 6.18 1.03 20.62 20.6 25.35 4.12 1.04
63527438 30.1134 −4.56234 18.8 18.73 24.51 6.72 0.42 0.68 6.2 1.02 18.68 18.65 24.37 6.63 0.99
63716543 30.0657 −39.5446 20.58 20.52 25.1 3.73 1.16 0.74 6.67 1.02 19.84 19.81 24.01 3.17 1.01
63922768 29.5244 −32.7958 18.41 18.35 24.6 7.62 1.61 0.83 6.59 1.08 18.06 18.03 23.87 6.37 1.04
64480503 29.4619 −23.0107 18.98 18.94 24.33 5.05 0.91 0.86 6.81 1.04 18.5 18.48 23.54 4.39 1
64560481 30.0521 −5.09785 18.18 18.1 24.36 7.21 1.17 0.91 6.86 1.06 17.66 17.62 23.49 6.14 1.01
64697654 29.5818 −8.50243 18.36 18.28 24.25 6.46 1 0.87 6.81 0.97 17.87 17.83 23.53 5.78 0.96
64773733 29.0012 −50.2462 19.39 19.32 24.7 5.13 1.4 0.8 6.66 1.03 18.94 18.91 23.78 4.13 1.01
64868340 29.3394 −23.8729 19.43 19.38 24.2 3.72 0.86 0.93 7.63 1.02 18.36 18.34 22.23 2.47 1.26

Notes. ‘COADD_ID’ is the unique ID of the source, and ‘RA’ and ‘Dec’ give the sky coordinates of the source as estimated from the DES Y3 Gold catalogue (Sevilla-Noarbe et al. 2021). Columns ‘ggf’, ‘gcor’, ‘µ̄g_eff_gf’, and ‘rg1/2’ represent the g-band magnitude, the g-band magnitude after correcting for Galactic extinction, the mean surface brightness, and the half-light radius from the g-band fitting using Galfit, respectively. Columns ‘n’, ‘q’, and ‘log10(Σstar)’ represent the Sérsic index, axis ratio, and stellar mass density, respectively. Column ‘χ²νg’ represents the reduced chi-square value for the g-band fitting using Galfit. Similarly, columns ‘igf’, ‘icor’, ‘µ̄i_eff_gf’, ‘ri1/2’, and ‘χ²νi’ represent the i-band magnitude, the i-band magnitude after correcting for Galactic extinction, the mean surface brightness, the half-light radius, and the reduced chi-square value for the i-band fitting using Galfit, respectively.


(a) Coadd Object Id - 295747204 (b) Coadd Object Id - 61515112 (c) Coadd Object Id - 62646182

(d) Coadd Object Id - 64560481 (e) Coadd Object Id - 67813078 (f) Coadd Object Id - 69253856

(g) Coadd Object Id - 70739980 (h) Coadd Object Id - 73726929 (i) Coadd Object Id - 76917094

Fig. 7. Cutouts of nine confirmed new LSBGs after visual inspection. The unique identification number (coadd object ID) for each galaxy in DES DR1
is given below each image. The images were generated by combining the g, r, and z bands using the APLpy package (Robitaille & Bressert 2012),
and each image corresponds to a 67.32′′ × 67.32′′ region of the sky with the LSBG at its centre.

with a threshold of 0.5. The DeepShadows model also achieved an AUROC of 0.974 on this dataset. However, DeepShadows was not applied to the complete DES data, so its performance in a full search was not evaluated. Nevertheless, DeepShadows was the first deep-learning model used to classify LSBGs and artefacts, and Tanoglidis et al. (2021a) showed that it was a better classifier than support vector machine or random forest models. In our work, each of our transformer models individually surpassed the DeepShadows model in every metric reported for DeepShadows, as


[Fig. 8 panels (y axes: normalised galaxy count; dashed lines mark the medians): r1/2 (arcsec), median = 3.952 arcsec; µ̄eff (mag arcsec−2), median = 24.729 mag arcsec−2; axis ratio (q), median = 0.723; Sérsic index (n), median = 0.846.]

Fig. 8. Normalised distribution of half-light radius (top left panel), mean surface brightness (top right panel), Sérsic index (bottom left panel), and
axis ratio (bottom right panel) of the new sample of LSBGs. The dashed line shows the median of the distribution.

can be seen from Table 2. Within their respective classes, LSBG DETR 1 (tied with LSBG DETR 3) and LSBG ViT 2 had the highest accuracies (94.36% and 93.79%, respectively).

Earlier searches for LSBGs used semi-automated methods, such as the imfit-based pipelines of Greco et al. (2018), or simple machine-learning models, such as the SVMs of Tanoglidis et al. (2021b). However, the purity of these methods was low: 50% of the final candidate samples they produced were false positives, which had to be removed by visual inspection. Here we explored the potential of transformer architectures for separating LSBGs from artefacts. We filtered the LSBG candidates with two independent ensemble models, LSBG DETR and LSBG ViT, followed by single-component Sérsic model fitting. Only ∼5% of the candidates that reached visual inspection were non-LSBGs, a significant improvement on the results of previous methods in the literature. Following the definition of an LSBG in Tanoglidis et al. (2021b), we identified 4083 new LSBGs from DES DR1, increasing the number of identified LSBGs in DES by 17%. Our results highlight the significant advantage of using deep-learning techniques to search for LSBGs in upcoming large-scale surveys.

To gain further insight into the fraction of false positives from our method, we examined the performance of these models on the test dataset. There, the LSBG DETR and LSBG ViT ensembles misclassified around 7% and 11% of the artefacts, respectively. However, by combining these models, we reduced the fraction of non-LSBGs among the visually inspected candidates to less than 5%. Most of the non-LSBGs we encountered during visual inspection were faint compact objects blended with the diffuse light from nearby bright objects.

We use the term ‘non-LSBG’ instead of ‘artefact’ here because, during visual inspection, we also classified some potential LSBGs as non-LSBGs. These are objects whose g-band images contained instrumental artefacts or lacked sufficient signal. Because the machine learning model takes three bands (g, r, and z) as input, this suggests that it learned the appearance of LSBGs in each band and could predict whether an object was an LSBG from the signal in the other bands. However, in this work we define LSBGs by their g-band surface brightness and radius, so we classified galaxies without reliable g-band data as non-LSBGs. Some non-LSBGs we encountered during visual inspection are shown in Figs. 9 and 10. With deeper imaging from upcoming surveys, these galaxies might be classified as LSBGs, which would further reduce the fraction of non-LSBGs in our candidate sample.

When discussing the non-LSBGs from the candidate sam-
ple, we must also mention that some of the candidates identified
as LSBGs by the ensemble models (567 out of 5446) did not
meet the selection criteria for being an LSBG after being fitted
with Galfit. These galaxies had r1/2 ranging from 2′′ to 20′′,
with a median of 3.85′′, which is similar to the new LSBG sam-
ple we found. However, the majority of these galaxies have a
mean surface brightness of between 24.0 and 24.2 mag arcsec−2,


(a) (b) (c) (d)

Fig. 9. Examples of candidates (Coadd object id - 149796289 and 374192591) classified as non-LSBGs during visual inspection because of glitches
in the g band near the galaxy. Panels a and c show the RGB images created using the g, r, and z bands with the APLpy package (Robitaille &
Bressert 2012). Panels b and d show the image in the g band. Each image corresponds to a 67.32′′ × 67.32′′ region of the sky with the candidate at
its centre.

(a) Coadd object id - 251235955 (left) and 99585243 (right)

(b) Coadd object id - 125313682 (left) and 113818243 (right)

Fig. 10. Examples of candidates classified as non-LSBGs during visual inspection. Top panel (a): candidates rejected because of insufficient signal in the g band. Bottom panel (b): candidates rejected because they are artefacts. The RGB images are created using the g, r, and z bands with the APLpy package (Robitaille & Bressert 2012). Each image corresponds to a 67.32′′ × 67.32′′ region of the sky with the candidate at its centre.

with a median of 24.16 mag arcsec−2. This suggests that the machine learning model learned the angular-size criterion for LSBGs during its training but not the strict surface brightness threshold. The situation is similar to a human expert analysing a galaxy image to decide whether or not it is an LSBG. Features such as the size of the galaxy are easy to judge by eye, but determining the surface brightness accurately by eye is challenging, and errors are likely near the threshold, as with our machine learning model. One could therefore say that the machine learning model behaves approximately like a human visual expert.

Judging from the performance of our models on the test data, we cannot claim to have discovered all the existing LSBGs in DES DR1. As Fig. 4 shows, the TPRs of the two ensemble models were 0.96 and 0.97, respectively. The models therefore miss some LSBGs, and a minor fraction of LSBGs is yet to be found in DES DR1. Moreover, to reduce the FPR and the burden of visual inspection, we only visually inspected the candidates identified by both ensemble models that were also successfully fitted by Galfit and passed the selection criteria.

We also note that in this work we used two different ensemble models, each being an ensemble of four models. As mentioned earlier, each ML model can be considered equivalent to a human inspector, and combining them in an ensemble lets each model compensate for the weaknesses of the others. A closer look at the individual probability distributions of these models shows that 310 of the 4083 confirmed LSBGs were assigned a probability of less than 0.5 by at least one of the individual models. However, because we used an averaging ensemble, we were still able to identify these LSBGs, which demonstrates the advantage of an ensemble model over a single model.

Here, we use visual inspection as the final step to confirm
the authenticity of an LSBG detected by the models. How-
ever, it is essential to acknowledge the potential for human bias
during the visual inspection, which can impact the accuracy
and reliability of the results. For example, during the visual
inspection, there was disagreement over the labelling of approx-
imately 10% of the candidate sample. Most of these galaxies had
a mean surface brightness of greater than 25.0 mag arcsec−2,
which suggests that even for human experts, it is challenging to
characterise extremely faint LSBGs. With better imaging, this
might change, but we must acknowledge that there will always
be some human bias and error associated with human inspection. Moreover, upcoming surveys such as LSST and Euclid will produce too much data for visual inspection to be viable. In this scenario, relying solely on improved automated methods to purify the sample and accepting a small fraction of false positives could be a feasible solution.


[Fig. 11 panels: left, normalised galaxy count against g − i (−0.2 to 1.4), with blue and red LSBG regions marked, for LSBGs identified in this work and in Tanoglidis et al. (2021b); right, g − i against µ̄eff (24.5 to 28.0 mag arcsec−2) for the same two samples.]

Fig. 11. Left panel: normalised g − i colour distribution of the new LSBGs identified in this work and of the LSBGs identified by Tanoglidis et al. (2021b). Right panel: colour as a function of the mean g-band surface brightness for the same two samples. The dashed line shows the separation between red and blue LSBGs.

6. The new sample of DES DR1 LSBGs

6.1. The newly identified LSBG sample

The optical colour of a galaxy can provide insights into its stellar population. Conventionally, galaxies are divided by colour into red and blue galaxies, and colour has been shown to be strongly correlated with galaxy morphology (Strateva et al. 2001). Blue galaxies are usually actively star-forming spiral or irregular systems, whereas red galaxies are mostly spheroidal or elliptical. In addition, red galaxies have been found to be more strongly clustered than blue galaxies (Bamford et al. 2009).

Tanoglidis et al. (2021b) found a clear bimodality in the g − i
colour distribution of their LSBGs, similar to that of the LSBGs
found by Greco et al. (2018). In the left panel of Fig. 11, we
present the g − i colour distribution of the 4083 new LSBGs and
of the 23 790 LSBGs found by Tanoglidis et al. (2021b). Following
Tanoglidis et al. (2021b), we define red galaxies as those with
g − i > 0.6 and blue galaxies as those with g − i < 0.6, where g
and i are the magnitudes in the respective bands. In the right
panel of Fig. 11, we present the colour as a function of the
g-band mean surface brightness for the new sample identified in
this work and for the LSBGs identified by Tanoglidis et al.
(2021b). The new LSBG sample contains 1112 red and 2944 blue
LSBGs⁴. Figure 11 shows that, compared with Tanoglidis et al.
(2021b), we have identified a relatively large fraction of blue
LSBGs, as well as a considerable fraction of new red LSBGs with
g − i ≥ 0.80 and a mean surface brightness of less than
25.0 mag arcsec−2. The bias against blue LSBGs and very red
LSBGs in the sample of Tanoglidis et al. (2021b) may have been
caused by a bias in the training set of the SVM used to preselect
the LSBG candidates. This could have occurred because a large
fraction of their training set consisted of LSBGs near the Fornax
cluster, which are mainly red.
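The colour split above amounts to a single threshold on g − i. The sketch below is a minimal illustration, assuming the magnitudes are available as plain Python lists; the function names are hypothetical, and putting objects with g − i exactly equal to 0.6 in the blue class is our assumption, since the text does not specify that boundary case. It also reproduces the blue fraction implied by the quoted counts.

```python
def split_by_colour(g_mags, i_mags, cut=0.6):
    """Label each galaxy 'red' if g - i > cut, otherwise 'blue'.

    The 0.6 mag default is the LSBG colour cut quoted in the text; putting
    objects with g - i exactly equal to the cut in the blue class is an
    assumption, since the text leaves that boundary case open.
    """
    return ["red" if (g - i) > cut else "blue" for g, i in zip(g_mags, i_mags)]


def colour_fractions(labels):
    """Return the fractions of 'red' and 'blue' labels in a list."""
    n_red = labels.count("red")
    return {"red": n_red / len(labels), "blue": (len(labels) - n_red) / len(labels)}


# Counts quoted in the text for the new sample (27 objects without an
# i-band fit are excluded): 1112 red and 2944 blue LSBGs.
n_red, n_blue = 1112, 2944
blue_fraction = n_blue / (n_red + n_blue)
print(f"blue fraction = {blue_fraction:.3f}")  # blue fraction = 0.726
```

The resulting blue fraction of about 72.6% is consistent with the 72% quoted for the new sample in the conclusions.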

4 The Galfit i-band modelling failed for 27 LSBGs; these are not
included in this colour analysis.

Fig. 12. Normalised distribution of axis ratio (left panel) of red and
blue LSBGs from the new sample. The vertical lines show the median
for each class.

In the new sample, the red and blue LSBGs span a similar range of
mean surface brightness, with median µ̄eff of 24.75 and
24.68 mag arcsec−2, respectively. Both populations have sizes
ranging from 2.5′′ to 20′′. However, as mentioned above, most of
these LSBGs have radii of less than 7′′, with a median of 4.01′′
for blue LSBGs and 3.59′′ for red LSBGs; that is, blue LSBGs tend
to have larger angular radii than red LSBGs. The Sérsic index
distributions of the red and blue LSBGs in the new sample are
similar, with almost equal medians (0.847 and 0.845 for red and
blue LSBGs, respectively). A median Sérsic index of around 0.84,
close to the value n = 1 of an exponential disc, indicates that the
majority of the galaxies have disc-like light profiles, irrespective
of their colour. In contrast, the axis-ratio distribution of the red
LSBGs in the new sample clearly differs from that of the blue
LSBGs, as shown in Fig. 12: the medians are 0.7 for blue and 0.8
for red LSBGs, indicating that red LSBGs are, in general, rounder
than blue LSBGs.
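To illustrate why a median n ≈ 0.84 is read as disc-like, the sketch below evaluates the standard Sérsic profile I(r) = I_e exp{−b_n[(r/r_e)^(1/n) − 1]} for n = 0.84 and for the exponential case n = 1. It assumes the widely used asymptotic series for b_n (accurate for n above about 0.36); this is our own illustration and not a description of how Galfit evaluates the profile internally.

```python
import math


def sersic_bn(n):
    """Asymptotic series approximation to the Sersic constant b_n.

    b_n is defined so that r_e encloses half of the total light; this
    expansion is accurate for n greater than about 0.36.
    """
    return 2.0 * n - 1.0 / 3.0 + 4.0 / (405.0 * n) + 46.0 / (25515.0 * n ** 2)


def sersic_profile(r, r_e, n, i_e=1.0):
    """Sersic intensity at radius r, normalised to i_e at r = r_e."""
    b = sersic_bn(n)
    return i_e * math.exp(-b * ((r / r_e) ** (1.0 / n) - 1.0))


# Compare n = 0.84 (median of the new sample) with an exponential disc (n = 1)
# at a few radii expressed in units of r_e.
for x in (0.0, 0.5, 1.0, 2.0, 3.0):
    print(f"r/r_e = {x:3.1f}  n=0.84: {sersic_profile(x, 1.0, 0.84):7.3f}"
          f"  n=1: {sersic_profile(x, 1.0, 1.0):7.3f}")
```

Relative to the exponential disc, the n = 0.84 profile has a flatter core and a slightly faster decline at large radii, but the two remain close, which is the sense in which such galaxies are disc-like.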

6.2. Why are there additional LSBGs?

Another question worth investigating is the extent to which the
new LSBG sample differs from the LSBGs identified by Tanoglidis
et al. (2021b). More specifically, one might ask why so many
LSBGs were previously missed, and whether this is related to the
nature of the galaxies themselves. Apart from the Sérsic index, all
Sérsic parameters of the new and old samples have similar
distributions. The Sérsic index distributions of the new sample
and of the sample identified by Tanoglidis et al. (2021b) are shown
in Fig. 13. The new LSBGs predominantly have n < 1, lower than
the Sérsic indices of the LSBGs identified by Tanoglidis et al.
(2021b). However, this does not explain why these LSBGs were
missed in the previous search; moreover, Tanoglidis et al. (2021b)
themselves noted an under-representation of red LSBGs with small
Sérsic indices in their sample.

Fig. 13. Normalised distribution of the Sérsic index of the LSBGs
identified in this work and by Tanoglidis et al. (2021b). The vertical
lines show the median for each class.

To answer this question, we closely inspected the methodology
of Tanoglidis et al. (2021b) and found that most of the new LSBGs
identified here (82%) were missed by the SVM in their first
preselection step. This demonstrates the importance of the
preselection methodology. As the methodologies of Tanoglidis
et al. (2021b) and Greco et al. (2018) are considerably similar
(e.g. both use an SVM), Greco et al. (2018) might also have
missed some LSBGs in the HSC-SSP survey, and the missed
fraction is expected to be larger than for Tanoglidis et al. (2021b).
There is a slight overlap between the region observed by Greco
et al. (2018) and the DES footprint, as shown in Fig. 14. In this
overlap, 198 LSBGs identified by Greco et al. (2018) in HSC-SSP
are also detected in the DES Y3 Gold catalogue. Of these 198
LSBGs, Tanoglidis et al. (2021b) recovered 183, and we recovered
10 more, bringing the total number of recovered LSBGs to 193.
Our total sample also contains additional LSBGs (∼200) in the
same region that were missed by Greco et al. (2018), even though
the HSC-SSP data used by Greco et al. (2018) are about two orders
of magnitude deeper than DES DR1. Conversely, we missed some
LSBGs (∼150) identified by Greco et al. (2018); these were not
detected in the DES Y3 Gold catalogue and were therefore missed
by both the search of Tanoglidis et al. (2021b) and ours. Given that
DES data release 2 (DES DR2) is deeper (by ∼0.5 mag; Abbott
et al. 2021), we expect the number of LSBGs found in DES to
increase. There is therefore potential for using transfer learning
with transformers in future searches for LSBGs in DES DR2
(Abbott et al. 2021) and HSC-SSP Data Release 3 (Aihara et al.
2022).

With the addition of the 4083 new LSBGs, the number of LSBGs
in DES increases to 27 873, raising the average number density of
LSBGs in DES to ∼5.5 deg−2. In addition, around 3000 candidates
identified by the ensemble models were not analysed further to
verify their LSBG nature, so the number of LSBGs in DES may
increase further in the future. The average number density of
5.5 deg−2 reported here should therefore be taken as a lower
limit. Greco et al. (2018) estimated an average number density of
LSBGs in HSC-SSP of ∼3.9 deg−2. However, this estimate was
based on LSBG samples with µ̄eff > 24.3 mag arcsec−2, rather
than the µ̄eff > 24.2 mag arcsec−2 selection adopted in this work.
Applying the same µ̄eff > 24.3 mag arcsec−2 selection to the
combined sample presented here (LSBGs identified in this work
plus those identified by Tanoglidis et al. 2021b), we obtain a
number density of 4.9 deg−2, higher than the previous estimates
(3.9 deg−2 from Greco et al. 2018 and 4.5 deg−2 from Tanoglidis
et al. 2021b).
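The number-count arithmetic above can be checked with a short sketch. The survey area is not stated in this section; the 5000 deg² used below is an assumed round value for the DES DR1 footprint, chosen only for illustration, so the printed densities are approximate.

```python
def number_density(n_objects, area_deg2):
    """Average surface number density in objects per square degree."""
    return n_objects / area_deg2


N_PREVIOUS = 23_790   # LSBGs from Tanoglidis et al. (2021b)
N_NEW = 4_083         # new LSBGs from this work
AREA_DEG2 = 5_000.0   # ASSUMPTION: approximate footprint area, not from the text

n_total = N_PREVIOUS + N_NEW
fractional_increase = N_NEW / N_PREVIOUS

print(n_total)                                         # 27873
print(f"{fractional_increase:.1%}")                    # 17.2%
print(f"{number_density(N_PREVIOUS, AREA_DEG2):.2f}")  # 4.76
print(f"{number_density(n_total, AREA_DEG2):.2f}")     # 5.57
```

With this assumed area, the densities come out close to the values quoted in the conclusions for the µ̄eff > 24.2 mag arcsec−2 selection; any small differences reflect the assumed area.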

As discussed above, the number density of LSBGs is influenced
by the methodology used to search for them. Another factor that
can influence the number density is the completeness of the
survey: improved imaging can reveal fainter objects and thereby
increase the number density. The completeness of a survey can be
assessed from the galaxy number counts, and comparing these
counts with the Euclidean number counts also gives a rough idea
of the redshift distribution of the objects of interest. Figure 15
shows the number counts of LSBGs identified in DES (this work
and Tanoglidis et al. 2021b) and HSC (Greco et al. 2018). As
expected, HSC has a higher completeness than DES. Nevertheless,
HSC has a lower LSBG number density than DES, as is evident
from comparing the peaks of the two number counts. For both
HSC and DES, the slope of the number counts is close to 0.6, the
value expected for a Euclidean geometry, which suggests that most
of the identified LSBGs are local (Yasuda et al. 2001). Furthermore,
Greene et al. (2022) analysed the LSBG sample from HSC and
estimated that the 781 LSBGs identified by Greco et al. (2018) have
redshifts below 0.15.
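For orientation, a uniform population in static Euclidean space has N(<m) ∝ 10^(0.6m), so log10 of the counts rises by 0.6 per magnitude. The sketch below is our own synthetic illustration, not the authors' analysis: it draws magnitudes from such a population by inverse-transform sampling and recovers the slope with an unweighted least-squares fit to log10 of the binned counts. The magnitude limits, bin width and function names are arbitrary choices.

```python
import math
import random


def euclidean_magnitudes(n, m_faint, rng):
    """Draw n magnitudes with N(<m) proportional to 10**(0.6 m), m <= m_faint.

    Inverse-transform sampling: if u is uniform on (0, 1], then
    m = m_faint + log10(u) / 0.6 follows the Euclidean cumulative law.
    """
    return [m_faint + math.log10(1.0 - rng.random()) / 0.6 for _ in range(n)]


def count_slope(mags, m_min, m_max, width):
    """Least-squares slope of log10(counts per bin) versus bin centre."""
    n_bins = int(round((m_max - m_min) / width))
    counts = [0] * n_bins
    for m in mags:
        k = int((m - m_min) // width)
        if 0 <= k < n_bins:
            counts[k] += 1
    xs = [m_min + (k + 0.5) * width for k in range(n_bins)]
    ys = [math.log10(c) for c in counts]  # assumes every bin is populated
    x_bar, y_bar = sum(xs) / n_bins, sum(ys) / n_bins
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(42)
mags = euclidean_magnitudes(200_000, m_faint=22.0, rng=rng)
slope = count_slope(mags, m_min=18.0, m_max=22.0, width=0.5)
print(f"fitted slope: {slope:.3f}")  # close to 0.6
```

Real counts fall below this line at faint magnitudes once the survey becomes incomplete, which is why the turnover of the counts serves as a completeness indicator.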

Fig. 14. Sky distribution of the DES LSBGs identified by Tanoglidis
et al. (2021b) and in this work (black dots), and of the HSC-SSP
LSBGs identified by Greco et al. (2018) (blue dots).

Fig. 15. Number count of galaxies as a function of i-band magnitude,
with the y-axis showing the logarithm of the number density. The red
line with blue error bars represents the HSC data, and the black
dashed line with green error bars represents the DES data.

Fig. 16. Normalised distribution of stellar mass surface density of
LSBGs identified in HSC (red line) and DES (black line).

With the growing number of LSBGs identified in different
surveys, a further open question is the precise definition of an
LSBG. A different definition could yield a completely different
sample of LSBGs from the same dataset, which in turn could affect
the conclusions of a study. The current discrepancies in defining
LSBGs largely stem from the reliance on surface brightness-based
definitions, which are inherently dependent on the observational
band: different bands call for different threshold values, so the
LSBG definition varies with the band used. One potential solution
is to define LSBGs by their stellar mass surface density instead.
Current definitions of this kind classify a galaxy as an LSBG if its
stellar mass surface density is Σstar ≲ 10^7 M⊙ kpc−2 (e.g.
Carleton et al. 2023). Following Eq. (1) of Chamba et al. (2022),
we estimated the stellar mass surface density from our observed
i-band surface brightness µ̄eff and the stellar mass-to-light ratio
obtained from the g − i colour (Du et al. 2020). The stellar mass
surface density distributions of the LSBGs from DES and HSC-SSP
are shown in Fig. 16. Most of the LSBGs satisfy this condition, and
only a small percentage lie above the threshold of 10^7 M⊙ kpc−2.
On average, the LSBGs from DES have a higher stellar mass surface
density than those from HSC-SSP, which could be attributed to the
greater depth of the data used by Greco et al. (2018). However, as
argued by Chamba et al. (2022), an accurate estimate of the stellar
mass density requires deep photometry in multiple bands. Since we
used a single colour, the constraints we derived on the stellar mass
density may be of limited accuracy.
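One common way to convert a surface brightness and a colour-based mass-to-light ratio into a stellar mass surface density is sketched below. It is not necessarily identical to Eq. (1) of Chamba et al. (2022), which is not reproduced here. The solar i-band absolute magnitude and the linear log10(M/L) versus g − i coefficients are placeholder assumptions, not the Du et al. (2020) calibration, and should be replaced with the calibration actually adopted.

```python
ARCSEC2_PER_PC2_CONST = 21.572  # 2.5 log10 of the arcsec^2 subtended by 1 pc^2 seen from 10 pc


def log_sigma_star(mu_i, g_minus_i, m_sun_i=4.53, a=-0.152, b=0.518):
    """log10 stellar mass surface density in M_sun kpc^-2 (illustrative only).

    mu_i      : i-band surface brightness in mag arcsec^-2
    g_minus_i : g - i colour used for the mass-to-light ratio
    m_sun_i   : ASSUMED absolute magnitude of the Sun in the i band
    a, b      : ASSUMED placeholder coefficients, log10(M/L_i) = a + b (g - i)
    """
    log_ml = a + b * g_minus_i
    # Surface luminosity density in L_sun pc^-2.
    log_lum_pc2 = -0.4 * (mu_i - m_sun_i - ARCSEC2_PER_PC2_CONST)
    return log_ml + log_lum_pc2 + 6.0  # 1 kpc^2 = 1e6 pc^2


threshold = 7.0  # LSBG criterion: Sigma_star <~ 1e7 M_sun kpc^-2
value = log_sigma_star(mu_i=24.5, g_minus_i=0.6)
print(f"log10 Sigma_star = {value:.3f}, below threshold: {value < threshold}")
```

A 1 mag arcsec−2 fainter surface brightness lowers log10 Σstar by exactly 0.4 dex at fixed colour. The colour term is what makes a single-colour estimate sensitive to the adopted mass-to-light calibration.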

7. Clustering of LSBGs in DES

The on-sky distributions of the red and blue LSBGs
identified in this work, along with those identified by
Tanoglidis et al. (2021b), are shown in Figs. 17 and 18. In the
local Universe, ‘normal’ high-surface-brightness red galaxies
tend to cluster together, while blue galaxies are much more dis-
persed in the field (Zehavi et al. 2005). This trend is also clearly
visible for the LSBG sample: red LSBGs tend to form concentrated
nodes (Fig. 17), whereas blue LSBGs are distributed much more
homogeneously across the sky (Fig. 18).

Fig. 17. Sky distribution of the red LSBGs identified in this work (red dots) and of the red LSBGs identified by Tanoglidis et al. (2021b) (black dots).

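The two-point auto-correlation function is a statistical tool
commonly used to quantify galaxy clustering (Peebles 1980).
Here we use the angular two-point auto-correlation function,
ω(θ), computed with the Landy & Szalay (1993) estimator,
which is defined as

$$\omega(\theta) = \frac{\widehat{DD}(\theta) - 2\,\widehat{DR}(\theta) + \widehat{RR}(\theta)}{\widehat{RR}(\theta)}, \tag{11}$$

where

$$\widehat{DD}(\theta) = \frac{DD(\theta)}{n_d(n_d - 1)/2}, \tag{12}$$

$$\widehat{DR}(\theta) = \frac{DR(\theta)}{n_d\, n_r}, \tag{13}$$

$$\widehat{RR}(\theta) = \frac{RR(\theta)}{n_r(n_r - 1)/2}, \tag{14}$$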

where DD(θ) is the number of pairs in the real sample with angu-
lar separation θ, RR(θ) is the number of pairs within a random
sample, DR(θ) is the number of cross pairs between the real and
random samples, nd is the total number of real data points, and
nr is the total number of random points.
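To make Eqs. (11) to (14) concrete, here is a brute-force sketch in pure Python. It works on a flat unit square (a small-patch approximation rather than the full sphere) and counts pairs directly, which scales as O(N²) and is only suitable for toy catalogues; the analysis in the text uses treecorr instead. The function names and the toy catalogues are our own.

```python
import math
import random


def pair_counts(a, b, edges, auto):
    """Count pairs with separation in [edges[k], edges[k+1]).

    If auto is True, a and b are the same catalogue and each unordered
    pair is counted once (self-pairs excluded).
    """
    counts = [0] * (len(edges) - 1)
    for i, (xa, ya) in enumerate(a):
        start = i + 1 if auto else 0
        for xb, yb in b[start:]:
            d = math.hypot(xa - xb, ya - yb)
            for k in range(len(counts)):
                if edges[k] <= d < edges[k + 1]:
                    counts[k] += 1
                    break
    return counts


def landy_szalay(data, randoms, edges):
    """Landy & Szalay (1993) estimator with the normalisations of Eqs. (12)-(14)."""
    nd, nr = len(data), len(randoms)
    dd = pair_counts(data, data, edges, auto=True)
    rr = pair_counts(randoms, randoms, edges, auto=True)
    dr = pair_counts(data, randoms, edges, auto=False)
    w = []
    for k in range(len(edges) - 1):
        dd_hat = dd[k] / (nd * (nd - 1) / 2)
        dr_hat = dr[k] / (nd * nr)
        rr_hat = rr[k] / (nr * (nr - 1) / 2)
        w.append((dd_hat - 2 * dr_hat + rr_hat) / rr_hat)
    return w


rng = random.Random(1)
randoms = [(rng.random(), rng.random()) for _ in range(800)]
# Toy "clustered" data: all points confined to one quarter of the area,
# so small-scale pairs are about four times more frequent than for randoms.
clustered = [(0.5 * rng.random(), 0.5 * rng.random()) for _ in range(400)]
w_clustered = landy_szalay(clustered, randoms, [0.01, 0.05, 0.1])
print(w_clustered)
```

For the toy clustered catalogue, ω is strongly positive on small scales, whereas data drawn from the same distribution as the randoms give ω close to zero, as expected for an unclustered sample.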

We use a random sample of 4 491 746 points generated
from the DES footprint mask. To compute ω(θ), we employ
treecorr (Jarvis 2015). Errors are estimated using jackknife
resampling (Efron & Gong 1983), with the sky divided into
100 equal-sized regions. For samples of high-surface-brightness
galaxies (HSBGs), the angular correlation function can very
often be well fitted by a single power law (Peebles & Hauser
1974; Peebles 1980; Hewett 1982; Koo & Szalay 1984;
Neuschaefer et al. 1991):

$$\omega(\theta) = A\,\theta^{1-\gamma}, \tag{15}$$

where A is the amplitude, which represents the strength of the
clustering, and γ sets how quickly the clustering strength
decreases towards larger angular scales. This power-law
behaviour is usually observed over a wide range of angular
scales; however, it is not universal, especially on the smallest
scales. Full modelling of the shape of the correlation function
requires accounting for the different processes governing galaxy
clustering on small scales (galaxies located in the same dark
matter halo) and on larger scales (clustering of different haloes).
This modelling is usually done with halo occupation distribution
models (HODs; Ma & Fry 2000; Peacock & Smith 2000; Zheng
et al. 2005; Kobayashi et al. 2022). In this work, however, we
perform only a preliminary analysis and base our interpretation
of the data on power-law fits alone.
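The jackknife errors mentioned above follow the standard delete-one recipe: the statistic is recomputed N times, each time leaving out one of the N sky regions (N = 100 in the text), and the variance is (N − 1)/N times the sum of squared deviations of those estimates from their mean. The sketch below implements only this final step, assuming the leave-one-out estimates (for example, ω in one angular bin) have already been computed; the function name is our own.

```python
def jackknife_error(leave_one_out):
    """Delete-one jackknife standard error.

    leave_one_out[k] is the statistic computed with region k removed.
    sigma^2 = (N - 1) / N * sum_k (x_k - mean)^2
    """
    n = len(leave_one_out)
    mean = sum(leave_one_out) / n
    var = (n - 1) / n * sum((x - mean) ** 2 for x in leave_one_out)
    return var ** 0.5


# Sanity check: for the sample mean, the jackknife error reduces to the
# familiar standard error s / sqrt(n).
values = [1.0, 2.0, 4.0, 7.0, 11.0]
n = len(values)
loo_means = [(sum(values) - v) / (n - 1) for v in values]
print(jackknife_error(loo_means))
```

The (N − 1)/N prefactor compensates for the strong correlation between leave-one-out estimates, each of which shares all but one region with the others.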

Fig. 18. Sky distribution of the blue LSBGs identified from the new sample (blue dots) and of the blue LSBGs identified by Tanoglidis
et al. (2021b) (black dots).

To compare the clustering of the LSBGs with that of HSBGs,
we constructed a control sample of HSBGs from the DES data.
To this end, we selected galaxies in the surface brightness range
20.0 < µ̄eff < 23.0 mag arcsec−2 and in the magnitude range
17 < g < 23 mag (the same magnitude range as our LSBG
sample). Additionally, we applied a photometric redshift cut of
z < 0.1 to keep the HSBG sample consistent with the LSBGs,
which are also expected to be mostly local (Greene et al. 2022).
For this purpose, we used the photometric redshifts from the DES
Y3 Gold catalogue, calculated with the Directional Neighbourhood
Fitting (DNF) algorithm (Sevilla-Noarbe et al. 2021; De Vicente
et al. 2016). We also applied the selection cuts on SourceExtractor
parameters, such as SPREAD_MODEL and EXTENDED_CLASS_COADD,
and on colours (using the MAG_AUTO magnitudes), as described in
Sect. 2.2.
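The HSBG control-sample cuts can be expressed as a simple filter. This sketch assumes each galaxy is a dictionary with hypothetical keys `mu_eff_g`, `g`, `z_phot` (the DNF photometric redshift) and `g_minus_i`; these are not DES catalogue column names, and the star/galaxy and colour cuts of Sect. 2.2 are not reproduced. The g − i = 1.0 mag split used for the HSBGs in this section is included for convenience.

```python
def select_hsbg(galaxies):
    """Apply the surface-brightness, magnitude and redshift cuts in the text.

    Keys used here are hypothetical column names, not DES catalogue names.
    """
    return [
        gal for gal in galaxies
        if 20.0 < gal["mu_eff_g"] < 23.0   # mag arcsec^-2
        and 17.0 < gal["g"] < 23.0         # same g range as the LSBGs
        and gal["z_phot"] < 0.1            # keep the sample mostly local
    ]


def split_hsbg_colour(galaxies, cut=1.0):
    """Split HSBGs into red (g - i > cut) and blue (g - i <= cut) subsamples."""
    red = [gal for gal in galaxies if gal["g_minus_i"] > cut]
    blue = [gal for gal in galaxies if gal["g_minus_i"] <= cut]
    return red, blue


catalogue = [
    {"mu_eff_g": 21.5, "g": 18.9, "z_phot": 0.05, "g_minus_i": 1.2},
    {"mu_eff_g": 21.8, "g": 19.2, "z_phot": 0.08, "g_minus_i": 0.7},
    {"mu_eff_g": 24.6, "g": 20.1, "z_phot": 0.04, "g_minus_i": 0.5},  # fails the surface-brightness cut
    {"mu_eff_g": 21.0, "g": 18.0, "z_phot": 0.30, "g_minus_i": 1.1},  # fails the redshift cut
]
hsbgs = select_hsbg(catalogue)
red, blue = split_hsbg_colour(hsbgs)
print(len(hsbgs), len(red), len(blue))  # 2 1 1
```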

We first computed the angular two-point auto-correlation
function for the full samples of LSBGs and HSBGs. We then split
the samples into red and blue galaxies to measure their clustering
properties separately. For LSBGs, we followed the criterion
defined in Sect. 6, that is, a colour cut at g − i = 0.6 mag to
separate blue and red sources. As seen from the colour histogram
in Fig. 19, the HSBGs show a bimodality around g − i = 1.0 mag,
a redder dividing colour than for the LSBGs, which is most likely
attributable to their different stellar masses. We therefore use the
boundary g − i = 1.0 mag to divide our HSBG sample into red and
blue subsamples. The properties of all the samples used for the
clustering measurements, together with the best-fit power-law
parameters, are listed in Table 4. The two-point auto-correlation
functions for all these samples are shown in Fig. 20.

Fig. 19. Colour distribution of the HSBGs from the DES DR1. The
vertical line at g − i = 1.0 shows the colour separation of the HSBGs
into red and blue galaxies.

[Fig. 20 panels: ω(θ) versus θ (deg). Left: all LSBGs (γ = 1.941 ± 0.048) and all HSBGs (γ = 1.651 ± 0.021). Right: red LSBGs (γ = 2.090 ± 0.071), blue LSBGs (γ = 1.620 ± 0.025), red HSBGs (γ = 1.848 ± 0.012), and blue HSBGs (γ = 1.631 ± 0.036).]

Fig. 20. Angular autocorrelation function for the full sample of LSBGs (grey line with open circles) and the sample of HSBGs (black line with
crosses) in the left panel, and for the red LSBGs (red line), blue LSBGs (blue line), red HSBGs (orange line), and blue HSBGs (purple line) in the
right panel. The vertical green shaded region marks the angular range used for the power-law fit (ω = Aθ^(1−γ)), and the corresponding γ values
are shown in the legend.

Table 4. Best-fitting power-law parameters for the angular two-point
autocorrelation function of HSBGs and LSBGs, along with the number
of galaxies, median g-band magnitude, and median µ̄eff of each sample.

| Sample | Number of galaxies | Median g (mag) | Median µ̄eff (mag arcsec−2) | A | γ |
|---|---|---|---|---|---|
| All HSBGs | 451,310 | 18.84 | 21.66 | 0.091 ± 0.004 | 1.651 ± 0.021 |
| Red HSBGs | 103,900 | 17.96 | 21.21 | 0.245 ± 0.004 | 1.848 ± 0.012 |
| Blue HSBGs | 347,410 | 19.21 | 21.81 | 0.0648 ± 0.004 | 1.631 ± 0.036 |
| All LSBGs | 27,840 | 20.11 | 24.66 | 0.138 ± 0.013 | 1.941 ± 0.048 |
| Red LSBGs | 18,924 | 20.23 | 24.89 | 0.671 ± 0.079 | 2.090 ± 0.071 |
| Blue LSBGs | 8,916 | 20.07 | 24.59 | 0.051 ± 0.001 | 1.620 ± 0.025 |

As is clear from Fig. 20, the angular two-point auto-
correlation function of the red LSBGs does not follow a power
law at small angular scales. Therefore, the power-law fits were
performed only in the range 0.15 deg to 7 deg, so that they are
not affected by one-halo effects. In the range where it is well
fitted by the power law, ω(θ) of the red LSBGs is significantly
steeper than that of the blue LSBGs; however, it flattens at
smaller scales, between 0.01 deg and 0.2 deg. This behaviour
carries over to the full sample of LSBGs. In contrast, the blue
LSBGs follow a power law, with a lower clustering amplitude and
a much shallower slope, at almost all angular scales. This
behaviour of the angular correlation function might be explained
by the observations of van der Burg et al. (2016) and Wittmann
et al. (2017) that the number of LSBGs decreases close to the
cores of galaxy clusters. Such a suppression may reduce the
clustering power on small scales, flattening the autocorrelation
function, as seen for the red LSBGs, which are mostly associated
with clusters.
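The power-law fits in Table 4 are restricted to 0.15 deg ≤ θ ≤ 7 deg. Below is a minimal sketch of such a restricted fit, assuming inverse-variance weighted least squares on log10 ω versus log10 θ (the text does not state the fitting method, so this choice is our assumption). It returns A and γ of Eq. (15). The noise-free test data include a flattened small-scale part that the θ range excludes.

```python
import math


def fit_power_law(theta, w, w_err, theta_min=0.15, theta_max=7.0):
    """Fit w(theta) = A * theta**(1 - gamma) in log-log space.

    Only points with theta_min <= theta <= theta_max and w > 0 are used.
    Weights are 1/sigma_y**2 with sigma_y = w_err / (w ln 10).
    Returns (A, gamma).
    """
    xs, ys, ws = [], [], []
    for t, v, e in zip(theta, w, w_err):
        if theta_min <= t <= theta_max and v > 0:
            sigma_y = e / (v * math.log(10))
            xs.append(math.log10(t))
            ys.append(math.log10(v))
            ws.append(1.0 / sigma_y ** 2)
    s = sum(ws)
    x_bar = sum(wt * x for wt, x in zip(ws, xs)) / s
    y_bar = sum(wt * y for wt, y in zip(ws, ys)) / s
    sxy = sum(wt * (x - x_bar) * (y - y_bar) for wt, x, y in zip(ws, xs, ys))
    sxx = sum(wt * (x - x_bar) ** 2 for wt, x in zip(ws, xs))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return 10 ** intercept, 1.0 - slope


# Noise-free check: the flattened part below 0.15 deg lies outside the fit range.
theta = [0.02, 0.05, 0.1, 0.2, 0.5, 1.0, 2.0, 5.0]
w_true = [0.3 if t < 0.15 else 0.1 * t ** (1 - 1.8) for t in theta]
errors = [0.1 * v for v in w_true]
A, gamma = fit_power_law(theta, w_true, errors)
print(f"A = {A:.3f}, gamma = {gamma:.3f}")  # A = 0.100, gamma = 1.800
```

Including the flattened small-scale points would pull the fitted slope towards shallower values, which is why restricting the range matters for samples such as the red LSBGs.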

Notable differences are also seen between the clustering of the
LSBGs and that of the HSBGs. Not surprisingly, the red samples of
both HSBGs and LSBGs are more clustered than their blue
counterparts. At the same time, the red LSBG sample has a
significantly higher clustering amplitude than the reference red
HSBG sample. Red LSBGs also display a steeper slope of ω(θ) at
angular scales larger than 0.15 deg, but at smaller scales their
ω(θ) flattens, unlike that of the red HSBGs, for which we even
observe hints of an upturn that can be associated with a one-halo
term. This picture is consistent with a scenario in which red LSBGs
are mostly associated with dense structures such as clusters but
populate their outskirts rather than their centres. In contrast, red
HSBGs display the usual behaviour of red passive galaxies: they
appear in a variety of environments but cluster most strongly in
cluster centres.

Blue LSBGs have a significantly lower clustering amplitude
than their HSBG counterparts. At the same time, the slope of
their ω(θ) at scales larger than 0.15 deg remains very similar.
The blue HSBGs and LSBGs follow the usual distribution of
blue star-forming galaxies, dispersed in the field and avoiding
clusters. These results are consistent with the results obtained by
Tanoglidis et al. (2021b) for their sample of DES LSBGs. These
latter authors compared the clustering of LSBGs with very bright
galaxies in the magnitude range of 14 < g < 18.5 mag from the
2MPZ catalogue (Bilicki et al. 2014), finding that LSBGs had
higher clustering amplitude in the range of 0.1–2 degrees, which
is similar to our observations.

Our results contradict the early estimates of Bothun et al.
(1993) and Mo et al. (1994), who inferred that LSBGs show weak
spatial clustering. However, their analyses were limited by a small
data sample (∼400 LSBGs), a small area of the sky, and most
likely selection biases. Given the
low accuracy of photometric redshifts for LSBGs in our sample,
we do not attempt to reconstruct their spatial clustering in this
work. Further analysis is planned as a follow-up to this study.

8. Identification of ultradiffuse galaxies

As discussed in Sect. 1, UDGs are a subclass of LSBGs with
extended half-light radii, r1/2 ≥ 1.5 kpc, and a g-band central
surface brightness µ0 > 24 mag arcsec−2 (van Dokkum et al.
2015). A significant population of UDGs was discovered in the
Coma cluster by van Dokkum et al. (2015), and other
investigations have revealed large numbers of UDGs in other
galaxy clusters (Koda et al. 2015; Mihos et al. 2015; Lim et al.
2020; La Marca et al. 2022). Later studies showed that thousands
of UDGs can be found in individual clusters and that the
abundance of UDGs scales almost linearly with host halo mass
(van der Burg et al. 2016; Mancera Piña et al. 2018).

[Fig. 21 panels (a) to (f): Coadd Object Ids 221536249, 287347379, 323324928, 392317512, 295038501, and 461241198; each panel includes a 1.5 kpc scale bar.]

Fig. 21. Cut-outs of six confirmed new UDGs. The unique identification number (Coadd Object Id) for each galaxy in DES DR1 is given below
each image. The images were generated by combining the g, r, and z bands using the APLpy package (Robitaille & Bressert 2012), and each image
corresponds to a 33.66′′ × 33.66′′ region of the sky with the UDG at its centre.

To search our LSBG sample for cluster UDGs, we cross-matched
the total DES LSBG sample (the 23 790 LSBGs from Tanoglidis
et al. 2021b and the 4083 new LSBGs identified here) with the
X-ray-selected galaxy cluster catalogue from the ROSAT All-Sky
Survey (RXGCC; Xu et al. 2022). All LSBGs at an angular distance
from a cluster centre smaller than the angular size of R200 (i.e. the
virial radius of the cluster)⁵ were associated with that cluster.
Here, R200 is the radius within which the average density of a
galaxy cluster is 200 times the critical density of the Universe at
that redshift. We find 1310 LSBGs from the combined catalogue
and 123 LSBGs from our new sample to be associated with 130 and
53 clusters, respectively. Using the cluster redshifts provided by
Xu et al. (2022), and assuming that each associated LSBG is at the
same redshift as its cluster, we estimated the half-light radii of
those LSBGs and their projected comoving distances from the
cluster centre. Because the cross-matching uses only projected
distances, some of the LSBGs associated with clusters could be
non-members that only appear close to a cluster in projection.
However, this is unlikely to be the case for all of them, and as we
have no other distance estimates for the LSBGs, we chose to adopt
this method. It should also be noted that UDGs are not found
exclusively in clusters; they are also observed in groups (Cohen
et al. 2018; Marleau et al. 2021) and even in field environments
(Prole et al. 2019). In this section, we focus on the LSBGs and
UDGs associated with clusters.

5 We used the R500 values and the redshifts provided by Xu et al. (2022)
to obtain the R200 cross-matching radius. Following Ettori & Balestra
(2009), we assume R200 ≈ R500/0.65, where R500 is the radius within which
the average density of a galaxy cluster is 500 times the critical density
of the Universe at that redshift.
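The association rule can be sketched as follows. The sketch assumes a flat ΛCDM cosmology with H0 = 70 km s−1 Mpc−1 and Ωm = 0.3 (this section does not state the cosmology used) and treats R500 as a physical radius in Mpc. It converts R200 = R500/0.65 into an angle through the angular-diameter distance and compares that angle with the great-circle separation between the LSBG and the cluster centre. All function names are hypothetical, and clusters are assumed to have z > 0.

```python
import math

C_KM_S = 299_792.458  # speed of light in km/s


def comoving_distance_mpc(z, h0=70.0, om=0.3, steps=1000):
    """Line-of-sight comoving distance for flat LCDM (Simpson's rule, even steps)."""
    def inv_e(zz):
        return 1.0 / math.sqrt(om * (1.0 + zz) ** 3 + (1.0 - om))
    h = z / steps
    total = inv_e(0.0) + inv_e(z)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * inv_e(k * h)
    return C_KM_S / h0 * total * h / 3.0


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation (haversine formula); inputs and output in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    hav = (math.sin((dec2 - dec1) / 2) ** 2
           + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(hav)))


def is_associated(lsbg_ra, lsbg_dec, cl_ra, cl_dec, cl_z, r500_mpc):
    """True if the LSBG lies within R200 = R500 / 0.65 of the cluster centre."""
    r200_mpc = r500_mpc / 0.65
    d_a = comoving_distance_mpc(cl_z) / (1.0 + cl_z)  # angular-diameter distance
    theta200_deg = math.degrees(r200_mpc / d_a)
    return angular_separation_deg(lsbg_ra, lsbg_dec, cl_ra, cl_dec) < theta200_deg


# Example: a cluster at z = 0.1 with R500 = 0.65 Mpc (so R200 = 1 Mpc).
print(is_associated(10.0, -30.1, 10.0, -30.0, cl_z=0.1, r500_mpc=0.65))  # True
```

Under these assumptions, R200 = 1 Mpc at z = 0.1 subtends about 0.15 deg, so an LSBG 0.1 deg from the centre is associated and one 0.2 deg away is not.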

Fig. 22. Colour distribution of the 1310 cluster LSBGs and 317 cluster
UDGs from the DES DR1.

[Fig. 23 panels: r1/2 (kpc) versus µ0 (mag arcsec−2) for blue and red UDGs, with marginal histograms; medians µ0 = 24.41 (blue) and 24.52 (red) mag arcsec−2, and r1/2 = 2.78 (blue) and 2.75 (red) kpc.]

Fig. 23. Joint distribution of the red (red dots) and blue (blue crosses)
UDGs in the space of r1/2 and µ0 in the g band. The lines in the marginal
histograms show the median for each class.

Among the 1310 cluster LSBGs, we further classify 317 as
cluster UDG candidates based on their half-light radius
(r1/2 ≥ 1.5 kpc) and g-band central surface brightness
(µ0 > 24.0 mag arcsec−2). As we have not confirmed the physical
distances to these galaxies, and therefore cannot be certain of
their physical sizes, they can only be regarded as UDG candidates;
hereafter, ‘UDGs’ in this paper refers to these candidates rather
than to confirmed UDGs. These 317 UDGs are distributed among
80 clusters, making this the largest sample of clusters in which
UDGs have been studied. Tanoglidis et al. (2021b) also identified
41 UDGs in their DES LSBG sample by associating the nine most
overdense regions of LSBGs with known clusters. However, they did
not study the properties of those 41 UDGs in detail, and the
remaining 276 of the 317 UDGs reported here are new. The UDGs
presented here have a median r1/2 of 2.75 kpc and a median µ0 of
24.51 mag arcsec−2. Six of the newly identified UDGs are shown in
Fig. 21.
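Once a galaxy is assigned its cluster's redshift, the UDG criteria reduce to a unit conversion and two cuts. The sketch below converts an angular half-light radius in arcsec to kpc using the angular-diameter distance, which is taken as an input; computing that distance requires a cosmology, which this section does not specify. The function names and the example numbers are illustrative assumptions.

```python
import math

ARCSEC_TO_RAD = math.pi / (180.0 * 3600.0)


def half_light_radius_kpc(r_half_arcsec, d_a_mpc):
    """Physical half-light radius from angular size and angular-diameter distance."""
    return r_half_arcsec * ARCSEC_TO_RAD * d_a_mpc * 1000.0


def is_udg_candidate(r_half_kpc, mu0_g):
    """UDG criteria as adopted in the text: r1/2 >= 1.5 kpc and g-band mu0 > 24."""
    return r_half_kpc >= 1.5 and mu0_g > 24.0


# Example: r1/2 = 3.0 arcsec at an assumed D_A of 380 Mpc
# (roughly z = 0.1 for H0 = 70 km/s/Mpc and Omega_m = 0.3).
r_kpc = half_light_radius_kpc(3.0, 380.0)
print(f"{r_kpc:.2f} kpc, UDG candidate: {is_udg_candidate(r_kpc, 24.5)}")
```

At such distances, even the minimum LSBG size of 2.5′′ corresponds to more than 1.5 kpc, so at these redshifts the surface-brightness cut on µ0 does most of the work of separating UDG candidates from other cluster LSBGs.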

As seen from Fig. 22, the majority of the cluster UDGs (253
out of 317) are red (g − i > 0.6 mag), similar to the cluster
LSBGs (909 out of 1310). This is consistent with theoretical
predictions for cluster UDGs (Benavides et al. 2023). Mancera
Piña et al. (2019) found a similar distribution for the g − r colour
of 442 UDGs observed in eight galaxy clusters. The joint
distribution of the red and blue UDGs in the space of r1/2 and µ0
is shown in Fig. 23. The red UDGs have a median r1/2 of 2.75 kpc
and µ0 of 24.52 mag arcsec−2, while the blue UDGs have a median
r1/2 of 2.78 kpc and µ0 of 24.41 mag arcsec−2. Most of the red and
blue UDGs have half-light radii in the range 1.5 < r1/2 < 6 kpc.
However, a small fraction of UDGs (6 out of 317) have
r1/2 > 10 kpc; these are all red, have µ0 < 25.0 mag arcsec−2, and
are good candidates for follow-up studies.

For the cluster LSBGs as a whole, we see a colour gradient
(Fig. 24): LSBGs towards the outskirts of clusters tend to be
bluer than those in the centre. This is similar to the behaviour
found for Virgo cluster LSBGs by Junais et al. (2022). For the
cluster UDGs presented in this study, however, the colour
gradient is much weaker, with an almost flat distribution
compared to the LSBGs. A similar weak trend, with more blue
UDGs found towards the cluster centre, was also noted by
Mancera Piña et al. (2019). On the other hand, Román & Trujillo
(2017) and Alabi et al. (2020) reported a more pronounced colour
trend with cluster-centric distance, while La Marca et al. (2022)
did not find any significant trend. When comparing such colour
trends of cluster UDGs directly, one should keep in mind that, as
the results in the literature show, they are affected by several
factors, such as the bands used to determine the colour, the
sample size, and the cluster studied. For example, our sample
size (>300) is similar to that of Mancera Piña et al. (2019), and
we obtain similar results, whereas our findings differ from those
of Román & Trujillo (2017), Alabi et al. (2020), and La Marca
et al. (2022), who used smaller samples (<40).
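Straight-line fits and 1σ scatter of the kind shown in Fig. 24 can be obtained with ordinary least squares. The sketch below uses synthetic data with invented values, not the catalogue. Defining the scatter as the standard deviation of the residuals is our assumption about how the shaded band is computed.

```python
import random
import statistics


def linear_fit_with_scatter(x, y):
    """Ordinary least-squares line y = a x + b plus the residual scatter."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    a = sxy / sxx
    b = y_bar - a * x_bar
    residuals = [yi - (a * xi + b) for xi, yi in zip(x, y)]
    return a, b, statistics.stdev(residuals)


# Synthetic projected distances (R/R200) and sizes (kpc) with a known trend.
rng = random.Random(3)
r_norm = [rng.random() for _ in range(1000)]
sizes = [0.6 * r + 2.4 + rng.gauss(0.0, 0.5) for r in r_norm]
slope, intercept, scatter = linear_fit_with_scatter(r_norm, sizes)
print(f"y = {slope:.2f}x + {intercept:.2f}, scatter = {scatter:.2f}")
```

With a few hundred objects and substantial intrinsic scatter, the uncertainty on the slope is non-negligible. This is one reason why weak gradients, such as the nearly flat colour trend of the UDGs, are hard to compare across studies with very different sample sizes.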

The trend in half-light radius (Fig. 24) is quite evident for
both the cluster LSBGs and UDGs: both increase in size with
increasing distance from the cluster centre. This behaviour is in
agreement with the findings of Román & Trujillo (2017). The
gradients we observe in colour and size with cluster-centric
distance are consistent with the proposed UDG formation
scenarios, such as galaxy harassment (Conselice 2018), tidal
interactions (Mancera Piña et al. 2019), and ram-pressure
stripping (Conselice et al. 2003b; Buyle et al. 2005). Such trends
are also similar to those observed for dwarf galaxies (Venhola
et al. 2019), further supporting the argument that UDGs can be
considered a subset of dwarf galaxies (Conselice 2018; Benavides
et al. 2023).

The sample of UDG candidates presented here will be the
subject of a follow-up analysis. It should also be noted that all the
UDGs reported here are cluster UDGs. The actual number of UDGs
in the LSBG catalogue, including those in low-density
environments, may be larger, so the reported number is only a
lower limit on the total number of UDGs.

9. Conclusions

In this paper, we explore the possibility of using transformers to
distinguish LSBGs from artefacts in optical imaging data. We
implemented four transformer models that combine a CNN
backbone with self-attention layers to classify the labels; we call
them LSBG DETR (LSBG detection transformer) models. We also
created four transformer models that apply attention directly to
image patches without any convolutions; we call these models
LSBG vision transformers. We compared the performance of these
two architectures with that of the LSBG identification CNN model
DeepShadows presented in Tanoglidis et al. (2021a) and find that
the transformer models perform better than DeepShadows. We then
used the ensemble of our transformer models to look for new LSBGs
in the DES DR1 data that previous searches may have missed. We
follow the definition of an LSBG used by Tanoglidis et al. (2021b);
that is, we define LSBGs as galaxies with a g-band mean surface
brightness µ̄eff > 24.2 mag arcsec−2 and a half-light radius
r1/2 > 2.5′′. Following this definition, we identified 4083 new
LSBGs in the DES DR1, increasing the number of identified LSBGs
in DES by 17%.

[Fig. 24 panels: left, g − i versus projected cluster-centric distance (R/R200); right, r1/2 (kpc) versus R/R200, with best fits y = 0.62x + 2.39 for the cluster LSBGs and y = 0.91x + 2.82 for the cluster UDGs.]

Fig. 24. g − i colour of the cluster LSBGs (black points) and r1/2 as a function of the projected distance from their cluster centre (in units of the
cluster radius R200); left and right panels, respectively. The UDGs are marked as red hollow circles. The green line and the grey-shaded region are
the linear best fit and the 1σ scatter for the cluster LSBGs, respectively. The blue dashed line is the linear best fit for the cluster UDGs.

Our sample selection and LSBG identification pipeline consists of
the following steps:
1. We preselected objects from the DES Y3 Gold catalogue using the
SourceExtractor parameters, following the selection criteria
described in Tanoglidis et al. (2021b).

2. We applied the ensemble of transformer models to this sample of
preselected objects. Only the objects identified independently by
both the LSBG DETR ensemble and the LSBG ViT ensemble were retained
for follow-up inspection.

3. We performed a Sérsic fit using Galfit and reapplied the selection
cuts to further reduce the number of false positives. After this
step, 4879 LSBG candidates were retained for visual inspection.

4. Following visual inspection, we report 4083 new LSBGs identified
by the transformer ensemble models.
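Step 2 keeps only the objects flagged by both ensembles. A minimal sketch of that consensus rule follows. It assumes that each ensemble outputs per-model LSBG probabilities, which are averaged and thresholded at 0.5. The averaging, the threshold, the four-models-per-ensemble split, and the toy probabilities are all illustrative assumptions, not the configuration used in this work.

```python
import numpy as np

def ensemble_vote(probs: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Average per-model probabilities (shape n_models x n_objects) and threshold them."""
    return probs.mean(axis=0) >= threshold

def consensus_candidates(detr_probs, vit_probs, threshold=0.5):
    """Indices of objects classified as LSBGs by BOTH the DETR and the ViT ensemble."""
    detr_flag = ensemble_vote(np.asarray(detr_probs, dtype=float), threshold)
    vit_flag = ensemble_vote(np.asarray(vit_probs, dtype=float), threshold)
    return np.flatnonzero(detr_flag & vit_flag)

# Toy input: four models per ensemble, five preselected objects (hypothetical values).
detr = np.array([[0.9, 0.2, 0.7, 0.4, 0.8],
                 [0.8, 0.1, 0.6, 0.6, 0.9],
                 [0.9, 0.3, 0.4, 0.5, 0.7],
                 [0.7, 0.2, 0.5, 0.6, 0.9]])
vit = np.array([[0.8, 0.6, 0.3, 0.7, 0.9],
                [0.9, 0.4, 0.4, 0.6, 0.8],
                [0.6, 0.5, 0.2, 0.5, 0.7],
                [0.7, 0.7, 0.3, 0.8, 0.9]])
candidates = consensus_candidates(detr, vit)
```

Objects 1 and 2 are each accepted by only one ensemble and are dropped. Requiring agreement trades some completeness for fewer false positives before the Sérsic-fit and visual-inspection stages.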
Following Tanoglidis et al. (2021b), we divided the total LSBG
sample into two subsamples according to their g − i colour. Among
the 4083 new LSBGs presented here, 72% are blue, which is higher
than the 67% found in the sample of Tanoglidis et al. (2021b). We
also find a higher fraction of very red LSBGs (g − i > 0.8) than in
the sample of Tanoglidis et al. (2021b). We speculate that this
difference might stem from a bias in the training set that
Tanoglidis et al. (2021b) used to train the support vector machine
(SVM) that preselected their LSBG candidates.
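The colour comparison above reduces to counting galaxies on either side of a g − i boundary. The boundary is not restated in this section. The sketch below assumes g − i = 0.60, which we take to be the split adopted by Tanoglidis et al. (2021b), and uses made-up colours purely for illustration.

```python
import numpy as np

G_MINUS_I_SPLIT = 0.60  # assumed red/blue boundary (after Tanoglidis et al. 2021b)

def colour_fractions(g_minus_i, split=G_MINUS_I_SPLIT, very_red=0.8):
    """Return (blue fraction, red fraction, fraction with g - i > very_red)."""
    c = np.asarray(g_minus_i, dtype=float)
    blue = np.mean(c < split)          # mean of a boolean mask = fraction
    red = np.mean(c >= split)
    very_red_frac = np.mean(c > very_red)
    return blue, red, very_red_frac

# Hypothetical colours for ten galaxies.
colours = [0.35, 0.42, 0.50, 0.55, 0.58, 0.61, 0.45, 0.85, 0.30, 0.95]
blue, red, very_red_frac = colour_fractions(colours)
```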

By combining the previously identified 23 790 LSBGs from
Tanoglidis et al. (2021b) with the LSBGs newly identified in
our work, the total number of known LSBGs in the DES is
increased to 27 873. This increases the number density of LSBGs
in the DES from 4.13 to 4.91 deg−2 for LSBGs with µ̄eff >
24.3 mag arcsec−2 and from 4.75 to 5.57 deg−2 for LSBGs with
µ̄eff > 24.2 mag arcsec−2. We stress that these number densities
are lower limits, which will likely increase with the improved
imaging quality and methodologies of future surveys such as LSST
and Euclid.
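The counts and densities quoted above can be cross-checked with simple arithmetic. The sketch below recovers the 17% increase. It also shows that, for µ̄eff > 24.2 mag arcsec−2, both the old and the new density imply a footprint of roughly 5000 deg², consistent with the DES wide-field area. That area is inferred here rather than quoted in this section.

```python
# Counts quoted in the text.
n_previous = 23_790   # LSBGs from Tanoglidis et al. (2021b)
n_new = 4_083         # LSBGs newly identified in this work
n_total = n_previous + n_new

fractional_increase = n_new / n_previous           # about 0.17, i.e. the quoted 17%

# Number densities quoted for mu_eff > 24.2 mag arcsec^-2 (deg^-2).
density_before, density_after = 4.75, 5.57
implied_area_before = n_previous / density_before  # deg^2
implied_area_after = n_total / density_after       # deg^2

print(n_total, round(fractional_increase, 3),
      round(implied_area_before), round(implied_area_after))
```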

We also carried out an analysis of the clustering of LSBGs
in DES. We find that the LSBGs tend to cluster more strongly
than the HSBGs from DES, which is similar to the findings of
Tanoglidis et al. (2021b). On closer examination, we find that the
stronger clustering of LSBGs stems primarily from the red LSBGs,
whereas the blue LSBGs behave like the blue HSBGs, which cluster
more weakly. We also note a deficit of red LSBGs near the centres
of galaxy clusters, which flattens the auto-correlation function on
small scales, similar to the conclusions of Wittmann et al. (2017).
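The clustering statements above rest on the angular auto-correlation function w(θ). This section does not say which estimator or software was used. The sketch below therefore assumes the widely used Landy–Szalay estimator, w(θ) = (DD − 2DR + RR)/RR with normalised pair counts. It is implemented by brute force with NumPy on a toy catalogue. A real analysis would use a tree-based pair counter and a random catalogue that follows the survey mask.

```python
import numpy as np

def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Pairwise great-circle separations (deg) between two point sets (haversine formula)."""
    ra1, dec1, ra2, dec2 = (np.radians(np.asarray(a, dtype=float)) for a in (ra1, dec1, ra2, dec2))
    dra = ra2[None, :] - ra1[:, None]
    ddec = dec2[None, :] - dec1[:, None]
    h = np.sin(ddec / 2) ** 2 + np.cos(dec1)[:, None] * np.cos(dec2)[None, :] * np.sin(dra / 2) ** 2
    return np.degrees(2 * np.arcsin(np.sqrt(np.clip(h, 0.0, 1.0))))

def normalised_pairs(ra1, dec1, ra2, dec2, bins, auto):
    """Histogram of pair separations divided by the total number of pairs."""
    sep = angular_sep_deg(ra1, dec1, ra2, dec2)
    if auto:
        sep = sep[np.triu_indices(sep.shape[0], k=1)]  # unique pairs, no self-pairs
    counts, _ = np.histogram(sep, bins=bins)
    return counts / sep.size

def landy_szalay(ra_d, dec_d, ra_r, dec_r, bins):
    """w(theta) = (DD - 2DR + RR) / RR using normalised pair counts."""
    dd = normalised_pairs(ra_d, dec_d, ra_d, dec_d, bins, auto=True)
    rr = normalised_pairs(ra_r, dec_r, ra_r, dec_r, bins, auto=True)
    dr = normalised_pairs(ra_d, dec_d, ra_r, dec_r, bins, auto=False)
    return (dd - 2 * dr + rr) / rr

rng = np.random.default_rng(42)
# Random catalogue: uniform over a small 10 x 10 deg equatorial patch.
ra_r = rng.uniform(0.0, 10.0, 1500)
dec_r = rng.uniform(-5.0, 5.0, 1500)
# Toy clustered catalogue: 10 clumps of 30 points with 0.1 deg scatter.
centres = rng.uniform([1.0, -4.0], [9.0, 4.0], size=(10, 2))
pts = np.repeat(centres, 30, axis=0) + rng.normal(0.0, 0.1, size=(300, 2))
bins = np.array([0.01, 0.3, 1.0, 3.0])  # degrees
w = landy_szalay(pts[:, 0], pts[:, 1], ra_r, dec_r, bins)
```

On the toy catalogue, the clumped points give w(θ) well above zero in the smallest bin. A deficit of close pairs, as described for the red LSBGs near cluster centres, would instead flatten w(θ) at small separations.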

Additionally, we cross-matched the LSBGs with the X-ray-
selected galaxy cluster catalogue from the ROSAT All-Sky
Survey (RXGCC; Xu et al. 2022) to find LSBGs associated
with the clusters. Using the redshift information of the clusters,
we identify 317 UDGs, among which 276 are reported for the
first time. We also observe a colour gradient among the cluster
LSBGs: those located towards the outskirts of the clusters are
bluer than those near the centre, similar to the findings of
Junais et al. (2022) for the Virgo cluster LSBGs. This trend is,
however, relatively weak for the cluster UDGs in our sample. A
clear trend is also seen in the half-light radius of both the
cluster LSBGs and the UDGs as a function of cluster-centric
distance: they grow in size from the cluster centre to the
outskirts. These coherent trends in colour and size are consistent
with proposed UDG formation mechanisms, such as galaxy harassment
(Conselice 2018), tidal interactions (Mancera Piña et al. 2019),
and ram-pressure stripping (Conselice et al. 2003b; Buyle et al.
2005), and lend further support to the argument that UDGs are a
subset of dwarf galaxies (Conselice 2018; Benavides et al. 2023).
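Converting angular half-light radii to physical sizes at the cluster redshift is what separates UDGs from the wider LSBG sample. Neither the cosmology nor the exact UDG criterion is restated in this section. The sketch below therefore assumes a flat ΛCDM cosmology with H0 = 70 km s−1 Mpc−1 and Ωm = 0.3, and a size threshold of r1/2 ≥ 1.5 kpc, a common UDG definition. The comoving distance is integrated numerically to keep the example dependency-free; `astropy.cosmology` would be the usual choice. The helper names are hypothetical.

```python
import math

C_KM_S = 299_792.458  # speed of light in km/s

def angular_diameter_distance_mpc(z, h0=70.0, omega_m=0.3, steps=1000):
    """D_A in Mpc for an assumed flat LCDM cosmology (Simpson's rule; steps must be even)."""
    def inv_e(zp):
        return 1.0 / math.sqrt(omega_m * (1.0 + zp) ** 3 + (1.0 - omega_m))
    dz = z / steps
    total = inv_e(0.0) + inv_e(z)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * inv_e(i * dz)
    comoving_mpc = (C_KM_S / h0) * total * dz / 3.0
    return comoving_mpc / (1.0 + z)

def kpc_per_arcsec(z, **cosmo):
    """Physical scale (kpc per arcsec) at redshift z."""
    arcsec_in_rad = math.pi / (180.0 * 3600.0)
    return angular_diameter_distance_mpc(z, **cosmo) * 1.0e3 * arcsec_in_rad

def is_udg_size(r_half_arcsec, z, min_kpc=1.5):
    """Hypothetical size test: physical r1/2 at the cluster redshift >= min_kpc."""
    return r_half_arcsec * kpc_per_arcsec(z) >= min_kpc

# A 2.5 arcsec LSBG is about 2.4 kpc across at z = 0.05 but only about 0.5 kpc at z = 0.01.
print(is_udg_size(2.5, 0.05), is_udg_size(2.5, 0.01))
```

An LSBG that satisfies the angular cut therefore counts as a UDG only if its host cluster is distant enough. This is why the cluster redshifts are needed for the UDG identification.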

The upcoming large-scale surveys, LSST and Euclid, are
expected to cover around 18 000 and 14 500 deg2 of the sky,
respectively (Ivezic