Introduction
Materials and Methods
Plant materials and experimental design
Trait investigation and data collection
Distance matrix analysis
Clustering analysis
Cluster-wise trait comparison
Correlation and geographical distribution analysis
Results
Cluster composition of shepherd’s purse accessions
Variation in morphological traits among distinct clusters
Cluster-specific trait profiles and correlation patterns
Cluster-wise regional distribution and LL diversity analysis
Discussion
Introduction
Capsella bursa-pastoris (L.), commonly known as shepherd’s purse, is an annual or biennial herbaceous species of the family Brassicaceae with a cosmopolitan distribution across diverse ecological habitats (Neuffer et al., 2014; Kryvokhyzha et al., 2019). In East Asia, it has long been valued as both a food and a medicinal resource. The species reproduces primarily through self-fertilization and exhibits high adaptability to variable environmental conditions, resulting in pronounced morphological variation across geographic regions (Choi et al., 2014). These attributes make shepherd's purse not only an ideal model for studying ecological diversification but also a promising genetic resource for functional crop development (Bachmann et al., 2021).
Effective conservation and utilization of plant genetic resources require a comprehensive understanding of morphological variation within germplasm collections, as well as clear delineation of clustering structures among accessions (Han et al., 2015; Lee et al., 2022). However, previous research on shepherd's purse has consisted mainly of localized studies, often limited to leaf morphology or genomic diversity. Integrated assessments that combine aerial (shoot) and subterranean (root) morphological traits to evaluate overall phenotypic diversity remain scarce (Ahmed et al., 2022; Song et al., 2022).
Aerial traits, including photosynthetic efficiency, early growth rate, and visual characteristics, are critical indicators of physiological vigor and environmental adaptability (Poorter and Nagel, 2000). In shepherd's purse, the growing period (from sowing to bolting) and leaf coloration (RGB values) serve as key determinants reflecting adaptation strategies and growth performance (Lee et al., 2024). Root traits, by contrast, are directly related to resource acquisition efficiency, influencing both nutrient and water uptake (Villordon et al., 2014). Parameters such as root length, surface area, diameter, and fork number provide valuable insights into plant growth strategies and ecological adaptability. Despite their significance, studies integrating shoot and root morphological traits to elucidate trait-based clustering structures within shepherd's purse germplasm remain limited.
In this study, C. bursa-pastoris accessions collected from various regions of Korea and abroad were phenotypically evaluated for a wide range of shoot and root morphological traits. The clustering structure among accessions was identified by Partitioning Around Medoids (PAM) clustering of a Gower distance matrix and visualized using Factor Analysis of Mixed Data (FAMD), an approach suited to datasets comprising both quantitative and qualitative variables. Key trait profiles were compared across the identified clusters, and intra-cluster correlation patterns were visualized to highlight cluster-specific associations. Furthermore, the regional composition of each cluster was examined using mosaic plots, and intra-cluster variation in leaf-lobed (LL) morphology was quantified using Shannon’s diversity index.
Overall, this study establishes a systematic framework for understanding phenotypic diversity and clustering structures in shepherd’s purse by integrating shoot and root morphological data. The findings provide foundational insights for the conservation, selection, and functional utilization of shepherd’s purse genetic resources, and will inform future studies on adaptability assessment and resource development.
Materials and Methods
Plant materials and experimental design
A total of 35 Capsella bursa-pastoris (shepherd’s purse) accessions were used in this study, collected from six regions across Korea (Gyeonggi-do, Gangwon-do, Chungcheong-do, Jeolla-do, Gyeongsang-do, and Jeju-do). In addition to these domestic accessions, four exotic accessions originating from China, Canada, and Russia were included to provide a broader comparative framework for evaluating phenotypic diversity across distinct ecological and geographic backgrounds. All accessions were obtained from the National Agrobiodiversity Center (NAC), Rural Development Administration (Jeonju, South Korea). Seeds were sown in August 2024 under controlled temperature conditions (15℃) at NAC. All accessions were cultivated in three replications under identical growth conditions.
Trait investigation and data collection
Seeds were initially sown in 50-hole trays, and seedlings with 3–4 true leaves were individually transplanted into pots. The bolting date (BD, days) was recorded as the number of days from transplanting to the onset of bolting. At the time BD was recorded, leaf and root samples were collected for phenotypic measurements. Leaf color (RGB values) was quantified by photographing fully expanded leaves with a digital camera at a standardized focal length and under standardized lighting, followed by image analysis with ImageJ software (ver. 1.53t). The degree of leaf lobing (LL) was classified into six classes (A–F) (Fig. 1) according to the depth of the lobes relative to the main leaf vein. These six grades covered the full morphological spectrum observed among the accessions, from entire leaves without lobing to deeply lobed leaves with incisions extending close to the main vein. Because the irregular leaf margins of shepherd’s purse can introduce subjectivity when scored on a continuous numerical scale, a categorical scheme was adopted to ensure consistency and reproducibility across replications.

Fig. 1.
Representative variation in leaf-lobed (LL) morphology in Capsella bursa-pastoris (shepherd’s purse). Panels A–F show a gradual increase in lobing depth, with the incisions extending progressively closer to the main vein. These morphological differences were used as qualitative criteria for LL trait evaluation and diversity assessment. Scale bar = 1 cm.
Root morphological traits were measured using the WinRHIZO image analysis system (Regent Instruments Inc., Canada). The following parameters were assessed: root length (RL, pixels), root surface area (RA, pixels²), average root diameter (RD, pixels/10), root volume (RV, pixels³), number of root tips (RT), number of root forks (RF), number of root links (RNL), and root projected area (RPA, pixels²), which represents the two-dimensional surface area of vertically projected roots. Additionally, root length per unit volume (RLPV, pixels/m³) and average root link length (RLL, pixels) were calculated to evaluate root architectural complexity. All measurements were performed in triplicate, and mean values were normalized prior to statistical analysis.
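The replicate averaging and normalization step described above can be illustrated with a short Python sketch (the study itself used R). The normalization method is not specified, so z-score standardization is assumed here, and the helper `average_link_length` (total root length divided by the number of root links) is a hypothetical derivation of RLL, not necessarily the formula used in this study. All accession codes and values are made up for illustration.

```python
from statistics import mean, pstdev


def replicate_means(replicates):
    """Average the replicate measurements of one trait for each accession.

    `replicates` maps an accession code to its list of replicate values.
    """
    return {accession: mean(values) for accession, values in replicates.items()}


def average_link_length(total_root_length, n_root_links):
    """Hypothetical RLL derivation: total root length divided by the number of links."""
    if n_root_links <= 0:
        raise ValueError("the number of root links must be positive")
    return total_root_length / n_root_links


def z_scores(values):
    """Assumed normalization: center to mean 0 and scale to unit population SD."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("cannot standardize a trait with zero variance")
    return [(v - mu) / sd for v in values]


# Illustrative (made-up) RL replicates for two hypothetical accessions.
rl_means = replicate_means({"acc_1": [1200.0, 1180.0, 1220.0],
                            "acc_2": [900.0, 950.0, 925.0]})
rl_normalized = z_scores(list(rl_means.values()))
```

Min-max scaling would be an equally plausible reading of "normalized"; swapping `z_scores` for a range-based rescaling changes only that one function.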
Distance matrix analysis
To assess similarity among accessions, a distance matrix was constructed that integrated the continuous traits (BD, RGB, and root-related traits) with the qualitative trait (LL). Pairwise dissimilarities were calculated with Gower’s distance, which scores each continuous variable as the absolute difference divided by that variable’s range and each categorical variable as 0 (match) or 1 (mismatch), and then averages these per-variable scores. The distance matrix was computed with the daisy function of the cluster package in R.
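A minimal Python sketch of Gower’s dissimilarity for mixed data follows. It implements the standard Gower rule (range-normalized absolute differences for continuous traits and 0/1 matching for categorical ones, averaged over variables) and is not the daisy code itself; for example, daisy would rank-transform a variable coded as an ordered factor, and its handling of missing values and variable weights is not reproduced here. The records are hypothetical.

```python
def gower_matrix(rows, kinds):
    """Pairwise Gower dissimilarities for mixed-type records.

    rows  : list of tuples, one per accession, values ordered as in `kinds`
    kinds : "num" for continuous traits, "cat" for categorical traits such as LL
    """
    n_vars = len(kinds)
    # Range of each continuous variable across all accessions (None for categorical).
    ranges = []
    for j, kind in enumerate(kinds):
        if kind == "num":
            column = [row[j] for row in rows]
            ranges.append(max(column) - min(column))
        else:
            ranges.append(None)

    def pair_distance(a, b):
        total = 0.0
        for x, y, kind, r in zip(a, b, kinds, ranges):
            if kind == "num":
                # Range-normalized absolute difference; a constant variable contributes 0.
                total += abs(x - y) / r if r > 0 else 0.0
            else:
                # Simple matching: 0 if the categories agree, 1 otherwise.
                total += 0.0 if x == y else 1.0
        return total / n_vars

    return [[pair_distance(a, b) for b in rows] for a in rows]


# Hypothetical records: (BD, RL, LL class); values are illustrative only.
records = [(10.0, 1.0, "A"), (20.0, 3.0, "A"), (30.0, 2.0, "C")]
D = gower_matrix(records, ["num", "num", "cat"])
```

Because every per-variable score lies in [0, 1], the resulting dissimilarities also lie in [0, 1], which is what allows continuous root traits and the LL class to be combined on a common scale.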
Clustering analysis
Non-hierarchical clustering was performed on the Gower distance matrix using the Partitioning Around Medoids (PAM) algorithm. The optimal number of clusters (K) was determined by comparing average silhouette widths across candidate values of K from 3 to 10, to ensure robust cluster separation. Trait profiles were then compared and visualized across clusters to identify the characteristics distinguishing each group. All clustering procedures and visualizations were conducted in R using the cluster package (including its daisy function) and the factoextra package.
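The PAM-plus-silhouette procedure can be sketched in plain Python, assuming a precomputed dissimilarity matrix such as a Gower matrix. This is a simplified BUILD/SWAP implementation written for illustration (first-improvement swaps, ties broken by index order), not the `pam` function of the R cluster package. Following Rousseeuw’s convention, objects in singleton clusters receive a silhouette width of 0. The toy data are hypothetical.

```python
from itertools import product


def total_cost(D, medoids):
    """Sum of distances from each object to its nearest medoid."""
    return sum(min(D[i][m] for m in medoids) for i in range(len(D)))


def pam(D, k):
    """Simplified Partitioning Around Medoids on a square dissimilarity matrix D."""
    n = len(D)
    # BUILD: start from the most central object, then add medoids greedily.
    medoids = [min(range(n), key=lambda i: sum(D[i]))]
    while len(medoids) < k:
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(min(candidates, key=lambda c: total_cost(D, medoids + [c])))
    # SWAP: replace a medoid with a non-medoid whenever that lowers the total cost.
    cost = total_cost(D, medoids)
    improved = True
    while improved:
        improved = False
        for pos, h in product(range(k), range(n)):
            if h in medoids:
                continue
            trial = medoids[:pos] + [h] + medoids[pos + 1:]
            trial_cost = total_cost(D, trial)
            if trial_cost < cost - 1e-12:
                medoids, cost, improved = trial, trial_cost, True
    # Assign every object to its nearest medoid (cluster ids 0..k-1).
    labels = [min(range(k), key=lambda j: D[i][medoids[j]]) for i in range(n)]
    return medoids, labels


def average_silhouette(D, labels):
    """Mean silhouette width; requires at least two clusters."""
    n = len(D)
    cluster_ids = set(labels)
    widths = []
    for i in range(n):
        same = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not same:
            widths.append(0.0)  # singleton cluster
            continue
        a = sum(D[i][j] for j in same) / len(same)
        b = min(
            sum(D[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in cluster_ids
            if c != labels[i]
        )
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(widths) / n


def choose_k(D, k_values):
    """Return the K with the largest average silhouette width, plus all scores."""
    scores = {k: average_silhouette(D, pam(D, k)[1]) for k in k_values}
    return max(scores, key=scores.get), scores


# Toy one-dimensional data with three obvious groups (illustrative only).
points = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2, 20.0, 20.1]
D = [[abs(x - y) for y in points] for x in points]
best_k, scores = choose_k(D, range(2, 5))
```

For the study’s design, the same call would use `range(3, 11)` on the Gower matrix; the silhouette criterion favors the K whose clusters are both compact and well separated.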
Cluster-wise trait comparison
To evaluate trait differences among clusters, analysis of variance (ANOVA) was performed for all continuous variables. For traits showing significant differences (p < 0.05), Duncan’s multiple range test (DMRT) was subsequently applied to determine specific group-wise mean separations. Both ANOVA and DMRT analyses were conducted using the agricolae package in R. For the qualitative trait (LL), intra-cluster diversity was quantified using Shannon’s diversity index (H), calculated via the vegan package in R to assess the extent of morphological variation within clusters.
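Shannon’s index for a categorical trait is H = −Σ p_i ln p_i, where p_i is the proportion of observations falling in LL class i within a cluster. With six LL classes, H ranges from 0 (a single class) to ln 6 ≈ 1.79 (all six classes equally frequent). The Python sketch below uses the natural logarithm, which is also the default base of `vegan::diversity`. Which observations entered the counts in this study (accession means or individual replicate plants) is not stated, so the inputs here are hypothetical lists of class labels.

```python
from collections import Counter
from math import log


def shannon_index(classes):
    """Shannon diversity H = sum(p * ln(1/p)) over the observed categories."""
    counts = Counter(classes)
    n = sum(counts.values())
    # Writing each term as p * ln(1/p) keeps it non-negative (exactly 0.0 when p = 1).
    return sum((c / n) * log(n / c) for c in counts.values())


# Hypothetical LL classes observed within two clusters.
monomorphic = shannon_index(["B", "B", "B"])
two_even = shannon_index(["A", "A", "D", "D"])
```

A cluster in which every observation falls in the same LL class yields H = 0, which is how a monomorphic cluster appears on this scale.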
Correlation and geographical distribution analysis
Pearson's correlation coefficients were computed to examine inter-trait relationships within each cluster, and the resulting correlation matrices were visualized using the corrplot package in R. To examine how accessions from each collection region were distributed among clusters, mosaic plots were generated, and differences in distribution ratios between clusters and regions were evaluated using the vcd package in R.
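Within-cluster Pearson correlations can be sketched in Python as follows, assuming each accession is represented by a dictionary of trait values and a cluster label. The trait values and cluster names are hypothetical, and corrplot served only for display in the original analysis.

```python
from math import nan, sqrt


def pearson(x, y):
    """Pearson correlation coefficient; returns NaN if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return nan
    return sxy / sqrt(sxx * syy)


def within_cluster_correlations(records, labels, traits):
    """Map each cluster label to {(trait_a, trait_b): r} computed over its members only."""
    result = {}
    for cluster in sorted(set(labels)):
        members = [rec for rec, lab in zip(records, labels) if lab == cluster]
        result[cluster] = {
            (t1, t2): pearson([m[t1] for m in members], [m[t2] for m in members])
            for i, t1 in enumerate(traits)
            for t2 in traits[i + 1:]
        }
    return result


# Hypothetical trait values for six accessions in two clusters.
records = [
    {"RL": 1.0, "RA": 2.0, "RLL": 6.0},
    {"RL": 2.0, "RA": 4.0, "RLL": 4.0},
    {"RL": 3.0, "RA": 6.0, "RLL": 2.0},
    {"RL": 1.0, "RA": 3.0, "RLL": 1.0},
    {"RL": 2.0, "RA": 2.0, "RLL": 2.0},
    {"RL": 3.0, "RA": 1.0, "RLL": 3.0},
]
labels = ["Ca", "Ca", "Ca", "Cb", "Cb", "Cb"]
corr = within_cluster_correlations(records, labels, ["RL", "RA", "RLL"])
```

If coefficients are computed on accession means, clusters of three or four accessions give each r very few observations, so the same pair of traits can take very different values from one cluster to another.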
Results
Cluster composition of shepherd’s purse accessions
FAMD based on 13 phenotypic traits (RGB, BD, LL, RL, RA, RD, RV, RT, RF, RNL, RPA, RLPV, and RLL) was used to examine clustering patterns (Table 1 and Supplementary Fig. 1) and the distribution of accessions (Fig. 2). The germplasm collection was partitioned into six distinct clusters (Cluster 1 to Cluster 6), which showed clear spatial separation along Dim1 (14.9%) and Dim2 (6.0%).
Table 1.
Geographic origin and phenotypic cluster assignment of shepherd’s purse accessions.

Fig. 2.
Factor analysis of mixed data (FAMD) plot of 35 shepherd’s purse (Capsella bursa-pastoris) accessions based on 13 phenotypic traits. The traits include RGB, BD (bolting date), LL (leaf-lobed), RL (root length), RA (root surface area), RD (root diameter), RV (root volume), RT (number of root tips), RF (number of root forks), RNL (number of root links), RPA (root projected area), RLPV (root length per volume), and RLL (average root link length). The first two dimensions (Dim1 and Dim2) explain 14.9% and 6.0% of the total variation, respectively. Six distinct clusters (C1–C6) are shown, each represented by a different color and a 95% confidence ellipse. Individual accessions are labeled with their codes and plotted according to their coordinates on Dim1 and Dim2. The spatial distribution of clusters highlights phenotypic differentiation among accessions.
Cluster 1 (C1) comprised accessions K281348, K281363, K255384, and K281376, which were tightly grouped on the positive side of Dim1 and near the origin of Dim2, indicating strong internal cohesion. Cluster 2 (C2), including K281349, K281361, and K281375, was positioned slightly on the positive side of Dim1 and near the center of Dim2, with its accessions concentrated within a narrow range. Cluster 3 (C3), consisting of K281350, K281351, and K281367, was located mainly on the negative side of Dim1, near the center of the plot. Cluster 4 (C4), formed by K281353, K281371, K281377, and K909545, was spread more widely across the positive side of Dim1 and the negative side of Dim2, suggesting greater phenotypic variability within the group. Cluster 5 (C5), composed of K281354, K281355, and K281374, was distinctly separated from the other clusters, occupying the uppermost region of Dim2 with a compact structure; K281354 and K281374 were positioned at the top of Dim2, clearly delineated from all other clusters. In contrast, Cluster 6 (C6), comprising K281364, K281365, K281369, K281370, K281372, K223043, and K251370, was widely dispersed along the negative side of Dim1. Accessions K281364 and K223043 were located at the extreme negative end of Dim1, indicating phenotypic characteristics that strongly differentiated them from the remaining accessions.
Variation in morphological traits among distinct clusters
Significant differences in morphological traits were observed among the identified clusters (Fig. 3). BD was shortest in C4 and comparatively long in C3. RGB values were lowest in C4 and highest in C3, a clear phenotypic contrast between these two groups. RL was similar across C1–C4, whereas accessions in C5 and C6 had markedly shorter roots. RA was highest in C1 and lowest in C6. RD was greatest in C5 and smallest in C3. RV was notably larger in C1 and C5, whereas C6 had the smallest root volume among all clusters. RT and RPA both peaked in C1 and were significantly reduced in C6. RLPV was consistently lower in C5 and C6 than in the other clusters. RF and RNL were most abundant in C4 and least abundant in C6. Conversely, RLL was longest in C6 and shortest in C1, indicating divergent root architectural patterns among clusters.

Fig. 3.
Boxplots showing the distribution of 12 morphological traits across the six phenotypic clusters (C1–C6) displayed in the FAMD plot. Traits include bolting date (BD), red-green-blue value (RGB), root length (RL), root surface area (RA), root diameter (RD), root volume (RV), number of root tips (RT), root projected area (RPA), root length per volume (RLPV), number of root forks (RF), number of root links (RNL), and average root link length (RLL). Different letters above boxes indicate statistically significant differences among clusters based on Duncan's Multiple Range Test (DMRT) at p < 0.05.
Cluster-specific trait profiles and correlation patterns
Cluster-specific trait profiles and intra-cluster correlation patterns were visualized using radar charts and Pearson’s correlation matrices (Fig. 4). C1 (Fig. 4A) exhibited notably high values for root size-related traits, including RL, RA, and RPA, and structural attributes such as RF and RNL were also elevated. Correlation analysis revealed positive associations among RL, RA, RT, and RPA (r ≥ 0.44), with an exceptionally high correlation between RF and RNL (r = 0.98). In contrast, RGB and RLL were only weakly correlated with RA and RPA.

Fig. 4.
Radar plots and Pearson’s correlation matrices of 12 phenotypic traits across six clusters of shepherd's purse accessions. Each radar plot depicts the standardized mean values of root- and shoot-related traits within a cluster. The traits include RGB, BD (bolting date), RL (root length), RA (root surface area), RD (root diameter), RV (root volume), RT (number of root tips), RPA (root projected area), RLPV (root length per volume), RF (number of root forks), RNL (number of root links), and RLL (average root link length). The corresponding correlation matrices present pairwise Pearson’s correlation coefficients among traits within each cluster; circle size and color intensity represent the strength of the correlation, and color indicates its direction (red = positive, blue = negative). Panels: A, Cluster 1; B, Cluster 2; C, Cluster 3; D, Cluster 4; E, Cluster 5; F, Cluster 6.
C2 (Fig. 4B) displayed a balanced trait profile, with values generally close to the overall mean. However, RV, RPV, and RLL were slightly elevated relative to the other traits. RV showed strong positive correlations with both RPV and RLL (r ≥ 0.67), and RPV was also strongly positively associated with RA and RGB (r ≥ 0.82). Conversely, RT and RD were negatively correlated with RPV (r ≤ -0.64). RLL was positively correlated with RL, RLPV, RA, and RPA (r ≥ 0.67) but negatively associated with RT, RD, and BD (r ≤ -0.45). BD generally showed weak correlations with other traits within this cluster.
C3 (Fig. 4C) was characterized by elevated BD and RGB values. RGB displayed positive correlations with RA and RPA (r ≥ 0.51), whereas negative correlations were observed between RGB and traits such as RF, RNL, RT, RD, RL, and RLPV (r ≤ -0.30). No significant correlation was detected between RGB and RLL in this group.
C4 (Fig. 4D) exhibited higher-than-average values for RL, RA, RPA, RLPV, RF, and RNL, whereas BD, RGB, RT, RD, and RLL were comparatively lower. Correlation analysis indicated strong positive associations between RPA and RA, RD, and RV (r ≥ 0.86), while RL showed a weak negative correlation with RPA (r = -0.31). No significant correlations were identified between RPA and either RT or RGB.
C5 (Fig. 4E) was distinguished by elevated RGB, RD, and RV values. RGB was positively correlated with multiple traits, including RLL, BD, RL, RLPV, RPA, RA, RF, and RV (r ≥ 0.32). However, a strong negative correlation was observed between RGB and RNL (r = -0.96), indicating an inverse relationship between these characteristics.
C6 (Fig. 4F) exhibited generally lower values across most traits, except for BD and RLL, which were relatively elevated. RLL displayed negative correlations with RNL, RT, RF, RD, RV, and RGB (r ≤ -0.39), while no significant associations were observed between RLL and RLPV, RPA, RA, RL, or BD within this cluster.
Cluster-wise regional distribution and LL diversity analysis
Fig. 5A shows the proportional composition and regional distribution of accessions within each cluster as a mosaic plot. C1 and C2 comprised accessions from multiple regions, including JL, GW, GS, and GG; accessions from GG made up the largest proportion of C1, whereas JL accessions predominated in C2. C3 consisted exclusively of JL accessions, making it a region-specific group. C4 comprised accessions distributed evenly among CC, Exotic, GW, and JJ, reflecting a balanced regional composition. C5 included JL, GW, and CC accessions in comparable proportions. In contrast, C6 was composed mainly of accessions from GW, GS, Exotic, and CC, with Exotic accessions representing the highest proportion, indicating a strong influence of foreign germplasm in this cluster.
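The proportions displayed in a mosaic plot are the within-cluster shares of a cluster × region contingency table. A minimal Python sketch is given below; the cluster names and region assignments are hypothetical and are not the accession list in Table 1.

```python
from collections import Counter, defaultdict


def region_shares(clusters, regions):
    """Return {cluster: {region: share}}; the shares within each cluster sum to 1."""
    table = defaultdict(Counter)
    for cluster, region in zip(clusters, regions):
        table[cluster][region] += 1
    shares = {}
    for cluster, counts in table.items():
        total = sum(counts.values())
        shares[cluster] = {region: n / total for region, n in counts.items()}
    return shares


# Hypothetical assignments: one region-specific cluster and one mixed cluster.
clusters = ["Ca", "Ca", "Ca", "Cb", "Cb", "Cb", "Cb"]
regions = ["JL", "JL", "JL", "GW", "Exotic", "Exotic", "CC"]
shares = region_shares(clusters, regions)
```

A region-specific cluster appears as a single share of 1.0, while a mixed cluster spreads its total across several regions, which corresponds to a tile being divided into several segments in the mosaic plot.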

Fig. 5.
Regional origin distribution and morphological diversity across phenotypic clusters of shepherd’s purse. (A) Mosaic plot illustrating the proportional regional composition of shepherd's purse accessions across six phenotypic clusters (C1–C6). Accessions originated from seven geographic regions: CC (Chungcheong-do), Exotic (non-Korean sources including Canada, China, and Russia), GG (Gyeonggi-do), GS (Gyeongsang-do), GW (Gangwon-do), JJ (Jeju-do), and JL (Jeolla-do). Each cluster displays a distinct pattern of regional composition, ranging from region-specific to geographically diverse groupings. (B) Shannon’s diversity index (H) representing intra-cluster variation in the qualitative leaf-lobed (LL) trait. Higher H values indicate greater morphological diversity within a cluster, whereas lower values indicate the dominance of a single LL type.
Fig. 5B presents intra-cluster LL diversity, quantified using Shannon’s diversity index (H). C6 exhibited the highest LL diversity (H = 1.061), followed by C4 (H = 0.950) and C1 (H = 0.796), both demonstrating relatively high levels of morphological variation. C2 (H = 0.377) and C3 (H = 0.637) displayed moderate LL diversity. Notably, C5 recorded an H value of 0.000, indicating the complete absence of LL variation within the cluster and defining it as a monomorphic group characterized by a single LL type.
Discussion
In this study, morphological diversity and phenotypic similarity among shepherd’s purse accessions were comprehensively evaluated by integrating 13 phenotypic traits (RGB, BD, LL, RL, RA, RD, RV, RT, RF, RNL, RPA, RLPV, and RLL), with the aim of delineating cluster-specific characteristics, regional distribution patterns, and inter-trait relationships. The germplasm collection was partitioned into six clusters (C1–C6), which were clearly separated in the FAMD space along Dim1 (14.9%) and Dim2 (6.0%). Notably, C5 and C6 were distinctly separated from the other clusters, suggesting that these groups possess unique phenotypic profiles shaped by specific trait combinations or regional attributes (Agnieszka et al., 2022; Fernandez et al., 2022).
Analysis of trait variation among clusters revealed that root size-related traits (RL, RA, RPA, RV) were most pronounced in C1, whereas C6 consistently exhibited lower values. Although C5 also displayed reduced RL, it was characterized by a thicker RD, indicating a trait combination specific to this cluster. Structural root traits, such as RF and RNL, were highest in C4 and lowest in C6, underscoring the key role of root architecture in differentiating clusters. Additionally, RLL was longest in C6 and shortest in C1, reflecting pronounced differences in root length and structural complexity across clusters (Lynch, 2011; Funk et al., 2024). Notably, C1 and C6 occupied opposite extremes along the primary dimension of the FAMD plot, representing two contrasting growth syndromes. C1 exhibited a vigorous, highly branched root system with large root dimensions, characteristic of a fast-growing, resource-acquisitive strategy. In contrast, C6 displayed generally reduced root development, except for relatively long RLL, together with delayed BD, traits indicative of a conservative, stress-tolerant growth type. This divergence likely reflects variation in ecological strategy within the species rather than data discontinuity and may be linked to differences in genetic background and environmental adaptation, as C6 contained the highest proportion of exotic accessions (China and Canada). Collectively, these results suggest the presence of regionally adapted phenotypic strategies in Capsella bursa-pastoris, supporting ecological differentiation within the species.
Radar chart visualizations and correlation analyses further elucidated intra-cluster trait associations. Root size and volume-related traits (RL, RA, RPA, RV) were strongly and positively correlated in most clusters. Likewise, RF and RNL maintained robust mutual associations, suggesting coordinated structural development within root systems. In contrast, RGB, RLL, and BD showed cluster-specific correlation patterns. For instance, although both C3 and C5 had elevated RGB values relative to the overall mean, RGB and RLL were uncorrelated in C3 but strongly positively correlated (r = 0.91) in C5. These findings indicate that inter-trait associations in shepherd’s purse are not uniform but vary with the combination of traits characterizing each cluster (Volis et al., 2004; Lynch, 2011).
Regional distribution analysis revealed that C1 and C2 were predominantly composed of domestic accessions from GG, JL, GS, and GW, whereas C6 contained a high proportion of exotic accessions, indicating a substantial influence of foreign germplasm within this cluster. Although the inclusion of both domestic and exotic accessions broadened the phenotypic spectrum of this study, the two groups were unevenly represented, with domestic accessions forming the majority. This imbalance may have partially influenced the clustering outcome, particularly the distinct grouping of exotic accessions within C6. Cluster-specific differences should therefore be interpreted in light of this sampling structure.
C3 consisted exclusively of JL accessions, forming a region-specific group, whereas other clusters represented a balanced mixture of accessions from multiple regions. These results suggest that although geographic origin partially influenced cluster formation, certain clusters were primarily defined by phenotypic characteristics independent of regional provenance (Frankham, 2010; Gharehaghaji et al., 2017).
Cluster-wise diversity in LL, quantified using Shannon’s diversity index, revealed that C6 exhibited the highest LL diversity (H = 1.061), with clusters containing exotic accessions generally displaying greater variability. By contrast, C3, composed solely of JL accessions, showed only moderate LL diversity (H = 0.637), and C5 showed no variation (H = 0.000), characterizing it as a monomorphic group. These findings suggest that LL diversity is shaped not only by geographic origin and genetic background but also by complex interactions with other morphological traits within clusters (Sultan, 2000; Nicotra et al., 2010).
Collectively, this study systematically delineated morphological diversity and cluster-specific characteristics within the Capsella bursa-pastoris germplasm, providing valuable insights into accession-level similarities and distinctions. Root size and structural traits emerged as key determinants of cluster differentiation, whereas LL and RGB displayed variable combinatorial patterns depending on cluster-specific profiles. These observations underscore the significance of distinct morphological traits as indicators of both genetic and environmental diversity within C. bursa-pastoris germplasm and highlight their potential utility for future resource classification, selection, and breeding strategies.


