Philipp-Lorenz Glaser

Marion Scholz

Christian Huemer

Marco Calamo

Bernhard Rumpe

Monique Snoeck

Handle: 20.500.12708/225650; DOI: 10.1109/MODELS-C68889.2025.00012; Year: 2025; Issued On: 2025-12-10; Type: Publication; Subtype: Inproceedings; Peer Reviewed:

Keywords: Model repository, Dataset, UML, Open models, Curation, Community, Education, Machine learning
Abstract: Datasets of Unified Modeling Language (UML) models are becoming increasingly valuable for education, empirical research, and tool development in model-driven engineering (MDE) and conceptual modeling. In recent years, several datasets have emerged, mostly compiled through automated crawling of open platforms such as GitHub and GenMyModel. While these efforts have improved access to real-world modeling artifacts, the resulting collections often suffer from serious quality issues: they include syntactically invalid models, semantically incorrect structures, and placeholder or dummy content. Moreover, most models are not accompanied by textual domain descriptions, which are essential for understanding the intent behind the model and assessing its semantic soundness. Therefore, these model datasets are far from ideal as a source for modeling exercises or empirical MDE research. This paper presents an initial step toward a community-curated golden dataset of UML models, designed to address these limitations. Our contribution includes i) a curated set of UML models, each paired with a natural language description of the modeled domain requirements, ii) a publicly accessible web platform for exploring and querying the dataset, and iii) a structured process for community-based contribution and evaluation to support sustainable growth and quality assurance of the dataset. By fostering community involvement and providing high-quality, semantically grounded models, this work lays the foundation for a widely accepted benchmark dataset in UML-based research and education.


  Verbruggen, C. R. R., Netz, L., Glaser, P.-L., Scholz, M., Huemer, C., Calamo, M., Rumpe, B., Snoeck, M., & Bork, D. (2025). Toward a Community-Curated Golden Dataset of UML Models. In 2025 ACM/IEEE 28th International Conference on Model Driven Engineering Languages and Systems Companion (MODELS-C) (pp. 43–50). IEEE. https://doi.org/10.1109/MODELS-C68889.2025.00012

The extended EA ModelSet—a FAIR dataset for researching and reasoning enterprise architecture modeling practices

Emanuel Sallinger

Handle: 20.500.12708/221005; DOI: 10.1007/s10270-025-01278-1; Year: 2025; Issued On: 2025-02-26; Type: Publication; Subtype: Article; Peer Reviewed:

Keywords: ArchiMate, Artificial intelligence, Conceptual modeling, Dataset, Enterprise architecture, Enterprise modeling, FAIR, Machine learning
Abstract: Conceptual modeling research is increasingly investigating the application of artificial intelligence (AI) and machine learning (ML) to automate tasks like model creation, completion, analysis, and processing. This trend also applies to enterprise architecture (EA) research. In contrast to its neighboring disciplines, such as business process management, EA lacks proper guidelines, patterns, and best practices to create high-quality EA models. A currently limiting factor for conducting AI-based research to bridge these gaps is the scarcity of openly available models of adequate quality and quantity. With this paper, our aim is to address this limitation by introducing the extended EA ModelSet, a curated and FAIR repository of enterprise architecture models represented in the ArchiMate modeling language that can be used by the research and practitioner community. We report on our efforts to build the EA ModelSet and elaborate on exemplary future empirical and ML-based research that can facilitate the dataset. We hope that this paper sparks a community effort toward the further development and maintenance of the EA ModelSet.


  Glaser, P.-L., Sallinger, E., & Bork, D. (2025). The extended EA ModelSet—a FAIR dataset for researching and reasoning enterprise architecture modeling practices. Software and Systems Modeling, Article 111431. https://doi.org/10.1007/s10270-025-01278-1

Encoding semantic information in conceptual models for machine learning applications

Handle: 20.500.12708/216680; DOI: 10.34726/hss.2025.119285; Year: 2025; Issued On: 2025-01-01; Type: Thesis; Subtype: Diploma Thesis;

Keywords: conceptual modeling, encoding, machine learning
Abstract: The integration of Conceptual Modeling (CM) and Machine Learning (ML) has given rise to a growing research field known as Machine Learning for Conceptual Modeling (ML4CM), where ML techniques are applied to support modeling tasks such as classification, completion, or repair. A crucial factor in these applications is the transformation of conceptual models into ML-compatible representations, called encodings. A wide variety of encoding strategies exist that draw on different information sources within conceptual models, depending on the specific use case. However, existing ML4CM studies tend to treat encodings as fixed and focus predominantly on tuning ML algorithms or hyperparameters. Consequently, encoding strategies and their internal configuration options receive limited scrutiny during evaluation, making it difficult for researchers and practitioners to select and adapt optimal encodings for specific tasks. This thesis addresses this gap by developing and evaluating a set of configurable semantic encodings for conceptual models. Specifically, it investigates how semantic information (e.g. names, types, contextual relationships) within models can be systematically extracted and transformed into ML-compatible representations. The work adopts the Design Science Research methodology and extends the CM2ML framework with an ArchiMate parser and four semantic encoders: Bag-of-Words (BoW), Term Frequency (TF), Embeddings, and Triples. Each encoder captures distinct semantic aspects and supports extensive configurability to enable experimentation and task-specific adaptation.
Furthermore, all encodings can be interactively visualized within the framework, offering real-time insight into parameter effects and traceability to link encoded features back to their source model elements. To evaluate the proposed encodings, the thesis combines a qualitative comparison based on defined criteria with a quantitative assessment through two representative ML tasks. The first task, dummy classification, employs TF encodings to distinguish dummy views from valid ones and explores the impact of common NLP parameters and weighting schemes. The second task, node classification, aims to predict element types based on local context, using triple encodings enriched with word embeddings for element names and one-hot vectors for types. The results demonstrate the suitability of the encodings for specific ML4CM tasks and that certain encoding configurations can have a substantial influence on model performance.


  Glaser, P.-L. (2025). Encoding semantic information in conceptual models for machine learning applications [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2025.119285

EA ModelSet – A FAIR Dataset for Machine Learning in Enterprise Modeling

Emanuel Sallinger

Handle: 20.500.12708/191926; DOI: 10.1007/978-3-031-48583-1_2; Year: 2023; Issued On: 2023-11-25; Type: Publication; Subtype: Inproceedings;

Keywords: Data set, Enterprise architecture, Enterprise modeling, FAIR, Machine learning
Abstract: The conceptual modeling community and its subdivisions of enterprise modeling are increasingly investigating the potentials of applying artificial intelligence, in particular machine learning (ML), to tasks like model creation, model analysis, and model processing. A prerequisite—and currently a limiting factor for the community—to conduct research involving ML is the scarcity of openly available models of adequate quality and quantity. With the paper at hand, we aim to tackle this limitation by introducing an EA ModelSet, i.e., a curated and FAIR repository of enterprise architecture models that can be used by the community. We report on our efforts in building this data set and elaborate on the possibilities of conducting ML-based modeling research with it. We hope this paper sparks a community effort toward the development of a FAIR, large model set that enables ML research with conceptual models.


  Glaser, P.-L., Sallinger, E., & Bork, D. (2023). EA ModelSet – A FAIR Dataset for Machine Learning in Enterprise Modeling. In J. P. A. Almeida, M. Kaczmarek-Heß, A. Koschmider, & H. Proper (Eds.), The Practice of Enterprise Modeling : 16th IFIP Working Conference, PoEM 2023, Vienna, Austria, November 28 – December 1, 2023, Proceedings (pp. 19–36). Springer. https://doi.org/10.1007/978-3-031-48583-1_2

Exploring Enterprise Architecture Knowledge Graphs in Archi: The EAKG Toolkit

Syed Juned Ali

Emanuel Sallinger