Solving Econometric Problems Using Generative Artificial Intelligence Models: A Comparative Analysis of ChatGPT and Gemini
Sosnovska Y. R., Skitsko V. I.
Sosnovska, Yevheniia R., and Skitsko, Volodymyr I. (2025) “Solving Econometric Problems Using Generative Artificial Intelligence Models: A Comparative Analysis of ChatGPT and Gemini.” The Problems of Economy 4:428–442. https://doi.org/10.32983/2222-0712-2025-4-428-442
Section: Mathematical methods and models in economy
Article is written in Ukrainian
UDC 004.8:519.86
Abstract: The article presents a comparative analysis of how well modern generative artificial intelligence models solve practical tasks in econometric modeling. The study covers models of different types: the “advanced” versions with enhanced reasoning capabilities (Google Gemini 2.5 Pro and ChatGPT-5 Thinking + Study) and their optimized “light” counterparts (Google Gemini 2.5 Flash and the basic ChatGPT-5 model). The empirical basis is real data from the Ukrainian residential real estate market: a representative sample of 100 properties described by both quantitative and qualitative variables. The experiment ran each model through the full cycle of an econometric study: preliminary data processing, exploratory analysis and visualization, construction of a multiple linear regression model, diagnostics for multicollinearity and heteroscedasticity, calculation of elasticity indicators for economic interpretation, and testing of the model's predictive ability on a test sample. The results produced by the generative models were verified against benchmark calculations performed manually in MS Excel. The experiment revealed a significant difference in performance between the examined models. The Pro/Thinking class models (Gemini 2.5 Pro, ChatGPT-5 Thinking) demonstrated complete mathematical accuracy, correctly calculating the regression coefficients, the coefficient of determination, the F-statistic, and the indicators of average and marginal efficiency. In contrast, the basic and “light” versions (Gemini 2.5 Flash, ChatGPT-5) were prone to critical errors, including hallucinations in the form of fabricated data, loss of context when processing large datasets, and an inability to validate input information independently. All tested models shared two weaknesses: difficulty with tasks that require qualitative classification of heteroscedasticity types, and a tendency to ignore macro indicators in favor of microanalysis of individual variables. The authors conclude that, at the current stage of development, generative artificial intelligence cannot fully replace humans; however, “advanced” models can serve effectively as an auxiliary tool for automating routine operations, writing code, and preliminary data processing, provided that a specialist verifies the results.
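The benchmark quantities named in the abstract (regression coefficients, coefficient of determination, F-statistic, elasticity) can be reproduced independently of any chatbot. The following is a minimal sketch of such a check, not the authors' procedure: the eight observations, the variable names `area`, `rooms`, and `price`, and the helper functions `solve` and `ols` are all hypothetical and chosen only for illustration. It uses the Python standard library so that it runs without extra packages; in practice a statistics library would normally be used instead.

```python
# Toy data (hypothetical, NOT the authors' sample): price in thousand USD,
# area in square metres, number of rooms. Built as 10 + 1*area + 5*rooms + noise.
area = [40, 40, 80, 80, 40, 40, 80, 80]
rooms = [1, 3, 1, 3, 1, 3, 1, 3]
price = [57, 63, 93, 107, 57, 63, 93, 107]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(columns, y):
    """Fit y = b0 + b1*x1 + ... by OLS; return coefficients, R^2 and the F-statistic."""
    n, k = len(y), len(columns)
    # Design matrix rows with a leading 1 for the intercept.
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k + 1)] for p in range(k + 1)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k + 1)]
    beta = solve(xtx, xty)  # normal equations: (X'X) b = X'y
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_mean = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))
    return beta, r2, f_stat


beta, r2, f_stat = ols([area, rooms], price)

# Elasticity at the sample means: E_j = b_j * mean(x_j) / mean(y)
mean_price = sum(price) / len(price)
elasticity_area = beta[1] * (sum(area) / len(area)) / mean_price
elasticity_rooms = beta[2] * (sum(rooms) / len(rooms)) / mean_price

print("coefficients:", [round(b, 4) for b in beta])
print("R^2:", round(r2, 4), "F:", round(f_stat, 2))
print("elasticities:", round(elasticity_area, 4), round(elasticity_rooms, 4))
```

The toy noise is orthogonal to the regressors, so the fitted coefficients recover the generating values 10, 1, and 5 up to floating-point error. This makes the sketch easy to check by hand before comparing it against the output of a language model.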
Keywords: generative artificial intelligence, large language model, ChatGPT-5, Google Gemini 2.5, econometric modeling, data analysis, real estate market.
Fig.: 19. Tabl.: 5. Bibl.: 18.
Sosnovska Yevheniia R. – Student, Kyiv National Economic University named after Vadym Hetman (54/1 Beresteiskyi Ave., Kyiv, 03057, Ukraine). Email: sosnovska.2310937546@kneu.edu.ua
Skitsko Volodymyr I. – Candidate of Sciences (Economics), Associate Professor, Associate Professor, Department of Artificial Intelligence, Modeling and Statistics, Kyiv National Economic University named after Vadym Hetman (54/1 Beresteiskyi Ave., Kyiv, 03057, Ukraine). Email: skitsko@kneu.edu.ua