The research uses a dataset of 16 real-world business problems spanning four optimisation classes: Linear Programming (LP), Integer Programming (IP), Mixed-Integer Programming (MIP), and Nonlinear Programming (NLP). It compares two pipelines, single-step prompting and Chain-of-Thought prompting, in each of which a large language model (LLM) generates Pyomo code to solve for the optimal solution. Although the LLMs show an ability to model problems and translate them into solver code, significant issues were identified in formulating accurate constraints and producing consistent results. These challenges must be addressed before LLMs can be used reliably for optimisation tasks. Overall, Gemini 1.5 Pro delivered the best performance, followed by Claude 3 Opus, then Mistral 8x22B, and finally GPT-4.
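
For reference, the four problem classes can be written in a generic form. The following is a standard textbook sketch, not the formulation of any specific problem in the dataset; the symbols $c$, $A$, $b$, $f$, $g_j$ and the index set $I$ are notation assumed here for illustration.

```latex
\begin{align*}
\text{LP:}  \quad & \min_{x \in \mathbb{R}^n} \; c^\top x && \text{s.t. } Ax \le b,\; x \ge 0 \\
\text{IP:}  \quad & \min_{x \in \mathbb{Z}^n} \; c^\top x && \text{s.t. } Ax \le b,\; x \ge 0 \\
\text{MIP:} \quad & \min_{x \in \mathbb{R}^n} \; c^\top x && \text{s.t. } Ax \le b,\; x \ge 0,\; x_i \in \mathbb{Z} \ \ \forall i \in I \subsetneq \{1,\dots,n\} \\
\text{NLP:} \quad & \min_{x \in \mathbb{R}^n} \; f(x)     && \text{s.t. } g_j(x) \le 0,\; j = 1,\dots,m
\end{align*}
```

In the NLP case at least one of $f$ or the $g_j$ is nonlinear. The constraint sets ($Ax \le b$, $g_j(x) \le 0$) are the part of the model where the abstract reports that the LLMs most often made formulation errors.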