Treasury Trials Microsoft's AI with Mixed Results

The Canberra Times

Details

Date Published: 23 Feb 2025
Priority Score: 2
Australian: Yes
Created: 10 Mar 2025, 10:27 pm

Authors (1)

Description

Treasury staff find Microsoft's Copilot AI tool offers limited efficiency improvements, as a trial reports mixed feedback from users.

Summary

The article reports on the Australian Treasury's 14-week trial of Microsoft's Copilot AI tool, which produced mixed results. While some improvements were noted in routine tasks like document summarization, the AI's effectiveness in more complex tasks was limited. Participants found Copilot less beneficial than they had expected, citing issues with accuracy and scope of functionality. The tool's inability to access data beyond internal systems and the time required to learn prompt engineering contributed to its limited impact. The trial underscores the need for clearer use cases and adequate training to harness the potential of AI tools in government.

Body

Treasury staff are less than enthused about using Microsoft's Copilot AI tool after a 14-week trial of the software last year. A review of the trial, conducted by the Australian Centre for Evaluation, found a mixed response to the tool, with some improvements to basic tasks such as summarising meetings and documents but little improvement in more complex tasks. Staff were initially optimistic about the impact the tool would have on their workload, but as the trial continued, fewer reported satisfaction with the AI assistant.

The evaluation adds to a number of reviews of the six-month Copilot trial, which was sourced through Microsoft for an undisclosed price and has drawn mixed feedback. The Treasury evaluation drew on surveys and focus groups with the 218 Treasury staff who participated in the Copilot trial and found that expectations were high before it began. Two-thirds of participants thought Copilot could help with some of their tasks, with 15 per cent estimating Copilot could help with most of their work. However, these lofty expectations were not met, with more than half of the participants finding Copilot useful for little to none of their workload.

"Participants had high expectations of the product which were not met," the evaluation finds. "Overall usage of the product during the trial period was lower than expected."

Staff gave a range of reasons for Copilot being less useful than expected, including that the tool was less effective than other AI tools, such as ChatGPT. Other staff found the outputs from Copilot were wrong or would change over time, turning them off the AI assistant.

"[Copilot] often created fictional information when asking it to generate output," one participant said. "There seemed to be obvious errors which reduced my confidence in using Copilot," another said.

Technical limitations also reduced the tool's usefulness.
The Treasury version of Copilot was only able to access files stored on Treasury systems, rather than the entire web, and could not be used across multiple Microsoft applications or between Microsoft documents and other formats such as PDFs.

Other trial participants said the time spent learning how to direct the tool efficiently, known as prompt engineering, meant any time savings were elusive. "By the time I got through working out how I could save time, I had run out of time to actually do the work," one staffer told the review.

Copilot was effective at summarising documents, so staff could focus on evaluation and analysis, and at providing summaries of meetings. "One respondent found the meeting summary function of Copilot invaluable for identifying the main points in long meetings when they lost focus," the evaluation found.

But while participants perceived that Copilot could speed up their work, this benefit was lost on their managers, with 59 per cent of managers recording that Copilot had "no impact" on their staff's efficiency and 80 per cent saying Copilot had "no impact" on their staff's timeliness. "Qualitative data suggests that although Copilot can achieve work outcomes more quickly, the work outcomes are not necessarily better compared to human-generated work outputs," the report highlights.

The evaluation authors estimate that, to offset the cost of the licence, an APS6 staffer would need to use the tool to shift about 13 minutes per week from low-value to high-value tasks. "Although the data collected during this trial did not quantify time savings, the results of this trial suggest that the productivity benefits and time savings associated with Copilot are likely to offset the licence costs."
Following the APS-wide trial, a number of departments have continued to purchase Copilot licences, and the report recommends that future use of the tool be tied to clear use cases and supported with training and education for staff.

While AI boosters contend the technology could lift Australia's economic productivity by the hundreds of billions, the effects are not quite at that level yet, as one trial participant noted. "It has neither been revolutionary nor has it made a negative impact."