Crypto exchange Coinbase tested OpenAI's ChatGPT as a token verification tool against its standard security procedures. In more than half of the cases, the AI platform produced the same results as the manual review, but it also failed to recognize some high-risk assets.
ChatGPT approves 5 high-risk tokens; Coinbase may use it for secondary checks
Digital asset exchange Coinbase has trialled an artificial intelligence (AI) chatbot developed by OpenAI to conduct automated token reviews. The U.S.-based trading platform said that while ChatGPT was not accurate enough to be immediately integrated into its asset review process, it showed sufficient potential to merit further investigation.
The experiment is part of Coinbase’s efforts to apply efficient and effective methods to review token contracts before deciding to list an asset. The exchange noted that its blockchain security team is employing an in-house automation tool developed to assist security engineers in the review of ERC20/721 smart contracts, and described this AI initiative as follows:
With the emergence of ChatGPT by OpenAI and the buzz around its ability to detect security vulnerabilities, we wanted to test how well it would work as a front-line tool applied on a large scale, rather than a one-time code review.
Coinbase explains that "ChatGPT is expected to be beneficial in increasing productivity across a wide range of development and engineering tasks. Additionally, this AI tool can be used to optimize code and identify vulnerabilities."
The major U.S. crypto exchange conducted an experiment comparing the accuracy of a token security review performed by ChatGPT to a standard review conducted by blockchain security engineers using internal tools. To produce comparable risk scores, the chatbot first had to be taught how to identify risks as defined in the platform's own security review framework.
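To give a sense of what such a setup might look like, here is a minimal, hypothetical Python sketch of prompting a chat model with a risk framework and a contract's source code. The framework wording, model name, and scoring scale are illustrative assumptions; nothing here reflects Coinbase's actual prompts or internal tooling.

```python
# Hypothetical sketch: prompting a chat model to score a token contract
# against an exchange-style risk framework. The framework text, model
# choice, and scoring scale are illustrative assumptions, not Coinbase's
# actual prompt or internal tooling.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

RISK_FRAMEWORK = """
Score the following ERC-20/721 contract as LOW, MEDIUM, or HIGH risk.
Flag, at minimum: owner-only minting, upgradable proxies, fee-on-transfer
logic, blacklists, and self-destruct or delegatecall patterns.
Answer with the score on the first line, then a brief justification.
"""

def review_contract(source_code: str) -> str:
    """Ask the model for a risk score on a single contract."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,  # pinning temperature reduces run-to-run variability
        messages=[
            {"role": "system", "content": RISK_FRAMEWORK},
            {"role": "user", "content": source_code},
        ],
    )
    return response.choices[0].message.content

# Usage: compare the model's label with a manual review, as in the experiment.
# print(review_contract(open("Token.sol").read()))
```

In a comparison like the one described, each model-produced score would be set against the score from the manual engineering review, with mismatches, and especially underestimates, counted as failures.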
The researchers compared the risk scores of 20 smart contracts between ChatGPT and manual security reviews. The AI tool produced the same results as the manual reviews 12 times, but in five of the eight mismatches, ChatGPT mislabeled high-risk assets as low-risk. "Underestimating a risk score is far more detrimental than overestimating one," the exchange noted in a blog post.
Despite this "worst case failure" and the tool's tendency to give inconsistent answers when asked the same question repeatedly, Coinbase says the efficiency of ChatGPT's review was remarkable. The company expects that further prompt engineering will improve the tool's accuracy.
For now, the bot cannot be relied upon to perform security reviews on its own, Coinbase concludes. However, the company also noted that if its team is able to improve the tool's accuracy, it could be adopted as a secondary check in the review process.