Large language models (LLMs) such as ChatGPT and Google Gemini are trained on large datasets to generate informative responses to prompts. Yi Cao, an assistant professor of accounting at the Donald G. Costello College of Business at George Mason University, and Long Chen, associate professor and area chair of accounting at Costello, are actively exploring how individual investors can use LLMs to glean market insights from the dizzying array of available data about companies.
Their new working paper, co-authored with Jennifer Wu Tucker of the University of Florida and Chi Wan of the University of Massachusetts Boston, examines AI’s ability to identify “peer firms,” or product market competitors in an industry.
Cao explains the significance of selecting peers by relating this process to the real-estate market. “The capital market is similar to the real-estate market in that a firm’s value is partially determined by the value of its peers. In the real-estate market, we price a home based on the value of comparable properties in the neighborhood, or the so-called 'comps.' In our paper, we aim to leverage the power of LLMs to identify comps for evaluating firm value.”
This task is at least as difficult as it is essential. Selecting peers takes considerable time, skill and effort to gather, aggregate and manage data. However, the researchers reasoned that LLMs could do much of the heavy lifting of data aggregation and analysis for individual investors and produce a list of peers comparable in validity to one identified by human experts.
“The advantage is in the capability to utilize all the information potentially out there so that it is at least performing as well as other traditional methods that can help us investors and researchers,” says Cao.
For the study, Chen and Cao employed Bard from Google, now known as “Gemini,” as their LLM of choice because “Bard has a greater ability to utilize its pre-training data, which is arguably larger than ChatGPT’s and with more parameters,” says Cao.
After defining “product market competition” and forming a prompt for Bard, the researchers instructed Bard to limit its knowledge pool to a specific year within the period 1981-2023, in order to avoid “look-ahead bias,” that is, information from later years contaminating the results.
They limited focal firms to large, publicly listed companies because less data is available for smaller or private firms. In all, the dataset comprised over 300,000 focal firm-years.
On average, the LLM could generate about seven peer firms for a focal firm, a number that is similar to the SEC recommendations on how firms should disclose their segments.
The researchers then compared the LLM’s performance to the lists generated by three human experts for a set of 40 leading computer software companies. The average overlap was a little over 40 percent, greater than expected.
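To make the idea of "overlap" concrete, here is a minimal Python sketch. It assumes overlap means the share of expert-chosen peers that also appear on the LLM's list. The article does not spell out the paper's exact definition, so the function name `peer_overlap`, the matching rule and the firm names are all illustrative assumptions.

```python
def peer_overlap(llm_peers, expert_peers):
    """Share of the expert-chosen peers that also appear on the LLM's list.

    Assumed definition for illustration; the paper may measure overlap differently.
    """
    # Normalize names so trivial differences in case or spacing do not count as mismatches.
    llm = {name.strip().lower() for name in llm_peers}
    expert = {name.strip().lower() for name in expert_peers}
    if not expert:
        return 0.0
    return len(llm & expert) / len(expert)


# Made-up placeholder names, not data from the study.
llm_list = ["Firm A", "Firm B", "Firm C", "Firm D", "Firm E"]
expert_list = ["Firm B", "Firm C", "Firm F", "Firm G", "Firm H"]

overlap = peer_overlap(llm_list, expert_list)
print(f"{overlap:.0%}")  # 2 of the 5 expert picks match, so this prints 40%
```

Averaging this ratio across focal firms and across experts would give a single summary figure of the kind reported above.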
They also compared the AI-identified peer lists to two alternative systems for identifying peers: the federal government’s Standard Industrial Classification (SIC) codes and Text-based Network Industry Classification (TNIC), which compares firms based on linguistic similarities in their 10-K filings. The LLM’s output overlapped significantly with TNIC’s. In addition, the peers identified by the LLM were generally a better fit than those from SIC and TNIC, as their monthly stock returns hewed closer to the focal firm’s.
But TNIC outperformed the LLM in identifying peers for mid-sized firms within the sample, indicating that it is not a clear-cut case of universal LLM superiority.
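One simple way to picture the return-based comparison of peer lists is to correlate a focal firm's monthly returns with the average return of each candidate peer group. The sketch below does that in plain Python. It is an assumption-laden illustration rather than the paper's actual test: the helper names `pearson` and `peer_comovement`, the equal weighting of peers and the sample numbers are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def peer_comovement(focal_returns, peer_returns):
    """Correlate focal-firm returns with the equal-weighted average peer return.

    peer_returns is a list of monthly return series, one per peer firm.
    """
    months = len(focal_returns)
    avg_peer = [
        sum(series[t] for series in peer_returns) / len(peer_returns)
        for t in range(months)
    ]
    return pearson(focal_returns, avg_peer)


# Made-up monthly returns, not data from the study.
focal = [0.02, -0.01, 0.03, 0.00]
group_a = [[0.02, -0.01, 0.03, 0.00]]   # moves with the focal firm
group_b = [[-0.02, 0.01, -0.03, 0.00]]  # moves against the focal firm

# A higher value suggests the peer group tracks the focal firm more closely.
print(peer_comovement(focal, group_a) > peer_comovement(focal, group_b))
```

Under this reading, a peer list is a "better fit" when its score is higher, and the score can be computed separately for the lists produced by the LLM, SIC and TNIC.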
“We need to understand that LLMs are actually a very powerful, new tool, unmatched in their efficiency, ability to process vast amounts of information at a low cost, and accessibility to the general public,” Cao notes.
“It’s especially beneficial for individual investors—as all the cost concerns that we’re talking about are especially relevant for them,” Chen adds.
Regarding the future of LLMs, Chen states, “There are always costs and benefits associated with using generative AI. It is uncertain whether current systems will soon be obsolete.” When asked about the SEC adopting an AI tool for investors, Chen emphasizes that users need to understand the pros and cons of using AI in order to make informed judgments “because AI cannot be held responsible for the information it provides or for how it is utilized.”
Chen concludes, “We need to embrace this new technology, but we must recognize that it is not yet in a perfect state. Competition to improve the technology is fierce. Our findings might just represent the lower bound of the effectiveness of the technology.”