Annual Report Text Processing

This project analyzed the audit reports of publicly traded companies. The focus was on reports that contained a “Critical Audit Matter,” a major item that the auditors want to highlight, often involving uncertainty or judgement calls. The challenge was that not every financial report contains a Critical Audit Matter, and finding one can mean scrolling through dozens of pages.
My goal was to identify annual reports that contained Critical Audit Matters, extract the relevant text, and then host it in an easily searchable database.
To accomplish those goals, I used R in conjunction with the Securities and Exchange Commission’s API to bulk download thousands of annual financial reports from publicly traded companies. I then used R to reshape the text into a format in which carefully crafted regular expressions could identify and extract any Critical Audit Matters. Once the text was extracted, I used R Markdown to create a PDF document containing each annual report’s Critical Audit Matter. This solution cut through the noise in annual reports, isolating only the relevant text. The PDFs were then uploaded to a database that the rest of the newsroom could search.
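The extraction step can be illustrated with a minimal sketch. The original pipeline was written in R and its regular expressions are not shown here, so the following is a Python illustration under stated assumptions: the heading wording, the "no critical audit matters" phrase, and the end markers (the auditor's "/s/" signature line or the "We have served as the Company" tenure sentence) are guesses at typical audit-report language, and the names CAM_PATTERN, NO_CAM_PATTERN, and extract_cam are hypothetical.

```python
import re

# Hypothetical patterns: these are assumptions about typical audit-report
# wording, not the expressions used in the original R pipeline.
CAM_PATTERN = re.compile(
    r"(Critical Audit Matters?\b.*?)"              # heading plus body, non-greedy
    r"(?=/s/|We have served as the Company|\Z)",   # stop at signature, tenure line, or end
    re.IGNORECASE | re.DOTALL,
)

# Reports with nothing to disclose often say so explicitly; treat that as "no CAM".
NO_CAM_PATTERN = re.compile(
    r"determined that there (?:are|were) no critical audit matters",
    re.IGNORECASE,
)


def extract_cam(report_text):
    """Return the Critical Audit Matter section as one string, or None."""
    # Collapse line breaks and repeated spaces so the patterns can span lines.
    text = re.sub(r"\s+", " ", report_text).strip()
    if NO_CAM_PATTERN.search(text):
        return None
    match = CAM_PATTERN.search(text)
    return match.group(1).strip() if match else None
```

Calling extract_cam on the plain text of an audit report returns the section from the first "Critical Audit Matter" heading up to the signature block, or None when the report has no such section. A production version would need to handle tables of contents and other earlier mentions of the phrase, which this sketch does not attempt.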
My colleague was able to use the database to report on how auditors are addressing climate risks.
Because the data processing pipeline (from download to analysis to extraction) was written in R, the process is fully reproducible and can be rerun on a regular basis.
Description
Bloomberg Industry Group
August 3, 2023
Analysis of financial reports revealing that auditors often stay silent on the impact of climate change on companies' long-term success and survival.