financial-reports-analysis
Version: 2
Description
The adapted AI model for financial reports analysis (preview) is a state-of-the-art small language model (SLM) based on the Phi-3-small-128k architecture, designed specifically for analyzing financial reports. It has been fine-tuned on a few hundred million tokens derived from instruction data over financial documents, including SEC filings (10-K, 10-Q, 8-K reports) and mathematical reasoning tasks.
The model is optimized to handle complex financial language and to understand data contained in tables, making it suitable for SEC report analysis tasks, including data extraction, summarization, and applying common financial formulas. It can also perform more complex reasoning tasks, such as comparing companies and identifying trends across different time periods.
NOTE: This model is in preview.
Model Architecture
The adapted AI model for financial reports analysis is a dense, decoder-only transformer model with 7B parameters, optimized for financial reports analysis. It supports a 128K context length, making it capable of processing long financial documents and providing coherent, context-aware completions. The model is fine-tuned with supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) to ensure alignment with human preferences and safety guidelines.
Training Datasets
The adapted AI model for financial reports analysis was fine-tuned on a highly specialized dataset totaling a few hundred million tokens, including:
Financial benchmarks (classification):
Financial benchmarks (exact match):
General knowledge benchmarks (comparison with base model):
*All evaluations were conducted using temperature 0.3.
Hardware
Note that by default, the Phi-3-small-128K-Instruct model uses flash attention, which requires certain types of GPU hardware to run. We have tested on the following GPU types:
- SEC filings, including 10-K, 10-Q, and 8-K reports
- Textbook-like materials focusing on finance
- Verified common questions and answer pairs on SEC reports, scaled synthetically
- Mathematical reasoning tasks
- Financial table understanding
- Extraction and summarization of information from financial documents
- Answering questions related to SEC reports, such as risk assessment and analysis of companies’ financial performance
- Getting the data
- Splitting the data (chunking)
- Saving metadata
- Processing the text
- Adding headers
- Split your HTML filing into chunks; we recommend chunking by page.
- Save the page number as metadata
- Occasionally a page may contain several sections (usually referred to as Items in SEC filings).
- We recommend further chunking those by section
- Save section name as metadata
- Convert any free text (excluding tables; see the next step) to Markdown using any of the available Markdown tools (for example, edgartools (dgunning/edgartools: Navigate SEC Edgar data in Python), Markdownify, or any other available method).
- Keep all tables in HTML format. Strip all styling attributes except the colspan/rowspan attributes, which are needed to determine whether a table header spans several columns or rows.
- Because questions often refer to chunks from different documents across various companies and periods of time, we found that adding a header with a brief title based on the chunk's metadata described above (company name, reference period, and type of document) to the content of the chunk in the prompt improves model performance.
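Taken together, the preprocessing steps above can be sketched in Python using only the standard library. This is an illustrative sketch, not part of the model's documentation: the `<hr>` page-break pattern, the header format, and all function names are assumptions, and real filings may need a fuller HTML toolkit.

```python
import re
from html.parser import HTMLParser


def chunk_filing_by_page(html):
    """Split filing HTML into page chunks, saving the page number as metadata.
    Assumes pages are delimited by <hr> page-break tags, as in many EDGAR filings."""
    pages = re.split(r"<hr[^>]*>", html, flags=re.IGNORECASE)
    return [{"page": i + 1, "content": p.strip()}
            for i, p in enumerate(pages) if p.strip()]


class TableAttributeStripper(HTMLParser):
    """Re-emit HTML, keeping only the colspan/rowspan attributes on each tag."""
    KEEP = {"colspan", "rowspan"}

    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        kept = "".join(f' {k}="{v}"' for k, v in attrs
                       if k.lower() in self.KEEP and v is not None)
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)


def strip_table_styling(table_html):
    """Strip all attributes except colspan/rowspan from table HTML."""
    parser = TableAttributeStripper()
    parser.feed(table_html)
    parser.close()
    return "".join(parser.out)


def build_chunk_prompt(chunk, company, period, doc_type, section=None):
    """Prepend a brief metadata header (company, document type, period, and
    optionally section) to the chunk content, as recommended above."""
    header = f"{company} | {doc_type} | {period}"
    if section:
        header += f" | {section}"
    return f"{header}\n\n{chunk['content']}"
```

For example, `build_chunk_prompt(chunks[0], "Contoso Ltd.", "FY2023", "10-K", section="Item 1A. Risk Factors")` yields a chunk whose first line identifies the company, document type, and reference period.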
- Quality of Service: The adapted AI model for financial reports analysis is trained primarily on English text. Languages other than English do not perform as well. English language varieties with less representation in the training data might not perform as well as standard American English.
- Representation of Harms & Perpetuation of Stereotypes: This model can over- or under-represent groups of people, erase representation of some groups, or reinforce demeaning or negative stereotypes. Despite safety post-training, these limitations may still be present due to differing levels of representation of different groups or prevalence of examples of negative stereotypes in training data that reflect real-world patterns and societal biases.
- Inappropriate or Offensive Content: This model may produce other types of inappropriate or offensive content, which may make it inappropriate to deploy for sensitive contexts without additional mitigations that are specific to the use case.
- Information Reliability: Language models can generate nonsensical content or fabricate content that might sound reasonable but is inaccurate or outdated.
- Allocation: Models may not be suitable for scenarios that could have consequential impact on legal status or the allocation of resources or life opportunities (for example: housing, employment, credit, etc.) without further assessments and additional debiasing techniques.
- High-Risk Scenarios: Developers should assess the suitability of using models in high-risk scenarios where unfair, unreliable or offensive outputs might be extremely costly or lead to harm. This includes providing advice in sensitive or expert domains where accuracy and reliability are critical (for example: legal or health advice). Additional safeguards should be implemented at the application level according to the deployment context.
- Misinformation: Models may produce inaccurate information. Developers should follow transparency best practices and inform end-users they are interacting with an AI system. At the application level, developers can build feedback mechanisms and pipelines to ground responses in use-case specific, contextual information, a technique known as Retrieval Augmented Generation (RAG).
- Generation of Harmful Content: Developers should assess outputs for their context and use available safety classifiers or custom solutions appropriate for their use case.
- Misuse: Other forms of misuse such as fraud, spam, or malware production may be possible, and developers should ensure that their applications do not violate applicable laws and regulations.
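The Retrieval Augmented Generation (RAG) pattern mentioned under Misinformation can be illustrated with a deliberately naive keyword retriever. The scoring function, prompt format, and function names below are illustrative assumptions only; production systems typically use embedding-based retrieval over a document index.

```python
def retrieve(query, documents, k=2):
    """Rank documents by naive keyword overlap with the query (illustrative only)."""
    q = set(query.lower().split())
    return sorted(documents,
                  key=lambda d: len(q & set(d.lower().split())),
                  reverse=True)[:k]


def build_grounded_prompt(query, documents, k=2):
    """Ground the model's answer in retrieved, use-case-specific context."""
    context = "\n\n".join(retrieve(query, documents, k))
    return ("Answer the question using only the context below.\n\n"
            f"Context:\n{context}\n\n"
            f"Question: {query}")
```

The grounded prompt is then sent to the model in place of the bare question, so that answers can be checked against the retrieved context.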
| Benchmark | Adapted-AI-model-for-financial-reports-analysis | Phi3-small-128k | GPT-4o-mini | GPT-4o |
| --- | --- | --- | --- | --- |
| SECQUE | 73.8 | 58.2 | 66.8 | 78.1 |
| Benchmark | Adapted-AI-model-for-financial-reports-analysis | Phi3-small-128k | GPT-4o-mini | GPT-4o |
| --- | --- | --- | --- | --- |
| Twitter SA | 85.6 | 70 | 73.9 | 80.4 |
| Twitter Topics | 87 | 48.6 | 61.7 | 63.8 |
| FiQA SA | 75.4 | 80.8 | 77.4 | 78.2 |
| FPB | 79.6 | 72.7 | 78.4 | 82.8 |
| Average F1 | 81.9 | 68 | 72.8 | 76.3 |
| Benchmark | Adapted-AI-model-for-financial-reports-analysis | Phi3-small-128k | GPT-4o-mini | GPT-4o |
| --- | --- | --- | --- | --- |
| ConvFinQA | 76.2 | 71.1 | 78.3 | 75.4 |
| FinQA | 66.1 | 63.5 | 68.9 | 69.9 |
| TACT | 64.5 | 58.9 | 66.1 | 71 |
| Average exact match | 68.9 | 64.5 | 71.1 | 72.1 |
| Benchmark | Adapted-AI-model-for-financial-reports-analysis | Phi3-small-128k | % Difference |
| --- | --- | --- | --- |
| TriviaQA | 76.2 | 71.1 | 0% |
| MedQA | 66.1 | 63.5 | 3.5% |
| MMLU | 64.5 | 58.9 | -1.3% |
| PIQA | 68.9 | 64.5 | 1.1% |
| WinoGrande | 79 | 80 | -1.2% |
- NVIDIA A100
- NVIDIA A6000
- NVIDIA H100
Model Specifications
License: MIT
Last Updated: January 2025
Input Type: Text
Output Type: Text
Publisher: Microsoft
Languages: 1 Language