Automating Trade Logging from CMC Markets
Like many retail brokers, CMC Markets doesn’t offer an API or robust reporting tools. For anyone who wants clean trade history for analytics, tax, or journaling, this quickly becomes painful. My solution was to automate the process end-to-end: from receiving a trade confirmation email, to structured trade data in Google Sheets.
And yes — PDFs are the villain of this story.
The Workflow
- Trade execution → I place a trade on CMC Markets.
- Email notification → CMC emails me a trade confirmation (with an attached PDF).
- Workflow trigger → Once per day, an n8n workflow scans for new emails.
- PDF preprocessing → Since CMC’s PDFs are laid out in two columns, the text extraction would normally jumble data. I fix this by splitting the PDF into vertical halves with Stirling PDF before extracting text.
- Data extraction → Text is parsed and mapped into a clean JSON schema (ticker, price, brokerage, dates, etc.).
- Storage → A row is appended to Google Sheets with the structured trade data.
Why Use an LLM?
CMC’s PDFs aren’t static:
- The format has changed over time
- Content shifts depending on the product and trade details
A plain regex parser would constantly break. Instead, I use an LLM (Google Gemini API) to harden the parsing:
- It understands variations in layout and wording
- It can normalize into a predictable schema
- It reduces maintenance overhead when CMC inevitably tweaks their template again
The Tech Stack
- n8n (Dockerized, SQLite backend) → workflow orchestration
- If you do the same on a serverless platform, make sure to turn off CPU throttling!
- Stirling PDF (Dockerized, via API) → PDF preprocessing
- Google Gemini API → intelligent extraction into JSON schema
- Google Sheets → storage and reporting
- GCP → hosting and OAuth2 integration with Google products
The N8N Workflow
Sequence Diagram
Architecture
Preprocessing Challenge
At first, I tried converting the PDFs to images, then back to text using OCR. The results were unreliable — OCR mangled numbers, tickers, and formatting.
The breakthrough was splitting the PDF into vertical halves before extraction. This preserves reading order and makes the text parseable. From there, Gemini can do its job reliably.
_____________
Conclusion
Can we all agree: Enough with the pdfs already?

.png)
Comments
Post a Comment