Automating Trade Logging from CMC Markets

Like many retail brokers, CMC Markets doesn’t offer an API or robust reporting tools. For anyone who wants clean trade history for analytics, tax, or journaling, this quickly becomes painful. My solution was to automate the process end-to-end: from receiving a trade confirmation email, to structured trade data in Google Sheets.

And yes — PDFs are the villain of this story.

The Workflow

Trade execution → I place a trade on CMC Markets.
Email notification → CMC emails me a trade confirmation (with an attached PDF).
Workflow trigger → Once per day, an n8n workflow scans for new emails.
PDF preprocessing → Since CMC’s PDFs are laid out in two columns, the text extraction would normally jumble data. I fix this by splitting the PDF into vertical halves with Stirling PDF before extracting text.
Data extraction → Text is parsed and mapped into a clean JSON schema (ticker, price, brokerage, dates, etc.).
Storage → A row is appended to Google Sheets with the structured trade data.

Why Use an LLM?

CMC’s PDFs aren’t static:

The format has changed over time
Content shifts depending on the product and trade details

A plain regex parser would constantly break. Instead, I use an LLM (Google Gemini API) to harden the parsing:

It understands variations in layout and wording
It can normalize into a predictable schema
It reduces maintenance overhead when CMC inevitably tweaks their template again

The Tech Stack

n8n (Dockerized, SQLite backend) → workflow orchestration

If you do the same on a serverless platform, make sure to turn off CPU throttling!

Stirling PDF (Dockerized, via API) → PDF preprocessing
Google Gemini API → intelligent extraction into JSON schema
Google Sheets → storage and reporting
GCP → hosting and OAuth2 integration with Google products

The N8N Workflow

Sequence Diagram

Architecture

Preprocessing Challenge

At first, I tried converting the PDFs to images, then back to text using OCR. The results were unreliable — OCR mangled numbers, tickers, and formatting.

The breakthrough was splitting the PDF into vertical halves before extraction. This preserves reading order and makes the text parseable. From there, Gemini can do its job reliably.

_____________

Conclusion

Can we all agree: Enough with the pdfs already?

JSON Schema Provided to Gemini

{

"$schema": "http://json-schema.org/draft-07/schema#",

"$id": "https://example.com/trade-confirmation.schema.json",

"title": "Trade Confirmation",

"description": "Schema for extracted trade confirmation data from buy/sell documents",

"type": "object",

"properties": {

"price": {

"type": "number",

"description": "Price per unit/share",

"minimum": 0,

"exclusiveMinimum": true,

"examples": [31.7813, 0.045, 125.50]

"count": {

"type": "integer",

"description": "Quantity of units/shares traded",

"minimum": 1,

"examples": [31, 50000, 1000]

"buyOrSell": {

"type": "string",

"description": "Transaction type",

"enum": ["BUY", "SELL"],

"examples": ["BUY", "SELL"]

"brokerage": {

"type": "number",

"description": "Brokerage fees charged",

"minimum": 0,

"examples": [0, 10, 19.95]

"tax": {

"type": "number",

"description": "GST/tax amount",

"minimum": 0,

"examples": [0, 1, 1.99]

"total": {

"type": "number",

"description": "Total amount - for BUY: amount payable, for SELL: net proceeds",

"minimum": 0,

"exclusiveMinimum": true,

"examples": [985.22, 2239, 12550]

"ticker": {

"type": "string",

"description": "Security ticker/code symbol",

"pattern": "^[A-Z]{2,5}$",

"examples": ["QSML", "CDT", "NV1", "AAPL", "TSLA"]

"transactionDate": {

"type": "string",

"format": "date",

"description": "Date when the transaction occurred",

"pattern": "^\\d{4}-\\d{2}-\\d{2}$",

"examples": ["2025-09-01", "2022-05-05", "2024-12-15"]

"settlementDate": {

"type": "string",

"format": "date",

"description": "Date when the transaction settles",

"pattern": "^\\d{4}-\\d{2}-\\d{2}$",

"examples": ["2025-09-03", "2022-05-09", "2024-12-17"]

}

"required": [

"price",

"count",

"buyOrSell",

"brokerage",

"tax",

"total",

"ticker",

"transactionDate",

"settlementDate"

"additionalProperties": false,

"examples": [

{

"price": 31.7813,

"count": 31,

"buyOrSell": "BUY",

"brokerage": 0,

"tax": 0,

"total": 985.22,

"ticker": "QSML",

"transactionDate": "2025-09-01",

"settlementDate": "2025-09-03"

{

"price": 0.045,

"count": 50000,

"buyOrSell": "SELL",

"brokerage": 10,

"tax": 1,

"total": 2239,

"ticker": "CDT",

"transactionDate": "2022-05-05",

"settlementDate": "2022-05-09"

}

]

}

Tony Cerqui

Search This Blog

Automation of Personal Share Reporting.

Automating Trade Logging from CMC Markets

The Workflow

Why Use an LLM?

The Tech Stack

The N8N Workflow

Sequence Diagram

Architecture

Preprocessing Challenge

JSON Schema Provided to Gemini

Comments

Post a Comment