ML on Temporal Graphs — Full Deck
Linguae-ML Seminar

Machine Learning on Temporal Graphs

From Foundations to Emerging Frontiers

A Bit About Myself

🎓
Bachelor's and Master's in Computer Science at Imperial College and Cambridge (2015-2019)
🤖
PhD on Machine Learning on Graphs between Twitter and Imperial College (2019-2023)
🧬
ML Researcher on generative models for structural biology and drug discovery at Vant AI (2024-2025)
🐬
PostDoc on ML for decoding non-human communication @ Sapienza
🤖 🎓 🧠 🧬 💊 🔬 🦜 🐘 🐬 🐠 🐙 🤿

Why Should We Care About ML on Graphs?

Networks are everywhere
And graphs are a great way to model them
Functional Networks
Interaction Networks
Social Networks
Molecules
Image Credit: wolfram.com · gatton.uky.edu · Papo et al., Frontiers in Human Neuroscience 2014 · Madhavicmu / Wikimedia Commons CC-BY-SA-4.0
Syntax Trees
Cognitive Maps
What model should we use for graphs?
Modality
Data
Architecture
🖼️ Images
Example image
CNN
📝 Text
The cat sat on the mat
Transformer
🕸️ Graphs
Social network graph
?

(Static) Graphs and Graph Tasks

Graphs: nodes, edges, and features
G = (V, E)
V = nodes
E = edges
Figure: example graph on nodes u, v, w, x
$A \in \mathbb{R}^{n \times n}$
A = Adjacency Matrix
     u   v   w   x
u    0   1   1   1
v    1   0   0   1
w    1   0   0   1
x    1   1   1   0
$X \in \mathbb{R}^{n \times d}$
X = node features
     f1    f2    f3
u    0.8   0.4   1.7
v    0.2   0.9   1.4
w    0.6   0.1   1.9
x    0.4   0.7   1.6
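As a minimal sketch of the representation above, the adjacency matrix and the feature matrix can be built from an edge list. The edge list is read off the example adjacency matrix; the variable names are illustrative.

```python
# Node order and undirected edges read off the example adjacency matrix.
nodes = ["u", "v", "w", "x"]
edges = [("u", "v"), ("u", "w"), ("u", "x"), ("v", "x"), ("w", "x")]

index = {name: i for i, name in enumerate(nodes)}
n = len(nodes)

# Adjacency matrix A (n x n): A[i][j] = 1 if nodes i and j share an edge.
A = [[0] * n for _ in range(n)]
for a, b in edges:
    A[index[a]][index[b]] = 1
    A[index[b]][index[a]] = 1  # undirected graph, so A is symmetric

# Node feature matrix X (n x d): one row per node, d = 3 features.
X = [
    [0.8, 0.4, 1.7],  # u
    [0.2, 0.9, 1.4],  # v
    [0.6, 0.1, 1.9],  # w
    [0.4, 0.7, 1.6],  # x
]

# Row sums of A give the node degrees.
degrees = [sum(row) for row in A]
```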
Tasks on Graphs
Graph classification
Predict a label for the whole graph.
Figure: Model → Drug-like? Yes / No
Example: molecular property prediction, fraud detection, document classification.
Node classification
Predict labels for individual nodes using graph context.
Figure: Model → Omnivore / Vegetarian
Example: user attributes, protein function, paper topic prediction.
Link prediction
Score missing or future edges between pairs of nodes.
Figure: candidate edge marked "?"
Example: recommendations, knowledge graph completion, social tie prediction.
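The three tasks differ mainly in what is decoded from the node embeddings. The sketch below assumes embeddings are already available and uses toy prediction heads (mean pooling, a linear scorer, a dot product); these heads and all names are illustrative choices, not something the slides prescribe.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def node_score(h_i, w):
    """Node classification: score one node from its own embedding."""
    return sigmoid(dot(h_i, w))

def graph_score(H, w):
    """Graph classification: mean-pool all node embeddings, then score."""
    d = len(H[0])
    pooled = [sum(h[k] for h in H) / len(H) for k in range(d)]
    return sigmoid(dot(pooled, w))

def link_score(h_i, h_j):
    """Link prediction: score a candidate edge from its two endpoint embeddings."""
    return sigmoid(dot(h_i, h_j))

H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy node embeddings
w = [0.5, -0.5]                            # toy classifier weights
```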

(Static) Graph Neural Networks

Graph Neural Networks (GNNs)
Convolutional GNN
$\mathbf{H}^{(k)} = \sigma\left(\tilde{\mathbf{A}} \mathbf{H}^{(k-1)} \mathbf{W}^{(k)}\right)$
$\mathbf{H}^{(0)} = \mathbf{X}$
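A minimal sketch of one convolutional layer. Two assumptions are made here because the slide does not define them: Ã is taken to be the symmetrically normalized adjacency with self-loops, D^{-1/2}(A + I)D^{-1/2}, and σ is ReLU. The weight matrix `W1` is made up for illustration.

```python
import math

def matmul(P, Q):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def normalized_adjacency(A):
    """Assumed form of A-tilde: D^{-1/2} (A + I) D^{-1/2}."""
    n = len(A)
    A_hat = [[A[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_hat]
    return [[A_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def gcn_layer(A_tilde, H, W):
    """One layer: H_next = ReLU(A_tilde @ H @ W)."""
    Z = matmul(matmul(A_tilde, H), W)
    return [[max(0.0, z) for z in row] for row in Z]

A = [[0, 1, 1, 1], [1, 0, 0, 1], [1, 0, 0, 1], [1, 1, 1, 0]]
X = [[0.8, 0.4, 1.7], [0.2, 0.9, 1.4], [0.6, 0.1, 1.9], [0.4, 0.7, 1.6]]
W1 = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]  # hypothetical 3 -> 2 weight matrix

H1 = gcn_layer(normalized_adjacency(A), X, W1)  # H^(1), with H^(0) = X
```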
Message-Passing GNN
$\mathbf{m}_i^{(k)} = \text{AGG}^{(k)}\!\left(\{\{\mathbf{h}_j^{(k-1)} : (i,j) \in E\}\}\right)$
$\mathbf{h}_i^{(k)} = \text{COM}^{(k)}\!\left(\mathbf{h}_i^{(k-1)}, \mathbf{m}_i^{(k)}\right)$
$\mathbf{h}_i^{(0)} = \mathbf{x}_i$
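A minimal sketch of one message-passing layer, with AGG and COM passed in as functions. The element-wise sum aggregator and the additive combine step are toy choices made for illustration; in practice both are usually learned.

```python
def message_passing_layer(h, edges, agg, com):
    """One layer: m_i = AGG({{h_j : (i, j) in E}}), then h_i' = COM(h_i, m_i)."""
    neighbors = {i: [] for i in h}
    for i, j in edges:
        neighbors[i].append(j)
    new_h = {}
    for i in h:
        m_i = agg([h[j] for j in neighbors[i]], dim=len(h[i]))
        new_h[i] = com(h[i], m_i)
    return new_h

def sum_agg(vectors, dim):
    """Permutation-invariant aggregator: element-wise sum (zeros if no neighbors)."""
    out = [0.0] * dim
    for vec in vectors:
        out = [a + b for a, b in zip(out, vec)]
    return out

def add_com(h_i, m_i):
    """Toy combine step: add the aggregated message to the node's own state."""
    return [a + b for a, b in zip(h_i, m_i)]

# Undirected edges are listed in both directions so every node receives messages.
edges = [("u", "v"), ("v", "u"), ("u", "w"), ("w", "u")]
h0 = {"u": [1.0, 0.0], "v": [0.0, 1.0], "w": [2.0, 2.0]}
h1 = message_passing_layer(h0, edges, sum_agg, add_com)
```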
Transformers are GNNs
On the fully connected graph
Figure: the tokens of a sentence as nodes of a fully connected graph
Graph Transformers
Transformer-style attention, but with graph structure injected
Core idea
  • Self-attention lets every node aggregate from all other nodes
  • Graph information enters through special positional encodings
$\mathrm{Attn}(i,j) \leftarrow q_i^\top k_j + b_{\mathrm{graph}}(i,j)$
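A minimal sketch of the attention rule above: a graph-dependent bias is added to the query-key logits before the softmax. Using the negative hop distance as the bias is an assumed, fixed choice for illustration; graph transformers typically learn this term.

```python
import math

def graph_biased_attention(Q, K, bias):
    """Row-wise softmax over q_i . k_j + b_graph(i, j)."""
    n = len(Q)
    weights = []
    for i in range(n):
        logits = [sum(a * b for a, b in zip(Q[i], K[j])) + bias[i][j] for j in range(n)]
        top = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(l - top) for l in logits]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Path graph 0 - 1 - 2; the bias is the negative hop distance (an assumed choice).
hops = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
bias = [[-float(d) for d in row] for row in hops]

# Identical queries and keys, so any preference comes from the graph bias alone.
Q = [[1.0, 0.0]] * 3
K = [[1.0, 0.0]] * 3
attn = graph_biased_attention(Q, K, bias)
```

With identical content vectors, each node attends most to itself and least to the node farthest away in the graph.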

Dynamic Graphs

Some Examples of Dynamic Graphs
Graphs changing over time
Social Networks
Interaction Networks
From Static to Dynamic Graphs
$G=(V,E,X)$
Static Graph
• No notion of time
• Examples: molecules, syntax trees
$G_t=(V,E,X_t)$
Spatio-Temporal Graph
• Fixed topology; changing features
• Regular time intervals
• Examples: traffic, weather sensors
$G(t)=\{x_{t_1},x_{t_2},...\}\quad t_1\leq t_2\leq ...$
Continuous-Time DG (CTDG)
• Most general formulation
• Sequence of timestamped events
• Examples: social, financial
$G_t=(V_t,E_t,X_t)$
Discrete-Time DG (DTDG)
• Changing topology and features
• Regular time intervals → sequence of snapshots
• Examples: weekly trade networks
Less General → More General
Why Learning on Dynamic Graphs is Different
Temporal graph models must exploit the event history without collapsing it into one static snapshot
What changes
  • Events arrive in order: creation, deletion, update, interaction
  • The model should use both what happened and when it happened
  • New tasks ask what happens next, or when it happens
Why static GNNs are not enough
Figure: event history G(t ≤ T) vs. latest snapshot only
  • Information loss: the last snapshot hides the evolution path
  • Inefficiency: every new event can force repeated computation
  • No timing prediction: static GNNs do not support predicting when something will happen
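The information-loss point can be made concrete with a small sketch: two different event histories that collapse to the same final snapshot. The events and the `(t, op, u, v)` tuple layout are hypothetical toy data.

```python
def replay(events):
    """Apply timestamped (t, op, u, v) events in time order; return the final edge set."""
    edges = set()
    for t, op, u, v in sorted(events):
        if op == "add":
            edges.add((u, v))
        elif op == "delete":
            edges.discard((u, v))
    return edges

# Two different histories (hypothetical toy data).
history_a = [(1, "add", "u", "v"), (2, "add", "v", "w")]
history_b = [
    (1, "add", "v", "w"),
    (2, "add", "u", "v"),
    (3, "delete", "u", "v"),
    (9, "add", "u", "v"),
]

snapshot_a = replay(history_a)
snapshot_b = replay(history_b)

def last_event_time(events, u, v):
    """Timing information that only the event history carries."""
    return max(t for t, _, a, b in events if (a, b) == (u, v))
```

A model that only sees the final snapshot cannot tell the two histories apart, while `last_event_time` differs between them.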

Models

Temporal Graph Model
Figure: graph up to time t (ordered sequence of events t₁, …, t₇ over nodes 1 to 5) → Encoder → temporal node embeddings z₁(t), …, z₅(t) → Decoder → prediction ŷ (node / edge prediction)
Memory-based Models
  • Process events in order using an RNN, with a different hidden state per node
  • Hidden state (memory) is a compressed representation of all past interactions of a node
  • Memory is directly used as the temporal node embeddings
Update time
Figure: edge (u,v) arrives → message (state, Δt, features) → memory update for u, memory update for v

An edge (u,v) builds a message from the partner's state, the edge features, and Δt; only the two nodes it touches update their memory.

Predict time
Figure: memory of node u, memory of node v → decoder (MLP) → prediction ŷ

Memories serve directly as node embeddings and are fed into the decoder with no extra computation at query time.
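A minimal sketch of the update-time and predict-time flow. The recurrent cell, its fixed weights, and the dot-product decoder (standing in for the MLP) are all simplifying assumptions; real memory-based models learn these components.

```python
import math
from collections import defaultdict

DIM = 2

def make_message(partner_memory, edge_features, delta_t):
    """Message = partner state, edge features, and time since the node's last update."""
    return partner_memory + edge_features + [delta_t]

def rnn_update(memory, message, w_self=0.5, w_msg=0.1):
    """Toy recurrent cell (hypothetical weights): squash old state plus a message summary."""
    summary = sum(message) / len(message)
    return [math.tanh(w_self * m + w_msg * summary) for m in memory]

memory = defaultdict(lambda: [0.0] * DIM)   # one hidden state per node
last_update = defaultdict(float)            # time of each node's last event

def process_event(u, v, t, edge_features):
    """Update time: only the two nodes touched by edge (u, v) change their memory."""
    msg_u = make_message(memory[v], edge_features, t - last_update[u])
    msg_v = make_message(memory[u], edge_features, t - last_update[v])
    memory[u], memory[v] = rnn_update(memory[u], msg_u), rnn_update(memory[v], msg_v)
    last_update[u] = last_update[v] = t

def predict_link(u, v):
    """Predict time: decode straight from the stored memories (dot product + sigmoid)."""
    score = sum(a * b for a, b in zip(memory[u], memory[v]))
    return 1.0 / (1.0 + math.exp(-score))

process_event("u", "v", t=1.0, edge_features=[1.0])
process_event("u", "w", t=3.0, edge_features=[0.5])
```

After the second event, v's memory still dates from t = 1.0, which is the staleness issue that arises for inactive nodes.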

Memory-based Models: Pros and Cons
A strong online model, but not yet a full temporal graph encoder
Pros
  • Strong sequentiality inductive bias
  • Cheap online updates after each new event
Cons
  • Inactive nodes can become stale
  • Graph context is mostly local or indirect
  • Forced to process previous edges in sequential order
Graph-based Models
GNN on the graph of previous interactions, with timestamps as edge features
Update time
Figure: new edge (u, v, t) appended to G(t), shown in orange

Edge (u,v,t) is simply appended to G(t); no computation is performed at event time.

Predict time
Figure: graph G(t) → GNN (GAT layers) → embeddings Z(t) → decoder (MLP) → prediction ŷ

The GNN runs on the full G(t) to produce node embeddings and must re-run for every new query, which is expensive at inference.
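A minimal sketch of the same two phases for a graph-based model. The single aggregation layer with an exponential time-decay weight is a simple stand-in for attention layers over time-encoded edges; the decay rate, features, and names are assumptions.

```python
import math
from collections import defaultdict

features = {"u": [1.0, 0.0], "v": [0.0, 1.0], "w": [1.0, 1.0]}  # toy node features
neighbors = defaultdict(list)  # node -> list of (neighbor, timestamp)

def add_edge(u, v, t):
    """Update time: just append the edge to G(t); no embeddings are computed."""
    neighbors[u].append((v, t))
    neighbors[v].append((u, t))

def embed(node, t, decay=0.1):
    """Predict time: one aggregation layer over the node's past neighbors.

    The timestamp acts as an edge feature through the weight
    exp(-decay * (t - t_edge)), so recent interactions count more.
    """
    past = [(nbr, t_e) for nbr, t_e in neighbors[node] if t_e <= t]
    out = list(features[node])
    if not past:
        return out
    weights = [math.exp(-decay * (t - t_e)) for _, t_e in past]
    total = sum(weights)
    for (nbr, _), w in zip(past, weights):
        out = [o + (w / total) * f for o, f in zip(out, features[nbr])]
    return out

add_edge("u", "v", t=1.0)
add_edge("u", "w", t=5.0)
z_early = embed("u", t=2.0)   # only the edge to v exists yet
z_late = embed("u", t=6.0)    # both edges exist; the recent one to w weighs more
```

All the work happens inside `embed`, which is why every new query pays the full aggregation cost.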

Graph-based Models: Pros and Cons
Explicit graph structure mitigates staleness, but the GNN must re-run on every query
Pros
  • No need for sequential processing in training
  • Using the graph explicitly → mitigates staleness problem
Cons
  • Can only handle edge addition events
  • Need to re-run GNN after each new event → inefficient at inference
TGN: Temporal Graph Networks
Our work: a modular framework that combines memory-based and graph-based temporal learning
TGN contribution
  • General framework combining the best of memory-based and graph-based approaches
  • Updates node memories from events, then uses a GNN over the interaction graph to produce embeddings
  • Generalizes previous memory-based models and graph-based models in one notation
TGN training flow diagram
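A toy sketch of the combination described above: events update per-node memories, and a graph layer over the interaction graph turns memories into embeddings. This is not the TGN implementation; the updater, its constants, and the mean-neighbor layer are hypothetical simplifications.

```python
import math
from collections import defaultdict

DIM = 2
memory = defaultdict(lambda: [0.0] * DIM)   # memory module: one state per node
neighbors = defaultdict(set)                # interaction graph seen so far

def update_memory(state, partner_state, delta_t):
    """Toy memory updater (hypothetical): tanh of a fixed mix of the inputs."""
    return [math.tanh(0.5 * s + 0.5 * p + 0.1 * delta_t) for s, p in zip(state, partner_state)]

def observe(u, v, delta_t):
    """Step 1: an event updates the memories of its two endpoints and the graph."""
    memory[u], memory[v] = (
        update_memory(memory[u], memory[v], delta_t),
        update_memory(memory[v], memory[u], delta_t),
    )
    neighbors[u].add(v)
    neighbors[v].add(u)

def embed(node):
    """Step 2: a graph layer over the memories: own memory plus mean neighbor memory."""
    own = memory[node]
    nbrs = [memory[n] for n in neighbors[node]]
    if not nbrs:
        return list(own)
    mean = [sum(col) / len(nbrs) for col in zip(*nbrs)]
    return [o + m for o, m in zip(own, mean)]

observe("u", "v", delta_t=1.0)
observe("v", "w", delta_t=1.0)
# u took no part in the second event, so its memory is stale, but its embedding
# still sees v's fresh memory through the graph layer.
z_u = embed("u")
```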
TGN: State-of-the-Art in 2020
High accuracy at much lower per-epoch cost than previous methods
TGN: Still a Strong Baseline Today
Years of follow-up work, yet TGN remains hard to beat on standard benchmarks

Benchmarks

Temporal Graph Benchmark
Diverse, large datasets and unified evaluation
Key Findings
  • Previous datasets were too easy and saturated
  • Exposed that simple historical baselines can be surprisingly strong
  • Large new datasets make scalability part of the benchmark
9 datasets · 5 domains · up to 72M edges
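To illustrate what a simple historical baseline can look like, here is a pure memorization sketch in the spirit of EdgeBank: predict an edge as positive if it has been seen before. The class name, the event stream, and the recall computation are illustrative assumptions, not the benchmark's evaluation protocol.

```python
class EdgeMemoryBaseline:
    """Predict 1.0 for any edge seen before, 0.0 otherwise. No learning involved."""

    def __init__(self):
        self.seen = set()

    def update(self, src, dst):
        self.seen.add((src, dst))

    def predict(self, src, dst):
        return 1.0 if (src, dst) in self.seen else 0.0

# Hypothetical chronological stream of (src, dst, t) events.
train = [("a", "b", 1), ("a", "c", 2), ("b", "c", 3)]
test = [("a", "b", 4), ("c", "a", 5), ("b", "c", 6)]

model = EdgeMemoryBaseline()
for src, dst, _ in train:
    model.update(src, dst)

hits = 0
for src, dst, _ in test:
    hits += model.predict(src, dst) == 1.0   # every test edge is a true positive
    model.update(src, dst)                   # keep memorizing as the stream advances
recall = hits / len(test)
```

On datasets where most test edges are repeats of earlier ones, a baseline like this already recalls a large share of positives, which is one way a dataset can be too easy.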
Temporal Graph Benchmark 2.0
TGB but for knowledge graphs
Key findings
  • Edge and relation type information is crucial for strong performance
  • Simple heuristic baselines remain competitive with more complex methods
  • Many methods fail to run on the largest datasets, making scalability a central result
8 datasets · 5 domains · up to 53M edges

Unifying Models for Discrete and Continuous Time Graphs

UTG: Unifying Snapshot & Event-Based Models
Huang, Poursafaei, Rabbany, Rabusseau, Rossi, LoG 2024
The problem

Snapshot (DTDG) and event-based (CTDG) models developed in isolation: limited cross-comparison and no unified evaluation.

Contribution
  • Input mapper: convert CTDG to snapshots and DTDG to events, so any model can run on any data
  • UTG training: use streaming training to make snapshot-based models operate on event streams
  • Output mapper: align predictions to either continuous-time or discrete-time tasks
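A minimal sketch of the input-mapper idea, converting in both directions. The function names, the fixed-width windowing, and stamping events with the snapshot index are assumptions made for illustration, not the UTG code.

```python
from collections import defaultdict

def events_to_snapshots(events, interval):
    """CTDG -> DTDG: bucket (u, v, t) events into fixed-width time windows."""
    buckets = defaultdict(set)
    for u, v, t in events:
        buckets[int(t // interval)].add((u, v))
    if not buckets:
        return []
    # Keep empty windows so that snapshot k always covers the k-th interval.
    return [buckets.get(k, set()) for k in range(min(buckets), max(buckets) + 1)]

def snapshots_to_events(snapshots):
    """DTDG -> CTDG: emit every snapshot edge as an event stamped with its snapshot index."""
    return [(u, v, k) for k, edges in enumerate(snapshots) for u, v in sorted(edges)]

events = [("a", "b", 0.5), ("b", "c", 0.9), ("a", "c", 1.2), ("a", "b", 2.7)]
snaps = events_to_snapshots(events, interval=1.0)
round_trip = snapshots_to_events(snaps)
```

The round trip keeps the order of windows but loses the exact timestamps and the order of events inside a window.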
UTG Framework
UTG: Key Findings
Speed

Snapshot-based models are ≥ 10× faster at inference than most event-based models.

Performance

With UTG training, snapshot-based models match TGN & GraphMixer even on event-based (CTDG) datasets.

Insight

NAT & DyGFormer's edge comes from joint neighbourhood features, not from the event-based format: these can be added to snapshot models too.

Next Frontiers

Foundation Models for TG
Existing TG models are trained and tested on the same graph.
Can we train a single TG model that works on unseen graphs, ideally from different domains?
MiNT: Multi-Network Transfer Benchmark for Temporal Graph Learning
  • New dataset: 84 distinct ERC-20 token transaction networks
  • New framework to train existing TG models on multiple graphs simultaneously
  • Scaling law for TGs: model performance improves as it is trained on more networks
Can we use LMs directly on temporal graphs?
TGTalker
Translate temporal graph structure into natural language and feed it to a pre-trained LLM — no fine-tuning required.
Four prompt components
  • Background set — recent edges as context
  • Example set — 5-shot Q&A pairs
  • Query set — target edge to predict
  • Temporal neighbors — 1-hop recent neighbours
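A minimal sketch of how the four components could be assembled into one text prompt. The prompt wording, the function names, and the integer node IDs are hypothetical, and a single example pair is shown instead of five for brevity.

```python
def edge_to_text(src, dst, t):
    return f"({src}, {dst}, {t})"

def build_prompt(background, examples, neighbors, query):
    """Assemble the four components into one text prompt (wording is hypothetical)."""
    src, t = query
    lines = ["Background edges (source, destination, time):"]
    lines += [edge_to_text(*e) for e in background]
    lines.append("Examples:")
    for (ex_src, ex_t), answer in examples:
        lines.append(f"Q: Which node will {ex_src} interact with at time {ex_t}? A: {answer}")
    lines.append(f"Recent 1-hop neighbours of {src}: " + ", ".join(str(n) for n in neighbors))
    lines.append(f"Q: Which node will {src} interact with at time {t}? A:")
    return "\n".join(lines)

prompt = build_prompt(
    background=[(1, 2, 10), (3, 2, 11)],   # background set: recent edges
    examples=[((1, 12), 2)],               # example set: (query, answer) pairs
    neighbors=[2],                         # temporal neighbours of the query source
    query=(1, 15),                         # query set: source node and time
)
```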
TGTalker: Results & Explainability
Results
  • Competitive without fine-tuning
  • Consistently outperforms TGN and HTGN
10 explanation categories discovered
  • Most Recent Interaction ≈ EdgeBank heuristic
  • Most Frequent Destination ≈ PopTrack heuristic
  • Novel patterns: sequence logic, analogy-based inference
TGTalker results and explanations
Resources for TGL
Blog posts, talks, surveys, libraries & datasets
💻 Libraries
TGM
Temporal Graph Models library
📊 Datasets
TGB
Temporal Graph Benchmark
TGB-Seq
Sequential Temporal Graph Benchmark

My Current Directions
(a small taste of ongoing work)

🔊 Communicating Sound Through Natural Language
A sender agent describes a sound; a receiver agent decodes the description back to audio
Original audio
Transmitted sentence
"At entry the envelope carries mid-power, punchy, and extreme-oscillation; through sustain and release, the trace returns swift-onset and aggressive..."
Reconstructed audio
Comparison
Waveforms
Original and reconstructed sample 1 waveforms
🐬 OpenWhistle Dataset
  • Very rich and unique dolphin dataset collected over 5 years
  • New benchmark for whistle detection and classification
Dataset                     # Whistles   Voc. hours   Time span (yrs)   Stable pod (# indiv.)   Setting      Seq. context   Open
OpenWhistle Pretraining     ~180,000*    114.3        5.0               (5)                     Semi-nat.
OpenWhistle Expert subset   8,354        1.9          0.42              (5)                     Semi-nat.
DOLPHINFREE                 4,600        7.3          2.0                                       Wild
Di Nardo et al., 2025       3,111        0.6          0.003             (7)                     Captive
Watkins MMSD                566          N/R          70+                                       Wild
Korkmaz et al., 2023        ~29,000*     6.8          0.07                                      Semi-nat.
Sicily Strait PAM           14,048       N/R          1.2                                       Wild
DCLDE 2011                  6,011        0.7          4.0                                       Wild
SDWD                        N/R          N/R          43+               (293)                   Wild (C&R)
* Estimated from total vocalization duration and mean whistle duration.
Thank You

Questions?
