January 13, 2026 · 12 min read

TAO: How Facebook Built a Graph Database for 3 Billion People

System Design · Distributed Systems · Databases · Big Data

In 2009, Facebook engineers faced a crisis. Their social graph — the connections between 300 million users — was stored in MySQL. Every "friend request accepted" triggered cascading queries across sharded databases. Read latency was spiking. Cache invalidation was a nightmare. The infrastructure that had scaled Facebook from a dorm room to millions was now its biggest bottleneck.

The solution would become one of the most impressive feats in distributed systems engineering: TAO — The Associations and Objects store.

TAO handles 1 billion+ queries per second with sub-millisecond latency, serving a graph with 400+ billion edges across billions of objects.

This is the story of how Facebook built a database for the social graph of humanity.


The Problem: Why MySQL Couldn't Scale

The Memcache Band-Aid

By 2009, Facebook's architecture looked like this:

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Web Tier  │────▶│  Memcached  │────▶│    MySQL    │
│  (PHP/Hack) │     │   (Cache)   │     │  (Sharded)  │
└─────────────┘     └─────────────┘     └─────────────┘

The pattern was "look-aside caching":

  1. Check Memcached for data
  2. If miss: query MySQL, populate cache
  3. On write: update MySQL, invalidate cache

This worked at small scale. It collapsed at Facebook scale.
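In code, look-aside caching is only a few lines — which is exactly why it is so easy to outgrow. Here is a minimal Python sketch, with plain dicts standing in for Memcached and MySQL (the `cache` and `db` names are illustrative, not Facebook's actual code):

```python
cache = {}
db = {"user:1": {"name": "Alice"}}  # stand-in for the sharded MySQL tier

def read(key):
    # 1. Check the cache first.
    if key in cache:
        return cache[key]
    # 2. On a miss, fall through to the database and populate the cache.
    value = db.get(key)
    cache[key] = value
    return value

def write(key, value):
    # 3. Update the database, then invalidate (not update) the cache entry.
    db[key] = value
    cache.pop(key, None)
```

Note that writes invalidate rather than update the cache — the next reader repopulates it. That choice is what opens the door to the thundering-herd problem below.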

The Three Killers

1. Thundering Herds

When a cache key expired or was invalidated, hundreds of concurrent requests would simultaneously hit MySQL for the same data:

Cache Miss on popular user's friend list
    ├─▶ Request 1 → MySQL query
    ├─▶ Request 2 → MySQL query  
    ├─▶ Request 3 → MySQL query
    └─▶ ... 500 more requests → MySQL overwhelmed

2. Graph Query Complexity

Social graph queries are inherently multi-hop:

-- "Get friends of friends who liked this post"
SELECT DISTINCT u.* 
FROM users u
JOIN friendships f1 ON f1.friend_id = u.id
JOIN friendships f2 ON f2.user_id = f1.user_id
JOIN post_likes pl ON pl.user_id = u.id
WHERE f2.friend_id = @current_user AND pl.post_id = @post_id

This single query could touch millions of rows across dozens of shards.

3. Cross-Shard Consistency Hell

When Alice befriends Bob:

  • Write friendship record to Alice's shard
  • Write reverse friendship to Bob's shard
  • Invalidate cache on Alice's server
  • Invalidate cache on Bob's server
  • Invalidate "friends of Alice" cached lists
  • Invalidate "friends of Bob" cached lists

One operation triggered 6+ distributed writes and invalidations. Race conditions were inevitable.


The Insight: Model Everything as Objects and Associations

The key insight was that almost everything in Facebook could be represented as a graph:

  • Objects: Users, Posts, Photos, Pages, Comments
  • Associations: Friendships, Likes, Tags, Follows

Instead of SQL tables, TAO uses a simple graph model:

Object:
┌─────────────────────────────────────────────────┐
│  id (64-bit)  │  type  │  data (blob)  │  ver  │
│  4531234567   │ USER   │ {name:...}    │  42   │
└─────────────────────────────────────────────────┘

Association:
┌──────────────────────────────────────────────────────────┐
│  id1  │  atype  │  id2  │  timestamp  │  data            │
│  123  │  FRIEND │  456  │  1609459200 │  {since: 2021}   │
└──────────────────────────────────────────────────────────┘
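The two record shapes above map naturally onto a pair of typed records. A hypothetical Python rendering — field names follow the diagrams, not Facebook's internal schema:

```python
from dataclasses import dataclass, field

@dataclass
class TaoObject:
    id: int                 # 64-bit globally unique id
    otype: str              # object type, e.g. "USER" or "POST"
    data: dict              # opaque key/value payload
    version: int = 0        # bumped on every update

@dataclass
class TaoAssoc:
    id1: int                # source object
    atype: str              # association type, e.g. "FRIEND" or "LIKE"
    id2: int                # destination object
    timestamp: int          # creation time; lists are served newest-first
    data: dict = field(default_factory=dict)

# The friendship row from the diagram above:
edge = TaoAssoc(id1=123, atype="FRIEND", id2=456,
                timestamp=1609459200, data={"since": 2021})
```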

The API Simplicity

TAO exposes just six core operations:

Operation                         Description           Example
---------------------------------------------------------------------
obj_get(id)                       Get object by ID      Get user profile
obj_set(id, data)                 Create/update object  Update profile
assoc_add(id1, atype, id2, data)  Create association    Add friend
assoc_del(id1, atype, id2)        Delete association    Unfriend
assoc_get(id1, atype)             Get all associations  Get friend list
assoc_count(id1, atype)           Count associations    Friend count

This minimal API handles 99% of Facebook's data access patterns.
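To make the semantics concrete, here is a toy in-memory store implementing the six operations — a sketch of the behavior described above, not the real TAO client:

```python
import time
from collections import defaultdict

class MiniTao:
    """Toy in-memory store with TAO's six core operations."""

    def __init__(self):
        self.objects = {}                # id -> data dict
        self.assocs = defaultdict(list)  # (id1, atype) -> [(ts, id2, data)]

    def obj_get(self, id):
        return self.objects.get(id)

    def obj_set(self, id, data):
        self.objects[id] = data

    def assoc_add(self, id1, atype, id2, data=None):
        self.assocs[(id1, atype)].append((time.time(), id2, data or {}))

    def assoc_del(self, id1, atype, id2):
        key = (id1, atype)
        self.assocs[key] = [a for a in self.assocs[key] if a[1] != id2]

    def assoc_get(self, id1, atype):
        # Newest first, matching TAO's time-ordered association lists.
        return sorted(self.assocs[(id1, atype)],
                      key=lambda a: a[0], reverse=True)

    def assoc_count(self, id1, atype):
        return len(self.assocs[(id1, atype)])
```

Even this toy version shows the shape of the API: every social action — friend, like, comment — is one or two of these calls, never an ad-hoc query.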


The Architecture: Three Tiers of Intelligence

TAO's architecture is deceptively simple but engineered for planetary scale:

┌─────────────────────────────────────────────────────────────────────┐
│                        CLIENT LAYER                                  │
│   (GraphQL, Feed Service, Messenger, Search)                        │
└─────────────────────────────────────────────────────────────────────┘
                              │
┌─────────────────────────────────────────────────────────────────────┐
│                       TAO LEADER CACHE                               │
│   - Handles ALL writes                                               │
│   - Owns cache invalidation                                          │
│   - One leader per shard                                             │
└─────────────────────────────────────────────────────────────────────┘
            │                                        │
┌───────────────────────┐              ┌───────────────────────┐
│   TAO Follower Cache  │              │   TAO Follower Cache  │
│   (Read replicas)     │              │   (Read replicas)     │
│   - Handles reads     │              │   - Handles reads     │
│   - Eventually        │              │   - Geographically    │
│     consistent        │              │     distributed       │
└───────────────────────┘              └───────────────────────┘
            │                                        │
┌─────────────────────────────────────────────────────────────────────┐
│                        MYSQL STORAGE                                 │
│   - Persistent storage                                               │
│   - Sharded by object_id                                             │
│   - 1000s of shards                                                  │
└─────────────────────────────────────────────────────────────────────┘

Why This Works

Leader Cache: ALL writes go through a single leader per shard. This eliminates cache consistency problems — the leader is the source of truth.

Follower Caches: Read replicas in every data center. They subscribe to the leader for invalidations.

MySQL: Still the durable storage, but TAO absorbs 99%+ of reads before they reach MySQL.


The Hard Problems (And How They Solved Them)

Problem 1: Thundering Herds

Solution: In-Flight Request Coalescing

When multiple requests ask for the same data simultaneously, TAO coalesces them:

Incoming: assoc_get(user:123, FRIEND)
  ├─▶ Request 1 ─┐
  ├─▶ Request 2 ─┼──▶ Single cache lookup / MySQL query
  ├─▶ Request 3 ─┘
  └─▶ All three get the same response

Implementation insight: TAO maintains a map of in-flight requests. New requests for the same key attach as "waiters" rather than triggering duplicate fetches.
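The waiter mechanism can be sketched in a few dozen lines of Python. This version also caches the result, so at most one real load ever runs per key — an illustrative sketch, not TAO's actual cache-server code:

```python
import threading

class Coalescer:
    """In-flight request coalescing plus a result cache: concurrent
    gets for the same key share a single underlying load."""

    def __init__(self, loader):
        self.loader = loader      # the expensive fetch (e.g. a MySQL query)
        self.lock = threading.Lock()
        self.inflight = {}        # key -> Event that waiters block on
        self.results = {}         # key -> loaded value

    def get(self, key):
        with self.lock:
            if key in self.results:       # already loaded: plain cache hit
                return self.results[key]
            event = self.inflight.get(key)
            if event is None:             # first request: become the fetcher
                event = threading.Event()
                self.inflight[key] = event
                is_fetcher = True
            else:                         # a fetch is already in flight
                is_fetcher = False
        if is_fetcher:
            value = self.loader(key)      # exactly one real load per key
            with self.lock:
                self.results[key] = value
                del self.inflight[key]
            event.set()                   # wake all waiters
            return value
        event.wait()                      # attach as a "waiter"
        return self.results[key]
```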

Problem 2: Hot Spots (Celebrity Problem)

When Taylor Swift posts, millions fetch assoc_get(taylor_swift, LIKES) simultaneously. Even with coalescing, this overwhelms a single cache server.

Solution: Hierarchical Caching with Consistent Hashing

Taylor Swift's data:
  │
  ├─▶ Primary Cache Server (owner)
  │
  └─▶ N Secondary Cache Servers (read-only replicas)
      ├─▶ Server A (handles 20% of reads)
      ├─▶ Server B (handles 20% of reads)
      ├─▶ Server C (handles 20% of reads)
      ├─▶ Server D (handles 20% of reads)
      └─▶ Server E (handles 20% of reads)

Hot keys are automatically replicated across multiple cache tiers. The routing layer uses weighted random selection to spread load.
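A simplified version of that routing decision, assuming a hotness detector has already flagged the key (`hot_keys` is hypothetical, and uniform choice stands in for the weighted selection):

```python
import random

def route_read(key, servers, hot_keys, replicas=5):
    """Cold keys hash to a single owner; hot keys are spread across
    `replicas` servers picked from successive hash-ring positions."""
    if key in hot_keys:
        candidates = [servers[(hash(key) + i) % len(servers)]
                      for i in range(replicas)]
        return random.choice(candidates)  # ~20% of reads each for 5 replicas
    return servers[hash(key) % len(servers)]
```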

Problem 3: Association Lists (Pagination)

User with 5,000 friends — you can't load all 5,000 on every request.

Solution: Time-Ordered Association Lists

Associations are stored sorted by timestamp (descending). TAO provides:

assoc_range(id1, atype, offset, limit)

This returns associations in recency order with efficient pagination. The storage uses a B-tree index on (id1, atype, timestamp).
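The pagination contract itself is easy to sketch. Here an in-memory sort stands in for the index scan — in production the rows are already stored in timestamp order, so no per-query sort happens:

```python
def assoc_range(assocs, offset, limit):
    """Return `limit` associations starting at `offset`, newest first.
    `assocs` is a list of (timestamp, id2) pairs; a real shard reads a
    B-tree index on (id1, atype, timestamp) instead of sorting here."""
    newest_first = sorted(assocs, key=lambda a: a[0], reverse=True)
    return newest_first[offset:offset + limit]

# First "page" of a friend list:
friends = [(100, 1), (300, 3), (200, 2), (500, 5), (400, 4)]
page1 = assoc_range(friends, 0, 2)   # the two most recent friendships
```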

Problem 4: Bidirectional Edges

When Alice friends Bob:

  • Store (Alice, FRIEND, Bob) on Alice's shard
  • Store (Bob, FRIEND, Alice) on Bob's shard

These must be atomically consistent.

Solution: Two-Phase Write with Background Repair

Phase 1: Write to Alice's shard (primary)
Phase 2: Async write to Bob's shard (inverse)

If Phase 2 fails:
  - Background repair daemon detects missing inverse
  - Re-applies the write

TAO accepts brief inconsistency (seconds) for availability. Background repair heals any divergence.
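Here is a deliberately simplified model of that write path and the repair daemon, using per-user sets as stand-in shards (`phase2_ok` simulates a failed async write; none of these names are Facebook's):

```python
from collections import defaultdict

def add_friend(shards, a, b, phase2_ok=True):
    # Phase 1: synchronous write of the forward edge on a's shard.
    shards[a].add(b)
    # Phase 2: async write of the inverse edge on b's shard — may fail.
    if phase2_ok:
        shards[b].add(a)

def repair(shards):
    # Background daemon: for every forward edge, ensure the inverse exists.
    for user, friends in list(shards.items()):
        for friend in list(friends):
            if user not in shards[friend]:
                shards[friend].add(user)   # re-apply the missing write
```

Between the failed Phase 2 and the repair pass, Bob's friend list is briefly missing Alice — exactly the seconds-long inconsistency TAO accepts in exchange for availability.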


The Numbers That Matter

From Facebook's 2013 TAO paper and subsequent engineering talks:

Metric               Value
--------------------------------------------
Objects stored       Billions
Associations stored  400+ billion edges
Read QPS             1+ billion queries/second
Cache hit rate       99.8%
Read latency (p50)   under 1ms
Read latency (p99)   under 5ms
Write latency        under 10ms
Data centers         6+ global regions

Why 99.8% Cache Hit Rate?

Social graph access patterns are highly skewed:

  • 80% of reads are for data written in the last week
  • Power-law distribution: 1% of objects get 50% of reads
  • Friends lists change infrequently (high cache TTL)

TAO exploits this with tiered caching and aggressive prefetching.


What Engineers Can Learn

1. Optimize for Your Access Patterns

TAO's 6-operation API isn't a limitation — it's a feature. By constraining the API, Facebook could optimize the hell out of those specific patterns.

Lesson: Generic solutions (like raw SQL) are flexible but slow. Constrained solutions (like TAO's API) can be orders of magnitude faster.

2. Embrace Eventual Consistency (Where Safe)

TAO doesn't guarantee immediate consistency across data centers. If you're in Singapore and your friend accepts a request in New York, you might not see it for a few seconds.

Lesson: Users rarely notice sub-second inconsistency. Trading strict consistency for availability and latency is often the right call for social applications.

3. Put Smart Caches in Front of Dumb Storage

MySQL is "dumb" storage in TAO — it just persists data. The TAO cache layer handles all the hard problems: coalescing, hot spots, invalidation.

Lesson: Separate your caching logic from your storage logic. Let each layer do what it's good at.

4. Design for Failure Recovery, Not Failure Prevention

TAO's two-phase writes can fail mid-way. Instead of complex distributed transactions, TAO uses background repair.

Lesson: In distributed systems, failures are inevitable. Design systems that heal themselves rather than trying to prevent all failures.


The Legacy: TAO's Influence

TAO's ideas have spread throughout the industry:

  • Google's Spanner: Similar association model for social features
  • LinkedIn's Espresso: Graph-aware caching inspired by TAO
  • Twitter's Manhattan: Adopted TAO's leader-follower cache pattern
  • Open Source: Projects like Dgraph and JanusGraph draw from TAO's design

Even if you never build at Facebook's scale, understanding TAO teaches you how to think about graph storage, caching hierarchies, and consistency trade-offs.


Conclusion

TAO is one of the most impressive pieces of infrastructure engineering in the history of computing. It transformed a collapsing MySQL-based system into a planet-scale graph database serving billions of users.

The key insight wasn't a single breakthrough — it was a series of practical engineering decisions:

  • Constrain the API to optimize for common patterns
  • Use hierarchical caching with smart invalidation
  • Accept eventual consistency where users won't notice
  • Build self-healing systems instead of failure-proof ones

For engineers, TAO is a masterclass in building systems that work at the edge of what's possible.

"Any sufficiently advanced caching is indistinguishable from a distributed database."
— Paraphrasing Arthur C. Clarke


Further Reading

  • Bronson et al., "TAO: Facebook's Distributed Data Store for the Social Graph," USENIX ATC 2013