This site is archived.

Indexes and denormalization: keys to scaling sites with massive content

David Strauss's picture
Submitted by David Strauss on Wed, 07/30/2008 - 22:35.

Session recording

Attached files

Placement
Session time: 
08/27/2008 - 16:00 - 08/27/2008 - 16:45

Overview

Relational databases store, index, and retrieve data using using predictable patterns. Indexing data well -- but not excessively -- requires understanding indexing overhead and usage by query execution planners.

But, even the perfect indexing plan layered on top of existing tables cannot satisfy all needs: relational databases have significant limitations for indexing data. Particularly, they cannot index data across multiple tables. While Oracle and Microsoft SQL Server provide some in-built tools (materialized views and indexed views) for alleviating this limitation, users of open-source databases like MySQL and PostgreSQL must consolidate and preprocess data to scale popular services.

The Denormalization API streamlines the process of consolidating node data into tables structured for radical scalability.

Agenda

* How is Drupal data stored?
* Overview of indexing tables
* Index data structures
* Query execution plans and indexes
* What is denormalization, and why is it necessary?
* Typical hurdles for implementing denormalization
* Using the Denormalization API

Goals

Attendees should leave with an understanding of the benefits and caveats of indexing and denormalizing data. And, if they choose to denormalize, how the Denormalization API can streamline the work.

Resources

Attendees should be familiar with Drupal's node system, SQL, and basic database administration.