You are here
Materialized Views with MySQL
Content
- What is a Materialized View?
- Implement your own Materialized Views
- Refreshing materialized views
- Hands on
- Create your own Materialized View:
- Refresh Materialized View on demand
- Refresh Materialized View immediate
- Materialized Views with snapshotting functionality
- Some performance benchmarks for our Materialized Views:
- Outlook
- Conclusion
- Literature
What is a Materialized View?
A Materialized View (MV) is the pre-calculated (materialized) result of a query. Unlike a simple VIEW the result of a Materialized View is stored somewhere, generally in a table. Materialized Views are used when immediate response is needed and the query where the Materialized View bases on would take to long to produce a result. Materialized Views have to be refreshed once in a while. It depends on the requirements how often a Materialized View is refreshed and how actual its content is. Basically a Materialized View can be refreshed immediately or deferred, it can be refreshed fully or to a certain point in time. MySQL does not provide Materialized Views by itself. But it is easy to build Materialized Views yourself.
Implement your own Materialized Views
A short example for how this could be done is the following query:
SELECT COUNT(*) FROM MyISAM_table;
returns immediate result because the counter is stored in the table header. The following query can take some seconds up to minutes:
SELECT COUNT(*) FROM innodb_huge;
A possible solution for this would be to create a table where all InnoDB row counts are stored in
CREATE TABLE innodb_row_count ( id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY , schema_name VARCHAR(64) NOT NULL , table_name VARCHAR(64) NOT NULL , row_count INT UNSIGNED NOT NULL );
Depending on the needed correctness of this information the table can be refreshed once a day (least used resources on the system but biggest error in result), once an hour or in most extreme case after every change (slowest)!
An other possibility would be to get the data from the information schema. But this information can be up to 20% wrong!
SELECT table_schema, table_name, table_rows FROM information_schema.tables WHERE table_type = 'BASE TABLE';
Refreshing materialized views
Materialized Views can be refreshed in different kinds. They can be refreshed:
- never (only once in the beginning, for static data only)
- on demand (for example once a day, for example after nightly load)
- immediately (after each statement)
A refresh can be done in the following ways:
- completely (slow, full from scratch)
- deferred (fast, by a log table)
By storing the change information in a log table. Also some snapshots or time delayed states can be produced:
- refresh up to date
- refresh full
Hands on
To understand all this more in detail it is probably easiest to do some examples. Assume we have a table sales:
CREATE TABLE sales ( sales_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY , product_name VARCHAR(128) NOT NULL , product_price DECIMAL(8,2) NOT NULL , product_amount SMALLINT NOT NULL ); INSERT INTO sales VALUES (NULL, 'Apple', 1.25, 1), (NULL, 'Apple', 2.40, 2) , (NULL, 'Apple', 4.05, 3), (NULL, 'Pear', 6.30, 2) , (NULL, 'Pear', 12.20, 4), (NULL, 'Plum', 4.85, 3) ; SELECT * FROM sales;
And now we want to know the price sold and the money earned per product:
EXPLAIN SELECT product_name , SUM(product_price) AS price_sum, SUM(product_amount) AS amount_sum , AVG(product_price) AS price_avg, AVG(product_amount) amount_agg , COUNT(*) FROM sales GROUP BY product_name ORDER BY price_sum; +-------------+-------+------+---------------+------+---------------------------------+ | select_type | table | type | possible_keys | rows | Extra | +-------------+-------+------+---------------+------+---------------------------------+ | SIMPLE | sales | ALL | NULL | 6 | Using temporary; Using filesort | +-------------+-------+------+---------------+------+---------------------------------+
On such a small table it is pretty fast but when you have hundreds of products and millions of sales transactions it can take minutes to hours!
Create your own Materialized View:
DROP TABLE sales_mv; CREATE TABLE sales_mv ( product_name VARCHAR(128) NOT NULL , price_sum DECIMAL(10,2) NOT NULL , amount_sum INT NOT NULL , price_avg FLOAT NOT NULL , amount_avg FLOAT NOT NULL , sales_cnt INT NOT NULL , UNIQUE INDEX product (product_name) ); INSERT INTO sales_mv SELECT product_name , SUM(product_price), SUM(product_amount) , AVG(product_price), AVG(product_amount) , COUNT(*) FROM sales GROUP BY product_name;
This is up to now the easiest part! And, as expected we get the correct result:
mysql> SELECT * FROM sales_mv; +--------------+-----------+------------+-----------+------------+-----------+ | product_name | price_sum | amount_sum | price_avg | amount_avg | sales_cnt | +--------------+-----------+------------+-----------+------------+-----------+ | Apple | 7.70 | 6 | 2.56667 | 2 | 3 | | Pear | 18.50 | 6 | 9.25 | 3 | 2 | | Plum | 4.85 | 3 | 4.85 | 3 | 1 | +--------------+-----------+------------+-----------+------------+-----------+ 3 rows in set (0.00 sec)
This would cover the refreshment mode "NEVER" But this is not what we generally want to do.
Refresh Materialized View on demand
Refreshing the Materialized View on demand can be implemented with a Stored Procedure as follows:
DROP PROCEDURE refresh_mv_now; DELIMITER $$ CREATE PROCEDURE refresh_mv_now ( OUT rc INT ) BEGIN TRUNCATE TABLE sales_mv; INSERT INTO sales_mv SELECT product_name , SUM(product_price), SUM(product_amount) , AVG(product_price), AVG(product_amount) , COUNT(*) FROM sales GROUP BY product_name; SET rc = 0; END; $$ DELIMITER ;
To check if it works the following statement were used:
CALL refresh_mv_now(@rc); SELECT * FROM sales_mv; +--------------+-----------+------------+-----------+------------+-----------+ | product_name | price_sum | amount_sum | price_avg | amount_avg | sales_cnt | +--------------+-----------+------------+-----------+------------+-----------+ | Apple | 7.70 | 6 | 2.56667 | 2 | 3 | | Pear | 18.50 | 6 | 9.25 | 3 | 2 | | Plum | 4.85 | 3 | 4.85 | 3 | 1 | +--------------+-----------+------------+-----------+------------+-----------+ INSERT INTO sales VALUES (NULL, 'Apple', 2.25, 3), (NULL, 'Plum', 3.35, 1) , (NULL, 'Pear', 1.80, 2); CALL refresh_mv_now(@rc); SELECT * FROM sales_mv; +--------------+-----------+------------+-----------+------------+-----------+ | product_name | price_sum | amount_sum | price_avg | amount_avg | sales_cnt | +--------------+-----------+------------+-----------+------------+-----------+ | Apple | 9.95 | 9 | 2.4875 | 2.25 | 4 | | Pear | 20.30 | 8 | 6.76667 | 2.66667 | 3 | | Plum | 8.20 | 4 | 4.1 | 2 | 2 | +--------------+-----------+------------+-----------+------------+-----------+
To make the output a little nicer we can add a VIEW on the Materialized View table as follows:
CREATE VIEW sales_v AS SELECT product_name, price_sum, amount_sum, price_avg, amount_avg FROM sales_mv;
Refresh Materialized View immediate
To do a full refresh after each statement does not make sense. But we still would like to have to proper result. To do this it is a little bit more complicated.
On every INSERT on the sales table we have to update our Materialized View. We can implement this transparently by INSERT/UPDATE/DELETE triggers on the sales table:
Now let us create the needed triggers:
DELIMITER $$ CREATE TRIGGER sales_ins AFTER INSERT ON sales FOR EACH ROW BEGIN SET @old_price_sum = 0; SET @old_amount_sum = 0; SET @old_price_avg = 0; SET @old_amount_avg = 0; SET @old_sales_cnt = 0; SELECT IFNULL(price_sum, 0), IFNULL(amount_sum, 0), IFNULL(price_avg, 0) , IFNULL(amount_avg, 0), IFNULL(sales_cnt, 0) FROM sales_mv WHERE product_name = NEW.product_name INTO @old_price_sum, @old_amount_sum, @old_price_avg , @old_amount_avg, @old_sales_cnt ; SET @new_price_sum = @old_price_sum + NEW.product_price; SET @new_amount_sum = @old_amount_sum + NEW.product_amount; SET @new_sales_cnt = @old_sales_cnt + 1; SET @new_price_avg = @new_price_sum / @new_sales_cnt; SET @new_amount_avg = @new_amount_sum / @new_sales_cnt; REPLACE INTO sales_mv VALUES(NEW.product_name, @new_price_sum, @new_amount_sum, @new_price_avg , @new_amount_avg, @new_sales_cnt) ; END; $$ DELIMITER ;
DELIMITER $$ CREATE TRIGGER sales_del AFTER DELETE ON sales FOR EACH ROW BEGIN SET @old_price_sum = 0; SET @old_amount_sum = 0; SET @old_price_avg = 0; SET @old_amount_avg = 0; SET @old_sales_cnt = 0; SELECT IFNULL(price_sum, 0), IFNULL(amount_sum, 0), IFNULL(price_avg, 0) , IFNULL(amount_avg, 0), IFNULL(sales_cnt, 0) FROM sales_mv WHERE product_name = OLD.product_name INTO @old_price_sum, @old_amount_sum, @old_price_avg , @old_amount_avg, @old_sales_cnt ; SET @new_price_sum = @old_price_sum - OLD.product_price; SET @new_amount_sum = @old_amount_sum - OLD.product_amount; SET @new_price_avg = @new_price_sum / @new_amount_sum; SET @new_sales_cnt = @old_sales_cnt - 1; SET @new_amount_avg = @new_amount_sum / @new_sales_cnt; REPLACE INTO sales_mv VALUES(OLD.product_name, @new_price_sum, @new_amount_sum , IFNULL(@new_price_avg, 0), IFNULL(@new_amount_avg, 0) , @new_sales_cnt) ; END; $$ DELIMITER ;
DELIMITER $$ CREATE TRIGGER sales_upd AFTER UPDATE ON sales FOR EACH ROW BEGIN SET @old_price_sum = 0; SET @old_amount_sum = 0; SET @old_price_avg = 0; SET @old_amount_avg = 0; SET @old_sales_cnt = 0; SELECT IFNULL(price_sum, 0), IFNULL(amount_sum, 0), IFNULL(price_avg, 0) , IFNULL(amount_avg, 0), IFNULL(sales_cnt, 0) FROM sales_mv WHERE product_name = OLD.product_name INTO @old_price_sum, @old_amount_sum, @old_price_avg , @old_amount_avg, @old_sales_cnt ; SET @new_price_sum = @old_price_sum + (NEW.product_price - OLD.product_price); SET @new_amount_sum = @old_amount_sum + (NEW.product_amount - OLD.product_amount); SET @new_sales_cnt = @old_sales_cnt; SET @new_price_avg = @new_price_sum / @new_sales_count; SET @new_amount_avg = @new_amount_sum / @new_sales_cnt; REPLACE INTO sales_mv VALUES(OLD.product_name, @new_price_sum, @new_amount_sum , IFNULL(@new_price_avg, 0), IFNULL(@new_amount_avg, 0) , @new_sales_cnt) ; END; $$ DELIMITER ;
And now let us see what the result is:
INSERT INTO sales VALUES (NULL, 'Apple', 1.25, 1); INSERT INTO sales VALUES (NULL, 'Apple', 2.40, 2); INSERT INTO sales VALUES (NULL, 'Apple', 4.05, 3); INSERT INTO sales VALUES (NULL, 'Pear', 6.30, 2); INSERT INTO sales VALUES (NULL, 'Pear', 12.20, 4); INSERT INTO sales VALUES (NULL, 'Plum', 4.85, 3); DELETE FROM sales WHERE sales_id = 5; DELETE FROM sales WHERE sales_id = 4; UPDATE sales SET product_amount = 3 where sales_id = 2; SELECT * from sales_v;
Materialized Views with snapshotting functionality
The difference to the example above is, that the changes will not be applied immediately. The changes will be stored in a log table and the Materialized View is refreshed after a certain time period for a certain amount of time.
In addition to the example above we need an additional log table:
CREATE TABLE sales_mvl ( product_name VARCHAR(128) NOT NULL , product_price DECIMAL(8,2) NOT NULL , product_amount SMALLINT NOT NULL , sales_id INT UNSIGNED NOT NULL , product_ts TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP() );
Instead of updating the mv the log is filled:
DROP TRIGGER sales_ins; DELIMITER $$ CREATE TRIGGER sales_ins AFTER INSERT ON sales FOR EACH ROW BEGIN INSERT INTO sales_mvl VALUES (NEW.product_name, NEW.product_price, NEW.product_amount , NEW.sales_id, NULL); END; $$ DELIMITER ;
DROP TRIGGER sales_del; DELIMITER $$ CREATE TRIGGER sales_del AFTER DELETE ON sales FOR EACH ROW BEGIN DELETE FROM sales_mvl WHERE sales_id = OLD.sales_id; END; $$ DELIMITER ;
DROP TRIGGER sales_upd; DELIMITER $$ CREATE TRIGGER sales_upd AFTER UPDATE ON sales FOR EACH ROW BEGIN UPDATE sales_mvl SET product_name = NEW.product_name , product_price = NEW.product_price , product_amount = NEW.product_amount , sales_id = NEW.sales_id , product_ts = CURRENT_TIMESTAMP() WHERE sales_id = OLD.sales_id; END; $$ DELIMITER ;
And a Stored Procedure for refreshing the Materialized View is built. Possible modes are:
- REFRESH FULL (up to now)
- REFRESH (up to a certain time stamp)
- REBUILD (fully rebuild and clean MV log table)
DELIMITER $$ CREATE PROCEDURE refresh_mv ( IN method VARCHAR(16) , IN ts TIMESTAMP , OUT rc INT ) BEGIN IF UPPER(method) = 'REBUILD' THEN TRUNCATE TABLE sales_mvl; TRUNCATE TABLE sales_mv; INSERT INTO sales_mv SELECT product_name , SUM(product_price), SUM(product_amount), AVG(product_price) , AVG(product_amount), COUNT(*) FROM sales GROUP BY product_name ; ELSEIF UPPER(method) = 'REFRESH FULL' THEN REPLACE INTO sales_mv SELECT product_name, SUM(price_sum), SUM(amount_sum) , SUM(price_sum)/SUM(sales_cnt), SUM(amount_sum)/SUM(sales_cnt) , SUM(sales_cnt) FROM ( SELECT product_name, price_sum, amount_sum, sales_cnt FROM sales_mv UNION ALL SELECT product_name , SUM(product_price), SUM(product_amount), COUNT(*) FROM sales_mvl GROUP BY product_name ) x GROUP BY product_name ; TRUNCATE TABLE sales_mvl; SET rc = 0; ELSEIF UPPER(method) = 'REFRESH' THEN REPLACE INTO sales_mv SELECT product_name, SUM(price_sum), SUM(amount_sum) , SUM(price_sum)/SUM(sales_cnt), SUM(amount_sum)/SUM(sales_cnt) , SUM(sales_cnt) FROM ( SELECT product_name, price_sum, amount_sum, sales_cnt FROM sales_mv UNION ALL SELECT product_name , SUM(product_price), SUM(product_amount), COUNT(*) FROM sales_mvl WHERE product_ts < ts GROUP BY product_name ) x GROUP BY product_name ; DELETE FROM sales_mvl WHERE product_ts < ts ; SET rc = 0; ELSE SET rc = 1; END IF; END; $$ DELIMITER ;
And now let us test if it works correctly...
INSERT INTO sales VALUES (NULL, 'Apple', 1.25, 1); wait some time INSERT INTO sales VALUES (NULL, 'Apple', 2.40, 2); wait some time INSERT INTO sales VALUES (NULL, 'Apple', 4.05, 3); wait some time INSERT INTO sales VALUES (NULL, 'Pear', 6.30, 2); wait some time INSERT INTO sales VALUES (NULL, 'Pear', 12.20, 4); wait some time INSERT INTO sales VALUES (NULL, 'Plum', 4.85, 3); SELECT * from sales_mvl; SELECT * from sales_v; CALL refresh_mv('REFRESH', '2006-11-06 10:57:55', @rc); SELECT * from sales_v; CALL refresh_mv('REFRESH FULL', NULL, @rc); SELECT * from sales_v; CALL refresh_mv('REBUILD', NULL, @rc); SELECT * from sales_v;
Some performance benchmarks for our Materialized Views:
For the performance benchmark a set of 100 k sales rows was created. This set of rows should simulate a continuous INSERT flow into our database. The whole test was done when all files and all database table were cache in memory. To avoid side effects during measurement no other activities should happen on this machine.
Load into table sales without any triggers as baseline:
Test | time [s] | Ref. |
---|---|---|
LOAD DATA INFILE | 0.90 | [1] |
Multi row INSERT | 2.85 | [2] |
Single row INSERT | 13.2 | [3] |
Single row INSERT without LOCK TABLE | 15.9 | [4] |
FULL REFRESH of sales_mv | 0.64 | [5] |
SELECT on MV :-) | 0.00 | [6] |
Load into table sales with some Materialized View functionality:
Test | time [s] | Ref. |
---|---|---|
LOAD DATA INFILE with REFRESH IMMEDIATE |
40.8 | [1] |
Single row INSERT without LOCK TABLE with REFRESH IMMEDIATE |
109 | [4] |
Single rows INSERT without LOCK TABLE with REFRESH DEFERRED |
22.8 | [4] |
Refresh MV with about 40% of the data | 0.82 | |
Refresh MV with next 40% of the data | 0.98 | |
Refresh MV with last 20% of the data | 0.14 |
Outlook
- If you have a continuous data flow and/or concurrent transactions it may be better to use InnoDB instead of MyISAM tables.
- Locking the table in the Triggers/Stored Procedure may prevent wrong data this has to be tested.
- Delayed INSERT may help to speed up the load process.
- Eventually it makes sense to build/refresh the Materialized Vied in parallel?
Conclusion
- Triggers in MySQL (5.0.27) are not terribly fast.
- Materialized Views can help to speed up queries which rely heavily on some aggregated results.
- If INSERT speed is not a matter this feature can help to decrease read load on the system.
- It is a little tricky to implement.
- MV come only in play if huge amount of data are used and database are not memory based anymore.
Literature
[ 1 ]LOAD DATA INFILE '/tmp/sales_outfile_100k.dmp' INTO TABLE sales;
[ 2 ]
time mysql -u root test << EOF; TRUNCATE TABLE sales; SOURCE /tmp/sales_multirowinsert_100k.sql EOF
[ 3 ]
time mysql -u root test << EOF; TRUNCATE TABLE sales; SOURCE tmp/sales_extended_100k.sql EOF
[ 4 ]
time mysql -u root test << EOF; TRUNCATE TABLE sales; SOURCE /tmp/sales_extended_nolock_100k.sql EOF
[ 5 ]
INSERT INTO sales_mv SELECT product_name , SUM(product_price), SUM(product_amount) , AVG(product_price), AVG(product_amount) , COUNT(*) FROM sales GROUP BY product_name;
[ 6 ]
SELECT * FROM sales_mv;
[ 7 ] Wikipedia: Materialized Views
Comments
For real MySQL materialized views, try LeapDB
I used to maintain Flexviews, but I ported it to use a native MySQL syntax. Now CREATE INCREMENTAL MATERIALIZED VIEW ... is a reality. Inner join and all aggregation functions are supported. Check out http://www.leapdb.com