PHP - MySQL running OOM with large result sets
BearbeitenThe issue
BearbeitenWhen fetching large MySQL result sets, PHP webservice is running into out-of-memory (OOM) errors. The default limit for webservices is currently 4 GB vmem, but this doesn't matter here.
- Question: What is causing PHP to consume this amount of memory? And what can be done ?
- Hypothesis: SQL result sets are too large. To save PHP memory, it's better to use unbuffered queries to retrieve one data row after another from the server, instead of fetching all at once into client memory (= buffered = default for plain mysqli queries).
Conclusion
Bearbeiten- The high memory consumptiom of PHP is mainly caused by the resulting array, not by the retrieved sql result set.
- Enforcing the use of unbuffered queries, doesn't help anything as long as you have to store the results. Memory usage just switches from memory (buffer) to memory (variable).
- As an alternative, an unbuffered query can be processed row by row, without storing it. But this will greatly slow down things.
- Buffering is a way to quickly execute SQL queries and free up resources on the MySQL server, as the server is much faster than the client.
- The inital difference between normal mysql query and prepared statement is caused by the way datatypes are retrieved. With mysql query MySQL server converts all SQL datatypes to string before sending them to client, while prepared statement tries to keep SQL datatypes (as long as they match to PHP datatypes). After casting the variables, this difference is gone.
* Include only columns you really need * Reduce dimensions/size of storage array(s) * Cast variables to numeric type (if applicable)
Test setup
Bearbeiten- randomly chosen editor with 107,803 edits on dewiki
- result stored in 2-dimensional PHP array
- script as shown below
Test results
Bearbeiten- size of query result exported as plain file: 11 MB
Test modifications | normal mysql query | prepared statement |
---|---|---|
buffered (store_result) | 161.00 MB | 155.25 MB |
unbuffered (use_result) | 161.00 MB | 155.25 MB |
casting variables in PHP result array 1 | 148.00 MB | 148.00 MB |
reducing result array to 1 dimension 2 | 38.25 MB | 38.25 MB |
no array as result, but one large string 3 | 11.25 MB | 11.25 MB |
adding a column with blank content 4 | 187.50 MB | 181.75 MB |
1 Modification: casting suitable variables to number type integer
$foo[] = array(
(int)$row->rev_timestamp,
$row->page_title,
(int)$row->page_namespace,
$row->rev_comment,
);
2 Modification: reducing result array to 1 dimension
$foo[] = $row->rev_timestamp . $row->page_title .
$row->page_namespace . $row->rev_comment;
3 Modification: no array as result, but one large string
$foo .= $row->rev_timestamp . $row->page_title .
$row->page_namespace . $row->rev_comment;
4 Modification: adding a column with blank content
SELECT UNIX_TIMESTAMP(rev_timestamp) as rev_timestamp, page_title, page_namespace, rev_comment, '' as blank_column
$foo[] = array(
$row->rev_timestamp,
$row->page_title,
$row->page_namespace,
$row->rev_comment,
$row->blank_column,
);
Script
BearbeitenFor simplification, all error handlers are removed.
<?php
// Set memory limit ( default = 128 MB )
ini_set("memory_limit", "512M" );
// Read DB credentials
$inifile = "../../replica.my.cnf";
$iniVal = parse_ini_file($inifile);
$dbUser = $iniVal["user"];
$dbPwd = $iniVal["password"];
unset($iniVal);
// Create new mysqli Object
$mysqli = new mysqli("s5.labsdb",$dbUser, $dbPwd, "dewiki_p");
$mysqli->set_charset("utf8");
// Define the SQL query
$query = "
SELECT UNIX_TIMESTAMP(rev_timestamp) as rev_timestamp, page_title, page_namespace, rev_comment
/*SLOW_OK RUN_LIMIT 60 NM*/
FROM revision_userindex
JOIN page ON page_id = rev_page
WHERE rev_user = '226562'
ORDER BY rev_timestamp ASC
";
// Get memory usage before query
$m1 = memory_get_usage(true);
// The alternative calls. commented out if not to be run
$ff = sqli_query( $mysqli, $query, MYSQLI_USE_RESULT );
$ff = sqli_query( $mysqli, $query, MYSQLI_STORE_RESULT );
$ff = sqli_stmt( $mysqli, $query, $store_result=false );
$ff = sqli_stmt( $mysqli, $query, $store_result=true );
// Get memory usage after query
$m2 = memory_get_usage(true);
// Output the results
printf("\n num: ".count($ff)." records \n\n");
printf("$m1 \n $m2");
// Functions
// A. Plain vanilla mysqli query
function sqli_query( $mysqli, $query, $mode ){
if ( $result = $mysqli->query( $query, $mode) ){
while( $row = $result->fetch_object() ){
$foo[] = array(
$row->rev_timestamp,
$row->page_title,
$row->page_namespace,
$row->rev_comment,
);
}
$result->close();
}
return $foo;
}
// B. prepared statement
function sqli_stmt( $mysqli, $query, $mode ){
if ( $stmt = $mysqli->prepare( $query ) ){
# $stmt->bind_param('i', $user_id ); // in this example no param to bind
$stmt->execute();
$stmt->bind_result($rev_timestamp, $page_title, $page_namespace, $rev_comment);
if ( $mode ) {
$stmt->store_result();
}
while( $stmt->fetch() ){
$foo[] = array(
$rev_timestamp,
$page_title,
$page_namespace,
$rev_comment
);
}
$stmt->free_result();
$stmt->close();
}
return $foo;
}