Skip to main content
Version: 3.5

Perform SQL-based Analysis on Audit Logs

Query and analyze audit information about all the queries made against your data within your PhoenixAI cluster.

PhoenixAI supports streaming the cluster audit log files into your cluster, allowing you to perform SQL-based analysis on audit logs directly within your cluster.

Common use cases include:

  • Auditing client connections
  • Analyzing how, what, and where queries are coming from
  • Monitoring changes to your databases, tables, and materialized views
  • Tracking the efficiency of your materialized views
  • Observing cluster resource consumption

Enable audit log streaming

Follow these steps to enable streaming for audit logs:

  1. Sign in to the PhoenixAI Cloud console.
  2. In the left-side navigation pane, choose Clusters.
  3. On the Clusters page, click the cluster for which you want to enable audit log streaming.
  4. On the cluster details page, click Open cluster.
  5. On the Connect to your Cluster page, enter your Username and Password for accessing the cluster, and click Sign in to open PhoenixAI Studio.
  6. On the Administrator tab of PhoenixAI Studio, turn on the switch in the Query Audit logs section.

Query audit logs

After the audit log streaming is enabled, PhoenixAI will create a database and table in your cluster to store the audit log records. The catalog, database, and table are default_catalog, starrocks_audit_db__, and starrocks_audit_tbl__. The full path is default_catalog.starrocks_audit_db__.starrocks_audit_tbl__.

You can query the audit data using the PhoenixAI Studio, or the SQL tool of your choice.

important

The audit logs are batch loaded every 60 seconds, so there will be a 60 second delay before audit data is available.

The CREATE TABLE statement for starrocks_audit_tbl__ is as follows:

CREATE TABLE starrocks_audit_db__.starrocks_audit_tbl__ (
`queryId` VARCHAR(64) COMMENT "Unique ID of the query",
`timestamp` DATETIME NOT NULL COMMENT "Query start time",
`queryType` VARCHAR(12) COMMENT "Query type (query, slow_query, connection)",
`clientIp` VARCHAR(32) COMMENT "Client IP",
`user` VARCHAR(64) COMMENT "Query username",
`authorizedUser` VARCHAR(64) COMMENT "Unique identifier of the user, i.e., user_identity",
`resourceGroup` VARCHAR(64) COMMENT "Resource group name",
`catalog` VARCHAR(32) COMMENT "Catalog name",
`db` VARCHAR(96) COMMENT "Database where the query runs",
`state` VARCHAR(8) COMMENT "Query state (EOF, ERR, OK)",
`errorCode` VARCHAR(512) COMMENT "Error code",
`queryTime` BIGINT COMMENT "Query execution time (milliseconds)",
`scanBytes` BIGINT COMMENT "Number of bytes scanned by the query",
`scanRows` BIGINT COMMENT "Number of rows scanned by the query",
`returnRows` BIGINT COMMENT "Number of rows returned by the query",
`cpuCostNs` BIGINT COMMENT "CPU time consumed by the query (nanoseconds)",
`memCostBytes` BIGINT COMMENT "Memory consumed by the query (bytes)",
`stmtId` INT COMMENT "Incremental ID of the SQL statement",
`isQuery` TINYINT COMMENT "Whether the SQL is a query (1 or 0)",
`feIp` VARCHAR(128) COMMENT "FE IP that executed the statement",
`stmt` VARCHAR(1048576) COMMENT "Original SQL statement",
`digest` VARCHAR(32) COMMENT "Fingerprint of slow SQL",
`planCpuCosts` DOUBLE COMMENT "CPU usage during query planning (nanoseconds)",
`planMemCosts` DOUBLE COMMENT "Memory usage during query planning (bytes)",
`pendingTimeMs` BIGINT COMMENT "Time the query waited in the queue (milliseconds)",
`candidateMVs` VARCHAR(65533) NULL COMMENT "List of candidate materialized views",
`hitMvs` VARCHAR(65533) NULL COMMENT "List of matched materialized views",
`QueriedRelations` ARRAY<VARCHAR(65533)> NULL COMMENT "List of directly referenced tables and views",
`warehouse` VARCHAR(32) NULL COMMENT "Warehouse name"
) ENGINE = OLAP
DUPLICATE KEY (`queryId`, `timestamp`, `queryType`)
COMMENT "Audit log table"
PARTITION BY date_trunc('day', `timestamp`)
PROPERTIES (
"replication_num" = "1",
"partition_live_number" = "30"
);

Perform SQL-based analysis on audit logs

This section provides some examples of SQL-based analysis on audit logs.

Analyze CPU consumption

The following SQL aggregates total CPU time per user (cpuCostNs) over the last 30 days, converts it to seconds, and calculates the percentage of total CPU usage.

SELECT
user,
SUM(cpuCostNs) / 1e9 AS total_cpu_seconds, -- Query the total CPU time.
(
SUM(cpuCostNs) /
(
SELECT SUM(cpuCostNs)
FROM starrocks_audit_db__.starrocks_audit_tbl__
WHERE state IN ('EOF','OK')
AND timestamp >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
)
) * 100 AS cpu_usage_percentage -- Calculate the percentage of total CPU usage per user.
FROM starrocks_audit_db__.starrocks_audit_tbl__
WHERE state IN ('EOF','OK') -- Include queries that are finished only.
AND timestamp >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) -- Query the data of the last 30 days.
GROUP BY user
ORDER BY total_cpu_seconds DESC
LIMIT 20; -- List the top 20 users with the most CPU resource consumption.

Analyze memory usage

The following SQL computes the maximum memory usage per user (memCostBytes) for a single query over the last 30 days.

SELECT
user,
MAX(memCostBytes) / (1024 * 1024) AS max_mem_mb -- Max memory usage (in MB) per query.
FROM starrocks_audit_db__.starrocks_audit_tbl__
WHERE state IN ('EOF','OK') -- Include queries that are finished only.
AND timestamp >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) -- Query the data of the last 30 days.
GROUP BY user
ORDER BY max_mem_mb DESC
LIMIT 20; -- List the top 20 users with the most memory resource consumption.

Analyze query concurrency

The following SQL analyzes per-minute query concurrency over the last 30 days and extracts the maximum observed concurrency per user.

WITH UserConcurrency AS (
SELECT
user,
DATE_FORMAT(timestamp, '%Y-%m-%d %H:%i') AS minute_bucket,
COUNT(*) AS query_concurrency
FROM starrocks_audit_db__.starrocks_audit_tbl__
WHERE state IN ('EOF', 'OK') -- Include queries that are finished only.
AND timestamp >= DATE_SUB(NOW(), INTERVAL 30 DAY) -- Query the data of the last 30 days.
AND LOWER(stmt) LIKE '%select%' -- Include SELECT statements only.
GROUP BY user, minute_bucket
HAVING query_concurrency > 1 -- Exclude scenarios where concurrency is less than one query per minute.
)
SELECT
user,
minute_bucket,
query_concurrency / 60.0 AS query_concurrency_per_second -- Query the per-second concurrency.
FROM (
SELECT
user,
minute_bucket,
query_concurrency,
ROW_NUMBER() OVER (
PARTITION BY user
ORDER BY query_concurrency DESC
) AS rn
FROM UserConcurrency
) ranked
WHERE rn = 1 -- Keep the highest record for each user.
ORDER BY query_concurrency_per_second DESC
LIMIT 50; -- List the top 50 users with the highest concurrency.

Analyze memory usage of materialized view refresh operations

The following SQL identifies memory-intensive materialized view refresh operations, typically characterized by INSERT OVERWRITE statements.

SELECT
user,
MAX(memCostBytes) / (1024 * 1024) AS max_mem_mb -- Max memory usage (in MB) per query.
FROM starrocks_audit_db__.starrocks_audit_tbl__
WHERE state IN ('EOF','OK') -- Include queries that are finished only.
AND timestamp >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) -- Query the data of the last 30 days.
AND LOWER(stmt) LIKE '%insert overwrite%' -- Include materialized view refresh operations only.
GROUP BY user
ORDER BY max_mem_mb DESC
LIMIT 20; -- List the top 20 users with the most memory resource consumption.