TiDB in OLAP
Note: all links are in Chinese
-
Use case: aggregating isolated MySQL clusters into one store for OLAP, with TiSpark running on top of it.
-
The production TiDB cluster has tens of nodes and tens of TBs of data.
-
To keep production table sizes in check, they regularly purge production tables, moving old entries into separate archive tables.
-
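The post doesn't show the archival mechanics; below is a minimal sketch of a batched move, with hypothetical table and column names (`orders`, `orders_archive`, `id`, `created_at`). Small batches keep each transaction short, which matters on a busy production table.

```python
# Sketch of batched archival: move rows older than a cutoff from a
# production table into an archive table, one small batch at a time.
# Table/column names are hypothetical; the post does not show the schema.

def archive_batch_sql(cutoff: str, batch_size: int = 5000) -> list[str]:
    """Return the SQL statements for one archival batch."""
    return [
        # Copy one batch of old rows into the archive table.
        f"INSERT INTO orders_archive "
        f"SELECT * FROM orders "
        f"WHERE created_at < '{cutoff}' "
        f"ORDER BY id LIMIT {batch_size};",
        # Delete the same rows from the production table (MySQL/TiDB
        # both support single-table DELETE ... ORDER BY ... LIMIT).
        f"DELETE FROM orders "
        f"WHERE created_at < '{cutoff}' "
        f"ORDER BY id LIMIT {batch_size};",
    ]

stmts = archive_batch_sql("2018-01-01")
```

The two statements would be run repeatedly inside a loop until the copy returns zero rows.
-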
They need to watch out for the performance impact of DDL operations on both production and archive tables.
-
For migration, they use the official TiDB Syncer tool, which makes TiDB act as a MySQL read slave.
-
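A minimal `syncer.toml` sketch for that setup, replicating binlog from an upstream MySQL into TiDB (hosts, credentials, and `server-id` are placeholders; check the Syncer docs for the full option list):

```toml
# Syncer replicates the upstream MySQL binlog into TiDB.
server-id = 101          # unique replica id, like a MySQL slave's

[from]                   # upstream MySQL master
host = "192.168.0.10"
user = "syncer"
password = ""
port = 3306

[to]                     # downstream TiDB
host = "192.168.0.20"
user = "root"
password = ""
port = 4000
```
-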
Daily archive volume: > 100 million rows, > 100 GB. Current TiDB data volume: “tens of TBs”.
-
As of when the post was written, TiDB Binlog relied on Kafka.
-
Use case: TiDB as a data warehouse, replacing AliCloud’s ADS and ES.
-
ADS has cost issues; ES has difficulty handling complex queries and carries high dev/ops costs.
-
Current cluster: 5 nodes, each with 16 cores and 32 GB RAM.
-
They sync data from AliCloud’s DRDS to TiDB, and run TiSpark against TiKV, TiDB’s storage layer.
-
2 TB of raw data incoming per day.
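Running TiSpark against TiKV amounts to registering TiSpark's SQL extension and pointing Spark at the TiDB cluster's Placement Driver (PD), so Spark reads TiKV directly instead of going through the TiDB SQL layer. A minimal PySpark-style sketch (the PD address and table names are placeholders):

```python
# Minimal TiSpark wiring: the two Spark settings that connect a
# Spark session to a TiDB cluster. PD address is a placeholder.

def tispark_conf(pd_addresses: str) -> dict[str, str]:
    return {
        # TiSpark's Spark SQL extension, which plans reads over TiKV.
        "spark.sql.extensions": "org.apache.spark.sql.TiExtensions",
        # Placement Driver endpoints of the target TiDB cluster.
        "spark.tispark.pd.addresses": pd_addresses,
    }

conf = tispark_conf("192.168.0.30:2379")

# With pyspark and the TiSpark jar on the classpath, this would be
# applied roughly as (db/table names hypothetical):
#   spark = (SparkSession.builder
#            .config("spark.sql.extensions", conf["spark.sql.extensions"])
#            .config("spark.tispark.pd.addresses", conf["spark.tispark.pd.addresses"])
#            .getOrCreate())
#   spark.sql("SELECT COUNT(*) FROM mydb.orders")
```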
A Restaurant Merchant/Order/Cashier SaaS
-
TiDB backs the operational data store. Check the link to see the test queries they run.
-
Before: RDS -> Mongo via Kafka -> Hive. After: RDS -> TiDB via Kafka -> Hive. TiSpark queries both TiDB and Hive
-
Cluster setup: 8 nodes, 5 of which are for the storage layer. Each TiKV/storage node has 16 cores, 128 GB RAM, and two 1.8 TB SSDs.
-
Peak QPS: 23K. Data volume: a “couple of Ts”.
Another Restaurant Merchant/Cashier SaaS
-
Near-real-time complex queries; TiDB replaces Solr.
-
Current deployment: 8 nodes. Storage-layer nodes have 16 cores and 64 GB RAM.
-
They need real-time analysis capabilities.
-
Their raw data source is MySQL databases, but they don’t want to spend too much effort on a MySQL -> Hive/HBase ETL pipeline.
-
They need Spark support, so they chose TiSpark + TiDB.