Note: all links are in Chinese

Mobike

  1. Use case: aggregate isolated MySQL clusters for OLAP. They run TiSpark on top of it

  2. Production TiDb cluster has tens of nodes, with tens of TBs of data.

Ele.me

  1. To keep the production table size in check, they regularly purge production tables and move old entries to separate archive tables

  2. Need to watch out for the performance issues of DDL operations in production and archive tables

  3. For migration, they use the official TiDB syncer tool to run TiDB as a Mysql read slave

  4. Daily archived > 100 mil rows, > 100 G. Current TiDB volume “tens of TBs”

  5. As the post was written, TiDB Binlog relies on Kafka

Ping++ - Payment SaaS

  1. Use TiDB as dataware house. It replaces AliCloud’s ADS and ES

  2. ADS has cost issue, and ES has difficulty handling complex queries with high dev/op costs.

  3. Current cluster 5 nodes, each with 16 cores and 32G ram

A Truck Fleet Management SaaS

  1. Sync data from Alicloud’s DRDS to TiDB, and runs TiSpark against TiKV, TiDB’s storage layer

  2. 2T raw data incoming per day.

A Restaurant Merchant/Order/Cashier Saas

  1. TiDB supports the operational datastore. Check the link to see the test queries they run.

  2. Before: RDS -> Mongo via Kafka -> Hive. After: RDS -> TiDB via Kafka -> Hive. TiSpark queries both TiDB and Hive

  3. Cluster setup: 8 nodes. 5 of them are for storage layer. Each TiKV/Storage node is 16 core/128 G ram with 2 1.8T SSD

  4. Peak QPS 23K. Data volume “couple of Ts”

Another Restaurant Merchant/Cashier SaaS

  1. Near real time complex queries. TiDB replaces Solr.

  2. Current deployment 8 nodes. Storage layer on 16 cores and 64G ram

An IoT company

  1. They need real time analysis capabilities.

  2. Their raw data source is Mysql databases, but they don’t want to spend too much effort on Mysql -> Hive/Hbase ETL pipeline.

  3. They need Spark support, so they choose TiSpark + TiDb