Custom SQL Scripts in PostgreSQL PgBench
by Franck Pachot
4h ago
PgBench is a popular tool for testing PostgreSQL database performance. Still, its default 'TPC-B-like' workload involves many roundtrips and context switches, which can skew results and may not reflect how your application actually performs. To overcome this, you can use custom SQL scripts in pgbench to design a load test that more accurately simulates real-world scenarios and identifies potential bottlenecks. This applies to PostgreSQL and Postgres-compatible databases like Aurora or YugabyteDB. I’ll demonstrate on YugabyteDB. You can skip to “Step 3: Create the schema” if you already have a database ..read more
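As a minimal sketch of the idea (the file name custom.sql is an assumption, and the table is the standard one created by pgbench -i; host, port and user are placeholders for a YugabyteDB YSQL endpoint), a custom script draws a random key and runs a single-row query, then pgbench replays it with -f:

-- custom.sql (hypothetical file name)
\set id random(1, 100000)
SELECT abalance FROM pgbench_accounts WHERE aid = :id;

$ pgbench -n -f custom.sql -c 10 -T 60 -h localhost -p 5433 -U yugabyte yugabyte

The -n flag skips the vacuum of the default pgbench tables, and :id is the pgbench variable set by \set for each execution.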
Snapshot too old in YugabyteDB
by Franck Pachot
4h ago
SQL databases store the current state plus enough history to read a previous state, using Multi-Version Concurrency Control. Keeping all change records would not be scalable, so only enough history is kept for the oldest ongoing transaction. There are two possibilities when long transactions are running: let history grow (during VACUUM, PostgreSQL keeps old row versions that are more recent than the database transaction horizon, which lets bloat persist), or fail long transactions after a time limit, like Oracle undo_retention or YugabyteDB timestamp_history_retention_interval_sec. The ..read more
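A rough psql sketch of the first option on the PostgreSQL side (the demo table and its val column are assumptions): an idle long transaction pins the horizon, and VACUUM cannot reclaim the dead row versions it still protects:

-- session 1: a long-running transaction keeps its snapshot
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM demo;

-- session 2: new row versions accumulate, the old ones cannot be vacuumed yet
UPDATE demo SET val = val || 'x';
VACUUM (VERBOSE) demo;   -- reports dead row versions that "cannot be removed yet"

-- the horizon held by session 1 is visible here
SELECT pid, backend_xmin, xact_start FROM pg_stat_activity WHERE backend_xmin IS NOT NULL;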
Simulate Clock Skew in Docker Container
by Franck Pachot
3d ago
In real deployments, without atomic clocks, the time synchronized by NTP can drift, and servers in a distributed system can show a clock skew of hundreds of milliseconds. A simple way to test this in a Docker lab is to fake the clock_gettime function. Here is an example with a 2-node RF1 YugabyteDB cluster (a PostgreSQL-compatible Distributed SQL database). I create a yb network and start the first node, yb1, in the background:

docker network create yb
docker run -d --rm --network yb --hostname yb1 -p 7000:7000 yugabytedb/yugabyte yugabyted start --background=false --tserver_flags="TEST_docdb_ ..read more
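The tserver flag used to inject the skew is truncated above. As a more generic, hypothetical sketch of the same idea (not the post's exact method), libfaketime can intercept clock_gettime() through LD_PRELOAD so that one container or process reports a shifted time:

# assumes a Debian-based image so the libfaketime package and its library path exist
docker run --rm debian:bookworm bash -c '
  apt-get update -qq && apt-get install -y -qq libfaketime >/dev/null
  date +%T.%N                                   # real clock
  LD_PRELOAD=/usr/lib/x86_64-linux-gnu/faketime/libfaketime.so.1 \
  FAKETIME="+30" date +%T.%N                    # clock shifted about 30 seconds ahead
'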
MongoDB Associate Data Modeler Exam
by Franck Pachot
1w ago
I am a fan of SQL and relational databases, but data modeling should begin with a platform-independent analysis, and the objectives of any database, SQL or NoSQL, are usually the same. As an AWS Data Hero, I was fortunate to receive a complimentary voucher for MongoDB certification exams and decided to take the MongoDB Associate Data Modeler Exam. I'm sharing my experience and feedback on the exam preparation and the exam itself. The Preparation: the MongoDB Data Modeling Path is the best place to start. It provides courses, questions, and labs for free. Because I have a lot of experience ..read more
"ERROR: Perform RPC timed out after 600.000s" in YSQL
by Franck Pachot
2w ago
When executing a SQL statement, the YugabyteDB query layer (known as YSQL, which uses PostgreSQL) sends read and write operations to the storage layer (DocDB). These are remote procedure calls (RPC) and must time out if they take too long, for whatever reason. This is when you can experience an error such as:
ERROR: Perform RPC (request call id 6171) to <ip_redacted>:9100 timed out after 602.000s.
The parameters that control this timeout:
client_read_write_timeout_ms: cluster flag for YCQL and YSQL
ysql_client_read_write_timeout_ms: cluster flag for YSQL
statement_timeout: session parameter (PostgreSQL)
har ..read more
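A small sketch of where each knob applies (the flag names are the ones listed above; the values and the way the flags are passed at tserver startup are assumptions):

-- session level, standard PostgreSQL parameter checked by the query layer:
SET statement_timeout = '30s';

-- cluster level, YugabyteDB flags (typically set when starting the tserver):
--   --client_read_write_timeout_ms=600000         (YCQL and YSQL)
--   --ysql_client_read_write_timeout_ms=600000    (YSQL only)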
PostgreSQL with modern storage: what about a lower random_page_cost?
by Franck Pachot
3w ago
The topic was widely discussed last week, in favor of a lower value for random_page_cost:
- 100x Faster Query in Aurora Postgres with a lower random_page_cost
- when we migrated ~1TB DB from heroku -> AWS, everything broke
- Use random_page_cost = 1.1 on modern servers
The random_page_cost parameter accounts for the latency incurred by random reads, particularly from indexes with poor correlation factors. Its default value is 4. On the other hand, the seq_page_cost parameter accounts for the lower latency incurred by contiguous page reads, where many pages are read without additional seeks. Its ..read more
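A hedged sketch of how to try a lower value before committing to it (the tablespace name fast_ssd is an assumption):

-- test in the session first and compare the plans before and after
SET random_page_cost = 1.1;
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;   -- your query here

-- then persist it cluster-wide ...
ALTER SYSTEM SET random_page_cost = 1.1;
SELECT pg_reload_conf();

-- ... or only for storage that really has cheap random reads
ALTER TABLESPACE fast_ssd SET (random_page_cost = 1.1);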
B-Tree vs. LSM-Tree: measuring the write amplification on Oracle Database and YugabyteDB
by Franck Pachot
1M ago
Databases maintain indexes to find a value or a range of values without scanning the whole table. The index is physically organized and ordered by the values you may search for, so that each value has a logical place where it can be found. With a tree structure over the index entries, the database can find this place with a few random reads, for minimal read amplification. There are two major index structures used in databases:
B-Tree: the tree's small height, which grows logarithmically with the number of index entries, minimizes the read amplification required to find an index entry. However, maint ..read more
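The comparison and the measurement are truncated above (the post measures Oracle and YugabyteDB). As a rough PostgreSQL-side sketch of the same idea (table and index names are assumptions), the WAL volume generated per batch of inserted rows reflects the extra writes needed to maintain a secondary index:

CREATE TABLE t (id bigint PRIMARY KEY, val text);
CREATE INDEX t_val ON t (val);
SELECT pg_current_wal_lsn() AS before_lsn \gset
INSERT INTO t SELECT n, md5(n::text) FROM generate_series(1, 100000) n;
-- bytes of WAL written for 100000 rows, including index maintenance
SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), :'before_lsn'));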
YugabyteDB Transactional Load with Non-transactional COPY
by Franck Pachot
1M ago
By default, YugabyteDB COPY does intermediate commits every 20000 rows:

yugabyte=# show yb_default_copy_from_rows_per_transaction;
 yb_default_copy_from_rows_per_transaction
-------------------------------------------
 20000
(1 row)

yugabyte=# show yb_disable_transactional_writes;
 yb_disable_transactional_writes
---------------------------------
 on
(1 row)

Let's take an example with the following table:

yugabyte=# create table loaded_data ( id bigserial, data text );
CREATE TABLE

I set statement_timeout to 5 seconds to simulate a failure before the end, and load some rows:

yugabyt ..read more
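The load itself is truncated above. Roughly, it looks like this sketch (the data source fed to \copy is an assumption): when statement_timeout cancels the statement, the batches that were already committed remain in the table because of the intermediate commits shown above:

yugabyte=# set statement_timeout = '5s';
yugabyte=# \copy loaded_data(data) from program 'seq 1 100000000'
ERROR:  canceling statement due to statement timeout
yugabyte=# select count(*) from loaded_data;   -- some batches of 20000 rows are already committed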
Out of Range statistics with PostgreSQL & YugabyteDB
by Franck Pachot
2M ago
There can be issues with optimizer statistics when a query contains a predicate with a value that is out of range, such as a value higher than the maximum gathered during the last ANALYZE. If statistics were updated in real time, it would be easy for the planner to estimate the number of rows as zero or one (the planner never estimates zero). However, statistics are gathered by ANALYZE or auto-analyze and can become stale until the next run. Typically, there are always a few rows above the known maximum on columns that are constantly increasing, such as sequences or current timestamps. Oracle Datab ..read more
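A minimal psql sketch of the symptom (the table name and row counts are illustrative):

CREATE TABLE events (id bigint);
INSERT INTO events SELECT generate_series(1, 100000);
ANALYZE events;
-- new rows arrive above the maximum recorded by ANALYZE
INSERT INTO events SELECT generate_series(100001, 110000);
-- the planner still believes the column stops around 100000, so this range is underestimated
EXPLAIN SELECT * FROM events WHERE id > 105000;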
Best Practice: use the same datatypes for comparisons, like joins and foreign keys
by Franck Pachot
2M ago
It is essential to ensure that columns linked by referential integrity or joined together have the same data type. Even if there is an implicit cast and performance appears good, some corner cases may cause issues. For instance, in YugabyteDB, buffering is used to maintain high performance and overcome the inherent latency of distributed transactions. However, it is crucial that this buffering does not change the behavior of reads and writes, so special care is needed when a function, expression, or type cast is involved. Here are two examples of b ..read more
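The post's two examples are YugabyteDB-specific and truncated above. As a generic PostgreSQL illustration of the implicit-cast trap (the table name is an assumption), comparing a bigint column with a numeric value casts the column rather than the value, which can defeat its index; the same effect appears in joins and foreign keys between mismatched types:

CREATE TABLE orders (id bigint PRIMARY KEY, amount numeric);
INSERT INTO orders SELECT n, n FROM generate_series(1, 100000) n;
-- same datatype: Index Scan using orders_pkey
EXPLAIN SELECT * FROM orders WHERE id = 42;
-- numeric literal: the column is cast to numeric, so the btree index on id cannot be used as-is
EXPLAIN SELECT * FROM orders WHERE id = 42.0;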
