Custom SQL Scripts in PostgreSQL PgBench
by Franck Pachot
4h ago
PgBench is a popular tool for testing PostgreSQL database performance. Still, its default 'TPC-B-like' workload involves many roundtrips and context switches, which can skew results and may not reflect how your application actually performs. To overcome this, you can use custom SQL scripts in pgbench to design a load test that more accurately simulates real-world scenarios and identifies potential bottlenecks. This applies to PostgreSQL and Postgres-compatible databases like Aurora or YugabyteDB. I’ll demonstrate on YugabyteDB. You can skip to “Step 3: Create the schema” if you already have a database ..read more
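As a minimal sketch of the idea (the file name custom.sql is an assumption, and the table is the standard one created by pgbench -i; host, port and user are placeholders for a YugabyteDB YSQL endpoint), a custom script draws a random key and runs a single-row query, then pgbench replays it with -f:

-- custom.sql (hypothetical file name)
\set id random(1, 100000)
SELECT abalance FROM pgbench_accounts WHERE aid = :id;

$ pgbench -n -f custom.sql -c 10 -T 60 -h localhost -p 5433 -U yugabyte yugabyte

The -n flag skips the vacuum of the default pgbench tables, and :id is the pgbench variable set by \set for each execution.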
Snapshot too old in YugabyteDB
by Franck Pachot
4h ago
SQL databases store the current state plus enough history to read a previous state, using Multi-Version Concurrency Control. Keeping all change records would not be scalable, so only enough history is kept for the oldest ongoing transaction. There are two possibilities when long transactions are running: let history grow (during VACUUM, PostgreSQL keeps old row versions that are more recent than the database transaction horizon, which lets bloat persist), or fail long transactions after a time limit, like Oracle undo_retention or YugabyteDB timestamp_history_retention_interval_sec. The ..read more
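A rough psql sketch of the first option on the PostgreSQL side (the demo table and its val column are assumptions): an idle long transaction pins the horizon, and VACUUM cannot reclaim the dead row versions it still protects:

-- session 1: a long-running transaction keeps its snapshot
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM demo;

-- session 2: new row versions accumulate, the old ones cannot be vacuumed yet
UPDATE demo SET val = val || 'x';
VACUUM (VERBOSE) demo;   -- reports dead row versions that "cannot be removed yet"

-- the horizon held by session 1 is visible here
SELECT pid, backend_xmin, xact_start FROM pg_stat_activity WHERE backend_xmin IS NOT NULL;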
Simulate Clock Skew in Docker Container
by Franck Pachot
3d ago
In real deployments, without atomic clocks, the time synchronized by NTP can drift, and servers in a distributed system can show a clock skew of hundreds of milliseconds. A simple way to test this in a Docker lab is to fake the clock_gettime function. Here is an example with a 2-node RF1 YugabyteDB cluster (a PostgreSQL-compatible Distributed SQL database). I create a yb network and start the first node, yb1, in the background:

docker network create yb
docker run -d --rm --network yb --hostname yb1 -p 7000:7000 yugabytedb/yugabyte yugabyted start --background=false --tserver_flags="TEST_docdb_ ..read more
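The tserver flag used to inject the skew is truncated above. As a more generic, hypothetical sketch of the same idea (not the post's exact method), libfaketime can intercept clock_gettime() through LD_PRELOAD so that one container or process reports a shifted time:

# assumes a Debian-based image so the libfaketime package and its library path exist
docker run --rm debian:bookworm bash -c '
  apt-get update -qq && apt-get install -y -qq libfaketime >/dev/null
  date +%T.%N                                   # real clock
  LD_PRELOAD=/usr/lib/x86_64-linux-gnu/faketime/libfaketime.so.1 \
  FAKETIME="+30" date +%T.%N                    # clock shifted about 30 seconds ahead
'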
MongoDB Associate Data Modeler Exam
by Franck Pachot
1w ago
I am a fan of SQL and relational databases, but data modeling should begin with a platform-independent analysis, and the objectives of any database, SQL or NoSQL, are usually the same. As an AWS Data Hero, I was fortunate to receive a complimentary voucher for MongoDB certification exams and decided to take the MongoDB Associate Data Modeler Exam. I'm sharing my experience and feedback on the exam preparation and the exam itself. The Preparation: the MongoDB Data Modeling Path is the best place to start. It provides courses, questions, and labs for free. Because I have a lot of experience ..read more
"ERROR: Perform RPC timed out after 600.000s" in YSQL
by Franck Pachot
2w ago
When executing a SQL statement, the YugabyteDB query layer (known as YSQL, which uses PostgreSQL) sends read and write operations to the storage layer (DocDB). These are remote procedure calls (RPC) and must time out if they take too long, for whatever reason. This is when you can experience an error such as:
ERROR: Perform RPC (request call id 6171) to <ip_redacted>:9100 timed out after 602.000s.
The parameters that control this timeout:
client_read_write_timeout_ms: cluster flag for YCQL and YSQL
ysql_client_read_write_timeout_ms: cluster flag for YSQL
statement_timeout: session parameter (PostgreSQL)
har ..read more
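A small sketch of where each knob applies (the flag names are the ones listed above; the values and the way the flags are passed at tserver startup are assumptions):

-- session level, standard PostgreSQL parameter checked by the query layer:
SET statement_timeout = '30s';

-- cluster level, YugabyteDB flags (typically set when starting the tserver):
--   --client_read_write_timeout_ms=600000         (YCQL and YSQL)
--   --ysql_client_read_write_timeout_ms=600000    (YSQL only)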
PostgreSQL with modern storage: what about a lower random_page_cost?
by Franck Pachot
3w ago
The topic was widely discussed last week, in favor of a lower value for random_page_cost:
- 100x Faster Query in Aurora Postgres with a lower random_page_cost
- when we migrated ~1TB DB from heroku -> AWS, everything broke
- Use random_page_cost = 1.1 on modern servers
The random_page_cost parameter accounts for the latency incurred by random reads, particularly from indexes with poor correlation factors. Its default value is 4. On the other hand, the seq_page_cost parameter accounts for the lower latency incurred by contiguous page reads, where many pages are read without additional seeks. Its ..read more
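A hedged sketch of how to try a lower value before committing to it (the tablespace name fast_ssd is an assumption):

-- test in the session first and compare the plans before and after
SET random_page_cost = 1.1;
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;   -- your query here

-- then persist it cluster-wide ...
ALTER SYSTEM SET random_page_cost = 1.1;
SELECT pg_reload_conf();

-- ... or only for storage that really has cheap random reads
ALTER TABLESPACE fast_ssd SET (random_page_cost = 1.1);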
B-Tree vs. LSM-Tree: measuring the write amplification on Oracle Database and YugabyteDB
by Franck Pachot
1M ago
Databases maintain indexes to find a value or a range of values without scanning the whole table. The index is physically organized and ordered by the values you may search for, so that each value has a logical place where it can be found. With a tree structure over the index entries, the database can find this place with a few random reads, for minimal read amplification. There are two major index structures used in databases:
B-Tree: the tree's small height, which grows logarithmically with the number of index entries, minimizes the read amplification required to find an index entry. However, maint ..read more
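The comparison and the measurement are truncated above (the post measures Oracle and YugabyteDB). As a rough PostgreSQL-side sketch of the same idea (table and index names are assumptions), the WAL volume generated per batch of inserted rows reflects the extra writes needed to maintain a secondary index:

CREATE TABLE t (id bigint PRIMARY KEY, val text);
CREATE INDEX t_val ON t (val);
SELECT pg_current_wal_lsn() AS before_lsn \gset
INSERT INTO t SELECT n, md5(n::text) FROM generate_series(1, 100000) n;
-- bytes of WAL written for 100000 rows, including index maintenance
SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), :'before_lsn'));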
YugabyteDB Transactional Load with Non-transactional COPY
by Franck Pachot
1M ago
By default, YugabyteDB COPY does intermediate commits every 20000 rows:

yugabyte=# show yb_default_copy_from_rows_per_transaction;
 yb_default_copy_from_rows_per_transaction
-------------------------------------------
 20000
(1 row)

yugabyte=# show yb_disable_transactional_writes;
 yb_disable_transactional_writes
---------------------------------
 on
(1 row)

Let's take an example with the following table:

yugabyte=# create table loaded_data ( id bigserial, data text );
CREATE TABLE

I set statement_timeout to 5 seconds to simulate a failure before the end, and load some rows:

yugabyt ..read more
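The load itself is truncated above. Roughly, it looks like this sketch (the data source fed to \copy is an assumption): when statement_timeout cancels the statement, the batches that were already committed remain in the table because of the intermediate commits shown above:

yugabyte=# set statement_timeout = '5s';
yugabyte=# \copy loaded_data(data) from program 'seq 1 100000000'
ERROR:  canceling statement due to statement timeout
yugabyte=# select count(*) from loaded_data;   -- some batches of 20000 rows are already committed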
Out of Range statistics with PostgreSQL & YugabyteDB
by Franck Pachot
2M ago
There can be issues with optimizer statistics when a query contains a predicate with a value that is out of range, such as a value higher than the maximum gathered during the last ANALYZE. If statistics were updated in real time, it would be easy for the planner to estimate the number of rows as zero or one (the planner never estimates zero). However, statistics are gathered by ANALYZE or auto-analyze and can become stale until the next run. Typically, there are always a few rows above the known maximum on columns that are constantly increasing, such as sequences or current timestamps. Oracle Datab ..read more
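A minimal psql sketch of the symptom (the table name and row counts are illustrative):

CREATE TABLE events (id bigint);
INSERT INTO events SELECT generate_series(1, 100000);
ANALYZE events;
-- new rows arrive above the maximum recorded by ANALYZE
INSERT INTO events SELECT generate_series(100001, 110000);
-- the planner still believes the column stops around 100000, so this range is underestimated
EXPLAIN SELECT * FROM events WHERE id > 105000;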
Best Practice: use the same datatypes for comparisons, like joins and foreign keys
by Franck Pachot
2M ago
It is essential to ensure that columns linked by referential integrity or joined together have the same data type. Even if there is an implicit cast and performance appears good, some corner cases may cause issues. For instance, in YugabyteDB, buffering is used to maintain high performance and overcome the inherent latency of distributed transactions. However, it is crucial that this buffering does not change the behavior of reads and writes, so special care is needed when a function, expression, or type cast is involved. Here are two examples of b ..read more
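The post's two examples are YugabyteDB-specific and truncated above. As a generic PostgreSQL illustration of the implicit-cast trap (the table name is an assumption), comparing a bigint column with a numeric value casts the column rather than the value, which can defeat its index; the same effect appears in joins and foreign keys between mismatched types:

CREATE TABLE orders (id bigint PRIMARY KEY, amount numeric);
INSERT INTO orders SELECT n, n FROM generate_series(1, 100000) n;
-- same datatype: Index Scan using orders_pkey
EXPLAIN SELECT * FROM orders WHERE id = 42;
-- numeric literal: the column is cast to numeric, so the btree index on id cannot be used as-is
EXPLAIN SELECT * FROM orders WHERE id = 42.0;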
