
A couple of weeks ago I presented at Percona University São Paulo about the new features in PostgreSQL that allow the deployment of simple shards. I’ve tried to summarize the main points in this post, as well as to provide an introductory overview of sharding itself. Please note I haven’t included any third-party extensions that provide sharding for PostgreSQL in my discussion below.

Partitioning in PostgreSQL

In a nutshell, until not long ago there wasn’t a dedicated, native feature in PostgreSQL for table partitioning. Not that that prevented people from doing it anyway: the PostgreSQL community is very creative. There’s a table inheritance feature in PostgreSQL that allows the creation of child tables with the same structure as a parent table. That, combined with the employment of proper constraints in each child table along with the right set of triggers in the parent table, has provided practical “table partitioning” in PostgreSQL for years (and still works). Here’s an example:

Using table inheritance

CREATE TABLE temperature (
  id BIGSERIAL PRIMARY KEY NOT NULL,
  city_id INT NOT NULL,
  timestamp TIMESTAMP NOT NULL,
  temp DECIMAL(5,2) NOT NULL
);

Figure 1a. Main (or parent) table

CREATE TABLE temperature_201901 (CHECK (timestamp >= DATE '2019-01-01' AND timestamp < DATE '2019-02-01')) INHERITS (temperature);
CREATE TABLE temperature_201902 (CHECK (timestamp >= DATE '2019-02-01' AND timestamp < DATE '2019-03-01')) INHERITS (temperature);
CREATE TABLE temperature_201903 (CHECK (timestamp >= DATE '2019-03-01' AND timestamp < DATE '2019-04-01')) INHERITS (temperature);

Figure 1b. Child tables inherit the structure of the parent table and are limited by constraints

CREATE OR REPLACE FUNCTION temperature_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN
    IF ( NEW.timestamp >= DATE '2019-01-01' AND NEW.timestamp < DATE '2019-02-01' ) THEN INSERT INTO temperature_201901 VALUES (NEW.*);
    ELSIF ( NEW.timestamp >= DATE '2019-02-01' AND NEW.timestamp < DATE '2019-03-01' ) THEN INSERT INTO temperature_201902 VALUES (NEW.*);
    ELSIF ( NEW.timestamp >= DATE '2019-03-01' AND NEW.timestamp < DATE '2019-04-01' ) THEN INSERT INTO temperature_201903 VALUES (NEW.*);
    ELSE RAISE EXCEPTION 'Date out of range!';
    END IF;
    RETURN NULL;
END;
$$
LANGUAGE plpgsql;

Figure 1c. A function that controls in which child table a new entry should be added according to the timestamp field

CREATE TRIGGER insert_temperature_trigger
    BEFORE INSERT ON temperature
    FOR EACH ROW EXECUTE PROCEDURE temperature_insert_trigger();

Figure 1d. A trigger is added to the parent table that calls the function above when an INSERT is performed

The biggest drawbacks of such an implementation were the amount of manual work needed to maintain such an environment (even though a certain level of automation could be achieved through the use of third-party extensions such as pg_partman) and the lack of optimization/support for “distributed” queries. The PostgreSQL optimizer wasn’t advanced enough to have a good understanding of partitions at the time, though workarounds such as constraint exclusion could be employed.
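
For reference, here’s a minimal sketch of constraint exclusion at work with the child tables from Figure 1b. The “partition” value is actually the setting’s default; it is shown explicitly here just for illustration:

SET constraint_exclusion = partition;

EXPLAIN SELECT * FROM temperature
 WHERE timestamp >= DATE '2019-02-05' AND timestamp < DATE '2019-02-10';
-- With the CHECK constraints from Figure 1b in place, the plan should scan
-- only temperature and temperature_201902, skipping the other child tables.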

Declarative partitioning

About a year and a half ago, PostgreSQL 10 was released with a bunch of new features, among them native support for table partitioning through the new declarative partitioning feature. Here’s how we could partition the same temperature table using this new method:

CREATE TABLE temperature (
  id BIGSERIAL NOT NULL,
  city_id INT NOT NULL,
  timestamp TIMESTAMP NOT NULL,
  temp DECIMAL(5,2) NOT NULL
) PARTITION BY RANGE (timestamp);

Figure 2a. Main table structure for a partitioned table

CREATE TABLE temperature_201901 PARTITION OF temperature FOR VALUES FROM ('2019-01-01') TO ('2019-02-01');
CREATE TABLE temperature_201902 PARTITION OF temperature FOR VALUES FROM ('2019-02-01') TO ('2019-03-01');
CREATE TABLE temperature_201903 PARTITION OF temperature FOR VALUES FROM ('2019-03-01') TO ('2019-04-01');

Figure 2b. Tables defined as partitions of the main table; with declarative partitioning, there was no need for triggers anymore.

It still lacked the optimization and flexibility needed to be considered a complete partitioning solution. It wasn’t possible, for example, to perform an UPDATE that would result in moving a row from one partition to a different one, but the foundation had been laid. Fast forward another year, and PostgreSQL 11 built on top of this, delivering additional features like:

  • the possibility to define a default partition, to which any entry that doesn’t fit an existing partition is added.
  • having indexes added to the main table “replicated” to the underlying partitions, which improved declarative partitioning usability.
  • support for Foreign Keys

These are just a few of the features that led to a more mature partitioning solution.
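
As a rough sketch of the first two of those features, reusing the temperature table from Figure 2a (the default partition and index names here are illustrative, not part of the original figures):

CREATE TABLE temperature_default PARTITION OF temperature DEFAULT;
-- Rows whose timestamp doesn't match any existing partition now land here
-- instead of raising an error.

CREATE INDEX temperature_city_id_idx ON temperature (city_id);
-- Defined on the partitioned (parent) table, this index is automatically
-- created on each existing partition and on any partition added later.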

Sharding in PostgreSQL

By now you might reasonably be questioning my premise, pointing out that partitioning is not sharding, at least not in the sense and context you would have expected this post to cover. In fact, PostgreSQL has implemented sharding on top of partitioning by allowing any given partition of a partitioned table to be hosted by a remote server. The basis for this is PostgreSQL’s Foreign Data Wrapper (FDW) support, which has been part of the core of PostgreSQL for a long time. While technically possible to implement before, we just couldn’t make practical use of FDWs for sharding using the table inheritance + triggers approach. Declarative partitioning allowed for much better integration of these pieces, making sharding – partitioned tables hosted by remote servers – more of a reality in PostgreSQL.

CREATE TABLE temperature_201904 (
  id BIGSERIAL NOT NULL,
  city_id INT NOT NULL,
  timestamp TIMESTAMP NOT NULL,
  temp DECIMAL(5,2) NOT NULL
);

Figure 3a. On the remote server we create a “partition” – nothing but a simple table

CREATE EXTENSION postgres_fdw;
GRANT USAGE ON FOREIGN DATA WRAPPER postgres_fdw to app_user;
CREATE SERVER shard02 FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (dbname 'postgres', host 'shard02', port
    '5432');
CREATE USER MAPPING for app_user SERVER shard02
    OPTIONS (user 'fdw_user', password 'secret');

Figure 3b. On the local server the preparatory steps involve loading the postgres_fdw extension, allowing our local application user to use that extension, creating an entry to access the remote server, and finally mapping that user with a user in the remote server (fdw_user) that has local access to the table we’ll use as a remote partition.

CREATE FOREIGN TABLE temperature_201904 PARTITION OF temperature
    FOR VALUES FROM ('2019-04-01') TO ('2019-05-01')
    SERVER shard02;

Figure 3c. Now it’s simply a matter of creating a proper partition of our main table in the local server that will be linked to the table of the same name in the remote server

You can read more about postgres_fdw in Foreign Data Wrappers in PostgreSQL and a closer look at postgres_fdw.

When does it make sense to partition a table?

There are several principal reasons to partition a table:

  1. When a table grows so big that searching it becomes impractical even with the help of indexes (which will invariably become too big as well).
  2. When data management is such that the target data is often the most recently added and/or older data is constantly being purged/archived, or even not being searched anymore (at least not as often).
  3. If you are loading data from different sources and maintaining it as a data warehouse for reporting and analytics.
  4. For a less expensive archiving or purging of massive data that avoids exclusive locks on the entire table.

When should we resort to sharding?

Here are a couple of classic cases:

  1. To scale out (horizontally), when even after partitioning a table the amount of data is too great or too complex to be processed by a single server.
  2. Use cases where the data in a big table can be divided into two or more segments that would benefit the majority of the search patterns. A common example used to describe a scenario like this is that of a company whose customers are evenly spread across the United States and whose searches against a target table involve the customer ZIP code. One shard could then host the entries of customers located on the East coast and another those of customers on the West coast.

Note, though, that this is by no means an exhaustive list.

How should we shard the data?

With sharding (in this context) being “distributed” partitioning, the essence of a successful (performant) sharded environment lies in choosing the right shard key – and by “right” I mean one that will distribute your data across the shards in a way that benefits most of your queries. In the example above, using the customer ZIP code as the shard key makes sense if the application will most often issue queries that hit one shard (East) or the other (West). However, if most queries filter by, say, birth date, then every query would need to run through all shards to recover the full result set. The shard approach can easily backfire on performance, either through a poorly chosen shard key or simply because the workload is so heterogeneous that no shard key could satisfy it.
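
To make the ZIP code example concrete, here’s a hedged sketch using list partitioning by state, with hypothetical table and column names; the foreign server setup would mirror Figure 3b:

CREATE TABLE customers (
  id BIGSERIAL NOT NULL,
  state CHAR(2) NOT NULL,
  name TEXT NOT NULL
) PARTITION BY LIST (state);

-- East coast customers hosted locally; West coast customers on a remote shard:
CREATE TABLE customers_east PARTITION OF customers
    FOR VALUES IN ('NY', 'NJ', 'MA', 'FL');
CREATE FOREIGN TABLE customers_west PARTITION OF customers
    FOR VALUES IN ('CA', 'OR', 'WA')
    SERVER shard02;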

It only ever makes sense to shard if the nature of the queries involving the target table(s) is such that distributed processing will be the norm, constituting an advantage far greater than any overhead caused by the minority of queries that rely on JOINs involving multiple shards. Due to the distributed nature of sharding, such queries will necessarily perform worse compared to having all the data hosted on the same server.

Why not simply rely on replication or clustering?

Sharding should be considered in those situations where you can’t efficiently break down a big table through data normalization or an alternative approach, and maintaining it on a single server is too demanding. The table is then partitioned and the partitions distributed across different servers to spread the load. It doesn’t need to be one partition per shard; often a single shard will host a number of partitions.

Note how sharding differs from traditional “share all” database replication and clustering environments: you may use, for instance, a dedicated PostgreSQL server to host a single partition from a single table and nothing else. However, these data scaling technologies may well complement each other: a PostgreSQL database may host a shard with part of a big table as well as replicate smaller tables that are often used for some sort of consultation (read-only), such as a price list, through logical replication.

How does sharding in PostgreSQL relate to sharding in MongoDB®?

MongoDB® tackles the matter of managing big collections straight through sharding: there is no concept of local partitioning of collections in MongoDB. In fact, the whole MongoDB scaling strategy is based on sharding, which takes a central place in the database architecture. As such, the sharding process has been made as transparent to the application as possible: all a DBA has to do is to define the shard key.

Instead of connecting to a reference database server, the application connects to an auxiliary router server named mongos, which processes the queries and requests the necessary information from the respective shard. mongos knows which shard contains what because it maintains a copy of the metadata that maps chunks of data to shards, which it gets from a config server, another important and independent component of a MongoDB sharded cluster. Together, these components also play a role in maintaining good data distribution across the shards, actively splitting and migrating chunks of data between servers as needed.

In PostgreSQL, the application connects to and queries the main database server. There isn’t an intermediary router such as mongos, but PostgreSQL’s query planner processes the query and creates an execution plan. When data requested from a partitioned table is found on a remote server, PostgreSQL will request the data from it, as shown in the EXPLAIN output below:

…
   Remote SQL: UPDATE public.emp SET sal = $2 WHERE ctid = $1
   ->  Nested Loop  (cost=100.00..300.71 rows=669 width=118)
     	Output: emp.empno, emp.ename, emp.job, emp.mgr, emp.hiredate, (emp.sal * '1.1'::double precision), emp.comm, emp.deptno, emp.ctid, salgrade.ctid
     	Join Filter: ((emp.sal > (salgrade.losal)::double precision) AND (emp.sal < (salgrade.hisal)::double precision))
     	->  Foreign Scan on public.emp  (cost=100.00..128.06 rows=602 width=112)
           	Output: emp.empno, emp.ename, emp.job, emp.mgr, emp.hiredate, emp.sal, emp.comm, emp.deptno, emp.ctid
…

Figure 4: excerpt of an EXPLAIN plan that involves processing a query in a remote server

Note the mention of “Remote SQL” in the plan above. Many optimizations have been made to the execution of remote queries in PostgreSQL 10 and 11, and these have contributed to maturing and improving the sharding solution. Among them is support for having grouping and aggregation operations executed on the remote server itself (“push down”) rather than pulling all rows and processing them locally.
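
As an illustration, with the setup from Figure 3 a grouped query like the one below can, depending on the planner’s costing, be pushed down so that the remote server computes the aggregate and returns only the grouped rows; the Remote SQL line of the plan would then contain the GROUP BY itself:

EXPLAIN VERBOSE
SELECT city_id, avg(temp)
  FROM temperature_201904
 GROUP BY city_id;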

What is missing from the PostgreSQL implementation?

There is, however, still room for improvement. In terms of remote execution, reports from the community indicate that not all queries are performing as they should. For example, in some cases the PostgreSQL planner is not performing a full push-down, resulting in shards transferring more data than required. Parallel scheduling of queries that touch multiple shards is not yet implemented: for now, execution takes place sequentially, one shard at a time, which takes longer to complete. As for the maintenance of partitioned and sharded environments, changes in the structure of partitions are still complicated and not very practical. For example, when you add a new partition to a partitioned table with an appointed default partition, you may need to detach the default partition first if it contains rows that would fit in the new partition, manually move those rows to the new partition, and finally re-attach the default partition.
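
Here’s a sketch of that maintenance dance, assuming a default partition like the one sketched earlier exists and currently holds rows belonging to May 2019:

BEGIN;
ALTER TABLE temperature DETACH PARTITION temperature_default;
CREATE TABLE temperature_201905 PARTITION OF temperature
    FOR VALUES FROM ('2019-05-01') TO ('2019-06-01');
-- Move the stranded rows into the new partition, then re-attach the default:
INSERT INTO temperature
    SELECT * FROM temperature_default
    WHERE timestamp >= DATE '2019-05-01' AND timestamp < DATE '2019-06-01';
DELETE FROM temperature_default
    WHERE timestamp >= DATE '2019-05-01' AND timestamp < DATE '2019-06-01';
ALTER TABLE temperature ATTACH PARTITION temperature_default DEFAULT;
COMMIT;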

But that is all part of a maturing technology. We’re looking forward to PostgreSQL 12 and what it will bring on the partitioning and sharding fronts.


Image based on photos by Leonardo Quatrocchi from Pexels


In this blog post, I’ll walk through using Docker containers and discuss the advantages of using them to run multiple versions of Percona Server for MongoDB (PSMDB) instances on the same server. This is particularly useful for local testing. Lots of companies have started to use containers for their applications due to the advantages of this model and their hassle-free deployment. I’ll also show you how to configure variables for MongoDB® when using Docker containers. Hopefully, this will be a useful introduction that’ll help you start working with Docker yourself. Let’s get into the topic…
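
As a taste of what’s to come, here’s a minimal sketch of running two PSMDB versions side by side on the same host. The image tags and host ports are assumptions for illustration; check which tags are actually published on Docker Hub:

docker run -d --name psmdb36 -p 27018:27017 percona/percona-server-mongodb:3.6
docker run -d --name psmdb40 -p 27019:27017 percona/percona-server-mongodb:4.0
# Each container is an isolated instance; connect to one via its mapped port:
# mongo --port 27018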

Experiments and testing…

It has always been fun to work with test cases for our databases. When I do that, I generally use VMs, Docker containers, or a test machine – hmm! I know what you’re thinking… your boss should give you one if he wants you to solve problems through testing. There are still more things you could try. Since we work at Percona, one option is the Percona Server for MongoDB Operator (see the early access release here: https://www.percona.com/blog/2019/02/21/percona-server-for-mongodb-operator-0-2-1-early-access-release-is-now-available/), a lightweight, packaged MongoDB instance that can be deployed with Kubernetes + minikube or with Docker. You can expect more information from our folks very soon about how this might work!

What is Docker?

Here’s a basic Docker architecture diagram:


Please join Percona’s Product Manager Michael Coburn as he presents his talk Monitoring MongoDB with Percona Monitoring and Management (PMM) on May 16th, 2019 at 10:00 AM PDT (UTC-7) / 1:00 PM EDT (UTC-4).

Register Now

In this talk, learn how to monitor MongoDB using Percona Monitoring and Management (PMM) so that you can:

– Gain greater visibility of MongoDB performance and bottlenecks
– Consolidate your MongoDB servers into the same monitoring platform you already use for MySQL and PostgreSQL
– Respond more quickly and efficiently to Severity 1 issues

We’ll also show how to use PMM’s native support for MongoDB so that you can have MongoDB integrated in only minutes!

In order to learn more, watch our webinar.


We are pleased to announce the launch of PMM 2.0.0-alpha2, Percona’s second Alpha release of our long-awaited PMM 2 project! In this release, you’ll find support for MongoDB Metrics and Query Analytics – watch for sharp edges as we expect to find a lot of bugs!  We’ve also expanded our existing support of MySQL from our first Alpha to now include MySQL Slow Log as a data source for Query Analytics, which enhances the Query Detail section to include richer query metadata.

  • MongoDB Metrics – You can now launch PMM 2 against MongoDB and gather metrics and query data!
  • MongoDB Query Analytics – Data source from MongoDB Profiler is here!
  • MySQL Query Analytics
    • Queries source – MySQL Slow Log is here!
    • Sorting and more columns – fixed a lot of bugs around UI

PMM 2 is still a work in progress – expect to see bugs and other missing features! We are aware of a number of issues, but please report any and all that you find to Percona’s JIRA.

This release is not recommended for Production environments. PMM 2 Alpha is designed to be used as a new installation – please don’t try to upgrade your existing PMM 1 environment.

MongoDB Query Analytics

We’re proud to announce support for MongoDB Query Analytics in PMM 2.0.0-alpha2!

Using filters you can drill down on specific servers (and other fields):

MongoDB Metrics

In this release we’re including support for MongoDB Metrics, which means you can add a local or remote MongoDB instance to PMM 2 and take advantage of the following view of MongoDB performance:

MySQL Query Analytics Slow Log source

We’ve rounded out our MySQL support to include Slow log – and if you’re using Percona Server with the Extended Slow Log format, you’ll be able to gain deep insight into the performance of individual queries, for example, InnoDB behavior.  Note the difference between the detail available from PERFORMANCE_SCHEMA vs Slow Log:

PERFORMANCE_SCHEMA:

Slow Log:

Installation and configuration

The default PMM Server credentials are:

username: admin
password: admin

Install PMM Server with docker

The easiest way to install PMM Server is to deploy it with Docker. You can run a PMM 2 Docker container with PMM Server by using the following commands (note the version tag of 2.0.0-alpha2):

docker create -v /srv --name pmm-data-2-0-0-alpha2 perconalab/pmm-server:2.0.0-alpha2 /bin/true
docker run -d -p 80:80 -p 443:443 --volumes-from pmm-data-2-0-0-alpha2 --name pmm-server-2.0.0-alpha2 --restart always perconalab/pmm-server:2.0.0-alpha2

Install PMM Client

Since PMM 2 is still not GA, you’ll need to leverage our experimental release of the Percona repository. You’ll need to download and install the official percona-release package from Percona, and use it to enable the Percona experimental component of the original repository.  See percona-release official documentation for further details on this new tool.

Specific instructions for a Debian system are as follows:

wget https://repo.percona.com/apt/percona-release_latest.generic_all.deb
sudo dpkg -i percona-release_latest.generic_all.deb

Now enable the correct repo:

sudo percona-release disable all
sudo percona-release enable original experimental

Now install the pmm2-client package:

apt-get update
apt-get install pmm2-client

Users who have previously installed pmm2-client alpha1 version should remove the package and install a new one in order to update to alpha2.

Please note that having experimental packages enabled may lead to the installation of package versions that are not ready for production. To avoid this, disable this component with the following commands:

sudo percona-release disable original experimental
sudo apt-get update

Configure PMM

Once PMM Client is installed, run the pmm-agent setup command with your PMM Server IP address to register your Node within the Server:

# pmm-agent setup --server-insecure-tls --server-address=<IP Address>:443

We will be moving this functionality back to pmm-admin config in a subsequent Alpha release.

You should see the following:

Checking local pmm-agent status...
pmm-agent is running.
Registering pmm-agent on PMM Server...
Registered.
Configuration file /usr/local/percona/pmm-agent.yaml updated.
Reloading pmm-agent configuration...
Configuration reloaded.

Adding MySQL Metrics and Query Analytics (Slow Log source)

The syntax to add MySQL services (Metrics and Query Analytics) using the new Slow Log source:

sudo pmm-admin add mysql --use-slowlog --username=pmm --password=pmm

where username and password are credentials for accessing MySQL.

Adding MongoDB Metrics and Query Analytics

You can add MongoDB services (Metrics and Query Analytics) with the following command:

pmm-admin add mongodb --use-profiler --use-exporter --username=pmm --password=pmm

You can then check your MySQL and MongoDB dashboards and Query Analytics in order to view your server’s performance information!

We hope you enjoy this release, and we welcome your comments on the blog!

About PMM

Percona Monitoring and Management (PMM) is a free and open-source platform for managing and monitoring MySQL®, MongoDB®, and PostgreSQL performance. You can run PMM in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL®, MongoDB®, and PostgreSQL® servers to ensure that your data works as efficiently as possible.

Help us improve our software quality by reporting any Percona Monitoring and Management bugs you encounter using our bug tracking system.


Percona announces the release of Percona Server for MongoDB 3.6.12-3.2 on May 3, 2019. Download the latest version from the Percona website or the Percona software repositories.

Percona Server for MongoDB is an enhanced, open source, and highly-scalable database that is a fully-compatible, drop-in replacement for MongoDB 3.6 Community Edition. It supports MongoDB 3.6 protocols and drivers.

Percona Server for MongoDB extends Community Edition functionality by including the Percona Memory Engine storage engine, as well as several enterprise-grade features. It also includes the MongoRocks storage engine, which is now deprecated. Percona Server for MongoDB requires no changes to MongoDB applications or code.

Bug Fixes
  • PSMDB-343: Building from sources could fail with openssl version 1.1.1.

The Percona Server for MongoDB 3.6.12-3.2 release notes are available in the official documentation.


We have successfully used ZFS for MySQL backups, and MongoDB is no different. Normally, backups will be taken from a hidden secondary, either with mongodump, WT hot backup, or filesystem snapshots. In the case of the latter, we will use ZFS instead of LVM2 and discuss other potential benefits.

Preparation for initial snapshot

Before taking a ZFS snapshot, it is important to use db.fsyncLock(). This allows a consistent on-disk copy of the data by blocking writes. It gives the server the time it needs to commit the journal to disk before the snapshot is taken.

My MongoDB instance below is running on a ZFS volume, and we will take an initial snapshot.

revin@mongodb:~$ sudo zfs list
NAME             USED  AVAIL  REFER  MOUNTPOINT
zfs-mongo        596M  9.04G    24K  /zfs-mongo
zfs-mongo/data   592M  9.04G   592M  /zfs-mongo/data
revin@mongodb:~$ mongo --port 28020 --eval 'db.serverCmdLineOpts().parsed.storage' --quiet
{
    "dbPath" : "/zfs-mongo/data/m40",
    "journal" : {
        "enabled" : true
    },
    "wiredTiger" : {
        "engineConfig" : {
            "cacheSizeGB" : 0.25
        }
    }
}
revin@mongodb:~$ mongo --port 28020 --eval 'db.fsyncLock()' --quiet
{
    "info" : "now locked against writes, use db.fsyncUnlock() to unlock",
    "lockCount" : NumberLong(1),
...
}
revin@mongodb:~$ sleep 0.6
revin@mongodb:~$ sudo zfs snapshot zfs-mongo/data@full
revin@mongodb:~$ mongo --port 28020 --eval 'db.fsyncUnlock()' --quiet
{
    "info" : "fsyncUnlock completed",
    "lockCount" : NumberLong(0),
...
}

Notice the addition of the sleep command just before the snapshot is taken above. This is to ensure that even with the maximum storage.journal.commitIntervalMs of 500ms we allow enough time to commit the data to disk. This is simply an extra layer of guarantee and may not be necessary if you have a very low journal commit interval.

revin@mongodb:~$ sudo zfs list -t all
NAME                  USED  AVAIL  REFER  MOUNTPOINT
zfs-mongo             596M  9.04G    24K  /zfs-mongo
zfs-mongo/data        592M  9.04G   592M  /zfs-mongo/data
zfs-mongo/data@full   192K      -   592M  -

Now I have a snapshot…

At this point, I have a snapshot I can use for a number of purposes.

  • Replicate a full and delta snapshot to a remote storage or region with tools like zrepl. This allows for an extra layer of redundancy and disaster recovery.
  • Use the snapshots to rebuild, replace or create new secondary nodes or refresh test/development servers regularly.
  • Use the snapshots to do point-in-time recovery. ZFS snapshots are relatively cost-free, so it is possible to take them even at five-minute intervals! This is actually my favorite use case and feature.

Let’s say we take snapshots every five minutes. If a collection was accidentally dropped or even just a few rows were deleted, we can mount the last snapshot before this event. If the event was discovered in less than five minutes (perhaps that’s unrealistic) we only need to replay less than five minutes of oplog!
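
Here’s a minimal sketch of such a schedule, wrapping the fsyncLock/snapshot/fsyncUnlock sequence shown earlier in a script that cron could invoke every five minutes (*/5 * * * *). The port and dataset name are from the example above, and error handling is omitted:

#!/bin/bash
# Lock writes, take a timestamped ZFS snapshot, then unlock
mongo --port 28020 --eval 'db.fsyncLock()' --quiet
sleep 0.6   # allow the maximum journal commit interval (500ms) to elapse
sudo zfs snapshot zfs-mongo/data@$(date +%Y%m%d%H%M)
mongo --port 28020 --eval 'db.fsyncUnlock()' --quiet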

Point-in-Time-Recovery

To start a PITR, first clone the snapshot. Cloning the snapshot like below will automatically mount it. We can then start a temporary mongod instance with this mounted directory.

revin@mongodb:~$ sudo zfs clone zfs-mongo/data@full zfs-mongo/data-clone
revin@mongodb:~$ sudo zfs list -t all
NAME                   USED  AVAIL  REFER  MOUNTPOINT
zfs-mongo              606M  9.04G    24K  /zfs-mongo
zfs-mongo/data         600M  9.04G   592M  /zfs-mongo/data
zfs-mongo/data@full   8.46M      -   592M  -
zfs-mongo/data-clone     1K  9.04G   592M  /zfs-mongo/data-clone
revin@mongodb:~$ ./mongodb-linux-x86_64-4.0.8/bin/mongod \
	--dbpath /zfs-mongo/data-clone/m40 \
	--port 28021 --oplogSize 200 --wiredTigerCacheSizeGB 0.25

Once mongod has started, I would like to find out the last oplog event it has completed.

revin@mongodb:~$ mongo --port 28021 local --quiet \
>     --eval 'db.oplog.rs.find({},{ts: 1}).sort({ts: -1}).limit(1)'
{ "ts" : Timestamp(1555356271, 1) }

We can use this timestamp to dump the oplog from the current production instance and replay it on our temporary instance.

revin@mongodb:~$ mkdir ~/mongodump28020
revin@mongodb:~$ cd ~/mongodump28020
revin@mongodb:~/mongodump28020$ mongodump --port 28020 -d local -c oplog.rs \
>     --query '{ts: {$gt: Timestamp(1555356271, 1)}}'
2019-04-16T23:57:50.708+0000	writing local.oplog.rs to
2019-04-16T23:57:52.723+0000	done dumping local.oplog.rs (186444 documents)

Assuming our bad incident occurred 30 seconds from the time this snapshot was taken, we can apply the oplog dump with mongorestore. Be aware, you’d have to identify this from your own oplog.

revin@mongodb:~/mongodump28020$ mv dump/local/oplog.rs.bson dump/oplog.bson
revin@mongodb:~/mongodump28020$ rm -rf dump/local
revin@mongodb:~/mongodump28020$ mongo --port 28021 percona --quiet --eval 'db.session.count()'
79767
revin@mongodb:~/mongodump28020$ mongorestore --port 28021 --dir=dump/ --oplogReplay \
>     --oplogLimit 1555356302 -vvv

Note that the oplogLimit above is 31 seconds past the snapshot’s timestamp. Since mongorestore replays only the entries whose timestamps fall below the specified limit, this applies exactly the 30 seconds of oplog events that followed the snapshot.

2019-04-17T00:06:46.410+0000	using --dir flag instead of arguments
2019-04-17T00:06:46.412+0000	checking options
2019-04-17T00:06:46.413+0000		dumping with object check disabled
2019-04-17T00:06:46.414+0000	will listen for SIGTERM, SIGINT, and SIGKILL
2019-04-17T00:06:46.418+0000	connected to node type: standalone
2019-04-17T00:06:46.418+0000	standalone server: setting write concern w to 1
2019-04-17T00:06:46.419+0000	using write concern: w='1', j=false, fsync=false, wtimeout=0
2019-04-17T00:06:46.420+0000	mongorestore target is a directory, not a file
2019-04-17T00:06:46.421+0000	preparing collections to restore from
2019-04-17T00:06:46.421+0000	using dump as dump root directory
2019-04-17T00:06:46.421+0000	found oplog.bson file to replay
2019-04-17T00:06:46.421+0000	enqueued collection '.oplog'
2019-04-17T00:06:46.421+0000	finalizing intent manager with multi-database longest task first prioritizer
2019-04-17T00:06:46.421+0000	restoring up to 4 collections in parallel
...
2019-04-17T00:06:46.421+0000	replaying oplog
2019-04-17T00:06:46.446+0000	timestamp 6680204450717499393 is not below limit of 6680204450717499392; ending oplog restoration
2019-04-17T00:06:46.446+0000	applied 45 ops
2019-04-17T00:06:46.446+0000	done

After applying 45 oplog events, we can see that additional documents have been added to the percona.session collection.

revin@mongodb:~/mongodump28020$ mongo --port 28021 percona --quiet --eval 'db.session.count()'
79792

Conclusion

Because snapshots are available immediately, and because ZFS supports deltas, it is ideal for large datasets that would otherwise take hours to back up with other tools.


Photo by Designecologist from Pexels


Please join Percona’s Senior Support Engineer Vinodh Krishnaswamy as he presents Deep Dive In the Config Database (Shardings).

Watch the Recorded Webinar

In MongoDB, we know the Sharded Cluster can migrate chunks across the Shards and also route a query. But where is this information stored, and how is it used by MongoDB to maintain consistency across the Sharded Cluster?

In this webinar, we will look at the config database in detail as well as the related metadata.

Watch our webinar to join us in taking a deep dive into the MongoDB config database.


We are pleased to announce the launch of PMM 2.0.0-alpha1, Percona’s first Alpha release of our long-awaited PMM 2 project! We focused exclusively on MySQL support in our first Alpha (because we wanted to show progress sooner rather than later), and you’ll find updated MySQL dashboards along with enhancements to Query Analytics. We’ve also added better visibility regarding which services are registered with PMM Server, the client-side addition of a new agent called pmm-agent, and finally PMM Server is now exposing an API!

  • Query Analytics
    • Support for large environments – the default view shows all queries from all instances
    • Filtering – display only the results matching filters – MySQL schema name, MySQL server instance
    • Sorting and more columns – now sort by any visible column; add a column for any field exposed by the data source, for example rows_examined or lock_time in your Overview
    • Queries source – MySQL PERFORMANCE_SCHEMA (slow log coming in our next alpha around May 1st, 2019)
  • Labels – Prometheus now supports auto-discovered and custom labels
  • Inventory Overview Dashboard – Displays the agents, services, and nodes which are registered with PMM Server
  • API – View versions and list hosts using the API
  • pmm-agent – Provides secure remote management of the exporter processes and data collectors on the client

PMM 2 is still a work in progress – expect to see bugs and other missing features! We are aware of a number of issues, but please report any and all that you find to Percona’s JIRA.

Query Analytics Dashboard

Query Analytics Dashboard now defaults to displaying all queries on each of the systems configured for MySQL PERFORMANCE_SCHEMA, Slow Log, and MongoDB Profiler (this release includes support for MySQL PERFORMANCE_SCHEMA only), and includes comprehensive filtering capabilities.

Query Analytics Overview

You’ll recognize some of the common elements in PMM 2 Query Analytics, such as the Load, Count, and Latency columns; however, there are new elements such as the filter box and more arrows on the columns, which are described further down:

Query Detail

Query Analytics continues to deliver detailed information regarding individual query performance:

Filter and Search By

Use the filtering panel on the left, or use the search bar to set filters using key:value syntax. For example, if I’m interested in just the queries executed in the MySQL schema db3, I could type d_schema:db3:

Sort by any column

This is a much-requested feature from PMM Query Analytics, and we’re glad to announce that you can sort by any column! Just click the small arrow to the right of the column name.

Add extra columns

Now you can add a column for each additional field which is exposed by the data source. For example you can add Rows Examined by clicking the + sign and typing or selecting from the available list of fields:

Labels

An important concept we’re introducing in PMM 2 is that when a label is assigned it is persisted in both the Metrics (Prometheus) and Query Analytics (ClickHouse) databases. So when you browse a target in Prometheus you’ll notice many more labels appear – particularly the auto-discovered labels (replication_set, environment, node_name, etc.) and, soon to be released, custom labels via custom_label.

Inventory Dashboard

We’ve introduced a new dashboard with several tabs so that users are better able to understand which nodes, agents, and services are registered against PMM Server.  We have an established hierarchy with Node at the top, then Service and Agents assigned to a Node.

  • Nodes – Where the service and agents will run. Assigned a node_id, associated with a machine_id (from /etc/machine-id)
    • Examples: bare metal, virtualized, container
  • Services – Individual service names and where they run, against which agents will be assigned. Each instance of a service gets a service_id value that is related to a node_id
    • Examples: MySQL, Amazon Aurora MySQL
    • You can also use this feature to support multiple mysqld instances on a single node, for example: mysql1-3306, mysql1-3307
  • Agents – Each binary (exporter, agent) running on a client will get an agent_id value
    • pmm-agent is the top of the tree, assigned to a node_id
    • node_exporter is assigned to pmm-agent agent_id
    • mysqld_exporter & QAN MySQL Perfschema are assigned to a service_id
    • Examples: pmm-agent, node_exporter, mysqld_exporter, QAN MySQL Perfschema

You can now see which services, agents, and nodes are registered with PMM Server.

Nodes

In this example I have PMM Server (docker) running on the same virtualized compute instance as my Percona Server 5.7 instance, so PMM treats this as two different nodes.

Services

This example has two MySQL services configured:

Agents

For a monitored Percona Server instance, you’ll see an agent for each of:

  1. pmm-agent
  2. node_exporter
  3. mysqld_exporter
  4. QAN Perfschema

Query Analytics Filters

Query Analytics now provides you with the opportunity to filter based on labels. We are beginning with labels sourced from MySQL Performance Schema, but will eventually include all fields from MySQL Slow Log, MongoDB Profiler, and PostgreSQL views. We’ll also be offering the ability to set custom key:value pairs which you’ll use when setting up a new service or instance with pmm-admin during the add ... routine.

Available Filters

We’re exposing four new filters in this release, and we show where we source them from and what they mean:

Filter name      Source                            Notes
d_client_host    MySQL Slow Log                    MySQL PERFORMANCE_SCHEMA doesn’t include client host, so this field will be empty
d_username       MySQL Slow Log                    MySQL PERFORMANCE_SCHEMA doesn’t include username, so this field will be empty
d_schema         MySQL Slow Log, MySQL Perfschema  MySQL Schema name
d_server         MySQL Slow Log, MySQL Perfschema  MySQL server instance

API

We are exposing an API for PMM Server! You can view versions, list hosts, and more!

The API is not guaranteed to work until we get to our GA release – so be prepared for breaking changes during our Alpha and Beta releases.

Browse the API using Swagger at /swagger
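
For instance, something along these lines should return the list of registered nodes. The endpoint path and the empty JSON payload are assumptions here; verify them against the Swagger docs on your own server:

curl -k -u admin:admin -X POST https://<pmm-server>/v1/inventory/Nodes/List -d '{}'
# -k: the Alpha ships with a self-signed certificate
# The response is a JSON document describing the registered nodes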

Installation and configuration

Install PMM Server with docker

The easiest way to install PMM Server is to deploy it with Docker. You can run a PMM 2 Docker container with PMM Server by using the following commands (note the version tag of 2.0.0-alpha1):

docker create -v /srv --name pmm-data-2-0-0-alpha1 perconalab/pmm-server:2.0.0-alpha1 /bin/true
docker run -d -p 80:80 -p 443:443 --volumes-from pmm-data-2-0-0-alpha1 --name pmm-server-2.0.0-alpha1 --restart always perconalab/pmm-server:2.0.0-alpha1

Install PMM Client

Since PMM 2 is still not GA, you’ll need to leverage our experimental release of the Percona repository. You’ll need to download and install the official percona-release package from Percona, and use it to enable the Percona experimental component of the original repository. Specific instructions for a Debian system are as follows:

wget https://repo.percona.com/apt/percona-release_latest.generic_all.deb
sudo dpkg -i percona-release_latest.generic_all.deb

Now enable the correct repo:

sudo percona-release disable all
sudo percona-release enable original experimental

Now install the pmm2-client package:

apt-get update
apt-get install pmm2-client

See percona-release official documentation for details.

Please note that having experimental packages enabled may lead to the installation of package versions that are not ready for production. To avoid this, disable this component with the following commands:

sudo percona-release disable original experimental
sudo apt-get update

Configure PMM

Once PMM Client is installed, run the pmm-agent setup command with your PMM Server IP address to register your Node within the Server:

# pmm-agent setup --server-insecure-tls --server-address=<IP Address>:443

We will be moving this functionality back to pmm-admin config in a subsequent Alpha release.

You should see the following:

Checking local pmm-agent status...
pmm-agent is running.
Registering pmm-agent on PMM Server...
Registered.
Configuration file /usr/local/percona/pmm-agent.yaml updated.
Reloading pmm-agent configuration...
Configuration reloaded.

You then add MySQL services (Metrics and Query Analytics) with the following command:

# pmm-admin add mysql --use-perfschema --username=pmm --password=pmm

where username and password are the credentials for accessing the monitored MySQL instance, used locally on the database host.

After this you can view MySQL metrics or examine the added node on the new PMM Inventory Dashboard:

You can then check your MySQL dashboards and Query Analytics in order to view your server’s performance information!

We hope you enjoy this release, and we welcome your comments on the blog!

About PMM

Percona Monitoring and Management (PMM) is a free and open-source platform for managing and monitoring MySQL®, MongoDB®, and PostgreSQL performance. You can run PMM in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL®, MongoDB®, and PostgreSQL® servers to ensure that your data works as efficiently as possible.

Help us improve our software quality by reporting any Percona Monitoring and Management bugs you encounter using our bug tracking system.


Percona announces the release of Percona Server for MongoDB 4.0.9-4 on April 17, 2019. Download the latest version from the Percona website or the Percona software repositories.

Percona Server for MongoDB is an enhanced, open source, and highly-scalable database that is a fully-compatible, drop-in replacement for MongoDB 4.0 Community Edition. It supports MongoDB 4.0 protocols and drivers.

Percona Server for MongoDB extends the functionality of MongoDB 4.0 Community Edition by including the Percona Memory Engine storage engine, encrypted WiredTiger storage engine, audit logging, SASL authentication, hot backups, and enhanced query profiling. Percona Server for MongoDB requires no changes to MongoDB applications or code.

This release includes all features of MongoDB 4.0 Community Edition. Note that the MMAPv1 storage engine is deprecated in MongoDB 4.0.

Percona Server for MongoDB 4.0.9-4 is based on MongoDB 4.0.9 and contains the following bug fixes:

Bugs Fixed
  • PSMDB-343: Building from sources could fail with openssl version 1.1.1.

The Percona Server for MongoDB 4.0.9-4 release notes are available in the official documentation.


There’s just one week to go before the first of this year’s Percona University events in South America. We’re really pleased with the lineup for all three events. We’re also incredibly happy with the response that we have had from the community. While we realize that a free event is… well… free… you are still giving up your time and travel. We place great value on that and we’re making sure that you’ll be hearing quality technical talks from Percona and our guest speakers. Most of the talks will be presented in Spanish – Portuguese in Brazil – although slides will be made available in English.

In fact, the events have been so popular that it’s quite possible that by the time you read this we’ll be operating a wait list for Montevideo (Tuesday April 23), Buenos Aires (Thursday, April 25), and São Paulo (Saturday, April 27).

A request…

So that others don’t miss out on this (rare!) opportunity, if you have reserved a place and can no longer attend, could you please cancel your booking? You can do that after logging into your Eventbrite (or Meetup) account.

After all… we want to make sure that these events are so successful that our CEO, Peter Zaitsev, will find the idea of returning to South America to present at a future series of events completely irresistible.

To wrap up this blog post, let me mention that for the Montevideo event we’ll have some guest speakers from other companies, to make the content even better; and remember there is a raffle at the end, in which you can win a copy of the High Performance MySQL book signed by Peter himself (and other goodies to be discussed ;)).

Looking forward to seeing you next Tuesday!