Last week we had a clusterware issue in one of our critical 3-node RAC environments. On the first node, the network resource was restarted, which abnormally killed all sessions on that node. The Oracle VIP that was running on that node failed over to the third node. The first node was up and running, but it didn’t accept connections, because the instance was trying to register using the LOCAL_LISTENER parameter, which pointed to oranode1-vip, and that VIP was no longer running on that node. We tried to relocate the VIP back to the first node, but the relocation failed because the VIP couldn’t be stopped. Every time we tried to stop or relocate it, the clean process started and failed within a few minutes.
Neither Oracle Support nor we found any useful information in the clusterware log files. Although 2 instances were up and running, the load was so high that they could barely handle all the connections. Ping to oranode1-vip succeeded, but we couldn’t stop the VIP even in force mode. We couldn’t start it either, because it hadn’t stopped successfully and the cleanup kept failing. The status was “enabled” and “not running”, yet ping was OK:
db-bash-$ srvctl status vip -i oranode1-vip
VIP oranode1-vip is enabled
VIP oranode1-vip is not running
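A check like this is easy to script. The following is only a minimal sketch, not something we ran during the incident: vip_looks_stuck is a name I made up for illustration, and the function assumes the two status lines have exactly the wording shown above. It reads the srvctl status vip output on standard input and succeeds only when the VIP is reported as both enabled and not running.

```shell
# Hypothetical helper (not part of any Oracle tool): returns 0 when the
# status text on stdin says the VIP is enabled but not running, 1 otherwise.
vip_looks_stuck() {
    status="$(cat)"
    printf '%s\n' "$status" | grep -q 'is enabled' &&
        printf '%s\n' "$status" | grep -q 'is not running'
}

# Example using the output shown above, so no cluster is needed to try it.
if printf 'VIP oranode1-vip is enabled\nVIP oranode1-vip is not running\n' | vip_looks_stuck; then
    echo "VIP is enabled but not running"
fi
```

On a real cluster the input would come from the command itself, for example: srvctl status vip -i oranode1-vip | vip_looks_stuck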
From the crsctl stat res command we could see that it was OFFLINE and had failed over to the third node (node03):
db-bash-$ crsctl stat res -t
oranode1-vip 1 OFFLINE UNKNOWN node03
And it failed when we tried to start it:
db-bash-$ srvctl start vip -i oranode1-vip
PRCR-1079 : Failed to start resource oranode1-vip
CRS-2680: Clean of 'oranode1-vip ' on 'node03' failed
CRS-5804: Communication error with agent process
We cleared the socket files of the first node from the /var/tmp/.oracle folder, restarted CRS and checked whether the VIP failed back, but it didn’t. Support asked us to stop the second node, clear its socket files and start it again to see if anything changed, but we didn’t do it, because a single node wouldn’t have been able to handle all the connections.
In the end, we checked the interface of the virtual IP at the OS level and found it on node03:
db-bash-$ netstat -win
lan900:805 1500 #### #### 2481604 0 51 0 0
Instead of restarting CRS on the production database (which takes 10 minutes), we decided to bring that interface down at the OS level. On HP-UX, this is done with the ifconfig … down command.
Before running this command in the production environment, we tried it in the test environment and realized that the down parameter alone is not enough. We had to provide the 0.0.0.0 IP address along with the down parameter to bring the interface down. So we ran the following command:
ifconfig lan900:805 0.0.0.0 down
And it disappeared from the list. Next, we started the VIP using the srvctl start vip command, and it succeeded!
Perform all actions in the test environment first (if you are not sure what can happen) before trying them in the production environment.
Don’t try to “restart” or “reboot” the instance, the cluster or the node. Sometimes it just doesn’t solve your problem, and after a restart the system may even fail to start up correctly (because of changed parameters, configuration, etc.).
Within 24 hours, the severity 1 SR was assigned to 6 different engineers. It takes a lot of time to gather log files, submit them and have them reviewed by an Oracle engineer before his/her shift changes. Sometimes you just don’t have time to wait for an answer from Oracle; you have to do it on your own and take all the risks. That requires experience.
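The whole workaround can be sketched as a small shell script. This is a sketch under assumptions, not the exact procedure we followed: the run and release_stale_vip names and the DRY_RUN switch are mine, and by default the script only prints the commands instead of executing them. The interface name (lan900:805) and the VIP name (oranode1-vip) are the ones from this incident, and the ifconfig syntax is the HP-UX form that worked for us. I am also assuming the ifconfig step is run as root on the node that still holds the stale interface (node03 here).

```shell
#!/bin/sh
# Sketch only. DRY_RUN=1 (the default) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical wrapper: echo the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical function: release a VIP interface left behind on the wrong
# node, then start the VIP through clusterware again.
release_stale_vip() {
    iface="$1"   # logical interface still holding the VIP, e.g. lan900:805
    vip="$2"     # VIP name as known to srvctl, e.g. oranode1-vip

    run netstat -win                     # confirm the interface is listed
    run ifconfig "$iface" 0.0.0.0 down   # "down" alone was not enough on HP-UX
    run srvctl start vip -i "$vip"       # start the VIP again
    run srvctl status vip -i "$vip"      # verify the result
}

# Dry run with the names from this incident: prints the four commands.
release_stale_vip lan900:805 oranode1-vip
```

Review the printed commands in a test environment first; only then would DRY_RUN=0 make the script execute them.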
Last month, after a long brainstorm, I decided to take my chance and agreed to participate in the Indian Oracle ODev Yathra tour. Although I had visited India (Hyderabad) 2 times in the past for the Sangam conferences, I wanted to discover more of India and decided to take 4 cities out of 7.
For the Yathra Tour I submitted 2 papers:
The first one was “8 ways to migrate your On-Premises database to Oracle Cloud”, where I talked about different ways to migrate a database to the Oracle Cloud based on the downtime and migration requirements.
The second session, “Create, configure and manage Disaster Recovery in Oracle Cloud for On-Premises database”, was about creating, configuring and managing DR in the Oracle Cloud using different techniques, as well as configuring high-level database, backup and network security.
When the agenda was published, I got a lot of messages from DBAs in the cities where I was not scheduled to speak, saying that they were looking forward to meeting me. So I talked to Sai Ram, the organizer of the Yathra Tour, and he managed to put me on the agenda for the remaining cities. I made one of the hardest decisions of my life and took all the cities. I had (and still have) a lot of ongoing projects at my company, and I had health issues that made it hard to travel long distances for two weeks. But I decided to push my limits and go beyond them.
So, finally, the travel started. I took my first flight to Abu Dhabi, and from Abu Dhabi to Chennai. I landed in Chennai, took a cab to the hotel, had some rest and was in the lobby at 7.30 AM the next morning. Yes, this was the usual checkout time from the hotel every day. I met Oracle Fusion expert Basheer Khan, Machine Learning PM Sandesh and Exadata PM Gurmit in the morning, and we had breakfast together. Then we took a cab and went to the venue.
So the daily routine for the conference was: 7.30 AM checkout from the hotel, a cab drive to the venue, registration, an introduction speech by Sai and other AIOUG members, then delivering presentations, having lunch (most of the time a spicy Indian lunch), the closing ceremony at 6.00 PM, driving to the airport, flying to the next city, a bunch of security checks and so on, driving to the hotel, check-in and off to bed at 1.00 AM, and then checkout at 7.30 AM and off to the next venue again. Scary, right?
The next city was Bengaluru. As we had one extra day there, I decided to have lunch outside in a random restaurant. The place was near the hotel and I ordered biryani, as always. Although I asked for a “less spicy” biryani, I was served the spicy one. My tongue was burned, and I could hardly drink tea for the next 2 days. But it was very delicious. In the evening I took a small trip to MG (Mahatma Gandhi) Road. It was a very crowded, fascinating place, and I barely got rid of a man who was chasing me and trying to sell me a chess set for 1500 rupees (the original price was 600 rupees). He didn’t know I train Jiu-Jitsu.
The next city was Ahmedabad. And I was not the only person visiting this city for the first time. Actually, none of us (mostly Indian speakers) had visited Ahmedabad before. The roads of this city were wide, and I was told that the Ahmedabad guys are the coolest guys in India )) My session was after lunch, so I managed to sleep a little bit more and arrived at the venue later. But unfortunately I didn’t manage to visit the open-air barber whom I was filming with curiosity. He waved at me and invited me to try his service, but I was late for my session.
The next city was Hyderabad, and the airport was very familiar to me, as I had already visited Hyderabad 2 times before. Again, I was fortunate to have only one session after lunch and arrived at the venue a little bit later. I met a lot of friends from my previous visits, and all of us were off to the airport right after the conference.
And we headed to Pune. I was happy, because we had an extra day in Pune. We arrived in the city in the evening, and the next day, after having lunch in the hotel, I missed the city tour with the speakers who were more energized than me, found a Starbucks coffee shop, spent a few hours reading a book (Ikigai, a Japanese concept that means “a reason for being”) and relaxed a lot. The next day, we checked out of the hotel early in the morning and went to the Oracle office, which was a bit far from the hotel, so fortunately I got a city tour along the way )) The venue was huge and beautiful, and there was a coffee machine that I used a lot to stay alive. We had very interactive sessions, and after the conference a bus was waiting to take us to Mumbai! It took us approximately 4 hours to reach Mumbai, but we enjoyed the trip a lot. In the following link you can see part of our trip in Connor’s video shoot
The Mumbai meetup was awesome. I got more questions in that single session than in the rest of the tour, and the 45-minute session ended up taking an hour and a half! But it was not just a presentation: because of those questions, the session was more like a discussion, which I liked a lot!
And after the conference, we headed to the airport to fly to the last city: Gurgaon! The next morning I was extremely tired and could barely walk or stand straight. But I got a lot of positive energy from the attendees and delivered 2 sessions successfully. As my flight was the next day at 4.00 AM, I went back to the hotel, had some rest, headed to the airport and returned to my lovely country, Azerbaijan.
So overall, the trip was awesome! It was hard, but it was worth it. I made new friendships, met online friends who had been using my blog posts for years, got a lot of positive feedback and listened to stories about how my blog posts had saved their lives, and it motivated me to write more blog posts in the future. I also attended sessions by other speakers and learned a lot, both in terms of presentation and technical skills.
I would like to thank the ODev Yathra Tour organizers, especially Sai Ram for all he did to make us feel at home, the AIOUG staff, the ACE program (especially Jennifer and Lori) for supporting us, and all the attendees for taking the time to attend our sessions. I love India and the community a lot and am looking forward to visiting amazing and incredible India again!
If you are a production DBA of a mission-critical system, then you might have already seen the following critical, I would say mortal, messages in your alert.log file.
When your database was up and running, you shut it down and open it again, and it fails to MOUNT the database and aborts
The database hung with millions of online transactions and aborted. You start the instance, switch to MOUNT mode, do some maintenance tasks, try to open the database and …. wait …. wait …. wait …..
system01.dbf contains corrupted blocks
When it takes 15 hours to restore the database, you run the recover database command and get the following errors:
When you’re done with restore/recover, open the database with the RESETLOGS option and see the following errors:
When you have missing datafiles of a 10 TB tablespace due to hard disk corruption and don’t have a backup
Incomplete recovery due to missing archived log files, where you will most probably fail even with the *._allow_resetlogs_corruption parameter
When your database hangs, you get a hard disk corruption and lose some datafiles, and it takes an hour and a half to perform an instance recovery, and you just wait all that time for the database to open:
Aaaand the most annoying message during the recovery
I will keep updating this post with your and my screenshots. Feel free to send me screenshots of cases where you were stressed but eventually managed to solve the database issue.
I don’t want to scare you, but the exam is quite hard. The bad thing is that you fail the entire exam if you fail one of the sections. This means that you have to be well prepared for all 3 parts. As for me, I was good at ASM and RAC Administration, but was not comfortable with the Grid Infrastructure Installation and Administration part, which I barely passed.
You may be an Oracle high availability expert and still fail the exam. You might have experience but fail because of useless (or maybe uncommon) features and topics that you didn’t practice, didn’t read about or read only superficially, because most of the questions check not your practical experience but your theoretical knowledge. I have been managing highly available cluster databases for the last 8 years, and it was really hard to answer some of the questions about things that I had never faced and had never seen a reason to try.
There were a lot of questions like “Choose four options, where blah blah blah ….”, and you have to choose 4 options out of 7. You might know 3 correct answers, but because of that 1 wrong option you might fail.
Next, you have to achieve a minimum score in each of the 3 sections in order to pass the entire exam. You might complete 2 sections with 100%, fail the third one and end up failing the entire exam.
How to prepare for the exam?
You have to read the documentation and play with ASM, RAC database and Grid Infrastructure A LOT!
If you want to learn Oracle 12c Grid Infrastructure installation, check the following video tutorial:
During the exam, I regretted having skipped some chapters of the documentation and having read others only superficially. I highly recommend checking the ASM, RAC and Grid Infrastructure documentation and making sure you have gone through all of it at least once. Here are the links to the documentation:
Real Application Clusters Administration and Deployment Guide
Most of you (including me) postpone the exam and don’t set deadlines for the preparation or for the exam itself. My advice: set an approximate date for the exam and make a plan for each month, week and day. Then set a date and book the exam! Yes, book it, as you can rebook if you don’t feel ready, as long as it’s more than 24 hours before the exam. Registering for the exam weeks before the exam date will push you to complete your preparation on time.
I booked the exam for Tuesday, rebooked it to Wednesday, then to Thursday, and then to Friday :). On Wednesday I decided to reschedule it to the next Monday, and in the evening I was shocked to see that I hadn’t actually rescheduled it to Friday. It was going to happen tomorrow (on Thursday)! In just a few hours!
I didn’t feel ready and still had a few incomplete sections where I felt weak. I was even about to cancel the exam and not attend, but then decided to push hard and try. And if I was going to lose, I decided to lose like a champ.
So I stayed awake till 3am, took a nap till 6am and made my last preparations till 9am. I took the exam at 10am and was completely exhausted, overworked and sleepy.
Fortunately, I passed the exam, and I wish you the same.
This is my experience with the Oracle Database 12c: RAC and Grid Infrastructure Administration exam (1Z0-068). Let me know if you plan to take the exam, so I can guide you through it in more detail.