What is a DBA

The lines between a system administrator and DBA have gotten blurred of late. The question of that a UNIX or Windows admin does and what a DBA does has gotten a little confusing. Let�s take a look at the differences and see what is required. We will base the analysis first on job requirements as posted on monster.com. We will then look at what it takes to be certified for a DBA and Linux admin.


 


DBA capabilities


 


First, what does monster.com list as the requirements for an Oracle DBA. We started this process by searching for an Oracle DBA in the keyword description. Fortunately, there were over 1500 matches that indicate that there is a broad spectrum of descriptions to choose from. It also shows that this is a viable job opportunity with a relatively high demand.


 


Some of the responsibilities that are commonly listed are (ranked by importance):



  • Make sure existing database instances are running and backed up
  • Plan growth of database infrastructure
  • Respond to database support requests (Build/Alter tables, run stats, load tables, data recovery)
  • Provide third level support for help desk
  • Assist resolution of production issues
  • Clone existing production environments for test and development
  • Mentor system admins and operators
  • Lead projects and implementations
  • Evaluate, define, and document standards, processes, and procedures
  • Document network topology, database configuration, and site configurations
  • Evaluate tools
  • Evaluate and recommend purchases
  • Assist in development of disaster recovery and business continuity plan

The skills that are required are



  • Good communication skills
  • Time and multiple project management
  • Advanced knowledge of Oracle 9i and newer versions
  • Comfort with all aspect of database administration (installation, configuration, performance tuning, and security)
  • College degree of some type (typically engineering, bachelor or masters)
  • Familiarity with data modeling, entity relationships, referential integrity, and cardinality
  • Familiarity with creating schemas, triggers, stored procedures, sequences, synonyms, and user defined data types
  • Experience with database backup and restore
  • Experience with database replication
  • Experience with SQL, PL/SQL, Java/JDBC, ODBC
  • Working knowledge of Microsoft SQL Server, MySQL
  • Ability to write batch jobs and scripts
  • Experience with database log shipping and standby databases
  • Experience with service packs and security updates as well as rolling upgrades
  • Experience with troubleshooting HA and highly redundant configurations
  • Five or more years experience as DBA

Supplemental skills desired



  • Knowledge of IIS and Apache configurations and adminsitration
  • Knowledge of operating systems (Windows, Unix)
  • Knowledge of network hardware, firewalls, switches, and routers
  • Knowledge of business applications (SAP, Oracle E-Business Suite, PeopleSoft)
  • Experience with a programming language (Perl, Java, C##, ASP, .NET)
  • Experience with operation center software (HP OpenView, BMC Remedy, Tivoli, CA Unicenter, Nagios/NetSaint)
  • Experience with storage network hardware vendors
  • Knowledge of data warehousing tools (like ETL and Warehouse builder)
  • Knowledge of business analytics software
  • Knowledge of federal and state regulations (FERPA, HIPPA, SOX) and how it relates to data in the database

DBA Certification


 


If we shift our view to the certification process and look at what is being tested we see a slightly different list of topics. The 10g Admin Workshop 1 focuses on the following topics:



  • Installation
  • Creating a database
  • Managing a database instance
  • Managing database storage structures
  • Administering user security
  • Managing schema objects
  • Managing data and concurrency
  • Managing undo data
  • Implementing security
  • Configuring networks
  • Proactive maintenance (stats, AWR, ADDM, alerts)
  • Performance management
  • Backup and recovery
  • Flashback
  • Moving data (loading and migrating)

The 10g Admin Workshop 2 shifts the focus a little deeper into recovery and less of basic administration and user creation.



  • Configuring recovery manager
  • Recovering from user errors
  • Dealing with database corruption
  • Automating management (SQL Tuning Advisor, AWR, ADDM)
  • Using rman
  • Recovering temp tables, redo logs, index, read-only tables, and passwords
  • Monitoring and managing storage
  • Monitoring and managing memory
  • Database recovery
  • Flashback database
  • Managing resources
  • Job scheduling and management

Additional certification can be obtaines for high availability



  • Overview of RAC
  • Installing clusterware
  • Installing RAC
  • Database creation
  • RAC DB administration
  • Backup and recovery of RAC
  • Monitoring and tuning RAC
  • Managing services in HA solutions
  • Managing connections in HA solutions
  • Administering clusterware
  • Troubleshooting clusterware
  • Designing for maximum availability architecture

System Administrator requirements


 


If we shift our focus to what is a systems administrator the job description found on monster.com is a little different



  • Setup and configure hardware boxes with operating systems
  • Patch management for operating systems and applications
  • Support customer requests for operating system and application loads
  • Tune application and operating system configurations
  • College degree (computer science or equivalent)
  • 5 years experience
  • Focus on network, systems, and application security
  • Support and maintain web sites, intranet, and project web sites


  • Provide third level support for help desk
  • Assist resolution of production issues
  • Clone existing production environments for test and development
  • Mentor system admins and operators
  • Lead projects and implementations
  • Evaluate, define, and document standards, processes, and procedures
  • Document network topology, database configuration, and site configurations
  • Evaluate tools
  • Evaluate and recommend purchases


  • Assist in development of disaster recovery and business continuity plan
  • Support network team

Must have proficiency in



  • Good communication skills
  • Time and multiple project management


  • Operating system (Linux, Windows, pick one)
  • DNS, DHCP, IMAP, POP, SMTP, RADIUS, PHP, LDAP, Active Directory, Samba
  • Programming and scripting languages (Perl, csh, sh, java, c, c##)
  • Backup and restore procedures for operating system
  • Experience with VMWare


  • Experience with blades or partitioning of hardware


  • Phone systems (POTS and VoiP)
  • Printer and copier maintenance
  • Experience with firewall and network security

System Admin certification


 


If we shift our focus to what is part of the certification process we see:



  • User environements and logging in
  • Command line basics
  • File system management (disk partitioning, file system types and management, LVM, RADI, Quotas)
  • Users, groups, and permissions (NIS, LDAP, Active Directory, files, ACL)
  • Shell basics
  • Standard I/O and pipes
  • Shell scripting and text processing
  • Managing processes
  • Network administration (device recognition, kernel drivers, DHCP, DNS, TCP, IP, filtering, firewalls)
  • System installation and configuration (BOOTP, jumpstart, kickstart, flash install)
  • System initialization and services (boot, BIOS, kernel procedures, init scripts)
  • Kernel services (/proc, kernel modules, system monitoring)
  • Package and application management (RPM, patch management)
  • Printer administration
  • Backup and restore basics (dump/restore, tape, cpio)
  • X window configuration and security
  • Fault analysis (rougue processes, priorities, memory leaks)
  • Network services (FTP, NFS, Samba)
  • Web services (Web server, CGI, SSL, certificates, email, SMTP)
  • Encryption technologies and key management, PAM, Kerberos

Commonalities between DBA and System Administrator


 


The two jobs do have some overlap. Mainly education, communication skills, and support of help desk. If we list the job functions that are the same we see:



  • Provide third level support for help desk
  • Assist resolution of production issues
  • Clone existing production environments for test and development
  • Mentor system admins and operators
  • Lead projects and implementations
  • Evaluate, define, and document standards, processes, and procedures
  • Document network topology, database configuration, and site configurations
  • Evaluate tools
  • Evaluate and recommend purchases
  • Assist in development of disaster recovery and business continuity plan

Skills that are similar



  • Good communication skills
  • Time and multiple project management
  • College degree of some type (typically engineering, bachelor or masters)


  • Programming knowledge
  • Experience with firewall and network security

Differences between DBA and System Administrator


 




























DBA


System Administrator


Make sure existing database instances are running and backed up


Make sure existing operating systems and applications are running and backed up


Plan growth of database infrastructure


Plan growth of hardware and network required to run applications


Respond to database support requests (Build/Alter tables, run stats, load tables, data recovery)


Respond to user support requests (load operating systems, add applications, printer problems)


Patch management for database


Patch management for operating system and applications


Tune database


Tune operating system and applications running on system


Focus on database and application security


Focus on network, system, and applicaion security


Support developers in accessing data stored in database


Support and maintain web sites, intranet, and project web sites


 


The training required for both jobs are radically different. The DBA focuses more on administration of data and users. The sys admin focuses more on components of the operating system and applications. The DBA is aligned a little more with the business units while the sys admin is more aligned with the help desk. Both are important relevant jobs but few people cross the boundaries between the two because the training required to become an expert in both is time consuming.

storage selection for a database

It is amazing how friendships help business relations. I have a friend who is working at NetApps and we are both calling on the same account. We started to recommend conflicting information to the customer based on the knowledge of our own worlds. Once we talked, I understand his perspective and I think that he understands mine. We are now talking about how we can jointly present to this customer and even coordinate our messages with each other and through our recommended consulting organization.

Here’s the situation. We have a customer that is running an old version of Oracle (9i) on an old operating system (Solaris 8) on old hardware (Fujitsu some older model) attached to a NetApps filer (three generations ago). The customer is experiencing significant performance problems. They have two applications that they are running against the database. One is an ERP system that has a relatively low transaction count. The other is a business objects application that is consuming all of the resources. Normally this would not be a problem but the business objects app is consuming all of the resources and causing the ERP system to lock up and interrupt business.

Simple solution right? Find the bottleneck and fix it. Well, it isn’t that simple. NetApps did a detailed analysis and found that the disk is 90% busy. The network is 60% busy. If they double the disk speed and reduce the latency by half, the network will become a problem. Fortunately, the business objects app is configured to be a read only analysis of the database. If we can split the database out, we can significantly improve the ERP performance.

Given a computer science/electrical engineering background, my first impression was that NFS mounting the data was the biggest problem. If they could move to direct attach disks, it would improve performance. This would get rid of the network loading and allow them to purchase faster disks at the same time. What I didn’t realize was that they would also need to purchase a fiber channel switch and learn how to manage this as well. I did know that NFS management was easy, I’ve been doing it for years. I didn’t realize that SAN management is an art to itself and is significantly more complex and expensive to manage. Looking into this subject a little more I was amazed to find that NFS does not substantially effect performance if tuned properly. An NFS system is only 12% slower than a fiber attached system and 2% slower than iSCSI.

The recommendation that we are moving forward with is a second database system that replicates the primary data using DataGuard with physical replication. Once this is done, redirect the business object app to hit the standby database and purchase a second netapp filer to offload the original system. The added benefit is that this configuration is the foundation for disaster recovery since both apps are currently running on one machine. Even though the standby system will be in the same datacenter it will provide redundancy for the primary system.

The performance results and complexity of SAN storage amazed me. Both were contrary to my thinking. Once I looked at it from this perspective, it makes sense. Live and learn…..

database upgrade

so for the past few days I have been playing with contentDB installation. It has been an eye opener. I am relatively new to Oracle (10 months now) and have little experience as a DBA. I have been a Unix admin, software developer, and system architect for a variety of companies. I was a Solaris expert from the early years and remember how difficult it was to install, configure, and upgrade Solaris (at the time called SunOS). I have played with Tomcat at a variety of jobs and even used the iPlanet/Sun Application Server. I am surprised to say that the Oracle suite of products suffers from the same problem that most public domain code and Linux installations suffer from; poor documentation and inadequate dependency checks.

I decided to install contentDB. To do this, I installed the database, version 10.2.0.1 which is the full image that can be downloaded. I got the install working, the os with the right packages, and the accounts and kernel parameters correctly configured. I download and unzip the soa suite and find out that I need to upgrade the database to at least 10.2.0.2. No problem, right? I didn’t initialize any data elements other than the sample database. The upgrade should go easy. WRONG! I downloaded the 800M+ bundle and unzipped it. I execute the runInstaller and watch as the messages go by. I assumed that it would halt the services and upgrade what was needed. It posts a message saying that I need to stop the services managed by the ORACLE_HOME that I have defined. It didn’t say what the processes were or how to see what was running. It just said that I needed to stop something and it was up to me to figure it out. I eventually figured out that I needed to stop the listener, the sql service, the enterprise manager, and the database. It seems like there would be a stopall command that would make this happen.

Once I got everything stopped, I started the upgrade package and it finishes without any warning. Since I only installed the demo tables I thought that I was finished and could connect to the database. WRONG! I forgot to upgrade the database once I upgraded the binary packages. It seems like the upgrade process should have done this for me or at least asked me if I wanted it done. It took me a few hours to figure this one out but I waded through the manuals and forums and got it working.

Now that I have the database upgraded to 10.2.0.3 and OEM reports that it is the right version and everything is properly running. I started the runInstaller for the soa suite. This comes up with a few questions and starts checking for dependencies. I appreciate this but the error message that comes up is of little or no help. Some required prerequisite checks have failed…… What prereqs? A little more detail here would be helpful. A link to a web page? A list of failures?

Am I off base asking for a little more detail here? In the early days of Solaris, these were common error messages. It almost harkens back to Windows messages. “Error: You can’t do anything now” with a Reboot button. Ok, I was a little harsh on the error message, I actually do get an error message in the universal installer screen. It seems like the error window should drop me into the reporting 1 error, 1 requirements to be verified screen that lists the error. The error appears to be easily fixed, gcc-c++ is not found. Fortunately, I had to upgrade the gcc installation for my version of Red Hat Enterprise 4 so I have the iso already mounted. Once I got the prereqs resolved, it complains that there isn’t a ORABPEL schema on the database that I am using. I guess it is back to the manuals to figure out the schema elements that I need to install and configure.

Enterprise Manager

I have been playing with Oracle Enterprise Manager this morning. I have OEM 10.2.0.1 installed on a virtual machine. Unfortunately, I don’t have my virtual image patched to take the daylight savings time problem into account. The following is what I saw when I tried to start the OEM.

$ ./emctl start dbconsole
TZ set to US/Central
Oracle Enterprise Manager 10g Database Control Release 10.2.0.1.0
Copyright……
http://localhost.localdomain:1158/em/console/aboutApplication

Timezone mismatch: the agentTZRegion value (US/Central) in /home/oracle/oracle/product/10.2.0/db_1/localhost.localdomain_orcl/sysman/config/emd.properties does not match the current environment TZ settings(US/Central).
The dbconsole cannot run with this mismatch

To fix this problem I have two choices. The first is obvious, patch the software installation. Being lazy, I didn’t want to do that. If I reset the time to be April 1st, the time zone problem is not an issue. This is done with the following command:

# date 04010915

The command must be run as root and if of the format MMDDHHmm or Month|Day|Hour|Minute with each element being two digits. Once I do this, OEM starts properly and begins recording data as expected. Unfortunately, the date will be set to April 1st.

library schema design

Let’s get back to the library system. First let’s look at the design. I have scanned in most of the books that we have on our bookshelves at home. I was surprised to find out that we have over 300 books. I didn’t scan the kids books in their room. My guess is that we have 600 books in the house. How did that happen? I imported these books into www.librarything.com so that I can auto populate most of the data by entering the ISBN number. I purchased a bar code scanner and found that the majority of the books include the ISBN number on the bar code. Many books have two bar code numbers, one for inventory control and one for ISBN. I selected librarything because it allows me to search the Library of Congress as well as Amazon to populate the library information. It allows you to enter the ISBN number, the book title, or the author name and it presents a list of matches. If there is no match in the Library of Congress, it allows you to search Amazon. I like this because between the two I found data for each book in our house. The biggest problem is that at times I found multiple copies because many books have multiple editions. A significant amount of the matches included a picture of the book cover which helped significantly.

The biggest benefit of this service is that it allows me to export the book information to a comma separated or tab separated file and import it into a database or Excel program.  The fields in this export are:

  • book id
  • title
  • author first and last name
  • author last and first name
  • other authors
  • publisher
  • date of publication
  • ISBN number
  • series
  • source (Amazon or Library of Congress)
  • language
  • language 2
  • original language
  • LC Call number
  • Dewey Decimal number
  • BCID
  • date entered
  • date entered
  • date entered
  • date entered
  • tags
  • rating
  • review
  • summary
  • comments
  • encoding

Looking at the data in this collection brings up a few questions. Why can’t I use Excel as the basis for this information? First, how do I record what is checked out and when it is checked out? Using one of the date entered and comments fields? I could but it would be difficult. The user who checks in and checks out books would need to be familiar with Excel and have a uniform way of checking in and checking out books. If multiple people do this using different fields or different formats, the data will become corrupted. We need to create a browser front end to make the data entry uniform and the way that data changes the same.

Second, if I wanted to sort the data using some index and perform a some simple analytics, an Excel spreadsheet will not work. If someone checks out a book and checks it back in, the record will be cleansed and lost if it is done in one spreadsheet. It would be nice at some time to find out what the most popular book is. It would also be nice to present a list of books that students have checked out and make it visible to parents and teachers. It gives both a tool to help educate the child and the librarian a starting point when they go to purchase more books. Analytics like this require a transactional log of book usage and who used them. Using a spreadsheet does not allow us to record these transactions or easily coordinate between two worksheets and archive this data. Some type of database is required for this information.

The discussion of checking in and checking out books requires another table schema. We need something simple to track the students.

  • Student id
  • Student first and last name
  • Homeroom teacher
  • Grade

With this information we can record when a student checks out a book by correlating a book to a student id. We can also let a parent look at the current books checked out by presenting the student id and searching for books checked out. The librarian can also tag a class of books for not to be checked out by grade levels. We might use either the series or rating elements in the book schema. For example, if a kindergartener wants to check out a book that is high school rated, the check out program will flag the process and require a verification that this child can check out this book.

We will also need a mechanism to record book status. This will require a specific schema element to track this.

  • book id
  • date checked out
  • student id

Some elements that are missing from this library schema are the cost of a book and the location of a book in the library. Since this library is a small library of a few thousand books with less than 20 book shelves, there is really no need to record this information. If we had multiple libraries or libraries in multiple rooms at the same site, we might want to record book location. If we were going to charge for books based on our cost of purchasing the book, we would want to record the cost of the books. Since our library is a volunteer library where the vast majority of books have been donated or purchased in bulk from Half Price Books, we don’t track the cost of each individual book. We do record how much has been spent on all of the books in the library. We have been collecting books for the past 8 months and have spent more on bookshelves than we have on books. We have just over 7000 books and have spent $500 on bookshelves (the cost of wood, nails, and screws) and $200 on books. The funding for these books came from a book fair where we got a percentage of profits from all books sold. During this book fair we had the teachers list books that they were interested in and books that they wanted in the library. The parents purchased a few hundred books and donated them to the library.

next topic on the library, use cases and user interface design. I will be using three different tools to design this library interface; Apps Express(HtmlDB), JDesigner, and NetBeans. The purpose of using three is to compare and contrast the three different tools for creating the same product. Once we have the design completed, we will look at what it takes to import our spreadsheet data into a database. The databases that we will evaluate are OracleXE, mySQL, and SQL Server Express. We will look at these three because they are all free databases.

topics for the next few weeks

for the next few weeks I am going to look at two topics. The first is installation and configuration of the different Oracle database products. The second is different ways of managing and keeping a server and database operational. The base platform that I am working from is a vmware image. The different operating systems that I will work with are Windows XP, Windows 2000, Windows 2003, Red Hat Enterprise Linux Release 4, and Oralce Unbreakable Linux Release 4 Update 4.

I am going to look at these different operating system platforms as well as four different database platforms (Oracle – all flavors, mySQL, SQL Server, and Postgress). The intent is to look at the basic functionality of the database, the footprint required, and a basis for benchmarking all of the functions and configurations needed for a database.

I am also going to look at the different management tools like Oracle Enterprise Manager, Nagios/NetSaint, CA Unicenter, BMC Patrol, and Quest. The purpose here is to look at the different management tools and how well it integrates into an ITIL environment, how well it functions, and how difficult it is to configure and construct.

To begin our tests, I have created a suite of VMWare images on a hard disk. These images are stock installations of the various operating systems. These systems use a file resident virtual disk and not an attached disk to perform the operations. All of the images run on an external Buffalo USB 1/2 TByte disk. I am doing all of the tests on a Dell Latitude D620 with 2G or RAM.

The configuration matrix for the operating systems looks like:

Windows XP   –  requirements: 300Mhz CPU, 128 M RAM.
                           consumed 1.24G disk with install
Windows 2000 – requirements: 200Mhz CPU, 64 M RAM.
                           consumed 2.1G disk with install
Windows 2003 – requirements: 200Mhz CPU, 256 M RAM.
                           consumed 2.19G disk with install
RHEL4             – requirements: 300Mhz CPU, 256 M RAM.
                           consumed 0.97G disk with install
UnbrLinux R4U4 – requirements: 300Mhz CPU, 256 M RAM.
                              consumed 1.2G disk with install

next entry…. memory and disk footprint for databases