Freitag, 9. Dezember 2011

Who created that process?

Figure 2-7: Connection to a Dedicated Server Process
For some reason I was really curious who created that process. It's not about one particular process in detail, but about a well-known kind of process. At least well known to DBAs.
Which process? 
It's one of these:

oracle   13096     1  0 20:05 ?        00:00:00 oracleTTT071 (LOCAL=NO)

Yes, it's a simple server process, nothing spectacular. Nevertheless, the Concepts guide is not very specific about who creates that process. So I tried to find out in more detail.
On my Linux sandbox the first column of ps -ef shows the UID, the second the PID, the third the PPID. Unfortunately it's 1 here, and I'm quite sure this process was not created by init. So this process is somewhat orphaned, as its direct parent disappeared. Very sad!
I decided to follow Figure 2-7 from the Concepts guide. I used strace -f -p <PID_of_listener> to see what's going on; -f follows all forks, so their actions are traced as well.
The first 3 lines are
Process 2979 attached with 3 threads - interrupt to quit
[pid  2981] futex(0xae8dee4, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid  2980] restart_syscall(<... resuming interrupted call ...> <unfinished ...>


So the listener runs with 3 threads - good to know, and probably worth investigating this segregation of duties - but not in this post. There are many interesting lines, but I'm searching for a process, so let's continue with

[pid  2979] clone(Process 27028 attached
child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2aedd9914b80) = 27028
[pid  2979] wait4(27028, Process 2979 suspended
 <unfinished ...>
[pid 27028] clone(Process 27029 attached (waiting for parent)
Process 27029 resumed (parent 27028 ready)
child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2aedd9914b80) = 27029
[pid 27028] exit_group(0)               = ?
Process 2979 resumed
Process 27028 detached
[pid  2979] <... wait4 resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 27028
[pid 27029] close(15 <unfinished ...>
[pid  2979] --- SIGCHLD (Child exited) @ 0 (0) ---
[pid 27029] <... close resumed> )       = 0
[pid  2979] close(14 <unfinished ...>
[pid 27029] close(16 <unfinished ...>
[pid  2979] <... close resumed> )       = 0
[pid 27029] <... close resumed> )       = 0
[pid  2979] close(17)                   = 0


Here the listener ([pid 2979]) creates a new process with the first clone call. This new process has the PID 27028, and it has only one purpose: to clone yet another process, PID 27029, and terminate directly afterwards via exit_group(0). By this trick the listener is not shown as the parent process of PID 27029. Directly after its creation, PID 27029 closes some file handles. As the new process inherited the table of all open file (and network) descriptors through the sequence of clone calls, it seems it tries to get rid of everything it does not need as early as possible. The next part
[pid  2979] fcntl(16, F_SETFD, FD_CLOEXEC) = 0
[pid 27029] setsid( <unfinished ...>
[pid  2979] fcntl(15, F_SETFD, FD_CLOEXEC <unfinished ...>
[pid 27029] <... setsid resumed> )      = 27029
[pid  2979] <... fcntl resumed> )       = 0
[pid 27029] geteuid()                   = 5831
[pid  2979] fcntl(13, F_SETFD, FD_CLOEXEC) = 0
[pid 27029] setsid()                    = -1 EPERM (Operation not permitted)
[pid  2979] poll([{fd=8, events=POLLIN|POLLRDNORM}, {fd=11, events=POLLIN|POLLRDNORM}, {fd=12, events=POLLIN|POLLRDNORM}, {fd=16, events=POLLIN|POLLRDNORM}, {fd=15, events=0}], 5, -1 <unfinished ...>

shows the listener marking file descriptors 16, 15 and 13 with FD_CLOEXEC, so they would be closed automatically on any execve(2) call - the listener itself never execs, but any further child it clones would drop them. Meanwhile the new process calls setsid(2) to start its own session, detached from the listener.
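The flag itself is easy to demonstrate; here is a small Python sketch (my own illustration, unrelated to the listener code) that sets and checks FD_CLOEXEC on a descriptor:

```python
import fcntl
import os

# create a descriptor and mark it close-on-exec, like the
# fcntl(16, F_SETFD, FD_CLOEXEC) calls in the trace above
r, w = os.pipe()
flags = fcntl.fcntl(w, fcntl.F_GETFD)
fcntl.fcntl(w, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)

# the kernel will now close this descriptor automatically
# as soon as the process calls execve()
cloexec_set = bool(fcntl.fcntl(w, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)
print(cloexec_set)
```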
And here it goes:
[pid 27029] execve("/appl/oracle/product/rdbms_112022_a/bin/oracle", ["oracleTTT051", "(LOCAL=NO)"], [/* 109 vars */]) = 0
From the man page of execve(2):
execve() executes the program pointed to by filename.
...
execve() does not return on success, and the text, data, bss, and stack of the calling process are overwritten by that of  the  program  loaded.   The  program invoked inherits the calling process’s PID, and any open file descriptors that are not set to close-on-exec.  Signals pending on the calling process are cleared.  Any signals set to be caught by the calling process are reset  to  their default behaviour.  The SIGCHLD signal (when set to SIG_IGN) may or may not be reset to SIG_DFL.
       If the current program is being ptraced, a SIGTRAP is sent to it after a successful execve().
       If the set-user-ID bit is set on the program file pointed to by filename, and the calling process is not being ptraced, then the effective user ID of the calling process is changed to that of the owner of the program file.  Similarly, when the set-group-ID bit of the program file is set the effective group ID of the calling process is set to the group of the program file.
From that point on you can see how the server process comes to life. It's very interesting in some details, but out of scope for this post. After some conversation between listener and server process using file descriptors 15 and 16 (I assume these are just sockets) both close these descriptors. The listener also closes file descriptor 13, which seems to be the TCP connection to the client. From that point on the 2 processes seem to be independent.
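The whole sequence - clone, clone again, exit_group in the middle process, execve in the grandchild - is the classic Unix double fork. A minimal Python sketch of the pattern (my own illustration, not Oracle's actual code; the sleep binary is just a stand-in for the oracle executable):

```python
import os

def spawn_detached(program, args):
    """Start a program so the caller is not its parent, mirroring the
    clone / clone / exit_group / execve sequence in the strace output."""
    r, w = os.pipe()
    pid = os.fork()                    # cf. first clone() -> PID 27028
    if pid == 0:
        grandchild = os.fork()         # cf. second clone() -> PID 27029
        if grandchild == 0:
            os.close(r)
            os.close(w)
            os.setsid()                # own session, cf. setsid() by 27029
            os.execv(program, args)    # cf. execve(".../bin/oracle", ...)
            os._exit(127)              # only reached if execv failed
        os.close(r)
        os.write(w, str(grandchild).encode())
        os._exit(0)                    # cf. exit_group(0) by PID 27028
    os.close(w)
    os.waitpid(pid, 0)                 # cf. wait4(27028, ...) - no zombie
    grandchild_pid = int(os.read(r, 32))
    os.close(r)
    return grandchild_pid

if __name__ == "__main__":
    # the orphaned grandchild is reparented to init, hence PPID 1 in ps -ef
    print(spawn_detached("/bin/sleep", ["sleep", "0"]) > 0)
```

On the real system the grandchild would exec the oracle binary and keep running; here it just exits and is reaped by init.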

Well, now I know (at least on my test system) the simplest way the listener creates the process - and it uses execve to do so. There are still many open questions, like what's going on at the redirection shown in Figure 2-8.

Montag, 28. November 2011

bzip2 twice?


To check the performance of RMAN backups I recently started to trace them a little. As most of the time was not spent in any read-from-disk or write-to-media-manager-library event, it was spent on CPU. It's good to know the CPUs are good for something, but as I still wanted to know what's going on, I tried to dig deeper. CPU cycles are not just a magic black box where we put in a problem and the answer comes out after some time. At some abstraction layer it's a chain of functions where one is called by another, and only the last one is doing any work. There is not much information in that fact per se, but developers are humans too, and they give the functions they write meaningful names.


So I just had to find these names (and where most of the time is spent) to figure out what's going on. To save time I remembered Tanel Poder's Advanced Oracle Troubleshooting Guide, Part 9 - Process stack profiling from sqlplus using OStackProf. There he describes his tool ostackprof. It did all the work for me; I just had to find an RMAN session.

Here's the short stack where most of the time was spent (this backup was done with COMPRESSION ALGORITHM 'BASIC'):
->__libc_start_main()->main()->ssthrdmain()->opimai_real()->sou2o()->opidrv()->opiodr()->opiino()->opitsk()->ttcpip()->opiodr()->kporpc()->kkxrpc()->prient()->prient2()->pricbr()->pricar()->plsql_run()->pfrrun()->pfrrun_no_tool()->pfrinstr_ICAL()->pevm_icd_call_common()->krbibpc()->krbbpc()->krbb3crw()->krbbcdo()->kgccdo()->kgccbz2pseudodo()->kgccbz2do()->kgccm()->kgccbuf()->kgccgmtf()->__sighandler()->->

The naming convention for functions is not publicly documented by Oracle, but for several reasons I'm sure functions starting with krb are related to backup, whereas kgcc is used for compression. Especially the working function kgccgmtf reads like 'generate Move To Front'.
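Move-to-front is one stage of the bzip2 pipeline (it follows the Burrows-Wheeler transform). A toy Python version of what a function with such a name presumably does (my own sketch, not Oracle's implementation):

```python
def mtf_encode(data: bytes) -> list:
    """Move-to-front: for each byte emit its current position in a
    running alphabet, then move that byte to the front.  Runs of equal
    bytes become runs of zeros, which compress very well afterwards."""
    alphabet = list(range(256))
    out = []
    for b in data:
        i = alphabet.index(b)
        out.append(i)
        alphabet.pop(i)
        alphabet.insert(0, b)
    return out

print(mtf_encode(b"aaab"))  # -> [97, 0, 0, 98]
```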

At that point I had a lot more information than before, but still no way to improve the backup speed. As we have licensed Advanced Compression for that particular node, we tested the other compression methods. LOW and MEDIUM were faster, with less compression than our previous BASIC. But HIGH was even slower!

So again I used ostackprof, and here is the topmost stack trace - for HIGH:
->__libc_start_main()->main()->ssthrdmain()->opimai_real()->sou2o()->opidrv()->opiodr()->opiino()->opitsk()->ttcpip()->opiodr()->kporpc()->kkxrpc()->prient()->prient2()->pricbr()->pricar()->plsql_run()->pfrrun()->pfrrun_no_tool()->pfrinstr_ICAL()->pevm_icd_call_common()->krbibpc()->krbbpc()->krbb3crw()->krbbcdo()->kgccdo()->__PGOSF209_kgccbzip2pseudodo()->kgccbzip2do()->BZ2_bzCompress()->handle_compress()->BZ2_compressBlock()->generateMTFValues()->__sighandler()->->


Do you see the difference? Up to kgccdo there is none! And even afterwards the functions are somewhat similar. One more thing is worth mentioning: the bzip2 implementation for HIGH does not use Oracle's internal naming convention. So it's worth searching for these names on the internet; one of my best hits was a compress.c File Reference.

Did Oracle reinvent the wheel? No. To me it looks as if they tried their best first (by writing their own kgcc implementation) and afterwards preferred simple copy & paste. Maybe they should just drop one of the two - they can still use parameters to achieve different compression quality.
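The reference bzip2 library indeed exposes such a parameter; Python's bz2 module (which wraps the same libbzip2) shows the idea - this is just my illustration of the point, nothing Oracle-specific:

```python
import bz2

data = b"some fairly repetitive sample text " * 20000

# in bzip2 the level mainly selects the block size (level * 100 kB)
fast = bz2.compress(data, compresslevel=1)  # 100 kB blocks, less memory
best = bz2.compress(data, compresslevel=9)  # 900 kB blocks, best ratio

print(len(data), len(fast), len(best))
```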


If someone is interested in our results:
for a single datafile of 30 GB (100% used) we achieved, on a production system with all its ongoing tasks:


Type     time (min:sec)   backup size (GB)
BASIC    13:32            5.8
LOW      5:17             8
MEDIUM   8:52             6.14
HIGH     65:29            4.25

We decided to choose MEDIUM.

Mittwoch, 5. Oktober 2011

non-Exadata HCC

Oracle's Hybrid Columnar Compression was one of the big new features of Oracle 11.2. Unfortunately someone decided this feature should only be available on Exadata. Even though Oracle tried to explain it with technical limitations, it was more or less obvious that's just wrong. There are some reasons for this:

  • The database is doing the HCC compression all the time.
  • The database must be able to do the HCC decompression in case the storage cell cannot or will not.
  • Beta testers were quite sure HCC worked there without any special hardware.
  • Jonathan Lewis shows there are situations where even an ordinary database creates tables with HCC.
But the fact is: Oracle decided to disable HCC for general usage. As there is no different database software on Exadata database servers, the decision whether to process the statement or throw an ORA-64307: hybrid columnar compression is only supported in tablespaces residing on Exadata storage must be made by some switch within the software.

Here is my collection of information about this switch:

  • Kerry Osborne describes in the book Expert Oracle Exadata (pages 46 to 48) how the ASM diskgroup attribute cell.smart_scan_capable=TRUE is only possible on Exadata, and necessary for any kind of smart scan - so also for HCC.
  • CERN has published a paper about Compression in Oracle - in the appendix (pages 42 to 44) they show how to change this attribute. Not so easy, and it corrupts the ASM diskgroup.
  • Jonathan Lewis mentioned there might be a switch in DBMS_COMPRESSION.GET_COMPRESSION_RATIO which disables the check for the purpose of the temporary compressed tables. He did not go into details, but I decided to investigate in that direction.
The package DBMS_COMPRESSION uses PRVT_COMPRESSION, and there GET_COMPRESSION_RATIO calls PRVT_COMPRESSION.CHECK_HLI(1); to disable this check and PRVT_COMPRESSION.CHECK_HLI(0); to enable it again at the end. CHECK_HLI just calls the kernel function KDZCHECKHI with its parameter, nothing more. Unfortunately it cannot be called from outside PRVT_COMPRESSION. So that's where I started: I moved the line PROCEDURE CHECK_HLI (HLID IN NUMBER); from the package body into the package header. (By doing so I left the path of supported systems - don't do this if you care about your system!) Now I can call CHECK_HLI:
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production 
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,Data Mining and Real Application Testing options

SQL> exec sys.prvt_compression.CHECK_HLI(1);

PL/SQL procedure successfully completed.

SQL> create table bx_dba_objects Compress For Archive Low as select * from dba_objects;

Table created.

prvt_compression.CHECK_HLI works per session, so a logon trigger comes to mind.
To enable the check again, call it with parameter 0 instead of 1.

Update:
parallel processes do not inherit this setting:
(a slightly different testcase, but same setup in general)
SQL> select /*+ PARALLEL 8 */ count(*) from test_user.DBMS_TABCOMP_TEMP_CMP; 
select /*+ PARALLEL 8 */ count(*) from test_user.DBMS_TABCOMP_TEMP_CMP
*
ERROR at line 1:
ORA-12801: error signaled in parallel query server P008, instance av2l904t:VAX1
(1)
ORA-64307: hybrid columnar compression is only supported in tablespaces
residing on Exadata storage

Mittwoch, 7. September 2011

when does PMON register to remote listeners

I had a complex problem today: I tried to set up a Connection Manager, but unlike Arup, I did not want to use SOURCE_ROUTE. So I had to make PMON register itself with the cman. As we already have an entry in the spfile for remote_listener=REMOTE, I just enhanced this alias in tnsnames.ora with an additional line for the cman's HOST and PORT.
Unfortunately the services did not show up in the cman's show services. Not even an alter system register; did any good - still no service.
After checking with tcpdump (there really was no communication to the cman) and oradebug event 10246, I still had no clue why my PMON did not want to contact the cman. After a short ask for help on twitter, Martin Nash pointed me to the note How to Trace Dynamic Registration from PMON? [ID 787055.1]. There I found the event
alter system set events='immediate trace name listener_registration level 3';
With this (besides a lot of other useful information) I found that PMON just did not know about the new entries.
As a solution I had to tell it about the new entries in tnsnames.ora by
alter system set remote_listener=REMOTE;
This made PMON re-read the tnsnames.ora and accept the new values. All my services show up in cman now.
Yong Huang has some more information about the different trace levels:
Actually, trc_level 4 (user) is enough to show info about load stats. Other levels are:
0,1: off
2,3: err
4,5: user
6-14:admin
15: dev
16-: support

Freitag, 19. August 2011

incomplete list of SRVCTL commands

As I have had to dig into srvctl more than I liked, I figured out the documentation is not complete (at least for my installation of 11.2.0.2):
the documentation for srvctl upgrade claims
The srvctl upgrade database command upgrades the configuration of a database and all of its services to the version of the database home from where this command is run.

But there is a 2nd option missing entirely:
Usage: srvctl upgrade model -s <source-version> -d <destination-version> -p {first|last} [-e <name>=<value>[,<name>=<value>, ...]


in more detail:
srvctl upgrade model -h


Upgrade the Oracle Clusterware resource types and resources.

Usage: srvctl upgrade model -s <source-version> -d <destination-version> -p {first|last} [-e <name>=<value>[,<name>=<value>, ...]
-s <source-version> Version from which to upgrade
-d <destination-version> Version to which to upgrade
-p {first|last} Whether the command is called from the first upgraded node or the last upgraded node
-e <name>=<value>[,<name>=<value>, ...] Additional name value pairs for upgrade
-h Print usage

In general this should only be needed during a CRS upgrade, as part of the root.sh script. Nevertheless, as it's there, it should be documented. Especially the -e parameter seems worth more information than the -h output provides.

Montag, 18. Juli 2011

estimated plan stability

Sometimes I am searching for a method to solve a problem. And after some investigation, mailing lists, and direct contact with much smarter people, I come to the conclusion:
It's just not possible!
(Or at least not within reasonable effort).

One of these problems, or more precisely questions, is:
How likely is the current execution plan for a given SQL statement to change?
I call this

estimated plan stability


Unfortunately there is currently no such feature but at least I can give some examples, what I would like to have:
  • E-rows vs. A-rows
    If they differ a lot (in any line of the execution plan) it might be a hint that the plan is far from reality, or at risk of changing.
    Of course, for A-rows, gather_plan_statistics or similar is needed.

  • Best so far in 10053 trace

    If you have ever analysed a 10053 trace, you might know the lines starting with Best so far ....
    If the 2nd best is not far from the 1st, I assume small changes in the data might lead to a different execution plan.

  • Binds outside histogram boundaries

    If a bind variable is outside the min/max values of a histogram, the optimizer tries to guess how many rows it will get from this predicate/filter. Of course this can be horribly wrong, and should also be shown by my 1st suggestion.


These are only 3 possibilities. They should show some areas where I'd like Oracle to collect and provide more data than they do at the moment. Probably this would also be valuable for others? Any other suggestions out there?
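To make the first idea concrete: given the E-rows/A-rows pairs of a plan, a crude 'instability score' could be the worst over- or under-estimate factor across all plan lines. A toy sketch (my own invented metric, nothing Oracle provides):

```python
def plan_mismatch(rows):
    """rows: list of (e_rows, a_rows) tuples, one per plan line.
    Returns the worst estimate/actual factor; a large value suggests
    the plan is based on poor estimates and may be unstable."""
    worst = 1.0
    for e, a in rows:
        e, a = max(e, 1), max(a, 1)  # guard against zero rows
        worst = max(worst, e / a, a / e)
    return worst

# line 1 estimated well, line 2 off by a factor of ~417
print(plan_mismatch([(100, 98), (5000, 12)]))
```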


Mittwoch, 13. Juli 2011

avoid implicit COMMIT at DDL

Sometimes I'd like to keep a transaction open, even if there is a DDL somewhere in between.
Oracle might have its reasons for the implicit COMMIT, but I have mine to avoid it in my transactions.

Here is a small example of how I do it (don't just copy it - it's really minimal, just an example)!

Let's prepare a package and two tables



CREATE OR REPLACE PACKAGE my_ddl_package AS
FUNCTION do_ddl(p_ddl varchar2) return varchar2;
END my_ddl_package;
/

CREATE OR REPLACE PACKAGE BODY my_ddl_package AS
function do_ddl(p_ddl varchar2) return varchar2
as
PRAGMA AUTONOMOUS_TRANSACTION;
begin

execute immediate p_ddl;
return NULL;
end;
END my_ddl_package;
/

create table t1 as select * from dual;

create table t2 as select * from dual;


And now the testcase


I try to update t1, drop t2 and afterwards rollback the update on t1.
Let's give it a try.


SQL> update t1 set dummy ='U';

1 row updated.

SQL> select my_ddl_package.do_ddl('drop table t2') from dual;

MY_DDL_PACKAGE.DO_DDL('DROPTABLET2')
--------------------------------------------------------------------------------


SQL> select * from t1;

D
-
U

SQL> rollback;

Rollback complete.

SQL> select * from t1;

D
-
X

SQL> select * from t2;
select * from t2
*
ERROR at line 1:
ORA-00942: table or view does not exist


Everything works as expected: update - DDL - rollback

Did I miss something?

If you did not want to read (or think) too much, here is the shortcut:
What saved my day? It's PRAGMA AUTONOMOUS_TRANSACTION!

Montag, 20. Juni 2011

ORA-02393 on purpose

Today I got an email from a developer about an error on a test database.
(I shortened the conversation, but tried not to change the content):


Dev:
We get an ORA-02393: exceeded call limit on CPU usage almost immediately after executing the following stmt: ... Could you pls adjust the limits?



After some hours of meetings I replied:
Me:

These limits were introduced in test to find exactly those statements which run too slowly for an OLTP application. Can we help you tune the statement?

Dev:
I already exchanged details on the issue with [another DBA] and found the cause of the 'longer than usual' execution time (index missing ;-)
Nonetheless 100ms max exec time is a bit too strict for a dev platform imho - but as the ... is going to be replaced soon we won't request any changes to a system/db in the last chapter of its life cycle.


I have not replied yet. I'll just try to explain my world.

Is a CPU_PER_CALL limit really needed on a test system?


In theory: no.

If the developers implemented perfect code instrumentation, they would know the runtime of every statement. In test it would be evaluated for proper per-statement performance as well as for overall performance.

But if they did, I would never have received this email. So the world is not perfect. Neither is the code.

Are 100ms enough time? Seems really tough!


Yes, they are.

Even on an 8-year-old UltraSparc III+ you can burn a lot of CPU cycles within 100ms - at roughly 1 GHz that's on the order of 100 million cycles. And only CPU time counts here, no time for IO or other waits.

But the biggest argument for this limit: I never got any complaint about it caused by a good statement. Only by those which needed urgent support anyhow.

Montag, 6. Juni 2011

Access to the database

There is a kind of discussion ongoing who should have access to a database, and under what circumstances.
I have never done either Jeff's or Chet's job, so I cannot write anything reasonable about those. But I am sometimes a kind of DBA. This makes me my best source of knowledge about this job, and the attitude I have developed over the years.

I have developed certain expectations about the differences between an operations guy and others who are not responsible for production systems. I even have my private opinion about sales-related jobs, but that's not my topic today.

From my point of view it's all about control.
For an operationally focused person, my work is about control. If there is uncontrolled behaviour somewhere, it will lead to an incident. And that will cause a call. As I like to sleep at least 8 hours per day, that means one out of 3 calls will disturb this sleep (on average).
Of course every incident potentially costs my company money. Or reputation. This will come down on me again. Writing reports. Doing management presentations. Nonproductive paperwork.
After all this additional work, I will do my best to avoid such situations: analyzing what went out of bounds and led to the uncontrolled behaviour. In the end I even try to change the environment so it will not happen again.
It's all about control. Freak!

So who should get access to my systems? Only those whom I can control. Or at least trust?

It's hard to decide whom to trust. At least to which specific level.
Jeff's driving license seems to be an attempt to formalise this need. But similar to a driving license, it only shows you are allowed to drive. It rarely tells anything about your skills.

Talking about skills brings one more dimension in here: people living in an ecosystem where every action is checked by a quality assurance team are used to going to the limits and beyond. That's great, it's what is expected of them. Otherwise they would be doing their job badly.
It's just not what I want in a production environment. There, boundaries exist never to be reached.

I take a more pragmatic approach: if someone takes responsibility for their work, that's fine with me. As an example: if the person who introduced a big bug during a small hotfix at 5pm is called at 2am the next morning to fix this bug, I'm fine.
And just one more dimension: I prefer to grant access to people who know what they can break with it: just ask me for access to any v$ views in my DB; you will get it, right after you show and explain to me how you could halt the application if you abused it.

Need a short summary?
I like to control access to my database, so I limit it to people who I trust.
To gain it, show me your responsibility and knowledge.

Mittwoch, 1. Juni 2011

Artist or Farmer?

This morning I read a twitter message by Jeff Smith (aka @hillibillitoad):
OH: "Once we get our system running, we don't touch it." Yeah, that generally works pretty good.
I like Jeff's tweets, blogs and comments, as he is a very smart guy and still keeps his mind open to other ideas.
In this particular case I have to contradict him, but I was not able to condense it into 140 chars.



Most people in the IT business seem to follow the sentence "never touch a running system" like a commandment. To me this often sounds like the "please don't touch" in an art museum.



This brings me to an interesting question: Do these 'most people' see themselves as artists, and their work as art?


Let me give you another picture:


Imagine a farmer who seeds his crops in spring, does really everything right, and then sits down and does not touch his running system. Can you guess his harvest?



So what's the big difference here?
Artwork is most of the time created for a dedicated purpose. As long as the purpose does not change, it's expected to satisfy this purpose.
The purpose of an artwork changes slowly; I'm quite sure most of the time it can be measured in decades or centuries.
Plants are also seeded for a dedicated purpose. But their purpose is rapid change: to live, to grow, to get harvested after some time. So the farmer has to look after his crops all the time. In the best case he can even improve and steer the change to his advantage.
For me the big difference between an artist ('don't touch') and a farmer ('care and steer') is the timescale of the changes. If you expect your work never to change (and by a definition of 'life', things which do not change are just dead), protect it from any interaction.
I prefer living systems. So I take on the duty and care for them.

Mittwoch, 16. März 2011

who owns that directory?

I'm talking here about the DIRECTORY object in Oracle, not the thing you might know from several OSes.

Most objects in a current database can be created in any schema, but a DIRECTORY can't. Please check the CREATE DIRECTORY syntax definition: there is no optional schema anywhere, unlike with tables, indices and similar objects.

But that's not the end of the story. There is a view which shows you all the objects in a database: DBA_OBJECTS. And there is an owner. Always!
So let's check all the owners of directory objects there:
select distinct owner from DBA_OBJECTS where object_type ='DIRECTORY';

OWNER
------------------------------
SYS

As there is an owner defined somewhere, we can also change it. It's only a question of effort.
Unfortunately there is no ALTER DIRECTORY between ALTER DIMENSION and ALTER DISKGROUP. But it is still defined somewhere, so it can be changed!

Now I'm leaving the well prepared area of Oracle Documentation, but that just increases the fun.

Let's start with the definition of DBA_DIRECTORIES. It's somewhat simpler than DBA_OBJECTS and still shows an owner(!) What a nice sense of humor.
CREATE OR replace FORCE VIEW "SYS"."DBA_DIRECTORIES"
("OWNER", "DIRECTORY_NAME", "DIRECTORY_PATH")
AS
SELECT u.NAME,
o.NAME,
d.os_path
FROM sys.user$ u,
sys.obj$ o,
sys.dir$ d
WHERE u.user# = o.owner#
AND o.obj# = d.obj#

For the following tests I created a DIRECTORY D1 with the os-path /tmp/d1.

Checking sys.dir$ I found obj# 42985 for this directory.

Just to make sure, let's check the owner of that object. I don't trust Oracle, there is too much hardcoded magic inside. But not at this point:

select owner# from sys.obj$ where obj#=42985;

OWNER#
----------
0

Now I create another user and try to give him the ownership of this particular object - without any grants. The owner of an object should be able to access the object anyhow.

The user berx is equipped with CONNECT and CREATE PROCEDURE:


With user berx let's try to access that DIRECTORY:

declare
f utl_file.file_type;
begin
f := utl_file.fopen('D1', 'something.txt', 'w');
utl_file.put_line(f, 'line one: some text');
utl_file.fclose(f);
end;
/
You get the expected error:

ERROR at line 1:
ORA-29289: directory access denied
ORA-06512: at "SYS.UTL_FILE", line 41
ORA-06512: at "SYS.UTL_FILE", line 478
ORA-06512: at line 5

Now let's give user berx the ownership of that directory.
In my case user berx has user$ 41.

So I can (as sys)

update sys.obj$ set owner#=41 where  obj#=42985;

Back as user berx - the same testcase still fails with ORA-29289.

OK. Now we know Oracle does not check the owner of that particular object when checking permissions. One of their shortcuts in the code.
So let's give poor user berx the GRANTs the ordinary way:

SQL> grant read, write on directory D1 to berx;

grant read, write on directory D1 to berx
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error

Not the effect you expected? Well, maybe one of the side effects of my unkind interaction with the data dictionary and Oracle's shortcuts.

But as I know what I did before, I can revert everything:
SQL> update sys.obj$ set owner#=0 where  obj#=42985;

1 row updated.

SQL> commit;

Commit complete.

SQL> grant read, write on directory D1 to berx;

Grant succeeded.


And with a last test as user berx:

PL/SQL procedure successfully completed.

Everything is fine again!



Comment:

This post was created from some older documents and my holey brain. But the main theme should be clear.

2 spfiles - some confusion

At the moment I'm in a big migration project from 9i to 11gR2.
Most of the migrations are done like this:
  • create new DB

  • exp/imp

  • delete old DB


Even though we have templates for all these steps, sometimes something goes wrong.

Today I hit such a something:
As we use clusters for most of our DBs, cluster_database=TRUE should be set, otherwise it generates problems. At the creation of one DB, the 2nd instance did not come up. After some research I found out cluster_database was set to FALSE. No big deal, let's change it:
alter system set cluster_database=TRUE scope=spfile;
But after a srvctl stop instance -d DB_UNIQUE_NAME -i INSTANCE_NAME

srvctl start instance -d DB_UNIQUE_NAME -i INSTANCE_NAME
a show parameter cluster_database still gave me FALSE. This confused me!
So I checked $ORACLE_HOME/dbs/initINSTANCE_NAME, and there I found spfile=+ASM_DG/DB_UNIQUE_NAME/PARAMETERFILE/spfile.268.745343891 +ASM_DG/DB_UNIQUE_NAME/PARAMETERFILE/spfile.269.745406153 - there were 2 spfiles in the pfile definition!
After a

create pfile='/tmp/pfile1' from spfile='+ASM_DG/DB_UNIQUE_NAME/PARAMETERFILE/spfile.268.745343891';
create pfile='/tmp/pfile2' from spfile='+ASM_DG/DB_UNIQUE_NAME/PARAMETERFILE/spfile.269.745406153';

and a diff /tmp/pfile1 /tmp/pfile2, I decided to keep only the 1st spfile in $ORACLE_HOME/dbs/initINSTANCE_NAME and apply all the differences manually.
Now both instances are up and fine - I just have to delete the now-obsolete spfile in ASM.