Maxim,

For ad-hoc activity, your best bet for a data model is dimensional
<http://en.wikipedia.org/wiki/Dimensional_modeling> (a.k.a. star schema),
with facts and dimensions. Range-partition the fact table(s) by date.

Use the EXCHANGE PARTITION load technique described in this white paper
<http://www.evdbt.com/TGorman%20TD2009%20DWScale.doc> and presentation
<http://www.evdbt.com/OOW09%20DWScaling%20TGorman%2020091013.ppt>, which
allows you to use INSERT+APPEND (nologging if necessary) to load data and
TRUNCATE to remove it. That means mostly direct-path loads for table data
and index creation, and very little "conventional" DML -- everything
you're wishing for. :-)

If you have any slowly-changing dimensions, first load the "type 2"
(time-variant) dimension table(s) using the EXCHANGE PARTITION load
technique, then re-load the "type 1" (point-in-time) dimension table(s)
using the exchange-partition technique illustrated on slides 25-29 of the
presentation <http://www.evdbt.com/OOW09%20DWScaling%20TGorman%2020091013.ppt>.
Again, there is no need for expensive "conventional" operations;
large-volume data manipulation is faster with INSERT+APPEND in parallel.
None of this is needed if the dimension is not slowly-changing, by the way.

That's it in a nutshell. I've been using this stuff for ~14 years and
talking to the world about it for over 8 years now. It's not easy to do,
but it rocks.

------------------------------------------------------------------------

By the way, I'll be doing my 2-day seminar on this topic (a.k.a.
"/Scaling To Infinity: Partitioning Data Warehouses on Oracle Database/")
for Oracle University in Munich, Germany on 22-23 October
<http://education.oracle.com/webreg/ShoppingCart?arg=1&doccode=D70363_1657076&av_id=&org_id=34&territory_code=DE&lang=D&fee=&event_id=3253864&p_return_url=http://education.oracle.com/pls/web_prod-plq-dad/searchResult.searchEngineParser?p_search_keyword=Gorman&p_org_id=1001&p_lang=US&p_country=34,41&p_search_location=&p_search_category_id=&p_search_format=&p_search_start_month=OCT&p_search_start_year=2012&p_search_end_month=NOV&p_search_end_year=2012&p_page_number=1&p_search_from_date=&p_search_to_date=&p_forceExpandedSearch=N>
and in Utrecht, Netherlands on 25-26 October
<http://education.oracle.com/webreg/ShoppingCart?arg=1&doccode=D70363_1657076&av_id=&org_id=41&territory_code=NL&lang=NL&fee=&event_id=3253866&p_return_url=http://education.oracle.com/pls/web_prod-plq-dad/searchResult.searchEngineParser?p_search_keyword=Gorman&p_org_id=1001&p_lang=US&p_country=34,41&p_search_location=&p_search_category_id=&p_search_format=&p_search_start_month=OCT&p_search_start_year=2012&p_search_end_month=NOV&p_search_end_year=2012&p_page_number=1&p_search_from_date=&p_search_to_date=&p_forceExpandedSearch=N>.
I'll be doing the 60-90 minute version of the 2-day seminar at the
Slovenian Oracle Users Group in Ljubljana, Slovenia on 15-16 October
<http://www.sioug.si/index.php?option=com_content&view=article&id=201&Itemid=180&lang=en>.
I'm also glad to do one or more of the 60-90 minute presentations over a
GoToMeeting webinar, if anyone is interested.

Please let me know if you have any questions. Hope this helps...

--
Tim Gorman
consultant -> Evergreen Database Technologies, Inc.
postal => PO Box 352151, Westminster CO 80035
website => http://www.EvDBT.com/
email => Tim@xxxxxxxxx
mobile => +1-303-885-4526
fax => +1-303-484-3608
Lost Data? => http://www.ora600.be/ for info about DUDE...
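[Editor's note: in outline, the exchange-partition load technique described
above looks something like the following sketch. All table, partition, and
column names here are illustrative, not taken from the white paper or
presentation.]

```sql
-- SALES_FACT is assumed to be range-partitioned by SALE_DATE.

-- 1) Create an empty "scratch" table shaped like one partition.
CREATE TABLE sales_fact_scratch AS
  SELECT * FROM sales_fact WHERE 1 = 0;
ALTER TABLE sales_fact_scratch NOLOGGING;

-- 2) Direct-path load it: INSERT /*+ APPEND */ writes above the
--    high-water mark, bypassing the buffer cache (and, with NOLOGGING,
--    generating minimal redo).
INSERT /*+ APPEND */ INTO sales_fact_scratch
  SELECT * FROM staging_sales
  WHERE  sale_date >= DATE '2012-09-01'
  AND    sale_date <  DATE '2012-10-01';
COMMIT;

-- 3) Index and gather stats on the scratch table, entirely outside
--    the live fact table.
CREATE INDEX sales_fact_scratch_i1
  ON sales_fact_scratch (cust_id) NOLOGGING;
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'SALES_FACT_SCRATCH')

-- 4) Swap the loaded segment into place: a data-dictionary operation,
--    near-instantaneous regardless of data volume.
ALTER TABLE sales_fact
  EXCHANGE PARTITION p_2012_09 WITH TABLE sales_fact_scratch
  INCLUDING INDEXES WITHOUT VALIDATION;

-- Removing aged-out data is equally cheap:
ALTER TABLE sales_fact TRUNCATE PARTITION p_2011_09;
```

(For INCLUDING INDEXES to work, the scratch table's indexes must match the
fact table's local partitioned indexes in structure.)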
On 9/4/2012 3:10 PM, Pakhutkin, Maxim (Max) wrote:
> Hello,
>
> I'm looking for some advice on designing a process for generating
> reports. The reports are ad-hoc, i.e. it's safe to assume that reports
> will be generated continuously and concurrently, with no start and finish
> point or quiet time. This basically data-warehouse-type process needs to
> live in a heavily-OLTP-type system, so we hope to reduce datafile and
> redo I/O overhead as much as possible.
>
> The direction we've been going so far:
>
> 1) We are hoping to run multiple database sessions, each working on its
> own set of data at a time (this corresponds to one report). This way we
> can add more processes if we need to speed things up (there is plenty of
> CPU and storage on the system). These sessions are called renderers (they
> render reports), and each will use its own render_key value to access its
> own data.
> 2) The processes would copy some detail data for each report into their
> own tables, then join this detail data with other data to generate the
> various sections of the report.
> 3) The detail data does not need to be restorable, so we are hoping to
> use INSERT+APPEND to put it in, TRUNCATE to get rid of it, and NOLOGGING
> on the tables/tablespaces that store it, to avoid the I/O overhead of
> plain INSERT/DELETE.
> 4) We don't want to use GLOBAL temporary tables if possible, because in
> addition to the render process accessing the data for its report, the
> "orchestrator" process will need to access the same data. Also, we want
> to be able to easily collect optimizer stats on the data rather than set
> the stats manually, which may be error-prone and is a support overhead.
> 5) After each set of data is inserted, stats would be collected on that
> data.
>
> The goal is to get maximum speed out of it, but also to keep the code
> reasonably maintainable.
>
> At first I thought of partitioned tables, with each process working on
> its own partition.
> But I found out that INSERT+APPEND and TRUNCATE PARTITION don't play well
> with concurrent access to the same table (even though different
> partitions of the same table are being accessed). Collecting stats also
> takes out latches on the whole table in the library cache, despite us
> collecting stats only at the partition level. For all of the above we get
> library cache waits of various kinds, slowing things down significantly.
>
> Then we thought of using separate tables, but joining them into a UNION
> ALL view (for ease of code maintenance), that way simulating a
> partitioned table. Each separate table would still have one column filled
> with the same value, dedicated to identifying which render process would
> work on it.
>
> CREATE TABLE X1 (
>    render_key NUMBER,  -- contains the same value for all rows in the
>                        -- table; the appropriate render process supplies
>                        -- this value in queries to get only its own data
>                        -- from the UNION ALL view
>    ...
> );
>
> CREATE TABLE X2 (
>    render_key NUMBER,
>    ...
> );
> Etc.
>
> Now we can INSERT+APPEND, but TRUNCATE doesn't work too well, as some
> processes accessing the view get "object no longer exists" errors when
> other processes truncate their tables that are part of the same view.
> (This may be partly because Oracle tries to use a very selective index
> that exists on each of these tables first and then filter by the
> render_key identifier, instead of reading the render_key column
> histograms first, realizing that there is only one table where that
> column will have the right value, and not touching the other tables, as I
> hoped it would. But I may be wrong, and it might do that even without the
> index.)
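[Editor's note: branch elimination in a UNION ALL view like the one quoted
above generally hinges on a CHECK constraint on each table rather than on
histograms -- with a constraint, the optimizer can prove at parse time that
a branch cannot satisfy the predicate and prune it from the plan. A sketch,
with illustrative column names:]

```sql
-- Each renderer's table carries a CHECK constraint pinning its key.
CREATE TABLE x1 (
  render_key NUMBER NOT NULL CONSTRAINT x1_rk_chk CHECK (render_key = 1),
  detail_id  NUMBER,
  amount     NUMBER
);

CREATE TABLE x2 (
  render_key NUMBER NOT NULL CONSTRAINT x2_rk_chk CHECK (render_key = 2),
  detail_id  NUMBER,
  amount     NUMBER
);

CREATE OR REPLACE VIEW render_detail AS
  SELECT render_key, detail_id, amount FROM x1
  UNION ALL
  SELECT render_key, detail_id, amount FROM x2;

-- The CHECK constraint on X2 contradicts the predicate, so the optimizer
-- can prune that branch; the query need never read X2's segment.
SELECT SUM(amount) FROM render_detail WHERE render_key = 1;
```

[Since a pruned branch's segment is never read, this may also reduce
exposure to the "object no longer exists" errors from concurrent
TRUNCATEs, though truncating a table still invalidates cursors that
reference the view and forces re-parses.]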
>
> I can't think of any other approaches that would get us what we want
> (speed + ease of code maintenance), and am ready to recommend to the
> developers that each process work in its own schema with its own tables
> (so code and tables would be duplicated everywhere, making it harder to
> maintain).
>
> Can anyone think of other architectural alternatives that could give us
> what we want?
>
> Thanks in advance.
>
> --
> Max Pakhutkin
>
> --
> //www.freelists.org/webpage/oracle-l

--
//www.freelists.org/webpage/oracle-l