SSIS Performance Issue
By Martin Andersson | 10/18/2011 10:51:34 PM |
3 Likes
My problem is rather simple in its own terms.
First, a little background information: I'm a consultant, I've been working with SSIS for about 1 year now, and it's been a fun ride.
However, lately I had been running into some performance issues while moving data from file to stage to the data warehouse.
The SSIS package running time had grown from 30 minutes to 120 minutes, which is not really acceptable in my world.
So my investigation began! I primarily used the Database Engine Tuning Advisor (DTA), SQL Profiler, and common sense.
This helped me find out that the tables my SSIS packages used were also being used by other people/profiles.
Someone had been given access to create indexes, views, and stored procedures in our staging area.
After a quick chat with the person, we solved both our issues: I redirected him to the proper database.
By both our issues I mean that he was also experiencing the same long execution times that I was.
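As a rough illustration of one way to spot this kind of thing (not necessarily the exact checks I ran), a query along these lines lists objects that have recently appeared in a database; the 30-day window is just an example value:

SELECT o.name,
       o.type_desc,
       o.create_date
FROM sys.objects AS o
WHERE o.is_ms_shipped = 0
  AND o.create_date > DATEADD(DAY, -30, GETDATE())   -- objects created in the last 30 days
ORDER BY o.create_date DESC;

-- Indexes added by others show up in sys.indexes rather than sys.objects:
SELECT OBJECT_NAME(i.object_id) AS table_name, i.name AS index_name, i.type_desc
FROM sys.indexes AS i
WHERE i.index_id > 1 AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1;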
Regards Martin.
When Indexes Attack !!!
By Vishal | 10/18/2011 1:30:51 AM |
2 Likes
I recently started working on an issue where performance had been degrading a couple of times a day during peak hours.
There are around 400+ tables in the database, of which 8 are the most important. Each of these 8 tables has around 60 indexes on it (rounding up here..). Out of those 60 indexes, about 20 have 0 reads (seeks + scans + lookups) and around 10 have fewer than 100 reads.
The writes on these indexes are 1.5+ million (user_updates).
These numbers are trustworthy since SQL Server had been running for 230 days.
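Those read and write counts come from the index usage DMV; a minimal sketch of the kind of query that surfaces such indexes (simplified, not necessarily the exact script we used) looks like this:

-- Indexes with many writes but few or no reads are candidates for removal.
SELECT OBJECT_NAME(i.object_id) AS table_name,
       i.name AS index_name,
       ISNULL(s.user_seeks + s.user_scans + s.user_lookups, 0) AS total_reads,
       ISNULL(s.user_updates, 0) AS total_writes
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
       ON s.object_id = i.object_id
      AND s.index_id = i.index_id
      AND s.database_id = DB_ID()
WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
  AND i.index_id > 1                                -- ignore heaps and clustered indexes
ORDER BY total_writes DESC, total_reads ASC;

Keep in mind that these counters reset when the instance restarts, which is why the 230 days of uptime mattered.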
What had happened here is that the developers had been running Profiler to capture workloads and using the DTA to create indexes, which led to many *useless* indexes being created. Moreover, all of these indexes were created on a single column - exactly as suggested by the DTA.
We dropped all the unnecessary indexes (after testing workloads and checking with the vendor - this was the toughest part :) ), and the issue that used to happen almost daily now happens once a week (still needs more investigation...).
Let's add a table to production
By tjay belt | 10/4/2011 9:57:24 AM |
1 Likes
Development decided to add a table for performing validations. This table was to have a record for each record of our credit card transaction processing system. Records could be archived after 3 days.
We collected data. On day 3 the automated job kicked off to delete records based on date, and the process took 9 hours to complete and caused various issues.
We fought the system for a few weeks before finally rearchitecting the delete solution: first generate a list of IDs to be deleted, driven primarily by the composite primary key and secondarily by the index on the date column, then delete the rows that matched that list. Only then could we perform the operation in minutes rather than hours.
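A minimal sketch of that kind of ID-list-driven, batched delete (the table and column names here are invented for illustration, not our actual schema):

-- 1. Build the work list of keys to purge, using the index on the date column.
SELECT TransactionID
INTO   #ToDelete
FROM   dbo.ValidationLog
WHERE  CreatedDate < DATEADD(DAY, -3, GETDATE());

CREATE CLUSTERED INDEX IX_ToDelete ON #ToDelete (TransactionID);

-- 2. Delete in small batches joined on the primary key, keeping locks and log growth manageable.
DECLARE @rows INT = 1;
WHILE @rows > 0
BEGIN
    DELETE TOP (10000) v
    FROM dbo.ValidationLog AS v
    JOIN #ToDelete AS d ON d.TransactionID = v.TransactionID;

    SET @rows = @@ROWCOUNT;
END;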
The Ghost Plan
By Craig Outcalt | 9/21/2011 2:23:19 PM |
5 Likes
We had a client that comprised about 70% of our customer data, and typically, if we were going to encounter performance problems, it was going to be with that client.
That makes sense, because the execution plan is going to change for that client due to the distribution of the data.
On one particular occasion there was a web report this client was running that was not finishing. They were getting a session timeout in their browser.
I was called, hooked up Profiler, and grabbed the stored procedure parameters. I boldly fired up Query Analyzer, pored over the execution plan, and took it from running over a minute and a half to running in under 5 seconds. All I did was add an index and (more importantly) swap out 2 correlated subqueries for temp tables.
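For illustration only (the table names below are invented, not the actual schema), the kind of rewrite that replaces a correlated subquery with a temp table looks roughly like this:

-- Before: correlated subquery evaluated for every outer row.
SELECT c.CustomerID,
       c.Name,
       (SELECT MAX(o.OrderDate)
        FROM dbo.Orders AS o
        WHERE o.CustomerID = c.CustomerID) AS LastOrderDate
FROM dbo.Customers AS c;

-- After: compute the aggregate once into a temp table, then join to it.
SELECT o.CustomerID, MAX(o.OrderDate) AS LastOrderDate
INTO   #LastOrders
FROM   dbo.Orders AS o
GROUP BY o.CustomerID;

SELECT c.CustomerID, c.Name, lo.LastOrderDate
FROM dbo.Customers AS c
LEFT JOIN #LastOrders AS lo ON lo.CustomerID = c.CustomerID;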
BOOM!
I did the DBA touchdown dance and went back to my normal life.
A couple of weeks roll by and they call again with the same problem.
I fire up profiler and lo & behold the execution times are around 5 seconds and suddenly they can't reproduce the error.
I shrug it off until the next week when the same problem happens.
I fire up profiler and once again I see that execution times are fine and they can't reproduce the error.
This proceeded off and on for a few months, until the rest of my hair fell out.
Sensing some missing facts, I set out with a Sr. web developer and finally discovered that it was an Ajax rendering problem: my finely tuned query was never being displayed or used, and it was never even the source of the problem!
The client was told to use Firefox and their problem was solved.
The moral of the story is that we can go astray when we don't have all the facts.
The second moral is that you can always find a performance problem if you look for one :)
It was a Dark and Stormy Night
By Karl Lambert | 9/21/2011 8:02:18 AM |
5 Likes
It was a dark and stormy night… Ok, maybe that is a little too dramatic, but if you have ever worked for a financial business, you know full well how dark and stormy it can be in the office if “the business” is waiting on financial data to run reports on.
We had already gone through the rigmarole of log-shipping to a read-only reporting database. “The business” did not like the idea of 15-minute refresh times that all too often missed their mark due to one of a plethora of issues we encountered getting log backups from point A to point B and restored correctly. We brought in a third-party application to replicate the SQL Server 2000 database to the SQL Server 2005 reporting database using CDC, which made updates “near-real-time”. The business was somewhat satisfied with that solution; however, it was not free of its own issues. One issue in particular dealt with a summary table.
The main application that this database supported would truncate a particular summary table and rebuild the data in the table from scratch – each night. After the truncate, a process would gather data from several detail tables and sort it, among other (very important) tasks, creating a file that would then be BCP’d back into the summary table. Sounds simple enough, right? This summary table would grow – thanks to said application, and the way that it used this summary table – by upwards of 30 million records per month. (Apparently, the vendor did not factor scalability into their design plans, but that is a topic for a totally different Jerry Springer episode.)

So here was our dilemma: The source database would receive a BCP load of approx 150M records every night. That load would take somewhere in the neighborhood of 2.5 – 3 hours to execute. Not too bad, we thought, considering all of the factors that we had in play. The problem was that we were replicating this table with our new CDC third-party application. The truncate would replicate without issue, typically in milliseconds, but when the BCP would start, it would leave CDC in its dust, as we could only get about 10k records a second to process. While the source database was complete in 2.5-3 hours, the reporting database was complete in about 6-7 hours. This time-frame was encroaching on our release time of 8 AM, and we would miss that on occasion due to the fact that there were precious few PKs in the database, and updates oft failed when records could not be verified, etc.
We discovered that the CDC application could “ignore” named transactions, if requested to do so. We decided to use this functionality – after yet again missing the 8 AM release time to the business – and not allow replication to “play catch-up”, waiting for those 150M+ records to replicate. We created a stored proc that essentially named a transaction that would use SQL Server’s BULK INSERT to insert the sorted data file into the source table, and then BCP the same file into the reporting database. Because BULK INSERT shares SQL’s memory space, and BCP does not, we started having the reporting database actually available BEFORE the source! We effectively saved over 4 hours of overnight processing, and were able to eliminate a lot of the issues we would encounter due to long-processing times. Most importantly, we bolstered application stability, and improved our track-record with the business thanks to a more reliable (and SQL internal) process.
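A rough sketch of that named-transaction pattern (the transaction name, file path, and table names below are invented for illustration, and whether a replication/CDC tool can skip named transactions is vendor-specific):

-- Load the sorted data file into the source summary table inside a named
-- transaction that the replication tool has been configured to ignore.
BEGIN TRANSACTION SummaryReload;

    BULK INSERT dbo.SummaryTable
    FROM 'D:\loads\summary_sorted.dat'          -- hypothetical file path
    WITH (DATAFILETYPE = 'char',
          FIELDTERMINATOR = '|',
          TABLOCK);

COMMIT TRANSACTION SummaryReload;

-- The same file is then loaded separately into the reporting database, e.g. with bcp:
-- bcp ReportingDB.dbo.SummaryTable in D:\loads\summary_sorted.dat -c -t "|" -S ReportSrv -T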
A day in a DBA's life
By Mauro Pichiliani | 9/18/2011 11:58:58 AM |
12 Likes
Here is my entry for the contest. Please read the following as an entry in a DBA's personal journal. And yes, unfortunately the following facts are based on a true story.
07:50. Arrival at work. I came 10 minutes before the start of my shift to see if I could get ahead on some work before my coworkers and my boss arrived, even though I put in some extra time yesterday. As always, one of the programmers has a morning tabloid spread open across two full pages on his desk. He says hello to me and starts to talk about the latest celebrity scandal.
08:10. Finally the programmer realizes that I am not interested in the details of the celebrity affairs. I sit down in my not-so-comfortable chair (the department's boss is the only one who has a good leather chair) and log in. I find 25 e-mails marked as unread: 8 are a discussion about a user unhappy with the ERP system, 10 are spam, and 1 is a greeting for the upcoming birthday of someone I don't know. As for the rest, I cannot understand how the hell they got forwarded to me, since they talk about the poor specification of a system that doesn't even use a database.
09:05. Writing an e-mail to the infrastructure guys asking for a follow-up on the server restart problem. I also query them about other issues: the delayed purchase of more memory for a database server (one week late), the purchase of another server to create the second SQL Server cluster node for a major database (one month late), the installation of the latest service pack on all the Windows servers (I've lost track of how late this one is...), and the firewall authorization to access the www.microsoft.com website, which was blocked by mistake for my network user. I remember to send another e-mail to the help desk guys asking them to replace my monitor, since it has shown a horizontal green line across it for the last three weeks. The last time one of the help desk guys came by, he replaced my optical mouse with an analog one and didn't even look at my monitor.
09:25. The analysts start to receive phone calls from the users while I go get a coffee. I don't even get mad when I find out that today's coffee is basically what was left over from yesterday, warmed up. I go for a tea instead, and overhear the following phrase said by an analyst during a phone call with some user:
"bla bla bla... the system is working perfectly... bla bla bla... do you want me to fix it using the right or the quick way? Bla bla bla..."
I think to myself: this is going to be a long day. No sign of the boss yet.
09:50. An analyst tells me that the HR system is extremely slow today. Responsibility for this system used to fall on a programmer who quit last week, and nobody had been assigned to take it over. This was the start of a big performance problem that took the rest of my day. The analyst said this was a major task with the highest priority, because if the system is not working in time the HR guys cannot print the paperwork needed to send every employee's monthly payment. A little psychological pressure is in order: if I cannot solve this in time, everyone in the company will be paid late and I will be the one to take the full blame.
10:15. While I am looking at the SQL Server code used by this specific task, some analysts form a gang to turn the air conditioning down, while the interns, programmers, and members of other departments gather to battle for turning it up. I can hear the sound of knuckles cracking. Meanwhile I try to focus on a stored procedure whose code contains strange variable names such as SATURN1 to store an employee code and PLUTO69 to store the number of days worked...
11:45. This programmer was really something. He had the ability to use everything in the database the wrong way. For instance, he used cursors to navigate a temporary table with 100,000 rows and no index, so I start rewriting that to use set-based SELECT statements. While going through the stored procedure I found more interesting things that would fit perfectly in a horror show: INSTEAD OF triggers with many levels of recursion! All of this coupled with SELECT * queries, many updates without a WHERE clause, and, on top of that, the following complex select that stores its data in a temporary table:
select distinct
    TIT_ID,
    dev_cpf CPF,
    MAX(substring(DEF_FONE,1,4)) DDD,
    MAX(substring(DEF_FONE,5,8)) FONE,
    SALDO, ATRASO, TERMINAL, SEGMENTO, PARCELADO, UF,
    replace(replace(substring(acn_descricao,1,2),'Re','19'),'-','') COD_ATEND,
    convert(varchar(30),dea_data,103) AS DTE,
    convert(varchar(30),DATEPART(HH,DEA_DATA))+':'+convert(varchar(30),DATEPART(mi,DEA_DATA))+':'+convert(varchar(30),DATEPART(ss,DEA_DATA)) AS TME,
    USU_CODIGO_ACESSO AS AGENT,
    ISNULL(epo_codigo,'0') PESQ_INADIMP,
    replace(substring(DEA_HISTORICO,charindex('- Agendado para',DEA_HISTORICO),26),'- Agendado para','') DT_AGENDA,
    TPA_DATA_DEVOLUCAO DT_BAIXA
into _SEMANAL_PASSADA
from tab_devedor_acionamento (nolock)
inner join tab_devedor (nolock) on dea_devedor=dev_id
inner join tab_devedor_telefone (nolock) on def_devedor=dev_id
inner join tab_acionamento (nolock) on dea_acionamento_id=acn_id
inner join tab_usuario (nolock) on dea_usuario=usu_id
inner join #BRTC(nolock) on tit_devedor=dea_devedor
LEFT JOIN TAB_TITULO_PARCELA (NOLOCK)ON TIT_ID=TPA_TITULO AND TPA_STATUS='D'
inner join tab_carteira (nolock) on car_id=tit_carteira AND CAR_DESCRICAO<>'TESTE'
left join TAB_ENQUETE_RESPOSTA (nolock) on ere_devedor=dev_id AND CONVERT(VARCHAR(30),DEA_DATA,103)=CONVERT(VARCHAR(30),ERE_DATA,103)
left join TAB_ENQUETE_RESPOSTA_ITEM (nolock) on ERI_ENQUETE_RESPOSTA=ERE_ID
left join TAB_ENQUETE_PERGUNTA_OPCAO (nolock) on epo_sequencia=ERI_ENQUETE_PERGUNTA_OPCAO
where
DEA_DATA BETWEEN '2009-02-10 07:00:00.000' AND '2009-02-23 22:00:00.000'
and acn_grupo_acionamento IN (1014,1016)
and tit_cliente=60
AND TIT_CARTEIRA = 23908 --fixa
AND CAR_ID=23930
group by SALDO, ATRASO,TIT_ID,dea_id,dev_cpf , dea_data,acn_descricao,USU_CODIGO_ACESSO,usu_nome,ACN_DESCRICAO,TERMINAL, SEGMENTO, PARCELADO, UF,epo_codigo,USU_CODIGO_ACESSO,replace(substring(DEA_HISTORICO,charindex('- Agendado para',DEA_HISTORICO),26),'- Agendado para',''),TPA_DATA_DEVOLUCAO --,epo_codigo--,COD_ATEND, DTE, TME,AGENT
order by dev_cpf,tme
I used the great online SQL formatter (http://www.dpriver.com/pp/sqlformat.htm) to simplify this query. Needless to say, I spent a few hours trying to understand it and break it into simpler SELECT statements. By the way, no lunch time for me until I fixed the HR performance problem. During the study of the query I noticed the boss's triumphal entrance. I could hear the sound of multiple ALT+TAB keystrokes and of small talk stopping abruptly. As background noise I could almost hear the Star Wars Imperial March while the boss walked toward his desk at the end of the room, from where he can see what is going on on the monitor of everyone in the department.
14:27. After understanding the query I built a simple test case to ensure that any modification to it did not produce different results. The query received parameters for some of the columns (such as TIT_CARTEIRA and CAR_ID), but I was able to use a technique called dynamic search conditions to isolate them and make the query faster. For more information about dynamic search conditions check this link: http://www.sommarskog.se/dyn-search-2008.htm.
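As a rough illustration of the dynamic search conditions pattern (a simplified sketch with a hypothetical procedure and table, not the actual code from this system), optional parameters can be handled like this:

-- NULL parameters mean "do not filter on this column".
CREATE PROCEDURE dbo.usp_SearchTitles
    @tit_carteira INT = NULL,
    @car_id       INT = NULL,
    @tit_cliente  INT = NULL
AS
BEGIN
    SELECT t.TIT_ID, t.TIT_CARTEIRA, t.CAR_ID, t.tit_cliente
    FROM   dbo.tab_titulo AS t
    WHERE (@tit_carteira IS NULL OR t.TIT_CARTEIRA = @tit_carteira)
      AND (@car_id       IS NULL OR t.CAR_ID       = @car_id)
      AND (@tit_cliente  IS NULL OR t.tit_cliente  = @tit_cliente)
    OPTION (RECOMPILE);   -- the plan is compiled for the actual parameter values
END;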
16:15. I did my best to break up the big, complex query and to optimize it with indexes and other techniques. At exactly this time the internet goes down. Luckily for me, I have a local copy of the code I am working on, but the other analysts and programmers start a game of poker, someone plays the harmonica, and I could swear I hear a guitar riff nearby.
17:20. After more work on the stored procedure I was stuck on a specific query that looked OK but took more time than it should. I tried many ways to optimize it but still could not get it right. A quick look at the execution plan showed that the optimizer was using some parallelism, but some of the cores of the 16-core Intel processor were not being used during the query execution. I recalled a great approach to parallelism via a .NET assembly that I had studied a few weeks earlier while writing an article comparing the alternatives for exploiting parallelism in SQL Server. The following link shows the assembly I used, proposed by Alan Kaplan in an article called "Asynchronous T-SQL Execution Without Service Broker": http://www.codeproject.com/KB/DATABASE/ASYNCHRONOUSTSQL.ASPX
18:30. The internet is back. I adapted the long-running query to run as 5 threads in order to split the load. What I did was separate each filter in the WHERE clause into a simple select statement, with each of the separated queries storing its result in a temporary table. For instance, the following piece of the select statement:
select ...
where DEA_DATA BETWEEN '2009-02-10 07:00:00.000' AND '2009-02-23 22:00:00.000'
and acn_grupo_acionamento IN (1014,1016)
and tit_cliente=60
AND TIT_CARTEIRA = 23908
AND CAR_ID=23930
was broken down into 5 threads, something like this:
-- Thread 1
select ... where DEA_DATA BETWEEN '2009-02-10 07:00:00.000' AND '2009-02-23 22:00:00.000'
-- Thread 2
select ... where acn_grupo_acionamento IN (1014,1016)
-- Thread 3
select ... where tit_cliente=60
-- Thread 4
select ... where TIT_CARTEIRA = 23908
-- Thread 5
select ... where CAR_ID=23930
Then I executed these 5 queries in parallel with Alan Kaplan's assembly and stored their results in temporary tables. The rest of the processing (aggregations, joins, etc.) was done directly against the temporary tables. I noticed that with this approach every core of the server was now busy and the query ran much, much faster. In the end, the results vary according to the values of the parameters, but in some cases the speedup goes up to 5x.
To finish up, I ran some tests in the development environment with the analyst and a programmer. Then we deployed to the production environment while our boss was explaining that extra hours would not be paid in money and must instead go into an hour bank to be dealt with later on. When we tested the HR application with the user, we received a simple answer:
'It was about time the IT department solved this one. Everyone was expecting it a few hours ago! Why did it take so long, guys? I mean, you probably just have to push some buttons or change some parameters to make that specific function in the application run faster, don't you? I bet any of our interns could have done it in 5 minutes.'
Process hundreds of millions of records in less than an hour
By Doron Farber | 9/17/2011 7:10:12 PM |
5 Likes
I tried consulting different SQL guys, including one MS SQL MVP, and none of them gave me a method for moving hundreds of millions of records quickly instead of spending days on the process. Eventually one person, also an MS SQL MVP, suggested that I use small batches. I tried that, and it reduced the process time by far in the most amazing way, and it also let me control the size of the log file.
Also, to get the ultimate performance, see this article: http://dfarber.com/computer-consulting-blog/2011/1/14/processing-hundreds-of-millions-records-got-much-easier.aspx
Processing hundreds of millions of records can be done in less than an hour using small batches...
Handling hundreds of millions of records requires a different strategy, and the implementation should be different from that for smaller tables with only a few million records.
Say you have 800 million records in a table and you need to delete 200 million. Obviously you could use this code:
DELETE FROM Users WHERE DOB = ''
The problem is that you are running one big transaction, so the log file will grow tremendously and the process can take a long time. It also depends on the speed of your server.
The solution is to use small batches and process one to a few million records at a time. This way the log file stays small, and whenever a new batch starts it reuses the same log file space, so the file does not grow.
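Here is a minimal sketch of that batching pattern against the hypothetical Users table from the example above (the batch size is arbitrary, and under the FULL recovery model you still need regular log backups for the log space to be reused):

DECLARE @rows INT = 1;

WHILE @rows > 0
BEGIN
    -- Delete at most one million rows per transaction, so each batch
    -- commits quickly and keeps the active portion of the log small.
    DELETE TOP (1000000)
    FROM dbo.Users
    WHERE DOB = '';

    SET @rows = @@ROWCOUNT;
END;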
Before I used the batch technique, at one point I ended up with a 250 GB log file when I tried to split the Address field.
To cut the processing time down even further, I added a clustered index on the PK, and with that combined with small batches I could handle 300 million records in less than an hour, which was major at the time. Even with a WHERE condition, one clustered index was enough.
Using MS Sql Server to Advantage in Distributed Environments
By Nadir Doctor | 9/17/2011 4:42:28 AM |
37 Likes
Providing accurate statistical information to an optimizer is a simple & efficient way to obtain optimal performance. These are some MS Sql Server links which I've found to be very helpful in this regard -
o Best Practices for Managing Statistics - http://technet.microsoft.com/en-us/library/cc966419.aspx#XSLTsection135121120120
o Using Statistics to Improve Query Performance - http://msdn.microsoft.com/en-us/library/ms190397.aspx
o Recommendation # 8 of Top 10 Best Practices for SQL Server Maintenance for SAP - http://technet.microsoft.com/en-us/library/cc966447.aspx
My work takes me to various customer sites, and I would like to narrate an example of performance tuning that also involved a combination of DB2 & Oracle. This specific client was using DB2 as a "federated" data server, with MS Sql Server & Oracle data sources configured as remote ones. Some queries were performing horrendously, and on further analysis it was discovered that some crucial tables had stale statistics. An enhancement request was submitted to IBM to add a column like remote_stats_time to the syscat.nicknames & sysproc.fed_stats objects, to make it easier for a dba to determine, from a consolidated perspective, when statistics were last updated at the remote data sources.
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=%2Fcom.ibm.db2.luw.sql.ref.doc%2Fdoc%2Fr0021734.html
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=%2Fcom.ibm.swg.im.iis.fed.query.doc%2Ftopics%2Fiiyfqnnsvcl.html
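On the MS Sql Server side, a quick generic check for stale statistics (a simple sketch, not this client's actual tooling; the table name in the UPDATE STATISTICS example is hypothetical) can look like this:

-- When was each statistics object on user tables last updated?
SELECT OBJECT_NAME(s.object_id) AS table_name,
       s.name AS stats_name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
ORDER BY last_updated;

-- Refresh statistics on a table that looks stale:
UPDATE STATISTICS dbo.SomeCrucialTable WITH FULLSCAN;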
My I/O Nightmare
By Laerte Junior | 9/14/2011 5:26:17 AM |
8 Likes
My I/O Nightmare (an excerpt from my previous written article)
Imagine the scene:
You're at home on a Friday night, choosing to watch movies on the weekend - like a good nerd, the complete collection of Star Wars - when the phone rings.
X: "Laertes my friend, it is XXXXXX. I URGENTLY need you here tomorrow. I already bought the plane ticket." - (Detail as to where he lives: almost 4,000 km away.)
Me: "Ok XXXXX, but what happened?”
X: "Man, I NEED to put 3 stores and a distribution center online in 2 days, and the disks are HUGE LOADED, otherwise I will lose the contract with this guy"
Me: "OK Man, I am going, I’m going!"
Well, when I arrived and saw the big picture, I really did see his problem. I had never in my life seen disk counters like the ones I saw there.
The client had purchased a really nice, latest-generation server (SAN and blade, 32 GB RAM, 16 processors, W2K8 64-bit, SQL2K8 64-bit), and the whole disk setup followed Microsoft best practices (block size of 64K, stripe size of 128K, separate RAID 0+1 sets for log and data). But some counters were completely out of the ordinary. On one of the most heavily accessed databases (approximately 400 GB – out of a total of almost 4 TB of user databases), the disk readings were really quite scary. Many of them were way outside the norm.
I do not like tools that show you the solution automatically (such as the DTA), but in this case I needed some magic.
PS - The server already had 30 stores online and more than 1500 simultaneous connections.
Then I remembered reading about the missing index DMVs on SQLServerCentral.com, and I used the script posted there.
The DMV Surprise
When I set out to solve this problem, I focused on the avg_user_impact and user_seeks columns, because I was more interested in improving 40% of queries by a factor of 50 than 90% of queries by factors of just 2 or 3. Admittedly, once I was done with that I had a second job of reviewing the unused and little-used indexes, reviewing queries, removing cursors, etc., but that’s an ongoing project, and not something I’m going to even mention here. I didn’t bother with fill factor at this point (although it is very important) because I just didn’t have time to analyze the load that these indexes would come under.
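For reference, a simplified version of that kind of missing-index query (a sketch in the same spirit, not necessarily the exact SQLServerCentral script) ranks the suggestions by seeks and estimated impact:

SELECT d.statement AS table_name,
       d.equality_columns,
       d.inequality_columns,
       d.included_columns,
       gs.user_seeks,
       gs.last_user_seek,
       gs.avg_total_user_cost,
       gs.avg_user_impact
FROM sys.dm_db_missing_index_details AS d
JOIN sys.dm_db_missing_index_groups AS g
     ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats AS gs
     ON gs.group_handle = g.index_group_handle
WHERE d.database_id = DB_ID()
ORDER BY gs.user_seeks * gs.avg_user_impact DESC;   -- favor frequently used, high-impact suggestions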
To tell the truth I was expecting an improvement, just not one like this, as I am deeply skeptical of anything suggested automatically. Sure, I’ll use the Index Tuning Wizard very occasionally, but I really prefer to manually review queries. In this case, because of those phenomenal improvements to the numbers, I decided to understand a little more about these DMV's, and I had some questions:
• When a query which uses this index is run, is the user_seeks column really updated?
• When a query which uses this index is run, is the last_user_seek column really updated?
• When columns are suggested for the index, does the optimizer take into account selectivity, density and other statistical information to make that suggestion, or does it just look at the order of WHERE clauses?
• Does the avg_user_impact column report the actual percentage of potential performance improvement?
Conclusion:
Well, given that the predicted avg_user_impact was 98.71%, and I had 4471 ms brought down to 1308 ms, the data from the DMVs was pretty close. My last question was answered: the avg_total_user_cost is a ‘real’ number, or at least is very, very near the mark.
This is possibly one of the better features included in SQL Server. Obviously these indexes are only suggested, and the system may become very slow if we create too many indexes on the table. Nor can we forget that these indexes will also be using disk space. So analyze the potential impact before creating them. One thing I can tell you for sure is that we can trust the data of the “Missing indexes” DMVs. I hope this fantastic feature can help you like it helped me.
Finally, on Monday, the 3 stores and the DC were up, and the disk load was 100% better than on Saturday.
You can see the complete story, with statistics and measurements, in my Simple-Talk article:
Reducing I/O with the 'Missing Indexes' DMVs
Mix and Match Nightmare
By Hakim Ali | 9/13/2011 5:53:41 PM |
4 Likes
In one of my previous jobs, I worked for a company that provided consulting services to hospitals for healthcare reimbursements from the government. This was a highly data-intensive operation, where we accepted incoming data from various sources including government agencies (Medicaid and Medicare), hospitals, and private insurance companies. Of course, all these sources had their own custom formats for data export, and these formats were neither consistent from one financial period to the next, nor were they compatible with each other. Just getting the data pulled out of these sources and imported into a relational database was a nightmare, but that is not the gist of this story.
Once we had all the data imported, we processed it. Fortunately, the solution was designed to handle all business logic, processing and number crunching with the SQL Server database. Unfortunately, it had not been designed as one solution, but rather as bits and pieces that handled one specific thing, and were put together into the application as an afterthought.
One of the most important parts of this solution was a module that attempted to match records from different sources. Say we had a record from an insurance company: we had to match it against various other data sources to find and verify the matching records in all places and ensure the claim was recognized by all parties involved. Except the matching was not straightforward, because sometimes it was a 1:1 match, sometimes 1:many, sometimes many:1, and worst of all, also many:many. This was further compounded by reversals, refunds, false charges, unrecognized charges, combined reversals (two initial charges combined and refunded as one amount), and various combinations of the above.
The SQL stored procedure that did this was written a long time prior, and was chock full of code using "in" and "not in", rerunning the exact same selects multiple times (instead of storing results in temporary tables or variables), and not using any indexing at all! On a large project, this module alone would take 48-72 hours. Even allowing for the fact that it handled 80+ complicated business rules, that was a ridiculously long wait time. I rewrote this module from scratch using joins instead of "in" and "not in", added indexes, stored results in temp tables for reuse, and managed to get the run time for this process down to under ten minutes for the same-sized projects. It was a very big accomplishment for me, and was recognized by the company's leaders as a very significant improvement in our processes.
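To illustrate the kind of rewrite involved (with invented table names, not the actual healthcare schema), replacing a "not in" over a repeated subquery with a pre-built, indexed temp table and an outer join looks like this:

-- Before: the subquery is re-evaluated wherever it appears, and NOT IN
-- behaves badly when the subquery can return NULLs.
SELECT c.ClaimID
FROM dbo.InsuranceClaims AS c
WHERE c.ClaimID NOT IN (SELECT h.ClaimID FROM dbo.HospitalRecords AS h);

-- After: materialize the comparison set once, index it, and use a join.
SELECT h.ClaimID
INTO   #HospitalClaims
FROM   dbo.HospitalRecords AS h;

CREATE CLUSTERED INDEX IX_HospitalClaims ON #HospitalClaims (ClaimID);

SELECT c.ClaimID
FROM dbo.InsuranceClaims AS c
LEFT JOIN #HospitalClaims AS hc ON hc.ClaimID = c.ClaimID
WHERE hc.ClaimID IS NULL;       -- claims with no matching hospital record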
Operation Crown Jewel - TCP\IP vs. Named Pipes
By Robert Pearl | 9/11/2011 4:35:35 PM |
6 Likes
Allow me to break the ice, and share one of my performance stories with you with respect to SQL Server. In this instance, the problem wasn’t missing indexes or un-optimized code. It required a little out-of-the-box contemplation, but nonetheless was the cause of extremely slow performance, and the clients were not happy!
It was quite a number of years ago, at an e-commerce company that was still running its core business on v6.5, even after v7.0 had been out for a year with SP1 already released. After my assessment, I issued a DBA report stating that the entire business was at risk due to its lack of adoption of current technology. Not only would it fall behind its competition, it actually risked losing customers and having its server crash with no failover or HA protection. (I even built my own automated log shipping scripts, before that was part of the Enterprise package.)
Sadly, there were office politics involved (something every true technologist wishes to avoid), and it came down to two camps. I was aligned with my manager, the CFO, and the CEO, while the rest of the hotshots before me aligned with the Chairman and CTO. I just wanted to ensure that, from a database perspective, they would follow industry standards and best recommended practices.
So, as clients relied on the system to return client data reports on demand, this process became painfully slow and even timed out way too often.
After some intense research and analysis, I saw that network packets were being dropped. Looking at the network protocol setup, I saw that ONLY named pipes were enabled. The protocols were a bit nebulous back then, and you had to decide at installation which protocols were installed. I had been reading a bit on TCP/IP, and was convinced that this was the appropriate route, seeing that we had tried almost everything else. We already knew it was a network communication issue, so I was confident enough to try enabling TCP/IP.
By default, the TCP/IP protocol is not enabled in v6.5; you had to go through “SQL Setup” from the MSSQL Server v6.5 entry on the Programs menu to install it. I had to add the network library for TCP/IP, and afterwards you must restart the SQL services for the new settings to take effect.
This process, of course, needed full permission and sign-off from the executive team, and only half agreed. Meanwhile, the client report generation suffered and suffered.
My manager asked me to put together a short technical summary lending support to my theory and explaining why we should go with TCP/IP. Of course, today this is a no-brainer, but it wasn’t really apparent back then, especially to the non-technical executive branch. The irony is that the CTO was a very influential, yet closed-minded person. Rather than work with me and embrace my theory for the greater good of the company, he wasn’t going to let a young “snot-nosed” DBA interfere with his absolute technical know-how! I basically outlined the technical info below in my summary:
In a fast local area network (LAN) environment, Transmission Control Protocol/Internet Protocol (TCP/IP) Sockets and Named Pipes clients are comparable in terms of performance. However, the performance difference between the TCP/IP Sockets and Named Pipes clients becomes apparent with slower networks, such as across wide area networks (WANs) or dial-up networks. This is because of the different ways the communication (IPC) mechanisms communicate between peers.
For named pipes, network communications are typically more interactive and very chatty over the network. A peer does not send data until another peer asks for it using a read command. A network read typically involves a series of peek named pipes messages before it begins to read the data. These can be very costly in a slow network and cause excessive network traffic, which in turn affects other network clients.
For TCP/IP Sockets, data transmissions are more streamlined and have less overhead. Data transmissions can also take advantage of TCP/IP Sockets performance enhancement mechanisms such as windowing, delayed acknowledgements, and so on, which can be very beneficial in a slow network. Depending on the type of applications, such performance differences can be significant.
TCP/IP Sockets also support a backlog queue, which can provide a limited smoothing effect compared to named pipes that may lead to pipe busy errors when you are attempting to connect to SQL Server.
In general, sockets are preferred in a slow LAN, WAN, or dial-up network, whereas named pipes can be a better choice when network speed is not the issue, as it offers more functionality, ease of use, and configuration options.
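Back on v6.5 you had to dig through the client and server network library setup to see any of this, but for the record, on modern versions of SQL Server a quick way to check which protocol each connection is actually using (a generic check, nothing specific to this story) is:

-- Shows the transport (TCP, Named pipe, Shared memory) for every current connection.
SELECT c.session_id,
       c.net_transport,
       c.protocol_type,
       c.client_net_address,
       s.login_name,
       s.program_name
FROM sys.dm_exec_connections AS c
JOIN sys.dm_exec_sessions AS s ON s.session_id = c.session_id
ORDER BY c.net_transport, c.session_id;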
I was approached by the CEO, who, after speaking to my manager, decided to throw in his lot with the DBA (me :-), and arranged for an “executive meeting”. During that time, I would go ahead and install and enable TCP/IP, which required a SQL Server service restart. The CEO was also the founder, and since it was his baby, he really had his heart in it, and the most invested.
Now the mission began - one of espionage and intrigue. :-) Like in any good spy movie where the team retrieves the “crown jewel” (we actually dubbed this operation “crown-jewel”), we cut the feed to the cameras - in this case, we disabled the monitoring system so there would be no alert generated when the SQL service briefly went down. A few minutes into the “executive” meeting, the CEO stepped out of the board room to give the green light.
I truly felt I was in the middle of a Mission Impossible film, and if I were, the CEO’s final words would be, “Your mission, should you choose to accept it, is to install and configure TCP/IP, restart the service, and re-enable the monitoring system. Once that’s done, we will contact client X, standing by to test the reports. As always, should you or any of your team be caught or killed, the CEO will disavow any knowledge of your actions. Your DBA career will self-destruct in 5 seconds…. Good luck.”
Well, that’s reassuring! But I was determined not to let him down. Just in case, though, I had updated my resume.
Anyway, I began to do the needful, and after all was said and done, the reports ran successfully in less than 30 seconds. That was down over 4 minutes and 30 seconds, from 5 minutes and change. Plus, after multiple tests, there were no network time-outs. Performance was significantly improved. Mission accomplished.
Although the other side wasn’t informed right away, they chalked it up to some “new” code released some 4 days before. Ha! The CEO knew the task was a success, and the clients were happy as clams.
The point here is that sometimes even a small configuration change can make a world of difference, but stubbornness and inflexibility can stand in the way of what seems to be common sense. Fortunately, it turned out OK, and I stayed a little while longer. The CTO still seemed to undermine me, and I eventually moved on. However, the CEO was grateful, and personally offered his highest recommendation to my next employer.
We now know TCP/IP is the industry-standard protocol used in today’s SQL Server versions, especially for e-commerce and communication with remote clients over the internet.
Now that you’ve read my performance story, let’s see yours! Don’t you guys and gals want to win a FREE iPad2?