Multi-core servers, or why alcoholics get drunk: an IT specialist's point of view

This article continues my series explaining how the human brain works by analogy with computer systems. The whole series can be read here. The first two articles are the most important for understanding the logic: "Why are women fools, or about the nature of demons" and "Computers, demons and revolutions: a religious problem from the point of view of an IT specialist".

The purpose of this article is to understand what drives people to drink alcohol in quantities that disrupt their social functioning. Why do people drink? Answers like "because lowlifes drink" are logical enough, but they do not describe the nature of the phenomenon. If a phenomenon is massive and widespread, there are reasons for it. If it directly affects the brain, then it has some rational meaning from the point of view of computer systems. If a person has some kind of vulnerability, we need to understand its nature. That will help solve the problem of alcoholism at a completely rational, technical level.

First, a reservation: we are considering working servers and physically healthy people. Clearly, when a server has a hardware fault, unstable power, faulty interfaces or storage, or badly chosen drivers, there can be any number of problems with the operating system or the software. In the same way, a person with a mental illness can have any number of problems. We will consider healthy servers and healthy people whose particular kind of workload can cause problems.

A small announcement: in the next article it will be possible to discuss immortality. Yes, we are approaching this issue, though very carefully so far. I will not offer a philosopher's stone, and I will not try to sell miraculous powders. But I will try to show how this issue is being addressed at the current level of technology.

And so, multi-core servers. This section may seem uninteresting and hard to follow for humanities readers and medical professionals. I will try to explain it as simply as possible, but it has to be understood; there is no way around it.

Almost every organization uses some IT resources. One way or another, the organization manages these resources, usually through system administrators, network administrators and database administrators. The consumers of their work are developers, implementers and operators: the people who interact with the business and who write, deploy and run the necessary application software. Administrators practically never interact with end users. At most, technical support passes problems on to them.

I work as a database administrator. Alexander Mikhailovich K. hired me for my current job with a very specific task: database administrators and software developers were constantly blaming each other. The database administrators saw a working server: everything was fine, there was load. The programmers and operators saw that the program was working poorly. I had to find a way to understand exactly how MS SQL Server was working at any given moment: exactly which resources it lacked and where the "bottleneck" was. I also had to learn to separate technical server problems from software problems.
The server is running, but we cannot see what is happening inside it. We can judge problems only by indirect signs, and end users report the same. What is the server missing, and what is happening to it?

Users call technical support and complain to management. Managers are often smart and cultured people, but even they pester us however they can: they call, ask questions, walk into the office, and write through internal messaging channels.

At this point the qualifications of the database administrator become critical. The easiest thing to do is restart the service, that is, restart MS SQL Server. MS SQL Server is a smart piece of software and is not that easy to overwhelm. But every database administrator knows "the four or so queries that can bring a server down."

If the database administrator's qualifications are limited, he stops and starts the MS SQL Server service. Sometimes he reboots the physical server or the entire virtual machine, so that the operating system restarts as well.

The process is long and often painful. For powerful servers it takes half an hour or more. Meanwhile users cannot work with the server, which directly hurts the business process. When hundreds of people and dozens of pieces of equipment stand idle, the business loses paid working time, and that is very serious money.

Highly qualified database administrators restart the server only in extreme cases. In skilled hands a server runs for months or even years. Restarts are done only for technical work, such as new software versions or migration to new hardware. Such changes are planned, agreed with the business in advance, and do not lead to any financial losses.

We haven't got to the alcoholics yet, but we will soon 🙂 An experienced database administrator uses monitoring tools. He sees problems in advance, understands their cause, and can eliminate it in the shortest possible time. Sometimes stopping one resource-intensive process is enough to restore normal operation. With a good database administrator, neither users nor superiors notice the problems, and any that do arise are fixed within a few minutes. On this topic I can recommend my material "MS SQL Server Optimization".

The human brain is made up of neurons, which can be seen as direct analogues of server processors. So how many processors does a server have?

The latest generation of processors has truly incredible capabilities. For example, the Intel® Xeon® E7-8894 v4 processor has 24 cores, which means 24 processors in one package. Up to 8 of these processors can be installed in one server, so the total number of cores can reach 24×8 = 192. With Intel® Hyper-Threading technology each core can appear as 2 cores, so when this option is enabled the server sees 384 cores. This is not yet a brain, but it is already a very large structure that is hard to manage. Cores and memory are grouped into NODES, which are groups of cores and memory. You don't need the physical details here; just remember that nodes exist.
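The core count above is just two multiplications. Here it is as a tiny Python function. The function name and parameters are illustrative only, and the sketch assumes 2 logical cores per physical core when Hyper-Threading is on, as on this Xeon.

```python
def logical_cores(cores_per_socket: int, sockets: int, hyper_threading: bool) -> int:
    """Count the cores the operating system sees on a multi-socket server."""
    physical = cores_per_socket * sockets  # e.g. 24 cores x 8 sockets
    # With Hyper-Threading each physical core is presented as 2 logical cores.
    return physical * 2 if hyper_threading else physical


print(logical_cores(24, 8, hyper_threading=False))  # 192
print(logical_cores(24, 8, hyper_threading=True))   # 384
```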

Logically, with 192 cores the server should easily handle about two hundred simultaneously running processes, one core per process. Moreover, several processes can share one core, and not all processes are busy all the time. So the server can comfortably serve thousands of users.
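The way several processes share one core can be illustrated with a toy round-robin scheduler. This model is an illustrative assumption: real operating system and SQL Server schedulers are considerably more complex. The basic idea of giving each job a short time slice in turn is the same, though. Job names and durations are made up.

```python
from collections import deque
from typing import Dict, List


def round_robin(jobs: Dict[str, int], quantum: int) -> List[str]:
    """Share ONE core between several jobs in fixed time slices.

    jobs maps a job name to the time units it still needs;
    returns job names in the order they finish.
    """
    if quantum <= 0:
        raise ValueError("quantum must be positive")
    queue = deque(jobs.items())
    finished = []
    while queue:
        name, remaining = queue.popleft()
        remaining -= quantum                 # the job runs for one time slice
        if remaining > 0:
            queue.append((name, remaining))  # not done: back of the queue
        else:
            finished.append(name)            # done: the core is free for others
    return finished


# One long report and two short lookups sharing a single core.
print(round_robin({"report": 3, "lookup_a": 1, "lookup_b": 2}, quantum=1))
# ['lookup_a', 'lookup_b', 'report']
```

Short jobs finish quickly even while a long one is running. That is why a server handling many simple queries feels fine.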

Yes and no. If the server runs simple, short queries, everything works fine, even when there are a lot of them. The situation changes dramatically when a complex, heavy query comes along.
On a single core a heavy query would run for a very long time, which makes no sense. Therefore MS SQL Server uses a parallelization mechanism: when it receives a big task, it divides it into parts. How many parts there are, and above what level of complexity the division starts, is set by special configuration parameters ("max degree of parallelism" and "cost threshold for parallelism"). If we have complex work, we do not entrust it to one person; we ask several people to do it, for example 8. One query used to occupy one CPU core, and suddenly it occupies 8.
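The splitting decision can be sketched as a small function. The parameter names mirror the real server options "cost threshold for parallelism" and "max degree of parallelism". However, this is a simplified model, not the optimizer's actual logic. In particular, the real setting value 0, meaning "no explicit limit", is not modeled here.

```python
def degree_of_parallelism(estimated_cost: float, cost_threshold: float,
                          max_dop: int, available_cores: int) -> int:
    """Toy model: into how many parallel parts is one query split?"""
    if max_dop < 1 or available_cores < 1:
        raise ValueError("this sketch expects max_dop >= 1 and available_cores >= 1")
    if estimated_cost <= cost_threshold:
        return 1  # cheap query: a serial plan on a single core
    # Expensive query: split it, but never into more parts than
    # the configured limit or the number of available cores.
    return min(max_dop, available_cores)


print(degree_of_parallelism(2.0, cost_threshold=5, max_dop=8, available_cores=192))   # 1
print(degree_of_parallelism(50.0, cost_threshold=5, max_dop=8, available_cores=192))  # 8
```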

The MS SQL Server engine is smart. When a part of the divided query turns out to be quite complicated itself, it tries to divide that part as well. 8 threads multiplied by another 8! And if those parts are still complex, it may try to divide them again, if that is technically possible.

We have two hundred physical cores, surely enough for everything! But one query has already gobbled up 64 cores, and there is no guarantee it will stop there. And what if there are several such queries? Why has our server started to fail? Why are users worried, and why are the bosses calling? Moreover, these pieces of queries access the same data, take locks, get in each other's way, and use other resources such as the storage system and memory. The mechanism that normally raises performance causes performance to collapse, until the service stops working entirely.
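The "8 multiplied by another 8" arithmetic, and how quickly it eats a 192-core server, can be written out directly. This follows the article's simplified picture of repeated splitting, not SQL Server's exact thread accounting. The `levels` parameter is a hypothetical name for how many times the splitting repeats.

```python
import math


def threads_for_query(dop: int, levels: int) -> int:
    """Threads one query occupies if every part is split again at each level.

    levels=1: the query is split into dop parts;
    levels=2: each of those parts is split into dop parts again, and so on.
    """
    return dop ** levels


def queries_to_saturate(total_cores: int, dop: int, levels: int) -> int:
    """How many such queries are enough to occupy every core."""
    return math.ceil(total_cores / threads_for_query(dop, levels))


print(threads_for_query(8, 1))         # 8
print(threads_for_query(8, 2))         # 64
print(queries_to_saturate(192, 8, 2))  # 3
```

Under this model, just three heavy queries are enough to occupy every physical core.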

At that moment the negligent database administrator pulls the emergency brake and restarts the service, stopping it and starting it again. An experienced administrator finds the process that is aggressively consuming resources and deals with it locally. He either kills it, improves the server's behavior with trace flags, or asks the operators to postpone these jobs to a period of light load. The query itself is isolated, sped up with indexes or other means, and sent to the developers so they can change the program code.
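Finding "the process that is aggressively consuming resources" can be sketched as a filter over a snapshot of sessions. In a real server the numbers would come from dynamic management views such as sys.dm_exec_requests, and the T-SQL KILL statement takes a session ID. The `Session` class, its fields, the thread budget and the sample numbers below are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Session:
    session_id: int      # the number a DBA would pass to KILL
    cpu_time_ms: int     # CPU consumed so far (hypothetical snapshot)
    worker_threads: int  # cores the request currently occupies (hypothetical)


def find_runaway(sessions: List[Session], max_threads: int) -> Optional[Session]:
    """Return the heaviest session above the thread budget, or None."""
    offenders = [s for s in sessions if s.worker_threads > max_threads]
    if not offenders:
        return None
    return max(offenders, key=lambda s: s.cpu_time_ms)


def kill_command(session: Session) -> str:
    """Build the T-SQL statement that would terminate this one session."""
    return f"KILL {session.session_id};"


snapshot = [Session(51, 1_200, 1), Session(52, 900_000, 64), Session(53, 50_000, 16)]
runaway = find_runaway(snapshot, max_threads=8)
if runaway is not None:
    print(kill_command(runaway))  # KILL 52;
```

The point of the design is locality: only one session is targeted, and everyone else keeps working. A restart, by contrast, stops everyone.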

Another very important point is that in MS SQL Server the nodes mentioned earlier (groups of processors and memory) are loaded unevenly. During stable operation, new tasks go to the least loaded nodes. Under abnormal conditions, something like the local hot spots in a nuclear reactor forms. In emergency mode the server effectively crumbles into nodes, each fighting for itself. You think the server has 1 TB of RAM and two hundred cores, but in practice each node has 8 times less, chopped into small pieces. This situation is infrequent, but it does happen. With a reasonable approach and a sufficiently qualified database administrator, users simply don't see any of this; they don't have time to notice it.
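The "least loaded node" rule and the "8 times less" arithmetic can be sketched as follows. The code follows the article's description rather than SQL Server's actual scheduler code. It assumes 8 nodes (one per processor) and treats 1 TB as 1024 GB.

```python
from typing import List


def place_tasks(node_loads: List[int], new_tasks: int) -> List[int]:
    """Send each new task to the currently least-loaded node (ties: lowest index)."""
    loads = list(node_loads)  # copy so the caller's list is not modified
    for _ in range(new_tasks):
        target = loads.index(min(loads))
        loads[target] += 1
    return loads


def per_node_share(total: float, nodes: int) -> float:
    """What a single node actually owns once the server splits into nodes."""
    return total / nodes


print(place_tasks([0] * 8, 16))   # [2, 2, 2, 2, 2, 2, 2, 2]
print(place_tasks([5, 0, 0], 3))  # [5, 2, 1]
print(per_node_share(1024, 8))    # 128.0  (GB of a 1 TB server)
print(per_node_share(192, 8))     # 24.0   (cores)
```

When the balancing works, load spreads evenly. When the server falls apart into nodes, each node is left with only its own 128 GB and 24 cores.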

Well, now we have finally reached the alcoholics. Socially, when do people drink? In moments of severe emotional turmoil. There has been a tragedy, an injustice, or an event that is hard to come to terms with, and a person drinks, sometimes until he is drunk. Memoirs mention that during the Second World War pilots could not fall asleep after combat without a small dose of alcohol. WHY?

When a person experiences a severe emotional upheaval, that upheaval is the very heavy transaction that has consumed all the server's resources. The person keeps thinking about it, it worries him, and he loses control over his computing system. Almost certainly, the brain crumbles into nodes just as the server does: local zones of neurons try to process the flow of information on their own, and control over these zones is lost. The system has gone haywire.

What does a negligent administrator do in such a situation? Reboot the server! Stop all processes! Make the system inoperable! This is a very SIMPLE way out.

The habit of restarting the server instead of identifying and resolving problems is harmful. Rebooting is SIMPLE: it needs no high qualifications, and no sitting down to figure things out. Over time, the alcoholic needs less and less reason to reboot. A mosquito bites him, and that is a reason to get drunk, a reason to restart the server!

I have the painful experience of watching a person who has this sad disease. A couple of days before a binge, the person enters a state of pre-binge psychosis: increased activity, grabbing at a pile of tasks, and making demands of everyone.

And here the key vulnerability of the human brain as a computing system shows itself: the person is under stress. As noted in my previous articles, under stress a person becomes extremely active, immoral and stupid. In a state of stress a person switches to the simplest algorithms, and his computing resources drop sharply. The brain has several sections (the reptilian brain among them), and the interaction between them is disrupted.

Consider this: a complex transaction (an experience) leads to a shortage of computing resources. As a result the person experiences stress, which sharply reduces his computing resources even further. A technical analogue might be processor overheating, which makes the clock frequency drop. A slightly complicated transaction arrives, the processor heats up and starts working very slowly. Before, the computing system was hard to control; now it is completely out of control.

Alcohol is an attempt to return the system to a controlled state. The trouble is that the threshold for resorting to this vicious practice keeps dropping. If the problem came up before and a server reboot helped, then reboot again! A database administrator gets scolded for such tricks for a long time and may eventually be fired. But here the man is his own boss: he knocks back a glass, and life gets better.

In such a situation, shouting from family members only makes things worse. It increases the stress!

That is why people drink too much when going through tragic experiences, and why they drink too much when living under constant stress: alcohol is a way to take control of the computing system.

What to do? Understanding a problem is half its solution.

We need... we need to learn how to manage the server. There have been many cases where a person took up yoga or meditation, or began to take religion seriously, and stopped drinking. He no longer needs to booze or to reboot; he is now able to cope with his server.

The quickest recommendation may be to diagnose the condition early. Don't wait to fight it until you are already pouring a glass. Act when you feel the onset of psychosis, when unhealthy activity has started in your head. The database administrator sees the problem on the counters, identifies the process and kills it: KILL plus the process number, and nothing interferes with the server anymore. It is the same for a person. When you feel yourself getting worked up, thinking that your husband is a jerk or your wife is a fool, that repairs have been put off for too long, or that there isn't enough money, throw that thought out of your head. "I'll think about it tomorrow."

The classic scene from an American film: a psychotherapist with a notebook and a pencil, and a patient on a couch talking about his sorrows. What are they doing? The patient has serious problems; in his head is a transaction that the server cannot process. The therapist helps him either simplify this transaction or roll it back, removing the workload that could cause the server to malfunction.

Do sedatives help? Yes and no. Yes, they help. No, because this is the same vicious practice of rebooting the server to regain control of it, just with a different kind of chemical.

Learn to manage your server; upgrade your database administrator skills!

2020-06-12, Dmitry Gorchilin (Дмитрий Горчилин)
