Type ls -l *.py and press Enter. What does the shell do for us?

Have you ever wondered what the Unix shell does when you execute a command? How does the shell understand and interpret these commands? What happens behind the scenes? For example, what does the shell do when we execute ls -l *.py? Understanding this helps us make better use of the Unix operating system. Today we'll explore it.

0. What is a shell

A shell is usually a command line interface that exposes the services of the operating system to human users or other programs. After it starts, the shell usually waits for user input by displaying a prompt. The following figure describes the basic UNIX and Windows shell prompts.

So the shell prompts the user for commands, and the user enters one. How does the shell get and interpret the command entered by the user? To understand this, let's divide the work into four steps:

  1. Get and parse user input
  2. Identify the command and its parameters
  3. Find command
  4. Execute command

Now let's go through each step in detail:

1. Get and parse user input

For example, when you type ls -l *.py in the shell and press Enter, the shell reads the command with a function called getline() (declared in <stdio.h>). The characters you type arrive on the shell's standard input stream. Pressing Enter ends the line, and getline() stores the input string, including the trailing newline, in a buffer.

ssize_t getline(char **restrict lineptr, size_t *restrict n, FILE *restrict stream);

Function parameter description:

  • lineptr: address of the pointer to the buffer
  • n: address of the variable holding the buffer size
  • stream: the stream to read from. Here it is the standard input stream

Now let's look at the code:

char *input_buffer;
size_t b_size;

b_size = 32; // initial size of the buffer
input_buffer = malloc(sizeof(char) * b_size); // the buffer to store the user input (malloc() is declared in <stdlib.h>)

getline(&input_buffer, &b_size, stdin); // reads one line into input_buffer, enlarging it if 32 bytes are not enough; returns -1 at end of input

Once the user presses Enter, getline() returns with the string or command entered by the user stored in input_buffer. So now that the shell has obtained user input, what's the next step?

2. Identify the command and its parameters

Now the shell knows that the string you entered is 'ls -l *.py'. However, it still needs to know which part is the command and which parts are the command's parameters. Who does this? That is the job of the function strtok() (declared in <string.h>).

strtok() splits a string into tokens at a delimiter, which in this case is whitespace. So a space tells strtok() that a word has ended. The first token or word in input_buffer is the command (ls), and the remaining tokens (-l and *.py) are the parameters of the command. As the shell extracts the tokens, it stores them in a variable for later use.

char *strtok(char *restrict str, const char *restrict delim);

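Parameter Description:

  • str: the string to tokenize on the first call, then NULL on later calls to continue with the same string
  • delim: the set of delimiter characters

The function strtok() takes a string and a set of delimiters as arguments and returns a pointer to the next token, or NULL when no tokens are left. The specific execution code is as follows: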

char *input_buffer, *args, *delim_args, *command_argv[50]; // input_buffer holds the line read in step 1
int i;

i = 0;
delim_args = " \t\r\n\v\f"; // the delimiters: space, tab, newline and other whitespace
args = strtok(input_buffer, delim_args); // first token of input_buffer
while (args && i < 49) // keep one slot for the final NULL
{
 command_argv[i] = args; // stores the token in command_argv
 args = strtok(NULL, delim_args); // NULL means "continue with the same string"
 i++;
}
command_argv[i] = NULL; // sets the last entry of command_argv to NULL

command_argv saves the command string as follows:

command_argv[0] = "ls"
command_argv[1] = "-l"
command_argv[2] = "*.py"
command_argv[3] = NULL

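All right: command_argv[0] is the command, the others are its parameters, and the last entry is NULL, which marks the end of the list. (A full shell such as bash would also expand the pattern *.py into the matching file names before running the command; the simple shell in this article passes it to ls unchanged.) The command string has been taken apart. The next step is to find the command.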

3. Find command

The second step established that the command the user wants to run is ls, so where is this command found? The shell looks at the environment variable PATH, which lists the directories where executable commands are stored.

However, PATH usually holds more than one directory, separated by colons.

How does the shell find the ls command among so many directories? It tries them one by one, and for this it needs the access() function (declared in <unistd.h>):

int access(const char *pathname, int mode);

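Parameter and return value Description:

  • pathname: path of the file / executable
  • mode: the check to perform. We use X_OK to check that the file exists and that we are allowed to execute it
  • Return value: 0 if the check succeeds, otherwise -1

First, the shell splits PATH into its directories:

{
 char *path_buff, *path_dup, *paths, *path_env_name, *path[50];
 int i;

 i = 0;
 path_env_name = "PATH";
 path_buff = getenv(path_env_name); /* get the value of the PATH environment variable */
 if (path_buff == NULL)
  return (NULL); /* PATH is not set, so there is nowhere to search */
 path_dup = _strdup(path_buff); /* copy it, because strtok() modifies its input; _strdup is found below */
 if (path_dup == NULL)
  return (NULL);
 paths = strtok(path_dup, ":"); /* split on ':' */
 while (paths && i < 49) /* keep one slot for the final NULL */
 {
  path[i] = paths;
  paths = strtok(NULL, ":");
  i++;
 }
 path[i] = NULL; /* terminates it with NULL */
}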

/**
* _strdup - duplicates a string
* @from: the string to be duplicated
*
* Return: pointer to the duplicated string, or NULL if memory runs out
*/
char *_strdup(char *from)
{
 int i, len;
 char *dup_str;

 len = strlen(from) + 1; /* +1 for the terminating '\0'; strlen() is declared in <string.h> */
 dup_str = malloc(sizeof(char) * len);
 if (dup_str == NULL)
  return (NULL);
 i = 0;

 while (*(from + i) != '\0')
 {
  *(dup_str + i) = *(from + i);
  i++;
 }
 *(dup_str + i) = '\0';

 return (dup_str);
}

The path array in the above code stores all the directories from PATH and terminates with NULL. The shell can therefore join each directory to the command name and use the access() function to check whether the result exists and is executable:

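{
 /* path[] comes from the PATH split above; command_file is command_argv[0] */
 char *command_file, *command_path, *path[50];
 int i, stat_f;

 i = 0;
 while (path[i] != NULL)
 {
  /* room for the directory, a '/', the command name and the final '\0' */
  command_path = malloc(strlen(path[i]) + strlen(command_file) + 2);
  if (command_path == NULL)
   return (NULL);
  _strcat(path[i], command_file, command_path); /* this function is found below */
  stat_f = access(command_path, X_OK); /* checks that it exists and is executable */
  if (stat_f == 0)
   return (command_path); /* returns the concatenated string if found */

  free(command_path); /* not in this directory, try the next one */
  i++;
 }
 return (NULL); /* otherwise returns NULL */
}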

/**
* _strcat - concatenates two strings and saves it to a blank string
* @path: the path string
* @command: the command
* @command_path: the string to store the concatenation
*
* Return: Always void
*/
void _strcat(char *path, char *command, char *command_path)
{
 int i, j;

 i = 0;
 j = 0;

 while (*(path + i) != '\0')
 {
  *(command_path + i) = *(path + i);
  i++;
 }
 *(command_path + i) = '/';
 i++;

 while (*(command + j) != '\0')
 {
  *(command_path + i) = *(command + j);
  i++;
  j++;
 }
 *(command_path + i) = '\0';
}

Once the command is found, the search returns the full path of the command. Otherwise it returns NULL, and the shell displays an error saying that the command does not exist.

Now if the command is found, then what?

4. Execute command

Once the command is found, it is time to execute it. The question is how.

To execute the command, the shell uses the function execve() (declared in <unistd.h>):

int execve(const char *pathname, char *const argv[],
                  char *const envp[]);

Parameter Description:

  • pathname: the full path of the executable
  • argv: the NULL-terminated argument list; by convention argv[0] is the command name itself
  • envp: the NULL-terminated list of environment variables

execve() replaces the program running in the current process with the command that was found. If it succeeds it never returns; it returns -1 only if the command could not be started.

That creates a problem if the shell simply calls execve() itself: the shell would be replaced by ls, and once ls finished there would be no shell left to show the next prompt and accept the next command. To solve this, the shell executes commands in a child process. The parent process waits, and once the child has finished, the parent is notified and the program flow continues. So to execute the command, the shell first uses fork() to create a child process (fork() is declared in <unistd.h>).

pid_t fork(void);

fork() creates a new process by copying the calling process. The new process is called the child process, and the calling process is called the parent process. fork() returns the process ID of the child in the parent process, 0 in the child process, and -1 in the parent if no child could be created:

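{
 char *command, *command_argv[50]; /* command holds the full path found in step 3 */
 extern char **environ; /* the shell's own environment, passed on to the command */
 pid_t child_pid;
 int status;

 get_each_command_argv(command_argv, input_buffer); /* this function is found below */
 child_pid = fork();
 if (child_pid == -1)
 {
  perror("fork"); /* no child could be created */
  return (0);
 }

 if (child_pid == 0)
 {
  execve(command, command_argv, environ); /* only returns if it fails */
  perror(command);
  _exit(127); /* end the child so that it does not carry on as a second shell */
 }
 else
  wait(&status); /* parent: block until the child has finished */
}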

/**
* get_each_command_argv - stores all the arguments \
*             of the input command to the list
* @command_argv: the command argument list
* @input_buffer: the input buffer
*
* Return: Always void
*/
void get_each_command_argv(char **command_argv, char *input_buffer)
{
 char *args, *delim_args;
 int i;

 delim_args = " \t\r\n\v\f";
 args = strtok(input_buffer, delim_args);

 i = 0;
 while (args && i < 49) /* assumes the 50-slot array declared by the caller; keeps room for the final NULL */
 {
  command_argv[i] = args;
  args = strtok(NULL, delim_args);
  i++;
 }
 command_argv[i] = NULL;
}

The shell uses wait() (declared in <sys/wait.h>) to wait for a state change in the child process before the program flow continues and the shell displays a prompt for the user again.

pid_t wait(int *wstatus);

Parameter Description:

  • wstatus: a pointer to an integer that can be used to identify how the child process terminated.

The shell executes the command within the child process and calls wait() until the child process completes. Because the child shares the shell's terminal, the output of ls -l *.py appears on the screen while the shell is waiting.

Finally, when the child process has completed, wait() returns and the shell displays its prompt again, ready for the next command. This loop continues until the user types exit.

Posted by ccjob2 on Thu, 02 Dec 2021 22:35:39 -0800