What happens when you type ls -l *.c in the shell?
Many developers have used the shell for years and probably haven’t looked into the details of how it actually works. Let’s talk about the processes that happen in the background of “the Shell”.
So how does it work?
To start, the shell is a program, typically written in C (bash, for example, is written in C). It provides a command prompt where you can enter commands to execute, and it can be used to perform many administrative tasks.
Let’s go over the ls -l *.c command and explain what it does:
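- ls: Prints a list of the files and folders in the user’s current folder (the current working directory).
- -l: Displays files and folders in long format. For each entry, it shows the file type and permissions, the number of hard links, the owner and group, the size in bytes, and the last modification time. When listing a directory, it also prints a “total” line with the number of disk blocks used.
- *.c: Restricts the listing to files whose names end in .c. The asterisk (*) is a wildcard, which I will cover shortly. Strictly speaking, it is the shell, not ls, that expands *.c into the list of matching file names before ls runs.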
If we wrote the same command in plain English, it would read something like:
“List me all the files that have a .c extension in this current folder. Also, please list them in long format with full details.”
About wildcards
A wildcard in Linux is a symbol or set of symbols that stands in for any character or group of characters in a string. There are several wildcard characters in the shell, but for the purpose of this blog I will focus on the asterisk (*) wildcard.
As mentioned above, the asterisk (*) is a wildcard: it matches zero or more occurrences of any character, so it can also match no character at all.
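To make the “zero or more” rule concrete, here is a minimal sketch of a matcher that understands only the asterisk; every other character must match literally. The function name star_match is made up for this illustration, and real shells also handle other wildcards such as ? and [...], plus special rules for hidden files, none of which this sketch covers.

```c
#include <stdbool.h>

/* Hypothetical helper: returns true if name matches pattern, where '*'
 * matches any run of zero or more characters and every other character
 * must match exactly. ?, [...] and hidden-file rules are not handled. */
bool star_match(const char *pattern, const char *name)
{
	if (*pattern == '\0')
		return *name == '\0';   /* both strings used up: match */

	if (*pattern == '*')
	{
		/* Either '*' matches zero characters (skip it), or it
		 * consumes one character of name and we try again. */
		return star_match(pattern + 1, name) ||
		       (*name != '\0' && star_match(pattern, name + 1));
	}

	/* Literal character: it must match, then both strings advance. */
	return *name == *pattern && star_match(pattern + 1, name + 1);
}
```

For example, star_match("m*", "m") returns true, because the asterisk is allowed to match nothing. The plain recursion keeps the idea readable but is not efficient for long patterns.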
Let’s say we want to see all file names that start with m, regardless of what the rest of the name may be. The pattern m* does exactly that, so we can run ls m*.
Now, what if I want to list all files that start with m and have a .c extension, in long format of course? Let ls -l m*.c do the job for us!
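Here is the step that often surprises people: when at least one file matches, ls never sees the asterisk. Before running ls, the shell expands m*.c into the matching file names and passes each one to ls as a separate argument. POSIX offers the same kind of expansion to C programs through the glob() function. The helper count_matches below is hypothetical and only illustrates the idea; bash uses its own matching code rather than this exact call.

```c
#include <glob.h>
#include <stdio.h>

/* Hypothetical helper: expands pattern (e.g. "m*.c") the way a shell
 * would, prints each matching name, and returns how many matched. */
size_t count_matches(const char *pattern)
{
	glob_t g;
	size_t n = 0;

	int rc = glob(pattern, 0, NULL, &g);
	if (rc == 0)
	{
		/* Each match would become a separate argument to ls. */
		for (size_t i = 0; i < g.gl_pathc; i++)
			printf("%s\n", g.gl_pathv[i]);
		n = g.gl_pathc;
		globfree(&g);
	}
	/* rc == GLOB_NOMATCH means nothing matched; by default bash then
	 * passes the pattern through unchanged, as a literal argument. */
	return n;
}
```

Calling count_matches("m*.c") inside a folder would print the same names the shell hands to ls -l m*.c.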
Let’s talk about the inner workings.
When the user enters the command ls -l *.c at the keyboard, the shell reads it into a buffer. This buffer is filled from the standard input stream (stdin). The shell then breaks this buffer down into smaller parts before executing the command.
To do this, the shell performs several operations.
Check for aliases
Aliases are shortcuts that stand for a command. They can be created to save the time and hassle of typing long commands. For example, an alias that replaces the command ls -l m*.c could be created like this:
alias lsm='ls -l m*.c'
Now we can list all files that start with ‘m’ and have a .c extension, in long format, just by typing ‘lsm’.
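A shell could keep its aliases in a small name-to-value table and check the first word of each command line against it. The sketch below hard-codes the lsm alias from above; the struct alias table and the lookup_alias name are assumptions for illustration, not how bash actually stores aliases.

```c
#include <stddef.h>
#include <string.h>

/* One alias: the short name and the command text it stands for. */
struct alias
{
	const char *name;
	const char *value;
};

/* Hypothetical alias table; a real shell would fill this in at run
 * time as the user types alias commands. */
static const struct alias aliases[] = {
	{ "lsm", "ls -l m*.c" },
};

/* Returns the replacement text if word is an alias, or NULL if not. */
const char *lookup_alias(const char *word)
{
	for (size_t i = 0; i < sizeof(aliases) / sizeof(aliases[0]); i++)
		if (strcmp(aliases[i].name, word) == 0)
			return aliases[i].value;
	return NULL;
}
```

If lookup_alias() returns a value for the first word, the shell would substitute that text for the word before continuing.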
Check for built-in commands
Built-in commands (such as cd, alias, and exit) are part of the shell itself and are executed directly by it, without starting a new program. So the command to execute is either a built-in of the shell or an external program installed on the system; ls is an external program.
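Checking for a built-in can be as simple as comparing the command name against a fixed list. In this hypothetical sketch, the list holds only cd, alias, and exit; a real shell such as bash has many more, and on a match it would call its own C function instead of starting a new process.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, deliberately short list of built-in names. */
static const char *const builtins[] = { "cd", "alias", "exit" };

/* Returns true if cmd is handled by the shell itself rather than
 * by running an external program found through PATH. */
bool is_builtin(const char *cmd)
{
	for (size_t i = 0; i < sizeof(builtins) / sizeof(builtins[0]); i++)
		if (strcmp(builtins[i], cmd) == 0)
			return true;
	return false;
}
```

With this list, is_builtin("ls") is false, so the shell moves on to the PATH lookup.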
Look up the program in the PATH
PATH is an environment variable that holds the list of directories in which the shell searches for executable programs.
Environment variables are dynamic named values that hold information about your login session (such as your user name, home directory, and shell) and affect how running processes behave. Each process has its own copy, inherited from its parent, so the shell passes them on to every command it runs. Many of them are set by default at login or by the shell’s startup files.
In our example, when we type “ls”, the shell searches the directories listed in PATH, in order, for a program named ls, so we don’t have to type its full path, “/bin/ls”.
Environment variable names are conventionally written in capital letters and printed as NAME=value. For example, our PATH variable prints as:
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
The PATH entry contains different folder locations of our programs, which are:
- /usr/local/sbin
- /usr/local/bin
- /usr/sbin
- /usr/bin
- /sbin
- /bin
- /usr/games
- /usr/local/games
Each path entry is separated by colons (:), and each entry is a directory path whose folders are separated by slashes (/). Below are all the environment variables that are printed on the terminal when you enter the printenv command. What other environment variables can you identify?
$ printenv
LC_PAPER=en_US.UTF-8
XDG_SESSION_ID=5
LC_ADDRESS=en_US.UTF-8
LC_MONETARY=en_US.UTF-8
TERM=xterm-256color
SHELL=/bin/bash
SSH_CLIENT=10.0.2.2 53676 22
LC_NUMERIC=en_US.UTF-8
SSH_TTY=/dev/pts/0
USER=vagrant
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.axa=00;36:*.oga=00;36:*.spx=00;36:*.xspf=00;36:
LC_TELEPHONE=en_US.UTF-8
MAIL=/var/mail/vagrant
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
LC_IDENTIFICATION=en_US.UTF-8
PWD=/home/vagrant
LANG=en_US.UTF-8
LC_MEASUREMENT=en_US.UTF-8
PS1=\[\033[01;35m\](\T) \[\033[01;32m\]\u@\h\[\e\033[00m\]:\[\e\033[01;34m\]\W\[\e\033[00m\]$(git_branch)\[\e\033[00m\]\n ⟾
SHLVL=1
HOME=/home/vagrant
LOGNAME=vagrant
SSH_CONNECTION=10.0.2.2 53676 10.0.2.15 22
LESSOPEN=| /usr/bin/lesspipe %s
XDG_RUNTIME_DIR=/run/user/1000
LESSCLOSE=/usr/bin/lesspipe %s %s
LC_TIME=en_US.UTF-8
LC_NAME=en_US.UTF-8
OLDPWD=/home/vagrant/simple_shell
_=/usr/bin/printenv
The complete Shell environment variables
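Every line in that output follows the NAME=value format, so finding one variable is just a matter of scanning the list. The sketch below is a hand-rolled version of the standard getenv(); find_env is a hypothetical name, and it takes the environment array as a parameter so it can work on the envp array a shell receives.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: scans a NULL-terminated array of "NAME=value"
 * strings and returns a pointer to the value for name, or NULL. */
const char *find_env(char *const envp[], const char *name)
{
	size_t len = strlen(name);

	for (size_t i = 0; envp[i] != NULL; i++)
		/* Require the '=' right after the name, so "PAT" does not
		 * accidentally match "PATH=...". */
		if (strncmp(envp[i], name, len) == 0 && envp[i][len] == '=')
			return envp[i] + len + 1;
	return NULL;
}
```

In a shell, the array could come from a three-argument main(), a common Unix extension, or from the POSIX environ variable.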
So how does all this actually work?
Every C program starts executing in its main() function, which returns the program’s exit status: zero on success, or a non-zero value if the program exited with an error.
The heart of this shell runs inside an infinite while loop. Let’s see why it has to run this way.
First, we install a signal handler with the signal() function, like this:
signal(SIGINT, sig_handler);
This call registers a handler for SIGINT, the interrupt signal sent when the user presses Ctrl+C. sig_handler is the callback that runs when the signal arrives, so pressing Ctrl+C while the shell prompt is running does not kill the shell itself.
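The body of sig_handler is not shown in the original, so here is one hypothetical version. It records the signal and reprints the prompt (assumed here to be "$ ") on a new line, using only write(), which is safe to call inside a signal handler, unlike printf().

```c
#include <signal.h>
#include <unistd.h>

/* Set by the handler; sig_atomic_t is safe to write from a handler. */
static volatile sig_atomic_t got_sigint = 0;

/* Hypothetical sig_handler: instead of letting Ctrl+C kill the shell,
 * note the signal and redraw the (assumed) "$ " prompt on a new line.
 * Only async-signal-safe calls such as write() are used. */
void sig_handler(int signum)
{
	(void)signum;
	got_sigint = 1;
	write(STDOUT_FILENO, "\n$ ", 3);
}
```

After signal(SIGINT, sig_handler); a Ctrl+C at the prompt just prints a fresh prompt instead of ending the shell.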
Next, the shell checks whether it is running interactively (that is, whether standard input is a terminal) and, if so, writes a prompt. The condition is:
if (isatty(STDIN_FILENO))
After the prompt is printed, the shell calls the getline() function, which pauses the loop while it waits for and collects the user’s input. getline() stores that input in a buffer, allocating or growing it as needed, and we keep that buffer in a line variable.
Tokenize
Next, we need to split the contents of the line variable that the user has sent to our program into separate, space-delimited pieces, or ‘tokens’. This is done using the strtok() function. For ls -l *.c, the tokens are ls, -l, and *.c.
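Here is a minimal sketch of that tokenizing step with strtok(). The tokenize helper is hypothetical; it splits on spaces, tabs, and newlines, ignores quoting, and ends the array with NULL, because that is the format execve() expects for its argument vector. Note that strtok() writes '\0' bytes into line, so the tokens point into the original buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits line in place into at most max - 1
 * tokens separated by spaces, tabs or newlines, stores them in args,
 * and NULL-terminates the array. Returns the number of tokens. */
size_t tokenize(char *line, char **args, size_t max)
{
	size_t n = 0;

	if (max == 0)
		return 0;   /* no room even for the terminating NULL */
	for (char *tok = strtok(line, " \t\n"); tok != NULL && n < max - 1;
	     tok = strtok(NULL, " \t\n"))
		args[n++] = tok;
	args[n] = NULL;   /* execve() needs a NULL-terminated array */
	return n;
}
```

For ls -l *.c, tokenize fills the array with "ls", "-l", "*.c", and a final NULL.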
Once the command is split into tokens, our shell finds the PATH variable among the environment variables and checks each directory listed in it, in order, until it finds the one containing the program. Next, the shell’s process handler creates and manages a subprocess to execute our command.
fork()
The shell’s process handler starts by calling the fork() system call and storing its return value. fork() creates a new process that is a copy of the calling process; the two are related as parent (the original) and child (the copy).
The child process starts off with copies of its parent’s open file descriptors. It is important to know that the return value of fork() is what tells the two processes apart: in the parent, it is the child’s process ID (a positive number), while in the child it is always zero. If fork() fails, it returns -1 in the parent and no child is created.
The child process is the one that attempts to execute the command, while the parent process pauses and waits (with wait() or waitpid()) for its child to finish.
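The parent/child split can be seen in a few lines. In this hypothetical fork_and_wait demo, the child does nothing but exit with a chosen code; the parent uses waitpid() to wait for it and the WIFEXITED()/WEXITSTATUS() macros to read that code back. The child calls _exit() rather than exit() so it does not flush stdio buffers it inherited from the parent.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: forks a child that simply exits with child_code,
 * waits for it in the parent, and returns the child's exit status
 * (or -1 if fork() or waitpid() fails). */
int fork_and_wait(int child_code)
{
	pid_t pid = fork();

	if (pid == -1)
		return -1;           /* fork failed: no child exists */
	if (pid == 0)
		_exit(child_code);   /* child: fork() returned 0 */

	/* Parent: fork() returned the child's process ID. */
	int status;
	if (waitpid(pid, &status, 0) == -1)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling fork_and_wait(42) returns 42 in the parent once the child has finished.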
execve()
The name execve stands for “execv with environment”: it belongs to the exec family of functions and, unlike execv(), takes the environment as an explicit third argument. This function attempts to run the command provided by the command handler by replacing the current (child) process image with the new program. On success, execve() never returns, because the calling program has been replaced. On error, it returns -1 and sets errno to indicate the cause. That is why it’s always important to check its return value. A simple example of such a check would be:
if (execve("./sub", argv, envp) == -1)
{
    perror("Could not execve");
    return 1;
}
In this snippet of code, we see the perror() function. This useful tool prints a descriptive error message to the standard error stream (stderr): first the string “Could not execve”, then a colon and a space, then a description of the current errno value (for example, “No such file or directory”), followed by a newline.
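Putting fork(), execve(), and waitpid() together gives the core of the process handler. This is a minimal sketch under a few assumptions: run_command is a hypothetical name, the full program path has already been found, and the exit code 127 on failure is borrowed from the status bash reports for a command it cannot find.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;   /* POSIX: the current process environment */

/* Hypothetical helper: runs the program at path with arguments args
 * (args[0] is the program name, last element NULL) in a child process
 * and returns its exit status, or -1 if fork() or waitpid() fails. */
int run_command(const char *path, char *const args[])
{
	pid_t pid = fork();

	if (pid == -1)
	{
		perror("fork");
		return -1;
	}
	if (pid == 0)
	{
		execve(path, args, environ);   /* returns only on error */
		perror("Could not execve");
		_exit(127);                    /* assumed failure status */
	}

	/* Parent: wait for the child, then report how it exited. */
	int status;
	if (waitpid(pid, &status, 0) == -1)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For our example, the call would look like run_command("/bin/ls", args), with args holding "ls", "-l", the expanded .c file names, and NULL.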
While the child runs, ls prints the result of the command “ls -l *.c”: a long listing of the .c files in the current folder. Once the child finishes, the parent shell goes back to the top of the infinite loop and prints a new prompt, ready to take a new command from the user.
Exit
The exit command is a shell built-in used to exit the shell that is currently running. It takes one optional parameter, [N], and exits the shell with status N. If N is not provided, the shell exits with the status of the last command executed. After you press Enter, the shell simply closes.
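Parsing the optional N could look like the sketch below. exit_status is a hypothetical helper; following bash, it keeps only the low 8 bits of N, since only those bits of an exit status reach the parent process, and for brevity it does not reject non-numeric arguments the way bash does.

```c
#include <stdlib.h>

/* Hypothetical helper for an exit built-in: arg is the optional N as
 * typed by the user (NULL if absent), and last_status is the status of
 * the last command executed. Only the low 8 bits of N are kept, so
 * "exit 256" gives 0 and "exit -1" gives 255. */
int exit_status(const char *arg, int last_status)
{
	if (arg == NULL)
		return last_status;
	return (int)(strtol(arg, NULL, 10) & 0xFF);
}
```

After any cleanup, the shell would finish with exit(exit_status(args[1], last_status)).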
And that’s how the cycle completes, thank you for reading.