6.828 Homework 2:Shell - LeeReindeer's blog

这次的作业是实现一个简单的 shell。通过阅读 xv6 book 的第零章，基本上就可以实现题目中所说的三个功能，也是shell的基本功能：运行简单的命令，输入输出重定向，管道。

题目提供的sh.c已经实现了命令的解析（大概看了一下流程，大概就是个递归下降解释器吧，这次作业无需关心这个），只需要完成runcmd函数中缺少的代码即可。

流程

先从 main 函数看起：

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20


int main(void) {
  static char buf[100];
  int fd, r;

  // Read and run input commands.
  while (getcmd(buf, sizeof(buf)) >= 0) {
    if (buf[0] == 'c' && buf[1] == 'd' && buf[2] == ' ') {
      // Clumsy but will have to do for now.
      // Chdir has no effect on the parent if run in the child.
      buf[strlen(buf) - 1] = 0; // chop \n
      if (chdir(buf + 3) < 0)
        fprintf(stderr, "cannot cd %s\n", buf + 3);
      continue;
    }
    if (fork1() == 0)
      runcmd(parsecmd(buf));
    wait(&r);
  }
  exit(0);
}

main 函数循环调用 getcmd从 console 读取命令，对 cd 命令做了特别处理。之后fork创建子进程，调用parsecmd解析命令，再调用runcmd执行得到的命令。最后调用wait等待子进程结束。main 的基本流程就这样。

parsecmd递归调用一系列的解析函数，先解析优先级高的（<，>和|）。cmd 分为三类：

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18


struct execcmd {
  int type;            // ' '
  char *argv[MAXARGS]; // arguments to the command to be exec-ed
};

struct redircmd {
  int type;        // < or >
  struct cmd *cmd; // the command to be run (e.g., an execcmd)
  char *file;      // the input/output file
  int flags;       // flags for open() indicating read or write
  int fd;          // the file descriptor number to use for the file
};

struct pipecmd {
  int type;          // |
  struct cmd *left;  // left side of pipe
  struct cmd *right; // right side of pipe
};

runcmd根据 cmd 的类型进行分支处理，需要去完成的核心功能也在这个函数中。

System interface

平时对shell的命令执行习以为常，却不曾想背后的实现。

在了解shell的工作原理之前，需要对几个system call有所了解：fork，exec。

Fork: creates a new process(child process), with exactly the same memory contents as the parent.

Exec: replaces the calling process’s memory but preserves its file table.

还有file descriptor ，如果你写过 socket 的代码，应该对这个词不陌生。

File descriptor: a small integer representing a kernel-managed object that a process may read from or write to.

可以理解为File descriptor是对输入输出的抽象，调用的进程无需知道是从终端，文件还是 pipe 输入或输出的。

运行基本的命令

关键的一步是在这里调用execv执行命令，所谓的“执行命令”就是运行命令所指向的可执行文件，这个进程的内存会被改变，但是它的file table不会变。

但是我们不能在主进程直接运行命令，需要fork，可以发现在main函数中已经帮我们做了。所以最简单觉得实现就是这样：

1

execv(ecmd->argv[0], ecmd->argv);

第一个参数是命令所在的执行文件的绝对路径。比如调用ls时，需要输入/bin/ls。每次都这样输入很麻烦，而且不一定每个命令都在/bin 下面。那么我们平时使用的 shell 是怎么实现的呢？

PATH

改进

使用getenv("PATH")可以得到PATH，对其进行分割存在数组中，遍历数组，把路径和命令名进行组合，如果那个路径下存在那个命令，执行。

有两个问题：如何分割？如何检测可以执行？

两个问题的答案分蘖对应两个函数：strtok_r，access。

1

int access(const char *pathname, int mode);

mode 可以是F_OK，R_OK，W_OK，X_OK，分别检查文件是否存在，当前进程是否有权限读，写。执行该文件。mode 也可以是它们的按位或。

所以我们可以调用acess(abs_path, F_OK | X_OK) 来检查该路径下是否可以执行。

最终改进的版本：

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37


  case ' ':
    ecmd = (struct execcmd *)cmd;
    if (ecmd->argv[0] == 0)
      _exit(0);

    if (access(ecmd->argv[0], F_OK) == 0) { // check current dir
      execv(ecmd->argv[0], ecmd->argv);
    } else {
      const char *path = ecmd->argv[0];
      char *str1, *token, *saveptr1;
      int i;
      char *env_path = getenv("PATH");
      char *env_paths[100];
      for (i = 0, str1 = env_path;; i++, str1 = NULL) {
        token = strtok_r(str1, ":", &saveptr1);
        if (token == NULL)
          break;
        env_paths[i] = token;
      }
      int find = 0;
      char *abs_path; // evn_path[i] + "/" + path;
      int j;
      for (j = 0; j < i && find == 0; j++) {
        abs_path = (char *)malloc(strlen(env_paths[j]) + strlen(path) + 1);
        strcpy(abs_path, env_paths[j]);
        strcat(abs_path, "/");
        strcat(abs_path, path);
        if (access(abs_path, F_OK | X_OK) == 0) {
          find = 1;
          execv(abs_path, ecmd->argv);
        }
        free(abs_path);
      }
      if (!find)
        fprintf(stderr, "Command not find: %s\n", ecmd->argv[0]);
    }
    break;

I/O Redirection

在看懂 xv6 book 的第10页之后实现，输入输出重定向变得很简单：fork之后关闭之前的 fd，打开需要输出或输出的 fd。

这里需要用到的 System call是 open 和 close。

而且也不需要对 < 和 > 做分别的处理。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12


struct cmd *redircmd(struct cmd *subcmd, char *file, int type) {
  struct redircmd *cmd;

  cmd = malloc(sizeof(*cmd));
  memset(cmd, 0, sizeof(*cmd));
  cmd->type = type;
  cmd->cmd = subcmd;
  cmd->file = file;
  cmd->flags = (type == '<') ? O_RDONLY : O_WRONLY | O_CREAT | O_TRUNC;
  cmd->fd = (type == '<') ? 0 : 1; // stdin : stdout
  return (struct cmd *)cmd;
}

当输入的命令重定向了输入，如：cat < t.sh。那么rcmd->fd为0，rcmd->flags为O_RDONLY。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11


  case '>':
  case '<':
    rcmd = (struct redircmd *)cmd;
    errno = 0;
    close(rcmd->fd);
    rcmd->fd = open(rcmd->file, rcmd->flags, 0644);//注意第三个参数是权限
    if (rcmd->fd == -1) {
      fprintf(stderr, "Error: %s", strerror(errno));
    }
    runcmd(rcmd->cmd);
    break;

调用close(rcmd->fd)会关闭标准输入（0），此时最小可用的fd变为0,。接下来调用open打开t.sh文件，它的fd为0。这样一来这个子进程的输入就重定向为文件了。

Pipe

接下来时管道的实现，也是需要调用 System call：pipe 和 dup，创建用于进程通信的管道。

pipe() creates a pipe, a unidirectional data channel that can be used for interprocess

communication. The array pipefd is used to return two file descriptors refer‐ ring to the ends of the pipe. pipefd[0] refers to the read end of the pipe. pipefd[1] refers to the write end of the pipe. Data written to the write end of the pipe is buffered by the kernel until it is read from the read end of the pipe.

The dup() system call creates a copy of the file descriptor oldfd, using the lowest- numbered unused file descriptor for the new descriptor.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26


  //...
  int p[2], r;
  //...
  case '|':
    pcmd = (struct pipecmd *)cmd;
    pipe(p);
    if (fork1() == 0) {
      close(1);
      dup(p[1]); // use 1, write left cmd's output to p[1];
      close(p[0]);
      close(p[1]);
      runcmd(pcmd->left);
    } else{
      close(0);
      dup(p[0]); // read right cmd's input from p[0]
      close(p[0]);
      close(p[1]);
      runcmd(pcmd->right);
    }
    /**
     * The wait() system call suspends execution of the calling
     * process  until one  of its children terminates.  The call wait(&wstatus)
     * is equivalent to: waitpid(-1, &wstatus, 0);
     */
    wait(&r);
    break;

首先在子进程中执行|的左端命令，写入p[1]。父进程从中p[0]读入，执行右端命令。最终左端命令的输出通过管道连接到了右端命令的输入。

~~这种方法，经我测试有时会出现多打印了一个“6.828$”提示符。~~ 在 mit 的讲义里发现了what if sh didn't fork for pcmd->right? [try it] would user-visible behavior change? sleep 10 | echo hi，这样做的结果就是打印了hi， shell 看上去却没有sleep 10。所以这种实现是错误的。

所以，看 xv6 的实现：

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22


  case '|':
    pcmd = (struct pipecmd *)cmd;
    pipe(p);
    if (fork1() == 0) {
      close(1);
      dup(p[1]); // write left cmd's output to p[1];
      close(p[0]);
      close(p[1]);
      runcmd(pcmd->left);
    }
    if (fork1() == 0) {
      close(0);
      dup(p[0]); // read right cmd's input from p[0]
      close(p[0]);
      close(p[1]);
      runcmd(pcmd->right);
    }
    close(p[0]);
    close(p[1]);
    wait(&r);
    wait(&r);
    break;

fork 了两次，创建了两个子进程，使用pipe管理它们之前的通信，最后也wait两次，等待两个子进程都结束，当然这里可以递归的创建多个子进程。使用这种方便则不会出现多打一个提示符的情况。（为什么我也不知道啊-.-）

执行a.out < t.sh，a.out是shell自己的执行文件。

rsh

Contents

流程

System interface

运行基本的命令

改进

I/O Redirection

Pipe