home/blog/quine


Self reproducing program in C++

2019-03-04

A quine[1] is a source program, that when compiled and executed, will produce as output an exact copy of its source[2]. Of all languages I know, I am most well acquainted with C++, so I decided to realize a quine in it (only out of curiosity, since quines have little practical significance[3]). I first read about quines in the 1983 Turing Award paper, and as soon as I did, I quickly got down to working on it (albeit, without researching about its rules). This made me come up with the following program:

#include <iostream>
#include <fstream>
using namespace std;

int main()
{

    ifstream f;
    f.open("quine.cpp");

    string line;

    while(getline(f,line))
        cout << line << endl;

    f.close();
    return 0;
}

As I later realized, this obviously wasn't anything profound, so scrounging through the internet, I came across the fact that "stepping outside the program" is considered as cheating. I had to create a program that took no input, and printed itself as an output. The first thing that came to mind was a single `cout` statement.

Code

#include <iostream>
#include <string>
int main()
{
    std::cout << "#include \n#include \nint main()\n{\n\tstd::cout << \"";
    return 0;
}

Output

#include <iostream>
#include <string>
int main()
{
	std::cout << "

But you can only go as far as that. Whatever you write inside the cout within the quotation marks, has to be printed as source itself. So, if you start writing the string #include <ios... , just after the original string ended, by the time you get to the end of the modified string, you will have more characters to add. And when you add those characters, you will reach the end of the string again only to find more characters to add, and so on, it will stretch out till infinity.

Thankfully, C++ helps us out, by allowing us to store strings in memory. What this means, is that instead of writing a string directly into the console, we can first store it in a variable. That variable's declaration, when encountered in code, can use its own content to print itself, thereby solving the recursive nature of the problem. So I came up with the following code:

#include <iostream>
#include <string>
int main()
{
    std::string s = "#include \n#include \nint main()\n{\n\tstd::string s = \"";
    std::string r = "\n\tstd::cout << s << s << \";\n\tstd::string r = \" << r << r;\n\treturn 0;\n}";
    std::cout << s << s << ";\n\tstd::string r = " << r << r;
    return 0;
}

Here string s stores the source code up until its own declaration and string r stores the source code after its own declaration. When the two strings are finally printed using a cout statement, immediately after the printing of the string, the contents of the strings are printed as well.

So this allows us to print the source as well as the statement that prints the source. The only drawback here is that when we are printing for the second time, characters get escaped, even though they shouldn't be. Now, if we remove the use of escape characters, we can have the entire code in one line, but even then, the include requires a newline character, so the escapes will need to be there.

This is where I got stuck and started looking for solutions when I found something[4] called raw strings (I had heard of raw strings in Python but not in C++). These strings help you put the characters _as is_. For example you can actually hit enter in the code, instead of typing in \n. So if you put in raw strings instead of regular strings, there is no difference between writing in code, and displaying on the console.

So finally, a quine I could understand was this:

#include <iostream>
#include <string>
int main()
{
std::string s = R"(#include 
#include 
int main()
{
    std::string s = )";
    std::string r = R"(std::cout << s << "R\"(" << s << ")\";\nstd::string r = R\"(" << r << ")\";\n" << r;})";
    std::cout << s << "R\"(" << s << ")\";\nstd::string r = R\"(" << r << ")\";\n" << r;
}

As I delved deeper into this topic I found some theoretical basis around it, that I couldn't quite wrap my head around[5]. I realized that my method did not have a good foundation, having to depend sheerly upon concepts of my language that I didn't know about.



References - 5

[1] Hofstadter, Douglas _Gödel, Escher, Bach: An Eternal Golden Braid (Ch.13)
[2] Thompson, Ken "Reflections on Trusting Trust" Turing Award Lecture 1984
[3] Stack Overflow - C program that prints its own output
[4] Geeks for Geeks - Raw string literals in C
[5] David Madore - quines