Feed Icon RSS 1.0 XML Feed available

Code Trolling, Deletionism, and String Splitting

Date: 1-Mar-2015/7:12:02-5:00

Tags: , ,

Characters: (none)

Students will often show up on Internet programming sites with a question that amounts to "write my code, please!". Day after day, patient sages give wizened advice along the lines of "asking for help is fine, but you have to show you've at least tried something, and can articulate your confusion and where you got stuck."
Note
When the sages aren't around, the great unwashed on StackOverflow will simply downvote them into oblivion...assuming they'll "get the message". I think the only message they get is "StackOverflow sucks and I hate everyone on it". As someone once told me:
There is something you need to understand about showing people Future Things. You have to be careful. It's a lot like if you are dealing with someone who has never had a grape before. When you give them their first grape you must be 100% sure it's not a sour one...because if it is sour, then every time they're asked if they want a grape after that they will say no.
But even the most patient of sage can't help to think--now and again--about trolling them with some code. It's tempting to slip something to them that appears to work, but has some property that would cue any professor or TA reading to know it wasn't original work...and that the student had basically been pranked.

Code Trolling

As an indulgence of this sinister thought, there was a short-lived tag on the Programming Puzzles and Code Golf StackExchange called "code-trolling". It was a programming challenge to respond to a poor question with the best prank code that a naive asker might actually use. Except since you're doing it on a puzzle site, it would be "o.k." vs (perhaps) getting you a flag smackdown on StackOverflow were you to be cheeky enough to try it.
The premise sounds like it would get old fast, and it did. Writing a good programming challenge is difficult, but writing a short poor question with no code is very easy (as the flow of questions coming into StackOverflow every few nanoseconds shows.)
It was an amusing fad for a week or so. But ultimately it was banned and that's fine by me.
What ISN'T FINE by me is a moderation habit that's become all-too-typical: to take community upvoted content and wipe it from the historical record, based on small backroom discussions no one finds out about (if a discussion happens at all). I have complained about this already, in where all the Code Golf questions from StackOverflow were deleted--regardless of how many upvotes they had. This frustrates me personally as I had linked to those pages for puzzle definitions...now those puzzles are gone!
Note At least I saved my solutions, but what if I hadn't? Under what sort of worldview is this being considered sensible? In addition to spending time on this, I've given presentations at conferences with the puzzles in them. We should be archivists of each others' efforts, not destroyers.
This caused me to think to go back and check up on my one Code Trolling answer, which I spent some time on and thought was rather epic. It was thankfully still there! However, it nearly wasn't :-/ and apparently I have Turion to thank. As an answerer, I received no notification of its pending demise, and these things seem to be decided off-the-cuff:
Code-trolling is in the process of being removed, as per the official stance. This post recieved over 75% "delete" votes on the poll. It does have a large amount of votes on the question and the answers, but it is over 3 months old and no reputation will be lost. Therefore, I am closing this and will delete it in 24 hours. Note that since this is an outlier in that it has a large amount of votes, I'll be happy to undelete and lock given a convincing argument on meta.
@Doorknob, this is not a question to be deleted according to your accepted answer in the linked official stance. It has 44 answers and 21 votes, which is quite popular. As for the poll, I wasn't even aware of such a poll existing until now. I'm not going into spending time on writing another answer on meta pro code-trolling since it's obvious that exactly the meta-users are opposed to code-trolling whereas a sizeable part of codegolf users isn't. Closing this question is an excellent idea, but deleting it is in my opinion unnecessary and unhelpful.
This is important, and I don't know what to do about it. There have been questions about the most effective way to get your questions and answers out. But there's no incentive for StackExchange to make this turnkey, or to give people easy tools to put the authors in control of their content. If they delete it...even if it was your answer, you can't get it back from the database.
I don't know the general solution, but I think people need to be aware. In the meantime, I'll go ahead and archive here my answer for "how to split a string".

Q: How do I split a string??? Help plz?

My homework assignment is take a string and split it into pieces at every new line. I have no idea what to do! Please help!

A: Implementation in the C Language

Tricky problem for a beginning C programming class! First you have to understand a few basics about this complicated subject.
A string is a sequence made up of only characters. This means that in order for programmers to indicate an "invisible" thing (that isn't a space, which counts as a character), you have to use a special sequence of characters somehow to mean that invisible thing.
On Windows, the new line is a sequence of two characters in the string: backslash and n (or the string "\n")
On Linux or OS/X Macs, it is a sequence of four characters: backslash, n, backslash, and then r: (or "\n\r").
*(Interesting historical note: on older Macintoshes it was a different sequence of four characters: "\r\n"... totally backwards from how Unix did things! History takes strange roads.)
It may seem that Linux is more wasteful than Windows, but it's actually a better idea to use a longer sequence. Because Windows uses such a short sequence the C language runtime cannot print out the actual letters \n without using special system calls. You can usually do it on Linux without a system call (it can even print \n\ or \n\q ... anything but \n\r). But since C is meant to be cross platform it enforces the lowest common-denominator. So you'll always be seeing \n in your book.
Note If you're wondering how we're talking about \n without getting newlines every time we do, StackOverflow is written almost entirely in HTML...not C. So it's a lot more modern. Many of these old aspects of C are being addressed by things you might have heard about, like CLANG and LLVM.
But back to what we're working on. Let's imagine a string with three pieces and two newlines, like:
"foo\nbaz\nbar"
You can see the length of that string is 3 + 2 + 3 + 2 + 3 = 13. So you have to make a buffer of length 13 for it, and C programmers always add one to the size of their arrays to be safe. So make your buffer and copy the string into it:
    /* REMEMBER: always add one to your array sizes in C, for safety! */
    char buffer[14];
    strcpy(buffer, "foo\nbaz\nbar");
Now what you have to do is look for that two-character pattern that represents the newline. You aren't allowed to look for just a backslash. Because C is used for string splitting quite a lot, it will give you an error if you try. You can see this if you try writing:
    char pattern[2];
    strcpy(pattern, "\");
Note There is a setting in the compiler for if you are writing a program that just looks for backslashes. But that's extremely uncommon; backslashes are very rarely used, which is why they were chosen for this purpose. We won't turn that switch on.
So let's make the pattern we really want, like this:
char pattern[3];
strcpy(pattern, "\n");
When we want to compare two strings which are of a certain length, we use strncmp. It compares a certain number of characters of a potentially larger string, and tells you whether they match or not. So strncmp("\nA", "\nB", 2) returns 1 (true). This is even though the strings aren't entirely equal over the length of three... but because only two characters are needed to be.
So let's step through our buffer, one character at a time, looking for the two character match to our pattern. Each time we find a two-character sequence of a backslash followed by an n, we'll use the very special system call (or "syscall") putc to put out a special kind of character: ASCII code 10, to get a physical newline.
#include "stdio.h"
#include "string.h"

char buffer[14]; /* actual length 13 */
char pattern[3]; /* actual length 2 */
int i = 0;

int main(int argc, char* argv[]) {
    strcpy(buffer, "foo\nbar\nbaz");
    strcpy(pattern, "\n");

    while (i < strlen(buffer)) {
       if (1 == strncmp(buffer + i, pattern, 2)) {
           /* We matched a backslash char followed by n */
           /* Use syscall for output ASCII 10 */
           putc(10, stdout);
           /* bump index by 2 to skip both backslash and n */
           i += 2;
       } else {
           /* This position didn't match the pattern for a newline */
           /* Print character with printf */
           printf("%c", buffer[i]);
           /* bump index by 1 to go to next matchable position */
           i += 1;
       }
    }

    /* final newline and return 1 for success! */
    putc(10, stdout); 
    return 1;
}
The output of this program is the desired result...the string split!
foo
baz
bar

\t is for \trolling

It's absolutely incorrect from the top to the bottom. Yet it's filled with plausible-sounding nonsense that has scrambled information like what's in the textbook or Wikipedia. Program logic appears transparent in the context of the misinformation, but is completely misleading. Even global variables and returning an error code, for good measure...
Of course, there's only one character in the C string representation of the two-character source literal sequence \n. But making a buffer larger is harmless, as long as strlen() is used to get the actual length to operate on.
I try to convince the reader that strncmp is a boolean operation that either matches (1) or doesn't (0). But it actually has three return values (-1 matching less, 0 for equal, 1 for matching greater). Our two character "pattern" being compared is not [\, n], but rather [\n, \0]...picking up the implicit null terminator. As that sequence slides through the string it will never be greater than a two-character sequence it's compared to...at best it will be zero if there is a terminating newline in the input string.
So all this does is loop through the string and print it one character at a time. The top branch never runs. (Though you could get it to if your string had lower-than \n codes in it, say tab...which could be used to mysteriously omit characters from the output :-P)

Moment of Zen

When people were criticizing the usefulness of Code Trolling, I agreed the tag should probably be banned. Yet I cited this as a good example of how the puzzle form could produce useful teaching exercises, along the lines of "what's wrong with this picture". It's essential to be able to explain what's wrong with the above, if you are to claim to be a C programmer. And as a side benefit, I think it's hilarious.
I'm glad I could save it. But how much else will be lost, or has been? The Internet was based on decentralization...yet we keep trusting our content and identity to centralized institutions that re-emerged from the information democracy. I'll close with some Chomsky:
...power is always illegitimate, unless it proves itself to be legitimate. So the burden of proof is always on those who claim that some authoritarian hierarchic relation is legitimate. If they can't prove it, then it should be dismantled.
Business Card from SXSW
Copyright (c) 2007-2018 hostilefork.com

Project names and graphic designs are All Rights Reserved, unless otherwise noted. Software codebases are governed by licenses included in their distributions. Posts on blog.hostilefork.com are licensed under the Creative Commons BY-NC-SA 4.0 license, and may be excerpted or adapted under the terms of that license for noncommercial purposes.