Compiler question - what are 'side effects'?


col(Posted 2014) [#1]
Hiya all,

Within the world of compilers and the semantic analysis phase I've seen mentions of expressions/functions having possible side effects, and I'm trying to fully and completely understand what these are. I've done the usual googling and found some resources that describe 'side effects' as, say, a global variable (declared outside of a function) being altered inside a function, or in the case of C++ 'b = ++a', where b is assigned the value of a with the side effect of a being incremented first, but I don't understand how this matters during compilation/semantic analysis :/
So, using BlitzMax as the source language, is it possible for these 'side effects' to be of concern? If so, could someone show an example with an explanation please?
Would it be something like, say, converting an rvalue of type Int into a String so as to allow a valid expression for an assign statement whose lvalue is of type String?

Many thanks!


Yasha(Posted 2014) [#2]
A "side effect" means that the function "does" something - it has a lasting "effect" on the world as a result of being called. It mutates some global resource - but that doesn't just mean global variables, it means any global resource: files, streams, buffers, objects passed in by reference, etc.

The alternative to a side-effecting function is a "pure" function, so called because it is pure in the mathematical sense of simply mapping input values to a returned value. Pure functions do not touch anything outside of their own internal variables, and do not read anything other than their parameters. Pure functions by definition do not manipulate BlitzMax-style objects since a referenced object is a global resource, but they can potentially read an object's fields.
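To make the distinction concrete, here's a quick sketch (in Python, since that's easier to run here than BlitzMax; the function names are made up for illustration):

```python
# 'total' is a global resource, so 'add_to_total' has a side effect;
# 'add' only maps its inputs to an output, so it is pure.

total = 0

def add_to_total(x):
    """Side-effecting: mutates the global 'total'."""
    global total
    total += x
    return total

def add(a, b):
    """Pure: the result depends only on the arguments."""
    return a + b

print(add(2, 3))        # always 5, no matter what ran before
print(add_to_total(2))  # 2 the first time...
print(add_to_total(2))  # ...but 4 the second: call order matters
```

The compiler is free to reorder, cache or delete calls to `add` without changing the program's meaning; it can't do any of that with `add_to_total`.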

Side effects aren't really important to the compiler, so much as their absence: if you can prove there are no side effects and that a function is pure you can do a lot of useful things:

-- you can usually inline it much more easily
-- you can remove dead code more easily
-- you can evaluate any pure functions called with constant values at compile-time, and treat the results as constant because you know the value will never be affected by anything else that happens
-- you can partially inline or specialize pure functions called with semiconstant arguments
-- you can automatically parallelize/multithread it as much as you like
-- you can try to work out the underlying algorithm and rewrite the actual logic to something more efficient

... whereas a side-effecting function leaves the world in a different state after it runs, and you can't get rid of it or change it very easily at all because if you move or remove it, the changes to the world won't have happened, or won't have happened in the right order.
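For example, the compile-time evaluation point above can be sketched as a toy constant-folding pass (everything here is invented for illustration, not how any particular compiler does it):

```python
# Calls to functions known to be pure, with constant arguments, are
# replaced by their computed value; anything else is left alone.

PURE_FUNCS = {"add": lambda a, b: a + b, "square": lambda x: x * x}

def fold(expr):
    # An expression is either a constant (int) or a call: ("name", [args...])
    if isinstance(expr, int):
        return expr
    name, args = expr
    folded_args = [fold(a) for a in args]
    if name in PURE_FUNCS and all(isinstance(a, int) for a in folded_args):
        return PURE_FUNCS[name](*folded_args)  # safe: no effects to lose
    return (name, folded_args)

# square(add(2, 3)) folds all the way down to 25 at "compile time"
print(fold(("square", [("add", [2, 3])])))  # 25
# A call to an impure/unknown function is left in place
print(fold(("print_score", [("add", [1, 1])])))  # ('print_score', [2])
```

Folding the impure call away would be wrong precisely because its effect (whatever `print_score` does to the world) would be lost.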

BlitzMax is an inherently effectful language because it's based on the notion of a series of instructions that modify the world, so this analysis is only vaguely useful because almost everything will be effectful and it's almost easier to just assume everything has a side effect. Pure functions are mostly interesting in mathematically-focused languages like Haskell and ML (where the program is more like an operation mapping input to output, than an instruction stream), where variables are usually immutable. Such languages are also usually more reliant on the compiler to produce efficient code at all, whereas BlitzMax programs are usually straightforward enough not to need these things.


Note however that functional languages also usually operate at a higher abstraction level and don't make things like memory allocation visible to the user, so your example would usually be considered "pure" even though it allocates a string object behind the scenes. Since BlitzMax also has the option to work at the level of raw memory, it's near impossible for anything in BlitzMax to be pure unless it's limited to integers and floats only. So what specifically counts as an effect is also to an extent determined by the language; languages that completely hide all low-level memory work from the user have a great deal more freedom to rewrite code dealing with objects (they usually don't draw a distinction between objects and values at all).


col(Posted 2014) [#3]
Thanks Yasha, the way you've explained them makes it very easy to understand.

So with that in mind, out of curiosity, does the same idea apply to objects? If you have well-designed OO objects with an interface of methods that only work on the object's private member variables (and the private variables are not accessible from outside the object, so you would have setters/getters) - how would such an object be classified? Can it be classified in the same way?


Yasha(Posted 2014) [#4]
....yeeeees but probably not in the way you're looking for.

I think to have a meaningful equivalent for objects you'd need the object itself to be immutable (all fields const), which guarantees that no method call changes state by setting or updating fields of the object. If there are "setters", then the system is not pure: something gets changed.

There are so-called OOP systems that work this way (here's one courtesy of the great Oleg: http://okmij.org/ftp/Scheme/#pure-oo ), but since they don't allow objects to be updated, the code you would end up writing is very different from idiomatic BlitzMax: "update"/"modification" is achieved by returning a new, slightly different, copy of an object; there are no setters as normally understood.
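A rough sketch of that "return a modified copy" style, in Python rather than Scheme or BlitzMax (the type and method names are invented):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)   # frozen: all fields are effectively const
class Player:
    name: str
    score: int

    def with_score(self, new_score):
        # No setter: return a new, slightly different copy instead.
        return replace(self, score=new_score)

p1 = Player("col", 10)
p2 = p1.with_score(20)
print(p1.score, p2.score)  # 10 20 - the original is untouched
```

Nothing ever changes in place, so any call on a given object always gives the same answer.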


col(Posted 2014) [#5]
Yes, I was thinking of the same idea applying at the 'object level'. In the same sense that a 'pure function' doesn't affect anything outside of itself, if you scale the idea up to the 'object level', whereby only fields/variables within the object are altered and no 'outside' code/objects are affected by any code within it, you get a 'pure object' in essence.

Does such a system not qualify because each method/function alters variables outside of itself (therefore breaking the concept), even though it's still within the object? Or is it called something else?


Yasha(Posted 2014) [#6]
Does such a system not qualify because each method/function alters variables outside of itself (therefore breaking the concept), even though it's still within the object? Or is it called something else?


Depends on one big question. Do you allow sharing of resources/references? (At the semantic level: whether you implement it with pointers is immaterial.)

In the normal OO model, if you passed an object as a parameter to a setter and it was added as a property of the main object, that parameter is still (potentially) referenced from outside. It can still be internally modified by calls to its own methods, which means that changes can happen without the knowledge of the "owning" object. This system is not "pure" in the above sense, as the same method call could have different results depending upon what happened to an invisible value in between.
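Here's that aliasing problem as a runnable sketch (Python, all names invented):

```python
# An object stored via a setter is still reachable from outside, so the
# owner's observable state can change without any call on the owner itself.

class Inventory:
    def __init__(self):
        self.items = []

class Player:
    def __init__(self):
        self._inventory = None
    def set_inventory(self, inv):
        self._inventory = inv   # stores the reference, not a copy
    def item_count(self):
        return len(self._inventory.items)

shared = Inventory()
p = Player()
p.set_inventory(shared)
print(p.item_count())          # 0
shared.items.append("sword")   # mutation through the outside alias
print(p.item_count())          # 1 - same call, different result
```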

If however you do something different, things get interesting.

Firstly, if all setters copy (or implicitly copy) their arguments - imagine passing every struct by value in C++ - then whatever gets passed in is separated from the outside world and any external references are cut off from it. This system does have the potential to be pure because running a setter is implicitly the same as returning a copied, modified owner object (methods could implicitly reassign to the same lvalue they were called on; rvalues can't call setters). You could build a normal language using an OO system a bit like Oleg's in this way.
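The same sketch with a copying setter (Python again, invented names):

```python
# The setter deep-copies its argument - like passing a C++ struct by
# value - cutting the stored state off from all outside references.
import copy

class Inventory:
    def __init__(self):
        self.items = []

class Player:
    def __init__(self):
        self._inventory = None
    def set_inventory(self, inv):
        self._inventory = copy.deepcopy(inv)  # sever external aliases
    def item_count(self):
        return len(self._inventory.items)

outside = Inventory()
p = Player()
p.set_inventory(outside)
outside.items.append("sword")  # no longer visible to p
print(p.item_count())          # still 0
```

Once nothing outside can reach the stored state, every change to it has to go through the owner, which is what opens the door to treating the whole thing as pure.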

Secondly, and even more interestingly: if anything passed as an argument stops existing in its original scope, you have an enforced single-ownership system called "linear typing". Not only does this potentially allow for purity, it also completely removes the need for any form of GC, because the lifetime of every object can be predicted by the compiler. (Some things in this system get a bit weird, e.g. you have to have a "swap" operator to switch the values of a variable and an array slot, or you'd get an array with holes in it.) Linear types are very interesting to people developing new systems languages at the moment because of this property of allowing for perfect memory management without a GC.
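A very rough runtime imitation of the idea (real linear typing is enforced by the compiler, not checked at runtime like this; all names invented):

```python
# A "linear" value must be consumed exactly once; consuming it
# invalidates the handle. The swap rule for array slots is shown too.

class Linear:
    def __init__(self, value):
        self._value = value
        self._alive = True
    def take(self):
        # consuming the value invalidates this handle
        assert self._alive, "linear value used more than once"
        self._alive = False
        return self._value

def swap(slots, index, new_linear):
    # You can't just read a linear value out of an array slot (that
    # would leave a 'hole'), so you swap a replacement in and get the
    # old value out.
    old = slots[index]
    slots[index] = new_linear
    return old

a = Linear("resource")
slots = [a]
old = swap(slots, 0, Linear("replacement"))
print(old.take())   # "resource" - consumed exactly once
# old.take()        # would now raise: the value was already consumed
```

Because every value has exactly one owner at any moment, the compiler can see exactly where each lifetime ends and free the memory there, with no GC.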


Purity in the functional sense is also called "referential transparency". You can call whatever you want "pure", but I don't think it's ever the same thing in OO; there just happen to be some systems (like Oleg's) that are isomorphic to a functional representation. Most aren't.

To actually answer the question: I think what you described is more just "good design". Without some additional constraint, though, it provides little useful information to the compiler (e.g. Smalltalk and Lua both already do this).


Yue(Posted 2014) [#7]
The feeling is the same as when you watch two great chess champions play: I understand nothing, but I think it's wonderful to see intellectuals talk about a topic like this, and you always learn something new.


col(Posted 2014) [#8]
It's strange that you mention garbage collection, as that was where my thoughts were going, and you've already briefly explained what would need to happen to get it to work reliably without a GC.

Thank you Yasha for the little lecture, it's been very interesting. I'll leave things here for the meantime and carry on with my compiler before I end up wandering off on a tangent from my original project goal :D

I remember reading some time ago that you were a CS undergraduate - are you still studying, or have you now graduated and moved on?


Yasha(Posted 2014) [#9]
If you're actually interested in alternative memory management techniques, another one to look into (unrelated to the above) is "region inference". Unlike linear typing it has the advantage of working with "normal" program code, although you need to tweak it a bit to prevent space leaks (the original ML version has a few critical flaws, but people have it working for Java now - for real-time Java, which doesn't allow for a GC - so it must be more or less fixed).
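The underlying runtime idea - allocate objects into a region, then free the whole region at once - can be sketched like this (Python, invented names; real region inference places the regions automatically):

```python
# Objects are allocated into a region; when the region's scope ends,
# everything in it is freed in one step - no per-object GC needed.

class Region:
    def __init__(self):
        self._objects = []
        self.closed = False
    def alloc(self, obj):
        assert not self.closed, "region already freed"
        self._objects.append(obj)
        return obj
    def __enter__(self):
        return self
    def __exit__(self, *exc):
        # free everything allocated in this region at once
        self._objects.clear()
        self.closed = True

with Region() as r:
    a = r.alloc([1, 2, 3])
    b = r.alloc({"hp": 100})
    print(len(a), b["hp"])  # objects live for the region's lifetime
print(r.closed)  # True - the region (and its contents) is gone
```

The hard part, and what region *inference* actually does, is having the compiler work out automatically which region each allocation should go into so that nothing outlives its region.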


I did a CS postgraduate degree a while back. Still thinking about going back for a PhD. Finally now working on a paper of my own. etc.


col(Posted 2014) [#10]
I am interested in memory management that doesn't require the use of a 'traditional' GC mechanism, especially one that fits naturally into a multi-processor environment.

I wish you all the best with the paper. I know a few friends who are going for a PhD (not CS related), and although they really enjoy it, they say it is long, hard work.