~~Title:Utilitarianism~~
====== Utilitarianism ======
  
The way some (!) artificial intelligence works is by reward functions. Essentially, the neural network gets a problem to solve, responds with an action, and then we give a "score" back to it. Say, bowling. Each pin gives +10, missing the throw is -50, and by letting it repeat and repeat it will try out all kinds of angles and speeds and spins until it settles into a solution that gives it maximum points.
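To make that concrete, here is a rough sketch of what such a setup can look like in code. It is only an illustration: the +10/-50 scoring comes from the example above, but the ''simulate_throw'' stand-in and the random-search "training" loop are made up.

<code python>
import random

def bowling_reward(pins_knocked: int, missed: bool) -> int:
    """Toy reward: +10 per pin, -50 for missing the throw entirely."""
    return -50 if missed else 10 * pins_knocked

def simulate_throw(angle: float, speed: float) -> tuple[int, bool]:
    """Stand-in for the environment: a made-up mapping from (angle, speed)
    to an outcome. A real bowling simulator would go here."""
    missed = abs(angle) > 30  # veer too far off and the ball ends up in the gutter
    pins = 0 if missed else max(0, min(10, int(speed / 2 - abs(angle) / 5)))
    return pins, missed

# Crude "training": try lots of throws, keep whichever scored best so far.
best_throw, best_score = None, float("-inf")
for _ in range(10_000):
    angle, speed = random.uniform(-45, 45), random.uniform(5, 30)
    score = bowling_reward(*simulate_throw(angle, speed))
    if score > best_score:
        best_throw, best_score = (angle, speed), score

print(best_throw, best_score)
</code>

Real reinforcement learning is of course smarter than blind random search, but the basic loop (act, get scored, keep what scores well) is the same shape the paragraph describes.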
  
I believe with humans it's the same. Humans constantly train their neural networks (learn), and pain and pleasure are a conscious expression of reward mechanisms being triggered in your brain. All pain is aimed at discouraging you from doing what brought you here //again//, and all pleasure is meant to encourage you to repeat the thing that led to you experiencing reward.
  
Inside the human brain, you could, in theory, boil everything down into the result of a reward function, and that reward function is comprised of the many little things that human brains consider to be conducive to their survival and the things they consider to be detrimental to it. Snakes and getting bitten are detrimental, and therefore humans, even babies, are naturally averse to snakes and feel pain (i.e. the reduction of a number in the reward function), and it takes conscious effort and training to overcome that fear (though it's easier to do with babies, who have only what little fear of snakes can be coded into DNA, as opposed to real-life negative experiences).
The problem with this, and why it's called a local optimum, is that trying to nudge the AI to a better solution will always require the AI to try things it will, initially, not be as good at as the thing it's already doing. For example, if you tried to very strongly "encourage" the AI to throw in a particular direction, then, with its current skillset and strategies, it will not be able to score as many points in that direction as it already does with its current approach. This is why "delayed gratification" is such a big buzzword: doing your homework or quitting drugs is extremely painful from a reward function perspective, and since we naturally gravitate towards maximizing our reward function, the easiest thing to do is to just not do homework or to keep doing your drugs.
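Here is a tiny sketch of that trap (the two-peak reward curve and the greedy learner below are invented for illustration, not taken from anything above): a learner that only accepts moves that immediately pay off settles on whichever peak is closest and never crosses the valley towards the much higher one.

<code python>
import math

def reward(x: float) -> float:
    """Invented reward curve: a small peak near x=1 and a much higher one near x=6."""
    return 3 * math.exp(-(x - 1) ** 2) + 10 * math.exp(-(x - 6) ** 2)

def greedy_climb(x: float, step: float = 0.1, iters: int = 1000) -> float:
    """Only ever accept a move that immediately increases the reward."""
    for _ in range(iters):
        x = max((x - step, x, x + step), key=reward)
    return x

print(greedy_climb(0.0))  # ends up around 1.0: the small peak that happened to be nearby
print(greedy_climb(4.0))  # ends up around 6.0: it only finds the big peak if it starts close enough
</code>

Getting from the first answer to the second would require stepping downhill for a while first, which is exactly what a purely greedy learner (or a brain chasing immediate gratification) refuses to do.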
  
This is the same reason why hitting your child or talking down to it does NOT work. It doesn't do anything. If a kid just keeps doing a thing you would rather it did not, then it's because it found a local optimum with the strategies it has available to maximize its reward function ('gratification'). Unless you make that particular spot of the reward curve a massive plunging black hole that is so immediately punishing that even the territories around that behavior are less painful than what it's doing right now, you're not helping (and if you do that, you're just a traumatizing monster). The kid will just do what is literally the next closest thing to what it's currently doing - NOT what you want it to do.

What you need to do instead is to find where on the reward curve the kid is and which strategies it uses to get there, then shape the world around it in a way such that an immediately neighboring strategy is more conducive to its reward function than the current one, because then it will automatically gravitate towards that. That way you can slowly guide it into new strategies. Basically, to reach an even higher peak you first have to descend from your current mountain, and you cannot slap a kid until it falls off the mountain.
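Sticking with the invented two-peak curve from the sketch above, here is roughly what that kind of guidance looks like in code: the learner stays exactly as short-sighted as before, and only the reward landscape changes, with stepping stones added between the habit it has and the habit you want. The ramp below is made up purely for illustration.

<code python>
import math

def reward(x: float) -> float:
    # Same invented two-peak curve as in the previous sketch.
    return 3 * math.exp(-(x - 1) ** 2) + 10 * math.exp(-(x - 6) ** 2)

def shaped_reward(x: float) -> float:
    """Shape the world instead of the learner: add a ramp of small intermediate
    rewards from the small peak (1, 3) up to the big one (6, 10), so each little
    step in the right direction already beats staying put."""
    ramp = 3 + (x - 1) * (10 - 3) / (6 - 1) if 1 <= x <= 6 else float("-inf")
    return max(reward(x), ramp)

def climb(x: float, score, step: float = 0.1, iters: int = 1000) -> float:
    # Same myopic learner as before: only accepts immediate improvements.
    for _ in range(iters):
        x = max((x - step, x, x + step), key=score)
    return x

print(climb(0.0, reward))         # stuck at the small peak, around 1.0
print(climb(0.0, shaped_reward))  # walks the ramp and ends up near the big peak at 6.0
</code>

The second call succeeds not because the learner got smarter or was punished harder, but because every neighboring step towards the better strategy was made slightly more rewarding than staying put.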
  
https://www.youtube.com/watch?v=EWjUY_3ubf4