Standardizing the Same Game test

Leress
Prince
Posts: 2770
Joined: Fri Mar 07, 2008 7:54 pm

Post by Leress »

fliprushman wrote: I do pay attention to these tests despite the questions about their validity. One thing I have noticed is what people set up as the encounters: each person does something different for the challenges. I think these tests need a common set of challenges run against several different builds, not just the more powerful or optimized ones. Maybe take the character through a written adventure solo, see whether the character survives each encounter at least 50% of the time, and record those results. I think the results would have a little more validity and show what is actually weak in the class and how it truly compares to the other classes.
This person does have a point: multiple people have run the Same Game test with varying testing parameters.
  • Squirrelloid: 1, 4, 7, etc. until it stops being competitive, with an average non-optimized build. Uses monsters on home turf.
  • Bigode: 5, 10, 15 with a standard build with some utilitarian items, a couple of mundane items, and mostly pure number boosters. Also uses monsters on home turf. For each challenge does x out of 3.
  • Leress: 10 with a slightly optimized build with pure number boosters and rarely any utilitarian items. Uses monsters on home turf.
These are just a few ways that the Same Game test has been done. I would like more guidelines for using the Same Game test, to make its results and data gathering more unified. I would suggest:
  • Test levels 5, 7, 10, and 15 since these seem like the most reasonable places to look for cracks in the classes.
  • 30 point buy
  • Pure number boosters (belts of strength)
  • Few Utilitarian items (sap, torch, flour)
  • x out of 3 per challenge of that level to determine effectiveness
  • Using the combat tactics of the creature
  • Using the environment the creature is usually said to live in. If "any" then use a dungeon.
  • Feats from at least PHB, MM1, and Book X**
  • Falling into a pit trap is considered a loss. (It doesn't matter if they take no damage from it)


Doing this may help with some parts of the discussion about the game when using the Same Game test.*

*I am mostly talking about Pathfinder discussions.
**If the class is put into a supplement, then the feats that are in that book as well.

I know this won't solve every discussion problem around the Same Game test, but it should let those who use it go through the data a little faster.
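
For anyone who wants to log results the same way, here is a minimal sketch of the suggested parameters written down as a shared record. The class name, field names, and the `record_challenge` helper are made up for illustration; only the values (levels 5, 7, 10, 15; 30 point buy; x out of 3 per challenge; pit trap counts as a loss) come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SameGameProtocol:
    """Hypothetical container for the suggested shared test parameters."""
    test_levels: tuple = (5, 7, 10, 15)
    point_buy: int = 30
    trials_per_challenge: int = 3          # "x out of 3"
    pit_trap_counts_as_loss: bool = True   # even if no damage is taken


def record_challenge(outcomes, protocol=SameGameProtocol()):
    """Summarize one challenge as wins out of the agreed number of trials.

    outcomes: list of booleans, one per trial, True meaning a win.
    """
    if len(outcomes) != protocol.trials_per_challenge:
        raise ValueError(
            "expected %d trials, got %d"
            % (protocol.trials_per_challenge, len(outcomes))
        )
    wins = sum(outcomes)
    return {"wins": wins, "trials": len(outcomes), "win_rate": wins / len(outcomes)}


# Example: a character wins two of the three runs against one challenge.
result = record_challenge([True, False, True])
print(result["wins"], "out of", result["trials"])
```

The point of fixing the number of trials in one place is just that two testers cannot accidentally compare a "2 out of 3" against a "2 out of 5".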
Last edited by Leress on Wed May 21, 2008 3:31 pm, edited 4 times in total.
virgil
King
Posts: 6339
Joined: Fri Mar 07, 2008 7:54 pm

Post by virgil »

Now if only you can get the people to stop screaming about how solo playtests mean nothing.

(paraphrase) "Because characters can be made stupidly, and thus fail all tests, performance in solo is meaningless."...*grumble*
Come see Sprockets & Serials
How do you confuse a barbarian?
Put a greatsword a maul and a greataxe in a room and ask them to take their pick
EXPLOSIVE RUNES!
Leress
Prince
Posts: 2770
Joined: Fri Mar 07, 2008 7:54 pm

Post by Leress »

virgileso wrote:Now if only you can get the people to stop screaming about how solo playtests mean nothing.

(paraphrase) "Because characters can be made stupidly, and thus fail all tests, performance in solo is meaningless."...*grumble*
Well, there really is no helping them, since apparently they think that not using the very stick that Pathfinder is using is bad. Yes, running the class in an actual adventure is good and valid, but it doesn't really show how a class performs by itself.

Do you have any suggestions on what to add to/ adjust on the list?

Because I will be damned if I do a full playtest at each level when the chances of even a slight change are slim to none. It's a bloody alpha; this is the best time to beat the ever-loving shit out of it.
UmaroVI
Journeyman
Posts: 116
Joined: Sat May 17, 2008 2:05 am

Post by UmaroVI »

The "ideal" way to test, I think, would be to make a default party of 3 characters of good but straightforward classes/builds (in the sense of - not dependent on having specific other characters in the group or on highly skilled playing), and then throw them up against challenges hard enough that they aren't going to always win on their own (probably EL +4), then run them through the exact same challenges with a varying 4th party member, and see what happens. Include the caveat that spellcasters vary memorized spells based on the 4th member, but they don't change the rest of their build.

Of course, this takes more time per test and requires more tests due to more variables, so I'm not sure it's feasible.
Username17
Serious Badass
Posts: 29894
Joined: Fri Mar 07, 2008 7:54 pm

Post by Username17 »

Calibrating such a 4th man test would also be next to impossible. To set up the initial 3 characters you'd have to already have a set of 3 characters who you had some reason to believe filled their end of the balance bargain as a whole. Meaning that you'd basically first have to run up some large number of 4 man tests and have them come out balanced, and then run an even larger number of 4 man tests varying the 4th man in order to demonstrate somehow that the original 4 man tests weren't being skewed by a strangely performing fixed 3-man group.

In order to even begin to calibrate a 3 man group you'd need at least 5 classes to play with. Then you'd take those five classes and make 5 different 4 man groups out of them and then run each of them through sufficient numbers of Same Game challenges that you could do statistics on the results (a minimum of say 100 regressions each so you could get percentages). Then you'd take the challenges and take the percentage shifts caused by the variant team makeup to estimate a percentage contribution by each member, and then hope that one of the arrangements of 3 men happens to add up to a 75% contribution between them. And if none of them do, or the numbers don't line up in a sensible way that allows you to get consistent numbers for some of the classes, you have to increase the numbers of classes involved (with 6 classes to worry about you need to go through the trials 15 times, with 7 you need to do it 35 times).
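
The group counts in the paragraph above (5, 15, and 35) match the number of ways to pick 4 classes out of 5, 6, or 7. A quick sketch to check that, and to see how the total workload grows at the 100 runs per group mentioned above; the function name is made up for illustration.

```python
from math import comb


def four_man_groups(num_classes, party_size=4):
    """Number of distinct parties of `party_size` drawn from `num_classes` classes."""
    return comb(num_classes, party_size)


RUNS_PER_GROUP = 100  # the "minimum of say 100" from the post

for n in (5, 6, 7):
    groups = four_man_groups(n)
    print(n, "classes:", groups, "groups,", groups * RUNS_PER_GROUP, "runs")
```

Each extra class multiplies the number of parties to test, which is the main reason the calibration step gets expensive so quickly.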

That's... possible, I just don't want to do it. No one else wants to do it either, which is why we use single character tests. It's slightly less accurate I suppose, at least in theory. But doing more regressions makes the prospect more accurate, and if you take say 25% as long to do a test you can regress four times as many run throughs, which makes up the accuracy you'd have lost many times over.

Statistics isn't an exact science, it's an approximate science. And you aren't going to be able to get answers if you aren't willing to make the kinds of approximations that allow you to get answers.

-Username17
UmaroVI
Journeyman
Posts: 116
Joined: Sat May 17, 2008 2:05 am

Post by UmaroVI »

Well, I agree that what you describe would be incredibly hard to do, but would also be the best way to test it. The question, I guess, is whether the conclusions would be stronger (enough stronger to be worth the trouble) than the single-character tests, if one did a more limited form of the 4th-man test that stuck with a fixed 3-man team plus a variable fourth member.

To be honest, having read the Paizo forums, I think that if God Almighty descended from the heavens with a chorus of angels trumpeting that melee was underpowered compared to casters starting at about 7th level, it wouldn't change a damn thing, and hence I hesitate to actually try to run testing on the level that would firmly establish anything.

As far as the last statement goes, a lot of people seem to be suggesting the hypothesis that Fighters are a "team player" who can only be effective as part of a team, whereas casters are better in solo encounters. I don't think this is true at all, but it is consistent with the data once you remove common sense and logic from the equation, and running a lot of tests to establish that casters do much better than melee as solo adventurers doesn't really disprove it.

I'm totally with you on the "it's possible, but probably not worth the trouble" though. Like I said, I don't think Paizo or most of their population at the forums would care a whole lot regardless of how accurate the tests are. Still, I think 4th man tests would be interesting to run, at least on a conceptual level.
Last edited by UmaroVI on Sun May 18, 2008 8:28 pm, edited 1 time in total.
Username17
Serious Badass
Posts: 29894
Joined: Fri Mar 07, 2008 7:54 pm

Post by Username17 »

Umarov wrote:Well, I agree that what you describe would be incredibly hard to do, but would also be the best way to test it.
No. It would be the method which has the highest theoretical upper end of accuracy. That's not the same as being the best way to test things. The best way to test is the way that gives you usable data with the least amount of work.

Or to put things in very blunt terms: I can see if things are statistically close to 50% in less than 50 tests. So in the 500+ tests that it would take the above method to even have a chance to be able to start, I could run all 11 classes through a baseline same game challenge and get a very good idea on how balanced they all were. And since we're talking single character tests, we'd get through each of the tests more quickly.

Then, instead of moving on to the actual tests for each of the classes (assuming that the numbers happened to throw up a usable 3-man template on the first go), we'd already have a list of which classes we were reasonably confident were underperforming, too good, catastrophically weak, or ungodly powerful, and we could go on to testing at other levels or proposing rule changes and retesting the effects.

You can't just try to get Planck accuracy on everything you do, or you'll never finish. You need to design experiments in such a manner that you can get meaningful results with the labor pool you have in a reasonable amount of time. Nothing is ever certain. Science does not, and cannot, give you absolute answers. The trick is figuring out how much certainty you need and putting in enough resources to get that much certainty without taking the rest of your resources and lighting them on fire chasing mirages.
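
To put rough numbers on "how much certainty you need", here is a small sketch of the margin of error on a measured win rate after a given number of runs. The 95% level (z = 1.96) and the normal approximation are assumptions chosen for illustration, not something stated in the thread, and the function name is made up.

```python
import math


def margin_of_error(n_trials, p=0.5, z=1.96):
    """Approximate half-width of the confidence interval for a win rate.

    Uses the normal approximation to the binomial; p=0.5 is the worst case.
    """
    return z * math.sqrt(p * (1 - p) / n_trials)


for n in (50, 200):
    print(n, "runs: about +/-", round(100 * margin_of_error(n), 1), "points")
```

Under those assumptions, 50 runs pin a win rate down to roughly 14 percentage points either way, and running four times as many tests halves that margin, since the error shrinks with the square root of the number of runs.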

-Username17
UmaroVI
Journeyman
Posts: 116
Joined: Sat May 17, 2008 2:05 am

Post by UmaroVI »

I should probably have said "ideal" instead of "best."

You raise a very good point about the difficulties inherent in the 4th man test. I now agree that it's too labor-intensive to be done with enough accuracy to be worthwhile, as opposed to merely being a lot of work.

It is unfortunate that there's no good way to test the effectiveness of "force-multiplier" characters, or those claimed to work as such. Characters like this are always tricky to playtest, and I've noticed that in D&D they tend to be impressively unbalanced. Stuff like the Marshal (subpar except as a quick dip), Healer (godawful), White Raven Tactics (badly overpowered for how easy it is to get), etc. I guess this says a lot about why such things are hard to evaluate.
Draco_Argentum
Duke
Posts: 2434
Joined: Fri Mar 07, 2008 7:54 pm

Post by Draco_Argentum »

I tend to think force multiplier classes are a bad idea. They scale very poorly with the number of players in the group. They don't work well if the group gets split. They are highly susceptible to death spirals since their power drops considerably when the group starts to get KOed.



Leress is right. There should be a standard same game test. Otherwise it's hardly the same game. Frank actually did ask what people wanted to be the standard challenges for TNE. Unfortunately there didn't seem to be a lot of agreement.
Leress
Prince
Posts: 2770
Joined: Fri Mar 07, 2008 7:54 pm

Post by Leress »

Another thing is feat selection. Some feats give new abilities, while others just give bonuses to whatever. What would be the proper pool of feats to choose from?

I would suggest at least PHB and MM1.
Last edited by Leress on Tue May 20, 2008 3:04 am, edited 1 time in total.
Leress
Prince
Posts: 2770
Joined: Fri Mar 07, 2008 7:54 pm

Post by Leress »

Same Game Party Test

*Being Revised*
Last edited by Leress on Wed May 21, 2008 10:42 pm, edited 1 time in total.
UmaroVI
Journeyman
Posts: 116
Joined: Sat May 17, 2008 2:05 am

Post by UmaroVI »

I'm not sure that really works. If a 5th level fighter has a 25% chance of beating an animated object, and a 5th level barbarian has a 25% chance of beating an animated object, both of them working together most likely have better than a 50% chance, but that seems to be what you're saying.
Leress
Prince
Posts: 2770
Joined: Fri Mar 07, 2008 7:54 pm

Post by Leress »

UmaroVI wrote:I'm not sure that really works. If a 5th level fighter has a 25% chance of beating an animated object, and a 5th level barbarian has a 25% chance of beating an animated object, both of them working together most likely have better than a 50% chance, but that seems to be what you're saying.
That is not exactly what I am saying, so that means I need to rewrite the post.
Bigode
Duke
Posts: 2246
Joined: Fri Mar 07, 2008 7:54 pm

Post by Bigode »

Leress wrote:
  • Test levels 5, 7, 10, and 15 since these seem like the most reasonable places to look for cracks in the classes.
  • 30 point buy
  • Pure number boosters (belts of strength)
  • Few Utilitarian items (sap, torch, flour)
  • x out of 3 per challenge of that level to determine effectiveness
  • Using the combat tactics of the creature
  • Using the environment the creature is usually said to live in. If "any" then use a dungeon.
  • Feats from at least PHB, MM1, and Book X**
  • Falling into a pit trap is considered a loss. (It doesn't matter if they take no damage from it)
What springs to mind's mostly: we'd need an EL 7 list (yeah, I know it can be hacked up by doubling EL 5 stuff - not sure if it's the best), and "disabling" traps with one's own face's something that people really do, willingly or not, and it really does save resources when people wind up whole on the other side, so it should be as much a win as any other,
Draco_Argentum wrote:I tend to think force multiplier classes are a bad idea. They scale very poorly with the number of players in the group. They don't work well if the group gets split. They are highly susceptible to death spirals since their power drops considerably when the group starts to get KOed.
What I'm thinking about that's to ensure classes conceived as force multipliers have bigger, single-person boosts on call: since threat's scaled down for smaller groups, the fact that a single-person effect gives the bigger benefit to a bigger share of the party pays back the loss in whole-party benefit.
Hans Freyer, s.b.u.h. wrote:A manly, a bold tone prevails in history. He who has the grip has the booty.
Huston Smith wrote:Life gives us no view of the whole. We see only snatches here and there, (...)
brotherfrancis75 wrote:Perhaps you imagine that Ayn Rand is our friend? And the Mont Pelerin Society? No, those are but the more subtle versions of the Bolshevik Communist Revolution you imagine you reject. (...) FOX NEWS IS ALSO COMMUNIST!
LDSChristian wrote:True. I do wonder which is worse: killing so many people like Hitler did or denying Christ 3 times like Peter did.
Leress
Prince
Posts: 2770
Joined: Fri Mar 07, 2008 7:54 pm

Post by Leress »

Bigode wrote:
Leress wrote:
  • Test levels 5, 7, 10, and 15 since these seem like the most reasonable places to look for cracks in the classes.
  • 30 point buy
  • Pure number boosters (belts of strength)
  • Few Utilitarian items (sap, torch, flour)
  • x out of 3 per challenge of that level to determine effectiveness
  • Using the combat tactics of the creature
  • Using the environment the creature is usually said to live in. If "any" then use a dungeon.
  • Feats from at least PHB, MM1, and Book X**
  • Falling into a pit trap is considered a loss. (It doesn't matter if they take no damage from it)
What springs to mind's mostly: we'd need an EL 7 list (yeah, I know it can be hacked up by doubling EL 5 stuff - not sure if it's the best), and "disabling" traps with one's own face's something that people really do, willingly or not, and it really does save resources when people wind up whole on the other side, so it should be as much a win as any other,
This is what Squirrelloid has as an EL 7 challenge:
  • 40 ft spiked pit trap with a proximity trigger Fireball (8d6) at the bottom.
  • A Chimera
  • A Succubus
  • A Huge Air Elemental
  • A Lillend
  • A Spectre
  • A pair of Achaierai
  • A pair of Green Hags
  • Six Chokers
  • An Elf Wizard 7
Talisman
Duke
Posts: 1109
Joined: Fri Mar 07, 2008 7:54 pm
Location: The Cliffs of Insanity!

Post by Talisman »

Leress wrote:Test levels 5, 7, 10, and 15 since these seem like the most reasonable places to look for cracks in the classes.
Is there a reason super-low-level (say, 1-3) is left off here? Most D&D games I've played in start at level 1 or 3, and it'd suck to either (1) be Captain Awesome for two levels, then find out you suck, or (2) labor through a few levels of relative uselessness before the fun kicks in.

In a similar vein, why nothing higher than 15th? I'm guessing it's because 9th-level spells are gained at level 17, at which point the game starts to break apart like the Enterprise at Warp 10.
MartinHarper wrote:Babies are difficult to acquire in comparison to other sources of nutrition.
JonSetanta
King
Posts: 5525
Joined: Fri Mar 07, 2008 7:54 pm
Location: interbutts

Post by JonSetanta »

I've found 1 to 2 extremely similar but for most classes there is a leap in power at 3-4. That's also, hopefully, where most of the best low-level adventures begin.
(Almost) no one wants their carefully backstoried character to die to ratbites in some Faerun sewer within the first half hour just because the game began at L1.
Perhaps that's why 4e and the (sad) d20 WoD began at a pseudo-level-2 and 4 respectively.
So, yeah, 3 is fine to me for beginning an analysis.
The Adventurer's Almanac wrote:
Fri Oct 01, 2021 10:25 pm
Nobody gives a flying fuck about Tordek and Regdar.
Draco_Argentum
Duke
Posts: 2434
Joined: Fri Mar 07, 2008 7:54 pm

Post by Draco_Argentum »

Bigode wrote:What springs to mind's mostly: we'd need an EL 7 list (yeah, I know it can be hacked up by doubling EL 5 stuff - not sure if it's the best),
It's not; doing that would skew the challenges towards larger numbers of enemies. In D&D you sometimes do face a single tough dude.



Buff bots whose buffs get better as they affect fewer people are potentially workable. It would still need more than the same game challenge to balance them. They would by definition not have the same power in a group as they would solo, so the rubric behind the SGC wouldn't apply.
Bigode
Duke
Posts: 2246
Joined: Fri Mar 07, 2008 7:54 pm

Post by Bigode »

Idea of the day: how about cutting the crap and deeming the necromancer a dread (Heroes of Horror) one? Not that I'd want to sidestep making a stat block for, for example, the dragon. Also, a thing I'd thought (a little) about was writing down the terrain for each battle. Thoughts?
fliprushman
1st Level
Posts: 30
Joined: Sat Sep 27, 2008 6:05 am
Location: Pacific, WA

Post by fliprushman »

I'm surprised to see that this made it here. I would be game to try these tests again, but I need a little help with them. The last time I ran them, I realized that there were other options to explore because there is no standardized build of a character. The only things I was able to standardize were attributes, using the elite progression, which puts all characters on an equal footing, and feats/spells/skills, which could only come from the PHB. So if we could come up with a standardized build as well as an adventure for all the characters to go through, these tests could work. If one standardized build can't be agreed on, how about coming up with three categories of builds: an underpowered one, an average one, and an optimized one? Just some ideas to help with the tests.
Username17
Serious Badass
Posts: 29894
Joined: Fri Mar 07, 2008 7:54 pm

Post by Username17 »

The last time I ran them, I realized that there were other options to explore because there is no standardized build of a character.
The Same Game test can't show you that you can't make a balanced character. But it can show you that you can make a balanced character.

Any build you try could happen to be really weak. So underperforming at the challenge doesn't give you certainty. If a build you try out blows the challenge away though, that does indicate that at least one available build is too powerful - and that comes with a fair degree of certainty.

-Username17
Bigode
Duke
Posts: 2246
Joined: Fri Mar 07, 2008 7:54 pm

Post by Bigode »

There's no standardized build. The thing actually worth talking about's how to standardize the individual encounters. And forget about adventures; not only will they vary a lot anyway, they'll also be highly subject to whim in practice - way more than how fights play out.