Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support PyTensor deterministic operations as observations #7656

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

wd60622
Copy link
Contributor

@wd60622 wd60622 commented Jan 24, 2025

Description

Will need some help on this implementation and how to best test this.

Related Issue

Checklist

Type of change

  • New feature / enhancement
  • Bug fix
  • Documentation
  • Maintenance
  • Other (please specify):

📚 Documentation preview 📚: https://pymc--7656.org.readthedocs.build/en/7656/

@github-actions github-actions bot added the bug label Jan 24, 2025
pymc/data.py Outdated Show resolved Hide resolved
pymc/data.py Outdated Show resolved Hide resolved
@ricardoV94
Copy link
Member

We need to tell InferenceData converter how to get the data values to put in constant_data, and observed_data. It had some hardcode logic assuming you could only due Casts

@ricardoV94
Copy link
Member

I don't consider this a bugfix, it's behavior that was explicitly forbidden for conservative reasons (more like a NotImplementedError)

@wd60622
Copy link
Contributor Author

wd60622 commented Jan 25, 2025

We need to tell InferenceData converter how to get the data values to put in constant_data, and observed_data. It had some hardcode logic assuming you could only due Casts

Where does this happen in the code?

@wd60622
Copy link
Contributor Author

wd60622 commented Jan 25, 2025

I think the MiniBatch tests will still fail. Any thoughts on how that should behave?

@wd60622
Copy link
Contributor Author

wd60622 commented Jan 25, 2025

Also, feel free to choose a better title! I couldn't express it too well

scale = 12
scaled_target = target / scale
mu = pm.Normal("mu", mu=0, sigma=1)
pm.Normal("x", mu=mu, sigma=1, observed=scaled_target)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I sample this to check that it has the correct data in the InferenceData?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we have more direct ways of testing it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well maybe. Just make sure to do a cheap sampling, since we don't care about draws at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested the "extract_..." function directly

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's a more direct unit test, this would be a CI, we don't care how it's done just want to be sure the data is there in the end?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still want some sampling here to check the outputs? This test is not testing anything explicitly at the moment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add sampling to this test when I can

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@ricardoV94
Copy link
Member

ricardoV94 commented Jan 25, 2025

We need to tell InferenceData converter how to get the data values to put in constant_data, and observed_data. It had some hardcode logic assuming you could only due Casts

Where does this happen in the code?

obs_data = extract_obs_data(aux_obs)

def extract_obs_data(x: TensorVariable) -> np.ndarray:

We should be able to just use constant_fold (also in pytensor) for it

@ricardoV94 ricardoV94 changed the title Support ops to pm.Data in observed variables Support pytensor deterministic operations as observations Jan 25, 2025
@ricardoV94 ricardoV94 changed the title Support pytensor deterministic operations as observations Support PyTensor deterministic operations as observations Jan 25, 2025
@wd60622
Copy link
Contributor Author

wd60622 commented Jan 25, 2025

We should be able to just use constant_fold (also in pytensor) for it

I have an implementation before this suggestion. I need a little help understanding this. Feel free to put suggestion

@ricardoV94
Copy link
Member

We should be able to just use constant_fold (also in pytensor) for it

I have an implementation before this suggestion. I need a little help understanding this. Feel free to put suggestion

What do you mean? My suggestion is to simply call constant_fold on the observed variable

@wd60622
Copy link
Contributor Author

wd60622 commented Jan 28, 2025

I am not able to make this work with either constant_fold(x) or constant_fold([x]). Both return errors. Is the constant_fold from pymc.pytensorf or from pytensor?

@ricardoV94
Copy link
Member

I am not able to make this work with either constant_fold(x) or constant_fold([x]). Both return errors. Is the constant_fold from pymc.pytensorf or from pytensor?

From pymc.pytensorf. There's a raise_if_not_constant flag you can set, but what cases is it failing?

@ricardoV94
Copy link
Member

Ah if it's a SharedVariable like pm.Data of course constant_fold is not going to work... Dummy me.

@ricardoV94
Copy link
Member

ricardoV94 commented Jan 28, 2025

from pytensor.compile.mode import Mode
import pymc as pm

with pm.Model(coords={"date": [0, 1, 2]}) as m:
    data = pm.Data("data", [0, 1, 2], dims="date")
    x = pm.Normal("x")
    y = pm.Normal("y", x, observed=data, dims="date")

cheap_eval_mode = Mode(linker="py", optimizer=None)
m.rvs_to_values[y].eval(mode=cheap_eval_mode)
# array([0., 1., 2.])

@wd60622
Copy link
Contributor Author

wd60622 commented Jan 28, 2025

Thanks for the suggestion. I've used mode on the random variable directly. Is it different to get the rvs_to_values from the model? it currently isn't accessed from the function at the moment.

@wd60622
Copy link
Contributor Author

wd60622 commented Jan 28, 2025

Could you point me to similar tests for this?

@wd60622
Copy link
Contributor Author

wd60622 commented Jan 28, 2025

I've just added to associated pytensorf test suite

pymc/data.py Outdated Show resolved Hide resolved
@wd60622
Copy link
Contributor Author

wd60622 commented Jan 29, 2025

There are some failing tests related to the Mini batch also using the is_valid_observed function as well. How should those be handled. Are those test tests no longer valid?

@ricardoV94
Copy link
Member

There are some failing tests related to the Mini batch also using the is_valid_observed function as well. How should those be handled. Are those test tests no longer valid?

Yeah sounds like the test should be changed to the more strict kind of stuff that we still don't allow, like having a pt.random.normal in it.

@wd60622
Copy link
Contributor Author

wd60622 commented Jan 29, 2025

Looks like a random test failure?

Will address rest of feedback when I have chance

@ricardoV94
Copy link
Member

ricardoV94 commented Jan 29, 2025

Looks like a random test failure?

Yeah that one is flaky, we should probably try/except on that error and just skip when it happens

@cetagostini
Copy link

Are we far from merge here?

@wd60622
Copy link
Contributor Author

wd60622 commented Feb 8, 2025

Hi @ricardoV94
Any modifications or additional tests needed for this?

Copy link

codecov bot commented Feb 8, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.65%. Comparing base (358b825) to head (2269bd6).
Report is 2 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #7656      +/-   ##
==========================================
- Coverage   92.70%   92.65%   -0.06%     
==========================================
  Files         107      107              
  Lines       18391    18327      -64     
==========================================
- Hits        17050    16981      -69     
- Misses       1341     1346       +5     
Files with missing lines Coverage Δ
pymc/data.py 84.39% <100.00%> (-4.83%) ⬇️
pymc/pytensorf.py 89.76% <100.00%> (-0.91%) ⬇️

... and 1 file with indirect coverage changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: data as observed in RV
3 participants