Support PyTensor deterministic operations as observations #7656

wd60622 · 2025-01-24T20:16:41Z

Description

Will need some help on this implementation and how to best test this.

Related Issue

Closes BUG: data as observed in RV #7649
Related to #

Checklist

Checked that the pre-commit linting/style checks pass
Included tests that prove the fix is effective or that the new feature works
Added necessary documentation (docstrings and/or example notebooks)
If you are a pro: each commit corresponds to a relevant logical change

Type of change

📚 Documentation preview 📚: https://pymc--7656.org.readthedocs.build/en/7656/

pymc/data.py

ricardoV94 · 2025-01-25T13:08:56Z

We need to tell the InferenceData converter how to get the data values to put in constant_data and observed_data. It had some hardcoded logic assuming you could only do Casts.

ricardoV94 · 2025-01-25T13:09:54Z

I don't consider this a bugfix, it's behavior that was explicitly forbidden for conservative reasons (more like a NotImplementedError)

wd60622 · 2025-01-25T15:31:41Z

We need to tell the InferenceData converter how to get the data values to put in constant_data and observed_data. It had some hardcoded logic assuming you could only do Casts.

Where does this happen in the code?

wd60622 · 2025-01-25T15:32:16Z

I think the Minibatch tests will still fail. Any thoughts on how that should behave?

wd60622 · 2025-01-25T15:33:11Z

Also, feel free to choose a better title! I couldn't express it very well.

wd60622 · 2025-01-25T15:54:40Z

tests/test_data.py

+        scale = 12
+        scaled_target = target / scale
+        mu = pm.Normal("mu", mu=0, sigma=1)
+        pm.Normal("x", mu=mu, sigma=1, observed=scaled_target)


Should I sample this to check that it has the correct data in the InferenceData?

No, we have more direct ways of testing it

Well maybe. Just make sure to do a cheap sampling, since we don't care about draws at all?

I've tested the "extract_..." function directly

Yeah that's a more direct unit test, this would be a CI, we don't care how it's done just want to be sure the data is there in the end?

I still want some sampling here to check the outputs? This test is not testing anything explicitly at the moment

I will add sampling to this test when I can

ricardoV94 · 2025-01-25T16:25:44Z

We need to tell the InferenceData converter how to get the data values to put in constant_data and observed_data. It had some hardcoded logic assuming you could only do Casts.

Where does this happen in the code?

pymc/pymc/backends/arviz.py

Line 63 in fa43eba

obs_data = extract_obs_data(aux_obs)

pymc/pymc/pytensorf.py

Line 152 in fa43eba

def extract_obs_data(x: TensorVariable) -> np.ndarray:

We should be able to just use constant_fold (also in pytensor) for it

wd60622 · 2025-01-25T21:03:07Z

We should be able to just use constant_fold (also in pytensor) for it

I had an implementation before this suggestion. I need a little help understanding it. Feel free to post a suggestion.

ricardoV94 · 2025-01-26T18:27:15Z

We should be able to just use constant_fold (also in pytensor) for it

I had an implementation before this suggestion. I need a little help understanding it. Feel free to post a suggestion.

What do you mean? My suggestion is to simply call constant_fold on the observed variable

wd60622 · 2025-01-28T12:57:59Z

I am not able to make this work with either constant_fold(x) or constant_fold([x]). Both return errors. Is the constant_fold from pymc.pytensorf or from pytensor?

ricardoV94 · 2025-01-28T12:59:10Z

I am not able to make this work with either constant_fold(x) or constant_fold([x]). Both return errors. Is the constant_fold from pymc.pytensorf or from pytensor?

From pymc.pytensorf. There's a raise_if_not_constant flag you can set, but in which cases is it failing?

ricardoV94 · 2025-01-28T13:01:01Z

Ah if it's a SharedVariable like pm.Data of course constant_fold is not going to work... Dummy me.

ricardoV94 · 2025-01-28T13:02:21Z

from pytensor.compile.mode import Mode
import pymc as pm

with pm.Model(coords={"date": [0, 1, 2]}) as m:
    data = pm.Data("data", [0, 1, 2], dims="date")
    x = pm.Normal("x")
    y = pm.Normal("y", x, observed=data, dims="date")

cheap_eval_mode = Mode(linker="py", optimizer=None)
m.rvs_to_values[y].eval(mode=cheap_eval_mode)
# array([0., 1., 2.])

wd60622 · 2025-01-28T16:53:31Z

Thanks for the suggestion. I've used the mode on the random variable directly. Is that different from getting rvs_to_values from the model? The model isn't currently accessed from the function.

wd60622 · 2025-01-28T16:54:51Z

Could you point me to similar tests for this?

wd60622 · 2025-01-28T17:06:44Z

I've just added a test to the associated pytensorf test suite.

pymc/data.py

pymc/pytensorf.py

wd60622 · 2025-01-29T07:43:24Z

There are some failing tests related to Minibatch, which also uses the is_valid_observed function. How should those be handled? Are those tests no longer valid?

ricardoV94 · 2025-01-29T14:31:56Z

There are some failing tests related to Minibatch, which also uses the is_valid_observed function. How should those be handled? Are those tests no longer valid?

Yeah, sounds like the tests should be changed to cover the stricter cases that we still don't allow, like an observed expression with a pt.random.normal in it.

tests/test_data.py

wd60622 · 2025-01-29T20:28:43Z

Looks like a random test failure?

Will address the rest of the feedback when I have a chance.

ricardoV94 · 2025-01-29T22:12:18Z

Looks like a random test failure?

Yeah, that one is flaky. We should probably try/except on that error and just skip when it happens.
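One way the try/except-and-skip idea could look, as a sketch using only the standard library. `FlakyBackendError` and `skip_on` are hypothetical names; the thread does not say which error the flaky test raises. In a pytest suite, calling `pytest.skip` inside the `except` branch would be the more usual choice.

```python
import functools
import unittest


class FlakyBackendError(RuntimeError):
    """Hypothetical stand-in for the intermittent error seen in CI."""


def skip_on(exc_type, reason):
    """Turn a known intermittent exception into a skipped test."""
    def decorator(test_func):
        @functools.wraps(test_func)
        def wrapper(*args, **kwargs):
            try:
                return test_func(*args, **kwargs)
            except exc_type as err:
                # Any other exception still fails the test as usual.
                raise unittest.SkipTest(f"{reason}: {err}") from err
        return wrapper
    return decorator


@skip_on(FlakyBackendError, "known flaky failure")
def flaky_test():
    raise FlakyBackendError("intermittent failure")


@skip_on(FlakyBackendError, "known flaky failure")
def stable_test():
    return "ok"
```

Catching only the specific exception type keeps genuine regressions visible, since everything else still propagates as a failure.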

cetagostini · 2025-02-01T23:57:20Z

Are we far from merge here?

wd60622 · 2025-02-08T11:14:35Z

Hi @ricardoV94
Any modifications or additional tests needed for this?

codecov · 2025-02-08T11:44:08Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.65%. Comparing base (358b825) to head (2269bd6).
Report is 2 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #7656      +/-   ##
==========================================
- Coverage   92.70%   92.65%   -0.06%     
==========================================
  Files         107      107              
  Lines       18391    18327      -64     
==========================================
- Hits        17050    16981      -69     
- Misses       1341     1346       +5

| Files with missing lines | Coverage Δ | |
|---|---|---|
| pymc/data.py | `84.39% <100.00%> (-4.83%)` | ⬇️ |
| pymc/pytensorf.py | `89.76% <100.00%> (-0.91%)` | ⬇️ |

... and 1 file with indirect coverage changes

add a test case

e3f66d9

github-actions bot added the bug label Jan 24, 2025

wd60622 added 2 commits January 24, 2025 21:50

check for random ancestors as well

7b39d7a

check for inputs first

b36e573

ricardoV94 reviewed Jan 25, 2025

pymc/data.py (outdated)

ricardoV94 reviewed Jan 25, 2025

pymc/data.py (outdated)

ricardoV94 added enhancements and removed bug labels Jan 25, 2025

use existing function

7f71397

eval for the observed_data group

b182686

wd60622 commented Jan 25, 2025

ricardoV94 changed the title ~~Support ops to pm.Data in observed variables~~ Support pytensor deterministic operations as observations Jan 25, 2025

ricardoV94 changed the title ~~Support pytensor deterministic operations as observations~~ Support PyTensor deterministic operations as observations Jan 25, 2025

specify the mode

b2a3d1e

add test to extract function

acd22f3

ricardoV94 reviewed Jan 28, 2025

pymc/data.py (outdated)

ricardoV94 reviewed Jan 28, 2025

pymc/pytensorf.py

wd60622 added 2 commits January 29, 2025 08:34

simplify and remove helper function

56ad68b

check for variable having inputvars

6b410f3

allowing for minibatch of pytensor operations

2d753b3

ricardoV94 reviewed Jan 29, 2025

tests/test_data.py

cetagostini mentioned this pull request Feb 2, 2025

Out-of-Box MultiDimensional MMM pymc-labs/pymc-marketing#1036 (merged)

wd60622 added 2 commits February 8, 2025 12:09

case that doesnt work now

f1be187

change the error message

77f7384

ricardoV94 added the model label Feb 10, 2025

wd60622 and others added 2 commits February 12, 2025 19:22

Merge branch 'main' into data-as-observed

89561ec

add sample to check observed_data

2269bd6


