Thank you for your impressive work and the valuable contributions you have made!
I noticed that the way you calculate ray directions differs slightly from NeRF in the following lines:
```python
directions = F.pad(
    torch.stack(
        [
            (x - self.K[0, 2] + 0.5) / self.K[0, 0],
            (y - self.K[1, 2] + 0.5) / self.K[1, 1] * self.sign_z,
        ],
        dim=-1,
    ),
    (0, 1),
    value=self.sign_z,
)  # [H, W, 3], e.g. torch.Size([800, 800, 3])
```
Here, adding 0.5 to both x and y means the ray is cast through the center of each pixel rather than through its corner, which I find quite reasonable. Furthermore, I believe this convention has an additional advantage when a dataset mixes resolutions, such as 800×800 and 100×100, because it reduces ambiguity in the ray-direction computation.
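For comparison, my understanding is that the original NeRF implementation casts rays without the half-pixel offset. Here is a minimal, self-contained sketch of the two conventions; the resolution, intrinsics, and sign_z value below are placeholder assumptions for illustration, not values taken from your code:

```python
import torch

# Placeholder camera: 800x800 image, focal ~1111 px, principal point at the image center.
H, W = 800, 800
K = torch.tensor([[1111.0, 0.0, W / 2.0],
                  [0.0, 1111.0, H / 2.0],
                  [0.0, 0.0, 1.0]])
sign_z = -1.0  # assumed OpenGL-style camera looking down -z

y, x = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                      torch.arange(W, dtype=torch.float32),
                      indexing="ij")

# Original-NeRF-style: no offset, so the ray passes through the pixel's corner.
dirs_corner = torch.stack(
    [(x - K[0, 2]) / K[0, 0],
     (y - K[1, 2]) / K[1, 1] * sign_z,
     sign_z * torch.ones_like(x)], dim=-1)  # [H, W, 3]

# Pixel-center convention: shift by half a pixel before dividing by the focal length.
dirs_center = torch.stack(
    [(x - K[0, 2] + 0.5) / K[0, 0],
     (y - K[1, 2] + 0.5) / K[1, 1] * sign_z,
     sign_z * torch.ones_like(x)], dim=-1)  # [H, W, 3]
```

The two direction grids differ only by a constant half-pixel shift, which is tiny per ray but applied consistently across the whole image.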
For instance, with the Mip-NeRF setup, every ray of a 100×100 image coincides exactly with one of the rays of the 800×800 image, so the same ray (and its sample points) is supervised at multiple scales, which can cause ambiguity. With your method, rays only coincide across scales when one resolution is an odd multiple of the other, e.g. 300×300 and 900×900 (a 3× ratio). For a dataset spanning 100×100 to 800×800, where the scale ratios are powers of two, rays from different scales never coincide exactly, which naturally decreases ambiguity.
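To make the overlap argument concrete, here is a small, codebase-independent check of how many normalized pixel coordinates coincide between two resolutions under each convention (the function names are mine, purely for illustration):

```python
import torch

def pixel_coords(res: int, offset: float) -> torch.Tensor:
    """Normalized horizontal pixel coordinates u = (i + offset) / res for one image row."""
    return (torch.arange(res, dtype=torch.float64) + offset) / res

def num_shared(res_a: int, res_b: int, offset: float) -> int:
    """Count coordinates of the res_a grid that also appear in the res_b grid."""
    a = pixel_coords(res_a, offset)
    b = pixel_coords(res_b, offset)
    # Tolerance instead of exact equality to absorb floating-point rounding.
    return int((torch.abs(a[:, None] - b[None, :]) < 1e-9).any(dim=1).sum())

print(num_shared(100, 800, 0.0))  # corner convention: 100 (every ray reappears at 8x)
print(num_shared(100, 800, 0.5))  # pixel centers, even (8x) ratio: 0
print(num_shared(300, 900, 0.5))  # pixel centers, odd (3x) ratio: 300
```

With the corner convention every low-resolution coordinate reappears at 8× resolution, whereas with the center convention exact coincidence only returns for odd ratios such as 3×.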
To validate this hypothesis, I conducted an experiment comparing results with and without adding 0.5. Below are the PSNR results:

It seems that PSNR improves significantly when the 0.5 offset is included in the ray-direction computation. Do you think this improvement is mainly due to the multi-resolution dataset setup I described above, or is it more likely an intrinsic advantage of casting rays through pixel centers?
I would greatly appreciate your thoughts on this!