Skip to content

Commit 72a9deb

Browse files
committed
refactoring
Updating README passing all the tests renamed to read_file() return base path when all else fails add few more tests
1 parent f1f9e2a commit 72a9deb

9 files changed

Lines changed: 786 additions & 64 deletions

File tree

README.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -124,6 +124,7 @@ Your Python dependencies can be packaged as .py files, .zip archives (containing
124124
Your entry point script will define logic using the `Client` object which wraps data access layers.
125125

126126
You should only need the following methods:
127+
* `read_file(file_name)` - Returns a file handle of the provided file_name
127128
* `read_dlo(name)` – Read from a Data Lake Object by name
128129
* `read_dmo(name)` – Read from a Data Model Object by name
129130
* `write_to_dlo(name, spark_dataframe, write_mode)` – Write to a Data Model Object by name with a Spark dataframe

docs/file_reader_refactoring.md

Lines changed: 179 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,179 @@
1+
# DefaultFileReader Class Refactoring
2+
3+
## Overview
4+
5+
The `DefaultFileReader` class has been refactored to improve testability, readability, and maintainability. This document outlines the changes made and how to use the new implementation.
6+
7+
## Key Improvements
8+
9+
### 1. **Separation of Concerns**
10+
- **File path resolution** is now handled by dedicated methods
11+
- **File opening** is separated from path resolution
12+
- **Configuration management** is centralized and configurable
13+
14+
### 2. **Enhanced Testability**
15+
- **Dependency injection** through constructor parameters
16+
- **Mockable methods** for unit testing
17+
- **Clear interfaces** between different responsibilities
18+
- **Comprehensive test coverage** with isolated test cases
19+
20+
### 3. **Better Error Handling**
21+
- **Custom exception hierarchy** for different error types
22+
- **Descriptive error messages** with context
23+
- **Proper exception chaining** for debugging
24+
25+
### 4. **Improved Configuration**
26+
- **Configurable defaults** that can be overridden
27+
- **Environment-specific settings** support
28+
- **Clear configuration contract**
29+
30+
### 5. **Enhanced Readability**
31+
- **Comprehensive docstrings** for all methods
32+
- **Clear method names** that describe their purpose
33+
- **Logical method organization** from public to private
34+
- **Type hints** throughout the codebase
35+
36+
## Class Structure
37+
38+
### DefaultFileReader
39+
The main class that provides the file reading framework:
40+
41+
```python
42+
class DefaultFileReader(BaseDataAccessLayer):
43+
# Configuration constants
44+
DEFAULT_CODE_PACKAGE = 'payload'
45+
DEFAULT_FILE_FOLDER = 'files'
46+
DEFAULT_CONFIG_FILE = 'config.json'
47+
48+
def __init__(self, code_package=None, file_folder=None, config_file=None):
49+
# Initialize with custom or default configuration
50+
51+
def file_open(self, file_name: str) -> io.TextIOWrapper:
52+
# Main public method for opening files
53+
54+
def get_search_locations(self) -> list[Path]:
55+
# Get all possible search locations
56+
```
57+
58+
## Exception Hierarchy
59+
60+
```python
61+
FileReaderError (base)
62+
├── FileNotFoundError (file not found in any location)
63+
└── FileAccessError (permission, I/O errors, etc.)
64+
```
65+
66+
## Usage Examples
67+
68+
### Basic Usage
69+
```python
70+
from datacustomcode.file.reader.default import DefaultFileReader
71+
72+
# Use default configuration
73+
reader = DefaultFileReader()
74+
with reader.file_open('data.csv') as f:
75+
content = f.read()
76+
```
77+
78+
### Custom Configuration
79+
```python
80+
from datacustomcode.file.reader.default import DefaultFileReader
81+
82+
# Custom configuration
83+
reader = DefaultFileReader(
84+
code_package='my_package',
85+
file_folder='data',
86+
config_file='settings.json'
87+
)
88+
```
89+
90+
### Error Handling
91+
```python
92+
try:
93+
with reader.file_open('data.csv') as f:
94+
content = f.read()
95+
except FileNotFoundError as e:
96+
print(f"File not found: {e}")
97+
except FileAccessError as e:
98+
print(f"Access error: {e}")
99+
```
100+
101+
## File Resolution Strategy
102+
103+
The file reader uses a two-tier search strategy:
104+
105+
1. **Primary Location**: `{code_package}/{file_folder}/{filename}`
106+
2. **Fallback Location**: `{config_file_parent}/{file_folder}/{filename}`
107+
108+
This allows for flexible deployment scenarios where files might be in different locations depending on the environment.
109+
110+
## Testing
111+
112+
### Unit Tests
113+
The refactored class includes comprehensive unit tests covering:
114+
- Configuration initialization
115+
- File path resolution
116+
- Error handling scenarios
117+
- File opening operations
118+
- Search location determination
119+
120+
### Mocking
121+
The class is designed for easy mocking in tests:
122+
```python
123+
from unittest.mock import patch
124+
125+
with patch('DefaultFileReader._resolve_file_path') as mock_resolve:
126+
mock_resolve.return_value = Path('/test/file.txt')
127+
# Test file opening logic
128+
```
129+
130+
### Integration Tests
131+
Integration tests verify the complete file resolution and opening flow using temporary directories and real file operations.
132+
133+
## Migration Guide
134+
135+
### From Old Implementation
136+
The old implementation had these issues:
137+
- Hardcoded configuration values
138+
- Mixed responsibilities in single methods
139+
- Limited error handling
140+
- Difficult to test
141+
142+
### To New Implementation
143+
1. **Update imports**: Use `DefaultFileReader` from `datacustomcode.file.reader.default`
144+
2. **Error handling**: Catch specific exceptions instead of generic ones
145+
3. **Configuration**: Use constructor parameters for custom settings
146+
4. **Testing**: Leverage the new mockable methods
147+
148+
## Benefits
149+
150+
### For Developers
151+
- **Easier debugging** with clear error messages
152+
- **Better IDE support** with type hints and docstrings
153+
- **Simplified testing** with dependency injection
154+
- **Clearer code structure** with separated responsibilities
155+
156+
### For Maintainers
157+
- **Easier to extend** with new file resolution strategies
158+
- **Better error tracking** with custom exception types
159+
- **Improved test coverage** with isolated test cases
160+
- **Clearer documentation** with comprehensive docstrings
161+
162+
### For Users
163+
- **More reliable** with proper error handling
164+
- **More flexible** with configurable behavior
165+
- **Better debugging** with descriptive error messages
166+
- **Consistent interface** across different implementations
167+
168+
## Future Enhancements
169+
170+
The refactored structure makes it easy to add:
171+
- **Additional file resolution strategies** (URLs, cloud storage, etc.)
172+
- **File format detection** and automatic handling
173+
- **Caching mechanisms** for frequently accessed files
174+
- **Async file operations** for better performance
175+
- **File validation** and integrity checking
176+
177+
## Conclusion
178+
179+
The refactored `DefaultFileReader` class provides a solid foundation for file reading operations while maintaining backward compatibility. The improvements in testability, readability, and maintainability make it easier to develop, test, and maintain file reading functionality in the Data Cloud Custom Code SDK.

src/datacustomcode/client.py

Lines changed: 10 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -14,20 +14,22 @@
1414
# limitations under the License.
1515
from __future__ import annotations
1616

17-
import io
18-
1917
from enum import Enum
2018
from typing import (
2119
TYPE_CHECKING,
2220
ClassVar,
2321
Optional,
2422
)
23+
2524
from pyspark.sql import SparkSession
25+
2626
from datacustomcode.config import SparkConfig, config
27+
from datacustomcode.file.reader.default import DefaultFileReader
2728
from datacustomcode.io.reader.base import BaseDataCloudReader
28-
from datacustomcode.file.reader.base import BaseFileReader
2929

3030
if TYPE_CHECKING:
31+
import io
32+
3133
from pyspark.sql import DataFrame as PySparkDataFrame
3234

3335
from datacustomcode.io.reader.base import BaseDataCloudReader
@@ -113,7 +115,7 @@ class Client:
113115
_instance: ClassVar[Optional[Client]] = None
114116
_reader: BaseDataCloudReader
115117
_writer: BaseDataCloudWriter
116-
_file: BaseFileReader
118+
_file: DefaultFileReader
117119
_data_layer_history: dict[DataCloudObjectType, set[str]]
118120

119121
def __new__(
@@ -156,7 +158,7 @@ def __new__(
156158
writer_init = writer
157159
cls._instance._reader = reader_init
158160
cls._instance._writer = writer_init
159-
cls._instance._file = BaseFileReader()
161+
cls._instance._file = DefaultFileReader()
160162
cls._instance._data_layer_history = {
161163
DataCloudObjectType.DLO: set(),
162164
DataCloudObjectType.DMO: set(),
@@ -215,11 +217,10 @@ def write_to_dmo(
215217
self._validate_data_layer_history_does_not_contain(DataCloudObjectType.DLO)
216218
return self._writer.write_to_dmo(name, dataframe, write_mode, **kwargs)
217219

218-
def file_open(self, file_name: str) -> io.TextIOWrapper:
219-
"""Read a file from the local file system.
220-
"""
220+
def read_file(self, file_name: str) -> io.TextIOWrapper:
221+
"""Read a file from the local file system."""
221222

222-
return self._file.file_open(file_name)
223+
return self._file.read_file(file_name)
223224

224225
def _validate_data_layer_history_does_not_contain(
225226
self, data_cloud_object_type: DataCloudObjectType
Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,14 @@
1+
# Copyright (c) 2025, Salesforce, Inc.
2+
# SPDX-License-Identifier: Apache-2
3+
#
4+
# Licensed under the Apache License, Version 2.0 (the "License");
5+
# you may not use this file except in compliance with the License.
6+
# You may obtain a copy of the License at
7+
#
8+
# http://www.apache.org/licenses/LICENSE-2.0
9+
#
10+
# Unless required by applicable law or agreed to in writing, software
11+
# distributed under the License is distributed on an "AS IS" BASIS,
12+
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13+
# See the License for the specific language governing permissions and
14+
# limitations under the License.

src/datacustomcode/file/base.py

Lines changed: 2 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -14,8 +14,6 @@
1414
# limitations under the License.
1515
from __future__ import annotations
1616

17-
from abc import ABC
1817

19-
20-
class BaseDataAccessLayer(ABC):
21-
pass
18+
class BaseDataAccessLayer:
19+
"""Base class for data access layer implementations."""
Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,14 @@
1+
# Copyright (c) 2025, Salesforce, Inc.
2+
# SPDX-License-Identifier: Apache-2
3+
#
4+
# Licensed under the Apache License, Version 2.0 (the "License");
5+
# you may not use this file except in compliance with the License.
6+
# You may obtain a copy of the License at
7+
#
8+
# http://www.apache.org/licenses/LICENSE-2.0
9+
#
10+
# Unless required by applicable law or agreed to in writing, software
11+
# distributed under the License is distributed on an "AS IS" BASIS,
12+
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13+
# See the License for the specific language governing permissions and
14+
# limitations under the License.

src/datacustomcode/file/reader/base.py

Lines changed: 0 additions & 51 deletions
This file was deleted.

0 commit comments

Comments
 (0)